Motivated by a class of nonlinear imaging inverse problems, for instance, multispectral computed tomography (MSCT), this paper studies the convergence theory of the nonlinear Kaczmarz method (NKM) for solving the system of nonlinear equations with component-wise convex mapping, namely, the function corresponding to each equation being convex. However, such kind of nonlinear mapping may not satisfy the commonly used component-wise tangential cone condition (TCC). For this purpose, we propose a novel condition named relative gradient discrepancy condition (RGDC), and make use of it to prove the convergence and even the convergence rate of the NKM with several general index selection strategies, where these strategies include cyclic strategy and maximum residual strategy. Particularly, we investigate the application of the NKM for solving nonlinear systems in MSCT image reconstruction. We prove that the nonlinear mapping in this context fulfills the proposed RGDC rather than the component-wise TCC, and provide a global convergence of the NKM based on the previously obtained results. Numerical experiments further illustrate the numerical convergence of the NKM for MSCT image reconstruction.
This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios. BLaIR is trained to learn correlations between item metadata and potential natural language context, which is useful for retrieving and recommending items. To pretrain BLaIR, we collect Amazon Reviews 2023, a new dataset comprising over 570 million reviews and 48 million items from 33 categories, significantly expanding beyond the scope of previous versions. We evaluate the generalization ability of BLaIR across multiple domains and tasks, including a new task named complex product search, referring to retrieving relevant items given long, complex natural language contexts. Leveraging large language models like ChatGPT, we correspondingly construct a semi-synthetic evaluation set, Amazon-C4. Empirical results on the new task, as well as conventional retrieval and recommendation tasks, demonstrate that BLaIR exhibit strong text and item representation capacity. Our datasets, code, and checkpoints are available at: //github.com/hyp1231/AmazonReviews2023.
This paper provides rigorous error bounds for physics-informed neural networks approximating the semilinear wave equation. We provide bounds for the generalization and training error in terms of the width of the network's layers and the number of training points for a tanh neural network with two hidden layers. Our main result is a bound of the total error in the $H^1([0,T];L^2(\Omega))$-norm in terms of the training error and the number of training points, which can be made arbitrarily small under some assumptions. We illustrate our theoretical bounds with numerical experiments.
Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code and models will be released soon.
Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of "closed missingness mechanisms" and show that under these mechanisms an available case analysis is admissible for consistent estimation for any type of statistical and causal query, even if the underlying missingness mechanism is of missing not at random (MNAR) type. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses are possibly the first to show the applicability of missingness DAGs (m-DAGs) to complex longitudinal real-world data, while highlighting the sensitivity with respect to the assumed causal model.
This paper investigates the convergence properties and applications of the three-operator splitting method, also known as Davis-Yin splitting (DYS) method, integrated with extrapolation and Plug-and-Play (PnP) denoiser within a nonconvex framework. We first propose an extrapolated DYS method to effectively solve a class of structural nonconvex optimization problems that involve minimizing the sum of three possible nonconvex functions. Our approach provides an algorithmic framework that encompasses both extrapolated forward-backward splitting and extrapolated Douglas-Rachford splitting methods.To establish the convergence of the proposed method, we rigorously analyze its behavior based on the Kurdyka-{\L}ojasiewicz property, subject to some tight parameter conditions. Moreover, we introduce two extrapolated PnP-DYS methods with convergence guarantee, where the traditional regularization prior is replaced by a gradient step-based denoiser. This denoiser is designed using a differentiable neural network and can be reformulated as the proximal operator of a specific nonconvex functional. We conduct extensive experiments on image deblurring and image super-resolution problems, where our results showcase the advantage of the extrapolation strategy and the superior performance of the learning-based model that incorporates the PnP denoiser in terms of achieving high-quality recovery images.
In this paper, the problem of joint transmission and computation resource allocation for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. In the considered model, the base station (BS) needs to transmit a large amount of data to multiple users with RSMA. Due to limited communication resources, the BS is required to utilize semantic communication techniques to compress the large-sized data. The semantic communication is enabled by shared probability graphs between the BS and the users. The probability graph can be used to further compress the transmission data at the BS, while the received compressed semantic information can be recovered through using the same shared probability graph at each user side. The semantic information compression progress consumes additional computation power at the BS, which inevitably decreases the transmission power due to limited total power budget. Considering both the effect of semantic compression ratio and computation power, the semantic rate expression for RSMA is first obtained. Then, based on the obtained rate expression, an optimization problem is formulated with the aim of maximizing the sum of semantic rates of all users under total power, semantic compression ratio, and rate allocation constraints. To tackle this problem, an iterative algorithm is proposed, where the rate allocation and transmit beamforming design subproblem is solved using a successive convex approximation method, and the semantic compression ratio subproblem is addressed using a greedy algorithm. Numerical results validate the effectiveness of the proposed scheme.
In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effective to enhance the system performance. Among the evaluating techniques, the narrow frequency bands presents a significant impact. Indeed, our proposed model, which focuses on the narrow frequency bands, outperforms the DCASE baseline on the benchmark dataset of DCASE 2022 Task 2 Development set. The important role of the narrow frequency bands indicated in this paper inspires the research community on the task of Acoustic Anomaly Detection of Machines to further investigate and propose novel network architectures focusing on the frequency bands.
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
We address the task of automatically scoring the competency of candidates based on textual features, from the automatic speech recognition (ASR) transcriptions in the asynchronous video job interview (AVI). The key challenge is how to construct the dependency relation between questions and answers, and conduct the semantic level interaction for each question-answer (QA) pair. However, most of the recent studies in AVI focus on how to represent questions and answers better, but ignore the dependency information and interaction between them, which is critical for QA evaluation. In this work, we propose a Hierarchical Reasoning Graph Neural Network (HRGNN) for the automatic assessment of question-answer pairs. Specifically, we construct a sentence-level relational graph neural network to capture the dependency information of sentences in or between the question and the answer. Based on these graphs, we employ a semantic-level reasoning graph attention network to model the interaction states of the current QA session. Finally, we propose a gated recurrent unit encoder to represent the temporal question-answer pairs for the final prediction. Empirical results conducted on CHNAT (a real-world dataset) validate that our proposed model significantly outperforms text-matching based benchmark models. Ablation studies and experimental results with 10 random seeds also show the effectiveness and stability of our models.
Due to the significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting more and more attention in both academic and industrial communities. In this paper, we survey recent advances and challenges in an issue-specific manner. We discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog system modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model in both pipeline and end-to-end models. We also review the recent progresses in dialog evaluation and some widely-used corpora. We believe that this survey can shed a light on future research in task-oriented dialog systems.