This paper studies asymptotic errors, that is, limit theorems for the error between numerical and exact solutions of stochastic differential equations (SDEs) driven by one-dimensional fractional Brownian motion (fBm). The Euler-Maruyama, higher-order Milstein, and Crank-Nicolson schemes are among the most studied numerical schemes for SDEs driven by fBm (fSDEs). Most previous studies have derived the asymptotic error of a specific scheme as a main theorem or as a corollary. Even in the one-dimensional case, the asymptotic error had not been determined for the Milstein or the Crank-Nicolson method when the Hurst exponent $H$ is less than or equal to $1/3$ and a drift term is present. We obtain a new evaluation method for convergence and asymptotic errors. This method improves the conditions under which convergence of a numerical scheme can be proved, and it yields the asymptotic error under the same conditions. We completely determine the asymptotic error of the Milstein method for arbitrary orders. In addition, we determine, for the first time, the asymptotic error of the Crank-Nicolson method for $1/4<H\leq 1/3$.
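For orientation, the three schemes named in the abstract above can be written in their usual textbook form. This is only a sketch: the notation ($b$, $\sigma$, $\Delta$, $\hat Y_k$) is assumed here rather than taken from the paper, and the paper's exact definitions, in particular of the higher-order Milstein scheme, may differ.

```latex
% Assumed setting: dY_t = b(Y_t)\,dt + \sigma(Y_t)\,dB_t on [0,T],
% step \Delta = T/n, t_k = k\Delta, \Delta B_k = B_{t_{k+1}} - B_{t_k}.
\begin{align*}
\text{Euler-Maruyama:}\quad
  & \hat Y_{k+1} = \hat Y_k + b(\hat Y_k)\,\Delta + \sigma(\hat Y_k)\,\Delta B_k, \\
\text{Milstein (second order):}\quad
  & \hat Y_{k+1} = \hat Y_k + b(\hat Y_k)\,\Delta + \sigma(\hat Y_k)\,\Delta B_k
    + \tfrac{1}{2}\,\sigma\sigma'(\hat Y_k)\,(\Delta B_k)^2, \\
\text{Crank-Nicolson:}\quad
  & \hat Y_{k+1} = \hat Y_k
    + \tfrac{1}{2}\bigl(b(\hat Y_k) + b(\hat Y_{k+1})\bigr)\Delta
    + \tfrac{1}{2}\bigl(\sigma(\hat Y_k) + \sigma(\hat Y_{k+1})\bigr)\Delta B_k.
\end{align*}
```

The asymptotic error is then the limit in distribution of the suitably rescaled difference $\hat Y - Y$ as $n \to \infty$.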
This paper introduces a modeling approach that employs multi-level global processing, encompassing both short-term frame-level and long-term sample-level feature scales. In the initial stage of shallow feature extraction, various scales are employed to extract multi-level features, including Mel-Frequency Cepstral Coefficients (MFCC) and pre-Fbank log energy spectrum. The identification network treats the input two-dimensional temporal features at both the frame level and the sample level. Specifically, the model first employs a one-dimensional convolution-based Convolutional Long Short-Term Memory (ConvLSTM) network to fuse spatiotemporal information and extract short-term frame-level features. Subsequently, a Bidirectional Long Short-Term Memory (BiLSTM) network is used to learn long-term sample-level sequential representations. A transformer encoder then performs cross-scale, multi-level processing on global frame-level and sample-level features, facilitating deep feature representation and fusion at both levels. Finally, recognition results are obtained through Softmax. Our method achieves 99.6% recognition accuracy on the CCNU_Mobile dataset, an improvement of 2% to 12% over the baseline system. Additionally, we investigate the transferability of our model, achieving 87.9% accuracy in a classification task on a new dataset.
This paper studies continuous post-training optimization for small language models and proposes a method for constructing continuous post-training alignment data for such models. The core of the method is to use guidance from large models to optimize the diversity and accuracy of the alignment data. To verify its effectiveness, we use Qwen2-0.5B-Instruct as the baseline small language model and, with the alignment dataset constructed by the proposed method, train and compare several groups of experiments: an SFT (Supervised Fine-Tuning) post-training experiment, a KTO (Kahneman-Tversky Optimization) post-training experiment, an SFT-KTO two-stage post-training experiment, and a model weight fusion experiment. Finally, we evaluate and analyze the post-trained models and confirm that the proposed continuous post-training optimization method can significantly improve the performance of small language models.
Rational design of next-generation functional materials relies on quantitative predictions of their electronic structures beyond single building blocks. First-principles quantum mechanical (QM) modeling becomes infeasible as the size of a material grows beyond hundreds of atoms. In this study, we developed a new computational tool integrating fragment-based graph neural networks (FBGNN) into the fragment-based many-body expansion (MBE) theory, referred to as FBGNN-MBE, and demonstrated its capacity to reproduce full-dimensional potential energy surfaces (FD-PES) for hierarchic chemical systems with manageable accuracy, complexity, and interpretability. In particular, we divided the entire system into basic building blocks (fragments), evaluated their single-fragment energies using a first-principles QM model, and treated many-fragment interactions using the structure-property relationships learned by FBGNNs. Our development of FBGNN-MBE demonstrates the potential of a new framework integrating deep learning models into fragment-based QM methods, and marks a significant step towards computationally aided design of large functional materials.
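The many-body expansion mentioned in the abstract above has a standard form, sketched below. The symbols ($E_i$, $\Delta E_{ij}$, and so on) are assumed notation, not the paper's; per the abstract, the one-body terms would come from first-principles QM and the interaction terms from the trained FBGNNs.

```latex
% Total energy of a system split into N fragments (standard MBE form).
\begin{align*}
E &= \sum_{i} E_i
   + \sum_{i<j} \Delta E_{ij}
   + \sum_{i<j<k} \Delta E_{ijk} + \cdots, \\
\Delta E_{ij} &= E_{ij} - E_i - E_j, \\
\Delta E_{ijk} &= E_{ijk} - \Delta E_{ij} - \Delta E_{ik} - \Delta E_{jk}
                 - E_i - E_j - E_k.
\end{align*}
```

Here $E_{ij}$ and $E_{ijk}$ denote the energies of the isolated dimer and trimer. Truncating the series after the two-body or three-body terms is the usual way to keep the cost manageable.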
The paper formalizes a version of parallel online directed acyclic graph (DAG) exploration, general enough to be readily mapped to many computational scenarios. In both the offline and online versions, vertices are weighted with the work units required for their processing, at least one parent must be completely processed before a child is processed, and at any given time only one processor can work on any given vertex. The online version has the following additional natural restriction: only after a vertex is processed are its required work units or its children known. Using the Actor Model of parallel computation, it is shown that a natural class of parallel online algorithms meets a simple competitive ratio bound. We demonstrate and focus on the problem's occurrence in the scenario of energy landscape roadmapping or atlasing under pair-potentials, a highly compute-and-storage intensive modeling component integral to diverse applications involving soft-matter assembly. The method is experimentally validated using a C++ Actor Framework (CAF) software implementation built atop EASAL (Efficient Atlasing and Search of Assembly Landscapes), a substantial open-source software suite, running on multiple CPU cores of the HiperGator supercomputer, demonstrating linear speedup results.
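The online rules stated in the abstract above can be illustrated with a small simulation. This is a minimal sketch of one greedy strategy, assuming a single known root and a hypothetical function name (`greedy_online_makespan`). It is not the paper's Actor Model algorithm or its CAF implementation.

```python
import heapq

def greedy_online_makespan(children, work, root, num_procs):
    """Simulate greedy online exploration of a DAG and return the makespan.

    The scheduler never inspects work[v] or children[v] before v finishes;
    those values are only used to advance the simulated clock and to reveal
    new vertices on completion, mirroring the online restriction.
    """
    ready = [root]          # discovered vertices not yet started
    discovered = {root}
    running = []            # min-heap of (finish_time, vertex)
    now = 0.0
    while ready or running:
        # Each idle processor claims one ready vertex (one processor per vertex).
        while ready and len(running) < num_procs:
            v = ready.pop()
            heapq.heappush(running, (now + work[v], v))
        # Jump to the next completion; only now are the children revealed.
        now, v = heapq.heappop(running)
        for c in children.get(v, []):
            # A child is available as soon as any one parent has finished.
            if c not in discovered:
                discovered.add(c)
                ready.append(c)
    return now

# Diamond-shaped DAG: a -> {b, c} -> d, with per-vertex work units.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
work = {"a": 1, "b": 2, "c": 3, "d": 1}
one_proc = greedy_online_makespan(children, work, "a", 1)
two_procs = greedy_online_makespan(children, work, "a", 2)
```

With one processor the makespan equals the total work (7 units). With two processors this example finishes in 4 units, limited by the path through `c`.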
Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision-making by uncovering relationships between different simulated systems and between a system's inputs and outputs. We focus on clustering multivariate empirical distributions of simulation outputs to identify patterns and trade-offs among performance measures. We present a novel agglomerative clustering algorithm that utilizes the regularized Wasserstein distance to cluster these multivariate empirical distributions. This framework has several important use cases, including anomaly detection, pre-optimization, and online monitoring. In numerical experiments involving a call-center model, we demonstrate how this methodology can identify staffing plans that yield similar performance outcomes and inform policies for intervening when queue lengths signal potentially worsening system performance.
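The clustering idea in the abstract above can be sketched in a few lines. The code makes several assumptions the abstract does not confirm: entropic (Sinkhorn) regularization, a squared Euclidean ground cost, uniform sample weights, and average linkage. The function names are hypothetical, and the sample points are made up purely for illustration.

```python
import math

def sinkhorn_cost(xs, ys, eps=1.0, iters=200):
    """Transport cost <P, C> of the entropy-regularized plan between two
    empirical distributions with uniform weights (assumed regularizer)."""
    n, m = len(xs), len(ys)
    cost = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def pairwise_costs(systems, eps=1.0):
    """Symmetric matrix of Sinkhorn costs; the diagonal is unused and left at 0."""
    k = len(systems)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = sinkhorn_cost(systems[i], systems[j], eps)
    return d

def agglomerate(dists, num_clusters):
    """Average-linkage agglomerative clustering on a precomputed matrix."""
    clusters = [[i] for i in range(len(dists))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(dists[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Four hypothetical systems, each with three bivariate simulation outputs.
systems = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)],
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
    [(5.1, 5.0), (6.0, 5.1), (5.0, 5.9)],
]
clusters = agglomerate(pairwise_costs(systems), num_clusters=2)
```

In this toy example the first two systems end up in one cluster and the last two in the other. For large ground costs relative to `eps`, the kernel can underflow, so a log-domain Sinkhorn would be the safer choice in practice.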
This paper proposes a distributed pseudo-likelihood method (DPL) to conveniently identify the community structure of large-scale networks. Specifically, we first propose a block-wise splitting method to divide large-scale network data into several subnetworks and distribute them among multiple workers. For simplicity, we assume the classical stochastic block model. Then, the DPL algorithm is iteratively implemented for the distributed optimization of the sum of the local pseudo-likelihood functions. At each iteration, each worker updates its local community labels and communicates with the master. The master then broadcasts the combined estimator to each worker for the next iteration. Because it runs on a distributed system, DPL significantly reduces the computational complexity relative to the traditional pseudo-likelihood method run on a single machine. Furthermore, to ensure statistical accuracy, we theoretically discuss the requirements on the worker sample size. Moreover, we extend the DPL method to estimate degree-corrected stochastic block models. The superior performance of the proposed distributed algorithm is demonstrated through extensive numerical studies and real data analysis.
Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use-cases, for example that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
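The decomposition described in the abstract above rests on the chain rule for mutual information. The notation below is assumed for illustration ($x'$ is a less informed subview of $x$, so $x'$ is a function of $x$) and is not necessarily the paper's.

```latex
% Chain rule with a subview x' of x (x' is determined by x):
\begin{align*}
I(x; y) &= I(x', x; y) = I(x'; y) + I(x; y \mid x').
\end{align*}
% A contrastive (InfoNCE-style) bound built from K samples is at most \log K,
% which is why a single bound underestimates large MI; each term on the
% right-hand side is smaller and can be bounded separately.
```

Repeating the split with progressively more informed subviews yields a sum of one unconditional term and several conditional terms, as the abstract describes.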
Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios. Researchers and engineers therefore usually use side information to address these issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes a knowledge graph embedding task to assist the recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation over state-of-the-art baselines. MKR is also shown to maintain decent performance even when user-item interactions are sparse.
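The abstract above does not define the cross&compress unit, so the following is only a sketch of one plausible form, stated as an assumption: the item vector `v` and entity vector `e` are crossed into an outer-product matrix, which is then compressed back into two vectors by weight vectors. All names and the toy weights are hypothetical.

```python
def cross_compress(v, e, w_vv, w_ev, w_ve, w_ee, b_v, b_e):
    """One assumed cross&compress step on plain Python lists.

    Cross:    C[i][j] = v[i] * e[j]             (d x d interaction matrix)
    Compress: v_next = C w_vv + C^T w_ev + b_v
              e_next = C w_ve + C^T w_ee + b_e
    """
    d = len(v)
    c = [[v[i] * e[j] for j in range(d)] for i in range(d)]

    def project(w_direct, w_transposed, bias):
        # Row i of C times w_direct, plus row i of C^T times w_transposed.
        return [
            sum(c[i][j] * w_direct[j] for j in range(d))
            + sum(c[j][i] * w_transposed[j] for j in range(d))
            + bias[i]
            for i in range(d)
        ]

    return project(w_vv, w_ev, b_v), project(w_ve, w_ee, b_e)

# Toy two-dimensional example with hand-picked weights.
v_next, e_next = cross_compress(
    [1.0, 2.0], [3.0, 4.0],
    w_vv=[1.0, 0.0], w_ev=[0.0, 1.0],
    w_ve=[1.0, 1.0], w_ee=[0.0, 0.0],
    b_v=[0.0, 0.0], b_e=[0.0, 0.0],
)
```

Under these assumptions, every output coordinate is a weighted sum of products `v[i] * e[j]`, which is one way a unit could share latent features between the two tasks; stacking several such units would raise the interaction order.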
Multi-relation question answering is a challenging task because it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.