Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty of acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continually train the LLaMA2 model, aiming to align cross-language representations and facilitate knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks; the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
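To make the dynamic data sampler concrete, here is a minimal Python sketch, assuming a simple linear mixing schedule and placeholder datasets (`pretrain_data` and `sft_data` are hypothetical handles); ChatFlow's actual schedule and data pipeline may differ.

```python
# A minimal sketch of a dynamic data sampler that anneals the training mix
# from unsupervised pre-training examples to supervised fine-tuning (SFT)
# examples; the linear ramp is an assumption, not ChatFlow's exact schedule.
import random

def dynamic_sampler(pretrain_data, sft_data, total_steps):
    """Yield one example per step, shifting probability mass from pre-training to SFT."""
    for step in range(total_steps):
        p_sft = step / total_steps  # linear ramp from 0 to 1
        source = sft_data if random.random() < p_sft else pretrain_data
        yield random.choice(source)

pretrain = ["<unsupervised zh/en/parallel text>"] * 3
sft = ["<instruction-response pair>"] * 3
counts = {"pretrain": 0, "sft": 0}
for ex in dynamic_sampler(pretrain, sft, 10_000):
    counts["sft" if ex.startswith("<instruction") else "pretrain"] += 1
print(counts)  # roughly balanced overall, but early steps favor pre-training
```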
A non-linear complex system governed by multi-spatial and multi-temporal physics scales cannot be fully understood with a single diagnostic, as each provides only a partial view, and much information is lost during data extraction. Combining multiple diagnostics also results in imperfect projections of the system's physics. By identifying hidden inter-correlations between diagnostics, we can leverage mutual support to fill in these gaps, but uncovering these inter-correlations analytically is too complex. We introduce a groundbreaking machine learning methodology to address this issue. Our multimodal approach generates super-resolution data encompassing multiple physics phenomena, capturing detailed structural evolution and responses to perturbations that were previously unobservable. This methodology addresses a critical problem in fusion plasmas: the Edge Localized Mode (ELM), a plasma instability that can severely damage reactor walls. One method of stabilizing ELMs is to use resonant magnetic perturbations to trigger magnetic islands. However, the low spatial and temporal resolution of measurements limits the analysis of these magnetic islands, given their small size, rapid dynamics, and complex interactions within the plasma. With super-resolution diagnostics, we can experimentally verify theoretical models of magnetic islands for the first time, providing unprecedented insights into their role in ELM stabilization. This advancement aids in developing effective ELM suppression strategies for future fusion reactors such as ITER and has broader applications, potentially revolutionizing diagnostics in fields such as astronomy, astrophysics, and medical imaging.
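As a rough illustration only (the paper's architecture is not described here), the following PyTorch sketch fuses two hypothetical 1-D diagnostic channels and upsamples the fused features to a higher spatial resolution; every layer, shape, and channel count is an invented placeholder.

```python
# A hypothetical sketch of the general idea: fuse a coarse diagnostic signal
# with a second modality and predict a super-resolved signal. Not the paper's
# architecture; all shapes and layers are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalSR(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.enc_a = nn.Conv1d(1, 16, 5, padding=2)  # e.g. coarse probe array
        self.enc_b = nn.Conv1d(1, 16, 5, padding=2)  # e.g. a second diagnostic
        self.fuse = nn.Sequential(
            nn.Conv1d(32, 32, 3, padding=1), nn.ReLU(),
            # upsample fused features to the target spatial resolution
            nn.Upsample(scale_factor=scale, mode="linear", align_corners=False),
            nn.Conv1d(32, 1, 3, padding=1),
        )

    def forward(self, a, b):
        h = torch.cat([self.enc_a(a), self.enc_b(b)], dim=1)
        return self.fuse(h)

model = MultimodalSR()
a = torch.randn(8, 1, 64)  # batch of coarse measurements (64 spatial points)
b = torch.randn(8, 1, 64)
print(model(a, b).shape)   # torch.Size([8, 1, 256])
```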
The ability to perform complex tasks from detailed instructions is key to many remarkable achievements of our species. As humans, we are not only capable of performing a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet, most existing benchmarks are largely confined to single-modality inputs (either text or vision), narrowing the scope of multimodal assessments, particularly for instruction-following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models and highlight a large gap between these models' ability to precisely follow instructions and that of humans. The code of iWISDM is available on GitHub at https://github.com/BashivanLab/iWISDM.
In recent years, deep learning has gained increasing popularity in the fields of Partial Differential Equations (PDEs) and Reduced Order Modeling (ROM), providing domain practitioners with new powerful data-driven techniques such as Physics-Informed Neural Networks (PINNs), Neural Operators, Deep Operator Networks (DeepONets) and Deep-Learning based ROMs (DL-ROMs). In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques, such as the reduced basis method, when dealing with complex nonlinear problems. However, despite the empirical success of CNN-based autoencoders, there are only a few theoretical results supporting these architectures, usually stated in the form of universal approximation theorems. In particular, although the existing literature provides users with guidelines for designing convolutional autoencoders, the subsequent challenge of learning the latent features has been barely investigated. Furthermore, many practical questions remain unanswered, e.g., the number of snapshots needed for convergence or the neural network training strategy. In this work, using recent techniques from sparse high-dimensional function approximation, we fill some of these gaps by providing a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic. This regularity assumption arises in many relevant classes of parametric PDEs, such as the parametric diffusion equation, for which we discuss an explicit application of our general theory.
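For readers unfamiliar with the architecture class the theory addresses, below is a generic PyTorch sketch of a CNN-based autoencoder for PDE solution snapshots; the layer sizes and latent dimension are illustrative assumptions, not the networks analyzed in the paper.

```python
# A generic CNN-based autoencoder for 64x64 PDE solution snapshots; sizes
# are illustrative assumptions only.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        # encoder: 64x64 snapshot -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, latent_dim),
        )
        # decoder: latent vector -> reconstructed snapshot
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16 * 16),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),  # 32x32
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1),              # 64x64
        )

    def forward(self, u):
        return self.decoder(self.encoder(u))

snapshots = torch.randn(32, 1, 64, 64)  # stand-in for PDE solution snapshots
model = ConvAutoencoder()
loss = nn.functional.mse_loss(model(snapshots), snapshots)
print(loss.item())
```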
Neural networks possess the crucial ability to generate meaningful representations of task-dependent features. Indeed, with appropriate scaling, supervised learning in neural networks can result in strong, task-dependent feature learning. However, the nature of the emergent representations, which we call the `coding scheme', is still unclear. To understand the emergent coding scheme, we investigate fully-connected, wide neural networks learning classification tasks using the Bayesian framework, where learning shapes the posterior distribution of the network weights. Consistent with previous findings, our analysis of the feature learning regime (also known as the `non-lazy', `rich', or `mean-field' regime) shows that the networks acquire strong, data-dependent features. Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity. In linear networks, an analog coding scheme of the task emerges. Despite the strong representations, the mean predictor is identical to that of the lazy case. In nonlinear networks, spontaneous symmetry breaking leads to either redundant or sparse coding schemes. Our findings highlight how network properties such as the scaling of weights and the neuronal nonlinearity can profoundly influence the emergent representations.
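The distinction between the lazy and feature-learning regimes hinges on how the readout is scaled with width. Here is a small NumPy illustration under a standard two-layer setup (an assumption; the paper's exact parameterization may differ): the 1/sqrt(N) prefactor keeps initial outputs O(1) (lazy), while the 1/N prefactor drives them to zero, so fitting the data forces the features themselves to move.

```python
# Illustration of lazy (NTK) vs rich (mean-field) output scaling at
# initialization in a two-layer network; a toy example, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)          # 100 scalar inputs

for N in (100, 10_000):
    W = rng.standard_normal((N, 1))   # input-to-hidden weights
    a = rng.standard_normal(N)        # readout weights
    h = np.tanh(W @ x[None, :])       # hidden activations, shape (N, 100)
    f_lazy = a @ h / np.sqrt(N)       # NTK scaling: output stays O(1)
    f_rich = a @ h / N                # mean-field scaling: output ~ O(1/sqrt(N))
    print(N, np.std(f_lazy), np.std(f_rich))
```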
In this paper, we combine convolutional neural networks (CNNs) with reduced order modeling (ROM) for efficient simulations of multiscale problems. These problems are modeled by partial differential equations with high-dimensional random inputs. The proposed method, called CNN-based ROM, involves two separate CNNs: Basis CNNs and Coefficient CNNs (Coef CNNs), which correspond to the two main parts of ROM. The former learns input-specific basis functions from the snapshots of fine-scale solutions. An activation function, inspired by Galerkin projection, is utilized at the output layer to reconstruct fine-scale solutions from the basis functions. Numerical results show that the basis functions learned by the Basis CNNs resemble the data, which helps to significantly reduce the number of basis functions. Moreover, CNN-based ROM is less sensitive to data fluctuations caused by numerical errors than traditional ROM. Since testing the Basis CNNs still requires the fine-scale stiffness matrix and load vector, they cannot be directly applied to nonlinear problems. The Coef CNNs, by contrast, can be applied to nonlinear problems and are designed to determine the coefficients for the linear combination of basis functions. In addition, two applications of CNN-based ROM are presented, including predicting MsFEM basis functions within oversampling regions and building accurate surrogates for inverse problems.
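To clarify the Galerkin step that the Basis CNNs' output layer mimics, here is a minimal NumPy sketch: given basis columns Phi and the fine-scale system A u = b, the reduced coefficients solve the projected system. The matrices below are random stand-ins, not an actual multiscale discretization.

```python
# Minimal Galerkin projection: reduce A u = b onto a small basis Phi.
import numpy as np

rng = np.random.default_rng(1)
n_fine, n_basis = 500, 10
A = rng.standard_normal((n_fine, n_fine))
A = A @ A.T + n_fine * np.eye(n_fine)       # symmetric positive definite stand-in
b = rng.standard_normal(n_fine)
Phi = np.linalg.qr(rng.standard_normal((n_fine, n_basis)))[0]  # basis columns

c = np.linalg.solve(Phi.T @ A @ Phi, Phi.T @ b)  # Galerkin projection
u_rom = Phi @ c                                  # reconstructed fine-scale solution
print(np.linalg.norm(A @ u_rom - b) / np.linalg.norm(b))  # ROM residual
```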
Various methods have emerged for conducting mediation analyses with multiple correlated mediators, each with distinct strengths and limitations. However, a comparative evaluation of these methods is lacking, providing the motivation for this paper. This study examines six mediation analysis methods for multiple correlated mediators that provide insights into the contributors to health disparities. Through comprehensive simulations, we rigorously assessed the performance of each method in estimating joint and/or path-specific mediation effects in the context of binary outcome variables, varying the mediator types and the level of residual correlation between mediators; performance was evaluated using a variety of metrics, including bias, mean squared error, and coverage and width of the 95$\%$ confidence intervals. Subsequently, these methods were applied to the REasons for Geographic And Racial Differences in Stroke (REGARDS) study, where differing conclusions were obtained depending on the mediation method employed. This evaluation provides valuable guidance for researchers grappling with complex multi-mediator scenarios, enabling them to select an optimal mediation method for their research question and dataset.
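As a simplified illustration of the estimand (not of any of the six methods compared in the paper, which target binary outcomes), the following NumPy simulation computes the product-of-coefficients estimate of the joint mediation effect with two correlated mediators and a continuous outcome:

```python
# Toy simulation: joint mediation effect with two correlated mediators,
# estimated by the product-of-coefficients method (continuous-outcome
# simplification for illustration only).
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
X = rng.binomial(1, 0.5, n).astype(float)                     # exposure
E = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], n)  # correlated errors
M1 = 0.8 * X + E[:, 0]
M2 = 0.4 * X + E[:, 1]
Y = 0.3 * X + 0.5 * M1 + 0.7 * M2 + rng.standard_normal(n)

ones = np.ones(n)
a1 = np.linalg.lstsq(np.column_stack([ones, X]), M1, rcond=None)[0][1]
a2 = np.linalg.lstsq(np.column_stack([ones, X]), M2, rcond=None)[0][1]
beta = np.linalg.lstsq(np.column_stack([ones, X, M1, M2]), Y, rcond=None)[0]
joint_nie = a1 * beta[2] + a2 * beta[3]
print(joint_nie)  # close to the true value 0.8*0.5 + 0.4*0.7 = 0.68
```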
Structural identifiability is an important property of parametric ODE models. When conducting an experiment and inferring the parameter value from the time-series data, we want to know if the value is globally, locally, or non-identifiable. Global identifiability of the parameter indicates that there exists only one possible solution to the inference problem, local identifiability suggests that there could be several (but finitely many) possibilities, while non-identifiability implies that there are infinitely many possibilities for the value. Having this information is useful since, for example, one would only perform inference for the parameters that are identifiable. Given the current significance and widespread research conducted in this area, we decided to create a database of linear compartment models and their identifiability results. This facilitates the process of checking theorems and conjectures and drawing conclusions on identifiability. By only storing models up to symmetries and isomorphisms, we optimize memory efficiency and reduce query time. We conclude by applying our database to real problems. We tested a conjecture about deleting one leak of the model, stated in the paper 'Linear compartmental models: Input-output equations and operations that preserve identifiability' by E. Gross et al., and managed to produce a counterexample. We also compute some interesting statistics related to the identifiability of linear compartment model parameters.
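One way to store models only up to isomorphism, sketched below, is to bucket each model's directed graph by an isomorphism-invariant hash. This uses networkx's Weisfeiler-Lehman hash as an assumed stand-in for whatever canonicalization the database actually employs; the WL hash is a necessary but not sufficient invariant, so buckets would still need an exact isomorphism check.

```python
# Bucket linear compartment models up to relabeling of compartments by
# hashing their labeled directed graphs (WL hash: isomorphism-invariant).
import networkx as nx

def model_key(edges, leaks, inputs, outputs):
    """Hash a linear compartment model given as compartment-index sets."""
    G = nx.DiGraph(edges)
    for v in G.nodes:
        # encode leaks / inputs / outputs as node labels so the hash sees them
        G.nodes[v]["label"] = (v in leaks, v in inputs, v in outputs)
    return nx.weisfeiler_lehman_graph_hash(G, node_attr="label")

m1 = model_key([(1, 2), (2, 1), (2, 3)], leaks={3}, inputs={1}, outputs={1})
m2 = model_key([(2, 3), (3, 2), (3, 1)], leaks={1}, inputs={2}, outputs={2})  # relabeled copy
print(m1 == m2)  # True: the two presentations share one database entry
```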
Fourier partial sum approximations yield exponential accuracy for smooth and periodic functions, but produce the infamous Gibbs phenomenon for non-periodic ones. Spectral reprojection resolves the Gibbs phenomenon by projecting the Fourier partial sum onto a Gibbs complementary basis, often prescribed as the Gegenbauer polynomials. However, noise in the Fourier data and the Runge phenomenon both degrade the quality of the Gegenbauer reconstruction. Motivated by its theoretical convergence properties, this paper proposes a new Bayesian framework for spectral reprojection, which allows a greater understanding of the impact of noise on the reprojection method from a statistical point of view. We are also able to improve the robustness with respect to the Gegenbauer polynomial parameters. Finally, the framework provides a mechanism to quantify the uncertainty of the solution estimate.
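For context, here is a minimal NumPy/SciPy sketch of classical (non-Bayesian) Gegenbauer spectral reprojection for f(x) = x on [-1, 1], whose Fourier partial sum exhibits the Gibbs phenomenon; the parameter choices are illustrative, and both projection integrals are evaluated by quadrature to sidestep normalization formulas.

```python
# Gegenbauer spectral reprojection of the Fourier partial sum of f(x) = x.
import numpy as np
from scipy.integrate import trapezoid
from scipy.special import eval_gegenbauer

N, m, lam = 32, 8, 4.0            # Fourier modes, Gegenbauer degree, parameter
x = np.linspace(-1, 1, 2001)
f = x.copy()                      # non-periodic target function

# Fourier partial sum of f(x) = x on [-1, 1]: sum of 2(-1)^(k+1) sin(k pi x)/(k pi)
SN = np.zeros_like(x)
for k in range(1, N + 1):
    SN += 2 * (-1) ** (k + 1) / (k * np.pi) * np.sin(k * np.pi * x)

# Project S_N onto Gegenbauer polynomials with weight (1 - x^2)^(lam - 1/2)
w = (1 - x**2) ** (lam - 0.5)
recon = np.zeros_like(x)
for l in range(m + 1):
    Cl = eval_gegenbauer(l, lam, x)
    ghat = trapezoid(w * Cl * SN, x) / trapezoid(w * Cl * Cl, x)
    recon += ghat * Cl

print(np.max(np.abs(SN - f)))     # O(1): Gibbs oscillations near x = +/-1
print(np.max(np.abs(recon - f)))  # much smaller: reprojection suppresses them
```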
Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, the trade-off between accuracy and computational demand leaves much room for improvement. In this work, we propose an alternative that provides approximations to both the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, even for multimodal posteriors, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature and on a biological model of the translation kinetics after mRNA transfection.
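In the same spirit (though not the paper's exact construction), one can see how a mixture model yields a closed-form posterior: fit a Gaussian mixture to joint samples (theta, x) from prior-predictive simulations, then condition on the observed x. A sketch on a toy simulator with a known analytic posterior:

```python
# Mixture-based SBI sketch: condition a joint Gaussian mixture on x = x_obs
# to obtain a Gaussian-mixture posterior in closed form.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
theta = rng.normal(0, 1, (20_000, 1))                # prior samples
x = theta + 0.5 * rng.standard_normal(theta.shape)   # toy simulator
gmm = GaussianMixture(n_components=5, random_state=0).fit(np.hstack([theta, x]))

def posterior_mixture(x_obs, gmm, d=1):
    """Condition the joint GMM on x = x_obs; returns per-component (w, mean, cov)."""
    out = []
    for w, mu, S in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        St_x = S[:d, d:] @ np.linalg.inv(S[d:, d:])
        mean = mu[:d] + St_x @ (x_obs - mu[d:])       # conditional mean
        cov = S[:d, :d] - St_x @ S[d:, :d]            # conditional covariance
        w_new = w * multivariate_normal.pdf(x_obs, mu[d:], S[d:, d:])
        out.append((w_new, mean, cov))
    total = sum(o[0] for o in out)
    return [(wk / total, m, c) for wk, m, c in out]

for w, m, c in posterior_mixture(np.array([1.0]), gmm):
    print(f"w={w:.3f} mean={m[0]:+.3f} var={c[0, 0]:.3f}")
# analytic posterior for this toy model: N(0.8, 0.2)
```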
This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question-and-answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically. Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors toward creating fairer and more inclusive e-commerce environments.