Optimizing nanomaterial synthesis over numerous synthetic variables is an extremely laborious task because conventional combinatorial exploration is prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed loop between a batch NP synthesis module and a UV-Vis spectroscopy module, guided by feedback from an AI optimization model. With silver (Ag) NPs as a representative example, we demonstrate that a Bayesian optimizer implemented with an early-stopping criterion can efficiently produce Ag NPs that precisely possess the desired absorption spectra within only 200 iterations (when optimizing over five synthetic reagents). Beyond this outstanding material-development efficiency, analysis of the synthetic variables further reveals novel chemistry involving the effects of citrate in Ag NP synthesis. The amount of citrate is key to controlling the competition between spherical and plate-shaped NPs and, as a result, it also affects the shapes of the absorption spectra. Our study highlights the platform's ability both to enhance search efficiency and to provide new chemical knowledge by analyzing the datasets accumulated from autonomous experimentation.
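For readers unfamiliar with such closed-loop workflows, the sketch below illustrates the general pattern of Bayesian optimization with an early-stopping criterion over a five-reagent recipe space. The spectral-mismatch objective, the `run_synthesis_and_measure` stand-in for the robotic synthesis and UV-Vis modules, and the stopping tolerance are illustrative assumptions, not the platform's actual implementation.

```python
# Minimal closed-loop Bayesian optimization sketch (illustrative only).
# `run_synthesis_and_measure` stands in for the robotic synthesis + UV-Vis modules.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
N_REAGENTS = 5                                    # e.g. AgNO3, citrate, reductant, ...
BOUNDS = np.array([[0.0, 1.0]] * N_REAGENTS)      # normalized reagent amounts

def spectral_loss(measured, target):
    """Mismatch between measured and target absorption spectra (assumed objective)."""
    return float(np.mean((measured - target) ** 2))

def expected_improvement(mu, sigma, best):
    z = (best - mu) / np.maximum(sigma, 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def optimize(run_synthesis_and_measure, target_spectrum,
             n_init=10, budget=200, tol=1e-3):
    X = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_init, N_REAGENTS))
    y = np.array([spectral_loss(run_synthesis_and_measure(x), target_spectrum) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(budget - n_init):
        if y.min() < tol:                         # early stopping: target spectrum reached
            break
        gp.fit(X, y)
        cand = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(2048, N_REAGENTS))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        y_next = spectral_loss(run_synthesis_and_measure(x_next), target_spectrum)
        X, y = np.vstack([X, x_next]), np.append(y, y_next)
    return X[np.argmin(y)], y.min()
```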
We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
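The core mechanism, differentiating the fitted vertex with respect to its inputs via the implicit function theorem, can be illustrated on a toy straight-line "vertex fit"; the line parameterization and loss below are simplifying assumptions for illustration, not the paper's full fit.

```python
# Toy 2D "vertex fit": the vertex v* minimizes the summed squared perpendicular
# distances to a set of straight-line tracks.  The gradient of v* with respect
# to the track parameters follows from the implicit function theorem,
#   d v*/d p_i = -(d2L/dv2)^{-1} (d2L/dv dp_i),
# which is the same mechanism used to backpropagate through the fit.
import numpy as np

def fit_vertex(points, dirs):
    """Least-squares vertex of lines x(t) = p_i + t * u_i (u_i unit vectors)."""
    P = [np.eye(2) - np.outer(u, u) for u in dirs]   # projectors onto the normal space
    A = sum(P)                                       # 0.5 * Hessian of the loss
    b = sum(Pi @ p for Pi, p in zip(P, points))
    return np.linalg.solve(A, b), A, P

def vertex_jacobians(A, P):
    """Implicit-function-theorem Jacobians d v*/d p_i (one 2x2 block per track)."""
    return [np.linalg.solve(A, Pi) for Pi in P]

points = [np.array([0.0, 1.0]), np.array([1.0, -1.0]), np.array([-1.0, 0.5])]
dirs = [d / np.linalg.norm(d) for d in
        (np.array([1.0, -0.9]), np.array([1.0, 1.1]), np.array([1.0, 0.1]))]

v_star, A, P = fit_vertex(points, dirs)
jacs = vertex_jacobians(A, P)

# Finite-difference check of d v*/d p_0.
eps = 1e-6
fd = np.zeros((2, 2))
for k in range(2):
    shifted = [p.copy() for p in points]
    shifted[0][k] += eps
    fd[:, k] = (fit_vertex(shifted, dirs)[0] - v_star) / eps
assert np.allclose(fd, jacs[0], atol=1e-5)
```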
Sequential transfer optimization (STO), which aims to improve optimization performance on a task of interest by exploiting knowledge captured from several previously solved optimization tasks stored in a database, has been gaining increasing research attention over the years. However, despite remarkable advances in algorithm design, the development of a systematic benchmark suite for comprehensive comparison of STO algorithms has received far less attention. Existing test problems are either simply generated by assembling other benchmark functions or extended from specific practical problems with limited scalability. The relationships between the optimal solutions of the source and target tasks in these problems are also often manually configured, limiting their ability to model the different similarity relationships present in real-world problems. Consequently, the good performance achieved by an algorithm on these problems may be biased and hard to generalize to other problems. In light of the above, in this study we first introduce four concepts for characterizing STO problems and present an important problem feature, namely the similarity distribution, which quantitatively delineates the relationship between the optima of the source and target tasks. We then present general design guidelines for STO problems and a particular STO problem generator with good scalability. Specifically, the similarity distribution of a problem can be easily customized, enabling a continuous spectrum of representations of the diverse similarity relationships found in real-world problems. Lastly, a benchmark suite of 12 STO problems featuring a variety of customized similarity relationships is developed using the proposed generator. The source code of the problem generator is available at //github.com/XmingHsueh/STOP-G.
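As a toy illustration of what a customizable similarity distribution means (not the STOP-G generator itself), one can place each source task's optimum at a distance from the target optimum governed by a similarity value drawn from a user-chosen distribution:

```python
# Toy illustration of a customizable "similarity distribution": each source task's
# optimum is placed at a distance from the target optimum controlled by a similarity
# value s in [0, 1] drawn from a user-chosen distribution (s = 1: identical optima,
# s = 0: most distant).  Sphere functions and the Beta sampler are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)

def make_sphere_task(optimum):
    """Shifted sphere function with a known optimum in [0, 1]^d."""
    return lambda x: float(np.sum((np.asarray(x) - optimum) ** 2))

def generate_sto_problem(dim=10, n_sources=20,
                         similarity_sampler=lambda n: rng.beta(5, 2, n)):
    target_opt = rng.uniform(0, 1, dim)
    target_task = make_sphere_task(target_opt)
    sims = similarity_sampler(n_sources)
    source_tasks = []
    for s in sims:
        direction = rng.normal(size=dim)
        direction /= np.linalg.norm(direction)
        src_opt = np.clip(target_opt + (1.0 - s) * direction, 0.0, 1.0)
        source_tasks.append(make_sphere_task(src_opt))
    return target_task, source_tasks, sims

target, sources, sims = generate_sto_problem()
```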
Byzantine agreement tasks have been studied extensively through many diverse frameworks, ranging from epistemic modal logic to combinatorial, topological, and even game-theoretical approaches. Among Byzantine agreement tasks, firing rebels with relay is of particular interest, since it is a basic primitive necessary for solving more complex tasks such as Consistent Broadcast. The epistemic logic approach has yielded both necessary conditions and sufficient knowledge conditions for solving firing rebels with relay. However, these conditions are stated in terms of knowledge and, in principle, do not explore conditions on the communication structure, which is often assumed to be complete; that is, any process is assumed to be capable of communicating with any other process at any time. In this paper, we characterize Byzantine firing rebels solvability with and without relay in terms of the communication structure of the system. We define a relation between asynchronous message schedules and directed graph sequences, which we call network abstractions. This allows us to capture the message relay capabilities of the system in a combinatorial object. Although there are some similarities between network abstractions and causal cones, there is a fundamental difference: causal cones only allow us to look at events in the past, whereas network abstractions allow us to reason about future possibilities, thus enabling us to reason about liveness properties, which cannot be expressed by looking only at past events. Furthermore, we formalize our characterization using a temporal epistemic logic for Byzantine systems. This formulation constitutes the necessary and sufficient a priori knowledge regarding network connectivity for solving firing rebels with relay.
Bayesian hypothesis testing leverages posterior probabilities, Bayes factors, or credible intervals to assess characteristics that summarize data. We propose a framework for power curve approximation with such hypothesis tests that assumes data are generated using statistical models with fixed parameters for the purposes of sample size determination. We present a fast approach to explore the sampling distribution of posterior probabilities when the conditions for the Bernstein-von Mises theorem are satisfied. We extend that approach to facilitate targeted sampling from the approximate sampling distribution of posterior probabilities for each sample size explored. These sampling distributions are used to construct power curves for various types of posterior analyses. Our resulting method for power curve approximation is orders of magnitude faster than conventional power curve estimation for Bayesian hypothesis tests. We also prove the consistency of the corresponding power estimates and sample size recommendations under certain conditions.
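A brute-force sketch of the quantity being approximated, namely a power curve for a test based on posterior probabilities, can be written down under the Bernstein-von Mises normal approximation. The Bernoulli model, threshold, and simulation sizes below are illustrative assumptions; the paper's contribution is a much faster, targeted approximation of this sampling distribution, not the naive simulation shown here.

```python
# Brute-force illustration of a power curve for a Bayesian test based on the
# posterior probability P(theta > theta0 | data), using the Bernstein-von Mises
# normal approximation  theta | data  ~=  N(theta_hat, I(theta_hat)^{-1} / n).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def power_estimate(n, theta_true=0.6, theta0=0.5, gamma=0.95, n_sim=2000):
    """Probability that the posterior probability exceeds gamma, for Bernoulli data of size n."""
    successes = rng.binomial(n, theta_true, size=n_sim)
    theta_hat = np.clip(successes / n, 1e-6, 1 - 1e-6)
    se = np.sqrt(theta_hat * (1 - theta_hat) / n)       # 1 / sqrt(n * Fisher information)
    post_prob = 1.0 - norm.cdf(theta0, loc=theta_hat, scale=se)
    return float(np.mean(post_prob > gamma))

sample_sizes = [25, 50, 100, 200, 400]
power_curve = {n: power_estimate(n) for n in sample_sizes}
```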
Resource scheduling and allocation is a critical component of many high-impact systems, ranging from congestion control to cloud computing. Finding better solutions to these problems often has a significant impact on resource and time savings, reducing device wear-and-tear, and even potentially reducing carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: that is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning (RL), a solution paradigm well suited for sequential decision-making problems that are amenable to planning, and for combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.
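A toy version of such a single-player memory mapping game might look as follows; the two-layer cost model, capacities, and latencies are illustrative assumptions, not the mallocGame or XLA's actual cost model.

```python
# Toy single-player "memory mapping game": at step t the agent assigns tensor t to a
# memory layer; the episode return is the negative of a crude execution-time estimate.
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryMappingGame:
    tensor_sizes: List[int]                  # bytes per tensor, in scheduling order
    fast_capacity: int = 1 << 20             # bytes of fast (on-chip) memory
    fast_latency: float = 1.0                # cost units per byte in fast memory
    slow_latency: float = 10.0               # cost units per byte in slow memory
    step_idx: int = 0
    fast_used: int = 0
    total_cost: float = 0.0

    def legal_actions(self):
        return [0, 1]                        # 0 = fast layer, 1 = slow layer

    def step(self, action: int):
        size = self.tensor_sizes[self.step_idx]
        if action == 0 and self.fast_used + size <= self.fast_capacity:
            self.fast_used += size
            self.total_cost += size * self.fast_latency
        else:                                # chose slow layer, or tensor does not fit
            self.total_cost += size * self.slow_latency
        self.step_idx += 1
        done = self.step_idx == len(self.tensor_sizes)
        reward = -self.total_cost if done else 0.0
        return done, reward

game = MemoryMappingGame(tensor_sizes=[4096, 1 << 19, 65536, 1 << 20])
done = False
while not done:
    done, reward = game.step(action=0)       # greedy policy: always try fast memory first
```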
Mesh optimization procedures are generally a combination of node smoothing and discrete operations that affect a small number of elements to improve the quality of the overall mesh. These procedures are useful as a post-processing step in mesh generation and in applications such as fluid simulations with severely deforming domains. In order to perform high-order mesh optimization, these ingredients must also be extended to high-order (curved) meshes. In this work, we present a method to perform local element operations on curved meshes. The mesh operations discussed in this work are edge/face swaps, edge collapses, and edge splitting (more generally, refinement) for triangular and tetrahedral meshes. These local operations are performed by first identifying the patch of elements that contain the edge/face being acted on, performing the operation as a straight-sided one by placing the high-order nodes via an isoparametric mapping from the master element, and smoothing the high-order nodes on the elements in the patch by minimizing a Jacobian-based high-order mesh distortion measure. Since the initial straight-sided guess from placing the nodes via the isoparametric mapping frequently results in invalid elements, the distortion measure must be regularized, which allows the mesh to be untangled so that the optimization can succeed. We present several examples in 2D and 3D to demonstrate these local operations and show how they can be combined with a high-order node smoothing procedure to maintain mesh quality under severe deformations.
Stochastic gradient descent (SGD), or stochastic approximation, has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the solutions obtained from SGD has only recently been studied, yet it is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions. One uses multiple, but few, SGD runs in parallel via resampling with replacement from the data, and the other operates in an online fashion. Our methods can be regarded as enhancements of established bootstrap schemes that substantially reduce the computational effort in terms of resampling requirements, while at the same time bypassing the intricate mixing conditions in existing batching methods. We achieve this via the recent so-called cheap bootstrap idea and a Berry-Esseen-type bound for SGD.
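A minimal sketch of the first (parallel, non-online) variant under the cheap bootstrap idea is given below: a handful of SGD runs on resampled data yield a t-interval with as many degrees of freedom as resamples. The linear-regression model, step-size schedule, and number of resamples are illustrative choices, not the paper's exact setup.

```python
# Run SGD once on the data and a few more times on resamples drawn with replacement,
# then form the cheap-bootstrap interval
#   psi_hat +/- t_{B, 1-alpha/2} * sqrt( mean_b (psi*_b - psi_hat)^2 ).
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)

def sgd_linear_regression(X, y, n_epochs=5, lr0=0.5):
    n, d = X.shape
    theta = np.zeros(d)
    step = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            step += 1
            lr = lr0 / np.sqrt(step)                      # Robbins-Monro step size
            grad = (X[i] @ theta - y[i]) * X[i]
            theta -= lr * grad
    return theta

def cheap_bootstrap_ci(X, y, B=10, alpha=0.05):
    n = len(y)
    theta_hat = sgd_linear_regression(X, y)
    resampled = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                  # resample with replacement
        resampled.append(sgd_linear_regression(X[idx], y[idx]))
    s = np.sqrt(np.mean((np.array(resampled) - theta_hat) ** 2, axis=0))
    q = t_dist.ppf(1 - alpha / 2, df=B)
    return theta_hat - q * s, theta_hat + q * s

X = rng.normal(size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=2000)
lower, upper = cheap_bootstrap_ci(X, y)
```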
Gaussian mixtures are widely used for approximating density functions in applications such as density estimation, belief propagation, and Bayesian filtering. These applications often use Gaussian mixtures as initial approximations that are updated recursively. A key challenge in these recursive processes stems from the exponential increase in the mixture's order, resulting in intractable inference. To overcome this difficulty, Gaussian mixture reduction (GMR), which approximates a high-order Gaussian mixture by one of lower order, can be used. Although existing clustering-based methods are known for their satisfactory performance and computational efficiency, their convergence properties and optimal targets remain unknown. In this paper, we propose a novel optimization-based GMR method based on the composite transportation divergence (CTD). We develop a majorization-minimization algorithm for computing the reduced mixture and establish its theoretical convergence under general conditions. Furthermore, we show that many existing clustering-based methods are special cases of ours, effectively bridging the gap between optimization-based and clustering-based techniques. Our unified framework empowers users to select the most appropriate cost function in the CTD to achieve superior performance in their specific applications. Through extensive empirical experiments, we demonstrate the efficiency and effectiveness of the proposed method, showcasing its potential in various domains.
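To make the connection to clustering-based methods concrete, the sketch below shows a standard clustering-style GMR, alternating a KL-divergence assignment step with moment-matching updates, which is one of the classical schemes recovered as a special case of the CTD framework; the initialization and fixed iteration count are illustrative choices.

```python
# Clustering-style Gaussian mixture reduction: alternately (i) assign each original
# component to its closest reduced component under the Gaussian KL divergence and
# (ii) refit each reduced component by moment matching over its assigned cluster.
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for d-dimensional Gaussians."""
    d = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def moment_match(weights, mus, Sigmas):
    w = np.sum(weights)
    mu = np.einsum('k,kd->d', weights, mus) / w
    Sigma = sum(wk * (Sk + np.outer(mk - mu, mk - mu))
                for wk, mk, Sk in zip(weights, mus, Sigmas)) / w
    return w, mu, Sigma

def reduce_mixture(weights, mus, Sigmas, m, n_iter=50):
    """Reduce an n-component Gaussian mixture to m components."""
    n = len(weights)
    keep = np.argsort(weights)[-m:]                       # init: m heaviest components
    rw, rmu, rS = weights[keep].copy(), mus[keep].copy(), Sigmas[keep].copy()
    for _ in range(n_iter):
        # assignment step: nearest reduced component under KL divergence
        cost = np.array([[kl_gauss(mus[i], Sigmas[i], rmu[j], rS[j])
                          for j in range(m)] for i in range(n)])
        labels = cost.argmin(axis=1)
        # update step: moment matching within each cluster
        for j in range(m):
            idx = np.where(labels == j)[0]
            if len(idx) > 0:
                rw[j], rmu[j], rS[j] = moment_match(weights[idx], mus[idx], Sigmas[idx])
    return rw / rw.sum() * weights.sum(), rmu, rS
```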
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures or tweak existing ones, refine training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to update this paper regularly by incorporating new sections and featuring the latest LLM models.
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest, the underlying SCM will only be partially observable, so causal inference tries to leverage any exposed information. Graph neural networks (GNNs), as universal approximators on structured input, are a viable candidate for causal learning, suggesting a tighter integration with SCMs. To this end, we present a theoretical analysis from first principles that establishes a novel connection between GNNs and SCMs while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustrations on simulations and standard benchmarks validate our theoretical proofs.