It is well known that the class of rotation invariant algorithms are suboptimal even for learning sparse linear problems when the number of examples is below the "dimension" of the problem. This class includes any gradient descent trained neural net with a fully-connected input layer (initialized with a rotationally symmetric distribution). The simplest sparse problem is learning a single feature out of $d$ features. In that case the classification error or regression loss grows with $1-k/n$ where $k$ is the number of examples seen. These lower bounds become vacuous when the number of examples $k$ reaches the dimension $d$. We show that when noise is added to this sparse linear problem, rotation invariant algorithms are still suboptimal after seeing $d$ or more examples. We prove this via a lower bound for the Bayes optimal algorithm on a rotationally symmetrized problem. We then prove much lower upper bounds on the same problem for simple non-rotation invariant algorithms. Finally we analyze the gradient flow trajectories of many standard optimization algorithms in some simple cases and show how they veer toward or away from the sparse targets. We believe that our trajectory categorization will be useful in designing algorithms that can exploit sparse targets and our method for proving lower bounds will be crucial for analyzing other families of algorithms that admit different classes of invariances.
The rise of machine learning has fueled the discovery of new materials and, especially, metamaterials--truss lattices being their most prominent class. While their tailorable properties have been explored extensively, the design of truss-based metamaterials has remained highly limited and often heuristic, due to the vast, discrete design space and the lack of a comprehensive parameterization. We here present a graph-based deep learning generative framework, which combines a variational autoencoder and a property predictor, to construct a reduced, continuous latent representation covering an enormous range of trusses. This unified latent space allows for the fast generation of new designs through simple operations (e.g., traversing the latent space or interpolating between structures). We further demonstrate an optimization framework for the inverse design of trusses with customized mechanical properties in both the linear and nonlinear regimes, including designs exhibiting exceptionally stiff, auxetic, pentamode-like, and tailored nonlinear behaviors. This generative model can predict manufacturable (and counter-intuitive) designs with extreme target properties beyond the training domain.
Motivated by applications to group synchronization and quadratic assignment on random data, we study a general problem of Bayesian inference of an unknown ``signal'' belonging to a high-dimensional compact group, given noisy pairwise observations of a featurization of this signal. We establish a quantitative comparison between the signal-observation mutual information in any such problem with that in a simpler model with linear observations, using interpolation methods. For group synchronization, our result proves a replica formula for the asymptotic mutual information and Bayes-optimal mean-squared-error. Via analyses of this replica formula, we show that the conjectural phase transition threshold for computationally-efficient weak recovery of the signal is determined by a classification of the real-irreducible components of the observed group representation(s), and we fully characterize the information-theoretic limits of estimation in the example of angular/phase synchronization over $SO(2)$/$U(1)$. For quadratic assignment, we study observations given by a kernel matrix of pairwise similarities and a randomly permutated and noisy counterpart, and we show in a bounded signal-to-noise regime that the asymptotic mutual information coincides with that in a Bayesian spiked model with i.i.d. signal prior.
Generalization to new samples is a fundamental rationale for statistical modeling. For this purpose, model validation is particularly important, but recent work in survey inference has suggested that simple aggregation of individual prediction scores does not give a good measure of the score for population aggregate estimates. In this manuscript we explain why this occurs, propose two scoring metrics designed specifically for this problem, and demonstrate their use in three different ways. We show that these scoring metrics correctly order models when compared to the true score, although they do underestimate the magnitude of the score. We demonstrate with a problem in survey research, where multilevel regression and poststratification (MRP) has been used extensively to adjust convenience and low-response surveys to make population and subpopulation estimates.
Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it on MLFFs to improve the stability of MD simulations. Finally, we propose the density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. Our current model, trained on more than 15 chemical species, achieves the average density error of 0.01 g/cm$^3$ on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a "universal MLFF" capable of simulating properties of common organic liquids.
Determining the complexity of computing Gr\"{o}bner bases is an important problem both in theory and in practice, and for that the solving degree plays a key role. In this paper, we study the solving degrees of affine semi-regular sequences and their homogenized sequences. Some of our results are considered to give mathematically rigorous proofs of the correctness of methods for computing Gr\"{o}bner bases of the ideal generated by an affine semi-regular sequence. This paper is a sequel of the authors' previous work and gives additional results on the solving degrees and important behaviors of Gr\"obner basis computation.
This work presents a comparative review and classification between some well-known thermodynamically consistent models of hydrogel behavior in a large deformation setting, specifically focusing on solvent absorption/desorption and its impact on mechanical deformation and network swelling. The proposed discussion addresses formulation aspects, general mathematical classification of the governing equations, and numerical implementation issues based on the finite element method. The theories are presented in a unified framework demonstrating that, despite not being evident in some cases, all of them follow equivalent thermodynamic arguments. A detailed numerical analysis is carried out where Taylor-Hood elements are employed in the spatial discretization to satisfy the inf-sup condition and to prevent spurious numerical oscillations. The resulting discrete problems are solved using the FEniCS platform through consistent variational formulations, employing both monolithic and staggered approaches. We conduct benchmark tests on various hydrogel structures, demonstrating that major differences arise from the chosen volumetric response of the hydrogel. The significance of this choice is frequently underestimated in the state-of-the-art literature but has been shown to have substantial implications on the resulting hydrogel behavior.
The continental plates of Earth are known to drift over a geophysical timescale, and their interactions have lead to some of the most spectacular geoformations of our planet while also causing natural disasters such as earthquakes and volcanic activity. Understanding the dynamics of interacting continental plates is thus significant. In this work, we present a fluid mechanical investigation of the plate motion, interaction, and dynamics. Through numerical experiments, we examine the coupling between a convective fluid and plates floating on top of it. With physical modeling, we show the coupling is both mechanical and thermal, leading to the thermal blanket effect: the floating plate is not only transported by the fluid flow beneath, it also prevents the heat from leaving the fluid, leading to a convective flow that further affects the plate motion. By adding several plates to such a coupled fluid-structure interaction, we also investigate how floating plates interact with each other and show that, under proper conditions, small plates can converge into a supercontinent.
We propose a method for obtaining parsimonious decompositions of networks into higher order interactions which can take the form of arbitrary motifs.The method is based on a class of analytically solvable generative models, where vertices are connected via explicit copies of motifs, which in combination with non-parametric priors allow us to infer higher order interactions from dyadic graph data without any prior knowledge on the types or frequencies of such interactions. Crucially, we also consider 'degree--corrected' models that correctly reflect the degree distribution of the network and consequently prove to be a better fit for many real world--networks compared to non-degree corrected models. We test the presented approach on simulated data for which we recover the set of underlying higher order interactions to a high degree of accuracy. For empirical networks the method identifies concise sets of atomic subgraphs from within thousands of candidates that cover a large fraction of edges and include higher order interactions of known structural and functional significance. The method not only produces an explicit higher order representation of the network but also a fit of the network to analytically tractable models opening new avenues for the systematic study of higher order network structures.
The beneficial role of noise-injection in learning is a consolidated concept in the field of artificial neural networks, suggesting that even biological systems might take advantage of similar mechanisms to optimize their performance. The training-with-noise algorithm proposed by Gardner and collaborators is an emblematic example of a noise-injection procedure in recurrent networks, which can be used to model biological neural systems. We show how adding structure to noisy training data can substantially improve the algorithm performance, allowing the network to approach perfect retrieval of the memories and wide basins of attraction, even in the scenario of maximal injected noise. We also prove that the so-called Hebbian Unlearning rule coincides with the training-with-noise algorithm when noise is maximal and data are stable fixed points of the network dynamics.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.