Transition metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and non-toxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have optimal target absorption energies in the visible region as well as well-defined ground states. Machine learning (ML) accelerated discovery could overcome such challenges by enabling screening of a larger space, but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of Jacobs ladder. To accelerate the discovery of complexes with absorption energies in the visible region while minimizing MR character, we use 2D efficient global optimization to sample candidate low-spin chromophores from multi-million complex spaces. Despite the scarcity (i.e., approx. 0.01\%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., > 10\%) of computational validation as the ML models improve during active learning, representing a 1,000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.
Evolutionary algorithms (EAs) have gained attention recently due to their success in neural architecture search (NAS). However, whereas traditional EAs draw much power from crossover operations, most evolutionary NAS methods deploy only mutation operators. The main reason is the permutation problem: The mapping between genotype and phenotype in traditional graph representations is many-to-one, leading to a disruptive effect of standard crossover. This work conducts the first theoretical analysis of the behaviors of crossover and mutation in the NAS context, and proposes a new crossover operator based on the shortest edit path (SEP) in graph space. The SEP crossover is shown to overcome the permutation problem, and as a result, offspring generated by the SEP crossover is theoretically proved to have a better expected improvement in terms of graph edit distance to global optimum, compared to mutation and standard crossover. Experiments further show that the SEP crossover significantly outperforms mutation and standard crossover on three state-of-the-art NAS benchmarks. The SEP crossover therefore allows taking full advantage of evolution in NAS, and potentially other similar design problems as well.
The utilization of renewable energy technologies, particularly hydrogen, has seen a boom in interest and has spread throughout the world. Ethanol steam reformation is one of the primary methods capable of producing hydrogen efficiently and reliably. This paper provides an in-depth study of the reformulated system both theoretically and numerically, as well as a plan to explore the possibility of converting the system into its conservation form. Lastly, we offer an overview of several numerical approaches for solving the general first-order quasi-linear hyperbolic equation to the particular model for ethanol steam reforming (ESR). We conclude by presenting some results that would enable the usage of these ODE/PDE solvers to be used in non-linear model predictive control (NMPC) algorithms and discuss the limitations of our approach and directions for future work.
Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the models typically used in statistical modelling, the majority of neural network research has been conducted outside of the field of statistics. This has resulted in a lack of statistically-based methodology, and, in particular, there has been little emphasis on model parsimony. Determining the input layer structure is analogous to variable selection, while the structure for the hidden layer relates to model complexity. In practice, neural network model selection is often carried out by comparing models using out-of-sample performance. However, in contrast, the construction of an associated likelihood function opens the door to information-criteria-based variable and architecture selection. A novel model selection method, which performs both input- and hidden-node selection, is proposed using the Bayesian information criterion (BIC) for FNNs. The choice of BIC over out-of-sample performance as the model selection objective function leads to an increased probability of recovering the true model, while parsimoniously achieving favourable out-of-sample performance. Simulation studies are used to evaluate and justify the proposed method, and applications on real data are investigated.
Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly-convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method.
Understanding structure-property relations is essential to optimally design materials for specific applications. Two-scale simulations are often employed to analyze the effect of the microstructure on a component's macroscopic properties. However, they are typically computationally expensive and infeasible in multi-query contexts such as optimization and material design. To make such analyses amenable, the microscopic simulations can be replaced by surrogate models that must be able to handle a wide range of microstructural parameters. This work focuses on extending the methodology of a previous work, where an accurate surrogate model was constructed for microstructures under varying loading and material parameters using proper orthogonal decomposition and Gaussian process regression, to treat geometrical parameters. To this end, a method that transforms different geometries onto a parent domain is presented. We propose to solve an auxiliary problem based on linear elasticity to obtain the geometrical transformations. Using these transformations, combined with the nonlinear microscopic problem, we derive a fast-to-evaluate surrogate model with the following key features: (1) the predictions of the effective quantities are independent of the auxiliary problem, (2) the predicted stress fields fulfill the microscopic balance laws and are periodic, (3) the method is non-intrusive, (4) the stress field for all geometries can be recovered, and (5) the sensitivities are available and can be readily used for optimization and material design. The proposed methodology is tested on several composite microstructures, where rotations and large variations in the shape of inclusions are considered. Finally, a two-scale example is shown, where the surrogate model achieves a high accuracy and significant speed up, demonstrating its potential in two-scale shape optimization and material design problems.
We propose a stochastic first-order trust-region method with inexact function and gradient evaluations for solving finite-sum minimization problems. Using a suitable reformulation of the given problem, our method combines the inexact restoration approach for constrained optimization with the trust-region procedure and random models. Differently from other recent stochastic trust-region schemes, our proposed algorithm improves feasibility and optimality in a modular way. We provide the expected number of iterations for reaching a near-stationary point by imposing some probability accuracy requirements on random functions and gradients which are, in general, less stringent than the corresponding ones in literature. We validate the proposed algorithm on some nonconvex optimization problems arising in binary classification and regression, showing that it performs well in terms of cost and accuracy, and allows to reduce the burdensome tuning of the hyper-parameters involved.
We provide adaptive inference methods, based on $\ell_1$ regularization, for regular (semi-parametric) and non-regular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of non-regular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak ``double sparsity robustness'': either the approximation to the regression or the approximation to the representer can be ``completely dense'' as long as the other is sufficiently ``sparse''. Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.
In domains where sample sizes are limited, efficient learning algorithms are critical. Learning using privileged information (LuPI) offers increased sample efficiency by allowing prediction models access to auxiliary information at training time which is unavailable when the models are used. In recent work, it was shown that for prediction in linear-Gaussian dynamical systems, a LuPI learner with access to intermediate time series data is never worse and often better in expectation than any unbiased classical learner. We provide new insights into this analysis and generalize it to nonlinear prediction tasks in latent dynamical systems, extending theoretical guarantees to the case where the map connecting latent variables and observations is known up to a linear transform. In addition, we propose algorithms based on random features and representation learning for the case when this map is unknown. A suite of empirical results confirm theoretical findings and show the potential of using privileged time-series information in nonlinear prediction.
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.