Gibbs sampling methods are standard tools to perform posterior inference for mixture models. These have been broadly classified into two categories: marginal and conditional methods. While conditional samplers are more widely applicable than marginal ones, they may suffer from slow mixing in infinite mixtures, where some form of truncation, either deterministic or random, is required. In mixtures with a random number of components, the exploration of parameter spaces of different dimensions can also be challenging. We tackle these issues by expressing the mixture components in the random order of appearance in an exchangeable sequence directed by the mixing distribution. We derive a sampler that is straightforward to implement for mixing distributions with tractable size-biased ordered weights, and that can be readily adapted to mixture models for which marginal samplers are not available. In infinite mixtures, no form of truncation is necessary. As for finite mixtures with a random dimension, a simple update of the number of components is obtained by a blocking argument, thus easing challenges found in trans-dimensional moves via Metropolis-Hastings steps. Additionally, sampling occurs in the space of ordered partitions with blocks labelled in the least-element order, which endows the sampler with good mixing properties. The performance of the proposed algorithm is evaluated in a simulation study.
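To make the size-biased ordering concrete, here is a minimal sketch (not the paper's sampler) of drawing the first few mixture weights in their size-biased order of appearance for the Dirichlet process case, where this order reduces to stick-breaking with Beta(1, alpha) fractions; the concentration parameter and truncation level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def size_biased_weights_dp(alpha, K, rng):
    """First K size-biased ordered weights of a Dirichlet process.

    For the DP, the weights in size-biased (appearance) order follow the
    stick-breaking construction: w_j = v_j * prod_{l<j} (1 - v_l),
    with v_j ~ Beta(1, alpha).
    """
    v = rng.beta(1.0, alpha, size=K)                        # stick-breaking fractions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining                                    # weights in order of appearance

w = size_biased_weights_dp(alpha=1.0, K=10, rng=rng)
print(w, w.sum())  # the partial sums approach 1 as K grows
```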
Our objective is to compute the derivatives of data corrupted by noise. This is a challenging task, as even small amounts of noise can result in significant errors in the computed derivatives. This is mainly due to the randomness of the noise, which gives rise to high-frequency fluctuations. To overcome this challenge, we propose approximating the data by eliminating high-frequency terms from its Fourier expansion with respect to the polynomial-exponential basis. This truncation regularizes the problem, while the use of the polynomial-exponential basis ensures accuracy in the computation. We demonstrate the effectiveness of our approach through numerical examples in one and two dimensions.
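As a rough illustration of the truncation idea (using an ordinary Chebyshev basis as a stand-in for the paper's polynomial-exponential basis), a low-degree least-squares fit discards the high-frequency content of the noise before differentiation; the degree, grid, and noise level below are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)

# Noisy samples of f(x) = sin(3x) on [-1, 1]; the exact derivative is 3 cos(3x).
x = np.linspace(-1.0, 1.0, 400)
y_noisy = np.sin(3 * x) + 0.01 * rng.standard_normal(x.size)

# A low-degree (truncated) fit discards the high-frequency noise ...
coeffs = C.chebfit(x, y_noisy, deg=12)
dy_est = C.chebval(x, C.chebder(coeffs))

# ... whereas naive finite differences amplify it.
dy_fd = np.gradient(y_noisy, x)

exact = 3 * np.cos(3 * x)
print("truncated-fit error     :", np.max(np.abs(dy_est - exact)))
print("finite-difference error :", np.max(np.abs(dy_fd - exact)))
```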
Chebyshev spectral methods are widely used in numerical computations. When the underlying function has a singularity, it has been observed by L. N. Trefethen in 2011 that its Chebyshev interpolants exhibit an error localization property, that is, their errors in a neighborhood of the singularity are noticeably larger than elsewhere. In this paper, we first present a pointwise error analysis for Chebyshev projections of functions with a singularity and prove that the rate of convergence of Chebyshev projections of degree $n$ at each point away from the singularity is one power of $n$ faster than that at the singularity. This gives a rigorous justification for the error localization of Chebyshev projections. We then extend the framework of our analysis to Chebyshev interpolants, Chebyshev spectral differentiations and Legendre projections and justify their error localization using similar arguments. As a result, we find that Chebyshev spectral differentiations converge faster than their best counterparts except in a neighborhood of the singularity and, in the particular case where the singularity is located in the interior of the interval, they converge even faster than their best counterparts in the maximum norm.
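The localization phenomenon is easy to observe numerically; below is a minimal check with NumPy's Chebyshev tools for $f(x)=|x|$, whose singularity sits at the origin (the degree and the width of the neighborhood are illustrative).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = np.abs                          # singularity at x = 0
n = 200
coeffs = C.chebinterpolate(f, n)    # interpolant at Chebyshev points

x = np.linspace(-1, 1, 100001)
err = np.abs(C.chebval(x, coeffs) - f(x))

near = np.abs(x) < 0.05             # small neighborhood of the singularity
print("max error near the singularity :", err[near].max())
print("max error away from it         :", err[~near].max())  # noticeably smaller
```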
Inspired by biological and cultural evolution, there have been many attempts to explore and elucidate the necessary conditions for open-endedness in artificial intelligence and artificial life. Using a continuous cellular automaton called Lenia as the base system, we built large-scale evolutionary simulations using the parallel computing framework JAX, in order to achieve the goal of never-ending evolution of self-organizing patterns. We report a number of system design choices, including (1) implicit implementation of genetic operators, such as reproduction by pattern self-replication and selection by differential existential success; (2) localization of genetic information; and (3) algorithms for dynamic maintenance of the localized genotypes and their translation to phenotypes. Simulations tend to go through a phase of diversity and creativity, then gradually converge to domination by fast-expanding patterns, presumably an optimal solution under the current design. Based on our experimentation, we propose several factors that may further facilitate open-ended evolution, such as virtual environment design, mass conservation, and energy constraints.
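For readers unfamiliar with Lenia, a minimal single-channel update step (written in NumPy rather than the paper's large-scale JAX setup) illustrates the basic mechanics: an FFT convolution with a ring-shaped kernel, a Gaussian growth mapping, and a clipped Euler update. All parameters below are illustrative.

```python
import numpy as np

def lenia_step(A, K_fft, dt=0.1, mu=0.15, sigma=0.015):
    """One Lenia update: convolve, apply the growth mapping, clip to [0, 1]."""
    U = np.real(np.fft.ifft2(np.fft.fft2(A) * K_fft))                 # neighborhood potential
    G = 2.0 * np.exp(-((U - mu) ** 2) / (2.0 * sigma ** 2)) - 1.0     # growth in [-1, 1]
    return np.clip(A + dt * G, 0.0, 1.0)

# Ring-shaped kernel, normalized to sum to 1 (radius and shape are illustrative).
size, R = 128, 13
y, x = np.ogrid[-size // 2:size // 2, -size // 2:size // 2]
r = np.sqrt(x ** 2 + y ** 2) / R
kernel = np.exp(-((r - 0.5) ** 2) / 0.015) * (r < 1)
kernel /= kernel.sum()
K_fft = np.fft.fft2(np.fft.ifftshift(kernel))

rng = np.random.default_rng(2)
A = rng.random((size, size)) * (r < 1)       # random blob as the initial state
for _ in range(100):
    A = lenia_step(A, K_fft)
```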
Large language models (LLMs) can perform accurate classification with zero or only a few examples (in-context learning). We present a prompting system that enables regression with uncertainty for in-context learning with frozen LLMs (GPT-3, GPT-3.5, and GPT-4), allowing predictions without feature engineering or architecture tuning. By incorporating uncertainty, our approach enables Bayesian optimization for catalyst or molecule optimization using natural language, eliminating the need for training or simulation. Here, we perform the optimization using catalyst synthesis procedures as the input from which properties are predicted. Working with natural language mitigates the difficulty of synthesizability, since the literal synthesis procedure is the model's input. We show that in-context learning can continue to improve beyond the model's context window (the maximum number of tokens the model can process at once) as data are gathered via example selection, allowing the method to scale. Although our method does not outperform all baselines, it requires no training or feature selection and only minimal computing while maintaining satisfactory performance. We also find that Gaussian process regression on text embeddings performs strongly at Bayesian optimization. The code is available in our GitHub repository: //github.com/ur-whitelab/BO-LIFT
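The Gaussian process baseline mentioned above can be sketched in a few lines. In the sketch below, a TF-IDF vectorizer stands in for an LLM text embedder, the procedure texts and yield values are made up for illustration, and none of the names correspond to the repository's API.

```python
import numpy as np
from scipy.stats import norm
from sklearn.feature_extraction.text import TfidfVectorizer   # stand-in for an LLM embedder
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data: free-text synthesis procedures with a measured property (values are invented).
procedures = [
    "calcine at 500 C for 2 h, impregnate with 1 wt% Pt",
    "calcine at 700 C for 4 h, impregnate with 2 wt% Pt",
    "sol-gel synthesis, dry at 120 C, 0.5 wt% Pd",
]
yields = np.array([0.42, 0.61, 0.35])

embed = TfidfVectorizer().fit(procedures)
X = embed.transform(procedures).toarray()

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, yields)

# Score an untested candidate with expected improvement (maximization).
candidate = ["calcine at 600 C for 3 h, impregnate with 1.5 wt% Pt"]
mu, std = gpr.predict(embed.transform(candidate).toarray(), return_std=True)
best = yields.max()
z = (mu - best) / std
ei = (mu - best) * norm.cdf(z) + std * norm.pdf(z)
print("predicted yield:", mu[0], "+/-", std[0], " EI:", ei[0])
```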
The most relevant problems in discounted reinforcement learning involve estimating the mean of a function under the stationary distribution of a Markov reward process, such as the expected return in policy evaluation, or the policy gradient in policy optimization. In practice, these estimates are produced through finite-horizon episodic sampling, which neglects the mixing properties of the Markov process. It is largely unclear how this mismatch between the practical and the ideal setting affects the estimation, and the literature lacks a formal study of the pitfalls of episodic sampling and of how to do it optimally. In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor. Then, we provide a statistical analysis of a set of notable estimators and the corresponding sampling procedures, which includes the finite-horizon estimators often used in practice. Crucially, we show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties with respect to the alternative estimators, as it matches the lower bound without requiring a careful tuning of the episode horizon.
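One standard way to realize sampling from the discounted kernel (not necessarily the paper's exact procedure) can be sketched on a toy Markov reward process: stop each trajectory after a geometrically distributed number of steps with parameter $1-\gamma$ and average the reward at the stopping state, which targets the normalized discounted mean. The chain, reward, and sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy Markov reward process: 3 states, transition matrix P, reward r(s).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])
gamma, s0, n_episodes = 0.95, 0, 10_000

# Stop each trajectory after T ~ Geometric(1 - gamma) steps (support 0, 1, 2, ...)
# and average r at the stopping state; this estimates the normalized discounted mean
# (1 - gamma) * sum_t gamma^t E[r(s_t)].
est = 0.0
for _ in range(n_episodes):
    T = rng.geometric(1.0 - gamma) - 1
    s = s0
    for _ in range(T):
        s = rng.choice(3, p=P[s])
    est += r[s]
est /= n_episodes

# Exact value for comparison: (1 - gamma) * e_{s0}^T (I - gamma P)^{-1} r.
exact = (1.0 - gamma) * np.linalg.solve(np.eye(3) - gamma * P, r)[s0]
print("estimate:", est, " exact:", exact)
```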
We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all of the following statements. - Support arbitrary Q, which can be either acyclic or cyclic, and contain binary and non-binary relations. - Guarantee an arbitrarily small error with high probability, always in \~O(AGM/OUT) time, where AGM is the AGM bound of Q (an upper bound of OUT), and \~O hides a polylogarithmic factor of the input size. We also explain previous join size estimators in a unified framework. All methods, including ours, rely on certain indexes on the relations in D, which take linear time to build offline. Additionally, we extend our method using generalized hypertree decompositions (GHDs) to achieve a lower complexity than \~O(AGM/OUT) when OUT is small, and present optimization techniques for improving estimation efficiency and accuracy.
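For intuition, here is a textbook sampling-based estimator (not the paper's method) for a two-relation join: sample a tuple of R uniformly, scale its number of join partners in S by |R|, and average over trials. The relations and the index below are illustrative.

```python
import random
from collections import Counter

random.seed(0)

# Toy binary relations R(a, b) and S(b, c); the join is on attribute b.
R = [(i, i % 10) for i in range(1000)]
S = [(j % 25, j) for j in range(500)]

# Index on S.b, built offline in linear time (the kind of index assumed above).
deg_S = Counter(b for b, _ in S)

def estimate_join_size(n_trials=2000):
    """Unbiased estimate of OUT = |R join S|: average |R| * deg_S(t.b) over sampled t in R."""
    total = 0
    for _ in range(n_trials):
        _, b = random.choice(R)
        total += len(R) * deg_S[b]
    return total / n_trials

exact = sum(deg_S[b] for _, b in R)
print("estimate:", estimate_join_size(), " exact OUT:", exact)
```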
Large-scale linear, time-invariant (LTI) dynamical systems are widely used to characterize complicated physical phenomena. We propose a two-stage algorithm to reduce the order of a large-scale LTI system given samples of its transfer function, for a target degree $k$ of the reduced system. In the first stage, a modified adaptive Antoulas--Anderson (AAA) algorithm is used to construct a degree $d$ rational approximation of the transfer function that corresponds to an intermediate system, which can be reduced in a numerically stable manner in the second stage using ideas from the theory of Hankel norm approximation (HNA). We also study the numerical issues of Glover's HNA algorithm and provide a remedy for its numerical instabilities. A carefully computed rational approximation of degree $d$ gives us a numerically stable algorithm for reducing an LTI system, which is more efficient than SVD-based algorithms and more accurate than moment-matching algorithms.
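As background for the second stage, the Hankel singular values that drive HNA-style reduction can be computed from the system Gramians with SciPy's Lyapunov solver. This is only the preliminary computation, not Glover's algorithm, and the randomly generated stable system below is illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(4)

# Random stable LTI system (A, B, C): shift a random matrix so all eigenvalues lie in the left half-plane.
n, m, p = 20, 2, 2
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Controllability and observability Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0.
P = solve_continuous_lyapunov(A, -B @ B.T)
Q = solve_continuous_lyapunov(A.T, -C.T @ C)

# Hankel singular values: square roots of the eigenvalues of P Q.
hsv = np.sort(np.sqrt(np.abs(np.linalg.eigvals(P @ Q).real)))[::-1]
print(hsv)  # a sharp drop after index k suggests a good reduced order k
```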
This paper reconsiders end-to-end learning approaches to the Optimal Power Flow (OPF). Existing methods, which learn the input/output mapping of the OPF, suffer from scalability issues due to the high dimensionality of the output space. This paper first shows that the space of optimal solutions can be significantly compressed using principal component analysis (PCA). It then proposes Compact Learning, a new method that learns in a subspace of the principal components before translating the vectors into the original output space. This compression reduces the number of trainable parameters substantially, improving scalability and effectiveness. Compact Learning is evaluated on a variety of test cases from the PGLib with up to 30,000 buses. The paper also shows that the output of Compact Learning can be used to warm-start an exact AC solver to restore feasibility, while bringing significant speed-ups.
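A minimal sketch of the compression idea on synthetic data (the data, regressor, and number of components are illustrative, not the paper's architecture): fit PCA on the optimal-solution vectors, learn a map from inputs into the principal subspace, and reconstruct predictions in the full output space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)

# Synthetic stand-in for OPF data: inputs (loads) and high-dimensional optimal solutions
# that actually live near a low-dimensional subspace.
n_samples, n_in, n_out, latent = 2000, 50, 500, 10
X = rng.standard_normal((n_samples, n_in))
W1 = rng.standard_normal((n_in, latent))
W2 = rng.standard_normal((latent, n_out))
Y = np.tanh(X @ W1) @ W2 + 0.01 * rng.standard_normal((n_samples, n_out))

# Compress the output space with PCA, learn in the subspace, then map predictions back.
pca = PCA(n_components=latent).fit(Y)
Z = pca.transform(Y)                          # compressed targets

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500).fit(X, Z)
Y_pred = pca.inverse_transform(model.predict(X))

print("relative reconstruction error:", np.linalg.norm(Y_pred - Y) / np.linalg.norm(Y))
```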
In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile. The underlying reason for their fragility, however, remains unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments performed in prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.
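As a reference point for such validation, here is a minimal check of the influence-function approximation in the convex regime where its assumptions hold, using L2-regularized logistic regression and comparing the first-order estimate against actual leave-one-out retraining; the data and regularization strength are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(6)

# Small binary classification problem; labels in {-1, +1}.
n, d, lam = 200, 5, 1e-2
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.where(X @ w_true + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)
x_test, y_test = rng.standard_normal(d), 1.0

def objective(theta, X, y):
    z = y * (X @ theta)
    return np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * theta @ theta

def grad(theta, X, y):
    z = y * (X @ theta)
    return -(X.T @ (y * expit(-z))) / len(y) + lam * theta

def fit(X, y):
    return minimize(objective, np.zeros(d), args=(X, y), jac=grad, method="L-BFGS-B").x

theta = fit(X, y)

# Influence approximation: removing point i changes the test loss by about
# (1/n) * grad_test^T H^{-1} grad_i, with H the Hessian of the regularized empirical risk.
s = expit(X @ theta)
H = (X.T * (s * (1 - s))) @ X / n + lam * np.eye(d)
g_test = -y_test * x_test * expit(-y_test * (x_test @ theta))
i = 0
g_i = -y[i] * X[i] * expit(-y[i] * (X[i] @ theta))
influence = (g_test @ np.linalg.solve(H, g_i)) / n

# Compare with actual leave-one-out retraining.
theta_loo = fit(np.delete(X, i, axis=0), np.delete(y, i))
loss = lambda th: np.log1p(np.exp(-y_test * (x_test @ th)))
print("influence estimate:", influence, " actual change:", loss(theta_loo) - loss(theta))
```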
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
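The re-weighting itself is a one-liner once per-class counts are known; below is a small sketch of the weights implied by the effective-number formula $(1-\beta^{n})/(1-\beta)$ (the normalization convention is one common choice).

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples (1 - beta^n) / (1 - beta)."""
    samples_per_class = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(samples_per_class) / weights.sum()

# Long-tailed toy example: head class with 5000 samples, tail class with 50.
print(class_balanced_weights([5000, 500, 50], beta=0.999))
```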