We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point, we simply fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on testing samples. Empirically, through detailed experiments, we find that for low to moderately-sized training sets, our ensembles are more diverse and provide better calibration than standard ensembles, sometimes significantly.
The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations and for formulating invariant visual operations at higher levels. This paper defines and proves a joint covariance property under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations, which makes it possible to characterize how different types of image transformations interact with each other. Specifically, the derived relations show how the receptive field parameters need to be transformed, in order to match the output from spatio-temporal receptive fields with the underlying spatio-temporal image transformations.
The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations and for formulating invariant visual operations at higher levels. This paper defines and proves a joint covariance property under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations, which makes it possible to characterize how different types of image transformations interact with each other. Specifically, the derived relations show the receptive field parameters need to be transformed, in order to match the output from spatio-temporal receptive fields with the underlying spatio-temporal image transformations.
We present a method for fitting monotone curves using cubic B-splines, which is equivalent to putting a monotonicity constraint on the coefficients. We explore different ways of enforcing this constraint and analyze their theoretical and empirical properties. We propose two algorithms for solving the spline fitting problem: one that uses standard optimization techniques and one that trains a Multi-Layer Perceptrons (MLP) generator to approximate the solutions under various settings and perturbations. The generator approach can speed up the fitting process when we need to solve the problem repeatedly, such as when constructing confidence bands using bootstrap. We evaluate our method against several existing methods, some of which do not use the monotonicity constraint, on some monotone curves with varying noise levels. We demonstrate that our method outperforms the other methods, especially in high-noise scenarios. We also apply our method to analyze the polarization-hole phenomenon during star formation in astrophysics. The source code is accessible at \texttt{\url{//github.com/szcf-weiya/MonotoneSplines.jl}}.
Urban traffic congestion remains a pressing challenge in our rapidly expanding cities, despite the abundance of available data and the efforts of policymakers. By leveraging behavioral system theory and data-driven control, this paper exploits the DeePC algorithm in the context of urban traffic control performed via dynamic traffic lights. To validate our approach, we consider a high-fidelity case study using the state-of-the-art simulation software package Simulation of Urban MObility (SUMO). Preliminary results indicate that DeePC outperforms existing approaches across various key metrics, including travel time and CO$_2$ emissions, demonstrating its potential for effective traffic management
This work tackles the task of extractive text summarization in a limited labeled data scenario using a semi-supervised approach. Specifically, we propose a prompt-based pseudolabel selection strategy using GPT-4. We evaluate our method on three text summarization datasets: TweetSumm, WikiHow, and ArXiv/PubMed. Our experiments show that by using an LLM to evaluate and generate pseudolabels, we can improve the ROUGE-1 by 10-20\% on the different datasets, which is akin to enhancing pretrained models. We also show that such a method needs a smaller pool of unlabeled examples to perform better.
Tracking the fundamental frequency (f0) of a monophonic instrumental performance is effectively a solved problem with several solutions achieving 99% accuracy. However, the related task of automatic music transcription requires a further processing step to segment an f0 contour into discrete notes. This sub-task of note segmentation is necessary to enable a range of applications including musicological analysis and symbolic music generation. Building on CREPE, a state-of-the-art monophonic pitch tracking solution based on a simple neural network, we propose a simple and effective method for post-processing CREPE's output to achieve monophonic note segmentation. The proposed method demonstrates state-of-the-art results on two challenging datasets of monophonic instrumental music. Our approach also gives a 97% reduction in the total number of parameters used when compared with other deep learning based methods.
Despite decades of practice, finite-size errors in many widely used electronic structure theories for periodic systems remain poorly understood. For periodic systems using a general Monkhorst-Pack grid, there has been no comprehensive and rigorous analysis of the finite-size error in the Hartree-Fock theory (HF) and the second order M{\o}ller-Plesset perturbation theory (MP2), which are the simplest wavefunction based method, and the simplest post-Hartree-Fock method, respectively. Such calculations can be viewed as a multi-dimensional integral discretized with certain trapezoidal rules. Due to the Coulomb singularity, the integrand has many points of discontinuity in general, and standard error analysis based on the Euler-Maclaurin formula gives overly pessimistic results. The lack of analytic understanding of finite-size errors also impedes the development of effective finite-size correction schemes. We propose a unified analysis to obtain sharp convergence rates of finite-size errors for the periodic HF and MP2 theories. Our main technical advancement is a generalization of the result of [Lyness, 1976] for obtaining sharp convergence rates of the trapezoidal rule for a class of non-smooth integrands. Our result is applicable to three-dimensional bulk systems as well as low dimensional systems (such as nanowires and 2D materials). Our unified analysis also allows us to prove the effectiveness of the Madelung-constant correction to the Fock exchange energy, and the effectiveness of a recently proposed staggered mesh method for periodic MP2 calculations [Xing, Li, Lin, J. Chem. Theory Comput. 2021]. Our analysis connects the effectiveness of the staggered mesh method with integrands with removable singularities, and suggests a new staggered mesh method for reducing finite-size errors of periodic HF calculations.
This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model.\;\,Simul., 9 (2011), pp.~373--406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton. Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\log\epsilon|^{d})$-term separable approximation on well-separated domains with error $\epsilon>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp.~1-28] where an $O(|\log\epsilon|^{d+1})$-term separable approximation was proved for Poisson-type problems.
Gaussian processes (GPs) are a popular class of Bayesian nonparametric models, but its training can be computationally burdensome for massive training datasets. While there has been notable work on scaling up these models for big data, existing methods typically rely on a stationary GP assumption for approximation, and can thus perform poorly when the underlying response surface is non-stationary, i.e., it has some regions of rapid change and other regions with little change. Such non-stationarity is, however, ubiquitous in real-world problems, including our motivating application for surrogate modeling of computer experiments. We thus propose a new Product of Sparse GP (ProSpar-GP) method for scalable GP modeling with massive non-stationary data. The ProSpar-GP makes use of a carefully-constructed product-of-experts formulation of sparse GP experts, where different experts are placed within local regions of non-stationarity. These GP experts are fit via a novel variational inference approach, which capitalizes on mini-batching and GPU acceleration for efficient optimization of inducing points and length-scale parameters for each expert. We further show that the ProSpar-GP is Kolmogorov-consistent, in that its generative distribution defines a valid stochastic process over the prediction space; such a property provides essential stability for variational inference, particularly in the presence of non-stationarity. We then demonstrate the improved performance of the ProSpar-GP over the state-of-the-art, in a suite of numerical experiments and an application for surrogate modeling of a satellite drag simulator.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.