Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may nonetheless have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions that governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and we establish MID scaling with no assumptions on the data distributions, so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond values alone, with stronger analogues when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and they provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.
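For readers less familiar with entropic optimal transport, the following is a minimal Sinkhorn sketch in Python (an illustration only, not part of the paper's contribution, which is the statistical analysis). The regularization parameter `eps`, the quantity the abstract interprets as a distance scale, enters through the Gibbs kernel; the toy data have a high extrinsic dimension but a low intrinsic dimension:

```python
import numpy as np

def sinkhorn_value(a, b, C, eps, n_iters=500):
    """Entropic OT between discrete measures a and b with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel; eps sets the scale
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):             # alternating marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # entropic optimal coupling
    return np.sum(P * C)                 # transport cost under the coupling

# Toy data: extrinsic dimension 50, intrinsic dimension 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 50)); x[:, 3:] = 0.0
y = rng.normal(size=(200, 50)); y[:, 3:] = 0.0
C = np.linalg.norm(x[:, None] - y[None, :], axis=-1)  # Lipschitz cost
a = b = np.full(200, 1 / 200)
print(sinkhorn_value(a, b, C, eps=0.5))
```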
Trajectory optimization is a powerful tool for robot motion planning and control. State-of-the-art general-purpose nonlinear programming solvers are versatile, handle constraints effectively, and provide high numerical robustness, but they are slow because they do not fully exploit the structure of the optimal control problem at hand. Existing structure-exploiting solvers are fast, but they often lack techniques to deal with nonlinearity or rely on penalty methods to enforce (equality or inequality) path constraints. This work presents Fatrop: a trajectory optimization solver that is fast and benefits from the salient features of general-purpose nonlinear optimization solvers. The speed-up is mainly achieved through the integration of a specialized linear solver, based on a Riccati recursion that is generalized to also support stagewise equality constraints. To demonstrate the algorithm's potential, it is benchmarked on a set of robot problems that are challenging from a numerical perspective, including problems with a minimum-time objective and no-collision constraints. The solver is shown to solve trajectory generation problems for a quadrotor, a robot manipulator, and a truck-trailer system in a few tens of milliseconds. The algorithm's C++ implementation accompanies this work as open-source software, released under the GNU Lesser General Public License (LGPL). This software framework may encourage and enable the robotics community to use trajectory optimization in more challenging applications.
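As a point of reference for the linear-algebra core, here is the textbook backward Riccati recursion for an unconstrained finite-horizon LQR problem, sketched in Python/NumPy (Fatrop itself is implemented in C++, and its solver generalizes this stagewise sweep to handle equality path constraints, which this sketch does not):

```python
import numpy as np

def lqr_riccati_gains(A, B, Q, R, P_N, N):
    """Backward Riccati sweep: returns stage feedback gains K_0..K_{N-1}
    for dynamics x_{k+1} = A x_k + B u_k, stage costs x'Qx + u'Ru,
    and terminal cost x'P_N x."""
    P, gains = P_N, []
    for _ in range(N):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # feedback gain at this stage
        P = Q + A.T @ P @ (A - B @ K)         # cost-to-go update
        gains.append(K)
    return gains[::-1]                        # ordered from stage 0

# Example: double integrator, horizon 50.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[0.1]]); P_N = np.eye(2)
print(lqr_riccati_gains(A, B, Q, R, P_N, N=50)[0])
```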
The quantum dense output problem is that of evaluating time-accumulated observables from time-dependent quantum dynamics using quantum computers. This problem arises frequently in applications such as quantum control and spectroscopic computation. We present a range of algorithms designed to operate on both early and fully fault-tolerant quantum platforms. These methodologies draw upon techniques such as amplitude estimation, Hamiltonian simulation, quantum linear Ordinary Differential Equation (ODE) solvers, and quantum Carleman linearization. We provide a comprehensive complexity analysis with respect to the evolution time $T$ and the error tolerance $\epsilon$. Our results demonstrate that the linearization approach can nearly achieve the optimal complexity $\mathcal{O}(T/\epsilon)$ for a certain class of low-rank dense outputs. Moreover, we provide a linearization of the dense output problem that yields an exact, finite-dimensional closure encompassing the original states. This formulation is related to Koopman Invariant Subspace theory and may be of independent interest in nonlinear control and scientific machine learning.
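To fix ideas, one instance of the target quantity is the time-averaged expectation $\frac{1}{T}\int_0^T \langle\psi(t)|O|\psi(t)\rangle\,dt$. The following is a small classical reference computation of this quantity for a toy, time-independent Hamiltonian (purely illustrative; the paper's point is to estimate such accumulated observables on quantum hardware, where the classical approach below is unavailable at scale):

```python
import numpy as np
from scipy.linalg import expm

# Toy 2-level system: Hamiltonian H, observable O (Pauli-Z), initial state.
H = np.array([[1.0, 0.5], [0.5, -1.0]])
O = np.array([[1.0, 0.0], [0.0, -1.0]])
psi = np.array([1.0, 0.0], dtype=complex)

T, n_steps = 5.0, 1000
dt = T / n_steps
U = expm(-1j * H * dt)                     # single-step propagator
acc = 0.0
for _ in range(n_steps):
    acc += np.real(psi.conj() @ O @ psi) * dt   # Riemann sum of <psi|O|psi>
    psi = U @ psi
print(acc / T)                             # time-averaged observable
```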
We present an extended validation of semi-analytical, semi-empirical covariance matrices for the two-point correlation function (2PCF) on simulated catalogs representative of the Luminous Red Galaxy (LRG) data collected during the initial two months of operations of the Stage-IV ground-based Dark Energy Spectroscopic Instrument (DESI). We run the pipeline on multiple effective Zel'dovich (EZ) mock galaxy catalogs with the corresponding cuts applied and compare the results with the mock sample covariance to assess the accuracy of the predictions and its fluctuations. We propose an extension of the previously developed formalism for catalogs processed with standard reconstruction algorithms. We consider methods for comparing covariance matrices in detail, highlighting their interpretation and the statistical properties induced by sample variance; in particular, certain metrics have nontrivial expectation values even when the external covariance estimate is perfect. With improved mocks and validation techniques, we confirm a good agreement between our predictions and the sample covariance. This allows one to generate covariance matrices for comparable datasets without the need to create numerous mock galaxy catalogs with matching clustering, requiring only 2PCF measurements from the data itself. The code used in this paper is publicly available at https://github.com/oliverphilcox/RascalC.
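As a simple illustration of the sample-variance effect noted above, consider comparing a model covariance with a mock sample covariance via the Kullback-Leibler divergence between zero-mean Gaussians (a generic metric chosen here for illustration; the specific metrics and expectation values analyzed in the paper may differ). Even when the model covariance equals the truth, noise in the sample covariance makes the expected metric nonzero:

```python
import numpy as np

def kl_gaussian(C_model, C_sample):
    """KL divergence between N(0, C_sample) and N(0, C_model)."""
    d = C_model.shape[0]
    Ci = np.linalg.inv(C_model)
    _, logdet_m = np.linalg.slogdet(C_model)
    _, logdet_s = np.linalg.slogdet(C_sample)
    return 0.5 * (np.trace(Ci @ C_sample) - d + logdet_m - logdet_s)

# With finitely many mocks the sample covariance is noisy, so even a
# perfect model covariance yields a strictly positive expected KL.
rng = np.random.default_rng(1)
d, n_mocks = 20, 1000
C_true = np.diag(np.linspace(1.0, 3.0, d))
draws = rng.multivariate_normal(np.zeros(d), C_true, size=n_mocks)
C_sample = np.cov(draws, rowvar=False)
print(kl_gaussian(C_true, C_sample))   # > 0 despite a perfect model
```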
In this paper, we propose the Minimum Regularized Covariance Trace (MRCT) estimator, a novel method for robust covariance estimation and functional outlier detection. The MRCT estimator employs a subset-based approach that prioritizes subsets exhibiting greater centrality based on a generalization of the Mahalanobis distance, resulting in a fast-MCD-type algorithm. Notably, the MRCT estimator handles high-dimensional data sets without the need for preprocessing or dimension-reduction techniques, owing to internal smoothing whose amount is determined by the regularization parameter $\alpha > 0$; the selection of $\alpha$ is automated. The proposed method adapts seamlessly to sparsely observed data by working directly with the finite matrix of basis coefficients. An extensive simulation study demonstrates the efficacy of the MRCT estimator in terms of robust covariance estimation and automated outlier detection, emphasizing the balance between noise exclusion and signal preservation achieved through an appropriate choice of $\alpha$. The method converges quickly in practice and performs favorably when compared to other functional outlier detection methods.
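A rough sketch of the fast-MCD-style concentration idea, transplanted to a plain multivariate setting with a Tikhonov-regularized Mahalanobis distance (the actual MRCT estimator operates on the basis-coefficient matrices of functional data and selects $\alpha$ automatically; the subset size `h` and the stopping rule here are simplifications):

```python
import numpy as np

def mrct_like(X, h, alpha, n_iter=20, seed=0):
    """Concentration steps: repeatedly keep the h most central points
    under a regularized Mahalanobis distance (simplified sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.choice(n, size=h, replace=False)        # random initial subset
    for _ in range(n_iter):
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False) + alpha * np.eye(p)  # regularized
        Si = np.linalg.inv(S)
        d = np.einsum('ij,jk,ik->i', X - mu, Si, X - mu)      # Mahalanobis^2
        new_idx = np.argsort(d)[:h]                   # h most central points
        if set(new_idx) == set(idx):
            break                                     # subset has converged
        idx = new_idx
    return mu, S, d   # robust center, covariance, outlyingness scores
```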
Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for linear feature learning with non-parametric prediction, which simultaneously estimates the prediction function and the linear subspace. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. By utilising alternating minimisation, we iteratively rotate the data to improve alignment with the leading directions and to accurately estimate the relevant dimension in practical settings. We establish that our method yields a consistent estimator of the prediction function with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.
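To convey the alternating scheme, here is a crude sketch that substitutes generic ingredients for the paper's: a kernel ridge fit in place of the Hermite-polynomial expansion with a derivative penalty, and finite-difference gradients whose average outer product supplies the rotation (all choices below are illustrative assumptions, not RegFeaL's actual estimator):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def alternating_feature_rotation(X, y, n_rounds=5, eps=1e-3):
    """Alternate between fitting a non-parametric predictor and rotating
    the data onto the leading directions of the estimated average
    gradient outer product (a simplified stand-in for RegFeaL)."""
    n, p = X.shape
    R = np.eye(p)                       # accumulated rotation
    Z = X.copy()
    for _ in range(n_rounds):
        model = KernelRidge(kernel='rbf', alpha=1e-2).fit(Z, y)
        D = np.column_stack([           # finite-difference gradients (n, p)
            (model.predict(Z + eps * np.eye(p)[j]) -
             model.predict(Z - eps * np.eye(p)[j])) / (2 * eps)
            for j in range(p)])
        G = D.T @ D / n                 # average gradient outer product
        _, V = np.linalg.eigh(G)
        V = V[:, ::-1]                  # leading directions first
        Z = Z @ V                       # rotate the data
        R = R @ V
    return R                            # leading columns span the subspace
```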
Coping with distributional shifts is essential for transfer learning methods to perform well in real-life tasks. However, most existing approaches in this area either focus on an ideal scenario in which the data contain no noise or employ a complicated training paradigm or model design to deal with distributional shifts. In this paper, we revisit the robustness of the minimum error entropy (MEE) criterion, a widely used objective in statistical signal processing for dealing with non-Gaussian noise, and investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common. Specifically, we put forward a new theoretical result showing the robustness of MEE against covariate shift. We also show that by simply replacing the mean squared error (MSE) loss with the MEE on basic transfer learning algorithms such as fine-tuning and linear probing, we can achieve competitive performance with respect to state-of-the-art transfer learning algorithms. We justify our arguments on both synthetic data and five real-world time-series datasets.
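A minimal sketch of an MEE loss as a drop-in replacement for MSE, using the common Parzen-window estimate of the Rényi quadratic entropy of the errors (the kernel bandwidth `sigma` is a tuning choice; the paper's exact formulation may differ in normalization):

```python
import torch

def mee_loss(pred, target, sigma=1.0):
    """Minimum error entropy loss: minimize the Renyi quadratic entropy
    of the prediction errors, estimated with a Gaussian Parzen window."""
    e = (pred - target).view(-1)
    diff = e.unsqueeze(0) - e.unsqueeze(1)             # pairwise error gaps
    kernel = torch.exp(-diff ** 2 / (2 * sigma ** 2))  # Parzen window
    info_potential = kernel.mean()                     # V_2(e)
    return -torch.log(info_potential)                  # H_2(e) = -log V_2(e)

# Usage: replace the MSE criterion in fine-tuning or linear probing.
pred = torch.randn(32, requires_grad=True)
target = torch.randn(32)
mee_loss(pred, target).backward()
```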
Analysis of high-dimensional data, where the number of covariates is larger than the sample size, is a topic of current interest. In such settings, an important goal is to estimate the signal level $\tau^2$ and the noise level $\sigma^2$, i.e., to quantify how much of the variation in the response variable can be explained by the covariates, and how much is left unexplained. This thesis considers the estimation of these quantities in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given, with no response $Y$. Our main research question is: how can one use the unlabeled data to better estimate $\tau^2$ and $\sigma^2$? We consider two frameworks: a linear regression model, and a linear projection model in which linearity is not assumed. In the first framework, while linear regression is used, no sparsity assumptions on the coefficients are made. In the second framework, the linearity assumption is relaxed as well, and we aim to estimate the signal and noise levels defined by the linear projection. We first propose a naive estimator which is unbiased and consistent, under some assumptions, in both frameworks. We then show how the naive estimator can be improved by using zero-estimators, where a zero-estimator is a statistic arising from the unlabeled data whose expected value is zero. In the first framework, we calculate the optimal zero-estimator improvement and discuss ways to approximate it. In the second framework, such optimality no longer holds, and we suggest two zero-estimators that improve the naive estimator, albeit not necessarily optimally. Furthermore, we show that our approach reduces the variance for general initial estimators, and we present an algorithm that potentially improves any initial estimator. Lastly, we consider four datasets and study the performance of our suggested methods.
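A toy illustration of the zero-estimator mechanism (not the thesis's actual constructions): with standardized covariates, the contrast between labeled and unlabeled second moments has expectation zero, so subtracting a multiple of it preserves unbiasedness while cancelling part of the initial estimator's variance. All model choices below are assumptions made purely for this demonstration:

```python
import numpy as np

# Model: y = beta * x + noise, x ~ N(0, 1). T = mean(x * y) estimates beta;
# Z = mean_lab(x^2) - mean_unlab(x^2) has E[Z] = 0 and correlates with T,
# so T - c*Z (plug-in c_hat = T) stays unbiased with smaller variance.
rng = np.random.default_rng(2)
beta, n_lab, n_unlab, reps = 1.0, 50, 2000, 5000
naive, improved = [], []
for _ in range(reps):
    x = rng.normal(size=n_lab)
    x_u = rng.normal(size=n_unlab)            # unlabeled covariates
    y = beta * x + rng.normal(size=n_lab)
    T = np.mean(x * y)                        # naive estimate of beta
    Z = np.mean(x ** 2) - np.mean(x_u ** 2)   # zero-estimator, E[Z] = 0
    naive.append(T)
    improved.append(T - T * Z)                # plug-in coefficient c_hat = T
print(np.var(naive), np.var(improved))        # improved variance is smaller
```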
We introduce a multifidelity estimator of covariance matrices formulated as the solution to a regression problem on the manifold of symmetric positive definite matrices. The estimator is positive definite by construction, and the Mahalanobis distance minimized to obtain it possesses properties that enable practical computation. We show that our manifold regression multifidelity (MRMF) covariance estimator is a maximum likelihood estimator under a certain error model on the manifold tangent space. More broadly, we show that our Riemannian regression framework encompasses existing multifidelity covariance estimators constructed from control variates. We demonstrate via numerical examples that our estimator can provide significant decreases, up to one order of magnitude, in squared estimation error relative to both single-fidelity and other multifidelity covariance estimators. Furthermore, the preservation of positive definiteness ensures that our estimator is compatible with downstream tasks, such as data assimilation and metric learning, in which this property is essential.
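For context, the control-variate baseline that the Riemannian framework is said to encompass can be sketched as the following linear combination of sample covariances; unlike the MRMF estimator, it is not guaranteed to be positive definite (the weight `gamma` would in practice be chosen from estimated correlations, which this sketch omits):

```python
import numpy as np

def cv_multifidelity_cov(X_hi, X_lo_paired, X_lo_extra, gamma=1.0):
    """Control-variate multifidelity covariance estimator (linear baseline).

    X_hi, X_lo_paired: paired high-/low-fidelity samples, shape (n_hi, d);
    X_lo_extra: additional cheap low-fidelity samples, shape (n_extra, d).
    The correction uses the extra low-fidelity data to reduce variance,
    but the result can lose positive definiteness.
    """
    C_hi = np.cov(X_hi, rowvar=False)
    C_lo_paired = np.cov(X_lo_paired, rowvar=False)
    C_lo_all = np.cov(np.vstack([X_lo_paired, X_lo_extra]), rowvar=False)
    return C_hi + gamma * (C_lo_all - C_lo_paired)
```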
This paper studies the problem of forecasting general stochastic processes using a path-dependent extension of the Neural Jump ODE (NJ-ODE) framework. While NJ-ODE was the first framework to establish convergence guarantees for the prediction of irregularly observed time series, these results were limited to data stemming from It\^o diffusions with complete observations, in particular Markov processes in which all coordinates are observed simultaneously. In this work, we generalise these results to generic, possibly non-Markovian or discontinuous, stochastic processes with incomplete observations, by utilising the reconstruction properties of the signature transform. These theoretical results are supported by empirical studies, which show that the path-dependent NJ-ODE (PD-NJ-ODE) outperforms the original NJ-ODE framework on non-Markovian data. Moreover, we show that PD-NJ-ODE can be applied successfully to classical stochastic filtering problems and to limit order book (LOB) data.
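For intuition about the signature transform, here is a minimal depth-2 signature of a discretely observed path, computed from iterated integrals of increments (a sketch only; the framework's actual use of signatures, including truncation depth and normalisation, follows the paper rather than this toy):

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path of shape (T, d):
    level 1 is the total increment; level 2 collects the iterated
    integrals S^{ij} = \int (X^i_t - X^i_0) dX^j_t."""
    dX = np.diff(path, axis=0)              # stepwise increments (T-1, d)
    level1 = dX.sum(axis=0)
    X_rel = np.cumsum(dX, axis=0) - dX      # X_{t_k} - X_0 before step k
    level2 = X_rel.T @ dX + 0.5 * np.einsum('ki,kj->ij', dX, dX)
    return level1, level2

# Shuffle-identity sanity check: S^{ij} + S^{ji} = S^i * S^j.
path = np.cumsum(np.random.default_rng(3).normal(size=(100, 2)), axis=0)
s1, s2 = signature_depth2(path)
print(np.allclose(s2 + s2.T, np.outer(s1, s1)))   # True
```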
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
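The re-weighting scheme follows directly from the formula above: weight each class inversely to its effective number of samples. A minimal PyTorch sketch (normalising the weights to sum to the number of classes is a common convention; the class counts below are illustrative):

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class weights from the effective number of samples
    E_n = (1 - beta^n) / (1 - beta)."""
    eff_num = (1.0 - beta ** samples_per_class) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * len(samples_per_class) / w.sum()   # normalize to sum to C

# Usage with a standard weighted cross-entropy loss.
counts = torch.tensor([5000.0, 500.0, 50.0, 5.0])  # long-tailed class counts
weights = class_balanced_weights(counts)
logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = F.cross_entropy(logits, labels, weight=weights)
```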