We study pointwise estimation and uncertainty quantification for a sparse variational Gaussian process method with eigenvector inducing variables. For a rescaled Brownian motion prior, we derive theoretical guarantees and limitations for the frequentist size and coverage of pointwise credible sets. For sufficiently many inducing variables, we precisely characterize the asymptotic frequentist coverage, deducing when credible sets from this variational method are conservative and when overconfident/misleading. We numerically illustrate the applicability of our results and discuss connections with other common Gaussian process priors.
In this study, we address the central issue of statistical inference for Markov jump processes using discrete time observations. The primary problem at hand is to accurately estimate the infinitesimal generator of a Markov jump process, a critical task in various applications. To tackle this problem, we begin by reviewing established methods for generating sample paths from a Markov jump process conditioned to endpoints, known as Markov bridges. Additionally, we introduce a novel algorithm grounded in the concept of time-reversal, which serves as our main contribution. Our proposed method is then employed to estimate the infinitesimal generator of a Markov jump process. To achieve this, we use a combination of Markov Chain Monte Carlo techniques and the Monte Carlo Expectation-Maximization algorithm. The results obtained from our approach demonstrate its effectiveness in providing accurate parameter estimates. To assess the efficacy of our proposed method, we conduct a comprehensive comparative analysis with existing techniques (Bisection, Uniformization, Direct, Rejection, and Modified Rejection), taking into consideration both speed and accuracy. Notably, our method stands out as the fastest among the alternatives while maintaining high levels of precision.
This study presents a novel physics-enhanced machine learning (ML) and optimization framework tailored to address the challenges of designing intricate spinodal metamaterials with customized mechanical properties in scenarios where computational modeling is restricted, and experimental data is sparse. By utilizing sparse experimental data directly, our approach facilitates the inverse design of spinodal structures with precise finite-strain mechanical responses. Leveraging physics-based inductive biases to compensate for limited data availability, the framework sheds light on instability-induced pattern formation in periodic metamaterials, attributing it to nonconvex energetic potentials. Inspired by phase transformation modeling, the method integrates multiple partial input convex neural networks to create nonconvex potentials, effectively capturing complex nonlinear stress-strain behavior, even under extreme deformations.
The arrival of AI techniques in computations, with the potential for hallucinations and non-robustness, has made trustworthiness of algorithms a focal point. However, trustworthiness of the many classical approaches are not well understood. This is the case for feature selection, a classical problem in the sciences, statistics, machine learning etc. Here, the LASSO optimisation problem is standard. Despite its widespread use, it has not been established when the output of algorithms attempting to compute support sets of minimisers of LASSO in order to do feature selection can be trusted. In this paper we establish how no (randomised) algorithm that works on all inputs can determine the correct support sets (with probability $> 1/2$) of minimisers of LASSO when reading approximate input, regardless of precision and computing power. However, we define a LASSO condition number and design an efficient algorithm for computing these support sets provided the input data is well-posed (has finite condition number) in time polynomial in the dimensions and logarithm of the condition number. For ill-posed inputs the algorithm runs forever, hence, it will never produce a wrong answer. Furthermore, the algorithm computes an upper bound for the condition number when this is finite. Finally, for any algorithm defined on an open set containing a point with infinite condition number, there is an input for which the algorithm will either run forever or produce a wrong answer. Our impossibility results stem from generalised hardness of approximation -- within the Solvability Complexity Index (SCI) hierarchy framework -- that generalises the classical phenomenon of hardness of approximation.
This work puts forth low-complexity Riemannian subspace descent algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold. Different from the existing Riemannian gradient descent variants, the proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix. The resulting updates avoid the costly matrix operations like matrix exponentiation and dense matrix multiplication, which are generally required in almost all other Riemannian optimization algorithms on SPD manifold. We further identify a broad class of functions, arising in diverse applications, such as kernel matrix learning, covariance estimation of Gaussian distributions, maximum likelihood parameter estimation of elliptically contoured distributions, and parameter estimation in Gaussian mixture model problems, over which the Riemannian gradients can be calculated efficiently. The proposed uni-directional and multi-directional Riemannian subspace descent variants incur per-iteration complexities of $O(n)$ and $O(n^2)$ respectively, as compared to the $O(n^3)$ or higher complexity incurred by all existing Riemannian gradient descent variants. The superior runtime and low per-iteration complexity of the proposed algorithms is also demonstrated via numerical tests on large-scale covariance estimation and matrix square root problems. MATLAB code implementation is publicly available on GitHub : //github.com/yogeshd-iitk/subspace_descent_over_SPD_manifold
In this article, we study some anisotropic singular perturbations for a class of linear elliptic problems. A uniform estimates for conforming $Q_1$ finite element method are derived, and some other results of convergence and regularity for the continuous problem are proved.
We present ParrotTTS, a modularized text-to-speech synthesis model leveraging disentangled self-supervised speech representations. It can train a multi-speaker variant effectively using transcripts from a single speaker. ParrotTTS adapts to a new language in low resource setup and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual or parallel examples, ParrotTTS can transfer voices across languages while preserving the speaker specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results in monolingual and multi-lingual scenarios. ParrotTTS outperforms state-of-the-art multi-lingual TTS models using only a fraction of paired data as latter.
High-order tensor methods for solving both convex and nonconvex optimization problems have generated significant research interest, leading to algorithms with optimal global rates of convergence and local rates that are faster than Newton's method. On each iteration, these methods require the unconstrained local minimization of a (potentially nonconvex) multivariate polynomial of degree higher than two, constructed using third-order (or higher) derivative information, and regularized by an appropriate power of regularization. Developing efficient techniques for solving such subproblems is an ongoing topic of research, and this paper addresses the case of the third-order tensor subproblem. We propose the CQR algorithmic framework, for minimizing a nonconvex Cubic multivariate polynomial with Quartic Regularisation, by minimizing a sequence of local quadratic models that incorporate simple cubic and quartic terms. The role of the cubic term is to crudely approximate local tensor information, while the quartic one controls model regularization and progress. We provide necessary and sufficient optimality conditions that fully characterise the global minimizers of these cubic-quartic models. We then turn these conditions into secular equations that can be solved using nonlinear eigenvalue techniques. We show, using our optimality characterisations, that a CQR algorithmic variant has the optimal-order evaluation complexity of $\mathcal{O}(\epsilon^{-3/2})$ when applied to minimizing our quartically-regularised cubic subproblem, which can be further improved in special cases. We propose practical CQR variants that use local tensor information to construct the local cubic-quartic models. We test these variants numerically and observe them to be competitive with ARC and other subproblem solvers on typical instances and even superior on ill-conditioned subproblems with special structure.
We study hypothesis testing under communication constraints, where each sample is quantized before being revealed to a statistician. Without communication constraints, it is well known that the sample complexity of simple binary hypothesis testing is characterized by the Hellinger distance between the distributions. We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting and this bound is tight. We develop a polynomial-time algorithm that achieves the aforementioned sample complexity. Our framework extends to robust hypothesis testing, where the distributions are corrupted in the total variation distance. Our proofs rely on a new reverse data processing inequality and a reverse Markov inequality, which may be of independent interest. For simple $M$-ary hypothesis testing, the sample complexity in the absence of communication constraints has a logarithmic dependence on $M$. We show that communication constraints can cause an exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive algorithms.
We describe and analyze a quasi-Trefftz DG method for solving boundary value problems for the homogeneous diffusion-advection-reaction equation with piecewise-smooth coefficients. Trefftz schemes are high-order Galerkin methods whose discrete functions are elementwise exact solutions of the underlying PDE. Trefftz basis functions can be computed for many PDEs that are linear, homogeneous and with piecewise-constant coefficients. However, if the equation has varying coefficients, in general, exact solutions are unavailable, hence the construction of discrete Trefftz spaces is impossible. Quasi-Trefftz methods have been introduced to overcome this limitation, relying on discrete spaces of functions that are elementwise "approximate solutions" of the PDE. A space-time quasi-Trefftz DG method for the acoustic wave equation with smoothly varying coefficients has recently been studied; since it has shown excellent results, we propose a related method that can be applied to second-order elliptic equations. The DG weak formulation is derived using an interior penalty parameter and the upwind numerical fluxes. We choose polynomial quasi-Trefftz basis functions, whose coefficients can be computed with a simple algorithm based on the Taylor expansion of the PDE's coefficients. The main advantage of Trefftz and quasi-Trefftz schemes over more classical ones is the higher accuracy for comparable numbers of degrees of freedom. We prove that the dimension of the quasi-Trefftz space is smaller than the dimension of the full polynomial space of the same degree and that yields the same optimal convergence rates. The quasi-Trefftz DG method is well-posed, consistent and stable and we prove its high-order convergence. We present some numerical experiments in two dimensions that show excellent properties in terms of approximation and convergence rate.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.