We propose center-outward superquantile and expected shortfall functions, with applications to multivariate risk measurement, extending the standard notions of value at risk and conditional value at risk from the real line to $\mathbb{R}^d$. Our new concepts are built upon the recent definition of Monge-Kantorovich quantiles based on the theory of optimal transport, and they provide a natural way to characterize multivariate tail probabilities and central areas of point clouds. They preserve the univariate interpretation of a typical observation that lies beyond or ahead of a quantile, but in a meaningful multivariate way. We show that they characterize random vectors and their convergence in distribution, which underlines their importance. Our new concepts are illustrated on both simulated and real datasets.
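As a rough illustration of the construction these concepts build on (a minimal sketch, not the authors' implementation), the empirical center-outward quantile map can be obtained by optimally coupling the sample with a spherical-uniform reference grid; the radius of the matched reference point then acts as a multivariate quantile level, with values near 1 marking tail observations and values near 0 marking central ones.

```python
# Minimal sketch: empirical center-outward quantile levels via optimal
# assignment to a spherical-uniform reference grid (illustration only).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.multivariate_normal(np.zeros(d), [[1.0, 0.5], [0.5, 1.0]], size=n)

# Reference grid: n points from the spherical uniform distribution
# (uniform direction, uniform radius on [0, 1]).
dirs = rng.standard_normal((n, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = rng.uniform(0.0, 1.0, size=n)
U = dirs * radii[:, None]

# Squared-Euclidean cost and optimal one-to-one coupling (discrete OT).
cost = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=-1)
rows, cols = linear_sum_assignment(cost)

# The radius of the matched reference point plays the role of the
# center-outward quantile level of each observation.
quantile_level = np.empty(n)
quantile_level[rows] = radii[cols]
print(quantile_level[:5])
```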
Nowadays, low-rank approximations of matrices are an important component of many methods in science and engineering. Traditionally, low-rank approximations are considered in unitarily invariant norms; recently, however, element-wise approximations have also received significant attention in the literature. In this paper, we propose an accelerated alternating minimization algorithm for solving the problem of low-rank approximation of matrices in the Chebyshev norm. Through numerical evaluation, we demonstrate the effectiveness of the proposed procedure for large-scale problems. We also theoretically investigate the alternating minimization method and introduce the notion of a $2$-way alternance of rank $r$. We show that the presence of a $2$-way alternance of rank $r$ is a necessary condition for an optimal low-rank approximation in the Chebyshev norm and that all limit points of the alternating minimization method satisfy this condition.
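For intuition, a plain (non-accelerated) alternating-minimization sweep can be sketched as follows: with one factor fixed, each row of the other factor solves a small linear program minimizing the worst-case entrywise residual. This is only a schematic illustration of the subproblem structure, not the accelerated algorithm proposed in the paper.

```python
# Sketch of alternating minimization for rank-r approximation of A in the
# Chebyshev (element-wise max) norm; each row subproblem is a small LP.
import numpy as np
from scipy.optimize import linprog

def update_factor(A, V):
    """Given V (r x n), solve min_U ||A - U V||_C row by row via LPs."""
    m, n = A.shape
    r = V.shape[0]
    U = np.zeros((m, r))
    for i in range(m):
        # variables (u_1..u_r, t); minimize t s.t. |A[i, j] - u . V[:, j]| <= t
        c = np.r_[np.zeros(r), 1.0]
        A_ub = np.block([[V.T, -np.ones((n, 1))],
                         [-V.T, -np.ones((n, 1))]])
        b_ub = np.r_[A[i], -A[i]]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (r + 1))
        U[i] = res.x[:r]
    return U

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
r = 3
V = rng.standard_normal((r, 20))
for _ in range(5):                       # alternate between the two factors
    U = update_factor(A, V)
    V = update_factor(A.T, U.T).T
print(np.max(np.abs(A - U @ V)))         # Chebyshev-norm error
```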
We investigate notions of complete representation by partial functions, where the operations in the signature include antidomain restriction and may include composition, intersection, update, preferential union, domain, antidomain, and set difference. When the signature includes both antidomain restriction and intersection, the join-complete and the meet-complete representations coincide. Otherwise, for the signatures we consider, meet-complete is strictly stronger than join-complete. A necessary condition to be meet-completely representable is that the atoms are separating. For the signatures we consider, this condition is sufficient if and only if composition is not in the signature. For each of the signatures we consider, the class of (meet-)completely representable algebras is not axiomatisable by any existential-universal-existential first-order theory. For 14 expressively distinct signatures, we show, by giving an explicit representation, that the (meet-)completely representable algebras form a basic elementary class, axiomatisable by a universal-existential-universal first-order sentence. The signatures we axiomatise are those containing antidomain restriction and any of intersection, update, and preferential union and also those containing antidomain restriction, composition, and intersection and any of update, preferential union, domain, and antidomain.
We establish guaranteed and practically computable a posteriori error bounds for source problems and eigenvalue problems involving linear Schr{\"o}dinger operators with atom-centered potentials, discretized with linear combinations of atomic orbitals. We show that the energy norm of the discretization error can be estimated by the dual energy norm of the residual, which further decomposes into atomic contributions characterizing the error localized on each atom. Moreover, we show that the practical computation of the dual norms of the atomic residuals involves diagonalizing radial Schr{\"o}dinger operators, which can easily be precomputed in practice. We provide numerical illustrations of the performance of such a posteriori analysis on several test cases, showing that the error bounds accurately estimate the error and that the localized error components allow for optimized adaptive basis sets.
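The first claim rests on the standard duality between the energy norm of the error and the dual norm of the residual; in generic form, for a symmetric coercive operator $A$ with energy norm $\|\cdot\|_A$ and source problem $Au = f$ (stated here only as background, not as the paper's sharper atom-wise decomposition),
\[
\|u - u_h\|_A \;=\; \sup_{v \neq 0} \frac{\langle r(u_h), v \rangle}{\|v\|_A} \;=:\; \|r(u_h)\|_{A^{-1}}, \qquad r(u_h) = f - A u_h ,
\]
and the contribution of the paper is to make this dual norm practically computable by splitting it into per-atom residual contributions.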
As the computational power of available hardware increases, so does the demand for high-resolution data in computer graphics applications. Consequently, classical geometry processing techniques based on linear algebra solutions are starting to become obsolete. In this setting, we propose a novel approach for tackling mesh deformation tasks on high-resolution meshes. By reducing the input size with a fast remeshing technique and preserving a consistent representation of the original mesh with local reference frames, we provide a solution that is both scalable and robust in multiple applications, such as as-rigid-as-possible deformations, non-rigid isometric transformations, and pose transfer tasks. We extensively test our technique and compare it against state-of-the-art methods, showing that our approach can handle meshes with hundreds of thousands of vertices in tens of seconds while still achieving results comparable with the other solutions.
A Gaussian process is proposed as a model for the posterior distribution of the local predictive ability of a model or expert, conditional on a vector of covariates, from historical predictions in the form of log predictive scores. Assuming Gaussian expert predictions and a Gaussian data-generating process, a linear transformation of the predictive score follows a noncentral chi-squared distribution with one degree of freedom. Motivated by this, we develop a noncentral chi-squared Gaussian process regression to flexibly model local predictive ability, with the posterior distribution of the latent GP function and kernel hyperparameters sampled by Hamiltonian Monte Carlo. We show that a cube-root transformation of the log scores is approximately Gaussian with homoscedastic variance, making it possible to estimate the model much faster by marginalizing out the latent GP function analytically. A multi-output Gaussian process regression is also introduced to model the dependence in predictive ability between experts, for both inference and prediction purposes. Linear pools based on learned local predictive ability are applied to predict daily bike usage in Washington, D.C.
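As a toy illustration of why a cube-root transform helps (an assumed setup, not the paper's exact model), one can check on simulated noncentral chi-squared scores that the transform removes most of the skewness, after which an off-the-shelf Gaussian process regression on the covariates becomes reasonable:

```python
# Illustration: Wilson-Hilferty-style cube-root transform of noncentral
# chi-squared scores, followed by a standard GP regression on a covariate.
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

x = np.linspace(0, 1, 300)[:, None]           # covariate
nc = 2.0 + 2.0 * np.sin(2 * np.pi * x[:, 0])  # noncentrality varies with x
scores = stats.ncx2.rvs(df=1, nc=nc, random_state=0)

print("skewness before/after cube root:",
      stats.skew(scores), stats.skew(np.cbrt(scores)))

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(x, np.cbrt(scores))                    # fast marginal-likelihood fit
mean, sd = gp.predict(x, return_std=True)
```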
Physics-informed methods have achieved great success in analyzing data with partial differential equation (PDE) constraints, which are ubiquitous when modeling dynamical systems. Different from the common penalty-based approach, this work promotes adherence to the underlying physical mechanism in a way that facilitates statistical procedures. The motivating application concerns modeling fluorescence recovery after photobleaching, which is used to characterize diffusion processes. We propose a physics-encoded regression model for handling spatio-temporally distributed data, which enables principled interpretability, parsimonious computation, and efficient estimation by exploiting the structure of solutions of a governing evolution equation. The rate of convergence is shown to attain minimax optimality, generalizing the result obtained for spatial regression. We conduct simulation studies to assess the performance of the proposed estimator and illustrate its usage in the aforementioned real data example.
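A toy version of the physics-encoded idea in the FRAP setting (a hypothetical setup and parameter names, not the paper's estimator) is to hard-code the closed-form heat-equation solution for a Gaussian bleach spot into the regression function, so that only physical parameters such as the diffusion coefficient are estimated:

```python
# Toy sketch: encode the 2D heat-equation solution for a Gaussian bleach spot
# into the regression function and fit only physical parameters.
import numpy as np
from scipy.optimize import curve_fit

def frap_deficit(rt, amp, sigma0_sq, D):
    """Fluorescence deficit of a Gaussian bleach spot diffusing in 2D."""
    r, t = rt
    var = sigma0_sq + 2.0 * D * t          # variance grows linearly in time
    return amp * sigma0_sq / var * np.exp(-r**2 / (2.0 * var))

rng = np.random.default_rng(0)
r = rng.uniform(0, 5, 2000)                # radial positions of observations
t = rng.uniform(0.1, 10, 2000)             # observation times
y = frap_deficit((r, t), 1.0, 1.0, 0.3) + 0.01 * rng.standard_normal(2000)

(amp, s0, D), _ = curve_fit(frap_deficit, (r, t), y, p0=[0.5, 0.5, 0.1])
print("estimated diffusion coefficient:", D)
```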
We propose a fast scheme for approximating the Mittag-Leffler function by an efficient sum-of-exponentials (SOE), and apply the scheme to the viscoelastic model of wave propagation, with mixed finite element methods for the spatial discretization and the Newmark-beta scheme for the second-order temporal derivative. Compared with the traditional L1 scheme for the fractional derivative, our fast scheme reduces the memory complexity from $\mathcal O(N_sN)$ to $\mathcal O(N_sN_{exp})$ and the computational complexity from $\mathcal O(N_sN^2)$ to $\mathcal O(N_sN_{exp}N)$, where $N$ denotes the total number of temporal grid points, $N_{exp}$ is the number of exponentials in the SOE, and $N_s$ represents the complexity of memory and computation related to the spatial discretization. Numerical experiments are provided to verify the theoretical results.
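The memory reduction can be seen from the structure of an SOE-compressed history term: each exponential mode obeys a one-step recursion, so only $N_{exp}$ scalars per spatial unknown need to be stored instead of the full time history. A schematic sketch, with hypothetical SOE nodes and weights and a crude quadrature rather than the paper's actual scheme, is:

```python
# Schematic fast-history update: the convolution with a kernel approximated as
# K(t) ~ sum_l w_l * exp(-s_l * t) is carried by N_exp scalars per unknown.
import numpy as np

def soe_history_step(H, f_new, s, w, dt):
    """One time step of I(t_n) ~ sum_l H_l, with each H_l updated recursively."""
    decay = np.exp(-s * dt)
    H = decay * H + w * f_new * dt        # simple right-endpoint quadrature
    return H

# hypothetical SOE nodes and weights for illustration only
s = np.array([0.5, 5.0, 50.0])
w = np.array([0.6, 0.3, 0.1])

dt, N = 1e-3, 10_000
H = np.zeros_like(w)                      # O(N_exp) memory, not O(N)
for n in range(1, N + 1):
    f_new = np.sin(n * dt)                # source/solution value at t_n
    H = soe_history_step(H, f_new, s, w, dt)
print(H.sum())                            # approximate convolution value at t_N
```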
Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion models, Flow Matching, and Continuous Normalizing Flows have shown effectiveness across various modalities, and they all rely on Gaussian latent variables for generation. For search-based or creative applications that require additional control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches for performing such manipulations (e.g. interpolation or forming low-dimensional representations) only work well in special cases or are network- or data-modality-specific. We propose Combination of Gaussian variables (COG) as a general-purpose interpolation method that is easy to implement yet outperforms recent sophisticated methods. Moreover, COG naturally addresses the broader task of forming general linear combinations of latent variables, allowing the construction of subspaces of the latent space and dramatically simplifying the creation of expressive low-dimensional spaces of high-dimensional objects.
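A minimal sketch of the kind of correction involved (our reading of the idea; the exact COG formulation may differ) is that, for i.i.d. standard Gaussian latents, a linear combination only needs to be rescaled by the Euclidean norm of the weights to remain marginally standard Gaussian, so combined latents stay in the distribution the generator was trained on:

```python
# Sketch: variance-corrected linear combination of standard-Gaussian latents.
import numpy as np

def combine_gaussian_latents(Z, weights):
    """Linear combination of latent vectors Z (k x d) with variance correction."""
    w = np.asarray(weights, dtype=float)
    z = w @ Z                              # raw linear combination
    return z / np.linalg.norm(w)           # restore unit marginal variance

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((2, 512))

# Interpolation is the special case with weights (1 - t, t).
midpoint = combine_gaussian_latents(np.stack([z1, z2]), [0.5, 0.5])
print(midpoint.std())                      # close to 1, unlike naive averaging
```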
In many applied sciences, a popular analysis strategy for high-dimensional data is to fit many multivariate generalized linear models in parallel. This paper presents a novel approach to the resulting multiple testing problem by combining a recently developed sign-flip test with permutation-based multiple-testing procedures. Our method builds upon the univariate standardized flip-scores test, which offers robustness against misspecified variances in generalized linear models, a crucial feature in high-dimensional settings where comprehensive model validation is particularly challenging. We extend this approach to the multivariate setting, enabling adaptation to unknown response correlation structures. This yields notable power improvements over conventional multiple-testing methods when correlation is present.
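Schematically (with a simplified statistic in place of the effective scores used by the flip-scores test), the combination works by flipping the signs of per-observation score contributions jointly across all responses and using the permutation distribution of the maximum statistic for familywise error control:

```python
# Sketch: joint sign-flipping of score contributions with max-T correction.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 50                      # observations, responses tested in parallel
scores = rng.standard_normal((n, m))
scores[:, 0] += 0.4                 # one response with a real effect

def max_t(S, flips):
    """Standardized sign-flipped sums for all responses and their maximum."""
    T = flips @ S / np.sqrt((S**2).sum(axis=0))
    return T, np.abs(T).max()

T_obs, _ = max_t(scores, np.ones(n))
max_null = np.array([max_t(scores, rng.choice([-1.0, 1.0], size=n))[1]
                     for _ in range(999)])
# Single-step adjusted p-values from the null distribution of the maximum;
# flipping the same signs across columns preserves the correlation structure.
p_adj = (1 + (max_null[None, :] >= np.abs(T_obs)[:, None]).sum(axis=1)) / 1000
print(p_adj[:5])
```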
We develop confidence sets which provide spatial uncertainty guarantees for the output of a black-box machine learning model designed for image segmentation. To do so, we adapt conformal inference to the imaging setting, obtaining thresholds on a calibration dataset based on the distribution of the maximum of the transformed logit scores within and outside of the ground truth masks. We prove that these confidence sets, when applied to new predictions of the model, are guaranteed to contain the true unknown segmentation mask with the desired probability. We show that learning appropriate score transformations on a learning dataset before performing calibration is crucial for optimizing performance. We illustrate and validate our approach on a polyp tumor dataset: we obtain the logit scores from a deep neural network trained for polyp segmentation and show that using distance-transformed scores for the outer confidence sets and the original scores for the inner confidence sets enables tight bounds on tumor location while controlling the false coverage rate.
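A simplified sketch of the calibration step for the outer confidence sets (hypothetical variable names; the abstract phrases the statistic via maxima of transformed scores, an equivalent view uses the minimum score over the mask) is:

```python
# Sketch: conformal calibration of an outer confidence set for segmentation.
import numpy as np

def calibrate_outer_threshold(score_maps, masks, alpha=0.1):
    """score_maps: (n, H, W) transformed logits; masks: (n, H, W) boolean."""
    s = np.array([sm[m].min() for sm, m in zip(score_maps, masks)])
    n = len(s)
    k = int(np.floor(alpha * (n + 1)))          # conformal rank correction
    return np.sort(s)[k - 1] if k >= 1 else -np.inf

def outer_confidence_set(score_map, threshold):
    # Thresholded set contains the true mask with probability >= 1 - alpha.
    return score_map >= threshold

rng = np.random.default_rng(0)
masks = rng.random((100, 32, 32)) < 0.2         # toy ground-truth masks
score_maps = masks + 0.3 * rng.standard_normal((100, 32, 32))
lam = calibrate_outer_threshold(score_maps, masks, alpha=0.1)
pred_set = outer_confidence_set(score_maps[0], lam)
```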