We propose regression models for curve-valued responses in two or more dimensions, where only the image but not the parametrization of the curves is of interest. Examples of such data are handwritten letters, movement paths or outlines of objects. In the square-root-velocity framework, a parametrization-invariant distance for curves is obtained as the quotient space metric with respect to the re-parametrization group, which acts by isometries. With this special case in mind, we discuss the generalization of 'linear' regression to quotient metric spaces more generally, before illustrating the usefulness of our approach for curves modulo re-parametrization. We address the issue of sparsely or irregularly sampled curves by using splines to model smooth conditional mean curves. We test the model in simulations and apply it to human hippocampal outlines obtained from Magnetic Resonance Imaging scans, modelling how the shape of the irregularly sampled hippocampus relates to age, Alzheimer's disease and sex.
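For a discretely sampled curve, the square-root-velocity (SRV) transform at the heart of this framework is straightforward to compute. The sketch below is purely illustrative (it assumes a uniform parameter grid and forward differences; it is not the authors' implementation) and maps a planar curve to its SRV representation q = c' / sqrt(|c'|).

```python
import math

def srv_transform(curve, dt):
    """Square-root-velocity (SRV) transform of a sampled 2D curve.

    curve: list of (x, y) points on a uniform parameter grid with spacing dt.
    Returns q_i = c'(t_i) / sqrt(|c'(t_i)|), using forward differences.
    Illustrative sketch only; the discretization choices are assumptions.
    """
    q = []
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        dx, dy = (x1 - x0) / dt, (y1 - y0) / dt
        speed = math.hypot(dx, dy)
        if speed == 0.0:
            q.append((0.0, 0.0))   # constant segments map to zero
        else:
            s = math.sqrt(speed)
            q.append((dx / s, dy / s))
    return q

# Unit circle traversed at constant speed 2*pi: |q| should be sqrt(2*pi).
n = 100
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n + 1)]
q = srv_transform(circle, 1.0 / n)
```

The L2 distance between SRV representations is unchanged when both curves are re-parametrized simultaneously, which is what makes the quotient metric over the re-parametrization group well defined.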
Data compression algorithms typically rely on identifying repeated sequences of symbols in the original data to provide a compact representation of the same information, while maintaining the ability to recover the original data from the compressed sequence. Applying data transformations prior to compression can enhance the achievable compression, and the scheme remains lossless as long as the transformation is invertible. Floating-point data presents unique challenges for constructing invertible transformations with high compression potential. This paper identifies key conditions on basic operations over floating-point data that guarantee lossless transformations. We then present four methods that exploit these observations to deliver lossless compression of real datasets, improving compression rates by up to 40%.
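One classical invertible transform in this spirit XOR-s each double's bit pattern with its predecessor's, so that slowly varying data yields words with many leading zeros for a back-end entropy coder to exploit. This is a generic illustration of bit-level invertibility (in the style of FPC/Gorilla-type compressors), not one of the paper's four methods:

```python
import struct

def f2b(x):
    """64-bit pattern of an IEEE-754 double."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def b2f(b):
    """Double from its 64-bit pattern."""
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def xor_delta(values):
    """Invertible transform: XOR each value's bit pattern with the
    previous one's. Operating on bit patterns (not on the floats
    themselves) sidesteps rounding, so the transform is exactly lossless.
    """
    out, prev = [], 0
    for v in values:
        b = f2b(v)
        out.append(b ^ prev)
        prev = b
    return out

def inverse(deltas):
    """Exact inverse of xor_delta."""
    vals, prev = [], 0
    for d in deltas:
        b = d ^ prev
        vals.append(b2f(b))
        prev = b
    return vals

data = [1.0, 1.0000001, 1.0000002, 1.0000001]
recovered = inverse(xor_delta(data))
```

Arithmetic deltas on the floats themselves would not be invertible in general, because floating-point subtraction rounds; working on the raw bit patterns is what guarantees losslessness here.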
The use of air traffic management (ATM) simulators for planning and operations can be challenging due to their modelling complexity. This paper presents XALM (eXplainable Active Learning Metamodel), a three-step framework that integrates active learning and SHAP (SHapley Additive exPlanations) values into simulation metamodels to support ATM decision-making. XALM efficiently uncovers hidden relationships between the input and output variables of ATM simulators, which are typically of interest in policy analysis. Our experiments show that XALM achieves predictive performance comparable to an XGBoost metamodel while requiring fewer simulations. Additionally, XALM exhibits superior explanatory capabilities compared to non-active-learning metamodels. Using the `Mercury' (flight and passenger) ATM simulator, XALM is applied to a real-world scenario at Paris Charles de Gaulle airport, extending an arrival manager's range and scope by analysing six variables. This case study illustrates XALM's effectiveness in enhancing simulation interpretability and in understanding variable interactions. By addressing computational challenges and improving explainability, XALM complements traditional simulation-based analyses. Lastly, we discuss two practical approaches for further reducing the computational burden of metamodelling: we introduce a stopping criterion for active learning based on the inherent uncertainty of the metamodel, and we show how the simulations used for the metamodel can be reused across key performance indicators, thus decreasing the overall number of simulations needed.
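The general shape of an active-learning loop with a stopping criterion can be sketched in a few lines. XALM stops when the metamodel's own predictive uncertainty is low; in this toy version the distance to the nearest labelled design point stands in as a cheap uncertainty proxy, and the simulator is a hypothetical stand-in function:

```python
def simulator(x):
    """Stand-in for an expensive simulation run (hypothetical toy)."""
    return 3.0 * x + 1.0

def active_learning(candidates, budget, tol):
    """Toy active-learning loop with an uncertainty-based stopping rule.

    Assumption for illustration: 'uncertainty' at a candidate is its
    distance to the nearest already-labelled point, rather than a real
    metamodel's predictive uncertainty.
    """
    labelled = [candidates[0]]
    outputs = [simulator(candidates[0])]
    while len(labelled) < budget:
        # score each unlabelled candidate by the uncertainty proxy
        scores = [(min(abs(c - l) for l in labelled), c)
                  for c in candidates if c not in labelled]
        if not scores:
            break
        u, best = max(scores)
        if u < tol:            # stopping criterion: uncertainty small enough
            break
        labelled.append(best)
        outputs.append(simulator(best))  # the expensive call happens here
    return labelled

candidates = [i / 10 for i in range(11)]
labelled = active_learning(candidates, budget=100, tol=0.15)
```

The point of the stopping rule is visible even in this toy: the loop labels far fewer points than the full candidate set while leaving no candidate farther than `tol` from a labelled design point.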
A slowly decaying Kolmogorov n-width of the solution manifold of a parametric partial differential equation precludes efficient linear projection-based reduced-order models, because the reduced space needed to approximate the solution manifold with sufficient accuracy is high-dimensional. To address this problem, neural networks of various architectures have been employed to build accurate nonlinear regressions of solution manifolds. However, most implementations are non-intrusive black-box surrogate models, and only some of them reduce the dimension from the number of degrees of freedom of the discretized parametric model to a latent dimension. We present a new intrusive and explainable methodology for reduced-order modelling that employs neural networks to approximate the solution manifold but does not discard the underlying physical and numerical models in the predictive/online stage. We focus on autoencoders used to further compress the dimensionality of linear approximants of solution manifolds, thereby achieving a nonlinear dimension reduction. Once an accurate nonlinear approximant is obtained, we seek solutions on the latent manifold with the residual-based nonlinear least-squares Petrov-Galerkin method, suitably hyper-reduced so as to be independent of the number of degrees of freedom. New adaptive hyper-reduction strategies are developed, along with the employment of local nonlinear approximants. We test our methodology on two nonlinear time-dependent parametric benchmarks: a supersonic flow past a NACA airfoil with varying Mach number and an incompressible turbulent flow around the Ahmed body with varying slant angle.
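At its core, a least-squares Petrov-Galerkin step minimises the full-order residual over the latent coordinates. The deliberately tiny sketch below uses a linear full-order model and a one-dimensional linear "latent space", so the minimisation has a closed form; all names and dimensions are illustrative, and the real method works with nonlinear residuals, autoencoder decoders and hyper-reduction:

```python
def lspg_1d_latent(A, b, u0, phi):
    """One least-squares Petrov-Galerkin step for a 1-D latent space.

    Minimises || A (u0 + z * phi) - b || over the scalar latent
    coordinate z, for a linear full-order model A u = b. A toy linear
    stand-in for the nonlinear, hyper-reduced setting of the paper.
    """
    n = len(b)
    Aphi = [sum(A[i][j] * phi[j] for j in range(n)) for i in range(n)]
    r0 = [sum(A[i][j] * u0[j] for j in range(n)) - b[i] for i in range(n)]
    # normal equation of the scalar least-squares problem
    z = -sum(Aphi[i] * r0[i] for i in range(n)) / sum(a * a for a in Aphi)
    return [u0[j] + z * phi[j] for j in range(n)]

# Identity system, constant latent direction: the step projects b onto
# the span of phi in the residual-least-squares sense.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = lspg_1d_latent(A, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Even in this degenerate case the defining feature of LSPG is visible: the latent coordinate is chosen by minimising the full-order residual, not by projecting the state.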
Given samples from two non-negative random variables, we propose a new class of nonparametric tests for the null hypothesis that one random variable dominates the other with respect to second-order stochastic dominance. These tests are based on the Lorenz P-P plot (LPP), the composition of the inverse unscaled Lorenz curve of one distribution with the unscaled Lorenz curve of the other. The LPP exceeds the identity function if and only if the dominance condition is violated, providing a rather simple method to construct test statistics as functionals of the difference between the identity and the LPP. We determine a stochastic upper bound for such test statistics under the null hypothesis and derive its limit distribution, which can be approximated via bootstrap procedures. We also establish the asymptotic validity of the tests under relatively mild conditions, allowing for both dependent and independent samples. Finally, finite-sample properties are investigated through simulation studies.
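An empirical version of the LPP is easy to compute from order statistics: the empirical unscaled Lorenz curve is piecewise linear with knots at the partial means, and its inverse is obtained by swapping coordinates. The sketch below builds a sup-type statistic sup_p max(LPP(p) - p, 0); the orientation of the composition and the choice of functional are illustrative assumptions, not the paper's exact definitions:

```python
def empirical_lorenz(sample):
    """Knots (p_k, L(p_k)) of the empirical unscaled Lorenz curve."""
    xs = sorted(sample)
    n = len(xs)
    knots = [(0.0, 0.0)]
    acc = 0.0
    for k, x in enumerate(xs, 1):
        acc += x / n
        knots.append((k / n, acc))
    return knots

def pl_eval(knots, t):
    """Evaluate a piecewise-linear curve with nondecreasing abscissae."""
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if t <= x1:
            if x1 == x0:
                return y1
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    return knots[-1][1]

def lpp_statistic(x, y, grid=200):
    """Sup-type statistic sup_p max(LPP(p) - p, 0) on a uniform p-grid.

    LPP is taken here as L_x^{-1} composed with L_y (an assumption on
    orientation); positive values signal a violation of dominance.
    """
    Ly = empirical_lorenz(y)
    Lx_inv = [(v, p) for p, v in empirical_lorenz(x)]  # swap coordinates
    stat = 0.0
    for i in range(grid + 1):
        p = i / grid
        stat = max(stat, pl_eval(Lx_inv, pl_eval(Ly, p)) - p)
    return stat

# Identical samples: the LPP is the identity and the statistic vanishes.
t_equal = lpp_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

In practice the critical value for such a statistic would come from the bootstrap approximation of the stochastic upper bound described in the abstract.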
Forward and inverse models are used throughout different engineering fields to predict and understand the behaviour of systems and to find parameters from a set of observations. These models rely on root-finding and minimisation techniques, respectively, to achieve their goals. This paper introduces improvements to these mathematical methods in order to improve the convergence behaviour of the overarching models when applied to highly non-linear systems. The performance of the new techniques is examined in detail and compared to that of the standard methods. The improved techniques are also tested with FEM models to show their practical applicability. Depending on the specific configuration of the problem, the improved models yielded larger convergence basins and/or took fewer steps to converge.
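A classical way to enlarge the convergence basin of a root-finder is to damp the Newton step with a backtracking line search on the residual. The sketch below is a generic illustration of that idea, not the paper's particular improvement:

```python
import math

def newton_backtracking(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's method damped by a backtracking line search on |f|.

    The full Newton step is halved until the residual magnitude
    decreases, which prevents the overshoot and cycling that plain
    Newton exhibits on highly non-linear problems.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = -fx / df(x)
        t = 1.0
        while abs(f(x + t * step)) >= abs(fx) and t > 1e-8:
            t *= 0.5           # backtrack until |f| decreases
        x += t * step
    return x

# Plain Newton on f(x) = atan(x) diverges from x0 = 3 (the basin of the
# undamped iteration ends near |x0| ~ 1.39); the damped iteration
# converges to the root at 0.
root = newton_backtracking(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
```

The trade-off is typical of such globalisation strategies: each damped step costs extra residual evaluations, but the basin from which the method converges grows substantially.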
Particle methods based on evolving the spatial derivatives of the solution were originally introduced to simulate reaction-diffusion processes, inspired by vortex methods for the Navier--Stokes equations. Such methods, referred to as gradient random walk methods, were extensively studied in the '90s and have several attractive features: they are grid-free, they automatically adapt to the solution by concentrating elements where the gradient is large, and they significantly reduce the variance of the standard random walk approach. In this work, we revive these ideas by showing how to generalize the approach to a larger class of partial differential equations, including hyperbolic systems of conservation laws. To this end, we first extend the classical Monte Carlo method to relaxation approximations of systems of conservation laws, and subsequently consider a novel particle dynamics based on the spatial derivatives of the solution. The methodology, combined with an asymptotic-preserving splitting discretization, yields a new class of gradient-based Monte Carlo methods for hyperbolic systems of conservation laws. Several results in one spatial dimension for scalar equations and systems of conservation laws show that the new methods are very promising, yielding remarkable improvements over standard Monte Carlo approaches both in terms of variance reduction and in the description of the shock structure.
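The classical gradient random walk idea is easiest to see on the heat equation. For step initial data the spatial derivative is a single Dirac mass, so all particles start at one point, diffuse by Gaussian increments, and the solution is recovered by integrating the particle approximation of the derivative. This toy sketch illustrates that original parabolic setting, not the hyperbolic extension developed in the paper:

```python
import math
import random

def gradient_random_walk(n_particles, D, t, x_eval, seed=0):
    """Gradient random walk sketch for the heat equation u_t = D u_xx.

    For initial data u0 = 1_{x>0}, the derivative u_x is a unit Dirac
    mass at 0, represented by n_particles particles of weight
    1/n_particles placed at the origin. Diffusion moves each particle by
    a N(0, 2 D t) increment; u(x) is recovered by summing the weights of
    particles to the left of x, i.e. integrating u_x up to x.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * D * t)
    positions = [rng.gauss(0.0, sigma) for _ in range(n_particles)]
    return sum(1.0 for p in positions if p < x_eval) / n_particles

# Exact solution: u(x, t) = Phi(x / sqrt(2 D t)), the standard normal CDF.
u_mid = gradient_random_walk(20000, 0.5, 1.0, 0.0)    # exact value 0.5
u_right = gradient_random_walk(20000, 0.5, 1.0, 1.0)  # exact value Phi(1)
```

Note how the particles automatically concentrate where the gradient of the solution lives, which is the adaptivity property the abstract highlights.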
Partially linear additive models generalize linear models: they describe the relation between a response variable and covariates by assuming that some covariates enter linearly while each of the others enters through an unknown univariate smooth function. The harmful effect of outliers, either in the residuals or in the covariates involved in the linear component, has been described for partially linear models, that is, when only one nonparametric component is involved. When dealing with additive components, the problem of providing reliable estimators in the presence of atypical data is of practical importance, motivating the need for robust procedures. We therefore propose a family of robust estimators for partially linear additive models that combines $B$-splines with robust linear regression estimators. We obtain consistency results, rates of convergence and asymptotic normality for the linear components under mild assumptions. A Monte Carlo study compares the performance of the robust proposal with its classical counterpart under different models and contamination schemes; the numerical experiments show the advantage of the proposed methodology for finite samples. We also illustrate the usefulness of the proposed approach on a real data set.
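The robust linear-regression building block can be illustrated with Huber M-estimation computed by iteratively reweighted least squares. The sketch fits a simple straight line (the paper combines such estimators with a $B$-spline basis for the additive components; this toy omits that part entirely):

```python
def huber_irls(pts, delta=1.345, iters=200):
    """Huber M-estimate of a simple linear fit y = a + b x via IRLS.

    Residuals beyond `delta` get weight delta/|r| instead of 1, so gross
    outliers are progressively downweighted. A sketch of the robust
    linear-regression component only, not the full partially linear
    additive estimator.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        w = []
        for x, y in pts:
            r = y - (a + b * x)
            w.append(1.0 if abs(r) <= delta else delta / abs(r))
        sw = sum(w)
        mx = sum(wi * x for wi, (x, _) in zip(w, pts)) / sw
        my = sum(wi * y for wi, (_, y) in zip(w, pts)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, pts))
        sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, pts))
        b = sxy / sxx
        a = my - b * mx
    return a, b

# Four points on the line y = 2x plus one gross outlier. Ordinary least
# squares gives a slope above 20; the Huber fit stays near the bulk.
pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 100.0)]
a, b = huber_irls(pts)
```

Because the Huber loss is convex, the IRLS iteration converges to the global M-estimate regardless of the starting values.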
Penalized $M$-estimators for logistic regression models have previously been studied for fixed dimension in order to obtain sparse statistical models and automatic variable selection. In this paper, we derive asymptotic results for penalized $M$-estimators when the dimension $p$ grows to infinity with the sample size $n$. Specifically, we obtain consistency and rates of convergence for some choices of the penalty function. Moreover, we prove that these estimators select the relevant variables with probability tending to 1, and we derive their asymptotic distribution.
The EM algorithm is a popular tool for maximum likelihood estimation but has seen little use for high-dimensional regularization problems in linear mixed-effects models. In this paper, we introduce the EMLMLasso algorithm, which combines the EM algorithm with the popular and efficient R package glmnet for Lasso variable selection of fixed effects in linear mixed-effects models. We compare the performance of the proposed EMLMLasso algorithm with that implemented in the well-known R package glmmLasso through analyses of both simulated data and real-world applications. The simulations and applications demonstrate good properties, such as consistency, and the effectiveness of the proposed variable selection procedure for both $p < n$ and $p > n$. Moreover, in all evaluated scenarios, EMLMLasso outperformed glmmLasso. The proposed method is quite general and can easily be extended to ridge and elastic net penalties in linear mixed-effects models.
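The Lasso machinery that glmnet supplies inside such an EM loop is coordinate descent with soft-thresholding. The following minimal sketch shows that update rule on a plain linear model; it is not the EMLMLasso implementation and ignores the random-effects structure entirely:

```python
def lasso_cd(X, y, lam, iters=200):
    """Coordinate-descent Lasso with soft-thresholding.

    Minimises (1/2n) ||y - X beta||^2 + lam * ||beta||_1 by cycling
    through coordinates, each time solving the one-dimensional penalized
    problem in closed form. A sketch of the kind of M-step solver that
    glmnet provides, for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residuals excluding feature j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-thresholding update
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

# Orthogonal toy design: the first coefficient is shrunk from its OLS
# value of 2, and the irrelevant second coefficient is set exactly to 0.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [2.0, 2.0, 0.0, 0.0]
beta = lasso_cd(X, y, lam=0.1)
```

Setting coefficients exactly to zero, rather than merely close to zero, is what makes the penalty usable for variable selection of the fixed effects.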
We show that spectral data of the Koopman operator arising from an analytic expanding circle map $\tau$ can be computed effectively using an EDMD-type algorithm combining a collocation method of order $m$ with a Galerkin method of order $n$. The main result is that if $m \geq \delta n$, where $\delta$ is an explicitly given positive number quantifying how much $\tau$ expands concentric annuli containing the unit circle, then the method converges and approximates the spectrum of the Koopman operator, taken to act on a space of analytic hyperfunctions, exponentially fast in $n$. Additionally, these results extend to more general expansive maps on suitable annuli containing the unit circle.
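The EDMD-with-collocation construction is concrete: fix a Fourier dictionary of $2n+1$ modes and assemble the Koopman matrix from values at $m$ uniformly spaced collocation points. The toy below does this for the doubling map, where the Koopman action on Fourier modes ($e^{2\pi i k x} \mapsto e^{2\pi i (2k) x}$) is visible directly in the matrix entries; it is a schematic illustration, not the paper's hyperfunction-space analysis:

```python
import cmath

def koopman_edmd(n, m, tau):
    """EDMD-type collocation approximation of the Koopman matrix.

    Dictionary: psi_k(x) = exp(2 pi i k x) for k = -n..n; collocation at
    the m uniform points x_i = i/m. Entries K[(j, k)] approximate the
    Galerkin coefficients <psi_j, psi_k o tau>. Choosing m well above n
    (mirroring the m >= delta * n condition) avoids aliasing on the grid.
    """
    ks = range(-n, n + 1)
    K = {}
    for j in ks:
        for k in ks:
            s = 0j
            for i in range(m):
                x = i / m
                s += cmath.exp(-2j * cmath.pi * j * x) \
                     * cmath.exp(2j * cmath.pi * k * tau(x))
            K[(j, k)] = s / m
    return K

# Doubling map: mode k is sent to mode 2k, so K[(2k, k)] = 1 and all
# other entries vanish (m = 32 comfortably resolves n = 3 modes).
K = koopman_edmd(3, 32, lambda x: (2.0 * x) % 1.0)
```

For this map the matrix sends mode $k$ to mode $2k$, so apart from the constant mode it is nilpotent on the truncated dictionary, consistent with the simple spectral structure of the doubling map's Koopman operator on spaces of analytic observables.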