This paper considers the problem of reconstructing missing parts of functions based on their observed segments. It provides, for Gaussian processes and arbitrary bijective transformations thereof, theoretical expressions for the $L^2$-optimal reconstruction of the missing parts. These functions are obtained as solutions of explicit integral equations. In the discrete case, approximations of the solutions provide consistent estimates of all missing values of the processes. Rates of convergence of these approximations are provided under extra assumptions on the transformation function. In the case of Gaussian processes with a parametric covariance structure, the estimation can be conducted separately for each function, and yields nonlinear solutions in the presence of memory. Simulated examples show that the proposed reconstruction indeed fares better than conventional interpolation methods in various situations.
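For a Gaussian process, the $L^2$-optimal reconstruction is the conditional expectation of the missing values given the observed segment, which is linear in the observations. A minimal sketch of this discrete-case reconstruction, assuming a squared-exponential covariance (an illustrative choice; the paper does not fix a specific kernel):

```python
import numpy as np

def gp_reconstruct(t_obs, y_obs, t_miss, lengthscale=1.0, sigma2=1.0, nugget=1e-8):
    """Conditional mean E[Y(t_miss) | Y(t_obs)] for a zero-mean GP
    with squared-exponential covariance (illustrative kernel choice)."""
    k = lambda a, b: sigma2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lengthscale**2)
    K_oo = k(t_obs, t_obs) + nugget * np.eye(len(t_obs))  # observed-observed covariance
    K_mo = k(t_miss, t_obs)                               # missing-observed covariance
    return K_mo @ np.linalg.solve(K_oo, y_obs)

# Reconstruct a gap in the middle of a sampled path.
t_obs = np.concatenate([np.linspace(0, 4, 40), np.linspace(6, 10, 40)])
y_obs = np.sin(t_obs)
t_miss = np.linspace(4, 6, 20)
y_hat = gp_reconstruct(t_obs, y_obs, t_miss)
```

For a bijective transformation of a Gaussian process, the same conditioning is applied after mapping the observations back to the Gaussian scale, which is where the nonlinearity discussed in the abstract enters.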
We present a novel, model-free, and data-driven methodology for controlling complex dynamical systems into previously unseen target states, including those with significantly different and complex dynamics. Leveraging a parameter-aware realization of next-generation reservoir computing, our approach accurately predicts system behavior in unobserved parameter regimes, enabling control over transitions to arbitrary target states. Crucially, this includes states with dynamics that differ fundamentally from known regimes, such as shifts from periodic to intermittent or chaotic behavior. The method's parameter-awareness facilitates non-stationary control, ensuring smooth transitions between states. By extending the applicability of machine learning-based control mechanisms to previously inaccessible target dynamics, this methodology opens the door to transformative new applications while maintaining exceptional efficiency. Our results highlight reservoir computing as a powerful alternative to traditional methods for dynamical system control.
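The abstract gives no implementation details; a minimal sketch of the parameter-aware idea, in which polynomial features of a short delay window are augmented with the control parameter and a linear readout is fitted by ridge regression (the standard next-generation reservoir computing recipe), might look as follows:

```python
import numpy as np
from itertools import combinations_with_replacement

def ngrc_features(x_window, p):
    """Linear + quadratic features of a short delay window, augmented
    with the control parameter p (the 'parameter-aware' ingredient)."""
    lin = np.concatenate([x_window.ravel(), [p]])
    quad = [a * b for a, b in combinations_with_replacement(lin, 2)]
    return np.concatenate([[1.0], lin, quad])

def fit_readout(trajectories, params, delay=2, ridge=1e-6):
    """Ridge-regress the next state on NGRC features, pooling trajectories
    recorded at several known parameter values."""
    X, Y = [], []
    for traj, p in zip(trajectories, params):
        for t in range(delay, len(traj) - 1):
            X.append(ngrc_features(traj[t - delay + 1 : t + 1], p))
            Y.append(traj[t + 1])
    X, Y = np.array(X), np.array(Y)
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return W  # predict at an unseen parameter: ngrc_features(window, p_new) @ W
```

Prediction in unobserved parameter regimes then amounts to evaluating the fitted readout at a new value of `p`, and non-stationary control corresponds to feeding in a time-varying parameter schedule.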
We consider the bidimensional Stokes problem for incompressible fluids in the stream function-vorticity formulation. For this problem, the classical finite element method of degree one converges only at order one-half in the $L^2$ norm of the vorticity. We propose to use harmonic functions to approximate the vorticity along the boundary; in practice, discrete harmonic functions are used to derive the new numerical method. We prove that this numerical scheme yields an error of order one in the $L^2$ norm of the vorticity.
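For reference, a standard form of the stream function-vorticity formulation of the bidimensional Stokes problem (with unit viscosity; the paper's precise normalization may differ) is

\[
-\Delta \omega = \operatorname{curl} f \quad \text{in } \Omega, \qquad
-\Delta \psi = \omega \quad \text{in } \Omega, \qquad
\psi = \frac{\partial \psi}{\partial n} = 0 \quad \text{on } \partial\Omega,
\]

where $\psi$ is the stream function and $\omega$ the vorticity. The difficulty is that both boundary conditions bear on $\psi$ and none on $\omega$, which is why the quality of the vorticity approximation along the boundary (here via harmonic functions) governs the convergence order.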
To unify model averaging estimation for situations with complicated data structures, we propose a novel model averaging method based on cross-validation (MACV). MACV unifies a large class of new and existing model averaging estimators and covers a very general class of loss functions. Furthermore, to reduce the computational burden caused by conventional leave-subject-out or leave-one-out cross-validation, we propose a SEcond-order-Approximated Leave-one/subject-out (SEAL) cross-validation, which greatly improves computational efficiency. In the context of non-independent and non-identically distributed random variables, we establish a unified theory for analyzing the asymptotic behavior of the proposed MACV and SEAL methods, where the number of candidate models is allowed to diverge with the sample size. To demonstrate the breadth of the proposed methodology, we exemplify four optimal model averaging estimators in four important situations: longitudinal data with discrete responses, within-cluster correlation structure modeling, conditional prediction in spatial data, and quantile regression with a potential correlation structure. We conduct extensive simulation studies and analyze real-data examples to illustrate the advantages of the proposed methods.
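As a minimal illustration of cross-validation-based model averaging (with squared-error loss and plain leave-one-out predictions, rather than the paper's SEAL approximation), the averaging weights can be chosen by minimizing the CV criterion over the weight simplex:

```python
import numpy as np
from scipy.optimize import minimize

def macv_weights(cv_preds, y):
    """cv_preds: (n, M) leave-one-out predictions from M candidate models.
    Minimizes the squared-error CV criterion over the weight simplex."""
    n, M = cv_preds.shape
    objective = lambda w: np.mean((y - cv_preds @ w) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)  # weights sum to one
    res = minimize(objective, np.full(M, 1.0 / M),
                   bounds=[(0, 1)] * M, constraints=cons)
    return res.x
```

SEAL replaces the exact leave-one/subject-out predictions with a second-order approximation, so the criterion above can be evaluated without refitting each candidate model n times.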
Coherent systems are basic concepts in reliability theory and survival analysis. They contain as particular cases the popular series, parallel, and $k$-out-of-$n$ systems (order statistics). Many results have been obtained for them by assuming that the component lifetimes are independent; in many practical cases, this assumption is unrealistic. In this paper we study coherent systems by assuming a Time Transformed Exponential (TTE) model for the joint distribution of the component lifetimes. This model is equivalent to the frailty model, which assumes that the lifetimes are conditionally independent given a common risk parameter (representing the common environmental risk). Under this model, we obtain explicit expressions for the system reliability functions and comparison results for the main stochastic orders. The system residual lifetime (under different assumptions) is studied as well.
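Concretely, the TTE model writes the joint survival function of the component lifetimes $(X_1,\dots,X_n)$ as

\[
\bar{F}(x_1,\dots,x_n) = \Pr(X_1 > x_1, \dots, X_n > x_n) = W\big(R(x_1) + \cdots + R(x_n)\big),
\]

for a nonnegative increasing function $R$ and a suitable generator $W$. The frailty representation corresponds to taking $W$ to be the Laplace transform of the common risk parameter $\Theta$: if the lifetimes are conditionally independent given $\Theta = \theta$ with $\Pr(X_i > x \mid \Theta = \theta) = e^{-\theta R(x)}$, then $\bar{F}(x_1,\dots,x_n) = E\big[e^{-\Theta \sum_i R(x_i)}\big] = L_\Theta\big(\sum_i R(x_i)\big)$.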
The need for statistical models of orientations arises in many applications in engineering and computer science. Orientational data appear as sets of angles, unit vectors, rotation matrices, or quaternions. In the field of directional statistics, many advances have been made in modelling such data, yet only a few of these tools are used in engineering and computer science applications. Hence, this paper aims to serve as a cheat sheet for probability distributions of orientations. Models for 1-DOF, 2-DOF, and 3-DOF orientations are discussed; for each, expressions for the density function, fitting to data, and sampling are presented. The paper is written as a compromise between engineering and statistics in terms of notation and terminology. A Python library with functions for some of these models is provided, and two examples of applications to real data using this library are presented.
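As a small example of the density/fitting/sampling triple for a 1-DOF model, here is the von Mises distribution (a model any such cheat sheet would be expected to cover) using SciPy rather than the paper's own library:

```python
import numpy as np
from scipy.stats import vonmises

# Sample 1-DOF orientations (angles) from a von Mises distribution.
rng = np.random.default_rng(0)
angles = vonmises.rvs(kappa=4.0, loc=np.pi / 3, size=500, random_state=rng)

# Fit the concentration and mean direction back from the data
# (scale fixed to 1, as is usual for circular data).
kappa_hat, loc_hat, _ = vonmises.fit(angles, fscale=1)

# Evaluate the fitted density on a grid of angles.
grid = np.linspace(-np.pi, np.pi, 360)
density = vonmises.pdf(grid, kappa_hat, loc=loc_hat)
```

Analogous triples exist for 2-DOF models on the sphere (e.g., von Mises-Fisher) and 3-DOF models on rotations (e.g., matrix Fisher, Bingham on quaternions).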
We consider the problem of causal inference based on observational data (or the related missing data problem) with a binary or discrete treatment variable. In that context, we study inference for the counterfactual density functions and contrasts thereof, which can provide more nuanced information than counterfactual means and the average treatment effect. We impose the shape constraint of log-concavity, a type of unimodality constraint, on the counterfactual densities, and then develop doubly robust estimators of the log-concave counterfactual density based on augmented inverse-probability weighted pseudo-outcomes. We provide conditions under which the estimator is consistent in various global metrics. We also develop asymptotically valid pointwise confidence intervals for the counterfactual density functions and for differences and ratios thereof, which serve as a building block for more comprehensive analyses of distributional differences. Finally, we present a method for using our estimator to construct density confidence bands.
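For context, the augmented inverse-probability weighted pseudo-outcome follows a standard form; written for the counterfactual distribution function at a point $y$ under treatment level $a$, with estimated propensity score $\hat\pi_a(X)$ and conditional distribution function $\hat F_a(y \mid X)$ (the paper's exact construction may differ in details), it reads

\[
\hat\varphi_a(y) = \frac{\mathbb{1}(A = a)}{\hat\pi_a(X)}\Big\{\mathbb{1}(Y \le y) - \hat F_a(y \mid X)\Big\} + \hat F_a(y \mid X),
\]

whose mean recovers $F_a(y)$ if either nuisance estimate is consistent; this is the source of the double robustness.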
Two sequential estimators are proposed for the odds $p/(1-p)$ and the log odds $\log\{p/(1-p)\}$, respectively, using independent Bernoulli random variables with parameter $p$ as inputs. The estimators are unbiased and guarantee that the variance of the estimation error divided by the true value of the odds, or the variance of the estimation error of the log odds, is less than a target value for any $p \in (0,1)$. The estimators are close to optimal in the sense of Wolfowitz's bound.
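As a point of reference (not the paper's construction, which additionally guarantees the uniform bound on the normalized variance), the simplest sequential unbiased estimator of the odds comes from inverse binomial sampling: run Bernoulli trials until the $r$-th failure and divide the number of successes by $r$:

```python
import numpy as np

def odds_inverse_binomial(p, r, rng):
    """Sample Bernoulli(p) until the r-th failure; S/r is unbiased for
    the odds p/(1-p), since S ~ NegBinomial(r, p) with mean r*p/(1-p)."""
    successes, failures = 0, 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes / r

rng = np.random.default_rng(0)
estimates = [odds_inverse_binomial(p=0.7, r=50, rng=rng) for _ in range(1000)]
print(np.mean(estimates))  # close to 0.7 / 0.3 ≈ 2.33
```

Note that this simple scheme has normalized variance $1/\{r(1-p)\}$, which is not uniformly bounded in $p$; achieving the uniform guarantee is precisely what requires the more elaborate stopping rules of the paper.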
Ideally, all analyses of normally distributed data should include the full covariance information between all data points. In practice, the full covariance matrix between all data points is not always available, either because a result was published without a covariance matrix or because one tries to combine multiple results from separate publications. For simple hypothesis tests, it is possible to define robust test statistics that behave conservatively in the presence of unknown correlations. For model parameter fits, one can inflate the variance by a factor to ensure that the results remain conservative at least up to a chosen confidence level. This paper describes a class of robust test statistics for simple hypothesis tests, as well as an algorithm to determine the necessary inflation factor for model parameter fits, goodness-of-fit tests, and composite hypothesis tests. It then presents example applications of the methods to real neutrino interaction data and model comparisons.
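The abstract does not spell out the algorithm; one hedged illustration of the inflation idea is to probe, by Monte Carlo, how badly a naive chi-square statistic (computed from variances only) can undercover when correlations are present, and to inflate accordingly. This is an illustration of the concept only, not the paper's method:

```python
import numpy as np
from scipy.stats import chi2

def inflation_factor(n, alpha=0.32, n_corr=200, n_samp=20000, seed=0):
    """Monte Carlo sketch: largest (1-alpha) quantile of sum(z_i^2) over
    random correlation matrices, relative to the independent-data quantile.
    NOT the paper's algorithm; an illustration of the inflation idea."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_corr):
        A = rng.standard_normal((n, n))
        cov = A @ A.T
        d = np.sqrt(np.diag(cov))
        corr = cov / np.outer(d, d)  # random correlation matrix
        z = rng.multivariate_normal(np.zeros(n), corr, size=n_samp)
        q = np.quantile((z ** 2).sum(axis=1), 1 - alpha)
        worst = max(worst, q)
    return worst / chi2.ppf(1 - alpha, df=n)
```

Sampling random correlation matrices only explores typical cases; the paper's algorithm instead determines a factor that is guaranteed conservative up to the chosen confidence level.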
This manuscript describes the notions of blocker and interdiction applied to well-known optimization problems. The main interest of these two concepts is the ability to analyze whether a combinatorial structure still exists after some modifications. We focus on graph modifications, such as removing vertices or links in a network. In the interdiction version, we have a budget for modifications and seek to reduce as much as possible the size of a given combinatorial structure, whereas in the blocker version, we minimize the number of modifications such that the network no longer contains a given combinatorial structure. We consider matching, connectivity, shortest path, max flow, and clique problems, and for each we analyze either the blocker version or the interdiction one. Applying the concept of blocker or interdiction to well-known optimization problems can change their complexity: some problems become harder when one of these two notions is applied. For this reason, we provide complexity analyses showing when an optimization problem, or the associated decision problem, becomes harder. Another fundamental aspect developed in the manuscript is the use of exact methods to tackle these optimization problems, mainly by modeling them with integer linear programming. An interesting aspect of integer linear programming is the possibility of theoretically analyzing the strength of the models using cutting planes. For most of the problems studied in this manuscript, a polyhedral analysis is performed to prove the strength of inequalities or to describe new families of inequalities. The proposed exact algorithms are based on Branch-and-Cut or Branch-and-Price, with dedicated separation and pricing algorithms.
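As a toy illustration of the interdiction setting (brute force rather than the integer programming and Branch-and-Cut machinery developed in the manuscript), shortest-path interdiction with an edge-removal budget can be stated in a few lines:

```python
import itertools
import networkx as nx

def shortest_path_interdiction(G, s, t, budget):
    """Remove at most `budget` edges to maximize the s-t shortest-path
    length (infinite if s and t become disconnected). Brute force, for
    illustration only; the manuscript uses ILP-based exact methods."""
    best_len, best_removal = -1, ()
    for removal in itertools.combinations(G.edges, budget):
        H = G.copy()
        H.remove_edges_from(removal)
        try:
            length = nx.shortest_path_length(H, s, t, weight="weight")
        except nx.NetworkXNoPath:
            return float("inf"), removal  # disconnection is the best outcome
        if length > best_len:
            best_len, best_removal = length, removal
    return best_len, best_removal
```

The blocker version of the same problem would instead ask for the minimum number of edge removals that pushes the shortest-path length above a given threshold; the exponential enumeration above is exactly what the polyhedral analysis and Branch-and-Cut/Branch-and-Price algorithms are designed to avoid.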
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions, so defining appropriate loss functions is critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions across a wide range of applications, divided into classification, regression, ranking, sample generation, and energy-based modelling. Overall, we introduce 33 different loss functions and organise them into an intuitive taxonomy. Each loss function is given theoretical backing, and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
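For instance, three of the most common losses from the regression and classification families (written here in plain NumPy as reference implementations, not in the survey's notation):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the workhorse regression loss."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative Bernoulli log-likelihood of labels given predicted
    probabilities; the standard binary classification loss."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge(y_true, scores):
    """Hinge loss for labels in {-1, +1}; the SVM margin loss."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))
```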