Linear regression and classification methods with repeated functional data are considered. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions. Two regression methods based on fusion penalties are presented. The first one is a generalization of the variable fusion methodology based on the 1-nearest neighbor. The second one, called group fusion lasso, assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. A finite sample numerical simulation and an application on EEG data are presented.
Among semiparametric regression models, partially linear additive models provide a useful tool to include additive nonparametric components as well as a parametric component, when explaining the relationship between the response and a set of explanatory variables. This paper concerns such models under sparsity assumptions for the covariates included in the linear component. Sparse covariates are frequent in regression problems where the task of variable selection is usually of interest. As in other settings, outliers either in the residuals or in the covariates involved in the linear component have a harmful effect. To simultaneously achieve model selection for the parametric component of the model and resistance to outliers, we combine preliminary robust estimators of the additive component with robust linear $MM$-regression estimators, adding a penalty such as SCAD on the coefficients in the parametric part. Under mild assumptions, consistency results and rates of convergence for the proposed estimators are derived. A Monte Carlo study is carried out to compare, under different models and contamination schemes, the performance of the robust proposal with its classical counterpart. The obtained results show the advantage of using the robust approach. Through the analysis of a real data set, we also illustrate the benefits of the proposed procedure.
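As a concrete reference for the SCAD penalty mentioned above, the following sketch evaluates its standard piecewise form (linear near zero, quadratic in a transition zone, constant for large coefficients). The function name `scad_penalty` and the default `a = 3.7` are choices made for this illustration and are not taken from the paper; the robust $MM$-regression loss itself is not implemented here.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at a single coefficient t.

    lam is the regularization level and a > 2 controls where the penalty
    flattens out. Both names and the default a = 3.7 are illustrative
    assumptions, not values from the abstract.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(t)
    if t <= lam:
        # Lasso-like linear part: small coefficients are shrunk towards zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that joins the two outer pieces continuously.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant part: large coefficients receive no additional penalty.
    return lam * lam * (a + 1) / 2


# Penalty values for a small, a moderate, and a large coefficient.
small = scad_penalty(0.5, lam=1.0)
moderate = scad_penalty(2.0, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
```

Because the penalty is constant beyond `a * lam`, large coefficients are not shrunk further, which is the usual motivation for preferring SCAD over a plain lasso penalty in variable selection.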
Suitable discretizations through tensor product formulas of popular multidimensional operators (diffusion or diffusion--advection, for instance) lead to matrices with $d$-dimensional Kronecker sum structure. For evolutionary Partial Differential Equations containing such operators and integrated in time with exponential integrators, it is then of paramount importance to efficiently approximate the actions of $\varphi$-functions of the arising matrices. In this work, we show how to produce directional split approximations of third order with respect to the time step size. They conveniently employ tensor-matrix products (the so-called $\mu$-mode product and related Tucker operator, realized in practice with high performance level 3 BLAS), and allow for the effective usage of exponential Runge--Kutta integrators up to order three. The technique can also be efficiently implemented on modern computer hardware such as Graphics Processing Units. The approach has been successfully tested against state-of-the-art techniques on two well-known physical models that lead to Turing patterns, namely the 2D Schnakenberg and the 3D FitzHugh--Nagumo systems, on different architectures.
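The Kronecker sum structure is what makes directional splitting possible. The sketch below checks, in two dimensions and for the matrix exponential only, that applying $\exp(\tau K)$ with $K = I \otimes A_1 + A_2 \otimes I$ to a vectorized array equals applying the small exponentials direction by direction. This identity is exact for the exponential; the third-order approximations for general $\varphi$-functions described in the abstract are not reproduced here. All helper names, the small test matrices, and the truncated Taylor series used for the exponential are assumptions made for this illustration, written in plain Python rather than with BLAS.

```python
from math import factorial


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def scale(A, c):
    return [[c * x for x in row] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def kron(A, B):
    # Kronecker product: block (i, j) of the result is A[i][j] * B.
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)] for i in range(len(A) * p)]


def expm_taylor(M, terms=40):
    # Truncated Taylor series; adequate only for small-norm matrices.
    n = len(M)
    result, power = identity(n), identity(n)
    for k in range(1, terms):
        power = matmul(power, M)
        result = [[result[i][j] + power[i][j] / factorial(k)
                   for j in range(n)] for i in range(n)]
    return result


tau = 0.1                                    # hypothetical time step size
A1 = [[-2.0, 1.0], [1.0, -2.0]]              # 1D operator, direction 1
A2 = [[-2.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -2.0]]  # direction 2
U = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # data on a 2 x 3 grid
n1, n2 = len(A1), len(A2)

# Dense route: assemble the Kronecker sum and act on the column-major vec(U).
K = add(kron(identity(n2), A1), kron(A2, identity(n1)))
vec_U = [[U[i][j]] for j in range(n2) for i in range(n1)]
dense = matmul(expm_taylor(scale(K, tau)), vec_U)

# Split route: one small exponential per direction (two mode products).
E1 = expm_taylor(scale(A1, tau))
E2 = expm_taylor(scale(A2, tau))
V = matmul(matmul(E1, U), transpose(E2))
split = [[V[i][j]] for j in range(n2) for i in range(n1)]

max_diff = max(abs(a[0] - b[0]) for a, b in zip(dense, split))
```

The split route only ever forms exponentials of the small matrices `A1` and `A2`, which is why the cost scales with the one-dimensional sizes rather than with the full grid.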
Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures. The performance and validity of such methods are of great importance for their application in empirical studies. While the prevailing method of Multiple Imputation by Chained Equations (MICE) with Predictive Mean Matching (PMM) is considered standard in the social science literature, the increase in complex datasets may require more advanced approaches based on machine learning. In particular, tree-based imputation methods have emerged as very competitive approaches. However, the performance and validity are not completely understood, particularly compared to the standard MICE PMM. This is especially true for inference in linear models. In this study, we investigate the impact of various imputation methods on coefficient estimation, Type I error, and power, to gain insights that can help empirical researchers deal with missingness more effectively. We explore MICE PMM alongside different tree-based methods, such as MICE with Random Forest (RF), Chained Random Forests with and without PMM (missRanger), and Extreme Gradient Boosting (MIXGBoost), conducting a realistic simulation study using the German National Educational Panel Study (NEPS) as the original data source. Our results reveal that Random Forest-based imputations, especially MICE RF and missRanger with PMM, consistently perform better in most scenarios. Standard MICE PMM shows partially increased bias and overly conservative test decisions, particularly with non-true zero coefficients. Our results thus underscore the potential advantages of tree-based imputation methods, albeit with the caveat that all methods perform worse with increased missingness, particularly missRanger.
The rise of AI in human contexts places new demands on automated systems to be transparent and explainable. We examine some anthropomorphic ideas and principles relevant to such accountability in order to develop a theoretical framework for thinking about digital systems in complex human contexts and the problem of explaining their behaviour. Structurally, systems are made of modular and hierarchical components, which we abstract in a new system model using notions of modes and mode transitions. A mode is an independent component of the system with its own objectives, monitoring data, and algorithms. The behaviour of a mode, including its transitions to other modes, is determined by functions that interpret each mode's monitoring data in the light of its objectives and algorithms. We show how these belief functions can help explain system behaviour by visualising their evaluation as trajectories in higher-dimensional geometric spaces. These ideas are formalised mathematically by abstract and concrete simplicial complexes. We offer three techniques: a framework for design heuristics, a general system theory based on modes, and a geometric visualisation, and apply them in three types of human-centred systems.
We initiate the study of Boolean function analysis on high-dimensional expanders. We give a random-walk based definition of high-dimensional expansion, which coincides with the earlier definition in terms of two-sided link expanders. Using this definition, we describe an analog of the Fourier expansion and the Fourier levels of the Boolean hypercube for simplicial complexes. Our analog is a decomposition into approximate eigenspaces of random walks associated with the simplicial complexes. Our random-walk definition and the decomposition have the additional advantage that they extend to the more general setting of posets, encompassing both high-dimensional expanders and the Grassmann poset, which appears in recent work on the unique games conjecture. We then use this decomposition to extend the Friedgut-Kalai-Naor theorem to high-dimensional expanders. Our results demonstrate that a constant-degree high-dimensional expander can sometimes serve as a sparse model for the Boolean slice or hypercube, and quite possibly additional results from Boolean function analysis can be carried over to this sparse model. Therefore, this model can be viewed as a derandomization of the Boolean slice, containing only $|X(k-1)|=O(n)$ points in contrast to $\binom{n}{k}$ points in the $k$-slice (which consists of all $n$-bit strings with exactly $k$ ones).
Recent studies on reservoir computing essentially involve a high dimensional dynamical system as the reservoir, which transforms and stores the input as a higher dimensional state, for temporal and nontemporal data processing. We demonstrate here a method for temporal and nontemporal prediction tasks in which the virtual nodes constituting the reservoir are constructed using a nonlinear map, namely the logistic map, and a simple finite trigonometric series. We predict three nonlinear systems, namely Lorenz, R\"ossler, and Hindmarsh-Rose, for temporal tasks and a seventh order polynomial for nontemporal tasks with great accuracy. Predictions made in the presence of noise also agree closely with the target. Remarkably, the logistic map performs well and predicts close to the actual or target values. The low values of the root mean square error confirm the accuracy of this method. Our approach removes the necessity of continuous dynamical systems for constructing the reservoir in reservoir computing. Moreover, the accurate prediction for the three different nonlinear systems suggests that this method can be considered a general one and can be applied to predict many systems. Finally, we show that the method also accurately anticipates the time series for the future (self prediction).
Open sets are central to mathematics, especially analysis and topology, in ways few notions are. In most, if not all, computational approaches to mathematics, open sets are only studied indirectly via their 'codes' or 'representations'. In this paper, we study how hard it is to compute, given an arbitrary open set of reals, the most common representation, i.e. a countable set of open intervals. We work in Kleene's higher-order computability theory, which was historically based on the S1-S9 schemes and which now has an intuitive lambda calculus formulation due to the authors. We establish many computational equivalences between on one hand the 'structure' functional that converts open sets to the aforementioned representation, and on the other hand functionals arising from mainstream mathematics, like basic properties of semi-continuous functions, the Urysohn lemma, and the Tietze extension theorem. We also compare these functionals to known operations on regulated and bounded variation functions, and the Lebesgue measure restricted to closed sets. We obtain a number of natural computational equivalences for the latter involving theorems from mainstream mathematics.
Pairwise comparison models are used for quantitatively evaluating utility and ranking in various fields. The increasing scale of modern problems underscores the need to understand statistical inference in these models when the number of subjects diverges, which is currently lacking in the literature except in a few special instances. This paper addresses this gap by establishing an asymptotic normality result for the maximum likelihood estimator in a broad class of pairwise comparison models. The key idea lies in identifying the Fisher information matrix as a weighted graph Laplacian matrix which can be studied via a meticulous spectral analysis. Our findings provide the first unified theory for performing statistical inference in a wide range of pairwise comparison models beyond the Bradley--Terry model, benefiting practitioners with a solid theoretical guarantee for their use. Simulations utilizing synthetic data are conducted to validate the asymptotic normality result, followed by a hypothesis test using a tennis competition dataset.
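For the Bradley--Terry special case, the link between the Fisher information matrix and a weighted graph Laplacian can be seen directly: each compared pair $(i, j)$ contributes an edge with weight $n_{ij}\, p_{ij} (1 - p_{ij})$, where $p_{ij}$ is the probability that $i$ beats $j$. The sketch below builds that matrix. The function name, the input layout, and the toy numbers are illustrative assumptions, and the broader model class treated in the paper is not covered.

```python
import math


def bradley_terry_fisher_information(u, counts):
    """Fisher information of Bradley-Terry scores, built as a graph Laplacian.

    u      -- list of latent scores, one per subject
    counts -- dict mapping a pair (i, j) to the number of comparisons
              between subjects i and j (hypothetical input layout)
    """
    n = len(u)
    info = [[0.0] * n for _ in range(n)]
    for (i, j), n_ij in counts.items():
        p = 1.0 / (1.0 + math.exp(u[j] - u[i]))   # P(i beats j)
        w = n_ij * p * (1.0 - p)                  # edge weight
        # Laplacian update: add w to both degrees, subtract w on the edge.
        info[i][i] += w
        info[j][j] += w
        info[i][j] -= w
        info[j][i] -= w
    return info


# Toy comparison graph on three subjects; subjects 0 and 2 never meet.
u = [0.0, 0.5, -0.3]
counts = {(0, 1): 10, (1, 2): 5}
info = bradley_terry_fisher_information(u, counts)
row_sums = [sum(row) for row in info]
```

Every row sums to zero, which reflects the fact that the scores are identified only up to a common shift, so inference needs a constraint such as fixing one score or the sum of the scores.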
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by appropriately choosing four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
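The simplest instance of implicit regularization mentioned above is overparametrized linear least squares: gradient descent started at zero converges to the interpolating solution of minimum Euclidean norm, although nothing in the objective asks for a small norm. The following sketch illustrates this on a toy problem with two samples and three parameters. The data, step size, iteration count, and function names are assumptions chosen for this illustration.

```python
def gradient_descent_least_squares(X, y, step=0.2, iterations=500):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started at w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        residual = [sum(X[i][k] * w[k] for k in range(d)) - y[i]
                    for i in range(n)]
        grad = [sum(X[i][k] * residual[i] for i in range(n))
                for k in range(d)]
        w = [w[k] - step * grad[k] for k in range(d)]
    return w


def squared_norm(w):
    return sum(v * v for v in w)


def predictions(X, w):
    return [sum(xk * wk for xk, wk in zip(row, w)) for row in X]


# Two samples, three parameters: infinitely many w fit the data exactly.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

w_gd = gradient_descent_least_squares(X, y)

# Minimum-norm interpolator X^T (X X^T)^{-1} y, worked out by hand
# for this toy problem.
w_min_norm = [0.0, 1.0, 1.0]

# A different interpolator with a larger norm, for comparison.
w_other = [1.0, 2.0, 0.0]
```

The iterates never leave the row space of `X` because every gradient is a combination of the rows, and the minimum-norm interpolator is the only interpolating solution in that subspace.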