In many practical studies, learning directionality between a pair of variables is of great interest while notoriously hard when their underlying relation is nonlinear. This paper presents a method that examines asymmetry in exposure-outcome pairs when a priori assumptions about their relative ordering are unavailable. Our approach utilizes a framework of generative exposure mapping (GEM) to study asymmetric relations in continuous exposure-outcome pairs, through which we can capture distributional asymmetries with no prefixed variable ordering. We propose a coefficient of asymmetry to quantify relational asymmetry using Shannon's entropy analytics as well as statistical estimation and inference for such an estimand of directionality. Large-sample theoretical guarantees are established for cross-fitting inference techniques. The proposed methodology is extended to allow both measured confounders and contamination in outcome measurements, which is extensively evaluated through extensive simulation studies and real data applications.
We address the problem of the best uniform approximation of a continuous function on a convex domain. The approximation is by linear combinations of a finite system of functions (not necessarily Chebyshev) under arbitrary linear constraints. By modifying the concept of alternance and of the Remez iterative procedure we present a method, which demonstrates its efficiency in numerical problems. The linear rate of convergence is proved under some favourable assumptions. A special attention is paid to systems of complex exponents, Gaussian functions, lacunar algebraic and trigonometric polynomials. Applications to signal processing, linear ODE, switching dynamical systems, and to Markov-Bernstein type inequalities are considered.
We propose a method for obtaining parsimonious decompositions of networks into higher order interactions which can take the form of arbitrary motifs.The method is based on a class of analytically solvable generative models, where vertices are connected via explicit copies of motifs, which in combination with non-parametric priors allow us to infer higher order interactions from dyadic graph data without any prior knowledge on the types or frequencies of such interactions. Crucially, we also consider 'degree--corrected' models that correctly reflect the degree distribution of the network and consequently prove to be a better fit for many real world--networks compared to non-degree corrected models. We test the presented approach on simulated data for which we recover the set of underlying higher order interactions to a high degree of accuracy. For empirical networks the method identifies concise sets of atomic subgraphs from within thousands of candidates that cover a large fraction of edges and include higher order interactions of known structural and functional significance. The method not only produces an explicit higher order representation of the network but also a fit of the network to analytically tractable models opening new avenues for the systematic study of higher order network structures.
We develop a new rank-based approach for univariate two-sample testing in the presence of missing data which makes no assumptions about the missingness mechanism. This approach is a theoretical extension of the Wilcoxon-Mann-Whitney test that controls the Type I error by providing exact bounds for the test statistic after accounting for the number of missing values. Greater statistical power is shown when the method is extended to account for a bounded domain. Furthermore, exact bounds are provided on the proportions of data that can be missing in the two samples while yielding a significant result. Simulations demonstrate that our method has good power, typically for cases of $10\%$ to $20\%$ missing data, while standard imputation approaches fail to control the Type I error. We illustrate our method on complex clinical trial data in which patients' withdrawal from the trial lead to missing values.
Decision making and learning in the presence of uncertainty has attracted significant attention in view of the increasing need to achieve robust and reliable operations. In the case where uncertainty stems from the presence of adversarial attacks this need is becoming more prominent. In this paper we focus on linear and nonlinear classification problems and propose a novel adversarial training method for robust classifiers, inspired by Support Vector Machine (SVM) margins. We view robustness under a data driven lens, and derive finite sample complexity bounds for both linear and non-linear classifiers in binary and multi-class scenarios. Notably, our bounds match natural classifiers' complexity. Our algorithm minimizes a worst-case surrogate loss using Linear Programming (LP) and Second Order Cone Programming (SOCP) for linear and non-linear models. Numerical experiments on the benchmark MNIST and CIFAR10 datasets show our approach's comparable performance to state-of-the-art methods, without needing adversarial examples during training. Our work offers a comprehensive framework for enhancing binary linear and non-linear classifier robustness, embedding robustness in learning under the presence of adversaries.
Regression analysis is a central topic in statistical modeling, aiming to estimate the relationships between a dependent variable, commonly referred to as the response variable, and one or more independent variables, i.e., explanatory variables. Linear regression is by far the most popular method for performing this task in several fields of research, such as prediction, forecasting, or causal inference. Beyond various classical methods to solve linear regression problems, such as Ordinary Least Squares, Ridge, or Lasso regressions - which are often the foundation for more advanced machine learning (ML) techniques - the latter have been successfully applied in this scenario without a formal definition of statistical significance. At most, permutation or classical analyses based on empirical measures (e.g., residuals or accuracy) have been conducted to reflect the greater ability of ML estimations for detection. In this paper, we introduce a method, named Statistical Agnostic Regression (SAR), for evaluating the statistical significance of an ML-based linear regression based on concentration inequalities of the actual risk using the analysis of the worst case. To achieve this goal, similar to the classification problem, we define a threshold to establish that there is sufficient evidence with a probability of at least 1-eta to conclude that there is a linear relationship in the population between the explanatory (feature) and the response (label) variables. Simulations in only two dimensions demonstrate the ability of the proposed agnostic test to provide a similar analysis of variance given by the classical $F$ test for the slope parameter.
We present a method for end-to-end learning of Koopman surrogate models for optimal performance in control. In contrast to previous contributions that employ standard reinforcement learning (RL) algorithms, we use a training algorithm that exploits the potential differentiability of environments based on mechanistic simulation models. We evaluate the performance of our method by comparing it to that of other controller type and training algorithm combinations on a literature known eNMPC case study. Our method exhibits superior performance on this problem, thereby constituting a promising avenue towards more capable controllers that employ dynamic surrogate models.
In many communication contexts, the capabilities of the involved actors cannot be known beforehand, whether it is a cell, a plant, an insect, or even a life form unknown to Earth. Regardless of the recipient, the message space and time scale could be too fast, too slow, too large, or too small and may never be decoded. Therefore, it pays to devise a way to encode messages agnostic of space and time scales. We propose the use of fractal functions as self-executable infinite-frequency carriers for sending messages, given their properties of structural self-similarity and scale invariance. We call it `fractal messaging'. Starting from a spatial embedding, we introduce a framework for a space-time scale-free messaging approach to this challenge. When considering a space and time-agnostic framework for message transmission, it would be interesting to encode a message such that it could be decoded at several spatio-temporal scales. Hence, the core idea of the framework proposed herein is to encode a binary message as waves along infinitely many frequencies (in power-like distributions) and amplitudes, transmit such a message, and then decode and reproduce it. To do so, the components of the Weierstrass function, a known fractal, are used as carriers of the message. Each component will have its amplitude modulated to embed the binary stream, allowing for a space-time-agnostic approach to messaging.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.