We derive a simple unified framework giving closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions. Test risk and other objects of interest are expressed transparently in terms of our conserved quantity evaluated in the kernel eigenbasis. We use our improved framework to: i) provide a theoretical explanation for the "deep bootstrap" of Nakkiran et al (2020), ii) generalize a previous result regarding the hardness of the classic parity problem, iii) fashion a theoretical tool for the study of adversarial robustness, and iv) draw a tight analogy between KRR and a well-studied system in statistical physics.
The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because the cost of finding high quality answers corresponds to invoking an oracle such as a human expert or an expensive deep neural network model on every single item in the DB and then applying the query. We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage of finding high quality answers for both Precision-Target (PT) and Recall-Target (RT) queries. Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB. It relies on two assumptions. Under the Proxy Quality assumption, proxy quality can be quantified in a probabilistic manner w.r.t. the oracle. This allows us to develop two algorithms: PQA that efficiently finds high quality answers with high probability and no oracle calls, and PQE, a heuristic extension that achieves empirically good performance with a small number of oracle calls. Alternatively, under the Core Set Closure assumption, we develop two algorithms: CSC that efficiently returns high quality answers with high probability and minimal oracle usage, and CSE, which extends it to more general settings. Our extensive experiments on five real-world datasets on both query types, PT and RT, demonstrate that our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
Fairness and robustness play vital roles in trustworthy machine learning. Observing safety-critical needs in various annotation-expensive vision applications, we introduce a novel learning framework, Fair Robust Active Learning (FRAL), generalizing conventional active learning to fair and adversarial robust scenarios. This framework allows us to achieve standard and robust minimax fairness with limited acquired labels. In FRAL, we then observe existing fairness-aware data selection strategies suffer from either ineffectiveness under severe data imbalance or inefficiency due to huge computations of adversarial training. To address these two problems, we develop a novel Joint INconsistency (JIN) method exploiting prediction inconsistencies between benign and adversarial inputs as well as between standard and robust models. These two inconsistencies can be used to identify potential fairness gains and data imbalance mitigations. Thus, by performing label acquisition with our inconsistency-based ranking metrics, we can alleviate the class imbalance issue and enhance minimax fairness with limited computation. Extensive experiments on diverse datasets and sensitive groups demonstrate that our method obtains the best results in standard and robust fairness under white-box PGD attacks compared with existing active data selection baselines.
Thanks to its favorable properties, the multivariate normal distribution is still largely employed for modeling phenomena in various scientific fields. However, when the number of components $p$ is of the same asymptotic order as the sample size $n$, standard inferential techniques are generally inadequate to conduct hypothesis testing on the mean vector and/or the covariance matrix. Within several prominent frameworks, we propose then to draw reliable conclusions via a directional test. We show that under the null hypothesis the directional $p$-value is exactly uniformly distributed even when $p$ is of the same order of $n$, provided that conditions for the existence of the maximum likelihood estimate for the normal model are satisfied. Extensive simulation results confirm the theoretical findings across different values of $p/n$, and show that under the null hypothesis the directional test outperforms not only the usual first and higher-order finite-$p$ solutions but also alternative methods tailored for high-dimensional settings. Simulation results also indicate that the power performance of the different tests depends on the specific alternative hypothesis.
We propose and investigate several statistical models and corresponding sampling schemes for data analysis based on unbalanced optimal transport (UOT) between finitely supported measures. Specifically, we analyse Kantorovich-Rubinstein (KR) distances with penalty parameter $C>0$. The main result provides non-asymptotic bounds on the expected error for the empirical KR distance as well as for its barycenters. The impact of the penalty parameter $C$ is studied in detail. Our approach justifies randomised computational schemes for UOT which can be used for fast approximate computations in combination with any exact solver. Using synthetic and real datasets, we empirically analyse the behaviour of the expected errors in simulation studies and illustrate the validity of our theoretical bounds.
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
Just like weights, bias terms are the learnable parameters of many popular machine learning models, including neural networks. Biases are believed to effectively increase the representational power of neural networks to solve a wide range of tasks in computer vision. However, we argue that if we consider the intrinsic distribution of images in the input space as well as some desired properties a model should have from the first principles, biases can be completely ignored in addressing many image-related tasks, such as image classification. Our observation indicates that zero-bias neural networks could perform comparably to neural networks with bias at least on practical image classification tasks. In addition, we prove that zero-bias neural networks possess a nice property called scalar (multiplication) invariance, which has great potential in learning and understanding images captured under poor illumination conditions. We then extend scalar invariance to more general cases that allow us to verify certain convex regions of the input space. Our experimental results show that zero-bias models could outperform the state-of-art models by a very large margin (over 60%) when predicting images under a low illumination condition (multiplying a scalar of 0.01); while achieving the same-level performance as normal models.
The capability of recurrent neural networks to approximate trajectories of a random dynamical system, with random inputs, on non-compact domains, and over an indefinite or infinite time horizon is considered. The main result states that certain random trajectories over an infinite time horizon may be approximated to any desired accuracy, uniformly in time, by a certain class of deep recurrent neural networks, with simple feedback structures. The formulation here contrasts with related literature on this topic, much of which is restricted to compact state spaces and finite time intervals. The model conditions required here are natural, mild, and easy to test, and the proof is very simple.
Recent years have witnessed significant advances in technologies and services in modern network applications, including smart grid management, wireless communication, cybersecurity as well as multi-agent autonomous systems. Considering the heterogeneous nature of networked entities, emerging network applications call for game-theoretic models and learning-based approaches in order to create distributed network intelligence that responds to uncertainties and disruptions in a dynamic or an adversarial environment. This paper articulates the confluence of networks, games and learning, which establishes a theoretical underpinning for understanding multi-agent decision-making over networks. We provide an selective overview of game-theoretic learning algorithms within the framework of stochastic approximation theory, and associated applications in some representative contexts of modern network systems, such as the next generation wireless communication networks, the smart grid and distributed machine learning. In addition to existing research works on game-theoretic learning over networks, we highlight several new angles and research endeavors on learning in games that are related to recent developments in artificial intelligence. Some of the new angles extrapolate from our own research interests. The overall objective of the paper is to provide the reader a clear picture of the strengths and challenges of adopting game-theoretic learning methods within the context of network systems, and further to identify fruitful future research directions on both theoretical and applied studies.
In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. We show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph. First, by defining a vector field in the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages into the field. Then we propose the use of the Laplacian eigenvectors as such vector field, and we show that the method generalizes CNNs on an n-dimensional grid, and is provably more discriminative than standard GNNs regarding the Weisfeiler-Lehman 1-WL test. Finally, we bring the power of CNN data augmentation to graphs by providing a means of doing reflection, rotation and distortion on the underlying directional field. We evaluate our method on different standard benchmarks and see a relative error reduction of 8\% on the CIFAR10 graph dataset and 11% to 32% on the molecular ZINC dataset. An important outcome of this work is that it enables to translate any physical or biological problems with intrinsic directional axes into a graph network formalism with an embedded directional field.
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.