We propose an $\ell_1$-penalized estimator for high-dimensional models of Expected Shortfall (ES). The estimator is obtained as the solution to a least-squares problem for an auxiliary dependent variable, which is defined as a transformation of the dependent variable and a pre-estimated tail quantile. Leveraging a sparsity condition, we derive a nonasymptotic bound on the prediction and estimator errors of the ES estimator, accounting for the estimation error in the dependent variable, and provide conditions under which the estimator is consistent. Our estimator is applicable to heavy-tailed time-series data and we find that the amount of parameters in the model may grow with the sample size at a rate that depends on the dependence and heavy-tailedness in the data. In an empirical application, we consider the systemic risk measure CoES and consider a set of regressors that consists of nonlinear transformations of a set of state variables. We find that the nonlinear model outperforms an unpenalized and untransformed benchmark considerably.
Inspired by the concept of fault tolerance quantum computation, this article proposes a framework dubbed Exact Homomorphic Encryption, EHE, enabling exact computations on encrypted data without the need for pre-decryption. The introduction of quantum gates is a critical step for constructing the message encryption and the computation encryption within the framework. Of significance is that both encryptions are respectively accomplished in a multivariate polynomial set generated by quantum gates. Two fundamental traits of quantum gates the invertibility and the noncommutativity, establish the success of EHE. The encrypted computation is exact because its encryption transformation is conducted with invertible gates. In the same vein, decryptions for both an encrypted message and encrypted computation are exact. The second trait of noncommutativity among applied quantum gates brings forth the security for the two encryptions. Toward the message encryption, a plaintext is encoded into a ciphertext via a polynomial set generated by a product of noncommuting gates randomly chosen. In the computation encryption, a desired operation is encoded into an encrypted polynomial set generated by another product of noncommuting gates. The encrypted computation is then the evaluation of the encrypted polynomial set on the ciphertext and is referred to as the cryptovaluation. EHE is not only attainable on quantum computers, but also straightforwardly realizable on traditional computing environments. Surpassing the standard security 2^128 of quantum resilience, both the encryptions further reach a security greater than the suggested threshold 2^1024 and are characterized as hyper quantum-resilient. Thanks to the two essential traits of quantum gates, this framework can be regarded as the initial tangible manifestation of the concept noncommutative cryptography.
Graph is a fundamental mathematical structure in characterizing relations between different objects and has been widely used on various learning tasks. Most methods implicitly assume a given graph to be accurate and complete. However, real data is inevitably noisy and sparse, which will lead to inferior results. Despite the remarkable success of recent graph representation learning methods, they inherently presume that the graph is homophilic, and largely overlook heterophily, where most connected nodes are from different classes. In this regard, we propose a novel robust graph structure learning method to achieve a high-quality graph from heterophilic data for downstream tasks. We first apply a high-pass filter to make each node more distinctive from its neighbors by encoding structure information into the node features. Then, we learn a robust graph with an adaptive norm characterizing different levels of noise. Afterwards, we propose a novel regularizer to further refine the graph structure. Clustering and semi-supervised classification experiments on heterophilic graphs verify the effectiveness of our method.
Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, this issue of finding, differentiating, and understanding models is becoming more crucial. Inspired from research on data lakes, we introduce and define the concept of model lakes. We discuss fundamental research challenges in the management of large models. And we discuss what principled data management techniques can be brought to bear on the study of large model management.
Federated Learning (FL) is a popular algorithm to train machine learning models on user data constrained to edge devices (for example, mobile phones) due to privacy concerns. Typically, FL is trained with the assumption that no part of the user data can be egressed from the edge. However, in many production settings, specific data-modalities/meta-data are limited to be on device while others are not. For example, in commercial SLU systems, it is typically desired to prevent transmission of biometric signals (such as audio recordings of the input prompt) to the cloud, but egress of locally (i.e. on the edge device) transcribed text to the cloud may be possible. In this work, we propose a new algorithm called Partial Federated Learning (PartialFL), where a machine learning model is trained using data where a subset of data modalities or their intermediate representations can be made available to the server. We further restrict our model training by preventing the egress of data labels to the cloud for better privacy, and instead use a contrastive learning based model objective. We evaluate our approach on two different multi-modal datasets and show promising results with our proposed approach.
Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learnt from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a `credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypotheses spaces (both assuming realizability or not) as well as infinite model spaces, which directly generalize classical results.
Cross Attention is a popular method for retrieving information from a set of context tokens for making predictions. At inference time, for each prediction, Cross Attention scans the full set of $\mathcal{O}(N)$ tokens. In practice, however, often only a small subset of tokens are required for good performance. Methods such as Perceiver IO are cheap at inference as they distill the information to a smaller-sized set of latent tokens $L < N$ on which cross attention is then applied, resulting in only $\mathcal{O}(L)$ complexity. However, in practice, as the number of input tokens and the amount of information to distill increases, the number of latent tokens needed also increases significantly. In this work, we propose Tree Cross Attention (TCA) - a module based on Cross Attention that only retrieves information from a logarithmic $\mathcal{O}(\log(N))$ number of tokens for performing inference. TCA organizes the data in a tree structure and performs a tree search at inference time to retrieve the relevant tokens for prediction. Leveraging TCA, we introduce ReTreever, a flexible architecture for token-efficient inference. We show empirically that Tree Cross Attention (TCA) performs comparable to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient. Furthermore, we compare ReTreever against Perceiver IO, showing significant gains while using the same number of tokens for inference.
Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate the adversarial robustness. In practice, attack algorithms are artificially selected and tuned by human experts to break a ML system. However, manual selection of attackers tends to be sub-optimal, leading to a mistakenly assessment of model security. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms and their hyper-parameters from a candidate pool of \textbf{32 base attackers}. We design a search space where attack policy is represented as an attacking sequence, i.e., the output of the previous attacker is used as the initialization input for successors. Multi-objective NSGA-II genetic algorithm is adopted for finding the strongest attack policy with minimum complexity. The experimental result shows CAA beats 10 top attackers on 11 diverse defenses with less elapsed time (\textbf{6 $\times$ faster than AutoAttack}), and achieves the new state-of-the-art on $l_{\infty}$, $l_{2}$ and unrestricted adversarial attacks.
Embedding models for deterministic Knowledge Graphs (KG) have been extensively studied, with the purpose of capturing latent semantic relations between entities and incorporating the structured knowledge into machine learning. However, there are many KGs that model uncertain knowledge, which typically model the inherent uncertainty of relations facts with a confidence score, and embedding such uncertain knowledge represents an unresolved challenge. The capturing of uncertain knowledge will benefit many knowledge-driven applications such as question answering and semantic search by providing more natural characterization of the knowledge. In this paper, we propose a novel uncertain KG embedding model UKGE, which aims to preserve both structural and uncertainty information of relation facts in the embedding space. Unlike previous models that characterize relation facts with binary classification techniques, UKGE learns embeddings according to the confidence scores of uncertain relation facts. To further enhance the precision of UKGE, we also introduce probabilistic soft logic to infer confidence scores for unseen relation facts during training. We propose and evaluate two variants of UKGE based on different learning objectives. Experiments are conducted on three real-world uncertain KGs via three tasks, i.e. confidence prediction, relation fact ranking, and relation fact classification. UKGE shows effectiveness in capturing uncertain knowledge by achieving promising results on these tasks, and consistently outperforms baselines on these tasks.
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.