Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has been no prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. We then show how to adapt it into our main data structure, which can handle sequences of \textit{adaptive} queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it amenable for use in conjunction with prior algorithms for online learning of Mahalanobis metrics.
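As a rough illustration of the kind of sketching involved (not the paper's actual data structure, which additionally handles adaptive queries and online updates), the following NumPy sketch estimates Mahalanobis distances by applying a Johnson-Lindenstrauss projection to $M^{1/2}$; all names and parameters are illustrative assumptions.

```python
import numpy as np

def mahalanobis_sketch(M, k, seed=0):
    """Return the matrix S @ M^{1/2}, so that ||S M^{1/2} (x - y)||_2
    approximates the Mahalanobis distance sqrt((x - y)^T M (x - y))."""
    d = M.shape[0]
    rng = np.random.default_rng(seed)
    # Symmetric PSD square root of the metric matrix M.
    w, V = np.linalg.eigh(M)
    M_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    # Johnson-Lindenstrauss projection: k x d Gaussian matrix.
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))
    return S @ M_half

def approx_mahalanobis(sketch, x, y):
    return np.linalg.norm(sketch @ (x - y))

# Usage: distances over n fixed (non-adaptive) query pairs are preserved up to
# a (1 +/- eps) factor with high probability for k = O(eps^-2 log n).
d, k = 64, 32
A = np.random.randn(d, d)
M = A.T @ A                                  # a PSD Mahalanobis metric matrix
sk = mahalanobis_sketch(M, k)
x, y = np.random.randn(d), np.random.randn(d)
exact = np.sqrt((x - y) @ M @ (x - y))
print(exact, approx_mahalanobis(sk, x, y))
```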
Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
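As a rough, hypothetical illustration of how conformal prediction can yield a credal set (not necessarily the paper's construction), the following split-conformal sketch calibrates a total-variation radius around a model's predicted distribution; the score function and all names are assumptions.

```python
import numpy as np

def tv_distance(p, q):
    # Total variation distance between discrete distributions (last axis).
    return 0.5 * np.abs(p - q).sum(axis=-1)

def conformal_credal_radius(pred_cal, label_dists, alpha=0.1):
    """Split-conformal calibration: pred_cal[i] is the model's predicted
    distribution for calibration example i, label_dists[i] the annotated
    probability-distribution label. Returns a radius q such that the credal set
    {p : TV(model(x), p) <= q} covers the true label distribution of a fresh
    exchangeable example with probability >= 1 - alpha."""
    scores = tv_distance(pred_cal, label_dists)
    n = len(scores)
    # Conformal quantile with the finite-sample (n + 1) correction.
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(level, 1.0), method="higher")

def credal_set_membership(pred_new, q):
    """Membership test for the (implicitly represented) credal set at a new input."""
    return lambda p: tv_distance(pred_new, p) <= q
```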
A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), where a model is trained to $\textit{classify}$ between data samples and samples from some reference distribution. These techniques are successful in simple low-dimensional settings but fail to achieve good results on complex high-dimensional data, like images. A different family of methods for learning distributions is that of denoising diffusion models (DDMs), in which a model is trained to $\textit{denoise}$ data samples. These approaches achieve state-of-the-art results in image, video, and audio generation. In this work, we present $\textit{Classification Diffusion Models}$ (CDMs), a generative technique that adopts the denoising-based formalism of DDMs while making use of a classifier that predicts the amount of noise added to a clean signal, similarly to DRE methods. Our approach is based on the observation that an MSE-optimal denoiser for white Gaussian noise can be expressed in terms of the gradient of a cross-entropy-optimal classifier for predicting the noise level. As we illustrate, CDM achieves better denoising results than DDM and leads to at least comparable FID in image generation. CDM is also capable of highly efficient one-step exact likelihood estimation, achieving state-of-the-art results among methods that use a single step. Code is available on the project's webpage at //shaharYadin.github.io/CDM/ .
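For context, the "MSE-optimal denoiser" in the observation above is the posterior mean, which Tweedie's formula ties to the score of the noisy marginal; the classifier-to-score identity specific to CDM is developed in the paper itself and is not reproduced here.

```latex
% For x_t = x_0 + \sigma_t \epsilon with \epsilon \sim \mathcal{N}(0, I),
% the MSE-optimal denoiser is the posterior mean, and by Tweedie's formula
% it is determined by the score of the noisy marginal p_t:
\mathbb{E}[x_0 \mid x_t] \;=\; x_t + \sigma_t^{2}\, \nabla_{x_t} \log p_t(x_t).
```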
Let $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ be an infinite sequence of families of compact connected sets in $\mathbb{R}^{d}$. An infinite sequence of compact connected sets $\left\{ B_{n} \right\}_{n\in \mathbb{N}}$ is called a heterochromatic sequence from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ if there exists an infinite sequence $\left\{ i_{n} \right\}_{n\in \mathbb{N}}$ of natural numbers satisfying the following two properties: (a) $\{i_{n}\}_{n\in \mathbb{N}}$ is a monotonically increasing sequence, and (b) for all $n \in \mathbb{N}$, we have $B_{n} \in \mathcal{F}_{i_n}$. We show that if every heterochromatic sequence from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ contains $d+1$ sets that can be pierced by a single hyperplane, then there exists a finite collection $\mathcal{H}$ of hyperplanes in $\mathbb{R}^{d}$ that pierces all but finitely many families from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$. As a direct consequence of our result, we get that if every countable subcollection of an infinite family $\mathcal{F}$ of compact connected sets in $\mathbb{R}^{d}$ contains $d+1$ sets that can be pierced by a single hyperplane, then $\mathcal{F}$ can be pierced by finitely many hyperplanes. To establish the optimality of our result, we show that, for all $d \in \mathbb{N}$, there exists an infinite sequence $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ of families of compact connected sets satisfying the following two conditions: (1) for all $n \in \mathbb{N}$, $\mathcal{F}_{n}$ is not pierceable by finitely many hyperplanes, and (2) for any $m \in \mathbb{N}$ and every sequence $\left\{B_n\right\}_{n=m}^{\infty}$ of compact connected sets in $\mathbb{R}^d$, where $B_i\in\mathcal{F}_i$ for all $i \geq m$, there exists a hyperplane in $\mathbb{R}^d$ that pierces at least $d+1$ sets in the sequence.
Consider the following NP problem DOUBLE CLIQUE (abbr.: CLIQ$_{2}$): given a natural number $k>2$ and a pair of disjoint subgraphs of a fixed graph $G$, decide whether each subgraph in question contains a $k$-clique. I prove that CLIQ$_{2}$ cannot be solved in polynomial time by a deterministic TM, which implies $\mathbf{P}\neq \mathbf{NP}$. This proof upgrades the well-known proof of polynomial unsolvability of the partial result with respect to the analogous monotone problem CLIQUE (abbr.: CLIQ), as well as my previous presentation that used an appropriate 3-valued semantics. Note that the problem CLIQ$_{2}$ is not monotone and appears more complex than merely iterated CLIQ, since the required subgraphs are mutually dependent.
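For concreteness, here is a brute-force (exponential-time) checker that merely pins down the CLIQ$_{2}$ decision problem; it is unrelated to the paper's lower-bound argument, and the graph representation is an assumption.

```python
from itertools import combinations

def has_k_clique(vertices, adj, k):
    """Check whether the induced subgraph on `vertices` contains a k-clique.
    `adj` maps each vertex of the fixed graph G to its set of neighbors."""
    return any(all(v in adj[u] for u, v in combinations(c, 2))
               for c in combinations(list(vertices), k))

def cliq2(adj, subgraph_a, subgraph_b, k):
    """DOUBLE CLIQUE: given k > 2 and two disjoint vertex subsets of G,
    decide whether each induced subgraph contains a k-clique."""
    assert k > 2 and not (set(subgraph_a) & set(subgraph_b))
    return has_k_clique(subgraph_a, adj, k) and has_k_clique(subgraph_b, adj, k)
```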
Set reconciliation, where two parties hold fixed-length bit strings and run a protocol to learn the strings they are missing from each other, is a fundamental task in many distributed systems. We present Rateless Invertible Bloom Lookup Tables (Rateless IBLT), the first set reconciliation protocol, to the best of our knowledge, that achieves low computation cost and near-optimal communication cost across a wide range of scenarios: set differences of one to millions, bit strings of a few bytes to megabytes, and workloads injected by potential adversaries. Rateless IBLT is based on a novel encoder that incrementally encodes the set difference into an infinite stream of coded symbols, resembling rateless error-correcting codes. We compare Rateless IBLT with state-of-the-art set reconciliation schemes and demonstrate significant improvements. Rateless IBLT achieves 3--4x lower communication cost than non-rateless schemes with similar computation cost, and 2--2000x lower computation cost than schemes with similar communication cost. We show the real-world benefits of Rateless IBLT by applying it to synchronize the state of the Ethereum blockchain, and demonstrate 5.6x lower end-to-end completion time and 4.4x lower communication cost compared to the system used in production.
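For background, a minimal sketch of the classic fixed-size IBLT with a peeling decoder, the building block that Rateless IBLT extends with an incrementally generated symbol stream; this is not the paper's rateless encoder, and the cell layout and hash choices are illustrative (keys are assumed to be integers).

```python
import hashlib

NUM_HASHES = 3

def _h(key, salt):
    return int.from_bytes(hashlib.sha256(f"{salt}:{key}".encode()).digest()[:8], "big")

def cell_indices(key, m):
    # Partition the table into NUM_HASHES regions so each key hits one
    # distinct cell per region.
    r = m // NUM_HASHES
    return [i * r + _h(key, i) % r for i in range(NUM_HASHES)]

def checksum(key):
    return _h(key, "chk")

class IBLT:
    def __init__(self, m):
        self.m = m
        self.count = [0] * m
        self.key_sum = [0] * m
        self.chk_sum = [0] * m

    def insert(self, key, sign=1):
        for i in cell_indices(key, self.m):
            self.count[i] += sign
            self.key_sum[i] ^= key
            self.chk_sum[i] ^= checksum(key)

    def subtract(self, other):
        out = IBLT(self.m)
        for i in range(self.m):
            out.count[i] = self.count[i] - other.count[i]
            out.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            out.chk_sum[i] = self.chk_sum[i] ^ other.chk_sum[i]
        return out

    def peel(self):
        """Recover (only_in_self, only_in_other); may fail if m is too small."""
        a_minus_b, b_minus_a = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                if self.count[i] in (1, -1) and checksum(self.key_sum[i]) == self.chk_sum[i]:
                    key = self.key_sum[i]
                    (a_minus_b if self.count[i] == 1 else b_minus_a).add(key)
                    self.insert(key, sign=-self.count[i])  # remove the pure cell's key
                    progress = True
        return a_minus_b, b_minus_a
```

In a reconciliation round, one party sends its table and the other subtracts its own table and peels the result; if the table is too small the peeling fails, which is exactly the sizing problem the rateless construction is designed to avoid.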
The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimates and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate that BayesMBAR will become an essential tool in various applications of free energy calculations.
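For reference, these are the standard MBAR self-consistent equations, whose solution BayesMBAR recovers under a uniform prior according to the abstract; the notation is the usual one and is assumed here rather than taken from the paper.

```latex
% MBAR estimates the reduced free energies f_1, ..., f_K of K thermodynamic
% states from N_k samples x_{kn} of each state, with reduced potentials u_i(x),
% by solving the self-consistent equations
\hat{f}_i \;=\; -\ln \sum_{k=1}^{K} \sum_{n=1}^{N_k}
\frac{\exp\!\left[-u_i(x_{kn})\right]}
     {\sum_{j=1}^{K} N_j \exp\!\left[\hat{f}_j - u_j(x_{kn})\right]},
\qquad i = 1,\dots,K .
```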
Differential privacy (DP) offers a theoretical upper bound on the potential privacy leakage of an algorithm, while empirical auditing establishes a practical lower bound. Auditing techniques exist for DP training algorithms; however, machine learning can also be made private at inference. We propose the first framework for auditing private prediction, where we instantiate adversaries with varying poisoning and query capabilities. This enables us to study the privacy leakage of four private prediction algorithms: PATE [Papernot et al., 2016], CaPC [Choquette-Choo et al., 2020], PromptPATE [Duan et al., 2023], and Private-kNN [Zhu et al., 2020]. To conduct our audit, we introduce novel techniques to empirically evaluate privacy leakage in terms of Rényi DP. Our experiments show that (i) the privacy analysis of private prediction can be improved, (ii) algorithms which are easier to poison lead to much higher privacy leakage, and (iii) the privacy leakage is significantly lower for adversaries without query control than for those with full control.
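As a simplified, hypothetical illustration of empirical auditing (a generic pure-DP membership-inference audit, not the paper's Rényi DP techniques), the following sketch turns attack true/false positive counts into a high-confidence lower bound on $\varepsilon$.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha=0.05):
    """Two-sided Clopper-Pearson confidence interval for a binomial proportion."""
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) if successes < trials else 1.0
    return lo, hi

def empirical_epsilon_lower_bound(tp, n_pos, fp, n_neg, alpha=0.05):
    """Lower-bound the epsilon of a pure-DP mechanism from the outcomes of a
    distinguishing attack: tp/n_pos successes on inputs containing the canary,
    fp/n_neg on inputs without it. Any epsilon-DP mechanism satisfies
    TPR <= exp(eps) * FPR, hence eps >= ln(TPR / FPR)."""
    tpr_lo, _ = clopper_pearson(tp, n_pos, alpha)
    _, fpr_hi = clopper_pearson(fp, n_neg, alpha)
    if fpr_hi == 0:
        return np.inf
    return max(0.0, np.log(tpr_lo / fpr_hi)) if tpr_lo > 0 else 0.0

# Example: 900/1000 true positives vs. 100/1000 false positives.
print(empirical_epsilon_lower_bound(900, 1000, 100, 1000))
```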
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in observable data in representation form. Separating the underlying factors of variation into variables with semantic meaning facilitates learning explainable representations of data, imitating the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively review DRL from various aspects, including motivations, definitions, methodologies, evaluations, applications, and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., the Intuitive Definition and the Group Theory Definition. We further categorize the methodologies for DRL into five groups, i.e., Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches, and Other Approaches. We also analyze principles for designing different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.
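As one concrete representative of the Variational Auto-encoder based family mentioned above, the $\beta$-VAE objective (Higgins et al., 2017) encourages disentanglement by up-weighting the KL term; it is included here only as an illustration, not as a construct of this survey.

```latex
% beta-VAE: reconstruction term plus a beta-weighted KL term that pushes the
% approximate posterior toward an isotropic prior, encouraging factorized
% (disentangled) latent codes for beta > 1.
\mathcal{L}_{\beta\text{-VAE}}(\theta,\phi;x)
  \;=\; \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]
  \;-\; \beta\, D_{\mathrm{KL}}\!\left(q_{\phi}(z \mid x)\,\|\,p(z)\right).
```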
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
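A simplified sketch of the merging step that the gating above refers to: at a character position where one or more lexicon words end, the candidate character cell and the word cells are combined with element-wise softmax-normalized gates. The computation of the gate logits and all other parameter details are omitted, and the function name is illustrative.

```python
import numpy as np

def merge_char_and_word_cells(c_tilde, gate_char, word_cells, gate_words):
    """At character position e, combine the character candidate cell `c_tilde`
    with the cells of lexicon words ending at e, weighting each source by
    element-wise softmax-normalized gate logits (all arrays: [hidden_dim])."""
    cells = np.stack(word_cells + [c_tilde])        # [num_words + 1, H]
    gates = np.stack(gate_words + [gate_char])      # [num_words + 1, H]
    alphas = np.exp(gates - gates.max(axis=0))      # element-wise softmax over sources
    alphas /= alphas.sum(axis=0)
    return (alphas * cells).sum(axis=0)             # merged character cell state c_e
```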
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
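A minimal sketch of the scaled dot-product attention at the core of the Transformer (the full model adds multi-head projections, positional encodings, and feed-forward layers); array shapes and names are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q: [n_q, d_k], K: [n_k, d_k], V: [n_k, d_v]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # e.g. a causal mask in the decoder
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Self-attention over a toy sequence of 5 tokens with d_model = 8.
x = np.random.randn(5, 8)
print(scaled_dot_product_attention(x, x, x).shape)   # (5, 8)
```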