A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received increased attention in recent years. However, for fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. In this paper, we derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm, which holds for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. These properties are satisfied by the UCB algorithm, and our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
High-dimensional regression and regression with a left-censored response are each well-studied topics. In spite of this, few methods have been proposed which deal with both of these complications simultaneously. The Tobit model -- long the standard method for censored regression in economics -- has not been adapted for high-dimensional regression at all. To fill this gap and bring up-to-date techniques from high-dimensional statistics to the field of high-dimensional left-censored regression, we propose several penalized Tobit models. We develop a fast algorithm which combines quadratic minimization with coordinate descent to compute the penalized Tobit solution path. Theoretically, we analyze the Tobit lasso and Tobit with a folded concave penalty, bounding the $\ell_2$ estimation loss for the former and proving that a local linear approximation estimator for the latter possesses the strong oracle property. Through an extensive simulation study, we find that our penalized Tobit models provide more accurate predictions and parameter estimates than other methods. We use a penalized Tobit model to analyze high-dimensional left-censored HIV viral load data from the AIDS Clinical Trials Group and identify potential drug resistance mutations in the HIV genome. Appendices contain intermediate theoretical results and technical proofs.
New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better. In particular, data-driven learning approaches (i.e., Machine Learning (ML)) have been a true revolution in the advancement of multiple technologies in various application domains. But at the same time there is growing concern about certain intrinsic characteristics of these methodologies that carry potential risks to both safety and fundamental rights. Although there are mechanisms in the adoption process to minimize these risks (e.g., safety regulations), these do not exclude the possibility of harm occurring, and if this happens, victims should be able to seek compensation. Liability regimes will therefore play a key role in ensuring basic protection for victims using or interacting with these systems. However, the same characteristics that make AI systems inherently risky, such as lack of causality, opacity, unpredictability or their self and continuous learning capabilities, may lead to considerable difficulties when it comes to proving causation. This paper presents three case studies, as well as the methodology to reach them, that illustrate these difficulties. Specifically, we address the cases of cleaning robots, delivery drones and robots in education. The outcome of the proposed analysis suggests the need to revise liability regimes to alleviate the burden of proof on victims in cases involving AI technologies.
Approximate message passing (AMP) emerges as an effective iterative paradigm for solving high-dimensional statistical problems. However, prior AMP theory -- which focused mostly on high-dimensional asymptotics -- fell short of predicting the AMP dynamics when the number of iterations surpasses $o\big(\frac{\log n}{\log\log n}\big)$ (with $n$ the problem dimension). To address this inadequacy, this paper develops a non-asymptotic framework for understanding AMP in spiked matrix estimation. Built upon new decomposition of AMP updates and controllable residual terms, we lay out an analysis recipe to characterize the finite-sample behavior of AMP in the presence of an independent initialization, which is further generalized to allow for spectral initialization. As two concrete consequences of the proposed analysis recipe: (i) when solving $\mathbb{Z}_2$ synchronization, we predict the behavior of spectrally initialized AMP for up to $O\big(\frac{n}{\mathrm{poly}\log n}\big)$ iterations, showing that the algorithm succeeds without the need of a subsequent refinement stage (as conjectured recently by \citet{celentano2021local}); (ii) we characterize the non-asymptotic behavior of AMP in sparse PCA (in the spiked Wigner model) for a broad range of signal-to-noise ratio.
Although understanding and characterizing causal effects have become essential in observational studies, it is challenging when the confounders are high-dimensional. In this article, we develop a general framework $\textit{CausalEGM}$ for estimating causal effects by encoding generative modeling, which can be applied in both binary and continuous treatment settings. Under the potential outcome framework with unconfoundedness, we establish a bidirectional transformation between the high-dimensional confounders space and a low-dimensional latent space where the density is known (e.g., multivariate normal distribution). Through this, CausalEGM simultaneously decouples the dependencies of confounders on both treatment and outcome and maps the confounders to the low-dimensional latent space. By conditioning on the low-dimensional latent features, CausalEGM can estimate the causal effect for each individual or the average causal effect within a population. Our theoretical analysis shows that the excess risk for CausalEGM can be bounded through empirical process theory. Under an assumption on encoder-decoder networks, the consistency of the estimate can be guaranteed. In a series of experiments, CausalEGM demonstrates superior performance over existing methods for both binary and continuous treatments. Specifically, we find CausalEGM to be substantially more powerful than competing methods in the presence of large sample sizes and high dimensional confounders. The software of CausalEGM is freely available at //github.com/SUwonglab/CausalEGM.
The generalized coloring numbers of Kierstead and Yang (Order 2003) offer an algorithmically-useful characterization of graph classes with bounded expansion. In this work, we consider the hardness and approximability of these parameters. First, we complete the work of Grohe et al. (WG 2015) by showing that computing the weak 2-coloring number is NP-hard. Our approach further establishes that determining if a graph has weak $r$-coloring number at most $k$ is para-NP-hard when parameterized by $k$ for all $r \geq 2$. We adapt this to determining if a graph has $r$-coloring number at most $k$ as well, proving para-NP-hardness for all $r \geq 2$. Para-NP-hardness implies that no XP algorithm (runtime $O(n^{f(k)})$) exists for testing if a generalized coloring number is at most $k$. Moreover, there exists a constant $c$ such that it is NP-hard to approximate the generalized coloring numbers within a factor of $c$. To complement these results, we give an approximation algorithm for the generalized coloring numbers, improving both the runtime and approximation factor of the existing approach of Dvo\v{r}\'{a}k (EuJC 2013). We prove that greedily ordering vertices with small estimated backconnectivity achieves a $(k-1)^{r-1}$-approximation for the $r$-coloring number and an $O(k^{r-1})$-approximation for the weak $r$-coloring number.
Counterfactual explanations are an increasingly popular form of post hoc explanation due to their (i) applicability across problem domains, (ii) proposed legal compliance (e.g., with GDPR), and (iii) reliance on the contrastive nature of human explanation. Although counterfactual explanations are normally used to explain individual predictive-instances, we explore a novel use case in which groups of similar instances are explained in a collective fashion using ``group counterfactuals'' (e.g., to highlight a repeating pattern of illness in a group of patients). These group counterfactuals meet a human preference for coherent, broad explanations covering multiple events/instances. A novel, group-counterfactual algorithm is proposed to generate high-coverage explanations that are faithful to the to-be-explained model. This explanation strategy is also evaluated in a large, controlled user study (N=207), using objective (i.e., accuracy) and subjective (i.e., confidence, explanation satisfaction, and trust) psychological measures. The results show that group counterfactuals elicit modest but definite improvements in people's understanding of an AI system. The implications of these findings for counterfactual methods and for XAI are discussed.
The linear regression model is widely used in the biomedical and social sciences as well as in policy and business research to adjust for covariates and estimate the average effects of treatments. Behind every causal inference endeavor there is at least a notion of a randomized experiment. However, in routine regression analyses in observational studies, it is unclear how well the adjustments made by regression approximate key features of randomization experiments, such as covariate balance, study representativeness, sample boundedness, and unweighted sampling. In this paper, we provide software to empirically address this question. In the new lmw package for R, we compute the implied linear model weights for average treatment effects and provide diagnostics for them. The weights are obtained as part of the design stage of the study; that is, without using outcome information. The implementation is general and applicable, for instance, in settings with instrumental variables and multi-valued treatments; in essence, in any situation where the linear model is the vehicle for adjustment and estimation of average treatment effects with discrete-valued interventions.
Combining extreme value theory with Bayesian methods offers several advantages, such as a quantification of uncertainty on parameter estimation or the ability to study irregular models that cannot be handled by frequentist statistics. However, it comes with many options that are left to the user concerning model building, computational algorithms, and even inference itself. Among them, the parameterization of the model induces a geometry that can alter the efficiency of computational algorithms, in addition to making calculations involved. We focus on the Poisson process characterization of extremes and outline two key benefits of an orthogonal parameterization addressing both issues. First, several diagnostics show that Markov chain Monte Carlo convergence is improved compared with the original parameterization. Second, orthogonalization also helps deriving Jeffreys and penalized complexity priors, and establishing posterior propriety. The analysis is supported by simulations, and our framework is then applied to extreme level estimation on river flow data.
Following the breakthrough work of Tardos in the bit-complexity model, Vavasis and Ye gave the first exact algorithm for linear programming in the real model of computation with running time depending only on the constraint matrix. For solving a linear program (LP) $\max\, c^\top x,\: Ax = b,\: x \geq 0,\: A \in \mathbb{R}^{m \times n}$, Vavasis and Ye developed a primal-dual interior point method using a 'layered least squares' (LLS) step, and showed that $O(n^{3.5} \log (\bar{\chi}_A+n))$ iterations suffice to solve (LP) exactly, where $\bar{\chi}_A$ is a condition measure controlling the size of solutions to linear systems related to $A$. Monteiro and Tsuchiya, noting that the central path is invariant under rescalings of the columns of $A$ and $c$, asked whether there exists an LP algorithm depending instead on the measure $\bar{\chi}^*_A$, defined as the minimum $\bar{\chi}_{AD}$ value achievable by a column rescaling $AD$ of $A$, and gave strong evidence that this should be the case. We resolve this open question affirmatively. Our first main contribution is an $O(m^2 n^2 + n^3)$ time algorithm which works on the linear matroid of $A$ to compute a nearly optimal diagonal rescaling $D$ satisfying $\bar{\chi}_{AD} \leq n(\bar{\chi}^*)^3$. This algorithm also allows us to approximate the value of $\bar{\chi}_A$ up to a factor $n (\bar{\chi}^*)^2$. As our second main contribution, we develop a scaling invariant LLS algorithm, together with a refined potential function based analysis for LLS algorithms in general. With this analysis, we derive an improved $O(n^{2.5} \log n\log (\bar{\chi}^*_A+n))$ iteration bound for optimally solving (LP) using our algorithm. The same argument also yields a factor $n/\log n$ improvement on the iteration complexity bound of the original Vavasis-Ye algorithm.
Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to the iterative parameter updates. However, comparison of individual methods is nevertheless treated in isolation from real world application and typically judged by monitoring accumulated test set performance. The closed world assumption remains predominant. It is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge as neural networks are well known to provide overconfident false predictions on unknown instances and break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but highlights the natural synergies in a common framework. We empirically demonstrate improvements when alleviating catastrophic forgetting, querying data in active learning, selecting task orders, while exhibiting robust open world application where previously proposed methods fail.