A central problem in Binary Hypothesis Testing (BHT) is to determine the optimal tradeoff between the Type I (false alarm) error and the Type II (miss) error. In this context, a fundamental theoretical question is the exponential rate at which the optimal miss error probability converges to zero -- as the sample size tends to infinity -- under a (positive) restriction on the false alarm probability. Considering the more realistic context of a BHT with a finite number of observations, this paper presents a new non-asymptotic result for the scenario of a monotonic (sub-exponentially decreasing) restriction on the Type I error probability, which extends the result presented by Strassen in 2009. Building on concentration inequalities, we offer new upper and lower bounds on the optimal Type II error probability for the case of finitely many observations. Finally, the derived bounds are evaluated and interpreted numerically (as a function of the number of samples) for some vanishing Type I error restrictions.
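As a rough illustration of the asymptotic benchmark that such non-asymptotic bounds refine, the sketch below evaluates the classical Strassen-type normal approximation of the optimal Type II error for two Bernoulli hypotheses under an illustrative vanishing Type I restriction $\alpha_n = 1/n$; the chosen distributions, the restriction, and the omitted $O(\log n)$ term are assumptions of this sketch, not the paper's bounds.

```python
# Minimal sketch (not the paper's new bounds): the classical Strassen-type
# normal approximation log beta_n(alpha) ~ -[n*D + sqrt(n*V)*Phi^{-1}(alpha)]
# for testing P0 vs P1 from n i.i.d. samples, evaluated for two Bernoulli
# distributions and an illustrative vanishing Type I restriction alpha_n = 1/n.
import numpy as np
from scipy.stats import norm

def kl_and_variance(p0, p1):
    """Relative entropy D(P0||P1) and divergence variance V(P0||P1) for Bernoullis."""
    llr = np.log(np.array([p0, 1 - p0]) / np.array([p1, 1 - p1]))  # log-likelihood ratios
    w = np.array([p0, 1 - p0])                                     # probabilities under P0
    D = float(w @ llr)
    V = float(w @ (llr - D) ** 2)
    return D, V

p0, p1 = 0.3, 0.5
D, V = kl_and_variance(p0, p1)
for n in [10, 100, 1000, 10000]:
    alpha_n = 1.0 / n                      # sub-exponentially vanishing Type I restriction
    log_beta = -(n * D + np.sqrt(n * V) * norm.ppf(alpha_n))
    print(f"n={n:6d}  alpha_n={alpha_n:.1e}  log beta_n ~ {log_beta:9.2f}")
```

For small $n$ this approximation can even exceed zero, which is precisely the regime where non-asymptotic bounds of the kind derived in the paper are needed.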
We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.
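A minimal sketch of this fix-the-decoder viewpoint, assuming a Bernoulli test design parameterised by a $T \times N$ matrix of participation probabilities and a sample-average approximation of the expected number of DND (false-positive) errors; the sizes, priors, and optimiser settings are illustrative, and the paper's exact objective and optimisation details may differ.

```python
# Hedged sketch: optimise a Bernoulli test design Q (T x N participation
# probabilities) for the DND/COMP decoder under non-identical priors. DND only
# makes false-positive errors; conditioned on the true defect vector, the
# expected number of false positives has a closed form, so we minimise a
# sample-average approximation of it by gradient descent.
import torch

torch.manual_seed(0)
N, T, M = 50, 12, 256                        # items, tests, sampled defect vectors
p = torch.rand(N) * 0.08 + 0.01              # non-identical infection priors
d = (torch.rand(M, N) < p).float()           # Monte Carlo defect vectors, shape (M, N)

logits = torch.zeros(T, N, requires_grad=True)   # unconstrained design parameters
opt = torch.optim.Adam([logits], lr=0.05)

for step in range(500):
    Q = torch.sigmoid(logits).clamp(1e-3, 1 - 1e-3)    # participation probabilities
    # s[m, t] = P(test t negative | defect vector d_m) = prod_{j defective} (1 - Q[t, j])
    s = torch.exp(d @ torch.log1p(-Q).T)               # shape (M, T)
    # P(item i appears in no negative test | d_m) = prod_t (1 - Q[t, i] * s[m, t])
    miss_neg = torch.exp(torch.log1p(-s.unsqueeze(2) * Q.unsqueeze(0)).sum(dim=1))  # (M, N)
    false_pos = ((1 - d) * miss_neg).sum(dim=1).mean() # expected number of DND errors (SAA)
    opt.zero_grad()
    false_pos.backward()
    opt.step()

print("estimated expected number of DND errors:", false_pos.item())
```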
Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful family of approximations consists of Kronecker-factored, block-diagonal curvature estimates (KFAC). Here, we combine tools from prior work to evaluate exact second-order updates with careful ablations and establish a surprising result: due to its approximations, KFAC is not closely related to second-order updates, and in particular, it significantly outperforms true second-order updates. This challenges widely held beliefs and immediately raises the question of why KFAC performs so well. We answer this question by showing that KFAC approximates a first-order algorithm, which performs gradient descent on neurons rather than weights. Finally, we show that this optimizer often improves over KFAC in terms of computational cost and data-efficiency.
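For reference, the sketch below shows one common presentation of the KFAC-style preconditioned update for a single fully connected layer, the kind of update such ablations compare against exact second-order and neuron-wise first-order updates; the shapes, damping value, and synthetic statistics are illustrative assumptions.

```python
# Minimal numpy sketch of a KFAC-style preconditioned update for one fully
# connected layer with weight W of shape [out, in]: the curvature block is
# approximated via the Kronecker factors A (input second moment) and
# G (backpropagated-gradient second moment), giving the update
# (G + damping*I)^{-1} dW (A + damping*I)^{-1}.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, damping = 64, 20, 10, 1e-2

a = rng.normal(size=(batch, d_in))       # layer inputs (activations)
g = rng.normal(size=(batch, d_out))      # gradients w.r.t. layer pre-activations
dW = g.T @ a / batch                     # ordinary gradient of the loss w.r.t. W

A = a.T @ a / batch                      # Kronecker factor from the inputs
G = g.T @ g / batch                      # Kronecker factor from the output gradients
precond_dW = np.linalg.solve(G + damping * np.eye(d_out), dW)
precond_dW = np.linalg.solve(A + damping * np.eye(d_in), precond_dW.T).T
# A full optimizer would then take W <- W - lr * precond_dW and refresh
# A and G with exponential moving averages over minibatches.
```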
We prove two theorems related to the Central Limit Theorem (CLT) for Martin-L\"of Random (MLR) sequences. Martin-L\"of randomness attempts to capture what it means for a sequence of bits to be "truly random". By contrast, CLTs make no assertions about the behavior of a single random sequence, but only about the distributional behavior of a sequence of random variables. Semantically, we usually interpret CLTs as assertions about the collective behavior of infinitely many sequences. Yet, our intuition is that if a sequence of bits is "truly random", then it should provide a "source of randomness" for which CLT-type results should hold. We tackle this difficulty by using a sampling scheme that generates an infinite number of samples from a single binary sequence. We show that when we apply this scheme to a Martin-L\"of random sequence, the empirical moments and cumulative distribution functions (CDFs) of these samples tend to their corresponding counterparts for the normal distribution. We also prove the well-known almost sure central limit theorem (ASCLT), which provides an alternative, albeit less intuitive, answer to this question. Both results are also generalized to Schnorr random sequences.
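As a toy illustration of the flavour of such results (using a pseudorandom bit string rather than a Martin-L\"of random sequence, and a simple block-sampling scheme that need not match the paper's), one can compare the empirical CDF and moments of normalised block sums against the standard normal:

```python
# Toy illustration only: carve a long pseudorandom bit string into blocks,
# form normalised block sums, and compare their empirical CDF and moments with
# the standard normal. The paper proves analogous convergence for samples
# generated from a single Martin-Löf random sequence.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_blocks, block_len = 2000, 400
bits = rng.integers(0, 2, size=n_blocks * block_len)

blocks = bits.reshape(n_blocks, block_len)
z = (blocks.sum(axis=1) - block_len / 2) / np.sqrt(block_len / 4)  # normalised block sums

grid = np.linspace(-3, 3, 13)
emp_cdf = (z[:, None] <= grid[None, :]).mean(axis=0)
print("max |empirical CDF - Phi| on grid:", np.abs(emp_cdf - norm.cdf(grid)).max())
print("first two empirical moments:", z.mean(), (z ** 2).mean())
```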
This paper presents an algorithm to generate a new kind of polygonal mesh obtained from triangulations. Each polygon is built from a terminal-edge region surrounded by edges that are not the longest edge of either of the two triangles that share them. The algorithm is divided into three phases. The first phase labels each edge and triangle of the input triangulation according to its size; the second phase builds polygons (simple or not) from terminal-edge regions using this labeling; and the third phase transforms each non-simple polygon into simple ones. The final mesh contains both convex and nonconvex polygons. Since Voronoi-based meshes are currently the most widely used polygonal meshes, we compare some geometric properties of our meshes against constrained Voronoi meshes. Several experiments are run to compare the shape and size of the polygons, as well as the number of points and polygons in the final mesh. Finally, we validate these polygonal meshes by solving a Laplace equation on an L-shaped domain using the Virtual Element Method (VEM) and show the optimal convergence rate of the numerical solution.
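A small sketch of a phase-one style labeling, assuming a Delaunay triangulation built with scipy and using edge lengths only; the exact labeling and region-growing rules of the paper's algorithm may differ.

```python
# Illustrative sketch (not the paper's exact algorithm): for every triangle of
# a triangulation find its longest edge, and mark each edge by whether it is
# the longest edge of some triangle sharing it. Edges that are the longest
# edge of no incident triangle are candidate boundaries of terminal-edge
# regions from which polygons are built.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
pts = rng.random((200, 2))
tri = Delaunay(pts)

longest_of_some_triangle = set()
all_edges = set()
for simplex in tri.simplices:                       # each triangle as 3 point indices
    edges = [tuple(sorted((simplex[i], simplex[(i + 1) % 3]))) for i in range(3)]
    lengths = [np.linalg.norm(pts[a] - pts[b]) for a, b in edges]
    all_edges.update(edges)
    longest_of_some_triangle.add(edges[int(np.argmax(lengths))])

region_boundary_edges = all_edges - longest_of_some_triangle
print(len(all_edges), "edges,", len(region_boundary_edges), "candidate region-boundary edges")
```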
Researchers are often faced with evaluating the effect of a policy or program that was simultaneously initiated across an entire population of units at a single point in time, and whose effects on the targeted population can manifest at any time period afterwards. In the presence of data measured over time, Bayesian time series models have been used to impute what would have happened after the policy was initiated, had the policy not taken place, in order to estimate causal effects. However, the considerations regarding the definition of the target estimands, the underlying assumptions, the plausibility of such assumptions, and the choice of an appropriate model have not been thoroughly investigated. In this paper, we establish useful estimands for the evaluation of large-scale policies. We discuss how imputation of missing potential outcomes relies on an assumption which, even though untestable, can be partially evaluated using observed data. We illustrate an approach to evaluate this key causal assumption and facilitate model elicitation based on data from the time interval before policy initiation, using classic statistical techniques. As an illustration, we study the Hospital Readmissions Reduction Program (HRRP), a US federal intervention aiming to improve health outcomes for patients with pneumonia, acute myocardial infarction, or congestive heart failure admitted to a hospital. We evaluate the effect of the HRRP on population mortality across the US and in four geographic subregions, and at different time windows. We find that the HRRP increased mortality from the three targeted conditions across most scenarios considered, and is likely to have had a detrimental effect on public health.
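A minimal sketch of the impute-then-contrast logic on synthetic data, assuming a plain least-squares trend-plus-seasonality model in place of the Bayesian time series models discussed in the paper.

```python
# Hedged sketch: fit a model on the pre-policy period only, impute the
# untreated potential outcomes after policy initiation, and take
# observed-minus-imputed as the estimated effect. Data and model are synthetic
# stand-ins, not the paper's analysis of the HRRP.
import numpy as np

rng = np.random.default_rng(0)
T0, T1 = 60, 24                               # months before / after policy initiation
t = np.arange(T0 + T1)
y = 100 + 0.3 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, T0 + T1)
y[T0:] += 2.5                                  # a synthetic post-policy shift

X = np.column_stack([np.ones_like(t, dtype=float), t,
                     np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])
beta, *_ = np.linalg.lstsq(X[:T0], y[:T0], rcond=None)   # fit on pre-policy data only
y0_imputed = X[T0:] @ beta                                # imputed outcomes absent the policy
print("estimated average effect:", (y[T0:] - y0_imputed).mean())
```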
In this paper, we introduce a novel family of iterative algorithms that carry out $\alpha$-divergence minimisation in a Variational Inference context. They do so by ensuring a systematic decrease at each step in the $\alpha$-divergence between the variational and the posterior distributions. In its most general form, the variational distribution is a mixture model, and our framework allows us to simultaneously optimise the weights and component parameters of this mixture model. Notably, our approach makes it possible to build on various methods previously proposed for $\alpha$-divergence minimisation, such as Gradient or Power Descent schemes, and we also shed new light on an integrated Expectation Maximization algorithm. Lastly, we provide empirical evidence that our methodology yields improved results on several multimodal target distributions.
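For concreteness, the sketch below Monte Carlo estimates the $\alpha$-divergence between a Gaussian-mixture variational distribution and a bimodal target under one common convention; the densities, convention, and sample size are illustrative, and the paper's algorithms additionally update the mixture weights and component parameters so that this quantity decreases at every step.

```python
# Illustrative Monte Carlo estimate of an alpha-divergence D_alpha(q || p)
# between a Gaussian-mixture variational distribution q and a bimodal target p,
# using the convention D_alpha = ( E_q[(p/q)^(1-alpha)] - 1 ) / (alpha*(alpha-1)).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha = 0.5

# variational mixture q: weights, means, standard deviations (illustrative values)
w = np.array([0.5, 0.5])
mu = np.array([-1.5, 1.0])
sd = np.array([1.0, 0.8])

def log_q(x):
    return np.log(np.sum(w * norm.pdf(x[:, None], mu, sd), axis=1))

def log_p(x):                      # bimodal target density
    return np.log(0.3 * norm.pdf(x, -2.0, 0.7) + 0.7 * norm.pdf(x, 2.0, 1.2))

comp = rng.choice(2, size=5000, p=w)           # sample components, then values, from q
x = rng.normal(mu[comp], sd[comp])
ratio = np.exp((1 - alpha) * (log_p(x) - log_q(x)))
print("estimated D_alpha(q || p):", (ratio.mean() - 1) / (alpha * (alpha - 1)))
```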
We study FO+, a fragment of first-order logic on finite words in which monadic predicates can only appear positively. We show that there is an FO-definable language that is monotone in the monadic predicates but not definable in FO+. This provides a simple proof that Lyndon's preservation theorem fails on finite structures. We lift this example language to finite graphs, thereby providing a new result of independent interest for FO-definable graph classes: negation might be needed even when the class is closed under the addition of edges. Finally, we show that it is undecidable whether a given regular language of finite words is definable in FO+.
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces, namely the linear MDP setting. Here $V_1^\star$ is the value of the optimal policy and $K$ is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.
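For context, the sketch below implements the basic scalar Catoni mean estimator on heavy-tailed data; the influence function is the standard one, while the scale parameter here is only an illustrative default, and the paper's robust self-normalised variant is more involved.

```python
# Sketch of the basic scalar Catoni robust mean estimator: theta solves
# sum_i psi(alpha * (X_i - theta)) = 0 with the influence function
# psi(x) = sign(x) * log(1 + |x| + x^2 / 2); alpha trades off bias and tails.
import numpy as np
from scipy.optimize import brentq

def psi(x):
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x ** 2)

def catoni_mean(x, alpha):
    f = lambda theta: psi(alpha * (x - theta)).sum()
    return brentq(f, x.min() - 1.0, x.max() + 1.0)   # f is decreasing in theta

rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=2000) + 1.0          # heavy-tailed samples, true mean 1
alpha = np.sqrt(2 * np.log(1 / 0.01) / (len(x) * x.var()))  # illustrative scale choice
print("empirical mean:", x.mean(), " Catoni estimate:", catoni_mean(x, alpha))
```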
We study the problem of estimating the diagonal of an implicitly given matrix $A$. For such a matrix we have access to an oracle that allows us to evaluate the matrix-vector product $Av$. For a random vector $v$ drawn from an appropriate distribution, this product can be used to return an estimate of the diagonal of the matrix $A$. Whilst probabilistic guarantees exist for the error of estimates of the trace of $A$, no such results have yet been derived for the diagonal. We analyse the number of queries $s$ required to guarantee that, with probability at least $1-\delta$, the relative error of the estimated diagonal entries is at most $\varepsilon$. We extend this analysis to the 2-norm of the difference between the estimate and the diagonal of $A$. We prove, discuss and experiment with bounds on the number of queries $s$ required to guarantee a probabilistic bound on the estimates of the diagonal, employing Rademacher and Gaussian random variables. Two sufficient upper bounds on the minimum number of query vectors are proved, extending the work of Avron and Toledo [JACM 58(2)8, 2011] and the later work of Roosta-Khorasani and Ascher [FoCM 15, 1187-1212, 2015]. We find that, generally, there is little difference between the two, with convergence going as $O(\log(1/\delta)/\varepsilon^2)$ for individual diagonal elements. However, for small $s$, we find that the Rademacher estimator is superior. These results allow us to extend the ideas of Meyer, Musco, Musco and Woodruff [SOSA, 142-155, 2021], suggesting the algorithm Diag++, to speed up the convergence of diagonal estimation from $O(1/\varepsilon^2)$ to $O(1/\varepsilon)$ and make it robust to the spectrum of any positive semi-definite matrix $A$.
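A short sketch of the matvec-only diagonal estimators discussed here, with Rademacher and elementwise-normalised Gaussian query vectors on a synthetic positive semi-definite matrix; the Diag++ idea mentioned at the end builds further on these basic estimators and is not shown.

```python
# Sketch of matrix-free diagonal estimation: with only matvec access v -> A v,
# average v ∘ (A v) over s random query vectors. For Rademacher queries
# E[v v^T] = I, so the plain average is unbiased; the Gaussian variant
# normalises elementwise by sum_k v_k ∘ v_k.
import numpy as np

rng = np.random.default_rng(0)
n, s = 500, 200
A = rng.normal(size=(n, n))
A = A @ A.T / n                                       # a synthetic PSD test matrix
matvec = lambda v: A @ v                              # the only access we assume to A

V = rng.choice([-1.0, 1.0], size=(n, s))              # Rademacher query vectors
AV = np.column_stack([matvec(V[:, k]) for k in range(s)])
diag_rademacher = (V * AV).mean(axis=1)               # (1/s) sum_k v_k ∘ (A v_k)

G = rng.normal(size=(n, s))                           # Gaussian query vectors
AG = np.column_stack([matvec(G[:, k]) for k in range(s)])
diag_gaussian = (G * AG).sum(axis=1) / (G * G).sum(axis=1)

true = np.diag(A)
print("relative 2-norm error (Rademacher):", np.linalg.norm(diag_rademacher - true) / np.linalg.norm(true))
print("relative 2-norm error (Gaussian):  ", np.linalg.norm(diag_gaussian - true) / np.linalg.norm(true))
```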
We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03) and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound (KDD'14) that holds if the algorithm is required to read a submatrix. Our algorithm is the first that does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query-complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, including the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
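For contrast with the non-adaptive pattern-based algorithm described above, here is a hedged sketch of a submatrix-style baseline tester of the kind the lower bound applies to; the sampled submatrix size is an illustrative choice, not the parameters of the cited prior work.

```python
# Illustrative submatrix-style rank tester (baseline flavour only): accept if a
# randomly sampled k x k submatrix has rank at most d. The abstract's
# query-efficient algorithm instead reads a carefully chosen non-adaptive
# pattern of entries in rows and columns of A.
import numpy as np

def submatrix_rank_test(A, d, eps, rng):
    k = min(A.shape[0], int(np.ceil(4 * d / eps)))    # illustrative submatrix size
    rows = rng.choice(A.shape[0], size=k, replace=False)
    cols = rng.choice(A.shape[1], size=k, replace=False)
    return np.linalg.matrix_rank(A[np.ix_(rows, cols)]) <= d

rng = np.random.default_rng(0)
n, d = 400, 5
low_rank = rng.normal(size=(n, d)) @ rng.normal(size=(d, n))   # rank exactly d
far = low_rank + rng.normal(size=(n, n))                       # far from rank d
print(submatrix_rank_test(low_rank, d, eps=0.1, rng=rng))  # expect True
print(submatrix_rank_test(far, d, eps=0.1, rng=rng))       # expect False
```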