This work considers the problem of noisy binary search in a sorted array. The noise is modeled by a parameter $p$: each comparison is incorrect with probability $p$, independently of other queries. We give two types of upper bounds on the number of queries, for the worst-case and for the expected query complexity. The bounds improve upon those known to date, i.e., our algorithms require fewer queries; additionally, they have simpler statements and work for the full range of parameters. All query complexities for the expected-query scenario are tight up to lower-order terms. For the problem where the target prior is uniform over all possible inputs, we provide an algorithm whose expected query complexity is upper-bounded by $(\log_2 n + \log_2 \delta^{-1} + 3)/I(p)$, where $n$ is the domain size, $0\le p < 1/2$ is the noise ratio, $\delta>0$ is the failure probability, and $I(p)$ is the information gain function. As a side effect, we resolve some correctness issues in previous work. En route, we also obtain new and improved query complexities for search generalized to arbitrary graphs. This paper continues and improves upon the lines of research of Burnashev and Zigangirov [Probl. Peredachi Informatsii, 1974], Ben-Or and Hassidim [FOCS 2008], Gu and Xu [STOC 2023], Emamjomeh-Zadeh et al. [STOC 2016], and Dereniowski et al. [SOSA@SODA 2019].
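For concreteness, the expected-query bound above is easy to evaluate numerically. In this line of work $I(p)$ typically denotes the capacity of the binary symmetric channel, $I(p) = 1 - H(p)$ with $H$ the binary entropy function; we assume that reading here, and the function names in the sketch below are ours:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = 0."""
    if p == 0.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_query_bound(n, p, delta):
    """Upper bound (log2 n + log2(1/delta) + 3) / I(p) on the expected
    number of noisy comparisons, assuming I(p) = 1 - H(p)."""
    information_gain = 1.0 - binary_entropy(p)
    return (math.log2(n) + math.log2(1.0 / delta) + 3.0) / information_gain

# Example: searching among n = 2**20 keys with 10% comparison noise
# and 1% failure probability.
print(expected_query_bound(2**20, 0.10, 0.01))
```

As expected, the bound blows up as $p \to 1/2$, where each comparison carries vanishing information.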
Interpolators are unstable. For example, the minimum $\ell_2$ norm least squares interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensembling stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least squares estimator, which can then be formulated as an average of sketched least squares estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least squares estimators in both underparameterized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
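As a rough illustration of the construction described above (a sketch under our assumptions, not the authors' code; the weighting schemes and function names are ours), the bagged estimator is an average of weighted minimum-norm least squares fits, where Bernoulli multiplier weights give the Bernoulli bootstrap and multinomial weights recover the classical bootstrap with replacement:

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_least_squares(X, y, n_bags=50, scheme="bernoulli", q=0.7):
    """Average of weighted (sketched) minimum-norm least squares fits.

    Each bag draws non-negative multiplier weights w_i and solves the
    weighted problem via the pseudoinverse, which returns the minimum
    l2-norm solution in the overparameterized regime.
    """
    n, _ = X.shape
    betas = []
    for _ in range(n_bags):
        if scheme == "bernoulli":   # Bernoulli bootstrap: keep each row w.p. q
            w = rng.binomial(1, q, size=n).astype(float)
        else:                       # classical bootstrap with replacement
            w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        sw = np.sqrt(w)[:, None]
        betas.append(np.linalg.pinv(sw * X) @ (sw[:, 0] * y))
    return np.mean(betas, axis=0)
```

Each bag can be fit independently, which is what makes the scheme embarrassingly parallel.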
We consider network-based decentralized optimization problems, where each node in the network possesses a local function and the objective is to collectively attain a consensus solution that minimizes the sum of all the local functions. A major challenge in decentralized optimization is the reliance on communication, which remains a considerable bottleneck in many applications. To address this challenge, we propose an adaptive randomized communication-efficient algorithmic framework that reduces the volume of communication by periodically tracking the disagreement error and judiciously selecting the most influential and effective edges at each node for communication. Within this framework, we present two algorithms: Adaptive Consensus (AC) to solve the consensus problem and Adaptive Consensus based Gradient Tracking (AC-GT) to solve smooth strongly convex decentralized optimization problems. We establish strong theoretical convergence guarantees for the proposed algorithms and quantify their performance in terms of various algorithmic parameters under standard assumptions. Finally, numerical experiments showcase the effectiveness of the framework in significantly reducing the information exchange required to achieve a consensus solution.
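The AC/AC-GT algorithms involve disagreement tracking and edge selection rules beyond the scope of this summary; the following toy sketch (our simplification, not the proposed method) only conveys the flavor of skipping communication over edges whose current disagreement is small:

```python
import numpy as np

def adaptive_consensus_sketch(x, W, n_iters=100, tol=1e-3):
    """Toy event-triggered consensus: average only over edges whose
    current disagreement exceeds a threshold, skipping the rest.

    x : initial node values, shape (n,)
    W : symmetric mixing weights, zero diagonal, rows summing below 1
    """
    x = x.astype(float).copy()
    for _ in range(n_iters):
        diff = x[None, :] - x[:, None]           # diff[i, j] = x_j - x_i
        active = (W > 0) & (np.abs(diff) > tol)  # communicate only where it matters
        x += (W * active * diff).sum(axis=1)     # partial averaging step
    return x
```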
A graph $G$ is a multi-interval PCG if there exist an edge-weighted tree $T$ with non-negative real weights and disjoint intervals of the non-negative real half-line such that each node of $G$ is uniquely associated with a leaf of $T$ and there is an edge between two nodes in $G$ if and only if the weighted distance between their corresponding leaves in $T$ lies within one of these intervals. If the number of intervals is $k$, then we call the graph a $k$-interval-PCG; in symbols, $G = k\text{-interval-PCG}(T, I_1, \ldots, I_k)$. It is known that 2-interval-PCGs do not contain all graphs, and the smallest known graph outside this class has 135 nodes. Here we prove that all graphs with at most 8 nodes are 2-interval-PCGs, thus making one step towards determining the smallest value of $n$ such that there exists an $n$-node graph that is not a 2-interval-PCG.
This work considers Bayesian experimental design for the inverse boundary value problem of linear elasticity in a two-dimensional setting. The aim is to optimize the positions of compactly supported pressure activations on the boundary of the examined body in order to maximize the value of the resulting boundary deformations as data for the inverse problem of reconstructing the Lam\'e parameters inside the object. We resort to a linearized measurement model and adopt the framework of Bayesian experimental design, under the assumption that the prior and measurement noise distributions are mutually independent Gaussians. This enables the use of the standard Bayesian A-optimality criterion for deducing optimal positions for the pressure activations. The (second) derivatives of the boundary measurements with respect to the Lam\'e parameters and the positions of the boundary pressure activations are deduced to allow minimizing the corresponding objective function, i.e., the trace of the covariance matrix of the posterior distribution, by a gradient-based optimization algorithm. Two-dimensional numerical experiments are performed to demonstrate the functionality of our approach.
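Under the stated linearized Gaussian model, the A-optimality objective has a closed form. A minimal sketch follows, assuming $J$ denotes the Jacobian of the boundary measurements with respect to the Lamé parameters for a given placement of the pressure activations (the function name is ours):

```python
import numpy as np

def a_optimality(J, noise_cov, prior_cov):
    """Trace of the Gaussian posterior covariance for the linearized
    model y = J @ theta + noise, with mutually independent Gaussian
    prior and noise:
        Sigma_post = (J^T Gamma_noise^{-1} J + Gamma_prior^{-1})^{-1}.
    """
    info = J.T @ np.linalg.solve(noise_cov, J) + np.linalg.inv(prior_cov)
    return np.trace(np.linalg.inv(info))
```

A gradient-based optimizer then moves the activation positions to minimize this trace, which is why the derivatives of the measurements with respect to those positions are needed.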
Challenges with data in the big-data era include (i) that the dimension $p$ is often larger than the sample size $n$ and (ii) that outliers or contaminated points are frequently hidden and more difficult to detect. Challenge (i) renders most conventional methods inapplicable and has thus attracted tremendous attention from the statistics, computer science, and bio-medical communities; numerous penalized regression methods have been introduced as modern tools for analyzing high-dimensional data. Challenge (ii), by contrast, has received disproportionately little attention. Penalized regression methods handle challenge (i) very well and are often expected to handle challenge (ii) simultaneously. Most of them, however, can break down under a single outlier (or a single adversarially contaminated point), as revealed in this article. The article systematically examines leading penalized regression methods in the literature in terms of their robustness, provides a quantitative assessment, and shows that most of them can break down under a single outlier. Consequently, a novel robust penalized regression method based on the least sum of squares of depth-trimmed residuals is proposed and studied carefully. Experiments with simulated and real data reveal that the newly proposed method can outperform some leading competitors in estimation and prediction accuracy in the cases considered.
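The sensitivity to a single contaminated point is easy to reproduce. A toy demonstration (our construction, using the lasso as one representative penalized method) shows the estimation error growing without bound as the magnitude of one contaminated response increases:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 50, 100                                   # p > n: high-dimensional setting
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                                   # sparse true coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

for magnitude in [0.0, 1e2, 1e4, 1e6]:
    y_c = y.copy()
    y_c[0] += magnitude                          # a single contaminated response
    fit = Lasso(alpha=0.1).fit(X, y_c)
    print(magnitude, np.linalg.norm(fit.coef_ - beta))
```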
A {\em hole} is an induced cycle of length at least four, and an {\em odd hole} is a hole of odd length. A {\em fork} is a graph obtained from $K_{1,3}$ by subdividing an edge once. An {\em odd balloon} is a graph obtained from an odd hole by identifying two consecutive vertices of the hole with two leaves of $K_{1, 3}$, respectively. A {\em gem} is a graph that consists of a $P_4$ plus a vertex adjacent to all vertices of the $P_4$. A {\em butterfly} is a graph obtained from two triangles by sharing exactly one vertex. A graph $G$ is {\em perfectly divisible} if for each induced subgraph $H$ of $G$, $V(H)$ can be partitioned into $A$ and $B$ such that $H[A]$ is perfect and $\omega(H[B])<\omega(H)$. In this paper, we show that (odd balloon, fork)-free graphs are perfectly divisible (this generalizes some results of Karthick {\em et al.}). As an application, we show that $\chi(G)\le\binom{\omega(G)+1}{2}$ if $G$ is (fork, gem)-free or (fork, butterfly)-free.
We consider the minimal thermodynamic cost of an individual computation, where a single input $x$ is mapped to a single output $y$. In prior work, Zurek proposed that this cost was given by $K(x\vert y)$, the conditional Kolmogorov complexity of $x$ given $y$ (up to an additive constant which does not depend on $x$ or $y$). However, this result was derived from an informal argument, applied only to deterministic computations, and had an arbitrary dependence on the choice of protocol (via the additive constant). Here we use stochastic thermodynamics to derive a generalized version of Zurek's bound from a rigorous Hamiltonian formulation. Our bound applies to all quantum and classical processes, whether noisy or deterministic, and it explicitly captures the dependence on the protocol. We show that $K(x\vert y)$ is a minimal cost of mapping $x$ to $y$ that must be paid using some combination of heat, noise, and protocol complexity, implying a tradeoff between these three resources. Our result is a kind of "algorithmic fluctuation theorem" with implications for the relationship between the Second Law and the Physical Church-Turing thesis.
We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on Euclidean space. To propose an accelerated gradient descent algorithm for such problems, we consider the gradient flow of transport maps that give push-forward measures of an initial measure. We then propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm is not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new notions of convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we show that the accelerated algorithm converges faster than a standard gradient descent algorithm. Numerical experiments support this theoretical result.
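For reference, the classical Euclidean recursion being lifted to transport maps is Nesterov's accelerated gradient method; with step size $\eta$ and the common momentum coefficient $k/(k+3)$ (a standard choice, assumed here rather than taken from the paper), one step reads

$$x_{k+1} = y_k - \eta\,\nabla f(y_k), \qquad y_{k+1} = x_{k+1} + \frac{k}{k+3}\,(x_{k+1} - x_k).$$

The accelerated algorithm applies an analogue of this two-sequence update to transport maps acting on the initial measure.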
The history of the seemingly simple problem of straight line fitting in the presence of both $x$ and $y$ errors has been fraught with misadventure, with statistically ad hoc and poorly tested methods abounding in the literature. The problem stems from the emergence of latent variables describing the "true" values of the independent variables, the priors on which have a significant impact on the regression result. By analytic calculation of maximum a posteriori values and biases, and comprehensive numerical mock tests, we assess the quality of possible priors. In the presence of intrinsic scatter, the only prior that we find to give reliably unbiased results in general is a mixture of one or more Gaussians with means and variances determined as part of the inference. We find that a single Gaussian is typically sufficient and dub this model Marginalised Normal Regression (MNR). We illustrate the necessity for MNR by comparing it to alternative methods on an important linear relation in cosmology, and extend it to nonlinear regression and an arbitrary covariance matrix linking $x$ and $y$. We publicly release a Python/Jax implementation of MNR and its Gaussian mixture model extension that is coupled to Hamiltonian Monte Carlo for efficient sampling, which we call ROXY (Regression and Optimisation with X and Y errors).
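To make the model concrete: in the linear case $y = m x_t + c$ with observed abscissa $x = x_t + \epsilon_x$, Gaussian intrinsic scatter $\sigma_{\rm int}$, and a single-Gaussian prior $x_t \sim \mathcal{N}(\mu, w^2)$ whose mean and variance are inferred, marginalising over the latent $x_t$ gives a bivariate Gaussian likelihood for each datum (our notation; a sketch of the MNR construction rather than ROXY's exact parametrisation):

$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu \\ m\mu + c \end{pmatrix},\; \begin{pmatrix} w^2 + \sigma_{x,i}^2 & m\,w^2 \\ m\,w^2 & m^2 w^2 + \sigma_{y,i}^2 + \sigma_{\rm int}^2 \end{pmatrix} \right).$$

The off-diagonal term $m\,w^2$ shows how the inferred prior width couples the $x$ and $y$ errors, which is precisely where ad hoc choices of prior bias the slope.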
Every large $k$-connected graph-minor induces a $k$-tangle in its ambient graph. The converse holds for $k\le 3$, but fails for $k\ge 4$. This raises the question whether `$k$-connected' can be relaxed to obtain a characterisation of $k$-tangles through highly cohesive graph-minors. We show that this can be achieved for $k=4$ by proving that internally 4-connected graphs have unique 4-tangles, and that every graph with a 4-tangle $\tau$ has an internally 4-connected minor whose unique 4-tangle lifts to~$\tau$.