Sorting is the task of ordering $n$ elements using pairwise comparisons. It is well known that $m=\Theta(n\log n)$ comparisons are both necessary and sufficient when the outcomes of the comparisons are observed without noise. In this paper, we study the sorting problem in the presence of noise. Unlike the common approach in the literature, which aims to minimize the number of pairwise comparisons $m$ needed to achieve a given error probability, our goal is to characterize the maximal ratio $\frac{n\log n}{m}$ such that the ordering of the elements can be estimated with an asymptotically vanishing error probability. We refer to this maximal ratio as the noisy sorting capacity. In this work, we derive upper and lower bounds on the noisy sorting capacity. The algorithm that attains the lower bound is based on the well-known Burnashev--Zigangirov algorithm for coding over channels with feedback. By comparing with existing algorithms in the literature under the proposed framework, we show that our algorithm achieves a strictly larger ratio asymptotically.
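To make the noisy comparison model concrete, here is a minimal Python sketch (not the paper's Burnashev--Zigangirov-based scheme) that sorts by majority-voting each pairwise comparison over $k$ repetitions; the crossover probability `p` and repetition count `k` are illustrative assumptions. Plain repetition spends $\Theta(k\,n\log n)$ comparisons, so its ratio $\frac{n\log n}{m}$ shrinks with $k$, which is exactly the inefficiency a capacity-achieving scheme must avoid.

```python
import random

def noisy_compare(x, y, p=0.1):
    """Return the truth of x < y, flipped with probability p."""
    truth = x < y
    return (not truth) if random.random() < p else truth

def repeated_compare(x, y, k=15, p=0.1):
    """Majority vote over k independent noisy comparisons of x and y."""
    votes = sum(noisy_compare(x, y, p) for _ in range(k))
    return votes > k / 2

def noisy_sort(items, k=15, p=0.1):
    """Binary insertion sort driven by majority-vote comparisons.

    Plain repetition uses Theta(k * n log n) comparisons in total; the
    paper's Burnashev--Zigangirov-based scheme is far more query-efficient.
    This sketch only illustrates the noisy comparison model.
    """
    out = []
    for x in items:
        lo, hi = 0, len(out)
        while lo < hi:
            mid = (lo + hi) // 2
            if repeated_compare(out[mid], x, k, p):  # believe out[mid] < x
                lo = mid + 1
            else:
                hi = mid
        out.insert(lo, x)
    return out

print(noisy_sort(list(range(10, 0, -1))))
```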
A directed graph is oriented if it can be obtained by orienting the edges of a simple, undirected graph. For an oriented graph $G$, let $\beta(G)$ denote the size of a minimum feedback arc set, a smallest subset of edges whose deletion leaves an acyclic subgraph. A simple consequence of a result of Berger and Shor is that any oriented graph $G$ with $m$ edges satisfies $\beta(G) = m/2 - \Omega(m^{3/4})$. We observe that if an oriented graph $G$ forbids a fixed subgraph $B$, then the bound $\beta(G) = m/2 - \Omega(m^{3/4})$ is best possible as a function of the number of edges when $B$ is not bipartite, but the exponent $3/4$ in the lower-order term can be improved when $B$ is bipartite. We also show that for every rational number $r$ between $3/4$ and $1$, there is a finite collection of digraphs $\mathcal{B}$ such that every $\mathcal{B}$-free digraph $G$ with $m$ edges satisfies $\beta(G) = m/2 - \Omega(m^r)$, and this bound is best possible up to the implied constant factor. The proof uses a connection to Tur\'an numbers and a result of Bukh and Conlon. Both of our upper bounds come equipped with randomized linear-time algorithms that construct feedback arc sets achieving those bounds. Finally, we give a characterization of quasirandom directed graphs via minimum feedback arc sets.
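For intuition, a random vertex ordering already certifies the trivial bound $\beta(G)\le m/2$: every edge is forward or backward with respect to the ordering, deleting either class leaves an acyclic digraph, and the smaller class has at most $m/2$ edges. The Berger--Shor algorithm refines this to gain the $\Omega(m^{3/4})$ term; the sketch below shows only the trivial step.

```python
import random

def feedback_arc_set(vertices, edges):
    """Return a feedback arc set of size at most m/2.

    With respect to any vertex ordering, every edge is either forward or
    backward, and deleting either class leaves an acyclic digraph; the
    smaller class therefore has at most m/2 edges. Berger--Shor refines
    this idea to remove Omega(m^{3/4}) further edges from the bound;
    this sketch shows only the trivial m/2 step.
    """
    order = {v: i for i, v in enumerate(random.sample(vertices, len(vertices)))}
    forward = [(u, v) for (u, v) in edges if order[u] < order[v]]
    backward = [(u, v) for (u, v) in edges if order[u] > order[v]]
    return min(forward, backward, key=len)

# Directed 3-cycle: every feedback arc set needs at least one edge.
print(feedback_arc_set([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))
```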
Audio captioning aims to describe the content of audio clips in human language. Due to the ambiguity of audio, different people may perceive the same audio differently, resulting in caption disparities (i.e., one audio clip may correlate with several captions of diverse semantics). To handle this, general audio captioning models perform one-to-many training by randomly selecting one of the correlated captions as the ground truth for each audio clip. However, this leads to significant variation in the optimization directions and weakens model stability. To eliminate this negative effect, in this paper we propose a two-stage framework for audio captioning: (i) in the first stage, we construct a proxy feature space via contrastive learning to reduce the distances between captions correlated with the same audio, and (ii) in the second stage, the proxy feature space is used as additional supervision to encourage the model to be optimized in a direction that benefits all the correlated captions. We conducted extensive experiments on two datasets using four commonly used encoder and decoder architectures. Experimental results demonstrate the effectiveness of the proposed method. The code is available at //github.com/PRIS-CV/Caption-Feature-Space-Regularization.
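A generic instantiation of stage (i) is an InfoNCE-style contrastive loss that treats captions sharing an audio clip as positive pairs; the NumPy sketch below, with an illustrative temperature `tau`, is a simplified stand-in for the paper's proxy-space construction, not its exact loss.

```python
import numpy as np

def caption_contrastive_loss(embs, audio_ids, tau=0.07):
    """InfoNCE-style loss pulling together caption embeddings that share an
    audio clip (positives) and pushing apart those that do not.

    embs: (N, d) array of L2-normalized caption embeddings.
    audio_ids: length-N array; equal ids mean captions of the same audio.
    A generic formulation of stage (i); the paper's loss may differ.
    """
    sim = embs @ embs.T / tau                       # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (audio_ids[:, None] == audio_ids[None, :]) & ~np.eye(len(embs), dtype=bool)
    return -logp[pos].mean()

rng = np.random.default_rng(0)
e = rng.normal(size=(6, 8))
e /= np.linalg.norm(e, axis=1, keepdims=True)
print(caption_contrastive_loss(e, np.array([0, 0, 0, 1, 1, 1])))
```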
We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that, given query access to $f$ and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree at most $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried at rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
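The algebraic fact underlying such low-degree tests is that the $(d+1)$-st finite difference of a degree-$\le d$ polynomial vanishes along every direction: $\sum_{k=0}^{d+1}(-1)^k\binom{d+1}{k}P(x+kh)=0$. The paper's tester must additionally cope with the unknown distribution $\mathcal{D}$ and finite precision; the sketch below checks only this core identity.

```python
import math
import random

def finite_difference(f, x, h, d):
    """(d+1)-st finite difference of f at x along direction h.

    For any polynomial of degree at most d this quantity is exactly zero
    (up to an overall sign convention), which is the classical local
    characterization underlying low-degree tests.
    """
    n = d + 1
    return sum((-1) ** k * math.comb(n, k) * f([xi + k * hi for xi, hi in zip(x, h)])
               for k in range(n + 1))

f = lambda x: 3 * x[0] ** 2 + x[0] * x[1] - 5        # degree 2 in R^2
x = [random.random() for _ in range(2)]
h = [random.random() for _ in range(2)]
print(abs(finite_difference(f, x, h, 2)) < 1e-9)     # True: f has degree <= 2
print(abs(finite_difference(f, x, h, 1)) < 1e-9)     # False in general
```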
Let ${\mathcal M}\subset {\mathbb R}^n$ be a $C^2$-smooth compact submanifold of dimension $d$. Assume that the volume of ${\mathcal M}$ is at most $V$ and the reach (i.e., the normal injectivity radius) of ${\mathcal M}$ is greater than $\tau$. Moreover, let $\mu$ be a probability measure on ${\mathcal M}$ whose density on ${\mathcal M}$ is a strictly positive Lipschitz-smooth function. Let $x_j\in {\mathcal M}$, $j=1,2,\dots,N$, be $N$ independent random samples from the distribution $\mu$. Also, let $\xi_j$, $j=1,2,\dots, N$, be independent random samples of a Gaussian random variable in ${\mathbb R}^n$ with covariance $\sigma^2I$, where $\sigma$ is less than a certain specified function of $d$, $V$ and $\tau$. We assume that we are given the data points $y_j=x_j+\xi_j$, $j=1,2,\dots,N$, modelling random points of ${\mathcal M}$ with measurement noise. We develop an algorithm which, with high probability, produces from these data a $d$-dimensional submanifold ${\mathcal M}_o\subset {\mathbb R}^n$ whose Hausdorff distance to ${\mathcal M}$ is less than $Cd\sigma^2/\tau$ and whose reach is greater than $c\tau/d^6$, with universal constants $C,c > 0$. The number $N$ of random samples required depends almost linearly on $n$, polynomially on $\sigma^{-1}$, and exponentially on $d$.
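A standard ingredient of such manifold-recovery procedures is local PCA: points within a radius $r$ of a base point are centered, and the top $d$ singular vectors estimate the tangent space. The sketch below shows this single step on noisy samples from a circle; the paper's algorithm additionally glues local estimates into a global submanifold ${\mathcal M}_o$ with controlled reach, which this sketch does not attempt.

```python
import numpy as np

def local_tangent_estimate(Y, y0, r, d):
    """Estimate the d-dimensional tangent space near y0 via local PCA.

    Y: (N, n) noisy samples; r: localization radius; d: intrinsic dimension.
    A common building block in manifold recovery, not the paper's full
    construction.
    """
    nbrs = Y[np.linalg.norm(Y - y0, axis=1) < r]
    center = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - center, full_matrices=False)
    return center, Vt[:d]           # base point and orthonormal tangent basis

# Noisy samples from the unit circle (d = 1) in R^2.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 2000)
Y = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.normal(size=(2000, 2))
c, T = local_tangent_estimate(Y, np.array([1.0, 0.0]), 0.2, 1)
print(T)   # approximately [0, +-1]: the tangent to the circle at (1, 0)
```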
Heavy ball momentum is a popular acceleration idea in stochastic optimization. There have been several attempts to understand its perceived benefits, but the complete picture is still unclear. Specifically, the error expression in the presence of noise has two separate terms, the bias and the variance, but most existing works focus only on the bias and show that momentum accelerates its decay. Such analyses overlook the interplay between bias and variance and therefore miss important implications. In this work, we derive a sample complexity bound for stochastic approximation algorithms with heavy-ball momentum that accounts for both bias and variance. We find that, for the same sufficiently small step size, the iterates with momentum have improved sample complexity compared to those without. However, by using a different step-size sequence, the non-momentum version can nullify this benefit. Subsequently, we show that our sample complexity bounds are indeed tight for a small enough neighborhood around the solution and large enough noise variance. Our analysis also sheds light on the finite-time behavior of these algorithms, which explains the perceived benefit of momentum-based schemes in the initial phase.
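For reference, the heavy-ball iteration with noisy gradients is $x_{k+1}=x_k-\alpha g_k+\beta(x_k-x_{k-1})$, which reduces to plain SGD at $\beta=0$. The toy sketch below, on a quadratic with illustrative step size and noise level, exhibits the bias/variance interplay the paper analyzes.

```python
import numpy as np

def sgd_heavy_ball(grad, x0, alpha, beta, steps, noise=1.0, seed=0):
    """Stochastic gradient iteration with heavy-ball momentum:

        x_{k+1} = x_k - alpha * g_k + beta * (x_k - x_{k-1}),

    where g_k is a noisy gradient. With beta = 0 this is plain SGD.
    At a fixed step size, momentum speeds the decay of the bias term,
    while the noise-driven variance term behaves differently; this
    interplay is what the paper's sample complexity bounds quantify.
    """
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x) + noise * rng.normal(size=x.shape)
        x, x_prev = x - alpha * g + beta * (x - x_prev), x
    return x

quad_grad = lambda x: 2.0 * x          # gradient of f(x) = ||x||^2
print(sgd_heavy_ball(quad_grad, [5.0], alpha=0.05, beta=0.9, steps=500))
print(sgd_heavy_ball(quad_grad, [5.0], alpha=0.05, beta=0.0, steps=500))
```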
When researchers carry out a null hypothesis significance test, it is tempting to assume that a statistically significant result lowers Prob(H0), the probability that the null hypothesis is true. Technically, such a statement is meaningless for various reasons: e.g., the null hypothesis does not have a probability associated with it. However, it is possible to relax certain assumptions to compute the posterior probability Prob(H0) under repeated sampling. We show in a step-by-step guide that the intuitively appealing belief that Prob(H0) is low when significant results have been obtained under repeated sampling is in general incorrect and depends greatly on (a) the prior probability of the null being true, (b) the type-I error rate, (c) the type-II error rate, and (d) replication of a result. Through step-by-step simulations using open-source code in the R System of Statistical Computing, we show that uncertainty about the null hypothesis being true often remains high despite a significant result. To help the reader develop intuitions about this common misconception, we provide a Shiny app (//danielschad.shinyapps.io/probnull/). We expect that this tutorial will help researchers better understand and judge results from null hypothesis significance tests.
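The core computation is a Bayes update under repeated sampling: with prior $\pi=\text{Prob(H0)}$, type-I error rate $\alpha$, and power $1-\beta$, a significant result gives $\text{Prob(H0}\mid\text{sig})=\alpha\pi/(\alpha\pi+(1-\beta)(1-\pi))$. The paper's simulations are in R; the sketch below is a minimal Python version, and treating replications as independent sequential updates is a simplifying assumption.

```python
def prob_h0_given_sig(prior_h0, alpha=0.05, power=0.80, n_sig=1):
    """Posterior probability of H0 after n_sig significant results:

        P(H0 | sig) = alpha * P(H0) / (alpha * P(H0) + power * (1 - P(H0)))

    alpha is the type-I error rate; power = 1 - type-II error rate.
    Independent replications are treated as repeated Bayes updates,
    a simplifying assumption for illustration.
    """
    p = prior_h0
    for _ in range(n_sig):
        p = alpha * p / (alpha * p + power * (1 - p))
    return p

# The posterior depends strongly on the prior: with P(H0) = 0.5 a single
# significant result leaves P(H0 | sig) ~ 0.06, but with P(H0) = 0.9 it
# remains ~ 0.36.
print(prob_h0_given_sig(0.5))   # ~0.0588
print(prob_h0_given_sig(0.9))   # ~0.36
```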
In this paper we study the finite sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie (2003)'s kappa theorem. Our framework presumes a binary endogenous explanatory variable ("treatment") and a binary instrumental variable, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which we show is weight normalized, is likely to dominate the others in many contexts. A notable exception is in settings with one-sided noncompliance, where certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. We use a simulation study and three empirical applications to illustrate our findings. In applications to causal effects of college education using the college proximity instrument (Card, 1995) and causal effects of childbearing using the sibling sex composition instrument (Angrist and Evans, 1998), the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both. Overall, our results suggest that (i) the relative performance of different kappa weighting estimators varies with features of the data-generating process; and that (ii) the normalized version of Tan (2006)'s estimator may be an attractive alternative in many contexts. Applied researchers with access to a binary instrumental variable should also consider covariate balancing or doubly robust estimators of the LATE.
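To illustrate the normalized-versus-unnormalized distinction in a simplified setting, the sketch below contrasts an unnormalized instrument-propensity-weighted LATE estimator with its Hajek-style normalized counterpart on simulated data with one-sided noncompliance; it assumes a known instrument propensity and is not a reproduction of the paper's exact kappa-weighting estimators.

```python
import numpy as np

def late_ipw(Y, D, Z, p, normalize):
    """Instrument-propensity-weighted LATE estimator.

    p: P(Z=1 | X) for each unit. The unnormalized version averages the raw
    weighted terms; the normalized (Hajek-style) version rescales each
    arm's weights to average one. A simplified sketch of the contrast
    discussed in the paper, not its exact estimators.
    """
    w1, w0 = Z / p, (1 - Z) / (1 - p)
    if normalize:
        w1, w0 = w1 / w1.mean(), w0 / w0.mean()
    num = (w1 * Y).mean() - (w0 * Y).mean()
    den = (w1 * D).mean() - (w0 * D).mean()
    return num / den

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=n)
p = 1 / (1 + np.exp(-X))                 # instrument propensity
Z = rng.binomial(1, p)
complier = rng.binomial(1, 0.6, n)       # 60% compliers, no defiers
D = np.where(complier == 1, Z, 0)        # one-sided noncompliance
Y = 2.0 * D + X + rng.normal(size=n)     # true LATE = 2
print(late_ipw(Y, D, Z, p, normalize=False),
      late_ipw(Y, D, Z, p, normalize=True))
```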
The performance of a quantum information processing protocol is ultimately judged by distinguishability measures that quantify how distinguishable the actual result of the protocol is from the ideal case. The most prominent distinguishability measures are those based on the fidelity and trace distance, due to their physical interpretations. In this paper, we propose and review several algorithms for estimating distinguishability measures based on trace distance and fidelity. The algorithms can be used for distinguishing quantum states, channels, and strategies (the last also known in the literature as "quantum combs"). The fidelity-based algorithms offer novel physical interpretations of these distinguishability measures in terms of the maximum probability with which a single prover (or competing provers) can convince a verifier to accept the outcome of an associated computation. We simulate many of these algorithms by using a variational approach with parameterized quantum circuits. We find that the simulations converge well in both the noiseless and noisy scenarios, for all examples considered. Furthermore, the noisy simulations exhibit resilience to parameter noise.
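For small systems the two measures can be computed exactly, which is a useful check for variational estimators; the sketch below computes the trace distance via the eigenvalues of $\rho-\sigma$ and the Uhlmann fidelity $F(\rho,\sigma)=\big(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\big)^2$ (the squared convention is an assumption here; some authors omit the square).

```python
import numpy as np

def _sqrtm_psd(A):
    """Matrix square root of a positive semidefinite Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def trace_distance(rho, sigma):
    """T(rho, sigma) = (1/2) ||rho - sigma||_1, the maximum bias achievable
    when distinguishing the two states with a single measurement."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = _sqrtm_psd(rho)
    return np.real(np.trace(_sqrtm_psd(s @ sigma @ s))) ** 2

# Two single-qubit states: |0><0| and the maximally mixed state.
rho = np.array([[1, 0], [0, 0]], dtype=complex)
sigma = np.eye(2, dtype=complex) / 2
print(trace_distance(rho, sigma))   # 0.5
print(fidelity(rho, sigma))         # 0.5
```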
In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, each agent's reported state flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze, for both error models, a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Second, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal for a number of related problems.
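The noisy query model is easy to simulate; the sketch below pairs it with a deliberately simple per-agent estimator (compare the average result of queries containing an agent with those excluding it). This only illustrates the model: the paper's distributed greedy algorithm and the AMP baselines are more sophisticated, and the query size and noise level below are illustrative.

```python
import numpy as np

def simulate_queries(states, num_queries, q_size, sigma, rng):
    """Noisy query model: each query returns the sum of the states of a
    random subset of agents plus Gaussian noise of standard deviation sigma."""
    n = len(states)
    Q = np.zeros((num_queries, n), dtype=int)
    for i in range(num_queries):
        Q[i, rng.choice(n, size=q_size, replace=False)] = 1
    return Q, Q @ states + sigma * rng.normal(size=num_queries)

def estimate_states(Q, results, q_size):
    """Per-agent estimator: including an agent with state 1 raises a query
    result by ~1 on average. An illustration of the query model only; the
    paper's greedy algorithm and the AMP baselines are more sophisticated."""
    m_hat = results.mean() / q_size            # estimated mean state
    est = np.empty(Q.shape[1], dtype=int)
    for j in range(Q.shape[1]):
        inside, outside = results[Q[:, j] == 1], results[Q[:, j] == 0]
        est[j] = int(inside.mean() - outside.mean() + m_hat > 0.5)
    return est

rng = np.random.default_rng(0)
states = rng.binomial(1, 0.3, 200)
Q, res = simulate_queries(states, 3000, q_size=20, sigma=1.0, rng=rng)
print((estimate_states(Q, res, 20) == states).mean())   # fraction recovered
```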
Image segmentation remains an open problem, especially when the intensities of the objects of interest overlap due to the presence of intensity inhomogeneity (also known as a bias field). To segment images with intensity inhomogeneities, we propose a bias-correction-embedded level set model in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
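As a simplified stand-in for the orthogonal-basis idea, the sketch below fits a smooth bias field by least squares over a 2-D Legendre polynomial basis; in the IEOPF model the basis coefficients are instead estimated jointly with the level set segmentation, and the degree and noise level here are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def estimate_bias_field(image, degree=3):
    """Fit a smoothly varying bias field as a linear combination of
    orthogonal Legendre polynomials via least squares.

    A direct least-squares fit is a simplified stand-in for IEOPF, where
    the coefficients are updated inside the level-set energy minimization
    jointly with the segmentation.
    """
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    # Map coordinates to [-1, 1], the natural domain of Legendre polynomials.
    xs, ys = 2 * x / (w - 1) - 1, 2 * y / (h - 1) - 1
    A = legendre.legvander2d(xs.ravel(), ys.ravel(), [degree, degree])
    coef, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)

# Synthetic example: a smooth ground-truth field observed with noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64] / 63.0
bias = 1.0 + 0.5 * xx + 0.3 * yy ** 2
observed = bias + 0.05 * rng.normal(size=(64, 64))
print(np.abs(estimate_bias_field(observed) - bias).max())   # small residual
```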