This paper considers the sparse recovery with shuffled labels, i.e., $\by = \bPitrue \bX \bbetatrue + \bw$, where $\by \in \RR^n$, $\bPi\in \RR^{n\times n}$, $\bX\in \RR^{n\times p}$, $\bbetatrue\in \RR^p$, $\bw \in \RR^n$ denote the sensing result, the unknown permutation matrix, the design matrix, the sparse signal, and the additive noise, respectively. Our goal is to reconstruct both the permutation matrix $\bPitrue$ and the sparse signal $\bbetatrue$. We investigate this problem from both the statistical and computational aspects. From the statistical aspect, we first establish the minimax lower bounds on the sample number $n$ and the \emph{signal-to-noise ratio} ($\snr$) for the correct recovery of permutation matrix $\bPitrue$ and the support set $\supp(\bbetatrue)$, to be more specific, $n \gtrsim k\log p$ and $\log\snr \gtrsim \log n + \frac{k\log p}{n}$. Then, we confirm the tightness of these minimax lower bounds by presenting an exhaustive-search based estimator whose performance matches the lower bounds thereof up to some multiplicative constants. From the computational aspect, we impose a parsimonious assumption on the number of permuted rows and propose a computationally-efficient estimator accordingly. Moreover, we show that our proposed estimator can obtain the ground-truth $(\bPitrue, \supp(\bbetatrue))$ under mild conditions. Furthermore, we provide numerical experiments to corroborate our claims.
Let $G$ be a graph on $n$ vertices of maximum degree $\Delta$. We show that, for any $\delta > 0$, the down-up walk on independent sets of size $k \leq (1-\delta)\alpha_c(\Delta)n$ mixes in time $O_{\Delta,\delta}(k\log{n})$, thereby resolving a conjecture of Davies and Perkins in an optimal form. Here, $\alpha_{c}(\Delta)n$ is the NP-hardness threshold for the problem of counting independent sets of a given size in a graph on $n$ vertices of maximum degree $\Delta$. Our mixing time has optimal dependence on $k,n$ for the entire range of $k$; previously, even polynomial mixing was not known. In fact, for $k = \Omega_{\Delta}(n)$ in this range, we establish a log-Sobolev inequality with optimal constant $\Omega_{\Delta,\delta}(1/n)$. At the heart of our proof are three new ingredients, which may be of independent interest. The first is a method for lifting $\ell_\infty$-independence from a suitable distribution on the discrete cube -- in this case, the hard-core model -- to the slice by proving stability of an Edgeworth expansion using a multivariate zero-free region for the base distribution. The second is a generalization of the Lee-Yau induction to prove log-Sobolev inequalities for distributions on the slice with considerably less symmetry than the uniform distribution. The third is a sharp decomposition-type result which provides a lossless comparison between the Dirichlet form of the original Markov chain and that of the so-called projected chain in the presence of a contractive coupling.
We are considering the geometric amoebot model where a set of $n$ amoebots is placed on the triangular grid. An amoebot is able to send information to its neighbors, and to move via expansions and contractions. Since amoebots and information can only travel node by node, most problems have a natural lower bound of $\Omega(D)$ where $D$ denotes the diameter of the structure. Inspired by the nervous and muscular system, Feldmann et al. have proposed the reconfigurable circuit extension and the joint movement extension of the amoebot model with the goal of breaking this lower bound. In the joint movement extension, the way amoebots move is altered. Amoebots become able to push and pull other amoebots. Feldmann et al. demonstrated the power of joint movements by transforming a line of amoebots into a rhombus within $O(\log n)$ rounds. However, they left the details of the extension open. The goal of this paper is therefore to formalize and extend the joint movement extension. In order to provide a proof of concept for the extension, we consider two fundamental problems of modular robot systems: shape formation and locomotion. We approach these problems by defining meta-modules of rhombical and hexagonal shape, respectively. The meta-modules are capable of movement primitives like sliding, rotating, and tunneling. This allows us to simulate shape formation algorithms of various modular robot systems. Finally, we construct three amoebot structures capable of locomotion by rolling, crawling, and walking, respectively.
Bayes' rule tells us how to invert a causal process in order to update our beliefs in light of new evidence. If the process is believed to have a complex compositional structure, we may observe that the inversion of the whole can be computed piecewise in terms of the component processes. We study the structure of this compositional rule, noting that it relates to the lens pattern in functional programming. Working in a suitably general axiomatic presentation of a category of Markov kernels, we see how we can think of Bayesian inversion as a particular instance of a state-dependent morphism in a fibred category. We discuss the compositional nature of this, formulated as a functor on the underlying category and explore how this can used for a more type-driven approach to statistical inference.
Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable beyond applications, especially when dealing with discrete-time event sequences in low-resolution; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., the causal relationship that occurred simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structure Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among events type in discrete-time event sequence. The proposed method is featured with the minorization-maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.
In this paper we show how to use drift analysis in the case of two random variables $X_1, X_2$, when the drift is approximatively given by $A\cdot (X_1,X_2)^T$ for a matrix $A$. The non-trivial case is that $X_1$ and $X_2$ impede each other's progress, and we give a full characterization of this case. As application, we develop and analyze a minimal example TwoLinear of a dynamic environment that can be hard. The environment consists of two linear function $f_1$ and $f_2$ with positive weights $1$ and $n$, and in each generation selection is based on one of them at random. They only differ in the set of positions that have weight $1$ and $n$. We show that the $(1+1)$-EA with mutation rate $\chi/n$ is efficient for small $\chi$ on TwoLinear, but does not find the shared optimum in polynomial time for large $\chi$.
The problem of reconstructing a sequence from the set of its length-$k$ substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem where multiple random sources are to be simultaneously reconstructed from the union of their $k$-mer sets. We consider an asymptotic regime where $m = n^\alpha$ i.i.d. source sequences of length $n$ are to be reconstructed from the set of their substrings of length $k=\beta \log n$, and seek to characterize the $(\alpha,\beta)$ pairs for which reconstruction is information-theoretically feasible. We show that, as $n \to \infty$, the source sequences can be reconstructed if $\beta > \max(2\alpha+1,\alpha+2)$ and cannot be reconstructed if $\beta < \max( 2\alpha+1, \alpha+ \tfrac32)$, characterizing the feasibility region almost completely. Interestingly, our result shows that there are feasible $(\alpha,\beta)$ pairs where repeats across the source strings abound, and non-trivial reconstruction algorithms are needed to achieve the fundamental limit.
Federated noisy label learning (FNLL) is emerging as a promising tool for privacy-preserving multi-source decentralized learning. Existing research, relying on the assumption of class-balanced global data, might be incapable to model complicated label noise, especially in medical scenarios. In this paper, we first formulate a new and more realistic federated label noise problem where global data is class-imbalanced and label noise is heterogeneous, and then propose a two-stage framework named FedNoRo for noise-robust federated learning. Specifically, in the first stage of FedNoRo, per-class loss indicators followed by Gaussian Mixture Model are deployed for noisy client identification. In the second stage, knowledge distillation and a distance-aware aggregation function are jointly adopted for noise-robust federated model updating. Experimental results on the widely-used ICH and ISIC2019 datasets demonstrate the superiority of FedNoRo against the state-of-the-art FNLL methods for addressing class imbalance and label noise heterogeneity in real-world FL scenarios.
We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on the Euclidean space. To propose an accelerated gradient descent algorithm for such problems, we consider gradient flow of transport maps that give push-forward measures of an initial measure. Then we propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm do not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we can show that the accelerated algorithm converges faster than a normal gradient descent algorithm. Numerical experiments support this theoretical result.
Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.
Medical image segmentation requires consensus ground truth segmentations to be derived from multiple expert annotations. A novel approach is proposed that obtains consensus segmentations from experts using graph cuts (GC) and semi supervised learning (SSL). Popular approaches use iterative Expectation Maximization (EM) to estimate the final annotation and quantify annotator's performance. Such techniques pose the risk of getting trapped in local minima. We propose a self consistency (SC) score to quantify annotator consistency using low level image features. SSL is used to predict missing annotations by considering global features and local image consistency. The SC score also serves as the penalty cost in a second order Markov random field (MRF) cost function optimized using graph cuts to derive the final consensus label. Graph cut obtains a global maximum without an iterative procedure. Experimental results on synthetic images, real data of Crohn's disease patients and retinal images show our final segmentation to be accurate and more consistent than competing methods.