We prove an optimal $O(n \log n)$ mixing time for the Glauber dynamics of Ising models with edge activity $\beta \in \left(\frac{\Delta-2}{\Delta}, \frac{\Delta}{\Delta-2}\right)$. This mixing time bound holds even if the maximum degree $\Delta$ is unbounded. We refine the boosting technique developed in [CFYZ21] and prove a new boosting theorem using the entropic independence defined in [AJK+21]. The theorem relates the modified log-Sobolev (MLS) constant of the Glauber dynamics for a near-critical Ising model to that for an Ising model in a sub-critical regime.
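The dynamics analysed here can be illustrated with a minimal sketch of single-site (heat-bath) Glauber dynamics in plain Python. It assumes that a configuration $\sigma$ has weight $\beta^{m(\sigma)}$, where $m(\sigma)$ is the number of monochromatic edges, and that there is no external field. The normalization used in [CFYZ21] may differ. All function names are hypothetical.

```python
import random

def conditional_plus_prob(spins, adj, v, beta):
    """P(sigma_v = +1 | other spins), assuming weight beta ** (#monochromatic edges)."""
    plus = sum(1 for u in adj[v] if spins[u] == 1)
    minus = len(adj[v]) - plus
    # Setting sigma_v = +1 makes `plus` incident edges monochromatic; -1 makes `minus` of them.
    w_plus, w_minus = beta ** plus, beta ** minus
    return w_plus / (w_plus + w_minus)

def glauber_step(spins, adj, beta, rng):
    """One heat-bath update: pick a uniform vertex and resample its spin."""
    v = rng.randrange(len(spins))
    p = conditional_plus_prob(spins, adj, v, beta)
    spins[v] = 1 if rng.random() < p else -1
    return spins

def run_glauber(adj, beta, n_steps, seed=0):
    """Start from a uniformly random configuration and apply n_steps updates."""
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(len(adj))]
    for _ in range(n_steps):
        glauber_step(spins, adj, beta, rng)
    return spins
```

On a graph of maximum degree $\Delta = 4$, the condition above reads $1/2 < \beta < 2$. In that regime the theorem says that $O(n \log n)$ calls to `glauber_step` suffice for mixing.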
A shrinkage method can be chosen either by selecting a penalty from a list of pre-specified penalties or by constructing a penalty from the data. If a list of penalties for a class of linear models is given, we compare them by sample size and number of non-zero parameters under a predictive stability criterion based on data perturbation. These comparisons yield recommendations for penalty selection in a variety of settings. If the preference is to construct a penalty customized for a given problem, we propose a technique based on genetic algorithms, again using a predictive criterion. We find that, in general, a custom penalty never performs worse than any commonly used penalty, but that there are cases where the custom penalty reduces to a recognizable penalty. Since penalty selection is mathematically equivalent to prior selection, our method also constructs priors. The techniques and recommendations we offer are intended for finite-sample cases. In this context, we argue that predictive stability under perturbation is one of the few relevant properties that can be invoked when the true model is not known. Nevertheless, we study variable inclusion in simulations and, as part of our shrinkage selection strategy, we include oracle property considerations. In particular, we find that the oracle property typically holds for penalties that satisfy basic regularity conditions and is therefore not restrictive enough to play a direct role in penalty selection. In addition, our real data example includes considerations arising from model misspecification.
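To make the idea of a perturbation-based stability criterion concrete, here is a hypothetical sketch. It assumes a ridge ($\ell_2$) penalty, Gaussian perturbations of the response, and mean squared change in fitted values as the instability score. The exact criterion, penalties and perturbation scheme used in the work may differ, and the genetic-algorithm search is not shown.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def perturbation_instability(X, y, lam, noise_sd=0.1, n_rep=50, seed=0):
    """Average squared change in predictions when y is perturbed (hypothetical criterion).

    Smaller values mean the penalized fit is more stable under data perturbation.
    """
    rng = np.random.default_rng(seed)
    base_pred = X @ ridge_fit(X, y, lam)
    diffs = []
    for _ in range(n_rep):
        y_pert = y + noise_sd * rng.standard_normal(len(y))
        diffs.append(np.mean((X @ ridge_fit(X, y_pert, lam) - base_pred) ** 2))
    return float(np.mean(diffs))
```

A score like this could be computed for each candidate penalty (here, each value of `lam`) and combined with predictive accuracy when ranking penalties. Because the same seed is reused, every candidate is evaluated on identical perturbations.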
This work studies how introducing an entropic regularization term into unbalanced Optimal Transport (OT) models may alter their homogeneity with respect to the input measures. We observe that in common settings (including balanced OT and unbalanced OT with the Kullback-Leibler divergence to the marginals), the optimal transport cost itself is not homogeneous, but the optimal transport plans and the so-called Sinkhorn divergences are. However, homogeneity does not hold in more general Unbalanced Regularized Optimal Transport (UROT) models, for instance those using the Total Variation as the divergence to the marginals. We propose to modify the entropic regularization term to obtain a UROT model that is homogeneous while preserving most properties of the standard UROT model. We show the importance of our Homogeneous UROT (HUROT) model for regularizing Optimal Transport with Boundary. This transportation model involves a spatially varying divergence to the marginals, and for it the standard (inhomogeneous) UROT model would behave inappropriately.
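For the balanced case, the homogeneity of plans can be checked with a few lines of Sinkhorn iterations. The sketch assumes the regularizer $\varepsilon \sum_{ij} P_{ij}(\log P_{ij} - 1)$, so the plan has the Gibbs form $\mathrm{diag}(u) K \mathrm{diag}(v)$ with $K = e^{-C/\varepsilon}$. It shows that multiplying both marginals by $\lambda$ multiplies the plan by $\lambda$. Unbalanced divergences and the HUROT modification are not implemented.

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps, n_iter=500):
    """Balanced entropic OT plan via Sinkhorn scaling (sketch; no log-domain stabilization)."""
    K = np.exp(-C / eps)          # Gibbs kernel
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)           # match row marginals
        v = b / (K.T @ u)         # match column marginals
    return u[:, None] * K * v[None, :]
```

Scaling `a` and `b` by $\lambda$ turns every iterate `u` into $\lambda u$ and leaves `v` unchanged, so the returned plan is exactly $\lambda$ times the original.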
Optimal experimental design (OED) plays an important role in identifying uncertainty with limited experimental data. In many applications, we seek to minimize the uncertainty of a predicted quantity of interest (QoI) based on the solution of the inverse problem, rather than that of the inversion model parameter itself. For these scenarios, we develop an efficient method for goal-oriented optimal experimental design (GOOED) for large-scale Bayesian linear inverse problems. The method finds sensor locations that maximize the expected information gain (EIG) for a predicted QoI. We derive a new formula for the EIG and exploit the low-rank structure of two appropriate operators. This lets us employ an online-offline decomposition scheme and a swapping greedy algorithm to maximize the EIG at a cost, measured in model solutions, that is independent of the problem dimensions. We provide a detailed error analysis of the approximated EIG. We also demonstrate the efficiency and accuracy of the proposed algorithm, and its independence of both the data and parameter dimensions, on a contaminant transport inverse problem with an infinite-dimensional parameter field.
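A small dense sketch can illustrate the goal-oriented EIG in the linear Gaussian setting. It assumes a prior $m \sim \mathcal{N}(0, \Gamma_{pr})$, independent noise of variance $\sigma^2$ per sensor, a forward matrix $F$ whose rows are candidate sensors, and a linear QoI $q = Pm$. The QoI information gain is then $\tfrac12[\log\det(P\Gamma_{pr}P^\top) - \log\det(P\Gamma_{post}P^\top)]$. The selection loop is plain greedy, not the swapping greedy algorithm. The low-rank and offline-online machinery that gives dimension independence is omitted, and all names are hypothetical.

```python
import numpy as np

def qoi_eig(F, P, prior_cov, noise_var, sensors):
    """EIG about q = P m from observations at the given rows (sensors) of F."""
    Fs = F[list(sensors)]                                  # (k, n) selected rows
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + Fs.T @ Fs / noise_var)
    _, logdet_prior = np.linalg.slogdet(P @ prior_cov @ P.T)
    _, logdet_post = np.linalg.slogdet(P @ post_cov @ P.T)
    return 0.5 * (logdet_prior - logdet_post)

def greedy_sensors(F, P, prior_cov, noise_var, k):
    """Add, one at a time, the sensor that most increases the QoI EIG."""
    chosen = []
    for _ in range(k):
        candidates = [j for j in range(F.shape[0]) if j not in chosen]
        best = max(candidates,
                   key=lambda j: qoi_eig(F, P, prior_cov, noise_var, chosen + [j]))
        chosen.append(best)
    return chosen
```

Because the posterior covariance does not depend on the observed data in the linear Gaussian case, this EIG can be evaluated before any data are collected.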
The entropy is a measure of uncertainty that plays a central role in information theory. When the distribution of the data is unknown, an estimate of the entropy needs to be obtained from the data sample itself. We propose a semi-parametric estimate based on a mixture model approximation of the distribution of interest. The estimate can rely on any type of mixture, but we focus on Gaussian mixture models to demonstrate its accuracy and versatility. The performance of the proposed approach is assessed through a series of simulation studies. We also illustrate its use on two real-life data examples.
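The sketch below shows one way a mixture-based entropy estimate can work in one dimension. It fits a $K$-component Gaussian mixture by EM and returns the plug-in (resubstitution) estimate $-\frac1n\sum_i \log \hat p(x_i)$. This particular estimator form, the initialization and the variance floor are assumptions for illustration. The proposed estimate may differ, for example by integrating against the fitted mixture.

```python
import numpy as np

def _log_component_densities(x, w, mu, var):
    """Matrix of log w_k + log N(x_i | mu_k, var_k), shape (n, K)."""
    return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_gmm_1d(x, K=2, n_iter=200):
    """Plain EM for a 1-D Gaussian mixture."""
    w = np.full(K, 1.0 / K)
    mu = np.quantile(x, np.linspace(0.25, 0.75, K))   # spread initial means
    var = np.full(K, x.var())
    for _ in range(n_iter):
        logp = _log_component_densities(x, w, mu, var)
        resp = np.exp(logp - _logsumexp_rows(logp)[:, None])   # E-step
        nk = resp.sum(axis=0)                                    # M-step
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return w, mu, var

def gmm_entropy_estimate(x, K=2):
    """Plug-in entropy estimate -mean(log p_hat(x_i)) under the fitted mixture."""
    w, mu, var = fit_gmm_1d(x, K)
    return float(-np.mean(_logsumexp_rows(_log_component_densities(x, w, mu, var))))
```

For standard normal data, the estimate should be close to the exact entropy $\tfrac12\log(2\pi e) \approx 1.419$.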
We study the class of first-order locally-balanced Metropolis--Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class, the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$ and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of $n^{-1/3}$ for all members of the class as the dimension $n$ tends to infinity. This holds under mild smoothness assumptions on $g$ and for target distributions of product form. In particular, we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal and an optimal choice of balancing function under a Gaussian noise distribution. We also derive an optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings. In particular, they show that a bimodal noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
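As a concrete member of the class, the Barker proposal uses the balancing function $g(t) = t/(1+t)$. Each coordinate's Gaussian increment $z_i$ keeps its sign with probability $1/(1+e^{-z_i \partial_i \log\pi(x)})$ and is flipped otherwise, followed by a Metropolis-Hastings correction. The sketch below is a minimal implementation under these standard definitions, with a Gaussian (not bimodal) noise distribution and a fixed step size `sigma`. Tuning toward the 57% acceptance rate is left to the user.

```python
import numpy as np

def barker_balancing(t):
    """Barker balancing function; satisfies g(t) = t * g(1/t)."""
    return t / (1.0 + t)

def barker_mh(log_pi, grad_log_pi, x0, sigma, n_iter, seed=0):
    """Barker proposal with Metropolis-Hastings correction and Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    accepted = 0
    for i in range(n_iter):
        z = sigma * rng.standard_normal(x.shape)
        gx = grad_log_pi(x)
        p_keep = 0.5 * (1.0 + np.tanh(0.5 * z * gx))     # stable form of 1/(1+exp(-z*gx))
        b = np.where(rng.uniform(size=x.shape) < p_keep, 1.0, -1.0)
        y = x + b * z
        d = y - x
        gy = grad_log_pi(y)
        # log[pi(y) Q(y,x) / (pi(x) Q(x,y))]; the symmetric noise density cancels.
        log_alpha = (log_pi(y) - log_pi(x)
                     + np.sum(np.logaddexp(0.0, -d * gx) - np.logaddexp(0.0, d * gy)))
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter
```

Replacing `z` with draws from another symmetric distribution gives other Barker variants. The acceptance ratio above is unchanged as long as that distribution is symmetric about zero.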
The many-user MAC is an important model for understanding the energy efficiency of massive random access in 5G and beyond. It was introduced by Polyanskiy (2017) for the AWGN channel. Subsequent works have provided improved bounds on the asymptotic minimum energy-per-bit required to achieve a target per-user error at a given user density and payload, and have gone beyond the AWGN setting. The best known rigorous bounds use spatially coupled codes along with the optimal AMP algorithm, but these bounds are infeasible to compute beyond a few (around 10) bits of payload. In this paper, we provide new achievability bounds for the many-user AWGN and quasi-static Rayleigh fading MACs using a spatially coupled codebook design along with a scalar AMP algorithm. The resulting bounds are computable up to 100 bits and outperform the previous ones at this payload.
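Decoding in such random-access schemes can be cast as recovering a sparse vector $x$ (which codewords were sent) from $y = Ax + \text{noise}$. The sketch below shows generic AMP for that sparse-recovery problem, using a separable (scalar) soft-threshold denoiser and the Onsager correction term. The i.i.d. Gaussian matrix, the soft-threshold denoiser and the threshold rule `alpha * ||z|| / sqrt(m)` are assumptions for illustration. This is neither the paper's denoiser nor its spatially coupled codebook.

```python
import numpy as np

def soft_threshold(u, theta):
    """Scalar (separable) soft-threshold denoiser."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp(A, y, alpha=1.5, n_iter=50):
    """Approximate message passing for y = A x with sparse x (noiseless sketch)."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(m)   # effective noise-level estimate
        r = x + A.T @ z                                  # pseudo-data: x plus ~Gaussian noise
        x = soft_threshold(r, theta)
        # Onsager term: (n/m) * mean derivative of the denoiser = (#nonzeros)/m
        z = y - A @ x + (np.count_nonzero(x) / m) * z
    return x
```

The Onsager term keeps the effective noise in `r` approximately Gaussian across iterations. This is what makes a scalar denoiser, applied coordinate by coordinate, adequate.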
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent and $\mathcal{O}(1/t)$ for normalized gradient descent. The method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, and yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
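For reference, the sketch below implements the normalized gradient descent baseline mentioned above, on the exponential loss $L(w) = \sum_i e^{-y_i \langle w, x_i\rangle}$, together with the normalized margin $\min_i y_i\langle w, x_i\rangle / \|w\|$ that these rates refer to. It does not reproduce the momentum (dual-accelerated) method. The step size and iteration count are illustrative assumptions.

```python
import numpy as np

def normalized_margin(w, X, y):
    """min_i y_i <w, x_i> / ||w||."""
    return float(np.min(y * (X @ w)) / np.linalg.norm(w))

def normalized_gd_exp_loss(X, y, eta=1.0, n_iter=200):
    """Normalized GD on the exponential loss: w <- w - eta * grad L(w) / L(w)."""
    Z = y[:, None] * X                 # rows z_i = y_i x_i
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        s = -(Z @ w)                   # exponents of exp(-<w, z_i>)
        p = np.exp(s - s.max())
        p /= p.sum()                   # p_i = exp(-<w, z_i>) / L(w)
        w = w + eta * (Z.T @ p)        # since -grad L / L = sum_i p_i z_i
    return w
```

The weights `p` act like dual variables: they concentrate on the points with the smallest margin. This is the same quantity that the dual view of the max-margin problem works with.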
We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$ or requires changing an $\epsilon$-fraction of its entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03) and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound (KDD'14) that holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm that does not read a submatrix; instead, it reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. Next, we develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, including the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, in which $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
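The numerical properties and the sensing query named above can be computed exactly from the singular values $\sigma_i$ of $A$, which helps when checking a tester on small inputs. The sketch uses stable rank $\|A\|_F^2/\|A\|_2^2$, Schatten-$p$ norm $(\sum_i \sigma_i^p)^{1/p}$, and, as an assumed normalization, SVD entropy $-\sum_i q_i \log q_i$ with $q_i = \sigma_i^2/\|A\|_F^2$. These are full computations, not sublinear-query testers.

```python
import numpy as np

def stable_rank(A):
    """||A||_F^2 / ||A||_2^2 (singular values are returned in descending order)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)

def schatten_norm(A, p):
    """Schatten-p norm: l_p norm of the singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def svd_entropy(A):
    """Entropy of q_i = sigma_i^2 / ||A||_F^2 (assumed normalization)."""
    s = np.linalg.svd(A, compute_uv=False)
    q = s ** 2 / np.sum(s ** 2)
    q = q[q > 0]
    return float(-np.sum(q * np.log(q)))

def sensing_query(X, A):
    """One query in the sensing model: <X, A> = tr(X^T A)."""
    return float(np.trace(X.T @ A))

def in_bounded_entry_model(A):
    """Whether every entry of A is bounded by 1 in absolute value."""
    return bool(np.max(np.abs(A)) <= 1.0)
```

Note that $\mathrm{tr}(X^\top A) = \sum_{jk} X_{jk}A_{jk}$. Querying a single entry is therefore the special case where $X$ is a standard basis matrix.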
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
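The smoothing step behind a method like DRS can be illustrated with Gaussian smoothing, $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ with $Z \sim \mathcal{N}(0, I)$. For Lipschitz $f$, its gradient equals $\mathbb{E}[\partial f(x + \gamma Z)]$, which a sample average of subgradients can estimate. The Gaussian choice, the sample size and the function names are assumptions for illustration. The distributed communication scheme and step-size schedule of DRS are not shown.

```python
import numpy as np

def smoothed_subgradient(subgrad, x, gamma, n_samples=1000, seed=0):
    """Monte Carlo estimate of grad f_gamma(x) = E[subgrad(x + gamma * Z)], Z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, x.size))
    G = np.array([subgrad(x + gamma * z) for z in Z])   # subgradients at perturbed points
    return G.mean(axis=0)
```

For $f(x) = \|x\|_1$ (subgradient `np.sign`), the estimate is $(1, -1)$ at a point such as $(3, -3)$, far from the kinks. At the kink $x = 0$ it averages to roughly $0$, which is the smoothed gradient there.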
This work considers the problem of provably optimal reinforcement learning for episodic finite-horizon MDPs, i.e., how an agent learns to maximize its long-term reward in an uncertain environment. The main contribution is a novel algorithm, Variance-reduced Upper Confidence Q-learning (vUCQ), which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$. This number of samples is far smaller than that required to learn any non-trivial estimate of the transition model, which is specified by $O(S^2A)$ parameters. The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model-free" approaches. The vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. vUCQ is a successive refinement method that reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper-confidence-based algorithm. Technically, it is the coupling of these two techniques that gives the algorithm both the sub-linear regret property and asymptotically optimal regret.
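A short calculation, using only the bound stated above and ignoring logarithmic factors, shows when its leading term dominates:

```latex
\sqrt{HSAT} \;\ge\; H^{5}SA
\;\iff\; HSAT \;\ge\; H^{10}S^{2}A^{2}
\;\iff\; T \;\ge\; H^{9}SA .
```

Hence for $T \ge H^9SA$ the regret is $\widetilde{O}(\sqrt{HSAT})$, which matches the $\Omega(\sqrt{HSAT})$ lower bound up to logarithmic factors. This is the sense in which the regret is optimal in the long run.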