We prove that the well-studied triangulation flip walk on a convex point set mixes in time O(n^3 log^3 n), the first progress since McShine and Tetali's O(n^5 log n) bound in 1997. In the process we give lower and upper bounds of respectively Omega(1/(sqrt n log n)) and O(1/sqrt n) -- asymptotically tight up to an O(log n) factor -- for the expansion of the associahedron graph K_n. The upper bound recovers Molloy, Reed, and Steiger's Omega(n^{3/2}) bound on the mixing time of the walk. To obtain these results, we introduce a framework consisting of a set of sufficient conditions under which a given Markov chain mixes rapidly. This framework is a purely combinatorial analogue that in some circumstances gives better results than the projection-restriction technique of Jerrum, Son, Tetali, and Vigoda. In particular, in addition to the result for triangulations, we show quasipolynomial mixing for the k-angulation flip walk on a convex point set, for fixed k >= 4.
A sequence of random variables is called exchangeable if its joint distribution is invariant under permutations. The original formulation of de Finetti's theorem says that any exchangeable sequence of $\{0,1\}$-valued random variables can be thought of as a mixture of independent and identically distributed sequences in a certain precise mathematical sense. Interpreting this statement from a convex analytic perspective, Hewitt and Savage obtained the same conclusion for more general state spaces under some topological conditions. The main contribution of this paper is in providing a new framework that explains the theorem purely as a consequence of the underlying distribution of the random variables, with no topological conditions (beyond Hausdorffness) on the state space being necessary if the distribution is Radon. We also show that it is consistent with the axioms of ZFC that de Finetti's theorem holds for all sequences of exchangeable random variables taking values in any complete metric space. The framework we use is based on nonstandard analysis. We have provided a self-contained introduction to nonstandard analysis as an appendix, thus rendering measure theoretic probability and point-set topology as the only prerequisites for this paper. Our introduction aims to develop some new ideologies that might be of interest to mathematicians, philosophers, and mathematics educators alike. Our technical tools come from nonstandard topological measure theory, in which a highlight is a new generalization of Prokhorov's theorem. Modulo such technical tools, our proof relies on properties of the empirical measures induced by hyperfinitely many identically distributed random variables -- a feature that allows us to establish de Finetti's theorem in the generality that we seek while still retaining the combinatorial intuition of proofs of simpler versions of de Finetti's theorem.
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance under a relaxed assumption on the gradient noise distribution compared to previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the latter inherits sub-Gaussian or sub-exponential concentration properties when these hold true for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual PC algorithm is a novel scheme to carry out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in terms of run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies for Gaussian copula models, and demonstrate its performance in that setting.
In this paper, we will show the $L^p$-resolvent estimate for the finite element approximation of the Stokes operator for $p \in \left( \frac{2N}{N+2}, \frac{2N}{N-2} \right)$, where $N \ge 2$ is the dimension of the domain. It is expected that this estimate can be applied to error estimates for finite element approximation of the non-stationary Navier--Stokes equations, since studies in this direction are successful in numerical analysis of nonlinear parabolic equations. To derive the resolvent estimate, we introduce the solution of the Stokes resolvent problem with a discrete external force. We then obtain local energy error estimate according to a novel localization technique and establish global $L^p$-type error estimates. The restriction for $p$ is caused by the treatment of lower-order terms appearing in the local energy error estimate. Our result may be a breakthrough in the $L^p$-theory of finite element methods for the non-stationary Navier--Stokes equations.
Force-aware grasping is an essential capability for most robots in practical applications. Especially for compliant grippers, such as Fin-Ray grippers, it still remains challenging to build a bidirectional mathematical model that mutually maps the shape deformation and contact force. Part I of this article has constructed the force-displacement relationship for design optimization through the co-rotational theory. In Part II, we further devise a displacement-force mathematical model, enabling the compliant gripper to precisely estimate contact force from deformations sensor-free. The presented displacement-force model elaborately investigates contact forces and provides force feedback for a force control system of a gripper, where deformation appears as displacements in contact points. Afterward, simulation experiments are conducted to evaluate the performance of the proposed model through comparisons with the finite-element analysis (FEA) in Ansys. Simulation results reveal that the proposed model accurately estimates contact force, with an average error of around 3% and 4% for single or multiple node cases, respectively, regardless of various design parameters (Part I of this article is released in Arxiv1)
The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.
The entropy production rate is a central quantity in non-equilibrium statistical physics, scoring how far a stochastic process is from being time-reversible. In this paper, we compute the entropy production of diffusion processes at non-equilibrium steady-state under the condition that the time-reversal of the diffusion remains a diffusion. We start by characterising the entropy production of both discrete and continuous-time Markov processes. We investigate the time-reversal of time-homogeneous stationary diffusions and recall the most general conditions for the reversibility of the diffusion property, which includes hypoelliptic and degenerate diffusions, and locally Lipschitz vector fields. We decompose the drift into its time-reversible and irreversible parts, or equivalently, the generator into symmetric and antisymmetric operators. We show the equivalence with a decomposition of the backward Kolmogorov equation considered in hypocoercivity theory, and a decomposition of the Fokker-Planck equation in GENERIC form. The main result shows that when the time-irreversible part of the drift is in the range of the volatility matrix (almost everywhere) the forward and time-reversed path space measures of the process are mutually equivalent, and evaluates the entropy production. When this does not hold, the measures are mutually singular and the entropy production is infinite. We verify these results using exact numerical simulations of linear diffusions. We illustrate the discrepancy between the entropy production of non-linear diffusions and their numerical simulations in several examples and illustrate how the entropy production can be used for accurate numerical simulation. Finally, we discuss the relationship between time-irreversibility and sampling efficiency, and how we can modify the definition of entropy production to score how far a process is from being generalised reversible.
We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we explicit here their dependence with $\ln\frac{1}{\epsilon}$, showing that these also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
We present high-order, finite element-based Second Moment Methods (SMMs) for solving radiation transport problems in two spatial dimensions. We leverage the close connection between the Variable Eddington Factor (VEF) method and SMM to convert existing discretizations of the VEF moment system to discretizations of the SMM moment system. The moment discretizations are coupled to a high-order Discontinuous Galerkin discretization of the Discrete Ordinates transport equations. We show that the resulting methods achieve high-order accuracy on high-order (curved) meshes, preserve the thick diffusion limit, and are effective on a challenging multi-material problem both in outer fixed-point iterations and in inner preconditioned iterative solver iterations for the discrete moment systems. We also present parallel scaling results and provide direct comparisons to the VEF algorithms the SMM algorithms were derived from.
Aiming at expanding few-shot relations' coverage in knowledge graphs (KGs), few-shot knowledge graph completion (FKGC) has recently gained more research interests. Some existing models employ a few-shot relation's multi-hop neighbor information to enhance its semantic representation. However, noise neighbor information might be amplified when the neighborhood is excessively sparse and no neighbor is available to represent the few-shot relation. Moreover, modeling and inferring complex relations of one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N) by previous knowledge graph completion approaches requires high model complexity and a large amount of training instances. Thus, inferring complex relations in the few-shot scenario is difficult for FKGC models due to limited training instances. In this paper, we propose a few-shot relational learning with global-local framework to address the above issues. At the global stage, a novel gated and attentive neighbor aggregator is built for accurately integrating the semantics of a few-shot relation's neighborhood, which helps filtering the noise neighbors even if a KG contains extremely sparse neighborhoods. For the local stage, a meta-learning based TransH (MTransH) method is designed to model complex relations and train our model in a few-shot learning fashion. Extensive experiments show that our model outperforms the state-of-the-art FKGC approaches on the frequently-used benchmark datasets NELL-One and Wiki-One. Compared with the strong baseline model MetaR, our model achieves 5-shot FKGC performance improvements of 8.0% on NELL-One and 2.8% on Wiki-One by the metric Hits@10.