There has been great interest in using tools from dynamical systems and numerical analysis of differential equations to understand and construct new optimization methods. In particular, recently a new paradigm has emerged that applies ideas from mechanics and geometric integration to obtain accelerated optimization methods on Euclidean spaces. This has important consequences given that accelerated methods are the workhorses behind many machine learning applications. In this paper we build upon these advances and propose a framework for dissipative and constrained Hamiltonian systems that is suitable for solving optimization problems on arbitrary smooth manifolds. Importantly, this allows us to leverage the well-established theory of symplectic integration to derive "rate-matching" dissipative integrators. This brings a new perspective to optimization on manifolds whereby convergence guarantees follow by construction from classical arguments in symplectic geometry and backward error analysis. Moreover, we construct two dissipative generalizations of leapfrog that are straightforward to implement: one for Lie groups and homogeneous spaces, that relies on the tractable geodesic flow or a retraction thereof, and the other for constrained submanifolds that is based on a dissipative generalization of the famous RATTLE integrator.
Throughput is a main performance objective in communication networks. This paper considers a fundamental maximum throughput routing problem -- the all-or-nothing multicommodity flow (ANF) problem -- in arbitrary directed graphs and in the practically relevant but challenging setting where demands can be (much) larger than the edge capacities. Hence, in addition to assigning requests to valid flows for each routed commodity, an admission control mechanism is required which prevents overloading the network when routing commodities. We make several contributions. On the theoretical side we obtain substantially improved bi-criteria approximation algorithms for this NP-hard problem. We present two non-trivial linear programming relaxations and show how to convert their fractional solutions into integer solutions via randomized rounding. One is an exponential-size formulation (solvable in polynomial time using a separation oracle) that considers a "packing" view and allows a more flexible approach, while the other is a generalization of the compact LP formulation of Liu et al. (INFOCOM'19) that allows for easy solving via standard LP solvers. We obtain a polynomial-time randomized algorithm that yields an arbitrarily good approximation on the weighted throughput while violating the edge capacity constraints by only a small multiplicative factor. We also describe a deterministic rounding algorithm by derandomization, using the method of pessimistic estimators. We complement our theoretical results with a proof of concept empirical evaluation.
Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of the potential function of target distribution to explore sample space efficiently. However, computing gradients can often be computationally expensive for large scale applications, such as those in contemporary machine learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by stochastic ones, commonly via uniformly subsampled data points, and achieve improved computational efficiency, however at the price of introducing sampling error. We propose a non-uniform subsampling scheme to improve the sampling accuracy. The proposed exponentially weighted stochastic gradient (EWSG) is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method, and hence the inaccuracy due to SG approximation is reduced. EWSG differs from classical variance reduction (VR) techniques as it focuses on the entire distribution instead of just the variance; nevertheless, its reduced local variance is also proved. EWSG can also be viewed as an extension of the importance sampling idea, successful for stochastic-gradient-based optimizations, to sampling tasks. In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC algorithm. Numerical experiments are provided, not only to demonstrate EWSG's effectiveness, but also to guide hyperparameter choices, and validate our \emph{non-asymptotic global error bound} despite of approximations in the implementation. Notably, while statistical accuracy is improved, convergence speed can be comparable to the uniform version, which renders EWSG a practical alternative to VR (but EWSG and VR can be combined too).
We consider the power of local algorithms for approximately solving Max $k$XOR, a generalization of two constraint satisfaction problems previously studied with classical and quantum algorithms (MaxCut and Max E3LIN2). On instances with either random signs or no overlapping clauses and $D+1$ clauses per variable, we calculate the average satisfying fraction of the depth-1 QAOA and compare with a generalization of the local threshold algorithm. Notably, the quantum algorithm outperforms the threshold algorithm for $k > 4$. On the other hand, we highlight potential difficulties for the QAOA to achieve computational quantum advantage on this problem. We first compute a tight upper bound on the maximum satisfying fraction of nearly all large random regular Max $k$XOR instances by numerically calculating the ground state energy density $P(k)$ of a mean-field $k$-spin glass. The upper bound grows with $k$ much faster than the performance of both one-local algorithms. We also identify a new obstruction result for low-depth quantum circuits (including the QAOA) when $k=3$, generalizing a result of Bravyi et al [arXiv:1910.08980] when $k=2$. We conjecture that a similar obstruction exists for all $k$.
Heat Equation Driven Area Coverage (HEDAC) is a state-of-the-art multi-agent ergodic motion control guided by a gradient of a potential field. A finite element method is hereby implemented to obtain a solution of Helmholtz partial differential equation, which models the potential field for surveying motion control. This allows us to survey arbitrarily shaped domains and to include obstacles in an elegant and robust manner intrinsic to HEDAC's fundamental idea. For a simple kinematic motion, the obstacles and boundary avoidance constraints are successfully handled by directing the agent motion with the gradient of the potential. However, including additional constraints, such as the minimal clearance dsitance from stationary and moving obstacles and the minimal path curvature radius, requires further alternations of the control algorithm. We introduce a relatively simple yet robust approach for handling these constraints by formulating a straightforward optimization problem based on collision-free escapes route maneuvers. This approach provides a guaranteed collision avoidance mechanism, while being computationally inexpensive as a result of the optimization problem partitioning. The proposed motion control is evaluated in three realistic surveying scenarios simulations, showing the effectiveness of the surveying and the robustness of the control algorithm. Furthermore, potential maneuvering difficulties due to improperly defined surveying scenarios are highlighted and we provide guidelines on how to overpass them. The results are promising and indiacate real-world applicability of proposed constrained multi-agent motion control for autonomous surveying and potentially other HEDAC utilizations.
In this paper, we study the problem of topology optimization and routing in integrated access and backhaul (IAB) networks, as one of the promising techniques for evolving 5G networks. We study the problem from different perspectives. We develop efficient genetic algorithm-based schemes for both IAB node placement and non-IAB backhaul link distribution, and evaluate the effect of routing on bypassing temporal blockages. Here, concentrating on millimeter wave-based communications, we study the service coverage probability, defined as the probability of the event that the user equipments' (UEs) minimum rate requirements are satisfied. Moreover, we study the effect of different parameters such as the antenna gain, blockage and tree foliage on the system performance. Finally, we summarize the recent Rel-16 as well as the upcoming Rel-17 3GPP discussions on routing in IAB networks, and discuss the main challenges for enabling mesh-based IAB networks. As we show, with a proper network topology, IAB is an attractive approach to enable the network densification required by 5G and beyond.
This paper studies distributed binary test of statistical independence under communication (information bits) constraints. While testing independence is very relevant in various applications, distributed independence test is particularly useful for event detection in sensor networks where data correlation often occurs among observations of devices in the presence of a signal of interest. By focusing on the case of two devices because of their tractability, we begin by investigating conditions on Type I error probability restrictions under which the minimum Type II error admits an exponential behavior with the sample size. Then, we study the finite sample-size regime of this problem. We derive new upper and lower bounds for the gap between the minimum Type II error and its exponential approximation under different setups, including restrictions imposed on the vanishing Type I error probability. Our theoretical results shed light on the sample-size regimes at which approximations of the Type II error probability via error exponents became informative enough in the sense of predicting well the actual error probability. We finally discuss an application of our results where the gap is evaluated numerically, and we show that exponential approximations are not only tractable but also a valuable proxy for the Type II probability of error in the finite-length regime.
Distance metric learning based on triplet loss has been applied with success in a wide range of applications such as face recognition, image retrieval, speaker change detection and recently recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because of a too simplistic uniform negative sampling strategy for selecting triplets. Due to memory limitations, this makes it difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose here a 2-stage negative sampling strategy which finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and exhibit consistent positive results across various datasets.
Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as facility location, maxcut, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. We propose an approach to integrate a differentiable proxy for common graph optimization problems into training of machine learning models for tasks such as link prediction. This allows the model to focus specifically on the downstream task that its predictions will be used for. Experimental results show that our end-to-end system obtains better performance on example optimization tasks than can be obtained by combining state of the art link prediction methods with expert-designed graph optimization algorithms.
The era of big data provides researchers with convenient access to copious data. However, people often have little knowledge about it. The increasing prevalence of big data is challenging the traditional methods of learning causality because they are developed for the cases with limited amount of data and solid prior causal knowledge. This survey aims to close the gap between big data and learning causality with a comprehensive and structured review of traditional and frontier methods and a discussion about some open problems of learning causality. We begin with preliminaries of learning causality. Then we categorize and revisit methods of learning causality for the typical problems and data types. After that, we discuss the connections between learning causality and machine learning. At the end, some open problems are presented to show the great potential of learning causality with data.
We propose an algorithm for real-time 6DOF pose tracking of rigid 3D objects using a monocular RGB camera. The key idea is to derive a region-based cost function using temporally consistent local color histograms. While such region-based cost functions are commonly optimized using first-order gradient descent techniques, we systematically derive a Gauss-Newton optimization scheme which gives rise to drastically faster convergence and highly accurate and robust tracking performance. We furthermore propose a novel complex dataset dedicated for the task of monocular object pose tracking and make it publicly available to the community. To our knowledge, It is the first to address the common and important scenario in which both the camera as well as the objects are moving simultaneously in cluttered scenes. In numerous experiments - including our own proposed data set - we demonstrate that the proposed Gauss-Newton approach outperforms existing approaches, in particular in the presence of cluttered backgrounds, heterogeneous objects and partial occlusions.