We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
Heterogeneous Bayesian decentralized data fusion captures the set of problems in which two robots must combine two probability density functions over non-equal, but overlapping sets of random variables. In the context of multi-robot dynamic systems, this enables robots to take a "divide and conquer" approach to reason and share data over complementary tasks instead of over the full joint state space. For example, in a target tracking application, this allows robots to track different subsets of targets and share data on only common targets. This paper presents a framework by which robots can each use a local factor graph to represent relevant partitions of a complex global joint probability distribution, thus allowing them to avoid reasoning over the entirety of a more complex model and saving communication as well as computation costs. From a theoretical point of view, this paper makes contributions by casting the heterogeneous decentralized fusion problem in terms of a factor graph, analyzing the challenges that arise due to dynamic filtering, and then developing a new conservative filtering algorithm that ensures statistical correctness. From a practical point of view, we show how this framework can be used to represent different multi-robot applications and then test it with simulations and hardware experiments to validate and demonstrate its statistical conservativeness, applicability, and robustness to real-world challenges.
Graph Neural Networks (GNNs) and Transformer have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-temporal dependencies crucial for applications such as traffic datasets. Although many existing methods utilize multi-head attention mechanisms and message-passing neural networks (MPNNs) to capture both spatial and temporal relations, these approaches encode temporal and spatial relations independently, and reflect the graph's topological characteristics in a limited manner. In this work, we introduce the Cycle to Mixer (Cy2Mixer), a novel spatio-temporal GNN based on topological non-trivial invariants of spatio-temporal graphs with gated multi-layer perceptrons (gMLP). The Cy2Mixer is composed of three blocks based on MLPs: A message-passing block for encapsulating spatial information, a cycle message-passing block for enriching topological information through cyclic subgraphs, and a temporal block for capturing temporal properties. We bolster the effectiveness of Cy2Mixer with mathematical evidence emphasizing that our cycle message-passing block is capable of offering differentiated information to the deep learning model compared to the message-passing block. Furthermore, empirical evaluations substantiate the efficacy of the Cy2Mixer, demonstrating state-of-the-art performances across various traffic benchmark datasets.
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor. Our framework requires minimal assumptions on the noise and can be extended to functions deviating from strict linearity up to some adjustable threshold, thereby accommodating a comprehensive and pragmatically relevant set of functions. The derived confidence regions can be cast as constraints within a Mixed Integer Linear Programming framework, enabling optimisation of linear objectives. This representation enables robust optimization and the extraction of confidence intervals for specific parameter coordinates. Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing. Finally, we validate the empirical applicability of our method on synthetic data.
We propose a novel approach to nonlinear functional regression, called the Mapping-to-Parameter function model, which addresses complex and nonlinear functional regression problems in parameter space by employing any supervised learning technique. Central to this model is the mapping of function data from an infinite-dimensional function space to a finite-dimensional parameter space. This is accomplished by concurrently approximating multiple functions with a common set of B-spline basis functions by any chosen order, with their knot distribution determined by the Iterative Local Placement Algorithm, a newly proposed free knot placement algorithm. In contrast to the conventional equidistant knot placement strategy that uniformly distributes knot locations based on a predefined number of knots, our proposed algorithms determine knot location according to the local complexity of the input or output functions. The performance of our knot placement algorithms is shown to be robust in both single-function approximation and multiple-function approximation contexts. Furthermore, the effectiveness and advantage of the proposed prediction model in handling both function-on-scalar regression and function-on-function regression problems are demonstrated through several real data applications, in comparison with four groups of state-of-the-art methods.
We prove that to each real singularity $f: (\mathbb{R}^{n+1}, 0) \to (\mathbb{R}, 0)$ one can associate two systems of differential equations $\mathfrak{g}^{k\pm}_f$ which are pushforwards in the category of $\mathcal{D}$-modules over $\mathbb{R}^{\pm}$, of the sheaf of real analytic functions on the total space of the positive, respectively negative, Milnor fibration. We prove that for $k=0$ if $f$ is an isolated singularity then $\mathfrak{g}^{\pm}$ determines the the $n$-th homology groups of the positive, respectively negative, Milnor fibre. We then calculate $\mathfrak{g}^{+}$ for ordinary quadratic singularities and prove that under certain conditions on the choice of morsification, one recovers the top homology groups of the Milnor fibers of any isolated singularity $f$. As an application we construct a public-key encryption scheme based on morsification of singularities.
Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution. However, they may not sufficiently reduce the computational cost for estimation and solution algorithm's progress within a limited budget due to extreme noise levels and heteroskedasticity of system responses. We propose integrating stratification with adaptive sampling for the purpose of efficiency in optimization. Stratification can exploit local dependence in the simulation inputs and outputs. Yet, the state-of-the-art does not provide a full capability to adaptively stratify the data as different solution alternatives are evaluated. We devise two procedures for data-driven calibration problems that involve a large dataset with multiple covariates to calibrate models within a fixed overall simulation budget. The first approach dynamically stratifies the input data using binary trees, while the second approach uses closed-form solutions based on linearity assumptions between the objective function and concomitant variables. We find that dynamical adjustment of stratification structure accelerates optimization and reduces run-to-run variability in generated solutions. Our case study for calibrating a wind power simulation model, widely used in the wind industry, using the proposed stratified adaptive sampling, shows better-calibrated parameters under a limited budget.
We present the first $\varepsilon$-differentially private, computationally efficient algorithm that estimates the means of product distributions over $\{0,1\}^d$ accurately in total-variation distance, whilst attaining the optimal sample complexity to within polylogarithmic factors. The prior work had either solved this problem efficiently and optimally under weaker notions of privacy, or had solved it optimally while having exponential running times.
In an instance of the minimum eigenvalue problem, we are given a collection of $n$ vectors $v_1,\ldots, v_n \subset {\mathbb{R}^d}$, and the goal is to pick a subset $B\subseteq [n]$ of given vectors to maximize the minimum eigenvalue of the matrix $\sum_{i\in B} v_i v_i^{\top} $. Often, additional combinatorial constraints such as cardinality constraint $\left(|B|\leq k\right)$ or matroid constraint ($B$ is a basis of a matroid defined on $[n]$) must be satisfied by the chosen set of vectors. The minimum eigenvalue problem with matroid constraints models a wide variety of problems including the Santa Clause problem, the E-design problem, and the constructive Kadison-Singer problem. In this paper, we give a randomized algorithm that finds a set $B\subseteq [n]$ subject to any matroid constraint whose minimum eigenvalue is at least $(1-\epsilon)$ times the optimum, with high probability. The running time of the algorithm is $O\left( n^{O(d\log(d)/\epsilon^2)}\right)$. In particular, our results give a polynomial time asymptotic scheme when the dimension of the vectors is constant. Our algorithm uses a convex programming relaxation of the problem after guessing a rescaling which allows us to apply pipage rounding and matrix Chernoff inequalities to round to a good solution. The key new component is a structural lemma which enables us to "guess'' the appropriate rescaling, which could be of independent interest. Our approach generalizes the approximation guarantee to monotone, homogeneous functions and as such we can maximize $\det(\sum_{i\in B} v_i v_i^\top)^{1/d}$, or minimize any norm of the eigenvalues of the matrix $\left(\sum_{i\in B} v_i v_i^\top\right)^{-1} $, with the same running time under some mild assumptions. As a byproduct, we also get a simple algorithm for an algorithmic version of Kadison-Singer problem.
Current models for event causality identification (ECI) mainly adopt a supervised framework, which heavily rely on labeled data for training. Unfortunately, the scale of current annotated datasets is relatively limited, which cannot provide sufficient support for models to capture useful indicators from causal statements, especially for handing those new, unseen cases. To alleviate this problem, we propose a novel approach, shortly named CauSeRL, which leverages external causal statements for event causality identification. First of all, we design a self-supervised framework to learn context-specific causal patterns from external causal statements. Then, we adopt a contrastive transfer strategy to incorporate the learned context-specific causal patterns into the target ECI model. Experimental results show that our method significantly outperforms previous methods on EventStoryLine and Causal-TimeBank (+2.0 and +3.4 points on F1 value respectively).
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.