The signature is a representation of a path as an infinite sequence of its iterated integrals. Under certain assumptions, the signature characterizes the path, up to translation and reparameterization. A crucial question is therefore the development of efficient algorithms to invert the signature, i.e., to reconstruct the path from its (truncated) signature. In this article, we study the insertion procedure, originally introduced by Chang and Lyons (2019), from both a theoretical and a practical point of view. After describing our version of the method, we give its rate of convergence for piecewise linear paths, accompanied by an implementation in PyTorch. The algorithm is parallelized, meaning that it is very efficient at inverting a batch of signatures simultaneously. Its performance is illustrated with both real-world and simulated examples.
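To make the central object concrete, here is a minimal sketch (not the authors' implementation) of the depth-2 truncated signature of a piecewise linear path in PyTorch, assembled segment by segment with Chen's identity; the function name and shapes are illustrative only.

```python
import torch

def signature_depth2(path: torch.Tensor):
    """path: (n_points, d) vertices of a piecewise linear path.
    Returns (S1, S2): the level-1 (d,) and level-2 (d, d) signature terms."""
    d = path.shape[1]
    S1 = torch.zeros(d)
    S2 = torch.zeros(d, d)
    for k in range(path.shape[0] - 1):
        dx = path[k + 1] - path[k]  # increment of one linear segment
        # Chen's identity: S2(x*y) = S2(x) + S1(x) (x) S1(y) + S2(y),
        # with S2 = dx (x) dx / 2 for a single linear segment.
        S2 = S2 + torch.outer(S1, dx) + torch.outer(dx, dx) / 2
        S1 = S1 + dx
    return S1, S2

S1, S2 = signature_depth2(torch.tensor([[0., 0.], [1., 0.], [1., 1.]]))
```

Inverting the signature means recovering `path` (up to translation and reparameterization) from truncations of this sequence.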
As models continue to grow in size, memory optimization methods (MOMs) have emerged as a way to address the memory bottleneck encountered when training large models. To examine the practical value of various MOMs, we conduct a thorough analysis of the existing literature from a systems perspective. Our analysis reveals a notable challenge within the research community: the absence of standardized metrics for effectively evaluating the efficacy of MOMs. This scarcity of informative evaluation metrics prevents researchers and practitioners from comparing and benchmarking different approaches reliably, which in turn makes it difficult to draw definitive conclusions and make informed decisions regarding the selection and application of MOMs. To address this challenge, this paper summarizes the scenarios in which MOMs prove advantageous for model training and proposes distinct evaluation metrics for different scenarios. By employing these metrics, we evaluate the prevailing MOMs and find that their benefits are not universal. We present insights derived from our experiments and discuss the circumstances in which MOMs can be advantageous.
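As a rough illustration of the kind of measurement such metrics require (a hypothetical sketch, not the paper's metric suite), the following records the two quantities every MOM trades off, peak GPU memory and training throughput, for one optimizer step in PyTorch:

```python
import time
import torch

def measure_step(model, x, y, loss_fn, optimizer):
    """Return (peak GPU memory in MiB, throughput in samples/sec) for one step."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    torch.cuda.synchronize()                 # wait for all kernels to finish
    elapsed = time.perf_counter() - t0
    return torch.cuda.max_memory_allocated() / 2**20, x.shape[0] / elapsed
```

A MOM that lowers the first number while barely moving the second is valuable in memory-bound scenarios and pure overhead elsewhere.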
A stable and high-order accurate solver for linear and nonlinear parabolic equations is presented. An additive Runge-Kutta method is used for the time stepping, which integrates the stiff linear terms by an explicit-first-stage singly diagonally implicit Runge-Kutta (ESDIRK) method and the nonlinear terms by an explicit Runge-Kutta (ERK) method. In each time step, the implicit solve is performed by the recently developed Hierarchical Poincar\'e-Steklov (HPS) method. This is a fast direct solver for elliptic equations that decomposes the spatial domain into a hierarchical tree of subdomains and builds spectral collocation solvers locally on the subdomains. These ideas combine naturally in the presented method, since the constant diagonal coefficient in ESDIRK and a fixed time step ensure that the coefficient matrix in the implicit solve of HPS remains the same for all time stages. This means that the precomputed inverse can be efficiently reused, leading to a scheme with complexity (in two dimensions) $\mathcal{O}(N^{1.5})$ for the precomputation, where the solution operator for the elliptic problems is built, and then $\mathcal{O}(N)$ for each time step. The stability of the method is proved for first order in time and any order in space, and numerical evidence substantiates a claim of stability for a much broader class of time discretization methods. Numerical experiments supporting the accuracy and efficiency of the method in one and two dimensions are presented.
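The reuse argument fits in a few lines. Below is a minimal sketch under simplifying assumptions (a 1D finite difference Laplacian and a generic sparse LU factorization standing in for the HPS solution operator, and backward Euler, i.e. a single implicit stage with $\gamma = 1$, standing in for the full ESDIRK scheme): because the matrix $I - \gamma \Delta t L$ is fixed, it is factored once and reused in every implicit solve.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import splu

n, dt, gamma = 200, 1e-3, 1.0              # gamma: the repeated diagonal coefficient
x = np.linspace(0, 1, n + 2)[1:-1]         # interior nodes of the unit interval
L = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) * (n + 1) ** 2  # Dirichlet Laplacian

lu = splu((identity(n) - gamma * dt * L).tocsc())  # "build" stage: factor once

u = np.sin(np.pi * x)                      # initial data
for _ in range(1000):                      # every stage of every step reuses lu
    u = lu.solve(u)                        # fast implicit solve
```

In the actual method, the precomputed HPS solution operator plays the role of `lu`, with $\mathcal{O}(N^{1.5})$ build cost and $\mathcal{O}(N)$ solve cost in two dimensions.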
Hand-eye calibration algorithms are mature and provide accurate transformation estimations for an effective camera-robot link, but they rely on a sufficiently wide range of calibration data to avoid errors and degenerate configurations. We aim to solve the hand-eye problem in robotic-assisted minimally invasive surgery and to simplify the calibration procedure by using a neural network method combined with a new objective function. We present a neural network-based solution that estimates the transformation from a sequence of images and kinematic data, which significantly simplifies the calibration procedure. The network utilises the long short-term memory architecture to extract temporal information from the data and solve the hand-eye problem. The objective function is derived from a linear combination of the remote centre of motion constraint, the re-projection error, and its derivative, which induces a small change in the hand-eye transformation. The method is validated with data from the da Vinci Si, and the results show that the estimated hand-eye matrix is able to re-project the end-effector from the robot coordinate frame to the camera coordinate frame within 10 to 20 pixels of accuracy in both testing datasets. The calibration performance is also superior to that of the previous neural network-based hand-eye method. The proposed algorithm shows that the calibration procedure can be simplified by using deep learning techniques and that performance is improved by assuming non-static hand-eye transformations.
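A minimal sketch of such an architecture (hypothetical dimensions and names; the authors' network, features, and objective are not reproduced here): an LSTM consumes a sequence of image features and kinematic readings and outputs a 6-DoF hand-eye estimate.

```python
import torch
import torch.nn as nn

class HandEyeLSTM(nn.Module):
    def __init__(self, feat_dim=128, kin_dim=7, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + kin_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)          # axis-angle rotation + translation

    def forward(self, img_feats, kinematics):
        # img_feats: (B, T, feat_dim); kinematics: (B, T, kin_dim)
        seq = torch.cat([img_feats, kinematics], dim=-1)
        _, (h, _) = self.lstm(seq)                # temporal information -> final state
        return self.head(h[-1])                   # (B, 6) hand-eye parameters
```

In training, the output would be penalized by the combined RCM-constraint and re-projection objective described above.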
We study distributionally robust optimization (DRO) with Sinkhorn distance -- a variant of Wasserstein distance based on entropic regularization. We derive a convex programming dual reformulation for general nominal distributions, transport costs, and loss functions. Compared with Wasserstein DRO, our proposed approach offers enhanced computational tractability for a broader class of loss functions, and the worst-case distribution exhibits greater plausibility in practical scenarios. To solve the dual reformulation, we develop a stochastic mirror descent algorithm with biased gradient oracles. Remarkably, this algorithm achieves near-optimal sample complexity for both smooth and nonsmooth loss functions, nearly matching the sample complexity of its empirical risk minimization counterpart. Finally, we provide numerical examples using synthetic and real data to demonstrate its superior performance.
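For intuition about the solver, here is a minimal sketch (illustrative, not the paper's algorithm) of stochastic mirror descent with the entropic mirror map on the probability simplex, driven by a noisy, possibly biased gradient oracle:

```python
import numpy as np

def mirror_descent(grad_oracle, dim, n_iters=1000, step=0.1):
    x = np.full(dim, 1.0 / dim)          # start at the uniform distribution
    avg = np.zeros(dim)
    for _ in range(n_iters):
        g = grad_oracle(x)               # biased/noisy stochastic gradient
        x = x * np.exp(-step * g)        # entropic (KL) mirror step ...
        x /= x.sum()                     # ... lands back on the simplex
        avg += x
    return avg / n_iters                 # averaged iterate

rng = np.random.default_rng(0)
target = rng.dirichlet(np.ones(5))
oracle = lambda x: 2 * (x - target) + 0.01 * rng.normal(size=5)
print(mirror_descent(oracle, dim=5))     # approaches `target`
```

The bias of the oracle is the key technical obstacle: the analysis must control it to nearly match the ERM sample complexity.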
Stability and optimal convergence of a non-uniform implicit-explicit L1 finite element method (IMEX-L1-FEM) are studied for a class of time-fractional linear partial differential/integro-differential equations with a non-self-adjoint elliptic part having (space-time) variable coefficients. The proposed scheme combines an IMEX-L1 method on a graded mesh in the temporal direction with a finite element method in the spatial direction. With the help of a discrete fractional Gr\"{o}nwall inequality, optimal error estimates in the $L^2$- and $H^1$-norms are derived for the problem with initial data $u_0 \in H_0^1(\Omega)\cap H^2(\Omega)$. Under the higher regularity condition $u_0 \in \dot{H}^3(\Omega)$, a superconvergence result is established and, as a consequence, an $L^\infty$ error estimate is obtained for 2D problems. Numerical experiments are presented to validate our theoretical findings.
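For reference, the nonuniform L1 approximation underlying such schemes reads (a sketch of the standard construction, not quoted from the paper): on a mesh $0 = t_0 < t_1 < \cdots < t_N = T$ with steps $\tau_j = t_j - t_{j-1}$ and grading $t_j = T(j/N)^r$,
$$\partial_t^\alpha u(t_n) \,\approx\, \sum_{j=1}^{n} \frac{(t_n - t_{j-1})^{1-\alpha} - (t_n - t_j)^{1-\alpha}}{\Gamma(2-\alpha)\,\tau_j}\,\bigl(u^j - u^{j-1}\bigr), \qquad 0 < \alpha < 1,$$
obtained by replacing $u$ with its piecewise linear interpolant inside the Caputo integral.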
Stochastic approximation with multiple coupled sequences (MSA) has found broad applications in machine learning, as it encompasses a rich class of problems including bilevel optimization (BLO), multi-level compositional optimization (MCO), and reinforcement learning (specifically, actor-critic methods). However, designing provably efficient federated algorithms for MSA has remained elusive, even for the special case of double-sequence approximation (DSA). Towards this goal, we develop FedMSA, the first federated algorithm for MSA, and establish its near-optimal communication complexity. As core novelties, (i) FedMSA enables the provable estimation of hypergradients in BLO and MCO via local client updates, which has been a notable bottleneck in prior theory, and (ii) our convergence guarantees are sensitive to the heterogeneity level of the problem. We also incorporate momentum and variance reduction techniques to achieve further acceleration, leading to near-optimal rates. Finally, we provide experiments that support our theory and demonstrate the empirical benefits of FedMSA. As an example, FedMSA enables order-of-magnitude savings in communication rounds compared to prior federated BLO schemes.
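To fix ideas, here is a minimal sketch of federated double-sequence approximation (a generic illustration with hypothetical oracles, not FedMSA itself, which additionally handles hypergradient estimation, momentum, and variance reduction):

```python
import numpy as np

def fed_dsa(clients, x, y, rounds=50, local_steps=10, alpha=0.05, beta=0.05):
    """clients: list of (h, g) oracle pairs driving the two coupled sequences."""
    for _ in range(rounds):
        xs, ys = [], []
        for h, g in clients:
            xc, yc = x.copy(), y.copy()
            for _ in range(local_steps):         # local client updates
                yc = yc - beta * g(xc, yc)       # inner (fast) sequence
                xc = xc - alpha * h(xc, yc)      # outer (slow) sequence
            xs.append(xc); ys.append(yc)
        x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)   # server averaging
    return x, y
```

Communication complexity is governed by `rounds`; the point of local steps is to shrink it without letting client heterogeneity bias the coupled updates.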
We investigate error bounds for numerical solutions of divergence-structure linear elliptic PDEs on compact manifolds without boundary. Our focus is on a class of monotone finite difference approximations, which provide a strong form of stability that guarantees the existence of a bounded solution. In many settings, including the Dirichlet problem, it is easy to show that the resulting solution error is proportional to the formal consistency error of the scheme. We make the surprising observation that this need not be true for PDEs posed on compact manifolds without boundary. We propose a particular class of approximation schemes built around an underlying monotone scheme with consistency error $O(h^\alpha)$. By carefully constructing barrier functions, we prove that the solution error is bounded by $O(h^{\alpha/(d+1)})$ in dimension $d$. We also provide a specific example where this predicted convergence rate is observed numerically. Using these error bounds, we further design a family of provably convergent approximations to the solution gradient.
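To make the rate concrete (an illustration of the stated bound, not an additional result): for a second-order consistent monotone scheme, $\alpha = 2$, on a two-dimensional manifold, $d = 2$, the guarantee is
$$\text{solution error} \;\le\; C\,h^{\alpha/(d+1)} \;=\; C\,h^{2/3},$$
substantially weaker than the $O(h^2)$ rate that consistency alone would suggest in the Dirichlet setting.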
Point processes often have a natural interpretation with respect to a continuous process. We propose a point process construction that describes arrival time observations in terms of the state of a latent diffusion process. In this framework, we relate the return times of a diffusion in a continuous path space to new arrivals of the point process. This leads to a continuous sample path that is used to describe the underlying mechanism generating the arrival distribution. These models arise in many disciplines, such as financial settings, where actions in a market are determined by a hidden continuous price, or neuroscience, where a latent stimulus generates spike trains. Building on developments in It\^o's excursion theory, we propose methods for inference on, and sampling from, the point process derived from the latent diffusion process. We illustrate the approach with numerical examples using both simulated and real data. The proposed methods and framework provide a basis for interpreting point processes through the lens of diffusions.
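A minimal sketch of the generative picture (illustrative, not the paper's sampler): simulate a latent diffusion and record as arrivals its return times to zero, keeping only returns that close an excursion of height at least `delta`.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n, delta = 1e-4, 200_000, 0.5
path = np.cumsum(np.sqrt(dt) * rng.normal(size=n))   # latent diffusion (Brownian motion)

arrivals, armed = [], False
for i in range(1, n):
    if abs(path[i]) >= delta:                 # excursion away from zero is large enough
        armed = True
    if armed and path[i - 1] * path[i] <= 0:  # ...and the path has now returned to zero
        arrivals.append(i * dt)               # a new arrival of the point process
        armed = False
print(arrivals[:5])
```

The excursion-theoretic machinery makes this correspondence precise and supports inference in the reverse direction, from arrival times back to the latent path.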
In 1989, George Cybenko proved in a landmark paper that wide shallow neural networks can approximate arbitrary continuous functions on a compact set. This universal approximation theorem sparked extensive follow-up research. Shen, Yang, and Zhang determined optimal approximation rates for ReLU-networks in $L^p$-norms with $p \in [1,\infty)$. Kidger and Lyons proved a universal approximation theorem for deep narrow ReLU-networks. Telgarsky gave an example of a deep narrow ReLU-network that cannot be approximated by a wide shallow ReLU-network unless the latter has exponentially many neurons. However, several questions remain unresolved. Are there any wide shallow ReLU-networks that cannot be approximated well by deep narrow ReLU-networks? Is the universal approximation theorem still true for other norms like the Sobolev norm $W^{1,1}$? Do these results hold for activation functions other than ReLU? We answer all of these questions and more with a framework of two expressive powers. The first one is well known and counts the maximal number of linear regions of a function computed by a ReLU-network. We improve the best known bounds for this expressive power. The second one is entirely new.
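The first expressive power is easy to compute in one dimension, where each neuron of a one-hidden-layer ReLU-network $f(x) = \sum_i v_i\,\mathrm{ReLU}(w_i x + b_i)$ contributes at most one breakpoint at $x = -b_i / w_i$. A minimal sketch (illustrative) bounding the number of linear regions on an interval:

```python
import numpy as np

def max_linear_regions(w, b, lo=-10.0, hi=10.0):
    """Upper bound on the number of linear regions of the network on (lo, hi)."""
    kinks = -b[w != 0] / w[w != 0]                 # breakpoint of each neuron
    inside = np.unique(kinks[(kinks > lo) & (kinks < hi)])
    return inside.size + 1                         # k distinct kinks -> at most k+1 regions

rng = np.random.default_rng(1)
w, b = rng.normal(size=8), rng.normal(size=8)
print(max_linear_regions(w, b))                    # at most 9 for 8 neurons
```

Composition can multiply rather than add such breakpoints, which is the mechanism behind Telgarsky-style depth separations.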
As a baseball game progresses, batters appear to perform better the more times they face a particular pitcher. The apparent drop-off in pitcher performance from one time through the order to the next, known as the Time Through the Order Penalty (TTOP), is often attributed to within-game batter learning. Although the TTOP has been largely accepted within baseball and influences many managers' in-game decision making, we argue that existing approaches to estimating the size of the TTOP cannot disentangle continuous evolution in pitcher performance over the course of the game from discontinuities between successive times through the order. Using a Bayesian multinomial regression model, we find that, after adjusting for confounders like batter and pitcher quality, handedness, and home-field advantage, there is little evidence of a strong discontinuity in pitcher performance between times through the order. Our analysis suggests that the start of the third time through the order should not be viewed as a special cutoff point when deciding whether to pull a starting pitcher.
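A minimal sketch of the identification issue (an illustrative frequentist stand-in for the Bayesian multinomial model, with simulated data and a binary outcome): fit a continuous batters-faced trend together with time-through-the-order indicators, and check whether the indicators pick up a discontinuity on top of a purely smooth decline.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
bf = rng.integers(1, 28, size=5000)              # batter sequence number, 1..27
tto = (bf - 1) // 9 + 1                          # time through the order: 1, 2, 3
p = 1 / (1 + np.exp(-(-1.0 + 0.01 * bf)))        # smooth decline, no discontinuity
y = rng.binomial(1, p)                           # simulated on-base outcome

X = sm.add_constant(np.column_stack([bf, tto == 2, tto == 3]).astype(float))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)    # TTO coefficients should be near zero: no discontinuity to find
```

An analysis that omits the continuous `bf` term would instead attribute the smooth decline to the TTO indicators, which mirrors the confounding the paper highlights.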