We study limit theorems for entropic optimal transport (EOT) maps, dual potentials, and the Sinkhorn divergence. The key technical tool is a first- and second-order Hadamard differentiability analysis of EOT potentials with respect to the marginal distributions, which may be of independent interest. Given the differentiability results, the functional delta method is used to obtain central limit theorems for empirical EOT potentials and maps. The second-order functional delta method is leveraged to establish the limit distribution of the empirical Sinkhorn divergence under the null. Building on the latter result, we further derive the null limit distribution of the Sinkhorn independence test statistic and characterize the correct order. Since our limit theorems follow from Hadamard differentiability of the relevant maps, as a byproduct, we also obtain bootstrap consistency and asymptotic efficiency of the empirical EOT map, potentials, and Sinkhorn divergence.
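To make the objects in the abstract above concrete, the following is a minimal sketch of the EOT value, its dual potentials, and the Sinkhorn divergence for discrete measures on the real line. It assumes a squared-distance cost, a regularization parameter `eps`, and a fixed number of log-domain Sinkhorn iterations; the function names `eot_cost` and `sinkhorn_divergence` are illustrative and not taken from the paper.

```python
import math

def eot_cost(x, a, y, b, eps=1.0, iters=300):
    """Entropic OT value between sum_i a[i]*delta_{x[i]} and sum_j b[j]*delta_{y[j]}.

    Assumes the squared-distance cost and runs log-domain Sinkhorn updates
    on the dual potentials (f, g). Returns (value, f, g).
    """
    n, m = len(x), len(y)
    cost = [[(x[i] - y[j]) ** 2 for j in range(m)] for i in range(n)]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Each update enforces one marginal constraint of the coupling.
        f = [-eps * math.log(sum(b[j] * math.exp((g[j] - cost[i][j]) / eps)
                                 for j in range(m))) for i in range(n)]
        g = [-eps * math.log(sum(a[i] * math.exp((f[i] - cost[i][j]) / eps)
                                 for i in range(n))) for j in range(m)]
    # At a fixed point the dual objective reduces to <a, f> + <b, g>.
    value = sum(a[i] * f[i] for i in range(n)) + sum(b[j] * g[j] for j in range(m))
    return value, f, g

def sinkhorn_divergence(x, a, y, b, eps=1.0):
    """Debiased EOT cost: EOT(a, b) - (EOT(a, a) + EOT(b, b)) / 2."""
    xy, _, _ = eot_cost(x, a, y, b, eps)
    xx, _, _ = eot_cost(x, a, x, a, eps)
    yy, _, _ = eot_cost(y, b, y, b, eps)
    return xy - 0.5 * (xx + yy)

uniform = [0.5, 0.5]
d_null = sinkhorn_divergence([0.0, 1.0], uniform, [0.0, 1.0], uniform)
d_alt = sinkhorn_divergence([0.0, 1.0], uniform, [2.0, 3.0], uniform)
```

When both measures coincide (the null), the divergence vanishes, which is why a second-order expansion is needed for a non-degenerate limit there.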
This work introduces efficient symbolic algorithms for quantitative reactive synthesis. We consider resource-constrained robotic manipulators that need to interact with a human to achieve a complex task expressed in linear temporal logic. Our framework generates reactive strategies that not only guarantee task completion but also seek cooperation with the human when possible. We model the interaction as a two-player game and consider regret-minimizing strategies to encourage cooperation. We use a symbolic representation of the game to enable scalability. For synthesis, we first introduce value iteration algorithms for such games with min-max objectives. Then, we extend our method to regret-minimizing objectives. Our benchmarks reveal that our symbolic framework not only significantly improves computation time (by up to an order of magnitude) but also scales to much larger instances of manipulation problems, with up to twice as many objects and locations as the state of the art.
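The min-max value iteration mentioned in the abstract above can be sketched on a small explicit game graph. This is only an explicit-state illustration under assumed conventions (non-negative edge costs, turn-based states owned by `"robot"` or `"human"`, a set of target states); it does not use a symbolic representation and does not cover the regret-minimizing extension. The example game and all names are hypothetical.

```python
import math

def minmax_value_iteration(game, targets):
    """Worst-case cost-to-target values for a turn-based two-player game.

    game[s] = (owner, [(successor, cost), ...]); the robot minimizes and the
    human maximizes the accumulated cost. Values start at infinity outside the
    targets and are updated until a fixed point is reached.
    """
    values = {s: (0.0 if s in targets else math.inf) for s in game}
    while True:
        updated = {}
        for state, (owner, edges) in game.items():
            if state in targets:
                updated[state] = 0.0
                continue
            choose = min if owner == "robot" else max
            updated[state] = choose(cost + values[succ] for succ, cost in edges)
        if updated == values:
            return values
        values = updated

# Hypothetical game: the human can loop back from h1, so the robot prefers h2.
game = {
    "r0": ("robot", [("h1", 1), ("h2", 3)]),
    "h1": ("human", [("goal", 0), ("r0", 2)]),
    "h2": ("human", [("goal", 1), ("r1", 1)]),
    "r1": ("robot", [("goal", 2)]),
    "goal": ("robot", []),
}
values = minmax_value_iteration(game, {"goal"})
```

In this example the route through `h1` looks cheaper at first, but an adversarial human can send the robot back to `r0`, so the guaranteed value at `r0` comes from the route through `h2`.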
The Skolem problem is a long-standing open problem in linear dynamical systems: can a linear recurrence sequence (LRS) ever reach 0 from a given initial configuration? Similarly, the positivity problem asks whether the LRS stays positive from an initial configuration. Deciding Skolem (or positivity) has been open for half a century: the best known decidability results are for LRS with special properties (e.g., low-order recurrences). But these problems are easier for ``uninitialized'' variants, where the initial configuration is not fixed but can vary arbitrarily: checking if there is an initial configuration from which the LRS stays positive can be decided in polynomial time (Tiwari in 2004, Braverman in 2006). In this paper, we consider problems that lie between the initialized and uninitialized variants. More precisely, we ask if 0 (resp. negative numbers) can be avoided from every initial configuration in a neighbourhood of a given initial configuration. This can be considered as a robust variant of the Skolem (resp. positivity) problem. We show that these problems lie at the frontier of decidability: if the neighbourhood is given as part of the input, then robust Skolem and robust positivity are Diophantine hard, i.e., solving either would entail a major breakthrough in Diophantine approximations, as happens for (non-robust) positivity. However, if one asks whether such a neighbourhood exists, then the problems turn out to be decidable with PSPACE complexity. Our techniques also allow us to tackle robustness for ultimate positivity, which asks whether there is a bound on the number of steps after which the LRS remains positive. There are two variants depending on whether we ask for a ``uniform'' bound on this number of steps. For the non-uniform variant, when the neighbourhood is open, the problem turns out to be tractable, even when the neighbourhood is given as input.
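For readers unfamiliar with the objects in the abstract above, here is a small sketch that unrolls an LRS from an initial configuration and searches for a zero within a finite horizon. A bounded search of this kind can only find a zero, never rule one out, which is exactly why the Skolem problem is hard; the function names and the coefficient ordering are assumptions made for this illustration.

```python
def lrs_terms(coeffs, init, n_terms):
    """First n_terms of u_{n+k} = coeffs[0]*u_{n+k-1} + ... + coeffs[k-1]*u_n."""
    terms = list(init)
    k = len(coeffs)
    while len(terms) < n_terms:
        recent = terms[-k:][::-1]  # most recent term first
        terms.append(sum(c * u for c, u in zip(coeffs, recent)))
    return terms[:n_terms]

def first_zero(coeffs, init, horizon):
    """Index of the first zero among the first `horizon` terms, or None."""
    for index, term in enumerate(lrs_terms(coeffs, init, horizon)):
        if term == 0:
            return index
    return None

def positive_up_to(coeffs, init, horizon):
    """True if the first `horizon` terms are all strictly positive."""
    return all(term > 0 for term in lrs_terms(coeffs, init, horizon))

fibonacci = lrs_terms((1, 1), (1, 1), 6)
periodic_zero = first_zero((1, -1), (1, 1), 20)
```

With integer inputs the arithmetic is exact, so the zero test `term == 0` is reliable within the horizon.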
Transport engineers employ various interventions to enhance traffic-network performance. Quantifying the impacts of Cycle Superhighways is complicated due to the non-random assignment of such an intervention over the transport network. Treatment effects on asymmetric and heavy-tailed distributions are better reflected at the extreme tails than at the median. We propose a novel method to estimate the treatment effect at extreme tails, incorporating heavy-tailed features in the outcome distribution. The analysis of London transport data using the proposed method indicates that the extreme traffic flow increased substantially after Cycle Superhighways came into operation.
The ensemble data assimilation of computational fluid dynamics simulations based on the lattice Boltzmann method (LBM) and the local ensemble transform Kalman filter (LETKF) is implemented and optimized on a GPU supercomputer based on NVIDIA A100 GPUs. To connect the LBM and LETKF parts, data transpose communication is optimized by overlapping computation, file I/O, and communication based on data dependency in each LETKF kernel. In two-dimensional forced isotropic turbulence simulations with an ensemble size of $M=64$ and $N_x=128^2$ grid points, the optimized implementation achieved a $3.80\times$ speedup over the naive implementation, in which the LETKF part is not parallelized. The main computing kernel of the local problem is the eigenvalue decomposition (EVD) of $M\times M$ real symmetric dense matrices, which is computed by a newly developed batched EVD in $\verb|EigenG|$. The batched EVD in $\verb|EigenG|$ outperforms that in $\verb|cuSOLVER|$, achieving a $65.3\times$ speedup.
Numerical methods for Inverse Kinematics (IK) employ iterative, linear approximations of the IK until the end-effector is brought from its initial pose to the desired final pose. These methods require the computation of the Jacobian of the Forward Kinematics (FK) and its inverse in the linear approximation of the IK. Despite all the successful implementations reported in the literature, Jacobian-based IK methods can still fail to preserve certain useful properties if an improper matrix inverse, e.g., Moore-Penrose (MP), is employed for incommensurate robotic systems. In this paper, we propose a systematic, robust and accurate numerical solution for the IK problem using the Mixed (MX) Generalized Inverse (GI) applied to any type of Jacobian (e.g., analytical, numerical or geometric) derived for any commensurate or incommensurate robot. This approach is robust to whether the system is under-determined (fewer than 6 DoF) or over-determined (more than 6 DoF). We investigate six robotic manipulators with various Degrees of Freedom (DoF) to demonstrate that commonly used GIs fail to guarantee the same system behaviors when the units are varied for incommensurate robotic manipulators. In addition, we evaluate the proposed methodology as a global IK solver and compare it against well-known IK methods for redundant manipulators. Based on the experimental results, we conclude that the right choice of GI is crucial in preserving certain properties of the system (i.e., unit consistency).
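The unit-consistency issue raised in the abstract above can be seen in a toy example. The sketch below assumes a hypothetical mechanism with one prismatic joint `d` (a length) and one revolute joint `theta` (an angle) controlling a single task coordinate `x = d + link_length * sin(theta)`. It applies one Moore-Penrose (minimum-norm) step to the same physical request expressed in metres and in millimetres; it does not implement the MX generalized inverse proposed in the paper.

```python
import math

def jacobian(theta, link_length):
    """1x2 Jacobian of x = d + link_length * sin(theta) w.r.t. q = (d, theta)."""
    return [1.0, link_length * math.cos(theta)]

def mp_step(J, dx):
    """Moore-Penrose update for a single-row Jacobian: dq = J^T (J J^T)^{-1} dx."""
    jj = sum(j * j for j in J)
    return [j * dx / jj for j in J]

# Same physical request (move x by 0.1 m), written in two length units.
dq_m = mp_step(jacobian(0.0, 1.0), 0.1)         # d in metres, theta in radians
dq_mm = mp_step(jacobian(0.0, 1000.0), 100.0)   # d in millimetres, theta in radians

# Convert the millimetre solution back to metres for comparison.
dq_mm_in_m = [dq_mm[0] / 1000.0, dq_mm[1]]
```

In metres the step splits the motion evenly between the two joints (0.05 m and 0.05 rad), while in millimetres almost all of it is assigned to the revolute joint (about 0.1 rad). Both updates satisfy the linearized task constraint, yet they describe different physical motions, which is the behavior a unit-consistent generalized inverse is meant to avoid.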
We survey analytical methods and evaluation results for the performance assessment of caching strategies. Knapsack solutions are derived, which provide static caching bounds for independent requests and general bounds for dynamic caching under arbitrary request patterns. We summarize Markov- and time-to-live-based solutions, which assume specific stochastic processes for capturing web request streams and timing. We compare the performance of caching strategies with different knowledge about the properties of data objects regarding a broad set of caching demands. An assessment of web caching efficiency must weigh benefits for network-wide traffic load, energy consumption, and quality of service against the costs of updating and storage overheads.
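The knapsack view of static caching mentioned in the abstract above can be illustrated as follows. This sketch assumes integer object sizes, a fixed cache capacity, and a known request count per object; it uses a standard 0/1 knapsack dynamic program to find the largest number of requests a static cache content can serve. The function name and the example catalogue are hypothetical.

```python
def static_cache_bound(objects, capacity):
    """Maximum total value of objects that fit into the cache (0/1 knapsack).

    objects  : list of (size, value) pairs with integer sizes; value can be a
               request count or any other per-object caching benefit
    capacity : integer cache size in the same unit as the object sizes
    """
    best = [0] * (capacity + 1)
    for size, value in objects:
        # Iterate capacities downwards so each object is cached at most once.
        for c in range(capacity, size - 1, -1):
            best[c] = max(best[c], best[c - size] + value)
    return best[capacity]

# Hypothetical catalogue: (size, number of requests out of 100).
catalogue = [(3, 45), (2, 30), (2, 15), (1, 10)]
max_hits = static_cache_bound(catalogue, capacity=4)
hit_ratio_bound = max_hits / 100
```

For unit-size objects the same bound reduces to caching the most requested objects until the cache is full.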
Statistical network analysis primarily focuses on inferring the parameters of an observed network. In many applications, especially in the social sciences, the observed data are the groups formed by individual subjects. In these applications, the network is itself a parameter of a statistical model. Zhao and Weko (2019) propose a model-based approach, called the hub model, to infer implicit networks from grouping behavior. The hub model assumes that each member of the group is brought together by a member of the group called the hub. The set of members which can serve as a hub is called the hub set. The hub model belongs to the family of Bernoulli mixture models. Identifiability of Bernoulli mixture model parameters is a notoriously difficult problem. This paper proves identifiability of the hub model parameters and estimation consistency under mild conditions. Furthermore, this paper generalizes the hub model by introducing a model component that allows hubless groups in which individual nodes spontaneously appear independently of any other individual. We refer to this additional component as the null component. The new model bridges the gap between the hub model and the degenerate case of the mixture model -- the Bernoulli product. Identifiability and consistency are also proved for the new model. In addition, a penalized likelihood approach is proposed to estimate the hub set when it is unknown.
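The generative mechanism described in the abstract above can be sketched as a sampler. This is an assumed reading of the model rather than the authors' code: a hub is drawn from a distribution `rho`, the hub always belongs to the group, every other node joins independently with a probability taken from a matrix `A`, and with probability `null_prob` the group is instead hubless with independent appearance probabilities `pi`. All parameter names are hypothetical.

```python
import random

def sample_group(rho, A, pi=None, null_prob=0.0, rng=random):
    """Draw one group from a hub-model-style Bernoulli mixture.

    rho[i]    : probability that node i is the hub (weights over nodes)
    A[i][j]   : probability that hub i brings node j into the group
    pi[j]     : appearance probability of node j in a hubless group
    null_prob : probability of the hubless (null) component
    Returns (hub, members); hub is None for a hubless group.
    """
    n = len(rho)
    if pi is not None and rng.random() < null_prob:
        # Null component: nodes appear independently (Bernoulli product).
        return None, {j for j in range(n) if rng.random() < pi[j]}
    hub = rng.choices(range(n), weights=rho)[0]
    members = {j for j in range(n) if j == hub or rng.random() < A[hub][j]}
    return hub, members

rng = random.Random(0)
rho = [0.5, 0.3, 0.2]
A = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.4],
     [0.1, 0.4, 1.0]]
groups = [sample_group(rho, A, rng=rng) for _ in range(5)]
```

Setting `null_prob=1.0` recovers the Bernoulli product, and `null_prob=0.0` recovers the plain hub model, which mirrors the bridging role of the null component.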
This is Part II of a two-part paper. Part I presented a universal Birkhoff theory for fast and accurate trajectory optimization. The theory rested on two main hypotheses. In this paper, it is shown that if the computational grid is selected from any one of the Legendre and Chebyshev family of node points, be it Lobatto, Radau or Gauss, then the resulting collection of trajectory optimization methods satisfies the hypotheses required for the universal Birkhoff theory to hold. All of these grid points can be generated at an $\mathcal{O}(1)$ computational speed. Furthermore, all Birkhoff-generated solutions can be tested for optimality by a joint application of Pontryagin's Principle and the Covector Mapping Principle, where the latter was developed in Part~I. More importantly, the optimality checks can be performed without resorting to an indirect method or even explicitly producing the full differential-algebraic boundary value problem that results from an application of Pontryagin's Principle. Numerical problems are solved to illustrate all these ideas. The examples are chosen to particularly highlight three practically useful features of Birkhoff methods: (1) bang-bang optimal controls can be produced without suffering any Gibbs phenomenon, (2) discontinuous and even Dirac delta covector trajectories can be well approximated, and (3) extremal solutions over dense grids can be computed in a stable and efficient manner.
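As a small illustration of the grids named in the abstract above, the sketch below generates Chebyshev-Gauss-Lobatto and Chebyshev-Gauss nodes on $[-1, 1]$ from their standard closed-form cosine expressions, so each node costs a constant amount of work. Legendre nodes have no such closed form and are not covered here; the function names and the indexing convention ($N+1$ nodes, ascending order) are assumptions of this sketch.

```python
import math

def chebyshev_gauss_lobatto(N):
    """N + 1 Chebyshev-Gauss-Lobatto nodes on [-1, 1], endpoints included."""
    return [-math.cos(math.pi * k / N) for k in range(N + 1)]

def chebyshev_gauss(N):
    """N + 1 Chebyshev-Gauss nodes, all strictly inside (-1, 1)."""
    return [-math.cos((2 * k + 1) * math.pi / (2 * N + 2)) for k in range(N + 1)]

lobatto = chebyshev_gauss_lobatto(4)
gauss = chebyshev_gauss(4)
```

Both grids cluster near the endpoints of the interval, a property shared by the Legendre-based grids.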
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
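The first category in the abstract above, parameter quantization, can be illustrated with a minimal sketch. It assumes symmetric uniform quantization of a flat list of weights to signed integers with a single scale factor; real schemes (per-channel scales, zero points, quantization-aware training) are more involved, and the function names here are illustrative only.

```python
def quantize(weights, bits=8):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers.

    Returns (integer codes, scale) such that weight is approximately code * scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.0, 0.49]
codes, scale = quantize(weights, bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in 8 bits instead of 32, and the reconstruction error per weight is at most half a quantization step (`scale / 2`).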