Developing robot controllers in a simulated environment is advantageous but transferring the controllers to the target environment presents challenges, often referred to as the "sim-to-real gap". We present a method for continuous improvement of modeling and control after deploying the robot to a dynamically-changing target environment. We develop a differentiable physics simulation framework that performs online system identification and optimal control simultaneously, using the incoming observations from the target environment in real time. To ensure robust system identification against noisy observations, we devise an algorithm to assess the confidence of our estimated parameters, using numerical analysis of the dynamic equations. To ensure real-time optimal control, we adaptively schedule the optimization window in the future so that the optimized actions can be replenished faster than they are consumed, while staying as up-to-date with new sensor information as possible. The constant re-planning based on a constantly improved model allows the robot to swiftly adapt to the changing environment and utilize real-world data in the most sample-efficient way. Thanks to a fast differentiable physics simulator, the optimization for both system identification and control can be solved efficiently for robots operating in real time. We demonstrate our method on a set of examples in simulation and show that our results are favorable compared to baseline methods.
We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
We present a method to simulate movement in interaction with computers, using Model Predictive Control (MPC). The method starts from understanding interaction from an Optimal Feedback Control (OFC) perspective. We assume that users aim to minimize an internalized cost function, subject to the constraints imposed by the human body and the interactive system. In contrast to previous linear approaches used in HCI, MPC can compute optimal controls for nonlinear systems. This allows us to use state-of-the-art biomechanical models and handle nonlinearities that occur in almost any interactive system. Instead of torque actuation, our model employs second-order muscles acting directly at the joints. We compare three different cost functions and evaluate the simulated trajectories against user movements in a Fitts' Law type pointing study with four different interaction techniques. Our results show that the combination of distance, control, and joint acceleration cost matches individual users' movements best, and predicts movements with an accuracy that is within the between-user variance. To aid HCI researchers and designers, we introduce CFAT, a novel method to identify maximum voluntary torques in joint-actuated models based on experimental data, and give practical advice on how to simulate human movement for different users, interaction techniques, and tasks.
Industrial Control Systems (ICSs) rely on insecure protocols and devices to monitor and operate critical infrastructure. Prior work has demonstrated that powerful attackers with detailed system knowledge can manipulate exchanged sensor data to deteriorate performance of the process, even leading to full shutdowns of plants. Identifying those attacks requires iterating over all possible sensor values, and running detailed system simulation or analysis to identify optimal attacks. That setup allows adversaries to identify attacks that are most impactful when applied on the system for the first time, before the system operators become aware of the manipulations. In this work, we investigate if constrained attackers without detailed system knowledge and simulators can identify comparable attacks. In particular, the attacker only requires abstract knowledge on general information flow in the plant, instead of precise algorithms, operating parameters, process models, or simulators. We propose an approach that allows single-shot attacks, i.e., near-optimal attacks that are reliably shutting down a system on the first try. The approach is applied and validated on two use cases, and demonstrated to achieve comparable results to prior work, which relied on detailed system information and simulations.
The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model, where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates (1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via back-propagation through time, and (2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the energy-based model simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the energy-based model. We demonstrate the proposed methods on autonomous driving tasks, and show that they can learn suitable cost functions for optimal control.
Collision avoidance is a widely investigated topic in robotic applications. When applying collision avoidance techniques to a mobile robot, how to deal with the spatial structure of the robot still remains a challenge. In this paper, we design a configuration-aware safe control law by solving a Quadratic Programming (QP) with designed Control Barrier Functions (CBFs) constraints, which can safely navigate a mobile robotic arm to a desired region while avoiding collision with environmental obstacles. The advantage of our approach is that it correctly and in an elegant way incorporates the spatial structure of the mobile robotic arm. This is achieved by merging geometric restrictions among mobile robotic arm links into CBFs constraints. Simulations on a rigid rod and the modeled mobile robotic arm are performed to verify the feasibility and time-efficiency of proposed method. Numerical results about the time consuming for different degrees of freedom illustrate that our method scales well with dimension.
Frame-online speech enhancement systems in the short-time Fourier transform (STFT) domain usually have an algorithmic latency equal to the window size due to the use of the overlap-add algorithm in the inverse STFT (iSTFT). This algorithmic latency allows the enhancement models to leverage future contextual information up to a length equal to the window size. However, current frame-online systems only partially leverage this future information. To fully exploit this information, this study proposes an overlapped-frame prediction technique for deep learning based frame-online speech enhancement, where at each frame our deep neural network (DNN) predicts the current and several past frames that are necessary for overlap-add, instead of only predicting the current frame. In addition, we propose a novel loss function to account for the scale difference between predicted and oracle target signals. Evaluations results on a noisy-reverberant speech enhancement task show the effectiveness of the proposed algorithms.
In this work, we develop quantization and variable-length source codecs for the feedback links in linear-quadratic-Gaussian (LQG) control systems. We prove that for any fixed control performance, the approaches we propose nearly achieve lower bounds on communication cost that have been established in prior work. In particular, we refine the analysis of a classical achievability approach with an eye towards more practical details. Notably, in the prior literature the source codecs used to demonstrate the (near) achievability of these lower bounds are often implicitly assumed to be time-varying. For single-input single-output (SISO) plants, we prove that it suffices to consider time-invariant quantization and source coding. This result follows from analyzing the long-term stochastic behavior of the system's quantized measurements and reconstruction errors. To our knowledge, this time-invariant achievability result is the first in the literature.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
Multi-camera vehicle tracking is one of the most complicated tasks in Computer Vision as it involves distinct tasks including Vehicle Detection, Tracking, and Re-identification. Despite the challenges, multi-camera vehicle tracking has immense potential in transportation applications including speed, volume, origin-destination (O-D), and routing data generation. Several recent works have addressed the multi-camera tracking problem. However, most of the effort has gone towards improving accuracy on high-quality benchmark datasets while disregarding lower camera resolutions, compression artifacts and the overwhelming amount of computational power and time needed to carry out this task on its edge and thus making it prohibitive for large-scale and real-time deployment. Therefore, in this work we shed light on practical issues that should be addressed for the design of a multi-camera tracking system to provide actionable and timely insights. Moreover, we propose a real-time city-scale multi-camera vehicle tracking system that compares favorably to computationally intensive alternatives and handles real-world, low-resolution CCTV instead of idealized and curated video streams. To show its effectiveness, in addition to integration into the Regional Integrated Transportation Information System (RITIS), we participated in the 2021 NVIDIA AI City multi-camera tracking challenge and our method is ranked among the top five performers on the public leaderboard.
We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.