This paper presents an approach for trajectory-centric learning control based on contraction metrics and disturbance estimation for nonlinear systems subject to matched uncertainties. The approach allows for the use of a broad class of model learning tools, including deep neural networks, to learn uncertain dynamics while still providing guarantees of transient tracking performance throughout the learning phase, including the special case of no learning. Within the proposed approach, a disturbance estimation law is introduced to estimate the pointwise value of the uncertainty, with pre-computable estimation error bounds (EEBs). The learned dynamics, the estimated disturbances, and the EEBs are then incorporated in a robust Riemannian energy condition to compute the control law, which guarantees exponential convergence of actual trajectories to desired ones throughout the learning phase, even when the learned model is poor. On the other hand, as its accuracy improves, the learned model can be incorporated in a high-level planner to plan better trajectories with improved performance, e.g., lower energy consumption and shorter travel time. The proposed framework is validated on a planar quadrotor navigation example.
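To make the estimation step concrete, below is a minimal sketch of a pointwise disturbance estimator for a control-affine system $\dot{x} = f(x) + B(x)(u + d)$ with matched uncertainty $d$. The predictor-based structure and the names `x_pred`, `B`, `T`, `a` are illustrative assumptions, not the paper's exact estimation law or its EEBs.

```python
import numpy as np

# Hedged sketch of a pointwise disturbance estimator for a control-affine
# system x_dot = f(x) + B(x) @ (u + d) with matched uncertainty d. The
# predictor-based structure and the names B, T, a are illustrative
# assumptions, not the paper's exact estimation law.

def estimate_disturbance(x, x_pred, B, T, a=1.0):
    """One estimation step over a sampling interval T.

    x      : measured state at the current sample
    x_pred : state predicted by the nominal (disturbance-free) model
    B      : input matrix evaluated along the trajectory
    a      : estimator gain (larger -> faster but noisier estimates)
    """
    # Prediction error attributed to the matched disturbance channel.
    e = x - x_pred
    # Least-squares projection of the average error rate onto the input channel.
    return a * (np.linalg.pinv(B) @ e) / T
```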
The scope of this paper is the analysis and approximation of an optimal control problem related to the Allen-Cahn equation. A tracking functional is minimized subject to the Allen-Cahn equation using distributed controls that satisfy pointwise control constraints. First- and second-order necessary and sufficient conditions are proved. The lowest-order discontinuous Galerkin (in time) scheme is considered for the approximation of the control-to-state and adjoint-state mappings. Under a suitable restriction on the maximum size of the temporal and spatial discretization parameters $k$ and $h$, respectively, in terms of the parameter $\epsilon$ that describes the thickness of the interface layer, a priori estimates are proved with constants depending polynomially upon $1/\epsilon$. Unlike previous works for the uncontrolled Allen-Cahn problem, our approach does not rely on constructing an approximation of the spectral estimate, and as a consequence our estimates are valid under the low regularity assumptions imposed by the optimal control setting. These estimates are also valid in cases where the solution and its discrete approximation do not satisfy uniform space-time bounds independent of $\epsilon$. These estimates and a suitable localization technique, via the second-order condition (see \cite{Arada-Casas-Troltzsch_2002,Casas-Mateos-Troltzsch_2005,Casas-Raymond_2006,Casas-Mateos-Raymond_2007}), allow us to prove error estimates for the difference between local optimal controls and their discrete approximations, as well as between the associated state and adjoint-state variables and their discrete approximations.
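A representative formulation of such a problem reads as follows; the specific tracking functional, the Tikhonov term, and the constraint bounds are standard assumptions for this setting rather than the paper's exact choices:
\[
\min_{u_a \le u \le u_b \ \text{a.e.}} \; J(y,u) = \frac{1}{2}\int_0^T\!\!\int_\Omega |y - y_d|^2 \, dx \, dt + \frac{\alpha}{2}\int_0^T\!\!\int_\Omega |u|^2 \, dx \, dt,
\]
subject to the Allen-Cahn equation
\[
\partial_t y - \Delta y + \frac{1}{\epsilon^2}\,\bigl(y^3 - y\bigr) = u \quad \text{in } (0,T)\times\Omega,
\]
together with suitable boundary and initial conditions, where $y_d$ is the tracking target and $\epsilon$ is the interface-thickness parameter.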
We consider a hierarchical edge-cloud architecture in which services are provided to mobile users as chains of virtual network functions. Each service has specific computation requirements and target delay performance, which require placing the corresponding chain properly and allocating a suitable amount of computing resources. Furthermore, chain migration may be necessary to meet the services' target delay, or convenient to keep the service provisioning cost low. We tackle such issues by formalizing the problem of optimal chain placement and resource allocation in the edge-cloud continuum, taking into account migration, bandwidth, and computation costs. Specifically, we first design an algorithm that, leveraging resource augmentation, addresses the above problem and provides an upper bound on the amount of resources required to find a feasible solution. We use this algorithm as a building block to devise an efficient approach targeting the minimum-cost solution, while minimizing the required resource augmentation. Our results, obtained through trace-driven, large-scale simulations, show that our approach can find a feasible solution using half the amount of resources required by state-of-the-art alternatives.
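As a toy illustration of the resource-augmentation idea (node capacities are scaled by a factor while searching for a feasible placement), consider the greedy sketch below; the data structures, the greedy order, and the `augment` parameter are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of resource augmentation: capacities are multiplied by
# an `augment` factor while greedily searching for a feasible placement.
# All data structures and the greedy rule are illustrative assumptions.

def place_with_augmentation(chains, nodes, capacity, augment=1.0):
    """chains: list of chains, each a list of (vnf, cpu_demand) pairs;
    capacity: dict node -> CPU capacity. Returns a placement dict or None."""
    used = {n: 0.0 for n in nodes}
    placement = {}
    for chain in chains:
        for vnf, demand in chain:
            # Only nodes whose *augmented* capacity can still host the VNF.
            feasible = [n for n in nodes
                        if used[n] + demand <= augment * capacity[n]]
            if not feasible:
                return None  # infeasible even at this augmentation level
            node = min(feasible, key=lambda n: used[n])  # least-loaded first
            used[node] += demand
            placement[vnf] = node
    return placement
```

Running this with increasing `augment` values yields the kind of upper bound on the resources needed for feasibility that the augmented building block provides.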
We tackle the problem of flying time-optimal trajectories through multiple waypoints with quadrotors. State-of-the-art solutions split the problem into a planning task, where a global, time-optimal trajectory is generated, and a control task, where this trajectory is accurately tracked. However, generating a time-optimal trajectory that considers the full quadrotor model currently requires solving a difficult time-allocation problem via optimization, which is computationally demanding (on the order of minutes or even hours). This is detrimental for replanning in the presence of disturbances. We overcome this issue by solving the time-allocation problem and the control problem concurrently via Model Predictive Contouring Control (MPCC). Our MPCC optimally selects the future states of the platform at runtime, while maximizing the progress along the reference path and minimizing the distance to it. We show that, even when tracking simplified trajectories, the proposed MPCC results in a path that approaches the true time-optimal one and that can be generated in real time. We validate our approach in the real world, where we show that our method outperforms both the current state of the art and a world-class human pilot in terms of lap time, achieving speeds of up to 60 km/h.
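For intuition, here is a minimal sketch of an MPCC stage cost: the tracking error is split into a lag component (along the path) and a contour component (orthogonal to it), and progress along the path is rewarded. The weights `q_c`, `q_l`, `rho` and the interfaces `path_pos` / `path_tangent` are illustrative assumptions, not the paper's exact cost.

```python
import numpy as np

# Hedged sketch of an MPCC stage cost. Penalizing contour/lag errors while
# rewarding the progress rate v_theta is the standard MPCC trade-off; the
# weights and the path interfaces below are illustrative assumptions.

def mpcc_stage_cost(p, theta, v_theta, path_pos, path_tangent,
                    q_c=1.0, q_l=10.0, rho=0.1):
    """p: position; theta: progress variable along the reference path;
    v_theta: progress rate; path_pos(theta) / path_tangent(theta):
    reference point and unit tangent at arc length theta."""
    e = p - path_pos(theta)
    t = path_tangent(theta)
    e_lag = np.dot(e, t)       # error component along the path
    e_con = e - e_lag * t      # error component orthogonal to the path
    # Penalize contour and lag errors, reward progress along the path.
    return q_c * np.dot(e_con, e_con) + q_l * e_lag**2 - rho * v_theta
```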
Understanding the global dynamics of a robot controller, such as identifying attractors and their regions of attraction (RoA), is important for safe deployment and for synthesizing more effective hybrid controllers. This paper proposes a topological framework to analyze the global dynamics of robot controllers, even data-driven ones, in an effective and explainable way. It builds a combinatorial representation of the underlying system's state space and nonlinear dynamics, which is summarized in a directed acyclic graph, the Morse graph. The approach probes the dynamics, which must be a Lipschitz-continuous function, only locally by forward-propagating short trajectories over a state-space discretization. The framework is evaluated on both numerical and data-driven controllers for classical robotic benchmarks. It is compared against established analytical and recent machine learning alternatives for estimating the RoAs of such controllers, and is shown to outperform them in accuracy and efficiency. It also provides deeper insights, as it describes the global dynamics up to the discretization's resolution. This makes it possible to use the Morse graph to identify how to synthesize controllers that form improved hybrid solutions, or to identify the physical limitations of a robotic system.
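One way to realize such a combinatorial summary is sketched below: cover the state space with boxes, forward-propagate short trajectories from random samples in each box, record box-to-box transitions, and condense the strongly connected components into a DAG. The grid resolution, sample count, and the `step` map (a short rollout of the dynamics) are assumptions for illustration.

```python
import numpy as np
import networkx as nx

# Hedged sketch of a Morse-graph-like construction via the condensation
# of a box-to-box transition digraph. Resolution and sampling are
# illustrative choices, not the paper's exact discretization.

def morse_graph(step, lo, hi, n=20, samples=5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    width = (hi - lo) / n

    def box(x):  # index of the grid cell containing x
        return tuple(np.clip(((x - lo) / width).astype(int), 0, n - 1))

    G = nx.DiGraph()
    for idx in np.ndindex(*([n] * len(lo))):
        for _ in range(samples):
            x = lo + (np.array(idx) + rng.random(len(lo))) * width
            G.add_edge(idx, box(step(x)))
    # Condensation: each SCC becomes one node of a directed acyclic graph.
    return nx.condensation(G)
```

In this summary, SCC nodes with no outgoing edges play the role of candidate attractors, and the cells that reach only one such node approximate its region of attraction.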
In experiments to estimate parameters of a parametric model, Bayesian experiment design allows measurement settings to be chosen based on utility, which is the predicted improvement of parameter distributions due to modeled measurement results. In this paper we compare information-theory-based utility with three alternative utility algorithms. Tests of these utility alternatives in simulated adaptive measurements demonstrate large improvements in computational speed with slight impacts on measurement efficiency.
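As a concrete reference point for the information-theory-based utility, the sketch below computes the expected information gain of a candidate setting, i.e., the mutual information between the modeled measurement outcome and the parameters, under a particle approximation of the parameter distribution. The `likelihood` interface and the discrete outcome set are assumptions for illustration.

```python
import numpy as np

# Hedged sketch of an information-gain utility under a particle
# approximation of the parameter posterior. Interfaces are assumptions.

def info_gain_utility(setting, particles, weights, likelihood, outcomes):
    """likelihood(y, theta, setting) -> probability of outcome y."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    # p[i, j] = P(outcome j | particle i, setting)
    p = np.array([[likelihood(y, th, setting) for y in outcomes]
                  for th in particles])
    p_y = w @ p  # marginal (predictive) outcome probabilities

    def entropy(q):
        return -np.sum(q * np.log(np.clip(q, 1e-300, None)), axis=-1)

    # I(Y; Theta) = H(Y) - E_theta[H(Y | theta)]
    return entropy(p_y) - w @ entropy(p)
```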
Safe deployment of autonomous robots in diverse scenarios requires agents that are capable of efficiently adapting to new environments while satisfying constraints. In this work, we propose a practical and theoretically justified approach to maintaining safety in the presence of dynamics uncertainty. Our approach leverages Bayesian meta-learning with last-layer adaptation. The expressiveness of neural-network features trained offline, paired with efficient last-layer online adaptation, enables the derivation of tight confidence sets which contract around the true dynamics as the model adapts online. We exploit these confidence sets to plan trajectories that guarantee the safety of the system. Our approach handles problems with high dynamics uncertainty, where reaching the goal safely is potentially infeasible at first, by first \textit{exploring} to gather data and reduce uncertainty, before autonomously \textit{exploiting} the acquired information to safely perform the task. Under reasonable assumptions, we prove that our framework guarantees the high-probability satisfaction of all constraints at all times jointly, i.e., over the total task duration. This theoretical analysis also motivates two regularizers of last-layer meta-learning models that improve online adaptation capabilities as well as performance by reducing the size of the confidence sets. We extensively demonstrate our approach in simulation and on hardware.
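The sketch below illustrates the last-layer adaptation mechanism: with features $\phi(x)$ from a frozen, meta-learned network, the last layer is a Bayesian linear model whose posterior (and hence the confidence set) contracts as online data arrives. The prior scale, noise level, and confidence radius `beta` are illustrative assumptions, not the paper's calibrated quantities.

```python
import numpy as np

# Hedged sketch of Bayesian last-layer adaptation: recursive updates of a
# linear-Gaussian posterior over the last-layer weights. Hyperparameters
# are illustrative assumptions.

class LastLayerPosterior:
    def __init__(self, dim, prior_var=1.0, noise_var=0.01):
        self.P = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)            # precision-weighted mean accumulator
        self.noise_var = noise_var

    def update(self, phi, y):
        """Rank-one update from one observation y = w^T phi + noise."""
        self.P += np.outer(phi, phi) / self.noise_var
        self.b += phi * y / self.noise_var

    def confidence_set(self, beta=2.0):
        """Posterior mean and per-coordinate radius of an ellipsoidal
        high-probability set; the set shrinks as more data is absorbed."""
        cov = np.linalg.inv(self.P)
        return cov @ self.b, beta * np.sqrt(np.diag(cov))
```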
Active inference is a unifying theory of perception and action resting upon the idea that the brain maintains an internal model of the world by minimizing free energy. From a behavioral perspective, active inference agents can be seen as self-evidencing beings that act to fulfill their optimistic predictions, namely preferred outcomes or goals. In contrast, reinforcement learning requires human-designed rewards to accomplish any desired outcome. Although active inference could provide a more natural self-supervised objective for control, its applicability has been limited by the difficulty of scaling the approach to complex environments. In this work, we propose a contrastive objective for active inference that strongly reduces the computational burden of learning the agent's generative model and planning future actions. Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train. We compare against reinforcement learning agents that have access to human-designed reward functions, showing that our approach closely matches their performance. Finally, we also show that contrastive methods perform significantly better in the presence of distractors in the environment and that our method is able to generalize goals to variations in the background.
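For reference, here is a minimal InfoNCE-style contrastive objective of the kind that can stand in for a generative likelihood: matching state/observation embeddings are scored against in-batch negatives. The encoders, normalization, and temperature are assumptions, not the paper's exact architecture.

```python
import numpy as np

# Hedged sketch of an InfoNCE-style contrastive loss over matching
# state/observation embedding pairs; negatives come from the batch.

def contrastive_loss(z_state, z_obs, temperature=0.1):
    """z_state, z_obs: (batch, dim) embeddings of matching pairs."""
    z_s = z_state / np.linalg.norm(z_state, axis=1, keepdims=True)
    z_o = z_obs / np.linalg.norm(z_obs, axis=1, keepdims=True)
    logits = z_s @ z_o.T / temperature  # (batch, batch) similarity scores
    # Diagonal entries are positives; the rest of each row acts as negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```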
We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered by previous works. State density has a clear physical and mathematical interpretation and is able to express a wide variety of constraints, such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning the cost functions required by value-function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density-constrained RL problem optimally while guaranteeing that the constraints are satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, on a wide range of density-constrained tasks as well as standard CRL benchmarks such as Safety-Gym.
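The sketch below shows the dual-ascent loop that density/Q duality suggests: a state-wise Lagrange multiplier acts as a reward penalty and is raised wherever the stationary state density of the current policy exceeds its cap. The `solve_mdp` and `estimate_density` subroutines are assumptions, not the paper's algorithm.

```python
import numpy as np

# Hedged sketch of primal-dual iteration for density-constrained RL.
# Subroutine interfaces are illustrative assumptions.

def density_constrained_rl(solve_mdp, estimate_density, rho_max, n_states,
                           iters=50, step=0.1):
    lam = np.zeros(n_states)  # one multiplier per state-density constraint
    policy = None
    for _ in range(iters):
        # Primal step: best policy for the multiplier-shaped reward.
        policy = solve_mdp(reward_shift=-lam)
        # Dual step: raise the price where the density cap is violated.
        rho = estimate_density(policy)
        lam = np.maximum(0.0, lam + step * (rho - rho_max))
    return policy, lam
```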
How to obtain good value estimates is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the potential of double actors, which have long been neglected, for better value function estimation in the continuous-control setting. First, we uncover and demonstrate the bias-alleviation property of double actors by building double actors upon a single critic and upon double critics, to handle the overestimation bias in DDPG and the underestimation bias in TD3, respectively. Next, we find, interestingly, that double actors help improve the exploration ability of the agent. Finally, to mitigate the uncertainty of the value estimates from double critics, we further propose to regularize the critic networks under the double-actors architecture, which gives rise to the Double Actors Regularized Critics (DARC) algorithm. Extensive experimental results on challenging continuous control tasks show that DARC significantly outperforms state-of-the-art methods with higher sample efficiency.
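For intuition, the sketch below builds a value target with double actors and double critics, plus a regularizer that penalizes disagreement between the critics. The combination rule (mean over critics, max over actors) and the weight `nu` are illustrative assumptions about this family of methods, not DARC's exact formulation.

```python
import numpy as np

# Hedged sketch of a double-actor / double-critic value target with a
# critic-agreement regularizer. Combination rule and weights are
# assumptions, not the exact DARC update.

def darc_style_target(critics, actors, s_next, r, gamma=0.99, nu=0.1):
    """critics: two callables Q(s, a); actors: two callables pi(s)."""
    # q[i, j] = value of actor i's proposed action under critic j.
    q = np.array([[Q(s_next, pi(s_next)) for Q in critics] for pi in actors])
    # Per-actor value: average the critics; keep the larger actor value
    # to counteract the underestimation that double critics can induce.
    v = q.mean(axis=1).max()
    target = r + gamma * v
    # Regularizer: pull the two critics toward each other to reduce the
    # uncertainty of the value estimate.
    critic_reg = nu * np.mean((q[:, 0] - q[:, 1]) ** 2)
    return target, critic_reg
```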
Machine learning methods are powerful in distinguishing different phases of matter in an automated way and provide a new perspective on the study of physical phenomena. We train a Restricted Boltzmann Machine (RBM) on data constructed from spin configurations sampled from the Ising Hamiltonian at different values of temperature and external magnetic field using Monte Carlo methods. From the trained machine we obtain the flow of iteratively reconstructed spin configurations, which faithfully reproduces the observables of the physical system. We find that the flow of the trained RBM approaches spin configurations with the maximal possible specific heat, which resemble the near-criticality region of the Ising model. In the special case of a vanishing magnetic field, the trained RBM converges to the critical point of the Renormalization Group (RG) flow of the lattice model. Our results suggest an alternative explanation of how the machine identifies the physical phase transitions: by recognizing certain properties of the configuration, such as the maximization of the specific heat, instead of directly associating the recognition procedure with the RG flow and its fixed points. From the reconstructed data we then deduce the critical exponent associated with the magnetization, finding satisfactory agreement with the actual physical value. We assume no prior knowledge about the criticality of the system or its Hamiltonian.
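The reconstruction "flow" can be illustrated as repeated Gibbs passes of a configuration through a trained RBM (visible to hidden to visible). Binary {0, 1} units, the trained weights `W` and biases `b`, `c`, and the number of flow steps are illustrative choices; Ising spins in {-1, +1} would be mapped to {0, 1} beforehand.

```python
import numpy as np

# Hedged sketch of the RBM reconstruction flow: each step resamples the
# hidden layer given the visible spins, then the visible layer given the
# hidden units. Encoding and step count are illustrative assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_flow(v, W, b, c, steps=10, seed=0):
    """One reconstruction per step: v -> h -> v'. Returns the flowed state."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        p_h = sigmoid(v @ W + c)  # hidden activation probabilities
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b)  # visible reconstruction probabilities
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v
```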