Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms.
Federated learning for training models over mobile devices is gaining popularity. Current systems for this task exhibit significant trade-offs between model accuracy, privacy guarantee, and device efficiency. For instance, Oort (OSDI 2021) provides excellent accuracy and efficiency but requires a trusted central server. On the other hand, Orchard (OSDI 2020) provides good accuracy and the rigorous guarantee of differential privacy over an untrusted server, but creates huge overhead for the devices. This paper describes Aero, a new federated learning system that significantly improves this trade-off. Aero guarantees good accuracy, differential privacy over an untrusted server, and keeps the device overhead low. The key idea of Aero is to tune system architecture and design to a specific set of popular, federated learning algorithms. This tuning requires novel optimizations and techniques, e.g., a new protocol to securely aggregate updates from devices. An evaluation of Aero demonstrates that it provides comparable accuracy to plain federated learning (without differential privacy), and it improves efficiency (CPU and network) over Orchard by up to $10^5\times$.
Data collection costs can vary widely across variables in data science tasks. Two-phase designs are often employed to save data collection costs. In two-phase studies, inexpensive variables are collected for all subjects in the first phase, and expensive variables are measured for a subset of subjects in the second phase based on a predetermined sampling rule. The estimation efficiency under two-phase designs relies heavily on the sampling rule. Existing literature primarily focuses on designing sampling rules for estimating a scalar parameter in some parametric models or some specific estimating problems. However, real-world scenarios are usually model-unknown and involve two-phase designs for model-free estimation of a scalar or multi-dimensional parameter. This paper proposes a maximin criterion to design an optimal sampling rule based on semiparametric efficiency bounds. The proposed method is model-free and applicable to general estimating problems. The resulting sampling rule can minimize the semiparametric efficiency bound when the parameter is scalar and improve the bound for every component when the parameter is multi-dimensional. Simulation studies demonstrate that the proposed designs reduce the variance of the resulting estimator in various settings. The implementation of the proposed design is illustrated in a real data analysis.
We develop a machine learning approach to identifying parameters with steady-state solutions, locating such solutions, and determining their linear stability for systems of ordinary differential equations and dynamical systems with parameters. Our approach begins with the construction of target functions that can be used to identify parameters with steady-state solution and the linear stability of such solutions. We design a parameter-solution neural network (PSNN) that couples a parameter neural network and a solution neural network to approximate the target function, and develop efficient algorithms to train the PSNN and to locate steady-state solutions. We also present a theory of approximation of the target function by our PSNN based on the neural network kernel decomposition. Numerical results are reported to show that our approach is robust in identifying the phase boundaries separating different regions in the parameter space corresponding to no solution or different numbers of solutions and in classifying the stability of solutions. These numerical results also validate our analysis. Although the primary focus in this study centers on steady states of parameterized dynamical systems, our approach is applicable generally to finding solutions for parameterized nonlinear systems of algebraic equations. Some potential improvements and future work are discussed.
A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with slightly constrained number of switches between different predictors, we can get a $(1+\epsilon)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem.
In this article, we study nonparametric inference for a covariate-adjusted regression function. This parameter captures the average association between a continuous exposure and an outcome after adjusting for other covariates. In particular, under certain causal conditions, this parameter corresponds to the average outcome had all units been assigned to a specific exposure level, known as the causal dose-response curve. We propose a debiased local linear estimator of the covariate-adjusted regression function, and demonstrate that our estimator converges pointwise to a mean-zero normal limit distribution. We use this result to construct asymptotically valid confidence intervals for function values and differences thereof. In addition, we use approximation results for the distribution of the supremum of an empirical process to construct asymptotically valid uniform confidence bands. Our methods do not require undersmoothing, permit the use of data-adaptive estimators of nuisance functions, and our estimator attains the optimal rate of convergence for a twice differentiable function. We illustrate the practical performance of our estimator using numerical studies and an analysis of the effect of air pollution exposure on cardiovascular mortality.
In this work, we present a multiscale approach for the reliable coarse-scale approximation of spatial network models represented by a linear system of equations with respect to the nodes of a graph. The method is based on the ideas of the Localized Orthogonal Decomposition (LOD) strategy and is constructed in a fully algebraic way. This allows to apply the method to geometrically challenging objects such as corrugated cardboard. In particular, the method can also be applied to finite difference or finite element discretizations of elliptic partial differential equations, yielding an approximation with similar properties as the LOD in the continuous setting. We present a rigorous error analysis of the proposed method under suitable assumptions on the network. Moreover, numerical examples are presented that underline our theoretical results.
The recent success of neural networks in natural language processing has drawn renewed attention to learning sequence-to-sequence (seq2seq) tasks. While there exists a rich literature that studies classification and regression tasks using solvable models of neural networks, seq2seq tasks have not yet been studied from this perspective. Here, we propose a simple model for a seq2seq task that has the advantage of providing explicit control over the degree of memory, or non-Markovianity, in the sequences -- the stochastic switching-Ornstein-Uhlenbeck (SSOU) model. We introduce a measure of non-Markovianity to quantify the amount of memory in the sequences. For a minimal auto-regressive (AR) learning model trained on this task, we identify two learning regimes corresponding to distinct phases in the stationary state of the SSOU process. These phases emerge from the interplay between two different time scales that govern the sequence statistics. Moreover, we observe that while increasing the integration window of the AR model always improves performance, albeit with diminishing returns, increasing the non-Markovianity of the input sequences can improve or degrade its performance. Finally, we perform experiments with recurrent and convolutional neural networks that show that our observations carry over to more complicated neural network architectures.
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.