We study the algorithmic problem of optimally covering a tree with $k$ mobile robots. The tree is known to all robots, and our goal is to assign a walk to each robot so that the union of these walks covers the whole tree. We assume that all edges have the same length and that traveling along an edge takes a unit of time. Two objective functions are considered: the cover time and the cover length. The cover time is the maximum time a robot needs to finish its assigned walk, and the cover length is the sum of the lengths of all the walks. We also consider a variant in which the robots must rendezvous periodically at the same vertex in at most a certain number of moves. We show that the complexity of the problem differs between the two cost functions. For cover time minimization, we prove that the problem is NP-hard when $k$ is part of the input, regardless of whether periodic rendezvous are required. For cover length minimization, we show that the problem can be solved in polynomial time when periodic rendezvous are not required, and that it is NP-hard otherwise.
The change-point detection problem has been widely studied in the time series and signal processing literature. Existing methods can be summarized as a search for an appropriate partition of the time series, so that detection is cast as an optimization problem; however, exact optimization can be computationally expensive, while approximate approaches discard candidate change-point configurations in a non-rigorous manner. We therefore present a framework for detecting change-points in a univariate time series using a decision criterion based on the Minimum Description Length (MDL), modified to incorporate a Bayesian analysis. To search for change-points, the times at which deviations from the mean occur (exceedances) are identified, and a genetic algorithm, using the modified MDL as its fitness function, evaluates which of these exceedances constitute change-points. The effectiveness of the method is assessed through a simulation study, and its practical validity is examined on a real dataset of Particulate Matter smaller than 2.5 microns (PM2.5) in Bogot\'a, Colombia, for the 2018-2020 period, under different settings chosen to study the convergence of the algorithm. We find that this objective function tends to yield better results for both the number of change-points and their locations in most cases, reducing the error compared with other methods available in the literature.
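The abstract does not reproduce the Bayesian-modified MDL itself, so the sketch below only illustrates the kind of search it describes: a genetic algorithm over subsets of candidate exceedance times, scored by a plain two-part Gaussian description length used here as a placeholder. The names \texttt{mdl\_cost}, \texttt{genetic\_search} and \texttt{candidates} (a NumPy array of exceedance indices) are illustrative, not the paper's implementation.
\begin{verbatim}
import numpy as np

def mdl_cost(x, changepoints):
    """Approximate description length of x under a piecewise model:
    Gaussian negative log-likelihood per segment plus a parameter cost."""
    bounds = [0] + sorted(changepoints) + [len(x)]
    cost = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = x[a:b]
        if len(seg) < 2:
            continue
        var = max(seg.var(), 1e-8)
        cost += 0.5 * len(seg) * np.log(2 * np.pi * var) + 0.5 * len(seg)
        cost += np.log(len(x))   # cost of encoding the segment's parameters/location
    return cost

def genetic_search(x, candidates, pop=50, gens=200, p_mut=0.05, rng=None):
    """Binary chromosomes select which candidate exceedance times are change-points."""
    rng = rng or np.random.default_rng(0)
    m = len(candidates)
    population = rng.random((pop, m)) < 0.1
    for _ in range(gens):
        fitness = np.array([mdl_cost(x, candidates[c]) for c in population])
        order = np.argsort(fitness)                    # lower MDL is better
        parents = population[order[: pop // 2]]
        children = parents[rng.integers(0, len(parents), pop - len(parents))].copy()
        children ^= rng.random(children.shape) < p_mut  # bit-flip mutation
        population = np.vstack([parents, children])
    best = population[np.argmin([mdl_cost(x, candidates[c]) for c in population])]
    return candidates[best]
\end{verbatim}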
We introduce a multiple testing method that controls the median of the proportion of false discoveries (FDP) in a flexible way. Our method only requires a vector of p-values as input and is comparable to the Benjamini-Hochberg method, which controls the mean of the FDP (the false discovery rate). Benjamini-Hochberg requires choosing the target FDP, alpha, before looking at the data, but our method does not: if, for example, alpha=0.05 leads to no discoveries, alpha can be increased to 0.1. We further provide mFDP-adjusted p-values, which consequently also have a post hoc interpretation. The method does not assume independence and remained valid in all simulation scenarios we considered. The procedure is inspired by the popular estimator of the total number of true hypotheses by Schweder, Spj{\o}tvoll and Storey. We adapt this estimator to provide a median unbiased estimator of the FDP, first assuming that a fixed rejection threshold is used. Taking this as a starting point, we then construct estimators of the FDP that are simultaneously median unbiased. This simultaneity allows for the claimed flexibility. Our method is powerful, and its time complexity is linear in the number of hypotheses once the p-values are sorted.
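For context, and as a standard recollection rather than the paper's exact construction: with $m$ p-values and a tuning parameter $\lambda\in(0,1)$, the Schweder-Spj{\o}tvoll/Storey-type estimator of the number of true hypotheses and the resulting plug-in FDP estimate at a fixed rejection threshold $t$ read
\[
\hat{m}_0(\lambda) \;=\; \frac{\#\{i : p_i > \lambda\}}{1-\lambda},
\qquad
\widehat{\mathrm{FDP}}(t) \;=\; \frac{t\,\hat{m}_0(\lambda)}{\#\{i : p_i \le t\}\vee 1}.
\]
The contribution described above is to adapt such a plug-in estimate into a median unbiased estimator of the FDP, and then into estimators that are median unbiased simultaneously over thresholds.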
We consider optimization problems in which the goal is to find a $k$-dimensional subspace of $\mathbb{R}^n$, $k \ll n$, that minimizes a convex and smooth loss. Such problems generalize the fundamental task of principal component analysis (PCA) to include robust and sparse counterparts, and logistic PCA for binary data, among others. This problem can be approached either via nonconvex gradient methods, whose iterations are highly efficient but for which fast convergence to a global minimizer is difficult to establish, or via a convex relaxation, for which convergence to a global minimizer is straightforward to establish but whose associated methods are often inefficient in high dimensions. In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem. Our main result is a proof that a natural nonconvex gradient method, which is \textit{SVD-free} and requires only a single QR factorization of an $n\times k$ matrix per iteration, converges locally at a linear rate. We also establish linear convergence results for the nonconvex projected gradient method and for the Frank-Wolfe method applied to the convex relaxation.
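A minimal sketch of the flavor of SVD-free iteration alluded to above: a gradient step on an $n\times k$ factor with orthonormal columns, followed by a thin QR factorization to return to the feasible set. The PCA-style loss, step size, and function names below are illustrative placeholders, not the paper's algorithm.
\begin{verbatim}
import numpy as np

def qr_retraction_step(X, grad, eta):
    """One SVD-free iteration: gradient step on the n x k factor, then a thin
    QR factorization (the only factorization per iteration) to restore
    orthonormal columns."""
    Y = X - eta * grad(X)
    Q, _ = np.linalg.qr(Y)   # thin QR of an n x k matrix
    return Q

# Example with a PCA-type loss f(X) = -0.5 * trace(X^T A X), gradient -A X.
rng = np.random.default_rng(0)
n, k = 100, 5
A = rng.standard_normal((n, n)); A = A @ A.T
X = np.linalg.qr(rng.standard_normal((n, k)))[0]
for _ in range(200):
    X = qr_retraction_step(X, lambda Z: -A @ Z, eta=1e-3)
\end{verbatim}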
The utilization of renewable energy technologies, particularly hydrogen, has seen a surge of interest and has spread throughout the world. Ethanol steam reforming is one of the primary methods capable of producing hydrogen efficiently and reliably. This paper provides an in-depth theoretical and numerical study of the reforming system, together with a plan to explore converting the system into its conservation form. We then give an overview of several numerical approaches for solving the general first-order quasi-linear hyperbolic equation, applied to the particular model for ethanol steam reforming (ESR). We conclude by presenting results that would enable these ODE/PDE solvers to be used within nonlinear model predictive control (NMPC) algorithms, and we discuss the limitations of our approach and directions for future work.
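As a point of reference for the class of equations mentioned above, a first-order explicit upwind scheme for the general quasi-linear equation $u_t + a(u)\,u_x = 0$ can be sketched as follows. This is a textbook scheme given only for orientation; it is neither the ESR model nor the paper's solver, and the grid sizes below are arbitrary.
\begin{verbatim}
import numpy as np

def upwind_step(u, a, dt, dx):
    """One explicit upwind step for u_t + a(u) u_x = 0, assuming a(u) >= 0.
    Stability requires the CFL condition max|a(u)| * dt / dx <= 1."""
    speed = a(u)
    u_new = u.copy()
    u_new[1:] = u[1:] - dt / dx * speed[1:] * (u[1:] - u[:-1])
    return u_new  # u_new[0] keeps the inflow boundary value

# Example: linear advection a(u) = 1 of a step profile on [0, 1].
nx, dx, dt = 200, 1.0 / 200, 0.5 / 200
u = np.where(np.linspace(0, 1, nx) < 0.2, 1.0, 0.0)
for _ in range(100):
    u = upwind_step(u, lambda v: np.ones_like(v), dt, dx)
\end{verbatim}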
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. In the special case of matrix sensing, this problem has been studied extensively through the notion of the Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem with a general objective function and noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By leveraging the strict saddle property, we also show that an approximate solution can be found in polynomial time. We further characterize the geometry of the spurious local minima of the problem in a local region around the ground truth when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis of the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
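As a reminder of the notion referenced above, stated in its standard matrix-sensing form rather than the paper's generalized condition: a linear measurement operator $\mathcal{A}$ satisfies the RIP with constant $\delta_{2r}$ if
\[
(1-\delta_{2r})\,\|M\|_F^2 \;\le\; \|\mathcal{A}(M)\|_2^2 \;\le\; (1+\delta_{2r})\,\|M\|_F^2
\quad \text{for all } M \text{ with } \operatorname{rank}(M)\le 2r
\]
(conventions on the rank level vary). The $1/3$ threshold in the result above refers to the analogous constant of the noiseless objective.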
Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL) that uses previously gathered data can improve sampling efficiency. However, off-policy learning becomes challenging when the discrepancy between the distributions of the policy of interest and the policies that collected the data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require collecting long trajectories, which increases the computational complexity and induces additional problems such as vanishing/exploding gradients or the discarding of many useful experiences. Moreover, their generalization to continuous action domains is strictly limited, as they require action probabilities, which makes them unsuitable for deterministic policies. To overcome these limitations, we introduce a novel policy similarity measure to mitigate the effects of such discrepancy. Our method offers an adequate single-step off-policy correction without any probability estimates, and our theoretical results show that it yields a contraction mapping with a unique fixed point, which allows "safe" off-policy learning. An extensive set of empirical results indicates that our algorithm substantially improves on the state of the art and attains higher returns in fewer steps than competing methods by efficiently scheduling the learning rate in Q-learning and policy optimization.
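The abstract does not give the exact similarity measure, so the following is only an illustrative sketch of the general idea: weight each logged transition by a similarity between the stored behavior action and the current deterministic policy's action (here a Gaussian kernel, an assumption), and use that weight to schedule a per-sample learning rate, with no probability estimates involved.
\begin{verbatim}
import numpy as np

def similarity_weight(logged_action, policy_action, sigma=0.2):
    """Illustrative single-step correction weight: Gaussian similarity between
    the action in the replay buffer and the current deterministic policy's
    action at the same state. Not the paper's actual measure."""
    d2 = np.sum((logged_action - policy_action) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def corrected_q_update(q_value, td_target, base_lr, weight):
    """Scale the per-sample learning rate by the similarity weight, so that
    transitions generated by very different policies contribute less."""
    return q_value + base_lr * weight * (td_target - q_value)
\end{verbatim}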
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as prescribing treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for a deterministic target policy with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on an analysis of the bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work goes a step further by handling vector action spaces and optimizing the full metric. We show that our estimator is consistent and that it significantly reduces the MSE compared to baseline OPE methods in experiments on various domains.
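A minimal sketch of the kernel-relaxed IS estimator that this line of work builds on, assuming the behavior policy's density at the logged actions is available and taking the metric matrix $M$ as given (whereas the paper's contribution is the analytic MSE-optimal choice of that metric, which is not reproduced here):
\begin{verbatim}
import numpy as np

def kernel_ope_estimate(contexts, actions, rewards, behavior_density,
                        target_policy, M):
    """Kernel-relaxed IS estimate of a deterministic target policy's value.
    M is a positive-definite metric matrix for a Gaussian kernel; the exact
    deterministic policy is smoothed by the kernel around its action."""
    d = actions.shape[1]
    diffs = actions - np.array([target_policy(x) for x in contexts])   # (n, d)
    quad = np.einsum('ni,ij,nj->n', diffs, M, diffs)
    kernel = np.exp(-0.5 * quad) * np.sqrt(np.linalg.det(M) / (2 * np.pi) ** d)
    weights = kernel / behavior_density        # per-sample IS-style weights
    return np.mean(weights * rewards)
\end{verbatim}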
We propose an efficient way of solving optimal control problems for rigid-body systems on the basis of inverse dynamics and the multiple-shooting method. We treat all variables, including the state, acceleration, and control input torques, as optimization variables and treat the inverse dynamics as an equality constraint. We eliminate the update of the control input torques from the linear equation of Newton's method by applying condensing for inverse dynamics. The size of the resultant linear equation is the same as that of the multiple-shooting method based on forward dynamics except for the variables related to the passive joints and contacts. Compared with the conventional methods based on forward dynamics, the proposed method reduces the computational cost of the dynamics and their sensitivities by utilizing the recursive Newton-Euler algorithm (RNEA) and its partial derivatives. In addition, it increases the sparsity of the Hessian of the Karush-Kuhn-Tucker conditions, which reduces the computational cost, e.g., of Riccati recursion. Numerical experiments show that the proposed method outperforms state-of-the-art implementations of differential dynamic programming based on forward dynamics in terms of computational time and numerical robustness.
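A minimal sketch of the building block described above, namely treating inverse dynamics as an equality constraint and obtaining its sensitivities from RNEA. It assumes the Pinocchio Python bindings (\texttt{pin.rnea}, \texttt{pin.computeRNEADerivatives}, \texttt{buildSampleModelManipulator}); the condensing step that eliminates the torque update, and the full multiple-shooting structure, are not reproduced here.
\begin{verbatim}
import numpy as np
import pinocchio as pin

model = pin.buildSampleModelManipulator()   # stand-in rigid-body model
data = model.createData()

def id_constraint_residual(q, v, a, tau):
    """Residual of the equality constraint ID(q, v, a) - tau = 0, used in place
    of forward dynamics when q, v, a, and tau are all optimization variables."""
    return pin.rnea(model, data, q, v, a) - tau

def id_constraint_jacobians(q, v, a):
    """Analytical partial derivatives of ID(q, v, a) via the RNEA derivatives.
    (The partial with respect to a is the joint-space inertia matrix.)"""
    pin.computeRNEADerivatives(model, data, q, v, a)
    return data.dtau_dq.copy(), data.dtau_dv.copy()
\end{verbatim}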
This work provides a theoretical analysis for optimally solving the pose estimation problem using total least squares for vector observations from landmark features, which is central to applications involving simultaneous localization and mapping. First, the optimization process is formulated with observation vectors extracted from point-cloud features. Then, error-covariance expressions are derived. The attitude and position estimates obtained via the derived optimization process are proven to reach the bounds defined by the Cram\'er-Rao lower bound under the small-angle approximation of attitude errors. A fully populated observation noise-covariance matrix is assumed as the weight in the cost function to cover the most general case of sensor uncertainty; this admits more general error correlations than previous approaches that assume isotropic noise. The proposed solution is verified using Monte Carlo simulations and an experiment with an actual LIDAR to validate the error-covariance analysis.
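For orientation, the simplest version of the underlying alignment problem (scalar per-point weights rather than the fully populated noise covariance treated above, and without the error-covariance analysis) is the classical weighted least-squares rigid alignment, sketched below; it is given only as a baseline, not as the paper's total least squares solution.
\begin{verbatim}
import numpy as np

def weighted_rigid_alignment(P, Q, w):
    """Weighted least-squares estimate of rotation R and translation t in the
    model Q_i ~ R @ P_i + t, with per-point scalar weights w (Kabsch-style)."""
    w = w / w.sum()
    p_bar = (w[:, None] * P).sum(axis=0)
    q_bar = (w[:, None] * Q).sum(axis=0)
    H = (w[:, None] * (P - p_bar)).T @ (Q - q_bar)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = q_bar - R @ p_bar
    return R, t
\end{verbatim}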
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.
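For reference, the two estimators that the self-correcting scheme is positioned between are the standard max-based target (prone to overestimation) and the decoupled Double Q-learning target (prone to underestimation), sketched below; the self-correcting target itself is not spelled out in the abstract and is therefore not reproduced here.
\begin{verbatim}
import numpy as np

def q_learning_target(r, q_next, gamma=0.99):
    """Standard target: select and evaluate the next action with the same
    estimator, which systematically overestimates under noise."""
    return r + gamma * np.max(q_next)

def double_q_learning_target(r, q_next_a, q_next_b, gamma=0.99):
    """Double Q-learning target: select the action with one estimator and
    evaluate it with the other, which tends to underestimate."""
    a_star = np.argmax(q_next_a)
    return r + gamma * q_next_b[a_star]
\end{verbatim}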