Consider the problem of power control for an energy harvesting communication system, where the transmitter is equipped with a finite-sized rechargeable battery and is able to look ahead to observe a fixed number of future energy arrivals. An implicit characterization of the maximum average throughput over an additive white Gaussian noise channel and the associated optimal power control policy is provided via the Bellman equation under the assumption that the energy arrival process is stationary and memoryless. A more explicit characterization is obtained for the case of Bernoulli energy arrivals by means of asymptotically tight upper and lower bounds on both the maximum average throughput and the optimal power control policy. Apart from their pivotal role in deriving the desired analytical results, such bounds are highly valuable from a numerical perspective as they can be efficiently computed using convex optimization solvers.
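As a concrete illustration of how such bounds can be evaluated with off-the-shelf convex solvers, the following minimal cvxpy sketch maximizes the classic offline (non-causal) AWGN throughput for a toy Bernoulli arrival sequence under energy-causality and battery constraints; the instance, horizon, and parameters are hypothetical, and the program illustrates the style of computation rather than the paper's exact bounds.

```python
import cvxpy as cp
import numpy as np

# Illustrative offline throughput maximization for an energy harvesting
# transmitter with battery capacity B over T slots (a toy instance, not
# the paper's exact upper/lower bounds).
T, B = 10, 2.0
rng = np.random.default_rng(0)
arrivals = B * rng.binomial(1, 0.3, size=T)   # Bernoulli energy arrivals

g = cp.Variable(T, nonneg=True)               # per-slot transmit energy
throughput = cp.sum(cp.log1p(g)) / 2          # 0.5*log(1+g) per slot (AWGN)
constraints = []
for t in range(T):
    # Energy causality: cannot spend more than what has arrived so far.
    constraints.append(cp.sum(g[:t + 1]) <= arrivals[:t + 1].sum())
    # Finite battery: stored energy may never exceed the capacity B.
    constraints.append(arrivals[:t + 1].sum() - cp.sum(g[:t + 1]) <= B)

cp.Problem(cp.Maximize(throughput), constraints).solve()
print("offline throughput bound:", throughput.value)
```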
This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is bounded from above and below by the trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound for asynchronous Q-learning with a constant step-size. Our analysis also sheds light on the overestimation phenomenon of Q-learning. We further illustrate and validate the analysis through numerical simulations.
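For reference, the following toy sketch shows the asynchronous Q-learning iteration with a constant step-size that the switching-system model captures; the random MDP and all parameters are hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of asynchronous Q-learning with a constant step-size on a
# random MDP (illustrates the setting analyzed, not the error bound itself).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 5, 3, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a]
R = rng.random((S, A))                        # rewards

Q = np.zeros((S, A))
s = 0
for _ in range(50_000):
    a = rng.integers(A)                       # behavior policy: uniform random
    s_next = rng.choice(S, p=P[s, a])
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])  # asynchronous update of one entry
    s = s_next
print(Q)
```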
We develop an essentially optimal finite element approach for solving ergodic stochastic two-scale elliptic equations whose two-scale coefficient may also depend on the slow variable. We solve the limiting stochastic two-scale homogenized equation obtained from the stochastic two-scale convergence in the mean (A. Bourgeat, A. Mikelic and S. Wright, J. reine angew. Math, Vol. 456, 1994), whose solution comprises the solution to the homogenized equation and the corrector, by truncating the infinite domain of the fast variable and using the sparse tensor product finite elements. We show that the convergence rate in terms of the truncation level is equivalent to that for solving the cell problems in the same truncated domain. Solving this equation, we obtain the solution to the homogenized equation and the corrector at the same time, using only a number of degrees of freedom that is essentially equivalent to that required for solving one cell problem. Optimal complexity is obtained when the corrector possesses sufficient regularity with respect to both the fast and the slow variables. Although the regularity norm of the corrector depends on the size of the truncated domain, we show that the convergence rate of the approximation for the solution to the homogenized equation is independent of the size of the truncated domain. With the availability of an analytic corrector, we construct a numerical corrector for the solution of the original stochastic two-scale equation from the finite element solution to the truncated stochastic two-scale homogenized equation. Numerical examples of quasi-periodic two-scale equations, and a stochastic two-scale equation of the checkerboard type, whose coefficient is discontinuous, confirm the theoretical results.
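For orientation, the problem class can be sketched in standard homogenization notation (assumed here, not quoted from the paper):

```latex
% Schematic form of the problem class (standard homogenization notation):
-\nabla\cdot\big(A(x, x/\varepsilon, \omega)\,\nabla u^{\varepsilon}\big) = f
  \ \text{ in } D, \qquad u^{\varepsilon} = 0 \ \text{ on } \partial D,
% with first-order two-scale expansion
u^{\varepsilon}(x) \approx u_{0}(x) + \varepsilon\, u_{1}(x, x/\varepsilon, \omega),
% where u_0 solves the homogenized equation and u_1 is the corrector that the
% sparse tensor-product discretization approximates simultaneously with u_0.
```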
This paper investigates the adaptive trajectory and communication scheduling design for an unmanned aerial vehicle (UAV) relaying random data traffic generated by ground nodes to a base station. The goal is to minimize the expected average communication delay to serve requests, subject to an average UAV mobility power constraint. It is shown that the problem can be cast as a semi-Markov decision process with a two-scale structure, which is optimized efficiently: in the outer decision, the UAV radial velocity for waiting phases and end radius for communication phases optimize the average long-term delay-power trade-off; given the outer decisions, the inner decisions greedily minimize the instantaneous delay-power cost, yielding the optimal angular velocity in waiting states, and the optimal relay strategy and UAV trajectory in communication states. A constrained particle swarm optimization algorithm is designed to solve these trajectory problems, with computation roughly 100 times faster than successive convex approximation methods. Simulations demonstrate that an intelligent adaptive design exploiting realistic UAV mobility features, such as helicopter translational lift, reduces the average communication delay and UAV mobility power consumption by 44% and 7%, respectively, with respect to an optimal hovering strategy, and by 2% and 13%, respectively, with respect to a greedy delay minimization scheme.
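To give a flavor of the solver, here is a minimal penalty-based constrained particle swarm sketch in the usual inertia/cognitive/social form; the cost and constraint are hypothetical placeholders rather than the paper's delay-power trajectory objective.

```python
import numpy as np

# Minimal constrained particle swarm sketch (penalty-based). The cost and
# constraint below are placeholders standing in for a trajectory subproblem.
rng = np.random.default_rng(0)

def cost(x):                 # placeholder trajectory cost
    return np.sum(x**2, axis=1)

def violation(x):            # placeholder constraint g(x) <= 0
    return np.maximum(np.sum(x, axis=1) - 1.0, 0.0)

n, dim, iters, rho = 30, 4, 200, 100.0
x = rng.uniform(-1, 1, (n, dim)); v = np.zeros((n, dim))
pbest = x.copy(); pval = cost(x) + rho * violation(x)
gbest = pbest[pval.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n, dim))
    # Inertia + pull toward personal best + pull toward global best.
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = x + v
    f = cost(x) + rho * violation(x)
    better = f < pval
    pbest[better], pval[better] = x[better], f[better]
    gbest = pbest[pval.argmin()].copy()
print("best penalized cost:", pval.min(), "at", gbest)
```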
Estimating and reacting to external disturbances is of fundamental importance for robust control of quadrotors. Existing estimators typically require significant tuning or training with a large amount of data, including the ground truth, to achieve satisfactory performance. This paper proposes a data-efficient differentiable moving horizon estimation (DMHE) algorithm that can automatically tune the MHE parameters online and also adapt to different scenarios. We achieve this by deriving the analytical gradient of the estimated trajectory from MHE with respect to the tuning parameters, enabling end-to-end learning for auto-tuning. Most interestingly, we show that the gradient can be calculated efficiently from a Kalman filter in a recursive form. Moreover, we develop a model-based policy gradient algorithm to learn the parameters directly from the trajectory tracking errors without the need for the ground truth. The proposed DMHE can be further embedded as a layer with other neural networks for joint optimization. Finally, we demonstrate the effectiveness of the proposed method via both simulation and experiments on quadrotors, where challenging scenarios such as sudden payload change and flying in downwash are examined.
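As a toy illustration of the auto-tuning loop, the sketch below tunes the process and measurement weights of a quadratic moving-horizon estimator for a scalar system by gradient descent; for simplicity it uses a numerical gradient and a ground-truth loss, whereas the paper derives the gradient analytically via a Kalman-filter recursion and also learns without ground truth.

```python
import numpy as np

# Toy MHE auto-tuning for x_{k+1} = a*x_k + w_k, y_k = x_k + v_k. The horizon
# problem is a least-squares, so the estimate is a smooth function of the
# weights theta = log(q, r), tuned here by (numerical) gradient descent.
rng = np.random.default_rng(0)
a, N = 0.95, 8
x_true = np.cumsum(0.1 * rng.standard_normal(N))
y = x_true + 0.3 * rng.standard_normal(N)

def mhe(theta):
    q, r = np.exp(theta)                        # positive weights
    rows, rhs = [], []
    for k in range(N - 1):                      # dynamics residuals
        e = np.zeros(N); e[k + 1], e[k] = 1.0, -a
        rows.append(np.sqrt(q) * e); rhs.append(0.0)
    for k in range(N):                          # measurement residuals
        e = np.zeros(N); e[k] = 1.0
        rows.append(np.sqrt(r) * e); rhs.append(np.sqrt(r) * y[k])
    return np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]

def loss(theta):                                # estimation error drives tuning
    return np.mean((mhe(theta) - x_true) ** 2)

theta, eps = np.zeros(2), 1e-5
for _ in range(200):                            # forward-difference gradient step
    g = np.array([(loss(theta + eps * np.eye(2)[i]) - loss(theta)) / eps
                  for i in range(2)])
    theta -= 0.5 * g
print("tuned weights (q, r):", np.exp(theta))
```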
In backward error analysis, an approximate solution to an equation is compared to the exact solution to a nearby "modified" equation. In numerical ordinary differential equations, the two agree up to any power of the step size. If the differential equation has a geometric property then the modified equation may share it. In this way, known properties of differential equations can be applied to the approximation. But for partial differential equations, the known modified equations are of higher order, limiting applicability of the theory. Therefore, we study symmetric solutions of discretized partial differential equations that arise from a discrete variational principle. These symmetric solutions obey infinite-dimensional functional equations. We show that these equations admit second-order modified equations which are Hamiltonian and also possess first-order Lagrangians in modified coordinates. The modified equation and its associated structures are computed explicitly for the case of rotating travelling waves in the nonlinear wave equation.
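For background, the classical ODE statement of backward error analysis (standard material, with notation assumed here) reads:

```latex
% Classical backward error analysis for ODEs: a one-step method of order p
% applied to \dot y = f(y) with step size h produces points that lie, up to
% any order N, on the flow of a modified equation
\dot{\tilde y} = f(\tilde y) + h^{p} f_{p}(\tilde y)
  + h^{p+1} f_{p+1}(\tilde y) + \cdots + h^{N} f_{N}(\tilde y),
% so geometric properties of the modified vector field (e.g. Hamiltonian
% structure) transfer to the numerical method.
```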
We devise a cooperative planning framework to generate optimal trajectories for a tethered robot duo tasked with gathering objects scattered over a large area using a flexible net. Specifically, the proposed planning framework first produces a set of dense waypoints for each robot, serving as the initialization for optimization. Next, we formulate an iterative optimization scheme to generate smooth and collision-free trajectories while ensuring cooperation within the robot duo to efficiently gather objects and properly avoid obstacles. We validate the generated trajectories in simulation and implement them on physical robots using a Model Reference Adaptive Controller (MRAC) to handle the unknown dynamics of carried payloads. In a series of studies, we find that: (i) a U-shaped cost function is effective for planning the cooperative robot duo, and (ii) the task efficiency is not always proportional to the tethered net's length. Given an environment configuration, our framework can gauge the optimal net length. To the best of our knowledge, ours is the first framework to provide such an estimate for a tethered robot duo.
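As a simplified illustration of the iterative optimization stage, the sketch below smooths dense waypoints by gradient descent on an acceleration penalty plus an obstacle term; the duo coupling, net model, and U-shaped cost of the paper are omitted, and all quantities are placeholders.

```python
import numpy as np

# Waypoint-smoothing sketch: gradient descent on a smoothness (second-
# difference) term plus an obstacle penalty over dense waypoints.
rng = np.random.default_rng(0)
T = 40
wp = np.linspace([0, 0], [5, 0], T) + 0.05 * rng.standard_normal((T, 2))
obs, radius = np.array([2.5, 0.2]), 0.8        # disk obstacle (placeholder)

for _ in range(300):
    grad = np.zeros_like(wp)
    # Smoothness: gradient of sum of squared discrete accelerations.
    acc = wp[:-2] - 2 * wp[1:-1] + wp[2:]
    grad[:-2] += acc; grad[1:-1] -= 2 * acc; grad[2:] += acc
    # Obstacle: push waypoints out of the inflated disk around `obs`.
    d = wp - obs
    dist = np.linalg.norm(d, axis=1, keepdims=True)
    pen = np.where(dist < radius, (dist - radius) / np.maximum(dist, 1e-9), 0.0)
    grad += 5.0 * pen * d
    grad[0] = grad[-1] = 0.0                   # keep start and goal fixed
    wp -= 0.05 * grad
print("final smoothness cost:", np.sum(acc**2))
```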
The current paper studies the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not, and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can capture smoothness, sparsity, or more general constraint patterns. To tackle this generic class of problems, we combine the Beltrami-Courant penalty method with the proximal distance principle. The latter is driven by minimization of penalized objectives $f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$ involving large tuning constants $\rho$ and the squared Euclidean distance of $\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate $\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the majorizing surrogate function $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$. For a fixed $\rho$, a subanalytic loss $f(\boldsymbol{x})$, and a subanalytic constraint set $S$, we prove convergence to a stationary point. Under stronger assumptions, we provide convergence rates and demonstrate linear local convergence. We also construct a steepest descent (SD) variant to avoid costly linear system solves. To benchmark our algorithms, we compare against the alternating direction method of multipliers (ADMM). Our extensive numerical tests include problems on metric projection, convex regression, convex clustering, total variation image denoising, and projection of a matrix to a good condition number. These experiments demonstrate the superior speed and acceptable accuracy of our steepest descent variant on high-dimensional problems.
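A minimal sketch of one instance of this scheme, assuming $f(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^2$, a first-difference fusion matrix $\boldsymbol{D}$, and the nonconvex set $S$ of $k$-sparse vectors (so each surrogate minimization reduces to a linear solve):

```python
import numpy as np

# Proximal distance sketch: f(x) = 0.5||x - y||^2 with fusion constraint
# Dx in S, where D takes first differences and S is the set of k-sparse
# vectors; P_S keeps the k largest-magnitude entries. Illustrative instance,
# not the paper's code.
rng = np.random.default_rng(0)
n, k, rho = 100, 5, 50.0
y = np.repeat(rng.standard_normal(6), [20, 15, 20, 15, 15, 15])  # piecewise const
y = y + 0.1 * rng.standard_normal(n)

D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]        # (n-1) x n first differences

def proj_sparse(z, k):                          # projection onto k-sparse set
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

x = y.copy()
A = np.eye(n) + rho * D.T @ D                   # normal matrix of the surrogate
for _ in range(100):
    p = proj_sparse(D @ x, k)                   # anchor P_S(D x_n)
    x = np.linalg.solve(A, y + rho * D.T @ p)   # minimize the majorizer
print("nonzero jumps:", np.count_nonzero(np.abs(D @ x) > 1e-3))
```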
In many iterative optimization methods, fixed-point theory enables the analysis of the convergence rate via the contraction factor associated with the linear approximation of the fixed-point operator. While this factor characterizes the asymptotic linear rate of convergence, it does not explain the non-linear behavior of these algorithms in the non-asymptotic regime. In this letter, we take into account the effect of the first-order approximation error and present a closed-form bound on the number of iterations required for the distance between the iterate and the limit point to reach an arbitrarily small fraction of the initial distance. Our bound comprises two terms: one corresponds to the number of iterations required by the linearized version of the fixed-point operator, and the other accounts for the overhead associated with the approximation error. With a focus on convergence in the scalar case, the tightness of the proposed bound is proven for positively quadratic first-order difference equations.
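Schematically, the scalar error model behind such two-term bounds can be written as follows (our notation, assumed rather than taken from the letter):

```latex
% Scalar error model: with contraction factor \gamma \in (0,1) of the
% linearized operator and approximation-error constant c,
e_{k+1} \le \gamma\, e_k + c\, e_k^{2},
% so the iterations needed to reach \epsilon\, e_0 split into a linear-rate
% term \mathcal{O}\big(\log(1/\epsilon)/\log(1/\gamma)\big) plus an overhead
% term covering the phase in which the quadratic perturbation c\,e_k is
% non-negligible.
```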
Sub-optimal control policies in intersection traffic signal controllers (TSC) contribute to congestion and lead to negative effects on human health and the environment. Reinforcement learning (RL) for traffic signal control is a promising approach to design better control policies and has attracted considerable research interest in recent years. However, most work in this area has used simplified simulation environments of traffic scenarios to train RL-based TSCs. To deploy RL in real-world traffic systems, the gap between simplified simulation environments and real-world applications has to be closed. Therefore, we propose LemgoRL, a benchmark tool to train RL agents as TSCs in a realistic simulation environment of Lemgo, a medium-sized town in Germany. In addition to the realistic simulation model, LemgoRL encompasses a traffic signal logic unit that ensures compliance with all regulatory and safety requirements. LemgoRL offers the same interface as the well-known OpenAI gym toolkit to enable easy deployment in existing research work. To demonstrate the functionality and applicability of LemgoRL, we train a state-of-the-art deep RL algorithm on a CPU cluster, utilizing a framework for distributed and parallel RL, and compare its performance with other methods. Our benchmark tool drives the development of RL algorithms towards real-world applications.
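Since LemgoRL mirrors the OpenAI gym interface, a standard interaction loop applies; the environment id and the old-style gym API below are assumptions for illustration, and a trained policy would replace the random action.

```python
import gym

# Generic interaction loop against a gym-compatible TSC environment.
# "LemgoRL-v0" is a placeholder id, assumed to be registered by the tool.
env = gym.make("LemgoRL-v0")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()   # stand-in for a trained RL policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```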
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism-based and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
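The recipe admits a compact tabular sketch: augment the reward with a bonus and solve a soft (log-sum-exp) Bellman equation whose temperature equals the risk-seeking parameter; the MDP, bonus, and constants below are toy placeholders rather than the paper's exact construction.

```python
import numpy as np

# Tabular sketch of the K-learning recipe: bonus-augmented reward plus a
# soft Bellman equation with temperature tau (the risk-seeking parameter).
rng = np.random.default_rng(0)
S, A, gamma, tau = 4, 2, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))     # transition kernel P[s, a]
R = rng.random((S, A))
counts = np.ones((S, A))                       # visit counts (placeholder)
bonus = 1.0 / np.sqrt(counts)                  # illustrative count-based bonus

K = np.zeros((S, A))
for _ in range(500):                           # soft value iteration
    V = tau * np.log(np.exp(K / tau).sum(axis=1))   # log-sum-exp over actions
    K = R + bonus + gamma * P @ V              # bonus-augmented Bellman backup

# Boltzmann exploration policy induced by the K-values.
policy = np.exp(K / tau) / np.exp(K / tau).sum(axis=1, keepdims=True)
print(policy)
```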