In this paper, we consider a two-user multiple access channel with noisy feedback. There are two senders with independent messages who transmit symbols across an additive white Gaussian channel to a receiver, who in turn sends back a symbol which is received by the two senders through two independent noisy Gaussian channels. We consider the case when the feedback is active, i.e., the receiver actively encodes the feedback using a linear state process. We pose this as a problem of linear sequential coding at the senders and the receiver to minimize the terminal mean square probability of error at the receiver. This is an instance of decentralized control with no common information at the senders and the receiver. In this paper, we construct two linear controllers at the sender and the receiver. Due to the linearity of the policies and the controllers, all the random variables involved are jointly Gaussian. Moreover, the corresponding covariance matrix at the receiver of the estimation process of the senders' messages is a deterministic process, which is a function of the parameters of the controllers and the strategies of the players, and is thus perfectly observed by the senders. Based on this observation, we use deterministic dynamic programming to find the optimal policies and the optimal linear controllers at both the senders and the receiver. The problem with passive feedback can be considered as a special case.
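The observation that the estimation covariance is deterministic can be illustrated with a much simpler setting than the one in the abstract. The sketch below is an assumption made for illustration only, not the paper's model: a single scalar Gaussian message `theta` is observed through linear measurements `y_t = c_t * theta + n_t`. The function name `posterior_variances` and its parameters are hypothetical. The point is that the posterior variance depends only on the gains and the noise variance, never on the observed values, so it can be computed ahead of time by anyone who knows the strategy.

```python
def posterior_variances(p0, gains, noise_var):
    """Posterior variance of a scalar Gaussian message after each observation.

    Assumed model (illustrative only): theta ~ N(0, p0) and
    y_t = c_t * theta + n_t with n_t ~ N(0, noise_var), independent of theta.
    The update never touches the observations y_t themselves.
    """
    variances = [p0]
    p = p0
    for c in gains:
        # Standard Gaussian conditioning: p <- p - (c*p)^2 / (c^2*p + r),
        # which simplifies to p * r / (c^2 * p + r).
        p = p * noise_var / (c * c * p + noise_var)
        variances.append(p)
    return variances


if __name__ == "__main__":
    # Three unit-gain observations with unit noise variance.
    print(posterior_variances(1.0, [1.0, 1.0, 1.0], 1.0))
```

Because the whole variance trajectory is a function of the gains alone, choosing the gains becomes a deterministic optimization problem, which is the kind of structure deterministic dynamic programming can exploit.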

Related content

Linear

Controller · Partially observable Markov decision process · Learning · Approximation · Markov chain

February 22, 2022

Learning to Control Partially Observed Systems with Finite Memory

Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization, and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence of actor-critic methods for partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, we show that this error can be made small in the case of sliding-block controllers by using larger block sizes.

Principle · MoDELS · Buffer (company) · Networking · Performer

February 21, 2022

PAQMAN: A Principled Approach to Active Queue Management

Sounak Kar, Bastian Alt, Heinz Koeppl, Amr Rizk

Active Queue Management (AQM) aims to prevent bufferbloat and serial drops in router and switch FIFO packet buffers that usually employ drop-tail queueing. AQM describes methods to send proactive feedback to TCP flow sources to regulate their rate using selective packet drops or markings. Traditionally, AQM policies relied on heuristics to approximately provide Quality of Service (QoS) such as a target delay for a given flow. These heuristics are usually based on simple network and TCP control models together with the monitored buffer filling. A primary drawback of these heuristics is that their way of accounting flow characteristics into the feedback mechanism and the corresponding effect on the state of congestion are not well understood. In this work, we show that taking a probabilistic model for the flow rates and the dequeueing pattern, a Semi-Markov Decision Process (SMDP) can be formulated to obtain an optimal packet dropping policy. This policy-based AQM, denoted PAQMAN, takes into account a steady-state model of TCP and a target delay for the flows. Additionally, we present an inference algorithm that builds on TCP congestion control in order to calibrate the model parameters governing underlying network conditions. Finally, we evaluate the performance of our approach using simulation compared to state-of-the-art AQM algorithms.

Decomposed · Kernelization · FAST · Linear · Positive definite matrix

February 21, 2022

I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels

Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, Julien Langou

In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-$k$ update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we consider a machine model with a fast memory of size $S$ and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communications. We prove lower bounds of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}}$ for the communication volume of the Cholesky factorization of an $N\times N$ symmetric positive definite matrix, and of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}}$ for the SYRK computation of $\mathbf{A}\cdot\mathbf{A}^{\top}$, where $\mathbf{A}$ is an $N\times M$ matrix. Both bounds improve the best known lower bounds from the literature by a factor $\sqrt{2}$. In addition, we present two out-of-core, sequential algorithms with matching communication volume: TBS for SYRK, with a volume of $\frac{1}{\sqrt{2}}\frac{N^2M}{\sqrt{S}} + O(NM\log N)$, and LBC for Cholesky, with a volume of $\frac{1}{3\sqrt{2}}\frac{N^3}{\sqrt{S}} + O(N^{5/2})$. Both algorithms improve over the best known algorithms from the literature by a factor $\sqrt{2}$, and prove that the leading terms in our lower bounds cannot be improved further. This work shows that the operational intensity of symmetric kernels like SYRK or Cholesky is intrinsically higher (by a factor $\sqrt{2}$) than that of corresponding non-symmetric kernels (GEMM and LU factorization).
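To get a feel for the size of the leading terms in the two lower bounds quoted above, they can be evaluated directly. The helper names below are hypothetical, and the sketch only evaluates the stated formulas; it assumes the volume is counted in matrix elements moved and ignores all lower-order terms.

```python
import math


def cholesky_lower_bound(n, s):
    """Leading term N^3 / (3 * sqrt(2) * sqrt(S)) of the Cholesky bound."""
    return n**3 / (3 * math.sqrt(2) * math.sqrt(s))


def syrk_lower_bound(n, m, s):
    """Leading term N^2 * M / (sqrt(2) * sqrt(S)) of the SYRK bound."""
    return n**2 * m / (math.sqrt(2) * math.sqrt(s))


if __name__ == "__main__":
    # Illustrative sizes: a 10,000 x 10,000 matrix and a fast memory
    # holding one million elements.
    print(f"Cholesky: {cholesky_lower_bound(10_000, 1_000_000):.3e}")
    print(f"SYRK:     {syrk_lower_bound(10_000, 10_000, 1_000_000):.3e}")
```

Both expressions scale as $1/\sqrt{S}$, so quadrupling the fast memory halves the leading term of the required communication.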

Controller · Functional · Objective function · MoDELS · Optimizer

February 21, 2022

Safe Learning-based Gradient-free Model Predictive Control Based on Cross-entropy Method

Lei Zheng, Rui Yang, Zhixuan Wu, Jiesen Pan, Hui Cheng

from arxiv, 21 pages, 11 figures, Accepted for publication in Engineering Applications of Artificial Intelligence

In this paper, a safe and learning-based control framework for model predictive control (MPC) is proposed to optimize nonlinear systems with a non-differentiable objective function under uncertain environmental disturbances. The control framework integrates a learning-based MPC with an auxiliary controller in a way of minimal intervention. The learning-based MPC augments the prior nominal model with incremental Gaussian Processes to learn the uncertain disturbances. The cross-entropy method (CEM) is utilized as the sampling-based optimizer for the MPC with a non-differentiable objective function. A minimal intervention controller is devised with a control Lyapunov function and a control barrier function to guide the sampling process and endow the system with high probabilistic safety. The proposed algorithm shows a safe and adaptive control performance on a simulated quadrotor in the tasks of trajectory tracking and obstacle avoidance under uncertain wind disturbances.
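The cross-entropy method mentioned above needs only objective evaluations, which is why it suits non-differentiable costs. The following is a minimal one-dimensional sketch under stated assumptions, not the controller from the paper: in MPC the samples would be whole action sequences and the safety-guided sampling would be layered on top. The function name, the default sample counts, and the toy objective are all hypothetical.

```python
import random
import statistics


def cross_entropy_minimize(objective, mean, std,
                           n_samples=200, n_elite=20, n_iters=30, seed=0):
    """Minimize a scalar objective by iteratively refitting a Gaussian.

    Each iteration samples candidates, keeps the n_elite with the lowest
    objective value, and refits the sampling mean and standard deviation
    to those elites. No gradients of the objective are used.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        elite = sorted(samples, key=objective)[:n_elite]
        mean = statistics.fmean(elite)
        # Small floor keeps the sampling distribution from collapsing to zero width.
        std = statistics.pstdev(elite) + 1e-6
    return mean


def toy_cost(x):
    """Non-differentiable toy cost: a kink at x = 2 and a jump at x = 0."""
    return abs(x - 2.0) + (1.0 if x < 0 else 0.0)


if __name__ == "__main__":
    print(cross_entropy_minimize(toy_cost, mean=0.0, std=5.0))
```

The elite fraction trades off exploration against convergence speed: a smaller fraction narrows the distribution faster but raises the risk of settling on a poor region.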

INFORMS · Channel · Processing (programming language) · Directed · Information theory

February 20, 2022

Upper Bounds on the Feedback Error Exponent of Channels With States and Memory

Mohsen Heidari, Achilleas Anastasopoulos, S. Sandeep Pradhan

As a class of state-dependent channels, Markov channels have been long studied in information theory for characterizing the feedback capacity and error exponent. This paper studies a more general variant of such channels where the state evolves via a general stochastic process, not necessarily Markov or ergodic. The states are assumed to be unknown to the transmitter and the receiver, but the underlying probability distributions are known. For this setup, we derive an upper bound on the feedback error exponent and the feedback capacity with variable-length codes. The bounds are expressed in terms of the directed mutual information and directed relative entropy. The bounds on the error exponent are simplified to Burnashev's expression for discrete memoryless channels. Our method relies on tools from the theory of martingales to analyze a stochastic process defined based on the entropy of the message given the past channel's outputs.

Optimizer · Block · Optimization · Reducible · Lagrange multiplier

February 20, 2022

Practical Interference Exploitation Precoding without Symbol-by-Symbol Optimization: A Block-Level Approach

Ang Li, Chao Shen, Xuewen Liao, Christos Masouros, A. Lee Swindlehurst

from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

In this paper, we propose a constructive interference (CI)-based block-level precoding (CI-BLP) approach for the downlink of a multi-user multiple-input single-output (MU-MISO) communication system. Contrary to existing CI precoding approaches which have to be designed on a symbol-by-symbol level, here a constant precoding matrix is applied to a block of symbol slots within a channel coherence interval, thus significantly reducing the computational costs over traditional CI-based symbol-level precoding (CI-SLP) as the CI-BLP optimization problem only needs to be solved once per block. For both PSK and QAM modulation, we formulate an optimization problem to maximize the minimum CI effect over the block subject to a block- rather than symbol-level power budget. We mathematically derive the optimal precoding matrix for CI-BLP as a function of the Lagrange multipliers in closed form. By formulating the dual problem, the original CI-BLP optimization problem is further shown to be equivalent to a quadratic programming (QP) optimization. Numerical results validate our derivations, and show that the proposed CI-BLP scheme achieves improved performance over the traditional CI-SLP method, thanks to the relaxed power constraint over the considered block of symbol slots.

INFORMS · Comprehensibility · Fisher information matrix · Learning · Example

February 19, 2022

Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems

Ingvar Ziemann, Henrik Sandberg

from arxiv, v2: fixed metadata

This paper presents local minimax regret lower bounds for adaptively controlling linear-quadratic-Gaussian (LQG) systems. We consider smoothly parametrized instances and provide an understanding of when logarithmic regret is impossible, which is both instance-specific and flexible enough to take problem structure into account. This understanding relies on two key notions: local-uninformativeness, when the optimal policy does not provide sufficient excitation for identification of the optimal policy and yields a degenerate Fisher information matrix; and information-regret-boundedness, when the small eigenvalues of a policy-dependent information matrix are boundable in terms of the regret of that policy. Combined with a reduction to Bayesian estimation and application of Van Trees' inequality, these two conditions are sufficient for proving regret bounds of order $\sqrt{T}$ in the time horizon $T$. This method yields lower bounds that exhibit tight dimensional dependencies and scale naturally with control-theoretic problem constants. For instance, we are able to prove that systems operating near marginal stability are fundamentally hard to learn to control. We further show that large classes of systems satisfy these conditions, among them any state-feedback system with both $A$- and $B$-matrices unknown. Most importantly, we also establish that a nontrivial class of partially observable systems, essentially those that are over-actuated, satisfy these conditions, thus providing a $\sqrt{T}$ lower bound also valid for partially observable systems. Finally, we turn to two simple examples which demonstrate that our lower bound captures classical control-theoretic intuition: our lower bounds diverge for systems operating near marginal stability or with large filter gain; these can be arbitrarily hard to (learn to) control.

CASE · Constraint · MoDELS · Scenario · Square matrix

February 16, 2022

On the minimax rate of the Gaussian sequence model under bounded convex constraints

Matey Neykov

from arxiv, 24 pages; updated citations; fixed a typo

We determine the exact minimax rate of a Gaussian sequence model under bounded convex constraints, purely in terms of the local geometry of the given constraint set $K$. Our main result shows that the minimax risk (up to constant factors) under the squared $L_2$ loss is given by $\epsilon^{*2} \wedge \operatorname{diam}(K)^2$ with \begin{align*} \epsilon^* = \sup \bigg\{\epsilon : \frac{\epsilon^2}{\sigma^2} \leq \log M^{\operatorname{loc}}(\epsilon)\bigg\}, \end{align*} where $\log M^{\operatorname{loc}}(\epsilon)$ denotes the local entropy of the set $K$, and $\sigma^2$ is the variance of the noise. We utilize our abstract result to re-derive known minimax rates for some special sets $K$ such as hyperrectangles, ellipses, and more generally quadratically convex orthosymmetric sets. Finally, we extend our results to the unbounded case with known $\sigma^2$ to show that the minimax rate in that case is $\epsilon^{*2}$.

Scoring function · Graph · Learning · Functional · Score

March 19, 2020

Causal Discovery with Reinforcement Learning

Shengyu Zhu, Ignavier Ng, Zhitang Chen

from arxiv, Camera-ready version for ICLR 2020 (oral). Codes, datasets, and training logs have been made available at //github.com/huawei-noah/trustworthyAI/tree/master/Causal_Structure_Learning/Causal_Discovery_RL

Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. Traditional score-based causal discovery methods rely on various local heuristics to search for a Directed Acyclic Graph (DAG) according to a predefined score function. While these methods, e.g., greedy equivalence search, may have attractive results with infinite samples and certain model assumptions, they are usually less satisfactory in practice due to finite data and possible violation of assumptions. Motivated by recent advances in neural combinatorial optimization, we propose to use Reinforcement Learning (RL) to search for the DAG with the best scoring. Our encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. The reward incorporates both the predefined score function and two penalty terms for enforcing acyclicity. In contrast with typical RL applications where the goal is to learn a policy, we use RL as a search strategy and our final output would be the graph, among all graphs generated during training, that achieves the best reward. We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint.

Optimizer · Reinforcement learning · Learning · state-of-the-art · SimPLe

July 25, 2018

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
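A Boltzmann exploration policy of the kind described above turns a vector of per-action values into action probabilities through a temperature-scaled softmax. The sketch below shows only that generic form. It assumes the values for the current state are already available, and it does not reproduce how K-values are computed in the paper; the function name and arguments are hypothetical.

```python
import math


def boltzmann_policy(values, temperature):
    """Return action probabilities proportional to exp(value / temperature).

    `values` holds one value per action for the current state (assumed to be
    given). A higher temperature flattens the distribution, and a lower one
    concentrates probability on the highest-valued action.
    """
    scaled = [v / temperature for v in values]
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(boltzmann_policy([1.0, 2.0, 3.0], temperature=1.0))
    print(boltzmann_policy([1.0, 2.0, 3.0], temperature=10.0))
```

Tying the temperature to a single risk-seeking parameter, as the abstract describes, means one quantity controls both how optimistic the values are and how widely the policy explores.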