
Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.

Related content

Distillation


Model evaluation · Random sampling · Samples · MoDELS · Learning

June 7, 2023

CRS-FL: Conditional Random Sampling for Communication-Efficient and Privacy-Preserving Federated Learning

Jianhua Wang, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić, Lin Li, Yingying Yao

Federated Learning (FL), a privacy-oriented distributed ML paradigm, is gaining great interest in the Internet of Things because of its capability to protect participants' data privacy. Studies have been conducted to address challenges in standard FL, including communication efficiency and privacy preservation, but they cannot strike a tradeoff between communication efficiency and model accuracy while guaranteeing privacy. This paper proposes a Conditional Random Sampling (CRS) method and implements it in the standard FL setting (CRS-FL) to tackle these challenges. CRS explores a stochastic coefficient based on Poisson sampling to achieve a higher probability of obtaining zero gradients without bias, and thereby decreases the communication overhead effectively without degrading model accuracy. Moreover, we theoretically derive the conditions under which CRS satisfies a relaxed Local Differential Privacy (LDP) guarantee. Extensive experimental results indicate that (1) in communication efficiency, CRS-FL outperforms existing methods in accuracy per transmitted byte, without model accuracy reduction, at sampling ratios (sampling size / model size) above 7%; (2) in privacy preservation, CRS-FL achieves no accuracy reduction compared with LDP baselines while retaining its efficiency, and even exceeds them in model accuracy in more of the sampling-ratio settings.
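The abstract does not spell out the CRS update, so the following is only a minimal sketch of the general idea it builds on: dropping gradient coordinates at random while keeping the estimate unbiased. The function names `sparsify` and `mean_of_samples`, and the single shared keep probability, are assumptions made for illustration and are not taken from the paper.

```python
import random

def sparsify(grad, keep_prob, rng):
    """Generic unbiased random sparsification (not the paper's CRS rule).

    Each coordinate is kept independently with probability keep_prob and
    rescaled by 1/keep_prob; otherwise it is set to zero. The expected
    output therefore equals the input gradient.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [g / keep_prob if rng.random() < keep_prob else 0.0 for g in grad]

def mean_of_samples(grad, keep_prob, trials, seed=0):
    """Average many sparsified copies to check unbiasedness empirically."""
    rng = random.Random(seed)
    total = [0.0] * len(grad)
    for _ in range(trials):
        for i, v in enumerate(sparsify(grad, keep_prob, rng)):
            total[i] += v
    return [t / trials for t in total]
```

Only the nonzero coordinates need to be transmitted, which is where the communication saving comes from; the price is extra variance, which grows as `keep_prob` shrinks.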

Processing (programming language) · Comprehensibility · Kernelization · Loss · Inductive bias

June 6, 2023

Physics Inspired Approaches To Understanding Gaussian Processes

Maximilian P. Niroomand, Luke Dicks, Edward O. Pyzer-Knapp, David J. Wales

from arxiv, 9 pages, 4 figures

Prior beliefs about the latent function to shape inductive biases can be incorporated into a Gaussian Process (GP) via the kernel. However, beyond kernel choices, the decision-making process of GP models remains poorly understood. In this work, we contribute an analysis of the loss landscape for GP models using methods from physics. We demonstrate $\nu$-continuity for Matern kernels and outline aspects of catastrophe theory at critical points in the loss landscape. By directly including $\nu$ in the hyperparameter optimisation for Matern kernels, we find that typical values of $\nu$ are far from optimal in terms of performance, yet prevail in the literature due to the increased computational speed. We also provide an a priori method for evaluating the effect of GP ensembles and discuss various voting approaches based on physical properties of the loss landscape. The utility of these approaches is demonstrated for various synthetic and real datasets. Our findings provide an enhanced understanding of the decision-making process behind GPs and offer practical guidance for improving their performance and interpretability in a range of applications.

Functional · INFORMS · Mutual information · CASE · Maximum

June 6, 2023

Functional sufficient dimension reduction through information maximization with application to classification

Xinyu Li, Jianjun Xu, Wenquan Cui, Haoyang Cheng

Considering the case where the response variable is a categorical variable and the predictor is a random function, two novel functional sufficient dimension reduction (FSDR) methods are proposed based on mutual information and square loss mutual information. Compared to the classical FSDR methods, such as functional sliced inverse regression and functional sliced average variance estimation, the proposed methods are appealing because they are capable of estimating multiple effective dimension reduction directions in the case of a relatively small number of categories, especially for the binary response. Moreover, the proposed methods do not require the restrictive linear conditional mean assumption and the constant covariance assumption. They avoid the inverse problem of the covariance operator, which is often encountered in functional sufficient dimension reduction. Functional principal component analysis with truncation is used as a regularization mechanism. Under some mild conditions, the statistical consistency of the proposed methods is established. Simulations and real data analyses demonstrate that the two methods are competitive with some existing FSDR methods.

Minimax · Learning · Performer · Statistic · Linear

June 6, 2023

A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Zhenyu Sun, Ermin Wei

from arxiv, Accepted by NeurIPS 2022

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, where the overall objective approximates a true population minimax risk by statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. Then, we focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent algorithm which guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that in general it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximation solution with $\mathcal{O}(\log (1/\epsilon))$ rounds of communication, which matches the time complexity of the centralized GDA method. Finally, we numerically show that FedGDA-GT outperforms Local SGDA.
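The contrast between local descent ascent with and without gradient tracking can be illustrated on a toy problem. The sketch below is not the authors' FedGDA-GT implementation: the two-agent quadratic objectives, the constants, and the exact form of the correction term (local gradient minus the local gradient at the round's starting point, plus the global gradient at that point) are assumptions chosen for illustration, and gradients are noise-free.

```python
# Toy problem (an assumption for illustration): agent i holds
#   f_i(x, y) = p_i/2 * (x - a_i)^2 + x*y - q_i/2 * (y - b_i)^2,
# which is strongly convex in x and strongly concave in y.
P, A = [1.0, 4.0], [0.0, 1.0]
Q, B = [3.0, 1.0], [1.0, -1.0]

def grads(i, x, y):
    """Return (df_i/dx, df_i/dy) for agent i."""
    return P[i] * (x - A[i]) + y, x - Q[i] * (y - B[i])

def global_grads(x, y):
    """Gradient of the averaged objective at (x, y)."""
    n = len(P)
    gs = [grads(i, x, y) for i in range(n)]
    return sum(g[0] for g in gs) / n, sum(g[1] for g in gs) / n

def saddle_point():
    """Closed-form saddle point of the averaged objective."""
    n = len(P)
    p, q = sum(P) / n, sum(Q) / n
    u = sum(pi * ai for pi, ai in zip(P, A)) / n
    v = sum(qi * bi for qi, bi in zip(Q, B)) / n
    # Solve p*x + y = u and x - q*y = -v.
    x = (q * u - v) / (1.0 + p * q)
    return x, u - p * x

def run(rounds, local_steps, step, tracking):
    """Local descent ascent with a constant stepsize and server averaging."""
    n = len(P)
    x, y = 0.0, 0.0
    for _ in range(rounds):
        gx0, gy0 = global_grads(x, y)  # global gradient at the round start
        new_x = new_y = 0.0
        for i in range(n):
            lx0, ly0 = grads(i, x, y)  # agent's own gradient at the round start
            xi, yi = x, y
            for _ in range(local_steps):
                gx, gy = grads(i, xi, yi)
                if tracking:
                    # Replace the agent's drift with the global direction.
                    gx, gy = gx - lx0 + gx0, gy - ly0 + gy0
                xi, yi = xi - step * gx, yi + step * gy  # descend in x, ascend in y
            new_x += xi / n
            new_y += yi / n
        x, y = new_x, new_y
    return x, y
```

Calling `run(200, 5, 0.05, tracking=True)` and `run(200, 5, 0.05, tracking=False)` and comparing both results with `saddle_point()` shows the behavior the abstract describes: with heterogeneous agents and several local steps, the uncorrected variant settles at a point away from the saddle, while the corrected variant keeps the saddle as a fixed point because the correction vanishes there.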

Performer · Integration · ISCC · Optimizer · Threshold

June 6, 2023

Integrated Sensing, Computation, and Communication: System Framework and Performance Optimization

Yinghui He, Guanding Yu, Yunlong Cai, Haiyan Luo

Integrated sensing, computation, and communication (ISCC) has been recently considered as a promising technique for beyond 5G systems. In ISCC systems, the competition for communication and computation resources between sensing tasks for ambient intelligence and computation tasks from mobile devices becomes an increasingly challenging issue. To address it, we first propose an efficient sensing framework with a novel action detection module. In this module, a threshold is used for detecting whether the sensing target is static and thus the overhead can be reduced. Subsequently, we mathematically analyze the sensing performance of the proposed framework and theoretically prove its effectiveness with the help of the sampling theorem. Based on sensing performance models, we formulate a sensing performance maximization problem while guaranteeing the quality-of-service (QoS) requirements of tasks. To solve it, we propose an optimal resource allocation strategy, in which the minimum resource is allocated to computation tasks, and the rest is devoted to the sensing task. Besides, a threshold selection policy is derived and the results further demonstrate the necessity of the proposed sensing framework. Finally, a real-world test of action recognition tasks based on USRP B210 is conducted to verify the sensing performance analysis. Extensive experiments demonstrate the performance improvement of our proposal by comparing it with some benchmark schemes.

Learning · Continuity · Processing (programming language) · CASES · GPS

June 6, 2023

Memory-Based Dual Gaussian Processes for Sequential Learning

Paul E. Chang, Prakhar Verma, S. T. John, Arno Solin, Mohammad Emtiyaz Khan

from arxiv, International Conference on Machine Learning (ICML) 2023

Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dual sparse variational GP. Our method enables accurate inference for generic likelihoods and improves learning by actively building and updating a memory of past data. We demonstrate its effectiveness in several applications involving Bayesian optimization, active learning, and continual learning.

Learning · Functional · Overestimation · Experience replay buffer · Reducible

June 6, 2023

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu

Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms. Previous works focus on addressing the value overestimation issue, an outcome of adopting function approximators and off-policy learning. Deviating from the common viewpoint, we observe that Q-values are indeed underestimated in the latter stage of the RL training process, primarily related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer. We hypothesize that this long-neglected phenomenon potentially hinders policy learning and reduces sample efficiency. Our insight to address this issue is to incorporate sufficient exploitation of past successes while maintaining exploration optimism. We propose the Blended Exploitation and Exploration (BEE) operator, a simple yet effective approach that updates Q-value using both historical best-performing actions and the current policy. The instantiations of our method in both model-free and model-based settings outperform state-of-the-art methods in various continuous control tasks and achieve strong performance in failure-prone scenarios and real-world robot tasks.

Networking · Optimizer · Functional · End-to-end · Maximum

June 5, 2023

Optimal Resource Allocation with Delay Guarantees for Network Slicing in Disaggregated RAN

Flávio G. C. Rocha, Gabriel M. F. de Almeida, Kleber V. Cardoso, Cristiano B. Both, José F. de Rezende

from arxiv, 21 pages, 10 figures. For the associated GitHub repository, see //github.com/LABORA-INF-UFG/paper-FGKCJ-2023

In this article, we propose a novel formulation for the resource allocation problem of a sliced and disaggregated Radio Access Network (RAN) and its transport network. Our proposal assures an end-to-end delay bound for the Ultra-Reliable and Low-Latency Communication (URLLC) use case while jointly considering the number of admitted users, the transmission rate allocation per slice, the functional split of RAN nodes and the routing paths in the transport network. We use deterministic network calculus theory to calculate delay along the transport network connecting disaggregated RANs deploying network functions at the Radio Unit (RU), Distributed Unit (DU), and Central Unit (CU) nodes. The maximum end-to-end delay is a constraint in the optimization-based formulation that aims to maximize Mobile Network Operator (MNO) profit, considering a cash flow analysis to model revenue and operational costs using data from one of the world's leading MNOs. The optimization model leverages a Flexible Functional Split (FFS) approach to provide a new degree of freedom to the resource allocation strategy. Simulation results reveal that, due to its non-linear nature, there is no trivial solution to the proposed optimization problem formulation. Our proposal guarantees a maximum delay for URLLC services while satisfying minimal bandwidth requirements for enhanced Mobile BroadBand (eMBB) services and maximizing the MNO's profit.

Controller · Scenario · Maximum · Scaling · Statement

June 4, 2023

Towards Efficient Controller Synthesis Techniques for Logical LTL Games

Stanly Samuel, Deepak D'Souza, Raghavan Komondoor

Two-player games are a fruitful way to represent and reason about several important synthesis tasks. These tasks include controller synthesis (where one asks for a controller for a given plant such that the controlled plant satisfies a given temporal specification), program repair (setting values of variables to avoid exceptions), and synchronization synthesis (adding lock/unlock statements in multi-threaded programs to satisfy safety assertions). In all these applications, a solution directly corresponds to a winning strategy for one of the players in the induced game. In turn, logically-specified games offer a powerful way to model these tasks for large or infinite-state systems. Many of the techniques proposed for solving such games rely on abstraction-refinement or template-based solutions. In this paper, we show how to apply classical fixpoint algorithms, which have hitherto been used in explicit, finite-state settings, to a symbolic logical setting. We implement our techniques in a tool called GenSys-LTL and show that they are not only effective in synthesizing valid controllers for a variety of challenging benchmarks from the literature, but often compute maximal winning regions and maximally-permissive controllers. We achieve a 46.38X speed-up over the state of the art and also scale well for non-trivial LTL specifications.

Functional · Principle · SAT · Estimation/Estimator · Scoring function

June 2, 2023

Formalizing Preferences Over Runtime Distributions

Devon R. Graham, Kevin Leyton-Brown, Tim Roughgarden

When trying to solve a computational problem, we are often faced with a choice between algorithms that are guaranteed to return the right answer but differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions. It might seem that we should simply prefer the algorithm that minimizes expected runtime. However, such preferences would be driven by exactly how slow our algorithm is on bad inputs, whereas in practice we are typically willing to cut off occasional, sufficiently long runs before they finish. We propose a principled alternative, taking a utility-theoretic approach to characterize the scoring functions that describe preferences over algorithms. These functions depend on the way our value for solving our problem decreases with time and on the distribution from which captimes are drawn. We describe examples of realistic utility functions and show how to leverage a maximum-entropy approach for modeling underspecified captime distributions. Finally, we show how to efficiently estimate an algorithm's expected utility from runtime samples.
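A small numerical example helps show why expected runtime and a time-dependent utility can rank algorithms differently. The sketch below assumes a single fixed captime and a hypothetical linearly decaying utility; the abstract does not give the paper's scoring functions or captime distribution, so `expected_utility`, `linear_decay`, and the sample runtimes are illustrative assumptions only.

```python
def expected_utility(runtimes, captime, utility):
    """Sample-average estimate of expected utility under a fixed captime.

    A run that finishes within the captime earns utility(t); a run that
    would exceed it is cut off and earns nothing.
    """
    if not runtimes:
        raise ValueError("need at least one runtime sample")
    total = sum(utility(t) if t <= captime else 0.0 for t in runtimes)
    return total / len(runtimes)

def linear_decay(deadline):
    """Hypothetical utility: 1 at t = 0, falling linearly to 0 at the deadline."""
    return lambda t: max(0.0, 1.0 - t / deadline)

def mean(xs):
    return sum(xs) / len(xs)

# Made-up runtime samples (seconds) for two algorithms.
heavy_tailed = [1.0, 1.0, 1.0, 1000.0]  # usually fast, occasionally very slow
steady = [5.0, 5.0, 5.0, 5.0]           # never fast, never slow

u = linear_decay(10.0)
runtime_prefers_steady = mean(steady) < mean(heavy_tailed)
utility_prefers_heavy_tailed = (
    expected_utility(heavy_tailed, 10.0, u) > expected_utility(steady, 10.0, u)
)
```

Here the single 1000-second run dominates the mean runtime of the heavy-tailed algorithm, yet with a 10-second captime that run costs no more than any other failure, so the heavy-tailed algorithm scores higher on utility.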