
Constraints · Cost function · state-of-the-art · Learning · Cost

January 8, 2022

Lazy Lagrangians with Predictions for Online Learning

Daron Anderson,George Iosifidis,Douglas J. Leith

We consider the general problem of online convex optimization with time-varying additive constraints in the presence of predictions for the next cost and constraint functions. A novel primal-dual algorithm is designed by combining a Follow-The-Regularized-Leader iteration with prediction-adaptive dynamic steps. The algorithm achieves $\mathcal O(T^{\frac{3-\beta}{4}})$ regret and $\mathcal O(T^{\frac{1+\beta}{2}})$ constraint violation bounds that are tunable via the parameter $\beta\in[1/2,1)$ and have constant factors that shrink with the quality of the predictions, eventually achieving $\mathcal O(1)$ regret for perfect predictions. Our work extends the FTRL framework to this constrained OCO setting and outperforms the respective state-of-the-art greedy-based solutions, without imposing conditions on the quality of predictions, the cost functions or the geometry of constraints, beyond convexity.
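To make the role of predictions concrete, here is a minimal sketch of a generic optimistic FTRL step with a quadratic regularizer on a Euclidean ball. It is not the primal-dual algorithm of the paper: there are no constraints or dual variables, the step size is fixed rather than prediction-adaptive, and the names `optimistic_ftrl`, `project_to_ball`, `sigma` and `radius` are assumptions made for illustration.

```python
import math

def project_to_ball(v, radius=1.0):
    """Euclidean projection of v onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [c * radius / norm for c in v]

def optimistic_ftrl(gradients, predictions, sigma=1.0, radius=1.0):
    """Play x_t = argmin_{||x|| <= radius} <G_{t-1} + p_t, x> + (sigma/2)||x||^2.

    gradients[t] is the true cost gradient revealed after round t;
    predictions[t] is the guess for it that is available before round t.
    Returns the decisions and the total linear cost sum_t <g_t, x_t>.
    """
    dim = len(gradients[0])
    grad_sum = [0.0] * dim
    decisions, total_cost = [], 0.0
    for g, p in zip(gradients, predictions):
        # Quadratic regularizer gives a closed form: project -(G + p) / sigma.
        x = project_to_ball(
            [-(gs + pi) / sigma for gs, pi in zip(grad_sum, p)], radius
        )
        decisions.append(x)
        total_cost += sum(gi * xi for gi, xi in zip(g, x))
        grad_sum = [gs + gi for gs, gi in zip(grad_sum, g)]
    return decisions, total_cost

# Alternating one-dimensional gradients; the best fixed decision has cost 0.
grads = [[1.0], [-1.0], [1.0], [-1.0]]
_, cost_perfect = optimistic_ftrl(grads, grads)                    # perfect predictions
_, cost_blind = optimistic_ftrl(grads, [[0.0] for _ in grads])     # no predictions
```

In this toy run the perfectly predicted sequence ends with a lower total cost than the best fixed decision, while the prediction-free run ends with a higher one, which is the qualitative effect the abstract describes.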

Related content

Performer · Online · Learning · Networking · Reducible

April 20, 2022

Online Caching with Optimistic Learning

Naram Mhaisen,George Iosifidis,Douglas Leith

from arxiv, To appear in IFIP Networking 2022

The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of optimistic online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework, which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with fixed-size caches or elastic leased caches subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users' viewing activity, and hence can naturally reduce the caching network's uncertainty about future requests. We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions, and maintain the best achievable regret bound $O(\sqrt T)$ even for arbitrarily bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.

Performer · Learning · Online · Networking · Reducible

April 20, 2022

Online Caching with no Regret: Optimistic Learning via Recommendations

Naram Mhaisen,George Iosifidis,Douglas Leith

from arxiv, arXiv admin note: substantial text overlap with arXiv:2202.10590

The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of optimistic online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework, which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with fixed-size caches or elastic leased caches subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users' viewing activity and hence can naturally reduce the caching network's uncertainty about future requests. We also extend the framework to learn and utilize the best request predictor in cases where many are available. We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions, and maintain the sub-linear regret bound $O(\sqrt T)$, which is the best achievable bound for policies that do not use predictions, even for arbitrarily bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.

Stationary distribution · Estimation/Estimator · Stationary · Learning · Constraints

April 19, 2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Jongmin Lee,Cosmin Paduraru,Daniel J. Mankowitz,Nicolas Heess,Doina Precup,Kee-Eung Kim,Arthur Guez

from arxiv, 24 pages, 6 figures, Accepted at ICLR 2022 (spotlight)

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This problem setting is appealing in many real-world scenarios, where direct interaction with the environment is costly or risky, and where the resulting policy should comply with safety constraints. However, it is challenging to compute a policy that guarantees satisfying the cost constraints in the offline RL setting, since the off-policy evaluation inherently has an estimation error. In this paper, we present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution. Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction. Experimental results show that COptiDICE attains better policies in terms of constraint satisfaction and return-maximization, outperforming baseline algorithms.

Optimizer · Performer · Cost · Learning · ML

April 18, 2022

Expert-Calibrated Learning for Online Optimization with Switching Costs

Pengfei Li,Jianyi Yang,Shaolei Ren

from arxiv, Pengfei Li and Jianyi Yang contributed equally. This paper has been accepted by and will be presented at the ACM SIGMETRICS/IFIP Performance 2022

We study online convex optimization with switching costs, a practically important but also extremely challenging problem due to the lack of complete offline information. By tapping into the power of machine learning (ML) based optimizers, ML-augmented online algorithms (also referred to as expert calibration in this paper) have been emerging as state of the art, with provable worst-case performance guarantees. Nonetheless, by using the standard practice of training an ML model as a standalone optimizer and plugging it into an ML-augmented algorithm, the average cost performance can be even worse than purely using ML predictions. In order to address the "how to learn" challenge, we propose EC-L2O (expert-calibrated learning to optimize), which trains an ML-based optimizer by explicitly taking into account the downstream expert calibrator. To accomplish this, we propose a new differentiable expert calibrator that generalizes regularized online balanced descent and offers a provably better competitive ratio than pure ML predictions when the prediction error is large. For training, our loss function is a weighted sum of two different losses: one minimizing the average ML prediction error for better robustness, and the other minimizing the post-calibration average cost. We also provide theoretical analysis for EC-L2O, highlighting that expert calibration can even be beneficial for the average cost performance and that the high-percentile tail ratio of the cost achieved by EC-L2O to that of the offline optimal oracle (i.e., tail cost ratio) can be bounded. Finally, we test EC-L2O by running simulations for sustainable datacenter demand response. Our results demonstrate that EC-L2O can empirically achieve a lower average cost as well as a lower competitive ratio than the existing baseline algorithms.

Active learning · Learning · Precision/Accuracy · Processing (programming language) · Input space

April 18, 2022

Active Learning with Weak Labels for Gaussian Processes

Amanda Olmin,Jakob Lindqvist,Lennart Svensson,Fredrik Lindsten

Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation costs. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.

Nonlinear conjugate gradient · Conjugate gradient · Conjugate · Continuity · Functional

April 17, 2022

A Modified Nonlinear Conjugate Gradient Algorithm for Functions with Non-Lipschitz Gradient

Bingjie Li,Tianhao Ni,Zhenyue Zhang

from arxiv, arXiv admin note: text overlap with arXiv:2102.08048

In this paper, we propose a modified nonlinear conjugate gradient (NCG) method for functions with a non-Lipschitz continuous gradient. First, we present a new formula for the conjugate coefficient $\beta_k$ in NCG, yielding a search direction that provides an adequate function decrease. We show that our NCG algorithm is guaranteed to converge strongly for continuously differentiable functions without a Lipschitz continuous gradient. Second, we present a simple interpolation approach that automatically achieves shrinkage, generating a step length satisfying the standard Wolfe conditions in each step. Our framework considerably broadens the applicability of NCG and preserves the superior numerical performance of the PRP-type methods.
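For orientation, the sketch below shows a textbook PRP+ nonlinear conjugate gradient loop, i.e. the baseline family the abstract refers to. It does not implement the paper's modified $\beta_k$ or its interpolation-based Wolfe line search: the coefficient is the standard PRP+ formula, the line search is plain Armijo backtracking, and the function name `prp_plus_cg`, the constants and the test problem are assumptions chosen for illustration.

```python
import math

def prp_plus_cg(f, grad, x0, tol=1e-6, max_iter=500):
    """Minimize f with PRP+ conjugate gradient and Armijo backtracking."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:
            # Not a descent direction: restart from steepest descent.
            d = [-gi for gi in g]
            slope = -sum(gi * gi for gi in g)
        # Armijo backtracking: halve the step until f decreases enough
        # (after 60 halvings the last, tiny trial step is accepted).
        step, fx = 1.0, f(x)
        for _ in range(60):
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        g_new = grad(x_new)
        # PRP+ coefficient: max(0, g_new . (g_new - g) / ||g||^2).
        beta = max(
            0.0,
            sum(gn * (gn - go) for gn, go in zip(g_new, g))
            / sum(go * go for go in g),
        )
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x

def quadratic(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def quadratic_grad(x):
    return [2.0 * x[0], 20.0 * x[1]]

x_start = [3.0, 1.0]
x_min = prp_plus_cg(quadratic, quadratic_grad, x_start)
```

The test problem here is smooth with a Lipschitz gradient, so it only illustrates the baseline iteration, not the non-Lipschitz setting the paper targets.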

Machine Learning · Learning · Conformer · Performer · CASES

April 15, 2022

Characterizing metastable states with the help of machine learning

Pietro Novelli,Luigi Bonati,Massimiliano Pontil,Michele Parrinello

from arxiv, Main text: 10 pages, 4 figures. Supplementary Info: 4 pages, 5 figures

Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.

Performer · Diversity · Approximation · state-of-the-art · Learning

April 15, 2022

Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

Bryon Tjanaka,Matthew C. Fontaine,Julian Togelius,Stefanos Nikolaidis

from arxiv, Published as a conference paper at the 2022 Genetic and Evolutionary Computation Conference (GECCO '22); Online article available at //dqd-rl.github.io

Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at //github.com/icaros-usc/dqd-rl

Active learning · Weight · Predictor/Decision function · Learning · Mutually independent

April 14, 2022

Active Learning for Regression and Classification by Inverse Distance Weighting

Alberto Bemporad

from arxiv, 17 pages, 8 figures. Submitted for publication

This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is independent of the type of predictor used; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm, which we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at //cse.lab.imtlucca.it/~bemporad/ideal.
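As a rough illustration of inverse-distance weighting, the sketch below computes generic inverse squared-distance weights, uses them to interpolate labels, and picks the pool candidate that the existing samples cover least. This is not the IDEAL acquisition function or its API: the weight form $1/d^2$, the selection rule and the names `idw_weights`, `idw_predict` and `least_covered` are assumptions made for this example.

```python
import math

def idw_weights(query, samples):
    """Inverse squared-distance weight of each sample at the query point."""
    weights = []
    for s in samples:
        d2 = sum((q - si) ** 2 for q, si in zip(query, s))
        # A query that coincides with a sample gets infinite weight.
        weights.append(math.inf if d2 == 0.0 else 1.0 / d2)
    return weights

def idw_predict(query, samples, values):
    """Interpolate values at the query as a weight-normalized average."""
    weights = idw_weights(query, samples)
    for w, v in zip(weights, values):
        if math.isinf(w):
            return v  # exact hit: return the stored label
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def least_covered(pool, samples):
    """Pick the pool candidate with the smallest total weight, i.e. the one
    farthest (in the IDW sense) from everything already labeled."""
    return min(pool, key=lambda c: sum(idw_weights(c, samples)))

samples = [(0.0,), (2.0,)]
values = [0.0, 2.0]
pool = [(0.5,), (1.0,), (5.0,)]

midpoint_value = idw_predict((1.0,), samples, values)
next_query = least_covered(pool, samples)
```

A pure exploration rule like `least_covered` ignores the predictor entirely, which is one simple way to keep the query selection independent of the model being trained.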

Learning · Surrogate loss · Online · Bandits · Bandit/Slot machine

December 31, 2019

A Modern Introduction to Online Learning

Francesco Orabona

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt with through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.
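A small example of regret minimization in this framework is the Hedge (exponential weights) algorithm, which can be viewed as Follow-The-Regularized-Leader with an entropic regularizer on the probability simplex. The sketch below is a generic textbook version, not code from the monograph; the function name `hedge` and the two-expert loss sequence are assumptions made for illustration. For losses in $[0,1]$ and $\eta=\sqrt{8\ln n/T}$, the standard guarantee is a regret of at most $\sqrt{T\ln(n)/2}$.

```python
import math

def hedge(loss_rounds, eta):
    """Run Hedge over rounds of per-expert losses in [0, 1].

    Returns the learner's cumulative expected loss and the
    cumulative loss of every expert.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        # Shift by the minimum before exponentiating for numerical stability.
        low = min(cumulative)
        weights = [math.exp(-eta * (c - low)) for c in cumulative]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, cumulative

T, n = 100, 2
rounds = [[0.0, 1.0] for _ in range(T)]  # expert 0 is always right
eta = math.sqrt(8.0 * math.log(n) / T)
learner_loss, expert_losses = hedge(rounds, eta)
regret = learner_loss - min(expert_losses)
bound = math.sqrt(T * math.log(n) / 2.0)
```

Here `regret` compares the learner with the best single expert in hindsight, which is the worst-case notion of performance the monograph is built around.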


