In performative prediction, a predictive model impacts the distribution that generates future data, a phenomenon that classical supervised learning ignores. In this closed-loop setting, the natural measure of performance, called the performative risk, captures the expected loss incurred by a predictive model after deployment. The core difficulty of minimizing the performative risk is that the data distribution itself depends on the model parameters. This dependence is governed by the environment and is not under the control of the learner. As a consequence, even a convex loss function can result in a highly non-convex performative risk minimization problem. Prior work has identified a pair of general conditions on the loss and on the mapping from model parameters to distributions that together imply convexity of the performative risk. In this paper, we relax these assumptions and focus on obtaining weaker notions of convexity, without sacrificing the amenability of the performative risk minimization problem to iterative optimization methods.
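
The claim that a convex loss can produce a non-convex performative risk can be checked on a toy case. The sketch below is an illustration only: the loss, the distribution map (a point mass at `3 * theta`), and all function names are assumptions chosen for simplicity, not taken from the paper.

```python
def loss(z, theta):
    """Toy loss, strictly convex in theta for any fixed data point z."""
    return theta ** 2 - z * theta


def deployed_data(theta):
    """Hypothetical distribution map D(theta): a point mass at z = 3 * theta."""
    return 3.0 * theta


def performative_risk(theta):
    """Loss of theta on the data that deploying theta itself induces."""
    return loss(deployed_data(theta), theta)


def violates_midpoint_convexity(f, a, b):
    """True if f at the midpoint lies above the average of f(a) and f(b)."""
    return f(0.5 * (a + b)) > 0.5 * (f(a) + f(b))


def fixed_data_risk(theta):
    """Risk on a fixed data point z = 1, as in classical supervised learning."""
    return loss(1.0, theta)


# Fixed data: the convexity check passes. Model-dependent data: it fails,
# because the performative risk here equals -2 * theta ** 2.
print(violates_midpoint_convexity(fixed_data_risk, -1.0, 1.0))
print(violates_midpoint_convexity(performative_risk, -1.0, 1.0))
```

In this toy case the loss is convex in the parameter for every fixed data point, yet the performative risk is concave because the data moves with the deployed parameter.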

Related content

Latent variable / hidden variable · Knowledge · Task-oriented dialogue system · Latent · Unlabeled

October 21, 2022

Discovering New Intents Using Latent Variables

Yunhua Zhou, Peiju Liu, Yuxin Wang, Xipeng Qiu

Discovering new intents is of great significance for establishing a bootstrapped task-oriented dialogue system. Most existing methods either lack the ability to transfer prior knowledge from the known intent data or fall into the dilemma of forgetting prior knowledge in subsequent stages. More importantly, these methods do not deeply explore the intrinsic structure of unlabeled data, so they cannot identify the characteristics that define an intent in general. In this paper, starting from the intuition that discovering intents could be beneficial to the identification of the known intents, we propose a probabilistic framework for discovering intents in which intent assignments are treated as latent variables. We adopt the Expectation-Maximization framework for optimization. Specifically, in the E-step, we discover intents and explore the intrinsic structure of unlabeled data through the posterior of intent assignments. In the M-step, we alleviate the forgetting of prior knowledge transferred from known intents by optimizing the discrimination of labeled data. Extensive experiments conducted on three challenging real-world datasets demonstrate that our method achieves substantial improvements.
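
To illustrate the E-step/M-step alternation with assignments as latent variables, here is a minimal generic sketch. It is not the paper's intent-discovery model: it runs EM on a two-component one-dimensional Gaussian mixture with unit variances, and the function name, initial means, and data are all assumptions made for the example.

```python
import math


def em_two_gaussians(xs, init_means=(-1.0, 1.0), iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances.

    The cluster assignment of each point is the latent variable.
    """
    mu0, mu1 = init_means
    weight1 = 0.5  # mixing weight of component 1
    resp = []
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p0 = (1.0 - weight1) * math.exp(-0.5 * (x - mu0) ** 2)
            p1 = weight1 * math.exp(-0.5 * (x - mu1) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: re-estimate the means and the mixing weight from the posteriors.
        n1 = sum(resp)
        n0 = len(xs) - n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        weight1 = n1 / len(xs)
    return mu0, mu1, weight1, resp


data = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
mu0, mu1, weight1, resp = em_two_gaussians(data)
print(mu0, mu1, weight1)
```

On this well-separated toy data the estimated means settle near the two cluster centres. The abstract's M-step additionally uses labeled data, which this sketch does not model.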

GPS · Functional · Optimizer · Processing (programming language) · MoDELS

October 20, 2022

Optimal plug-in Gaussian processes for modelling derivatives

Zejian Liu, Meng Li

From arXiv: This paper supersedes the second part of the technical report available at arXiv:2011.13967v1. That technical report has been split: the first part, on equivalence theory, will be extended and become arXiv:2011.13967v2. The results on Bayesian inference for function derivatives have evolved into this paper.

Derivatives are a key nonparametric functional in wide-ranging applications where the rate of change of an unknown function is of interest. In the Bayesian paradigm, Gaussian processes (GPs) are routinely used as a flexible prior for unknown functions, and are arguably one of the most popular tools in many areas. However, little is known about the optimal modelling strategy and theoretical properties when using GPs for derivatives. In this article, we study a plug-in strategy that differentiates the posterior distribution under GP priors to obtain derivatives of any order. This practically appealing plug-in GP method has previously been perceived as suboptimal and degraded, but this is not necessarily the case. We provide posterior contraction rates for plug-in GPs and establish that they remarkably adapt to derivative orders. We show that the posterior measure of the regression function and its derivatives, with the same choice of hyperparameter that does not depend on the order of derivatives, converges at the minimax optimal rate up to a logarithmic factor for functions in certain classes. This, to the best of our knowledge, provides the first positive result for plug-in GPs in the context of inferring derivative functionals, and leads to a practically simple nonparametric Bayesian method with guided hyperparameter tuning for simultaneously estimating the regression function and its derivatives. Simulations show competitive finite-sample performance of the plug-in GP method. A climate change application analyzing global sea-level rise is discussed.
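
For orientation, the plug-in idea can be written out in the standard GP regression setup. The notation below (kernel $k$, design points $X = (x_1, \dots, x_n)$, responses $y$, noise variance $\sigma^2$, kernel matrix $K$) is assumed here for illustration and may not match the paper's parameterization. With a zero-mean GP prior and Gaussian noise, the posterior mean is

$$\hat f_n(x) = k(x, X)\,(K + \sigma^2 I_n)^{-1} y, \qquad K_{ij} = k(x_i, x_j),$$

and the plug-in estimate of the $m$-th derivative differentiates it in $x$:

$$\hat f_n^{(m)}(x) = \partial_x^m k(x, X)\,(K + \sigma^2 I_n)^{-1} y.$$

The weight vector $(K + \sigma^2 I_n)^{-1} y$ does not depend on $m$, so a single fit with a single set of hyperparameters serves every derivative order, provided $k$ is smooth enough for the derivative to exist.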

Performer · Linear · Regularization term · Generalized linear model · Covariate shift

October 20, 2022

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization

Daniel LeJeune, Jiayu Liu, Reinhard Heckel

From arXiv: 33 pages, 7 figures

Machine learning systems are often applied to data that is drawn from a different distribution than the training distribution. Recent work has shown that for a variety of classification and signal reconstruction problems, the out-of-distribution performance is strongly linearly correlated with the in-distribution performance. If this relationship, or more generally a monotonic one, holds, it has important consequences. For example, it allows one to optimize performance on one distribution as a proxy for performance on the other. In this paper, we study conditions under which a monotonic relationship between the performances of a model on two distributions is expected. We prove an exact asymptotic linear relation for squared error and a monotonic relation for misclassification error for ridge-regularized general linear models under covariate shift, as well as an approximate linear relation for linear inverse problems.

Continuity · Optimizer · state-of-the-art · Divergence · Controller

October 20, 2022

Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Antonio Terpin, Nicolas Lanzetti, Batuhan Yardim, Florian Dörfler, Giorgia Ramponi

From arXiv: Accepted for presentation at, and publication in the proceedings of, the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

Policy Optimization (PO) algorithms have proven particularly well suited to handling the high dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilizing the policy updates. These usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance represents a natural alternative to the KL divergence for defining trust regions or regularizing the objective function. However, state-of-the-art works either resort to approximations of it or do not provide an algorithm for continuous state-action spaces, reducing the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm, Optimal Transport Trust Region Policy Optimization (OT-TRPO), for continuous state-action spaces. We circumvent the infinite-dimensional optimization problem for PO by providing a one-dimensional dual reformulation for which strong duality holds. We then analytically derive the optimal policy update given the solution of the dual problem. This way, we bypass the computation of optimal transport costs and of optimal transport maps, which we implicitly characterize by solving the dual formulation. Finally, we provide an experimental evaluation of our approach across various control tasks. Our results show that optimal transport discrepancies can offer an advantage over state-of-the-art approaches.

Conditionally independent · Mutually independent · Scenario · Logistic regression · Predictor/decision function

October 20, 2022

Anytime Valid Tests of Conditional Independence Under Model-X

Peter Grünwald, Alexander Henzi, Tyron Lardy

We propose a sequential, anytime valid method to test the conditional independence of a response $Y$ and a predictor $X$ given a random vector $Z$. The proposed test is based on e-statistics and test martingales, which generalize likelihood ratios and allow valid inference at arbitrary stopping times. In accordance with the recently introduced model-X setting, our test depends on the availability of the conditional distribution of $X$ given $Z$, or at least a sufficiently sharp approximation thereof. Within this setting, we derive a full characterization of e-statistics for testing conditional independence, investigate growth-rate optimal e-statistics and their power properties, and show that our method yields tests with asymptotic power one in the special case of a logistic regression model. A simulation study is done to demonstrate that the approach is robust with respect to violations of the model-X assumption and competitive in terms of power when compared to established sequential and non-sequential testing methods.
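
The mechanism behind anytime-valid testing can be shown with the simplest test martingale, a running product of likelihood ratios. The sketch below is a generic illustration, not the paper's conditional independence test: the null (a fair coin), the alternative (`p_alt = 0.7`), and the function name are assumptions. The null is rejected once the product reaches `1 / alpha`, which by Ville's inequality keeps the type I error at most `alpha` at any stopping time.

```python
def sequential_lr_test(observations, p_null=0.5, p_alt=0.7, alpha=0.05):
    """Sequential test of a Bernoulli null using a likelihood-ratio martingale.

    Returns (stopping_time, wealth); stopping_time is None if never rejected.
    """
    wealth = 1.0  # running product of likelihood ratios (the e-process)
    for t, x in enumerate(observations, start=1):
        if x == 1:
            wealth *= p_alt / p_null
        else:
            wealth *= (1.0 - p_alt) / (1.0 - p_null)
        # Reject as soon as the accumulated evidence crosses 1 / alpha.
        if wealth >= 1.0 / alpha:
            return t, wealth
    return None, wealth


print(sequential_lr_test([1] * 20))     # a run of ones: evidence accumulates
print(sequential_lr_test([1, 0] * 10))  # balanced data: no rejection
```

Because the threshold check is valid at every step, the data collector may stop whenever the evidence is sufficient, without fixing the sample size in advance.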

BLIP · Decomposed · Controller · Latent · Linear

October 20, 2022

Synthetic Blip Effects: Generalizing Synthetic Controls for the Dynamic Treatment Regime

Anish Agarwal, Vasilis Syrgkanis

We propose a generalization of the synthetic control and synthetic interventions methodology to the dynamic treatment regime. We consider the estimation of unit-specific treatment effects from panel data collected via a dynamic treatment regime and in the presence of unobserved confounding. That is, each unit receives multiple treatments sequentially, based on an adaptive policy, which depends on a latent endogenously time-varying confounding state of the treated unit. Under a low-rank latent factor model assumption and a technical overlap assumption we propose an identification strategy for any unit-specific mean outcome under any sequence of interventions. The latent factor model we propose admits linear time-varying and time-invariant dynamical systems as special cases. Our approach can be seen as an identification strategy for structural nested mean models under a low-rank latent factor assumption on the blip effects. Our method, which we term "synthetic blip effects", is a backwards induction process, where the blip effect of a treatment at each period and for a target unit is recursively expressed as linear combinations of blip effects of a carefully chosen group of other units that received the designated treatment. Our work avoids the combinatorial explosion in the number of units that would be required by a vanilla application of prior synthetic control and synthetic intervention methods in such dynamic treatment regime settings.

Mixing time · Markovian · Optimizer · Mixing · Learning

October 19, 2022

Adapting to Mixing Time in Stochastic Optimization with Markovian Data

Ron Dorfman, Kfir Y. Levy

From arXiv: ICML 2022. Code: //github.com/Rondorf/BOReL

We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.
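
A common form of multi-level Monte Carlo (MLMC) gradient estimation can be sketched as follows. This is an assumed, generic construction and may differ from the paper's exact estimator: a random level `J` is drawn with probability `2 ** -J`, and a single-sample gradient is corrected by the scaled difference between averages over `2 ** J` and `2 ** (J - 1)` samples. The names `sample_grad` and `max_level` are hypothetical, and the oracle here returns scalars for simplicity.

```python
import random


def mlmc_gradient(sample_grad, max_level, rng):
    """One MLMC gradient estimate built from a stochastic gradient oracle.

    sample_grad: callable returning one (scalar) stochastic gradient.
    max_level:   levels above this fall back to the single-sample estimate.
    rng:         a random.Random instance used to draw the level.
    """
    # Draw the level J with P(J = j) = 2 ** -j for j = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1

    base = sample_grad()  # level-0 estimate: a single sample
    if level > max_level:
        return base

    draws = [sample_grad() for _ in range(2 ** level)]
    fine = sum(draws) / len(draws)     # average over 2 ** level samples
    half = draws[: len(draws) // 2]
    coarse = sum(half) / len(half)     # average over 2 ** (level - 1) samples
    return base + (2 ** level) * (fine - coarse)


rng = random.Random(0)
noisy_grad = lambda: 1.0 + rng.gauss(0.0, 0.1)  # toy oracle with mean 1.0
print(mlmc_gradient(noisy_grad, max_level=5, rng=rng))
```

In the Markovian setting the samples would come from consecutive steps of the chain rather than from independent draws as in this toy oracle.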

Functional · Processing (programming language) · Sample · Optimizer · MoDELS

October 19, 2022

Gaussian Process Sampling and Optimization with Approximate Upper and Lower Bounds

Vu Nguyen, Marc Peter Deisenroth, Michael A. Osborne

From arXiv: 20 pages

Many functions have approximately-known upper and/or lower bounds, potentially aiding the modeling of such functions. In this paper, we introduce Gaussian process models for functions where such bounds are (approximately) known. More specifically, we propose the first use of such bounds to improve Gaussian process (GP) posterior sampling and Bayesian optimization (BO). That is, we transform a GP model satisfying the given bounds, and then sample and weight functions from its posterior. To further exploit these bounds in BO settings, we present bounded entropy search (BES) to select the point gaining the most information about the underlying function, estimated by the GP samples, while satisfying the output constraints. We characterize the sample variance bounds and show that the decision made by BES is explainable. Our proposed approach is conceptually straightforward and can be used as a plug-in extension to existing methods for GP posterior sampling and Bayesian optimization.

Optimizer · Processing (programming language) · MoDELS · Learned · Optimization

December 19, 2021

Introduction to Online Convex Optimization

Elad Hazan

From arXiv: arXiv admin note: text overlap with arXiv:1909.03550

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular successes in modeling and systems that are now part of our daily lives.

Optimizer · MoDELS · Distributed machine learning · Performer · CIFAR-10

February 18, 2020

Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability

Yikai Yan, Chaoyue Niu, Yucheng Ding, Zhenzhe Zheng, Fan Wu, Guihai Chen, Shaojie Tang, Zhihua Wu

From arXiv: ICML 2020 Submission

Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such an intermittent client availability model would significantly deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
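
The idea of reusing the latest gradient of every client, including clients that are currently unavailable, can be sketched on a toy problem. This is an illustration of the idea as described in the abstract, not the authors' implementation: the scalar model, the quadratic client objectives `0.5 * (w - target) ** 2`, the assumption that every client reports once at the start, and all names are hypothetical.

```python
def fed_latest_averaging(targets, availability, lr=0.5, w=0.0):
    """Toy server loop that averages the latest known gradient of every client.

    targets:      client i minimizes 0.5 * (w - targets[i]) ** 2.
    availability: one list of available client indices per round.
    """
    # Assumption: each client reports one gradient at the initial model.
    latest = [w - c for c in targets]
    for available in availability:
        # Only available clients refresh their stored gradient this round.
        for i in available:
            latest[i] = w - targets[i]
        # The update uses all stored gradients, stale or fresh.
        w -= lr * sum(latest) / len(latest)
    return w


# Three clients, only one available per round, in rotation.
rounds = [[0], [1], [2]] * 40
print(fed_latest_averaging([0.0, 2.0, 4.0], rounds))
```

With one client available per round in rotation, the model in this toy run still approaches the minimizer of the average objective (the mean of the targets), since every client's objective stays represented in each update through its stored gradient.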