The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while commonly constrained to match empirically estimated feature expectations. We seek to generalize this principle to scenarios where the empirical feature expectations cannot be computed because the model variables are only partially observed, which introduces a dependency on the learned model. Extending and generalizing the principle of latent maximum entropy, we introduce uncertain maximum entropy and describe an expectation-maximization based solution to approximately solve these problems. We show that our technique generalizes the principle of maximum entropy and latent maximum entropy and discuss a generally applicable regularization technique for adding error terms to feature expectation constraints in the event of limited data.
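As a point of reference for the classical (fully observed) case only, the following is a minimal sketch of the principle of maximum entropy with a single feature-expectation constraint. It does not implement the uncertain or latent variants described above. The function name `maxent_with_mean` and the dice example are illustrative assumptions, not taken from the paper.

```python
import math

def maxent_with_mean(values, target_mean, tol=1e-12):
    """Maximum-entropy distribution over `values` whose mean is `target_mean`.

    The solution has the form p_i proportional to exp(lam * x_i). The mean is
    increasing in lam, so the multiplier can be found by bisection.
    Assumes min(values) < target_mean < max(values).
    """
    def distribution(lam):
        # Subtract the largest exponent for numerical stability.
        shift = max(lam * x for x in values)
        weights = [math.exp(lam * x - shift) for x in values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(probs):
        return sum(x * p for x, p in zip(values, probs))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(distribution(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return distribution((lo + hi) / 2)

# Hypothetical example: a six-sided die constrained to have mean 4.5.
faces = [1, 2, 3, 4, 5, 6]
probs = maxent_with_mean(faces, 4.5)
```

When the target mean exceeds the uniform mean of 3.5, the resulting probabilities increase with the face value, which is the least informative distribution consistent with the constraint.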

Related content

Weight · Estimation/Estimator · Generalization theory · MoDELS · Sample

November 2, 2021

Leveraging Population Outcomes to Improve the Generalization of Experimental Results

Melody Huang, Naoki Egami, Erin Hartman, Luke Miratrix

Generalizing causal estimates in randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences. While recent papers developed various weighting estimators for the population average treatment effect (PATE), many of these methods result in large variance because the experimental sample often differs substantially from the target population, and estimated sampling weights are extreme. To improve efficiency in practice, we propose post-residualized weighting in which we use the outcome measured in the observational population data to build a flexible predictive model (e.g., machine learning methods) and residualize the outcome in the experimental data before using conventional weighting methods. We show that the proposed PATE estimator is consistent under the same assumptions required for existing weighting methods, importantly without assuming the correct specification of the predictive model. We demonstrate the efficiency gains from this approach through simulations and our application based on a set of job training experiments.

Processing (programming language) · MoDELS · Identifiable · Generalization theory · Model selection

November 1, 2021

An Information-theoretic Approach to Distribution Shifts

Marco Federici, Ryota Tomioka, Patrick Forré

Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, and (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

Confidence · Proportional · Estimation/Estimator · Smoothing · Covariance matrix

October 31, 2021

Localizing differences in smooths with simultaneous confidence bounds on the true discovery proportion

David Swanson

We demonstrate a method for localizing where two smooths differ using a true discovery proportion (TDP) based interpretation. The procedure yields a statement on the proportion of some region where true differences exist between two smooths, which results from use of hypothesis tests on collections of basis coefficients parametrizing the smooths. The methodology avoids otherwise ad hoc means of doing so such as performing hypothesis tests on entire smooths of subsetted data. TDP estimates are 1-alpha confidence bounded simultaneously, assuring that the estimate for a region is a lower bound on the proportion of actual difference, or true discoveries, in that region with high confidence regardless of the number, location, or size of regions for which TDP is estimated. Our procedure is based on closed-testing using the Simes local test. We develop expressions for the covariance of quadratic forms because of the multiple regression framework in which we use closed-testing results, which are shown to be non-negative in many settings. Our procedure is well-powered because of a result on the off-diagonal decay structure of the covariance matrix of penalized B-splines of degree two or less. We demonstrate achievement of estimated TDP in simulation for different specified alpha levels and degrees of difference and analyze a data set of walking gait of cerebral palsy patients. Keywords: splines; smoothing; multiple testing; closed-testing; simultaneous confidence
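The Simes local test mentioned in the abstract can be sketched in a few lines. This shows only the standard Simes test of a single intersection hypothesis, not the full closed-testing procedure or the TDP bounds from the paper. The function name `simes_rejects` is an illustrative assumption.

```python
def simes_rejects(p_values, alpha=0.05):
    """Standard Simes test of the intersection hypothesis.

    Rejects when, for some rank i, the i-th smallest p-value is at most
    i * alpha / n, where n is the number of p-values.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))
```

In a closed-testing procedure, a test like this would be applied to every intersection of elementary hypotheses; that combinatorial step is omitted here.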

Policy improvement · Optimizer · Sample · Performer · Critic

October 29, 2021

Generalized Proximal Policy Optimization with Sample Reuse

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

From arXiv. To appear in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. In this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in Proximal Policy Optimization. This motivates an off-policy version of the popular algorithm that we call Generalized Proximal Policy Optimization with Sample Reuse. We demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency.
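For readers unfamiliar with the clipping mechanism the abstract refers to, here is a minimal sketch of the standard on-policy PPO clipped surrogate objective. It is not the generalized off-policy variant proposed in the paper. The function name and the default `epsilon=0.2` are assumptions chosen for illustration.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate, averaged over samples.

    ratios: probability ratios pi_new(a|s) / pi_old(a|s), one per sample.
    advantages: advantage estimates, one per sample.
    """
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Taking the minimum gives a pessimistic estimate of improvement.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(ratios)
```

The minimum removes the incentive to move the ratio far outside the clipping interval, which is the source of the stability the abstract attributes to on-policy methods.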

Bandit/Slot machine · CASE · Scenario · Sample · CASES

October 29, 2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem with $A$ actions after $T$ rounds. We extend the regret analysis for optimistic policies to bilinear saddle-point problems which include zero-sum matrix games and constrained bandits as special cases. In this case we show that Thompson sampling can produce policies outside of the optimistic set and suffer linear regret in some instances. Finding a policy inside the optimistic set amounts to solving a convex optimization problem and we call the resulting algorithm 'variational Bayesian optimistic sampling' (VBOS). The procedure works for any posteriors, i.e., it does not require the posterior to have any special properties, such as log-concavity, unimodality, or smoothness. The variational view of the problem has many useful properties, including the ability to tune the exploration-exploitation tradeoff, add regularization, incorporate constraints, and linearly parameterize the policy.
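As background for the Thompson sampling policy named in the abstract, the sketch below shows the textbook version for a Bernoulli bandit with independent Beta(1, 1) priors. It is not VBOS. The function name, the uniform prior, and the example arm means are assumptions made for illustration.

```python
import random

def thompson_sampling(true_means, rounds, seed=0):
    """Thompson sampling on a Bernoulli bandit; returns pull counts per arm."""
    rng = random.Random(seed)
    arms = len(true_means)
    successes = [0] * arms
    failures = [0] * arms
    pulls = [0] * arms
    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical three-armed bandit; the last arm has the highest mean.
pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
```

Exploration here comes entirely from the randomness of the posterior samples, so it shrinks automatically as the posteriors concentrate.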

Bayesian inference · Regularization term · Inference · Regularization · Learning

December 3, 2020

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Kien Do, Truyen Tran, Svetha Venkatesh

From arXiv. Accepted to AAAI 2021.

We propose two generic methods for improving semi-supervised learning (SSL). The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods. We implement WP by leveraging variational Bayesian inference (VBI). The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR). While most consistency losses act on perturbations in the vicinity of each data point, MUR actively searches for "virtual" points situated beyond this region that cause the most uncertain class predictions. This allows MUR to impose smoothness on a wider area in the input-output manifold. Our experiments show clear improvements in classification errors of various CR based methods when they are combined with VBI or MUR or both.

Learning · Representation learning · Cluster · INFORMS · Principle

November 13, 2019

Self-labelling via simultaneous clustering and representation learning

Yuki Markus Asano, Christian Rupprecht, Andrea Vedaldi

Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Compared to the best previous method in this class, namely DeepCluster, our formulation minimizes a single objective function for both representation learning and clustering; it also significantly outperforms DeepCluster in standard benchmarks and reaches state of the art for learning a ResNet-50 self-supervisedly.
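The Sinkhorn-Knopp iteration mentioned in the abstract can be illustrated with plain alternating row and column normalization. This is the basic algorithm on a small dense matrix, not the fast variant the authors use at scale. The function name, the equal-partition marginals (rows sum to 1/N, columns to 1/K), and the example scores are assumptions for illustration.

```python
def sinkhorn_knopp(scores, iterations=200):
    """Scale a positive N x K matrix so rows sum to 1/N and columns to 1/K."""
    n, k = len(scores), len(scores[0])
    q = [row[:] for row in scores]  # work on a copy
    for _ in range(iterations):
        # Row step: each sample carries total mass 1/N.
        for i in range(n):
            row_sum = sum(q[i])
            q[i] = [value / (row_sum * n) for value in q[i]]
        # Column step: each cluster receives total mass 1/K.
        for j in range(k):
            col_sum = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] /= col_sum * k
    return q

# Hypothetical scores: every sample prefers cluster 0 before balancing.
assignment = sinkhorn_knopp([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
```

The column constraint is what rules out the degenerate solution in which every sample is assigned to the same cluster.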

Likelihood · Estimation/Estimator · Maximum likelihood estimation · Maximum likelihood · MoDELS

September 24, 2018

Implicit Maximum Likelihood Estimation

Ke Li, Jitendra Malik

From arXiv. 21 pages, 4 figures. In the interest of promoting discussion, we make the reviews available at //people.eecs.berkeley.edu/~ke.li/papers/imle_reviews.pdf

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

Continuity · Controller · Learning · Reinforcement learning · Optimizer

June 25, 2018

A Tour of Reinforcement Learning: The View from Continuous Control

Benjamin Recht

This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
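To make the Linear Quadratic Regulator concrete, here is a minimal sketch for a scalar discrete-time system with known dynamics, solved by iterating the Riccati recursion. The survey's case study concerns unknown dynamics, which this sketch does not address. The function name, the scalar setting, and the example parameters are assumptions for illustration.

```python
def scalar_lqr_gain(a, b, q, r, iterations=1000):
    """Infinite-horizon LQR for x_next = a*x + b*u with cost q*x^2 + r*u^2.

    Iterates the discrete-time Riccati recursion to a fixed point p and
    returns (k, p), where the control law is u = -k * x.
    """
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return k, p

# Hypothetical open-loop-unstable system (|a| > 1).
gain, cost_to_go = scalar_lqr_gain(a=1.2, b=1.0, q=1.0, r=1.0)
```

With the returned gain, the closed-loop multiplier `a - b * gain` has magnitude below one, so the controlled system is stable even though the open-loop system is not.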

Optimizer · Extensibility · Dual problem · Smoothing · INTERACT

December 1, 2017

Optimal Algorithms for Distributed Optimization

César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is strongly convex and smooth, either strongly convex or smooth, or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, and improvement of the condition numbers.
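For orientation, the sketch below shows Nesterov's accelerated gradient descent in its centralized form for a smooth, strongly convex objective, using the common constant-momentum scheme. It does not implement the distributed dual method from the paper. The function names and the quadratic test problem are assumptions for illustration.

```python
import math

def nesterov_agd(grad, x0, smoothness, strong_convexity, iterations=200):
    """Accelerated gradient descent with constant momentum.

    smoothness: Lipschitz constant L of the gradient.
    strong_convexity: strong convexity parameter mu (0 < mu <= L).
    """
    kappa = smoothness / strong_convexity
    beta = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iterations):
        # Extrapolate along the previous displacement.
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_prev = x
        # Gradient step with step size 1/L from the extrapolated point.
        x = [yi - gi / smoothness for yi, gi in zip(y, g)]
    return x

def quadratic_grad(x):
    """Gradient of 0.5 * (x0^2 + 10 * x1^2) - x0 - x1, minimized at (1, 0.1)."""
    return [x[0] - 1.0, 10.0 * x[1] - 1.0]

solution = nesterov_agd(quadratic_grad, [0.0, 0.0],
                        smoothness=10.0, strong_convexity=1.0)
```

The momentum coefficient depends on the square root of the condition number, which is the reason for the improved rate over plain gradient descent on ill-conditioned problems.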