亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The convergence of deterministic policy gradient under the Hadamard parametrization is studied in the tabular setting and the global linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this result, we further show that the algorithm has a faster local linear convergence rate after $k_0$ iterations, where $k_0$ is a constant that only depends on the MDP problem and the step size. Overall, the algorithm displays a linear convergence rate for all the iterations with a loose constant than that for the local linear convergence rate.

相關內容

When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.

While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.

In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both single-timescale and two-timescale cases. We show that our proposed framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD. Furthermore, when the objective function adopts a finite-sum formulation, we prove the convergence properties for these SGD-type methods based on our proposed framework. In particular, we prove that these SGD-type methods find the Clarke stationary points of the objective function with randomly chosen stepsizes and initial points under mild assumptions. Preliminary numerical experiments demonstrate the high efficiency of our analyzed SGD-type methods.

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies in settings with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s) and period(s). We show that the standard SC estimator is generally biased under random assignment. We propose a Modified Unbiased Synthetic Control (MUSC) estimator that guarantees unbiasedness under random assignment and derive its exact, randomization-based, finite-sample variance. We also propose an unbiased estimator for this variance. We document in settings with real data that under random assignment, SC-type estimators can have root mean-squared errors that are substantially lower than that of other common estimators. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. While our results only directly apply in settings where treatment is assigned randomly, we believe that they can complement model-based approaches even for observational studies.

For time-dependent PDEs, the numerical schemes can be rendered bound-preserving without losing conservation and accuracy, by a post processing procedure of solving a constrained minimization in each time step. Such a constrained optimization can be formulated as a nonsmooth convex minimization, which can be efficiently solved by first order optimization methods, if using the optimal algorithm parameters. By analyzing the asymptotic linear convergence rate of the generalized Douglas-Rachford splitting method, optimal algorithm parameters can be approximately expressed as a simple function of the number of out-of-bounds cells. We demonstrate the efficiency of this simple choice of algorithm parameters by applying such a limiter to cell averages of a discontinuous Galerkin scheme solving phase field equations for 3D demanding problems. Numerical tests on a sophisticated 3D Cahn-Hilliard-Navier-Stokes system indicate that the limiter is high order accurate, very efficient, and well-suited for large-scale simulations. For each time step, it takes at most $20$ iterations for the Douglas-Rachford splitting to enforce bounds and conservation up to the round-off error, for which the computational cost is at most $80N$ with $N$ being the total number of cells.

Decision trees are among the most popular machine learning models and are used routinely in applications ranging from revenue management and medicine to bioinformatics. In this paper, we consider the problem of learning optimal binary classification trees with univariate splits. Literature on the topic has burgeoned in recent years, motivated both by the empirical suboptimality of heuristic approaches and the tremendous improvements in mixed-integer optimization (MIO) technology. Yet, existing MIO-based approaches from the literature do not leverage the power of MIO to its full extent: they rely on weak formulations, resulting in slow convergence and large optimality gaps. To fill this gap in the literature, we propose an intuitive flow-based MIO formulation for learning optimal binary classification trees. Our formulation can accommodate side constraints to enable the design of interpretable and fair decision trees. Moreover, we show that our formulation has a stronger linear optimization relaxation than existing methods in the case of binary data. We exploit the decomposable structure of our formulation and max-flow/min-cut duality to derive a Benders' decomposition method to speed-up computation. We propose a tailored procedure for solving each decomposed subproblem that provably generates facets of the feasible set of the MIO as constraints to add to the main problem. We conduct extensive computational experiments on standard benchmark datasets on which we show that our proposed approaches are 29 times faster than state-of-the-art MIO-based techniques and improve out-of-sample performance by up to 8%.

This paper is concerned with the designing, analyzing and implementing linear and nonlinear discretization scheme for the distributed optimal control problem (OCP) with the Cahn-Hilliard (CH) equation as constrained. We propose three difference schemes to approximate and investigate the solution behaviour of the OCP for the CH equation. We present the convergence analysis of the proposed discretization. We verify our findings by presenting numerical experiments.

We investigate a fundamental vertex-deletion problem called (Induced) Subgraph Hitting: given a graph $G$ and a set $\mathcal{F}$ of forbidden graphs, the goal is to compute a minimum-sized set $S$ of vertices of $G$ such that $G-S$ does not contain any graph in $\mathcal{F}$ as an (induced) subgraph. This is a generic problem that encompasses many well-known problems that were extensively studied on their own, particularly (but not only) from the perspectives of both approximation and parameterization. In this paper, we study the approximability of the problem on a large variety of graph classes. Our first result is a linear-time $(1+\varepsilon)$-approximation reduction from (Induced) Subgraph Hitting on any graph class $\mathcal{G}$ of bounded expansion to the same problem on bounded degree graphs within $\mathcal{G}$. This directly yields linear-size $(1+\varepsilon)$-approximation lossy kernels for the problems on any bounded-expansion graph classes. Our second result is a linear-time approximation scheme for (Induced) Subgraph Hitting on any graph class $\mathcal{G}$ of polynomial expansion, based on the local-search framework of Har-Peled and Quanrud [SICOMP 2017]. This approximation scheme can be applied to a more general family of problems that aim to hit all subgraphs satisfying a certain property $\pi$ that is efficiently testable and has bounded diameter. Both of our results have applications to Subgraph Hitting (not induced) on wide classes of geometric intersection graphs, resulting in linear-size lossy kernels and (near-)linear time approximation schemes for the problem.

Selection of a group of representatives satisfying certain fairness constraints, is a commonly occurring scenario. Motivated by this, we initiate a systematic algorithmic study of a \emph{fair} version of \textsc{Hitting Set}. In the classical \textsc{Hitting Set} problem, the input is a universe $\mathcal{U}$, a family $\mathcal{F}$ of subsets of $\mathcal{U}$, and a non-negative integer $k$. The goal is to determine whether there exists a subset $S \subseteq \mathcal{U}$ of size $k$ that \emph{hits} (i.e., intersects) every set in $\mathcal{F}$. Inspired by several recent works, we formulate a fair version of this problem, as follows. The input additionally contains a family $\mathcal{B}$ of subsets of $\mathcal{U}$, where each subset in $\mathcal{B}$ can be thought of as the group of elements of the same \emph{type}. We want to find a set $S \subseteq \mathcal{U}$ of size $k$ that (i) hits all sets of $\mathcal{F}$, and (ii) does not contain \emph{too many} elements of each type. We call this problem \textsc{Fair Hitting Set}, and chart out its tractability boundary from both classical as well as multivariate perspective. Our results use a multitude of techniques from parameterized complexity including classical to advanced tools, such as, methods of representative sets for matroids, FO model checking, and a generalization of best known kernels for \textsc{Hitting Set}.

The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy to compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization. Generalization in deep learning is an extremely broad phenomenon, and therefore, it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue that the proposed approach is the most viable one on this basis.

北京阿比特科技有限公司