亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the effect of baselines in on-policy stochastic policy gradient optimization, and close the gap between the theory and practice of policy optimization methods. Our first contribution is to show that the \emph{state value} baseline allows on-policy stochastic \emph{natural} policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate, which was not previously known. The analysis relies on two novel findings: the expected progress of the NPG update satisfies a stochastic version of the non-uniform \L{}ojasiewicz (N\L{}) inequality, and with probability 1 the state value baseline prevents the optimal action's probability from vanishing, thus ensuring sufficient exploration. Importantly, these results provide a new understanding of the role of baselines in stochastic policy gradient: by showing that the variance of natural policy gradient estimates remains unbounded with or without a baseline, we find that variance reduction \emph{cannot} explain their utility in this setting. Instead, the analysis reveals that the primary effect of the value baseline is to \textbf{reduce the aggressiveness of the updates} rather than their variance. That is, we demonstrate that a finite variance is \emph{not necessary} for almost sure convergence of stochastic NPG, while controlling update aggressiveness is both necessary and sufficient. Additional experimental results verify these theoretical findings.

相關內容

Numerical predictions of quantities of interest measured within physical systems rely on the use of mathematical models that should be validated, or at best, not invalidated. Model validation usually involves the comparison of experimental data (outputs from the system of interest) and model predictions, both obtained at a specific validation scenario. The design of this validation experiment should be directly relevant to the objective of the model, that of predicting a quantity of interest at a prediction scenario. In this paper, we address two specific issues arising when designing validation experiments. The first issue consists in determining an appropriate validation scenario in cases where the prediction scenario cannot be carried out in a controlled environment. The second issue concerns the selection of observations when the quantity of interest cannot be readily observed. The proposed methodology involves the computation of influence matrices that characterize the response surface of given model functionals. Minimization of the distance between influence matrices allow one for selecting a validation experiment most representative of the prediction scenario. We illustrate our approach on two numerical examples. The first example considers the validation of a simple model based on an ordinary differential equation governing an object in free fall to put in evidence the importance of the choice of the validation experiment. The second numerical experiment focuses on the transport of a pollutant and demonstrates the impact that the choice of the quantity of interest has on the validation experiment to be performed.

We consider the problem of information aggregation in federated decision making, where a group of agents collaborate to infer the underlying state of nature without sharing their private data with the central processor or each other. We analyze the non-Bayesian social learning strategy in which agents incorporate their individual observations into their opinions (i.e., soft-decisions) with Bayes rule, and the central processor aggregates these opinions by arithmetic or geometric averaging. Building on our previous work, we establish that both pooling strategies result in asymptotic normality characterization of the system, which, for instance, can be utilized in order to give approximate expressions for the error probability. We verify the theoretical findings with simulations and compare both strategies.

We explore the features of two methods of stabilization, aggregation and supremizer methods, for reduced-order modeling of parametrized optimal control problems. In both methods, the reduced basis spaces are augmented to guarantee stability. For the aggregation method, the reduced basis approximation spaces for the state and adjoint variables are augmented in such a way that the spaces are identical. For the supremizer method, the reduced basis approximation space for the state-control product space is augmented with the solutions of a supremizer equation. We implement both of these methods for solving several parametrized control problems and assess their performance. Results indicate that the number of reduced basis vectors needed to approximate the solution space to some tolerance with the supremizer method is much larger, possibly double, that for aggregation. There are also some cases where the supremizer method fails to produce a converged solution. We present results to compare the accuracy, efficiency, and computational costs associated with both methods of stabilization which suggest that stabilization by aggregation is a superior stabilization method for control problems.

In a task where many similar inverse problems must be solved, evaluating costly simulations is impractical. Therefore, replacing the model $y$ with a surrogate model $y_s$ that can be evaluated quickly leads to a significant speedup. The approximation quality of the surrogate model depends strongly on the number, position, and accuracy of the sample points. With an additional finite computational budget, this leads to a problem of (computer) experimental design. In contrast to the selection of sample points, the trade-off between accuracy and effort has hardly been studied systematically. We therefore propose an adaptive algorithm to find an optimal design in terms of position and accuracy. Pursuing a sequential design by incrementally appending the computational budget leads to a convex and constrained optimization problem. As a surrogate, we construct a Gaussian process regression model. We measure the global approximation error in terms of its impact on the accuracy of the identified parameter and aim for a uniform absolute tolerance, assuming that $y_s$ is computed by finite element calculations. A priori error estimates and a coarse estimate of computational effort relate the expected improvement of the surrogate model error to computational effort, resulting in the most efficient combination of sample point and evaluation tolerance. We also allow for improving the accuracy of already existing sample points by continuing previously truncated finite element solution procedures.

The Erd\H{o}s-R\'enyi random graph is the simplest model for node degree distribution, and it is one of the most widely studied. In this model, pairs of $n$ vertices are selected and connected uniformly at random with probability $p$, consequently, the degrees for a given vertex follow the binomial distribution. If the number of vertices is large, the binomial can be approximated by Normal using the Central Limit Theorem, which is often allowed when $\min (np, n(1-p)) > 5$. This is true for every node independently. However, due to the fact that the degrees of nodes in a graph are not independent, we aim in this paper to test whether the degrees of per node collectively in the Erd\H{o}s-R\'enyi graph have a multivariate normal distribution MVN. A chi square goodness of fit test for the hypothesis that binomial is a distribution for the whole set of nodes is rejected because of the dependence between degrees. Before testing MVN we show that the covariance and correlation between the degrees of any pair of nodes in the graph are $p(1-p)$ and $1/(n-1)$, respectively. We test MVN considering two assumptions: independent and dependent degrees, and we obtain our results based on the percentages of rejected statistics of chi square, the $p$-values of Anderson Darling test, and a CDF comparison. We always achieve a good fit of multivariate normal distribution with large values of $n$ and $p$, and very poor fit when $n$ or $p$ are very small. The approximation seems valid when $np \geq 10$. We also compare the maximum likelihood estimate of $p$ in MVN distribution where we assume independence and dependence. The estimators are assessed using bias, variance and mean square error.

This paper combines modern numerical computation with theoretical results to improve our understanding of the growth factor problem for Gaussian elimination. On the computational side we obtain lower bounds for the maximum growth for complete pivoting for $n=1:75$ and $n=100$ using the Julia JuMP optimization package. At $n=100$ we obtain a growth factor bigger than $3n$. The numerical evidence suggests that the maximum growth factor is bigger than $n$ if and only if $n \ge 11$. We also present a number of theoretical results. We show that the maximum growth factor over matrices with entries restricted to a subset of the reals is nearly equal to the maximum growth factor over all real matrices. We also show that the growth factors under floating point arithmetic and exact arithmetic are nearly identical. Finally, through numerical search, and stability and extrapolation results, we provide improved lower bounds for the maximum growth factor. Specifically, we find that the largest growth factor is bigger than $1.0045n$, and the lim sup of the ratio with $n$ is greater than or equal to $3.317$. In contrast to the old conjecture that growth might never be bigger than $n$, it seems likely that the maximum growth divided by $n$ goes to infinity as $n \rightarrow \infty$.

Benchmarking plays a major role in the development and analysis of optimization algorithms. As such, the way in which the used benchmark problems are defined significantly affects the insights that can be gained from any given benchmark study. One way to easily extend the range of available benchmark functions is through affine combinations between pairs of functions. From the perspective of landscape analysis, these function combinations smoothly transition between the two base functions. In this work, we show how these affine function combinations can be used to analyze the behavior of optimization algorithms. In particular, we highlight that by varying the weighting between the combined problems, we can gain insights into the effects of added global structure on the performance of optimization algorithms. By analyzing performance trajectories on more function combinations, we also show that aspects such as the scaling of objective functions and placement of the optimum can greatly impact how these results are interpreted.

When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

Machine reading comprehension (MRC) aims to teach machines to read and comprehend human languages, which is a long-standing goal of natural language processing (NLP). With the burst of deep neural networks and the evolution of contextualized language models (CLMs), the research of MRC has experienced two significant breakthroughs. MRC and CLM, as a phenomenon, have a great impact on the NLP community. In this survey, we provide a comprehensive and comparative review on MRC covering overall research topics about 1) the origin and development of MRC and CLM, with a particular focus on the role of CLMs; 2) the impact of MRC and CLM to the NLP community; 3) the definition, datasets, and evaluation of MRC; 4) general MRC architecture and technical methods in the view of two-stage Encoder-Decoder solving architecture from the insights of the cognitive process of humans; 5) previous highlights, emerging topics, and our empirical analysis, among which we especially focus on what works in different periods of MRC researches. We propose a full-view categorization and new taxonomies on these topics. The primary views we have arrived at are that 1) MRC boosts the progress from language processing to understanding; 2) the rapid improvement of MRC systems greatly benefits from the development of CLMs; 3) the theme of MRC is gradually moving from shallow text matching to cognitive reasoning.

北京阿比特科技有限公司