亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also provide a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution.

相關內容

Stochastic gradient algorithms are widely used for both optimization and sampling in large-scale learning and inference problems. However, in practice, tuning these algorithms is typically done using heuristics and trial-and-error rather than rigorous, generalizable theory. To address this gap between theory and practice, we novel insights into the effect of tuning parameters by characterizing the large-sample behavior of iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size. In the optimization setting, our results show that iterate averaging with a large fixed step size can result in statistically efficient approximation of the (local) M-estimator. In the sampling context, our results show that with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein--von Mises limit of the posterior, adjustments to the posterior for model misspecification, or the asymptotic distribution of the MLE; and that with a naive tuning the limit corresponds to none of these. Moreover, we argue that an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. We validate our asymptotic results in realistic finite-sample regimes via several experiments using simulated and real data. Overall, we demonstrate that properly tuned stochastic gradient algorithms with constant step size offer a computationally efficient and statistically robust approach to obtaining point estimates or posterior-like samples.

The paper is devoted to an approach to solving a problem of the efficiency of parallel computing. The theoretical basis of this approach is the concept of a $Q$-determinant. Any numerical algorithm has a $Q$-determinant. The $Q$-determinant of the algorithm has clear structure and is convenient for implementation. The $Q$-determinant consists of $Q$-terms. Their number is equal to the number of output data items. Each $Q$-term describes all possible ways to compute one of the output data items based on the input data. We also describe a software $Q$-system for studying the parallelism resource of numerical algorithms. This system enables to compute and compare the parallelism resources of numerical algorithms. The application of the $Q$-system is shown on the example of numerical algorithms with different structures of $Q$-determinants. Furthermore, we suggest a method for designing of parallel programs for numerical algorithms. This method is based on a representation of a numerical algorithm in the form of a $Q$-determinant. As a result, we can obtain the program using the parallelism resource of the algorithm completely. Such programs are called $Q$-effective. The results of this research can be applied to increase the implementation efficiency of numerical algorithms, methods, as well as algorithmic problems on parallel computing systems.

The challenge of simulating random variables is a central problem in Statistics and Machine Learning. Given a tractable proposal distribution $P$, from which we can draw exact samples, and a target distribution $Q$ which is absolutely continuous with respect to $P$, the A* sampling algorithm allows simulating exact samples from $Q$, provided we can evaluate the Radon-Nikodym derivative of $Q$ with respect to $P$. Maddison et al. originally showed that for a target distribution $Q$ and proposal distribution $P$, the runtime of A* sampling is upper bounded by $\mathcal{O}(\exp(D_{\infty}[Q||P]))$ where $D_{\infty}[Q||P]$ is the Renyi divergence from $Q$ to $P$. This runtime can be prohibitively large for many cases of practical interest. Here, we show that with additional restrictive assumptions on $Q$ and $P$, we can achieve much faster runtimes. Specifically, we show that if $Q$ and $P$ are distributions on $\mathbb{R}$ and their Radon-Nikodym derivative is unimodal, the runtime of A* sampling is $\mathcal{O}(D_{\infty}[Q||P])$, which is exponentially faster than A* sampling without assumptions.

It is a well-known fact that there is no complete and discrete invariant on the collection of all multiparameter persistence modules. Nonetheless, many invariants have been proposed in the literature to study multiparameter persistence modules, though each invariant will lose some amount of information. One such invariant is the generalized rank invariant. This invariant is known to be complete on the class of interval decomposable persistence modules in general, under mild assumptions on the indexing poset $P$. There is often a trade-off, where the stronger an invariant is, the more expensive it is to compute in practice. The generalized rank invariant on its own is difficult to compute, whereas the standard rank invariant is readily computable through software implementations such as RIVET. We can interpolate between these two to induce new invariants via restricting the domain of the generalized rank invariant, and this family exhibits the aforementioned trade-off. This work studies the tension which exists between computational efficiency and retaining strength when restricting the domain of the generalized rank invariant. We provide a characterization result on where such restrictions are complete invariants in the setting where $P$ is finite, and furthermore show that such restricted generalized rank invariants are stable.

This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game, as long as the matrix game algorithms achieve low weighted regret at each state, with respect to weights determined by the speed of the value updates. Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an $\mathcal{\widetilde{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and a similar algorithm with slightly modified value update rule achieves a faster $\mathcal{\widetilde{O}}(T^{-1})$ convergence rate. These improve over the current best $\mathcal{\widetilde{O}}(T^{-1/2})$ rate of symmetric policy optimization type algorithms. We also extend this algorithm to multi-player general-sum Markov Games and show an $\mathcal{\widetilde{O}}(T^{-3/4})$ convergence rate to Coarse Correlated Equilibria (CCE). Finally, we provide a numerical example to verify our theory and investigate the importance of smooth value updates, and find that using "eager" value updates instead (equivalent to the independent natural policy gradient algorithm) may significantly slow down the convergence, even on a simple game with $H=2$ layers.

We present regret minimization algorithms for stochastic contextual MDPs under minimum reachability assumption, using an access to an offline least square regression oracle. We analyze three different settings: where the dynamics is known, where the dynamics is unknown but independent of the context and the most challenging setting where the dynamics is unknown and context-dependent. For the latter, our algorithm obtains $ \tilde{O}\left( \max\{H,{1}/{p_{min}}\}H|S|^{3/2}\sqrt{|A|T\log(\max\{|\mathcal{F}|,|\mathcal{P}|\}/\delta)} \right)$ regret bound, with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{F}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, our approach is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional knowledge regarding the function class, such as it being linear and etc.). In addition, we present a lower bound of $\Omega(\sqrt{T H |S| |A| \ln(|\mathcal{F}|/|S|)/\ln(|A|)})$, on the expected regret which holds even in the case of known dynamics.

In this paper, we study a sequential decision making problem faced by e-commerce carriers related to when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on a policy function approximation (PFA) and the second on a value function approximation (VFA). Both methods are combined with a look-ahead strategy, in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make a good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for partial characterization of optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better tradeoff between solution quality and computing time.

Shape-constrained density estimation is an important topic in mathematical statistics. We focus on densities on $\mathbb{R}^d$ that are log-concave, and we study geometric properties of the maximum likelihood estimator (MLE) for weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the optimal log-concave density is piecewise linear and supported on a regular subdivision of the samples. This defines a map from the space of weights to the set of regular subdivisions of the samples, i.e. the face poset of their secondary polytope. We prove that this map is surjective. In fact, every regular subdivision arises in the MLE for some set of weights with positive probability, but coarser subdivisions appear to be more likely to arise than finer ones. To quantify these results, we introduce a continuous version of the secondary polytope, whose dual we name the Samworth body. This article establishes a new link between geometric combinatorics and nonparametric statistics, and it suggests numerous open problems.

We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm \epsilon n$ using those of a randomly sampled $\tilde {O}\left (\frac{\log^3 n}{\epsilon^3}\right ) \times \tilde O\left (\frac{\log^3 n}{\epsilon^3}\right )$ principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\pm \epsilon \sqrt{\text{nnz}(\mathbf{A})}$ and $\pm \epsilon \|\mathbf A\|_F$ when the rows of $\mathbf A$ can be sampled with probabilities proportional to their sparsities or their squared $\ell_2$ norms respectively. Here $\text{nnz}(\mathbf{A})$ is the number of non-zero entries in $\mathbf{A}$ and $\|\mathbf A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $\mathbf A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.

The aim of this research is to identify an efficient model to describe the fluctuations around the trend of the soil temperatures monitored in the volcanic caldera of the Campi Flegrei area in Naples (Italy). The study focuses on the data concerning the temperatures in the mentioned area through a seven-year period. The research is initially finalized to identify the deterministic component of the model, given by the seasonal trend of the temperatures, which is obtained through an adapted regression method on the time series. Subsequently, the stochastic component from the time series is tested to represent a fractional Brownian motion (fBm). An estimation based on the periodogram of the data is used to estabilish that the data series follows a fBm motion, rather then a fractional Gaussian noise. An estimation of the Hurst exponent $H$ of the process is also obtained. Finally, an inference test based on the detrended moving average of the data is adopted in order to assess the hypothesis that the time series follows a suitably estimated fBm.

北京阿比特科技有限公司