
One of the main problems in the prediction theory of stationary processes $X(t)$ is to describe the asymptotic behavior of the best linear mean squared prediction error in predicting $X(0)$ given $X(t)$, $-n\le t\le-1$, as $n$ goes to infinity. This behavior depends on the regularity (deterministic or non-deterministic) of the process $X(t)$. In his seminal paper {\it 'Some purely deterministic processes'} (J. of Math. and Mech., {\bf 6}(6), 801-810, 1957), M. Rosenblatt showed that for a specific spectral density having a very high order of contact with zero, the prediction error behaves like a power of $n$ as $n\to\infty$. In the paper by Babayan et al., {\it 'Extensions of Rosenblatt's results on the asymptotic behavior of the prediction error for deterministic stationary sequences'} (J. Time Ser. Anal. {\bf 42}, 622-652, 2021), Rosenblatt's result was extended to the class of spectral densities of the form $f=f_d g$, where $f_d$ is the spectral density of a deterministic process that has a very high order of contact with zero, while $g$ is a function that can have polynomial-type singularities. In this paper, we describe new extensions of the above-quoted results to the case where the function $g$ can have {\it arbitrary power-type singularities}. Examples illustrate the obtained results.
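For orientation, the quantity under study is the finite-past prediction error; in standard notation (assumed here, not copied from the paper),
$$\sigma_n^2(f)=\min_{c_1,\ldots,c_n}\mathbb{E}\Bigl|X(0)-\sum_{j=1}^{n}c_j X(-j)\Bigr|^2,$$
where $f$ is the spectral density of $X(t)$. By Szeg\H{o}'s theorem, $\sigma_n^2(f)\to 0$ precisely when $\int\log f=-\infty$ (the deterministic case), and the results above describe the rate of this decay.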

Related content

We propose the homotopic policy mirror descent (HPMD) method for solving discounted, infinite-horizon MDPs with finite state and action spaces, and study its policy convergence. We report three properties that appear to be new in the policy gradient literature: (1) the policy first converges linearly, then superlinearly with order $\gamma^{-2}$, to the set of optimal policies, after $\mathcal{O}(\log(1/\Delta^*))$ iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function; (2) HPMD also exhibits last-iterate convergence, with the limiting policy corresponding exactly to the optimal policy of maximal entropy for every state; no regularization is added to the optimization objective, so this second observation arises solely as an algorithmic property of the homotopic policy gradient method; (3) for the stochastic HPMD method, we further demonstrate a better-than-$\mathcal{O}(|\mathcal{S}| |\mathcal{A}| / \epsilon^2)$ sample complexity for small optimality gap $\epsilon$, assuming a generative model for policy evaluation.
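To make the basic update concrete, here is a minimal sketch of a tabular policy mirror descent iteration with exact policy evaluation. The MDP instance, step-size schedule, and function names are illustrative assumptions; the homotopic perturbation schedule that distinguishes HPMD is not reproduced here.

```python
import numpy as np

# Illustrative tabular policy mirror descent (not the paper's HPMD schedule).
# P: (S, A, S) transition tensor, r: (S, A) rewards, gamma: discount factor.

def policy_q(P, r, gamma, pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi, then form Q."""
    S, A, _ = P.shape
    P_pi = np.einsum("sap,sa->sp", P, pi)            # state transitions under pi
    r_pi = np.einsum("sa,sa->s", r, pi)              # expected reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum("sap,p->sa", P, v)  # Q(s, a)

def pmd_step(pi, Q, eta):
    """Mirror descent in KL geometry: pi'(a|s) ~ pi(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(pi) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))           # random MDP instance
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                        # uniform initial policy
for k in range(200):
    pi = pmd_step(pi, policy_q(P, r, gamma, pi), eta=0.5 * (k + 1))
print(pi.round(3))  # near-deterministic policy after convergence
```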

Optimal linear prediction (aka kriging) of a random field $\{Z(x)\}_{x\in\mathcal{X}}$ indexed by a compact metric space $(\mathcal{X},d_{\mathcal{X}})$ can be obtained if the mean value function $m\colon\mathcal{X}\to\mathbb{R}$ and the covariance function $\varrho\colon\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ of $Z$ are known. We consider the problem of predicting the value of $Z(x^*)$ at some location $x^*\in\mathcal{X}$ based on observations at locations $\{x_j\}_{j=1}^n$ which accumulate at $x^*$ as $n\to\infty$ (or, more generally, predicting $\varphi(Z)$ based on $\{\varphi_j(Z)\}_{j=1}^n$ for linear functionals $\varphi,\varphi_1,\ldots,\varphi_n$). Our main result characterizes the asymptotic performance of linear predictors (as $n$ increases) based on an incorrect second order structure $(\tilde{m},\tilde{\varrho})$, without any restrictive assumptions on $\varrho,\tilde{\varrho}$ such as stationarity. For the first time, we provide necessary and sufficient conditions on $(\tilde{m},\tilde{\varrho})$ for asymptotic optimality of the corresponding linear predictor to hold uniformly with respect to $\varphi$. These general results are illustrated by weakly stationary random fields on $\mathcal{X}\subset\mathbb{R}^d$ with Mat\'ern or periodic covariance functions, and on the sphere $\mathcal{X}=\mathbb{S}^2$ for the case of two isotropic covariance functions.
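As a concrete reference point, a minimal simple-kriging sketch looks as follows; the Matérn-3/2 covariance, zero mean, and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def matern32(x, y, ell=0.3):
    """Matern-3/2 covariance on the real line with length scale ell."""
    d = np.abs(x[:, None] - y[None, :]) / ell
    return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

def kriging_predict(x_obs, z_obs, x_star, m, cov):
    """BLUP of Z(x*): m(x*) + k_*^T K^{-1} (z - m(x_obs))."""
    K = cov(x_obs, x_obs) + 1e-10 * np.eye(len(x_obs))  # jitter for stability
    k_star = cov(x_obs, np.atleast_1d(x_star))
    weights = np.linalg.solve(K, k_star).ravel()
    return m(x_star) + weights @ (z_obs - m(x_obs))

# Observation locations accumulating at x* = 0.5, as in the setting above.
x_star = 0.5
x_obs = x_star + 0.5 ** np.arange(1, 11)
rng = np.random.default_rng(1)
L = np.linalg.cholesky(matern32(x_obs, x_obs) + 1e-10 * np.eye(len(x_obs)))
z_obs = L @ rng.standard_normal(len(x_obs))             # zero-mean sample path values
print(kriging_predict(x_obs, z_obs, x_star, m=lambda x: 0.0 * x, cov=matern32))
```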

A theoretical, and potentially also practical, problem with stochastic gradient descent is that trajectories may escape to infinity. In this note, we investigate uniform boundedness properties of iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant. Under smoothness and $R$-dissipativity of the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine with (or without) restart step-sizes, result in uniformly bounded iterates and function values. Several important applications that satisfy these assumptions, including phase retrieval problems, Gaussian mixture models and some neural network classifiers, are discussed in detail.
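For reference, the two step-size families named above look as follows in code; the constants are illustrative assumptions.

```python
import math

def step_decay(t, eta0=0.1, drop=0.5, every=100):
    """Multiply the step size by `drop` every `every` iterations."""
    return eta0 * drop ** (t // every)

def cosine_with_restarts(t, eta0=0.1, period=100):
    """Cosine annealing from eta0 toward 0, restarting every `period` iterations."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * (t % period) / period))

print([round(step_decay(t), 4) for t in (0, 99, 100, 250)])
print([round(cosine_with_restarts(t), 4) for t in (0, 50, 99, 100)])
```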

Software measurement is an essential management tool for developing robust and maintainable software systems. Software metrics can be used to control the inherent complexity of software design. To guarantee that the components of the software are testable, the testability attribute is used, which is a sub-characteristic of the software's maintainability as well as of its quality assurance. This study investigates the relationship between static code and test metrics on the one hand, and testability and test case effectiveness on the other. The study answers three formulated research questions. The results of the analysis show that size and complexity metrics are suitable for predicting the testability of object-oriented classes.

A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data. But far away from them, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be asymptotically overconfident. This issue arises since the output variance of a BNN with finitely many features is quadratic in the distance from the data region. Meanwhile, Bayesian linear models with ReLU features converge, in the infinite-width limit, to a particular Gaussian process (GP) with a variance that grows cubically so that no asymptotic overconfidence can occur. While this may seem of mostly theoretical interest, in this work, we show that it can be used in practice to the benefit of BNNs. We extend finite ReLU BNNs with infinite ReLU features via the GP and show that the resulting model is asymptotically maximally uncertain far away from the data while the BNNs' predictive power is unaffected near the data. Although the resulting model approximates a full GP posterior, thanks to its structure, it can be applied \emph{post-hoc} to any pre-trained ReLU BNN at a low cost.
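One concrete handle on the infinite-width ReLU limit is the first-order arc-cosine kernel of Cho and Saul (2009), the GP covariance induced by infinitely many ReLU features; the sketch below computes it, though whether it matches the exact GP construction of the paper is an assumption on our part.

```python
import numpy as np

def arccos1_kernel(X, Y):
    """k(x,y) = (1/pi) * |x||y| * (sin t + (pi - t) cos t), t = angle(x, y)."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    cos_t = np.clip((X @ Y.T) / np.outer(nx, ny), -1.0, 1.0)
    t = np.arccos(cos_t)
    return np.outer(nx, ny) * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

X = np.array([[1.0, 0.0], [10.0, 0.0], [100.0, 0.0]])
print(np.diag(arccos1_kernel(X, X)))  # prior variance keeps growing with distance
```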

We study synchronous Q-learning with Polyak-Ruppert averaging (a.k.a. averaged Q-learning) in a $\gamma$-discounted MDP. We establish a functional central limit theorem (FCLT) for the averaged iterate $\bar{\boldsymbol{Q}}_T$ and show that its standardized partial-sum process converges weakly to a rescaled Brownian motion. Furthermore, we show that $\bar{\boldsymbol{Q}}_T$ is in fact a regular asymptotically linear (RAL) estimator for the optimal Q-value function $\boldsymbol{Q}^*$ with the most efficient influence function. This implies that the averaged Q-learning iterate has the smallest asymptotic variance among all RAL estimators. In addition, we present a non-asymptotic analysis of the $\ell_{\infty}$ error $\mathbb{E}\|\bar{\boldsymbol{Q}}_T-\boldsymbol{Q}^*\|_{\infty}$, showing that for polynomial step sizes it matches the instance-dependent lower bound as well as the optimal minimax complexity lower bound. In short, our theoretical analysis shows that averaged Q-learning is statistically efficient.
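A minimal sketch of the procedure being analyzed, synchronous Q-learning with a running Polyak-Ruppert average under a generative model, is given below; the MDP instance and the polynomial step-size exponent are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, gamma, T = 4, 2, 0.9, 5000
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over s'
r = rng.uniform(size=(S, A))

Q = np.zeros((S, A))
Q_bar = np.zeros((S, A))
for t in range(1, T + 1):
    # Synchronous update: one sampled next state for every (s, a) pair.
    s_next = np.array([[rng.choice(S, p=P[s, a]) for a in range(A)]
                       for s in range(S)])
    target = r + gamma * Q.max(axis=1)[s_next]   # empirical Bellman target
    eta = 1.0 / t ** 0.7                          # polynomial step size
    Q += eta * (target - Q)
    Q_bar += (Q - Q_bar) / t                      # running Polyak-Ruppert average
print(Q_bar.round(3))
```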

We present and investigate a new type of implicit fractional linear multistep method of order two for fractional initial value problems. The method is obtained from the second-order superconvergence of the Gr\"unwald-Letnikov approximation of the fractional derivative at a non-integer shift point. The proposed method is consistent of order two and coincides with the backward difference method of order two for classical initial value problems when the order of the derivative is one. The weight coefficients of the proposed method are obtained from the Gr\"unwald weights and are hence cheaper to compute than those of the fractional backward difference formula of order two. The stability properties are analyzed, and it is shown that the stability region of the method is larger than those of the fractional Adams-Moulton method of order two and the fractional trapezoidal method. Numerical results and illustrations are presented to justify the analytical theory.
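The Grünwald weights mentioned above are the binomial coefficients of $(1-z)^\alpha$ and satisfy a one-term recurrence, which is what makes them cheap to compute; a sketch follows (the paper's shifted second-order scheme additionally evaluates at a non-integer shift point, which is not reproduced here).

```python
def gl_weights(alpha, n):
    """Grunwald-Letnikov weights: w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k)."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

print([round(x, 4) for x in gl_weights(0.5, 5)])
# First-order GL approximation: D^alpha f(t) ~ h^{-alpha} * sum_k w_k f(t - k*h)
```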

We study orbit-finite systems of linear equations in the setting of sets with atoms. Our principal contribution is a decision procedure for the solvability of such systems. The procedure works over every field (and even every commutative ring) under mild effectiveness assumptions, and reduces a given orbit-finite system to a number of finite ones: exponentially many in general, but polynomially many when the atom dimension of the input systems is fixed. Towards obtaining the procedure we push further the theory of vector spaces generated by orbit-finite sets, and show that every such vector space admits an orbit-finite basis. This fundamental property is a key tool in our development, but should also be of wider interest.

We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-\alpha})$ for $\alpha > 0$, then the orthogonal greedy algorithm converges at the same rate on the variation space of $\mathbb{D}$. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. These results hold under no additional assumptions on the dictionary beyond the decay rate of the entropy of its convex hull. In addition, they are robust to noise in the target function and can be extended to convergence rates on the interpolation spaces of the variation norm. We show empirically that the predicted rates are obtained for the dictionary corresponding to shallow neural networks with Heaviside activation function in two dimensions. Finally, we show that these improved rates are sharp and prove a negative result showing that the iterates generated by the orthogonal greedy algorithm cannot in general be bounded in the variation norm of $\mathbb{D}$.
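For concreteness, here is a minimal sketch of the orthogonal greedy algorithm (a.k.a. orthogonal matching pursuit) over a finite dictionary; the Gaussian dictionary and sparse target are illustrative assumptions.

```python
import numpy as np

def oga(D, f, n_iter):
    """Greedily pick the column most correlated with the residual, then
    re-project f orthogonally onto the span of all selected columns."""
    selected = []
    approx = np.zeros_like(f)
    for _ in range(n_iter):
        residual = f - approx
        selected.append(int(np.argmax(np.abs(D.T @ residual))))
        B = D[:, selected]
        coef, *_ = np.linalg.lstsq(B, f, rcond=None)   # orthogonal projection
        approx = B @ coef
    return approx, selected

rng = np.random.default_rng(3)
D = rng.standard_normal((100, 50))
D /= np.linalg.norm(D, axis=0)                          # normalized dictionary
f = D[:, :3] @ np.array([1.0, -2.0, 0.5])               # sparse target
approx, idx = oga(D, f, n_iter=3)
print(idx, np.linalg.norm(f - approx))
```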

Click-through rate (CTR) prediction is one of the fundamental tasks for e-commerce search engines. As search becomes more personalized, it is necessary to capture the user interest from rich behavior data. Existing user behavior modeling algorithms develop different attention mechanisms to emphasize query-relevant behaviors and suppress irrelevant ones. Despite being extensively studied, these attentions still suffer from two limitations. First, conventional attentions mostly limit the attention field to a single user's behaviors, which is not suitable in e-commerce, where users often hunt for new demands that are irrelevant to any historical behaviors. Second, these attentions are usually biased towards frequent behaviors, which is unreasonable since high frequency does not necessarily indicate great importance. To tackle these two limitations, we propose a novel attention mechanism, termed Kalman Filtering Attention (KFAtt), that treats the weighted pooling in attention as a maximum a posteriori (MAP) estimation. By incorporating a prior, KFAtt resorts to global statistics when few user behaviors are relevant. Moreover, a frequency capping mechanism is incorporated to correct the bias towards frequent behaviors. Offline experiments on both a benchmark and a 10-billion-scale real production dataset, together with an online A/B test, show that KFAtt outperforms all compared state-of-the-art methods. KFAtt has been deployed in the ranking system of a leading e-commerce website, serving the main traffic of hundreds of millions of active users every day.
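To illustrate the pooling-as-MAP idea, the sketch below computes the posterior mean of a Gaussian quantity when a global prior is combined with attention-weighted per-behavior evidence; this mirrors the mechanism described above, but the exact KFAtt formulation in the paper may differ, and all names and constants here are hypothetical.

```python
import numpy as np

def map_pooling(values, attn, mu0, sigma0_sq, sigma_sq):
    """Posterior mean = precision-weighted average of prior and evidence,
    where each behavior i contributes with effective precision attn[i]/sigma^2."""
    precision = 1.0 / sigma0_sq + attn.sum() / sigma_sq
    evidence = mu0 / sigma0_sq + (attn[:, None] * values).sum(axis=0) / sigma_sq
    return evidence / precision

values = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])  # behavior embeddings
attn = np.array([0.9, 0.8, 0.01])                        # query relevance scores
print(map_pooling(values, attn, mu0=np.zeros(2), sigma0_sq=1.0, sigma_sq=0.5))
# With near-zero attention mass, the output falls back toward the prior mu0,
# i.e., the global statistics mentioned above.
```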
