亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper studies a new variant of the stochastic multi-armed bandits problem where auxiliary information about the arm rewards is available in the form of control variates. In many applications like queuing and wireless networks, the arm rewards are functions of some exogenous variables. The mean values of these variables are known a priori from historical data and can be used as control variates. Leveraging the theory of control variates, we obtain mean estimates with smaller variance and tighter confidence bounds. We develop an upper confidence bound based algorithm named UCB-CV and characterize the regret bounds in terms of the correlation between rewards and control variates when they follow a multivariate normal distribution. We also extend UCB-CV to other distributions using resampling methods like Jackknifing and Splitting. Experiments on synthetic problem instances validate performance guarantees of the proposed algorithms.

相關內容

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.

Escaping from saddle points and finding local minimum is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.

In this paper we get error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that considering a time discretization with a positive step size $h$ an error bound of size $h$ can be proved for the difference between the value function (viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, including also a spatial discretization based on elements of size $k$ an error bound of size $O(k/h)$ can be found in the literature for the error between the value functions of the continuous problem and the fully discrete problem. In this paper we revise the error bound of the fully discrete method and prove, under similar assumptions to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$ which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature in which the behaviour $1/h$ from the bound $O(k/h)$ have not been observed.

This paper focuses on stochastic saddle point problems with decision-dependent distributions. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two solution types is bounded provided that the objective has a strongly-convex-strongly-concave payoff and a Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration. Moreover, we show convergence to a neighborhood almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures that the objective is strongly-convex-strongly-concave. We tailor the convergence results for the primal-dual algorithms to this opposing mixture dominance setup.

Collision avoidance is a widely investigated topic in robotic applications. When applying collision avoidance techniques to a mobile robot, how to deal with the spatial structure of the robot still remains a challenge. In this paper, we design a configuration-aware safe control law by solving a Quadratic Programming (QP) with designed Control Barrier Functions (CBFs) constraints, which can safely navigate a mobile robotic arm to a desired region while avoiding collision with environmental obstacles. The advantage of our approach is that it correctly and in an elegant way incorporates the spatial structure of the mobile robotic arm. This is achieved by merging geometric restrictions among mobile robotic arm links into CBFs constraints. Simulations on a rigid rod and the modeled mobile robotic arm are performed to verify the feasibility and time-efficiency of proposed method. Numerical results about the time consuming for different degrees of freedom illustrate that our method scales well with dimension.

This paper studies the application of reconfigurable intelligent surface (RIS) to cooperative non-orthogonal multiple access (C-NOMA) networks with simultaneous wireless information and power transfer (SWIPT). We aim for maximizing the rate of the strong user with guaranteed weak user's quality of service (QoS) by jointly optimizing power splitting factors, beamforming coefficients, and RIS reflection coefficients in two transmission phases. The formulated problem is difficult to solve due to its complex and non-convex constraints. To tackle this challenging problem, we first use alternating optimization (AO) framework to transform it into three subproblems, and then use the penalty-based arithmetic-geometric mean approximation (PBAGM) algorithm and the successive convex approximation (SCA)-based method to solve them. Numerical results verify the superiority of the proposed algorithm over the baseline schemes.

In this study, we examine a clustering problem in which the covariates of each individual element in a dataset are associated with an uncertainty specific to that element. More specifically, we consider a clustering approach in which a pre-processing applying a non-linear transformation to the covariates is used to capture the hidden data structure. To this end, we approximate the sets representing the propagated uncertainty for the pre-processed features empirically. To exploit the empirical uncertainty sets, we propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over such sets, yielding more condensed clusters. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way. The GOC algorithm demonstrates an improved performance in finding sibling stars originating from the same dwarf galaxy. These realistic datasets have also been made publicly available.

We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems, which we call dynamical dimension reduction (DDR). In the DDR model, each point is evolved via a nonlinear flow towards a lower-dimensional subspace; the projection onto the subspace gives the low-dimensional embedding. Training the model involves identifying the nonlinear flow and the subspace. Following the equation discovery method, we represent the vector field that defines the flow using a linear combination of dictionary elements, where each element is a pre-specified linear/nonlinear candidate function. A regularization term for the average total kinetic energy is also introduced and motivated by optimal transport theory. We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method. We also show how the DDR method can be trained using a gradient-based optimization method, where the gradients are computed using the adjoint method from optimal control theory. The DDR method is implemented and compared on synthetic and example datasets to other dimension reductions methods, including PCA, t-SNE, and Umap.

Kernel smooth is the most fundamental technique for data density and regression estimation. However, time-consuming is the biggest obstacle for the application that the direct evaluation of kernel smooth for $N$ samples needs ${O}\left( {{N}^{2}} \right)$ operations. People have developed fast smooth algorithms using the idea of binning with FFT. Unfortunately, the accuracy is not controllable, and the implementation for multivariable and its bandwidth selection for the fast method is not available. Hence, we introduce a new MATLAB toolbox for fast multivariate kernel regression with the idea of non-uniform FFT (NUFFT), which implemented the algorithm for $M$ gridding points with ${O}\left( N+M\log M \right)$ complexity and accuracy controllability. The bandwidth selection problem utilizes the Fast Monte-Carlo algorithm to estimate the degree of freedom (DF), saving enormous cross-validation time even better when data share the same grid space for multiple regression. Up to now, this is the first toolbox for fast-binning high-dimensional kernel regression. Moreover, the estimation for local polynomial regression, the conditional variance for the heteroscedastic model, and the complex-valued datasets are also implemented in this toolbox. The performance is demonstrated with simulations and an application on the quantitive EEG.

Holonomic functions play an essential role in Computer Algebra since they allow the application of many symbolic algorithms. Among all algorithmic attempts to find formulas for power series, the holonomic property remains the most important requirement to be satisfied by the function under consideration. The targeted functions mainly summarize that of meromorphic functions. However, expressions like $\tan(z)$, $z/(\exp(z)-1)$, $\sec(z)$, etc., particularly, reciprocals, quotients and compositions of holonomic functions, are generally not holonomic. Therefore their power series are inaccessible by the holonomic framework. From the mathematical dictionaries, one can observe that most of the known closed-form formulas of non-holonomic power series involve another sequence whose evaluation depends on some finite summations. In the case of $\tan(z)$ and $\sec(z)$ the corresponding sequences are the Bernoulli and Euler numbers, respectively. Thus providing a symbolic approach that yields complete representations when linear summations for power series coefficients of non-holonomic functions appear, might be seen as a step forward towards the representation of non-holonomic power series. By adapting the method of ansatz with undetermined coefficients, we build an algorithm that computes least-order quadratic differential equations with polynomial coefficients for a large class of non-holonomic functions. A differential equation resulting from this procedure is converted into a recurrence equation by applying the Cauchy product formula and rewriting powers into polynomials and derivatives into shifts. Finally, using enough initial values we are able to give normal form representations to characterize several non-holonomic power series and prove non-trivial identities. We discuss this algorithm and its implementation for Maple 2022.

北京阿比特科技有限公司