
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
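For intuition, the holding-time estimation at the heart of this setting can be sketched in a few lines. The snippet below is our own illustration, not the paper's algorithm (all names and parameter values are invented): it simulates a continuous-time MDP under a fixed policy and records the empirical mean holding time of every state-action pair, the quantity the learner must estimate.

```python
import random

def simulate_ctmdp(rates, transitions, policy, horizon, rng):
    """Simulate a continuous-time MDP: after an action is taken, the process
    holds in its state for an Exp(rate) time before jumping to the next state.
    Returns the empirical mean holding time per (state, action) pair."""
    t, state = 0.0, 0
    visits, total_hold = {}, {}
    while t < horizon:
        a = policy(state)
        hold = rng.expovariate(rates[(state, a)])   # exponential holding time
        visits[(state, a)] = visits.get((state, a), 0) + 1
        total_hold[(state, a)] = total_hold.get((state, a), 0.0) + hold
        t += hold
        # jump according to the (here: known) transition distribution
        state = rng.choices(range(len(transitions[(state, a)])),
                            weights=transitions[(state, a)])[0]
    return {sa: total_hold[sa] / visits[sa] for sa in visits}
```

With holding rates 2.0 and 1.0, the estimates concentrate around the true means 0.5 and 1.0 as the horizon grows.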

Related content

Simulating physical problems that involve coupling across multiple time scales is challenging because the processes at the different scales must be solved simultaneously. In response to this challenge, this paper proposes an explicit multi-time-step algorithm coupled with a solid dynamic relaxation scheme. The explicit scheme simplifies the equation system in contrast to an implicit scheme, while the multi-time-step algorithm allows the equations of different physical processes to be solved with different time step sizes. Furthermore, an implicit viscous damping relaxation technique is applied to significantly reduce the number of iterations required to reach equilibrium in the comparatively fast solid response process. To validate the accuracy and efficiency of the proposed algorithm, two distinct scenarios, i.e., the stretching of a nonlinear hardening bar and fluid diffusion coupled with Nafion membrane flexure, are simulated. The results show good agreement with experimental data and with results from other numerical methods, and the simulation time is reduced, first, by addressing the different processes independently through the multi-time-step algorithm and, second, by shortening the solid dynamic relaxation time through the incorporated damping technique.
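The multi-time-step idea, distinct step sizes for the slow and fast processes with damping to accelerate the solid's relaxation, can be caricatured with two coupled scalar equations. This is a toy sketch of ours, not the paper's scheme; every name and parameter value is an assumption.

```python
def multi_time_step(n_slow, dt_slow, n_sub, c_damp):
    """Toy subcycling scheme (our illustration): a slow field u advances with
    the large step dt_slow, while a fast solid DOF x takes n_sub damped
    explicit substeps per slow step, relaxing toward its equilibrium x = u."""
    u, x, v = 0.0, 0.0, 0.0
    dt_fast = dt_slow / n_sub
    k, m = 50.0, 1.0                       # stiffness and mass of the fast DOF
    for _ in range(n_slow):
        u += dt_slow * (1.0 - u)           # slow process: relaxation toward 1
        for _ in range(n_sub):             # fast process: damped substeps
            a = (k * (u - x) - c_damp * v) / m
            v += dt_fast * a               # semi-implicit (symplectic) Euler
            x += dt_fast * v
    return u, x
```

The damping term `c_damp * v` plays the role of the viscous relaxation above: it kills the fast oscillation so the solid settles onto the slowly moving equilibrium within each coupling step.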

In this research work, we propose a high-order time-adapted scheme for pricing a coupled system of fixed-free boundary constant elasticity of variance (CEV) models on both equidistant and locally refined space grids. The performance of our method is substantially enhanced to resolve irregularities in the model, both inherent and induced. Furthermore, the system of coupled PDEs is strongly nonlinear and involves several time-dependent coefficients, including the first-order derivative of the early exercise boundary. These coefficients are approximated by a fourth-order analytical approximation derived using a regularized square-root function. The semi-discrete equations for the option value and the delta sensitivity are obtained from a non-uniform fourth-order compact finite difference scheme. The fifth-order Dormand-Prince 5(4) time integration method is used to solve the coupled system of discrete equations. Enhancing the performance of our proposed method with local mesh refinement and adaptive strategies enables us to obtain highly accurate solutions on very coarse space grids, hence reducing computational runtime substantially. We further verify the performance of our methodology against several well-known and better-performing existing methods.
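As one concrete ingredient, a locally refined space grid of the kind mentioned above can be built with a sinh stretching that clusters nodes near a point of interest (e.g., the strike). This is a generic construction supplied for illustration, not necessarily the refinement the authors use:

```python
import math

def sinh_refined_grid(n, s_max, s_star, alpha):
    """Non-uniform grid of n+1 nodes on [0, s_max], clustered around s_star
    via a sinh stretching; smaller alpha means stronger clustering."""
    lo = math.asinh(-s_star / alpha)
    hi = math.asinh((s_max - s_star) / alpha)
    return [s_star + alpha * math.sinh(lo + i * (hi - lo) / n)
            for i in range(n + 1)]
```

Mapping a uniform grid through sinh keeps the spacing near `s_star` of order `alpha * (hi - lo) / n` while letting it grow geometrically toward the boundaries, which is where a compact scheme tolerates coarser resolution.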

Digital memcomputing machines (DMMs) are a new class of computing machines that employ non-quantum dynamical systems with memory to solve combinatorial optimization problems. Here, we show that the time to solution (TTS) of DMMs follows an inverse Gaussian distribution, with the TTS self-averaging with increasing problem size, irrespective of the problem they solve. We provide both an analytical understanding of this phenomenon and numerical evidence by solving instances of the 3-SAT (satisfiability) problem. The self-averaging property of DMMs with problem size implies that they are increasingly insensitive to the detailed features of the instances they solve. This is in sharp contrast to traditional algorithms applied to the same problems, illustrating another advantage of this physics-based approach to computation.
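The claimed self-averaging can be checked numerically: for an inverse Gaussian law IG($\mu$, $\lambda$) the relative fluctuation std/mean equals $\sqrt{\mu/\lambda}$, so it shrinks as $\lambda$ grows with problem size. The sketch below is our illustration (not the paper's numerics), using the standard Michael-Schucany-Haas sampler:

```python
import math
import random

def sample_inverse_gaussian(mu, lam, rng):
    """Michael-Schucany-Haas sampler for the inverse Gaussian law IG(mu, lam)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + mu * mu * y / (2 * lam)
         - (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + mu * mu * y * y))
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

def relative_spread(mu, lam, n, rng):
    """Empirical std/mean of n samples; for IG this targets sqrt(mu/lam)."""
    s = [sample_inverse_gaussian(mu, lam, rng) for _ in range(n)]
    mean = sum(s) / n
    var = sum((v - mean) ** 2 for v in s) / n
    return math.sqrt(var) / mean
```

Increasing `lam` from 1 to 100 drops the relative spread from about 1 to about 0.1, mirroring how the TTS of a DMM becomes insensitive to instance details as problems grow.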

Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high-dimensional input data. We introduce a simple, generic, and generalisable framework in which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and its robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from detection by randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
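The geometric mechanism, namely the concentration of random directions away from the boundary normal in high dimension, can be demonstrated on a linear classifier. The following sketch is our illustration, not the paper's framework: a point at distance 0.1 from a hyperplane is flipped by a worst-case step of norm 0.2, while random steps of the same norm essentially never flip it when d = 1000.

```python
import math
import random

def perturbation_demo(d, margin, budget, n_random, rng):
    """Linear classifier sign(w . x) with w = e_1: a point at distance
    `margin` from the boundary is attacked with steps of norm `budget`.
    Returns (does the adversarial step flip?, how many random steps flip)."""
    x1 = margin                             # only the e_1 coordinate matters
    adversarial_flips = (x1 - budget) < 0   # worst case: step straight at the boundary
    flips = 0
    for _ in range(n_random):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        # a random step of norm `budget` moves along e_1 by budget * g[0]/norm,
        # which concentrates around 0 at scale budget / sqrt(d)
        if x1 + budget * g[0] / norm < 0:
            flips += 1
    return adversarial_flips, flips
```

This is exactly the "hidden susceptibility" of the abstract: random sampling at the attack's own norm sees a robust classifier, yet a one-line worst-case step crosses the margin.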

In this paper we establish limit theorems for power variations of stochastic processes controlled by fractional Brownian motions with Hurst parameter $H\leq 1/2$. We show that the power variations of such processes can be decomposed into the mix of several weighted random sums plus some remainder terms, and the convergences of power variations are dominated by different combinations of those weighted sums depending on whether $H<1/4$, $H=1/4$, or $H>1/4$. We show that when $H\geq 1/4$ the centered power variation converges stably at the rate $n^{-1/2}$, and when $H<1/4$ it converges in probability at the rate $n^{-2H}$. We determine the limit of the mixed weighted sum based on a rough path approach developed in \cite{LT20}.
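For the reader's reference, the $p$-th power variation in question is, in standard notation (this display is our addition, chosen to be consistent with the rates quoted above):

```latex
V_n^{(p)}(X)_t \;=\; \sum_{i=0}^{\lfloor nt \rfloor - 1}
  \bigl| X_{(i+1)/n} - X_{i/n} \bigr|^{p},
```

so the statement above reads: the centered quantity $n^{1/2}\bigl(V_n^{(p)}(X)_t - \text{compensator}\bigr)$ converges stably when $H \geq 1/4$, while for $H < 1/4$ the correct normalization is $n^{2H}$.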

A novel and fully distributed optimization method is proposed for the distributed robust convex program (DRCP) over a time-varying unbalanced directed network without imposing any differentiability assumptions. Firstly, a tractable approximated DRCP (ADRCP) is introduced by discretizing the semi-infinite constraints into a finite number of inequality constraints and restricting the right-hand side of the constraints with a proper positive parameter; this problem is iteratively solved by a random-fixed projection algorithm. Secondly, a cutting-surface consensus approach is proposed for locating an approximately optimal consensus solution of the DRCP with guaranteed feasibility. This approach is based on iteratively approximating the DRCP by successively reducing the restriction parameter of the right-hand constraints and adding the cutting-surfaces to the existing finite set of constraints. Thirdly, to ensure finite-time convergence of the distributed optimization, a distributed termination algorithm is developed based on uniformly local consensus and zeroth-order optimality under uniformly strongly connected graphs. Fourthly, it is proved that the cutting-surface consensus approach converges within a finite number of iterations. Finally, the effectiveness of the approach is illustrated through a numerical example.
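The cutting-surface idea, replacing a semi-infinite constraint by finitely many cuts and repeatedly adding the most violated one, can be shown on a one-dimensional toy problem. This is a generic cutting-plane sketch of ours, not the paper's distributed algorithm (no network, no restriction parameter, and no consensus here):

```python
import math

def cutting_surface_toy(u_grid, tol=1e-6, max_iter=100):
    """Minimize x subject to the semi-infinite constraint x >= sin(u) for all
    u in [0, pi]: keep a finite set of cuts and repeatedly add the most
    violated one until no constraint is violated by more than tol."""
    cuts = [0.0]                           # start from the single cut x >= 0
    x = 0.0
    for _ in range(max_iter):
        x = max(cuts)                      # trivial one-dimensional master problem
        worst = max(u_grid, key=lambda u: math.sin(u) - x)
        if math.sin(worst) - x <= tol:     # feasible: done
            return x
        cuts.append(math.sin(worst))       # add the violated cutting surface
    return x
```

The optimum here is x = 1, found after a single cut at u = pi/2; the distributed version above interleaves such cut generation with consensus across agents.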

A multi-step extended maximum residual Kaczmarz method is presented for solving large inconsistent linear systems of equations by means of a multi-step iteration technique. Theoretical analysis proves that the proposed method is convergent and gives an upper bound on its convergence rate. Numerical experiments show that the proposed method is effective and outperforms existing extended Kaczmarz methods in terms of the number of iteration steps and the computational cost.
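The maximum-residual selection rule can be sketched as follows. Caveats: this is a plain single-step maximum-residual Kaczmarz on a consistent system, a simplification of our own, not the multi-step extended method of the abstract (which additionally handles inconsistency).

```python
def max_residual_kaczmarz(A, b, iters):
    """Kaczmarz iteration with the maximum-residual selection rule: at each
    step, project the iterate onto the hyperplane of the row whose absolute
    residual is largest."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        res = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        i = max(range(m), key=lambda r: abs(res[r]))
        scale = res[i] / sum(a * a for a in A[i])          # orthogonal projection
        x = [x[j] + scale * A[i][j] for j in range(n)]
    return x
```

The extended variants add a second iteration that removes the component of b outside the range of A, which is what makes inconsistent systems tractable.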

Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function of interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method for Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find that the postprocessing method significantly reduces the number of sequential steps needed to find the global optimum, especially when the acquisition function is based on maximum a posteriori estimation. Our results provide a simple but general strategy to address the slow convergence of Bayesian optimization for high-dimensional problems.
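The postprocessing idea, restricting the acquisition maximisation to not-yet-sampled candidates, fits in a few lines. The sketch below is a generic skeleton we supply (names invented; any surrogate-based acquisition can be plugged in):

```python
def optimize_no_duplicates(candidates, objective, acquisition, n_steps):
    """Sequential model-based loop in which the acquisition function is
    maximised only over candidates that have NOT been sampled yet, so the
    dataset never contains duplicated points."""
    sampled, observed = [], []
    for _ in range(n_steps):
        pool = [c for c in candidates if c not in sampled]
        if not pool:                       # discrete domain exhausted
            break
        x = max(pool, key=lambda c: acquisition(c, sampled, observed))
        sampled.append(x)
        observed.append(objective(x))
    return sampled, observed
```

Filtering the pool before the `max` is the whole trick: without it, a sharply peaked (e.g. MAP-style) acquisition can keep returning the same maximiser and stall.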

We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
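The Poisson-clock interaction model can be sketched directly (our illustration; the function name is invented): inter-arrival times are i.i.d. exponential with mean $\varepsilon$, so $\varepsilon = 1$ mimics roughly one interaction per unit time while $\varepsilon \downarrow 0$ approaches continuous time.

```python
import random

def interaction_times(eps, horizon, rng):
    """Interaction epochs of a Poisson clock with frequency 1/eps over
    (0, horizon]: inter-arrival times are i.i.d. Exp(1/eps), so on average
    horizon / eps interactions occur before the horizon."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / eps)
        if t > horizon:
            return times
        times.append(t)
```

Over a horizon T the agent sees about T/eps interactions, which is why the regret separates into an approximation part scaling with eps and a statistical part scaling with sqrt(T).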

We consider the problem of minimizing the makespan on identical batch processing machines, subject to compatibility constraints, where two jobs are compatible if they can be processed simultaneously in the same batch. These constraints are modeled by an undirected graph $G$ in which compatible jobs are represented by adjacent vertices. We show that several subproblems are polynomially solvable and propose exact polynomial-time algorithms for them. To solve the general case, we propose a mixed-integer linear programming (MILP) formulation along with heuristic approaches. Furthermore, computational experiments are carried out to measure the performance of the proposed methods.
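A simple heuristic of the kind mentioned can be sketched as follows for a single machine (our illustration, not necessarily one of the paper's heuristics): order jobs by decreasing processing time and place each into the first batch whose jobs are all compatible with it and which has spare capacity; the makespan is the sum over batches of the longest contained job.

```python
def greedy_batches(times, compat, capacity):
    """First-fit-decreasing batching under a compatibility graph given as a
    set of unordered job pairs; returns (batches, makespan), where the
    makespan is the sum over batches of the longest job in each batch."""
    order = sorted(range(len(times)), key=lambda j: -times[j])
    batches = []
    for j in order:
        for batch in batches:
            if len(batch) < capacity and all(
                    (j, i) in compat or (i, j) in compat for i in batch):
                batch.append(j)
                break
        else:                              # no compatible batch with room
            batches.append([j])
    makespan = sum(max(times[j] for j in batch) for batch in batches)
    return batches, makespan
```

Placing long jobs first is what keeps each batch's cost dominated by a job that had to be expensive anyway; an MILP of the kind proposed above would instead optimize the assignment exactly.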
