
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
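For intuition, the holding-time estimation at the heart of this setting can be sketched in a few lines. The snippet below is our own illustration, not the paper's algorithm (all names and parameter values are invented): it simulates a continuous-time MDP under a fixed policy and records the empirical mean holding time of every state-action pair, the quantity the learner must estimate.

```python
import random

def simulate_ctmdp(rates, transitions, policy, horizon, rng):
    """Simulate a continuous-time MDP: after an action is taken, the process
    holds in its state for an Exp(rate) time before jumping to the next state.
    Returns the empirical mean holding time per (state, action) pair."""
    t, state = 0.0, 0
    visits, total_hold = {}, {}
    while t < horizon:
        a = policy(state)
        hold = rng.expovariate(rates[(state, a)])   # exponential holding time
        visits[(state, a)] = visits.get((state, a), 0) + 1
        total_hold[(state, a)] = total_hold.get((state, a), 0.0) + hold
        t += hold
        # jump according to the (here: known) transition distribution
        state = rng.choices(range(len(transitions[(state, a)])),
                            weights=transitions[(state, a)])[0]
    return {sa: total_hold[sa] / visits[sa] for sa in visits}
```

With holding rates 2.0 and 1.0, the estimates concentrate around the true means 0.5 and 1.0 as the horizon grows.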

Related content

Simulating physical problems that involve coupling across multiple time scales is challenging because the processes at the different scales must be solved simultaneously. In response to this challenge, this paper proposes an explicit multi-time-step algorithm coupled with a solid dynamic relaxation scheme. The explicit scheme simplifies the equation system in contrast to an implicit scheme, while the multi-time-step algorithm allows the equations of different physical processes to be solved with different time step sizes. Furthermore, an implicit viscous damping relaxation technique is applied to significantly reduce the number of iterations required to reach equilibrium in the comparatively fast solid response process. To validate the accuracy and efficiency of the proposed algorithm, two distinct scenarios, i.e., the stretching of a nonlinear hardening bar and fluid diffusion coupled with Nafion membrane flexure, are simulated. The results show good agreement with experimental data and with results from other numerical methods, and the simulation time is reduced, first, by addressing the different processes independently through the multi-time-step algorithm and, second, by shortening the solid dynamic relaxation time through the incorporated damping technique.
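The multi-time-step idea, distinct step sizes for the slow and fast processes with damping to accelerate the solid's relaxation, can be caricatured with two coupled scalar equations. This is a toy sketch of ours, not the paper's scheme; every name and parameter value is an assumption.

```python
def multi_time_step(n_slow, dt_slow, n_sub, c_damp):
    """Toy subcycling scheme (our illustration): a slow field u advances with
    the large step dt_slow, while a fast solid DOF x takes n_sub damped
    explicit substeps per slow step, relaxing toward its equilibrium x = u."""
    u, x, v = 0.0, 0.0, 0.0
    dt_fast = dt_slow / n_sub
    k, m = 50.0, 1.0                       # stiffness and mass of the fast DOF
    for _ in range(n_slow):
        u += dt_slow * (1.0 - u)           # slow process: relaxation toward 1
        for _ in range(n_sub):             # fast process: damped substeps
            a = (k * (u - x) - c_damp * v) / m
            v += dt_fast * a               # semi-implicit (symplectic) Euler
            x += dt_fast * v
    return u, x
```

The damping term `c_damp * v` plays the role of the viscous relaxation above: it kills the fast oscillation so the solid settles onto the slowly moving equilibrium within each coupling step.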

In this research work, we propose a high-order time-adapted scheme for pricing a coupled system of fixed-free boundary constant elasticity of variance (CEV) models on both equidistant and locally refined space grids. The performance of our method is substantially enhanced to resolve irregularities in the model, both inherent and induced. Furthermore, the system of coupled PDEs is strongly nonlinear and involves several time-dependent coefficients, including the first-order derivative of the early exercise boundary. These coefficients are approximated by a fourth-order analytical approximation derived using a regularized square-root function. The semi-discrete equations for the option value and the delta sensitivity are obtained from a non-uniform fourth-order compact finite difference scheme. The fifth-order Dormand-Prince 5(4) time integration method is used to solve the coupled system of discrete equations. Enhancing the performance of our proposed method with local mesh refinement and adaptive strategies enables us to obtain highly accurate solutions on very coarse space grids, hence reducing computational runtime substantially. We further verify the performance of our methodology against several well-known and better-performing existing methods.
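As one concrete ingredient, a locally refined space grid of the kind mentioned above can be built with a sinh stretching that clusters nodes near a point of interest (e.g., the strike). This is a generic construction supplied for illustration, not necessarily the refinement the authors use:

```python
import math

def sinh_refined_grid(n, s_max, s_star, alpha):
    """Non-uniform grid of n+1 nodes on [0, s_max], clustered around s_star
    via a sinh stretching; smaller alpha means stronger clustering."""
    lo = math.asinh(-s_star / alpha)
    hi = math.asinh((s_max - s_star) / alpha)
    return [s_star + alpha * math.sinh(lo + i * (hi - lo) / n)
            for i in range(n + 1)]
```

Mapping a uniform grid through sinh keeps the spacing near `s_star` of order `alpha * (hi - lo) / n` while letting it grow geometrically toward the boundaries, which is where a compact scheme tolerates coarser resolution.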

Digital memcomputing machines (DMMs) are a new class of computing machines that employ non-quantum dynamical systems with memory to solve combinatorial optimization problems. Here, we show that the time to solution (TTS) of DMMs follows an inverse Gaussian distribution, with the TTS self-averaging with increasing problem size, irrespective of the problem they solve. We provide both an analytical understanding of this phenomenon and numerical evidence by solving instances of the 3-SAT (satisfiability) problem. The self-averaging property of DMMs with problem size implies that they are increasingly insensitive to the detailed features of the instances they solve. This is in sharp contrast to traditional algorithms applied to the same problems, illustrating another advantage of this physics-based approach to computation.
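The claimed self-averaging can be checked numerically: for an inverse Gaussian law IG($\mu$, $\lambda$) the relative fluctuation std/mean equals $\sqrt{\mu/\lambda}$, so it shrinks as $\lambda$ grows with problem size. The sketch below is our illustration (not the paper's numerics), using the standard Michael-Schucany-Haas sampler:

```python
import math
import random

def sample_inverse_gaussian(mu, lam, rng):
    """Michael-Schucany-Haas sampler for the inverse Gaussian law IG(mu, lam)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + mu * mu * y / (2 * lam)
         - (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + mu * mu * y * y))
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

def relative_spread(mu, lam, n, rng):
    """Empirical std/mean of n samples; for IG this targets sqrt(mu/lam)."""
    s = [sample_inverse_gaussian(mu, lam, rng) for _ in range(n)]
    mean = sum(s) / n
    var = sum((v - mean) ** 2 for v in s) / n
    return math.sqrt(var) / mean
```

Increasing `lam` from 1 to 100 drops the relative spread from about 1 to about 0.1, mirroring how the TTS of a DMM becomes insensitive to instance details as problems grow.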

Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high-dimensional input data. We introduce a simple, generic, and generalisable framework in which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and its robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from detection by randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
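The geometric mechanism, namely the concentration of random directions away from the boundary normal in high dimension, can be demonstrated on a linear classifier. The following sketch is our illustration, not the paper's framework: a point at distance 0.1 from a hyperplane is flipped by a worst-case step of norm 0.2, while random steps of the same norm essentially never flip it when d = 1000.

```python
import math
import random

def perturbation_demo(d, margin, budget, n_random, rng):
    """Linear classifier sign(w . x) with w = e_1: a point at distance
    `margin` from the boundary is attacked with steps of norm `budget`.
    Returns (does the adversarial step flip?, how many random steps flip)."""
    x1 = margin                             # only the e_1 coordinate matters
    adversarial_flips = (x1 - budget) < 0   # worst case: step straight at the boundary
    flips = 0
    for _ in range(n_random):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        # a random step of norm `budget` moves along e_1 by budget * g[0]/norm,
        # which concentrates around 0 at scale budget / sqrt(d)
        if x1 + budget * g[0] / norm < 0:
            flips += 1
    return adversarial_flips, flips
```

This is exactly the "hidden susceptibility" of the abstract: random sampling at the attack's own norm sees a robust classifier, yet a one-line worst-case step crosses the margin.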

In this paper we establish limit theorems for power variations of stochastic processes controlled by fractional Brownian motions with Hurst parameter $H\leq 1/2$. We show that the power variations of such processes can be decomposed into the mix of several weighted random sums plus some remainder terms, and the convergences of power variations are dominated by different combinations of those weighted sums depending on whether $H<1/4$, $H=1/4$, or $H>1/4$. We show that when $H\geq 1/4$ the centered power variation converges stably at the rate $n^{-1/2}$, and when $H<1/4$ it converges in probability at the rate $n^{-2H}$. We determine the limit of the mixed weighted sum based on a rough path approach developed in \cite{LT20}.
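For the reader's reference, the $p$-th power variation in question is, in standard notation (this display is our addition, chosen to be consistent with the rates quoted above):

```latex
V_n^{(p)}(X)_t \;=\; \sum_{i=0}^{\lfloor nt \rfloor - 1}
  \bigl| X_{(i+1)/n} - X_{i/n} \bigr|^{p},
```

so the statement above reads: the centered quantity $n^{1/2}\bigl(V_n^{(p)}(X)_t - \text{compensator}\bigr)$ converges stably when $H \geq 1/4$, while for $H < 1/4$ the correct normalization is $n^{2H}$.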

A novel and fully distributed optimization method is proposed for the distributed robust convex program (DRCP) over a time-varying unbalanced directed network without imposing any differentiability assumptions. Firstly, a tractable approximated DRCP (ADRCP) is introduced by discretizing the semi-infinite constraints into a finite number of inequality constraints and restricting the right-hand side of the constraints with a proper positive parameter; this problem is iteratively solved by a random-fixed projection algorithm. Secondly, a cutting-surface consensus approach is proposed for locating an approximately optimal consensus solution of the DRCP with guaranteed feasibility. This approach is based on iteratively approximating the DRCP by successively reducing the restriction parameter of the right-hand constraints and adding the cutting-surfaces to the existing finite set of constraints. Thirdly, to ensure finite-time convergence of the distributed optimization, a distributed termination algorithm is developed based on uniformly local consensus and zeroth-order optimality under uniformly strongly connected graphs. Fourthly, it is proved that the cutting-surface consensus approach converges within a finite number of iterations. Finally, the effectiveness of the approach is illustrated through a numerical example.
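The cutting-surface idea, replacing a semi-infinite constraint by finitely many cuts and repeatedly adding the most violated one, can be shown on a one-dimensional toy problem. This is a generic cutting-plane sketch of ours, not the paper's distributed algorithm (no network, no restriction parameter, and no consensus here):

```python
import math

def cutting_surface_toy(u_grid, tol=1e-6, max_iter=100):
    """Minimize x subject to the semi-infinite constraint x >= sin(u) for all
    u in [0, pi]: keep a finite set of cuts and repeatedly add the most
    violated one until no constraint is violated by more than tol."""
    cuts = [0.0]                           # start from the single cut x >= 0
    x = 0.0
    for _ in range(max_iter):
        x = max(cuts)                      # trivial one-dimensional master problem
        worst = max(u_grid, key=lambda u: math.sin(u) - x)
        if math.sin(worst) - x <= tol:     # feasible: done
            return x
        cuts.append(math.sin(worst))       # add the violated cutting surface
    return x
```

The optimum here is x = 1, found after a single cut at u = pi/2; the distributed version above interleaves such cut generation with consensus across agents.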

A multi-step extended maximum residual Kaczmarz method is presented for solving large inconsistent linear systems of equations by means of a multi-step iteration technique. Theoretical analysis proves that the proposed method is convergent and gives an upper bound on its convergence rate. Numerical experiments show that the proposed method is effective and outperforms existing extended Kaczmarz methods in terms of the number of iteration steps and the computational cost.
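The maximum-residual selection rule can be sketched as follows. Caveats: this is a plain single-step maximum-residual Kaczmarz on a consistent system, a simplification of our own, not the multi-step extended method of the abstract (which additionally handles inconsistency).

```python
def max_residual_kaczmarz(A, b, iters):
    """Kaczmarz iteration with the maximum-residual selection rule: at each
    step, project the iterate onto the hyperplane of the row whose absolute
    residual is largest."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        res = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        i = max(range(m), key=lambda r: abs(res[r]))
        scale = res[i] / sum(a * a for a in A[i])          # orthogonal projection
        x = [x[j] + scale * A[i][j] for j in range(n)]
    return x
```

The extended variants add a second iteration that removes the component of b outside the range of A, which is what makes inconsistent systems tractable.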

Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function of interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method for Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find that the postprocessing method significantly reduces the number of sequential steps needed to find the global optimum, especially when the acquisition function is based on maximum a posteriori estimation. Our results provide a simple but general strategy to address the slow convergence of Bayesian optimization for high-dimensional problems.
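The postprocessing idea, restricting the acquisition maximisation to not-yet-sampled candidates, fits in a few lines. The sketch below is a generic skeleton we supply (names invented; any surrogate-based acquisition can be plugged in):

```python
def optimize_no_duplicates(candidates, objective, acquisition, n_steps):
    """Sequential model-based loop in which the acquisition function is
    maximised only over candidates that have NOT been sampled yet, so the
    dataset never contains duplicated points."""
    sampled, observed = [], []
    for _ in range(n_steps):
        pool = [c for c in candidates if c not in sampled]
        if not pool:                       # discrete domain exhausted
            break
        x = max(pool, key=lambda c: acquisition(c, sampled, observed))
        sampled.append(x)
        observed.append(objective(x))
    return sampled, observed
```

Filtering the pool before the `max` is the whole trick: without it, a sharply peaked (e.g. MAP-style) acquisition can keep returning the same maximiser and stall.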

We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
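The Poisson-clock interaction model can be sketched directly (our illustration; the function name is invented): inter-arrival times are i.i.d. exponential with mean $\varepsilon$, so $\varepsilon = 1$ mimics roughly one interaction per unit time while $\varepsilon \downarrow 0$ approaches continuous time.

```python
import random

def interaction_times(eps, horizon, rng):
    """Interaction epochs of a Poisson clock with frequency 1/eps over
    (0, horizon]: inter-arrival times are i.i.d. Exp(1/eps), so on average
    horizon / eps interactions occur before the horizon."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / eps)
        if t > horizon:
            return times
        times.append(t)
```

Over a horizon T the agent sees about T/eps interactions, which is why the regret separates into an approximation part scaling with eps and a statistical part scaling with sqrt(T).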

We consider the problem of minimizing the makespan on identical batch processing machines, subject to compatibility constraints, where two jobs are compatible if they can be processed simultaneously in the same batch. These constraints are modeled by an undirected graph $G$ in which compatible jobs are represented by adjacent vertices. We show that several subproblems are polynomially solvable and propose exact polynomial-time algorithms for them. To solve the general case, we propose a mixed-integer linear programming (MILP) formulation along with heuristic approaches. Furthermore, computational experiments are carried out to measure the performance of the proposed methods.
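A simple heuristic of the kind mentioned can be sketched as follows for a single machine (our illustration, not necessarily one of the paper's heuristics): order jobs by decreasing processing time and place each into the first batch whose jobs are all compatible with it and which has spare capacity; the makespan is the sum over batches of the longest contained job.

```python
def greedy_batches(times, compat, capacity):
    """First-fit-decreasing batching under a compatibility graph given as a
    set of unordered job pairs; returns (batches, makespan), where the
    makespan is the sum over batches of the longest job in each batch."""
    order = sorted(range(len(times)), key=lambda j: -times[j])
    batches = []
    for j in order:
        for batch in batches:
            if len(batch) < capacity and all(
                    (j, i) in compat or (i, j) in compat for i in batch):
                batch.append(j)
                break
        else:                              # no compatible batch with room
            batches.append([j])
    makespan = sum(max(times[j] for j in batch) for batch in batches)
    return batches, makespan
```

Placing long jobs first is what keeps each batch's cost dominated by a job that had to be expensive anyway; an MILP of the kind proposed above would instead optimize the assignment exactly.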
