This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of exponential-family bandits is considered. The state-of-the-art algorithms for exponential-family bandits face computational challenges. To mitigate these challenges, a novel framework is proposed that views the BAI problem as sequential hypothesis testing and is amenable to tractable analysis for exponential-family bandits. Based on this framework, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests. This algorithm has three features: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta$-PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, these approaches, which focus only on the Gaussian setting, require Thompson sampling from the arm deemed the best and a challenger arm. This paper analytically shows that identifying the challenger is computationally expensive and that the proposed algorithm circumvents this step. Finally, numerical experiments are provided to support the analysis.
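To make the sequential-testing viewpoint concrete, the sketch below shows a generic GLR/SPRT-style stopping rule for fixed-confidence BAI with unit-variance Gaussian arms. It is not the paper's algorithm: the round-robin sampling rule, the stopping threshold, and the Gaussian specialization are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's algorithm): a GLR/SPRT-style stopping rule
# for best-arm identification with unit-variance Gaussian arms.
import numpy as np

def glr_stat(mu_i, mu_j, n_i, n_j):
    """GLR statistic for testing H0: mu_i <= mu_j with unit-variance Gaussians."""
    if mu_i <= mu_j:
        return 0.0
    return 0.5 * (n_i * n_j / (n_i + n_j)) * (mu_i - mu_j) ** 2

def bai_fixed_confidence(pull, K, delta, max_rounds=100_000):
    counts = np.ones(K)                                  # one initial pull per arm
    means = np.array([pull(a) for a in range(K)], dtype=float)
    for t in range(K, max_rounds):
        best = int(np.argmax(means))
        threshold = np.log((1 + np.log(t)) / delta)      # heuristic threshold (assumption)
        stats = [glr_stat(means[best], means[j], counts[best], counts[j])
                 for j in range(K) if j != best]
        if min(stats) > threshold:
            return best                                  # empirical best separated from all rivals
        a = t % K                                        # placeholder sampling rule (round robin)
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return int(np.argmax(means))
```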
In this paper we consider linearly constrained optimization problems and propose a loopless projection stochastic approximation (LPSA) algorithm. It performs the projection with probability $p_n$ at the $n$-th iteration to ensure feasibility. Considering a specific family of projection probabilities $p_n$ and step sizes $\eta_n$, we analyze the algorithm from an asymptotic and continuous-time perspective. Using a novel jump diffusion approximation, we show that the trajectories connecting the properly rescaled last iterates converge weakly to the solutions of specific stochastic differential equations (SDEs). By analyzing these SDEs, we identify the asymptotic behaviors of LPSA for different choices of $(p_n, \eta_n)$. We find that the algorithm exhibits an intriguing asymptotic bias-variance trade-off and phase transition phenomena, depending on the relative magnitude of $p_n$ with respect to $\eta_n$. This finding provides insights for selecting appropriate $\{(p_n, \eta_n)\}_{n \geq 1}$ to minimize the projection cost. Additionally, we propose Debiased LPSA (DLPSA) as a practical application of our jump diffusion approximation result and show that it effectively reduces projection complexity compared to vanilla LPSA.
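The following is an illustrative loopless projected stochastic-gradient loop in the spirit of LPSA for constraints of the form $Ax = b$: take an unconstrained stochastic gradient step and project only with probability $p_n$. The polynomial schedules, the gradient oracle, and the final projection are assumptions, not the paper's exact specification.

```python
# Illustrative sketch of a loopless projection stochastic approximation loop;
# schedules and the stochastic gradient oracle are assumptions.
import numpy as np

def affine_project(x, A, b, AAt_inv):
    """Euclidean projection onto the affine set {x : Ax = b}."""
    return x - A.T @ (AAt_inv @ (A @ x - b))

def lpsa(stoch_grad, A, b, x0, n_iters=10_000, c_eta=0.1, c_p=1.0,
         alpha=1.0, beta=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    AAt_inv = np.linalg.inv(A @ A.T)
    x = affine_project(np.asarray(x0, dtype=float), A, b, AAt_inv)
    for n in range(1, n_iters + 1):
        eta_n = c_eta / n ** alpha           # step size eta_n ~ n^{-alpha}
        p_n = min(1.0, c_p / n ** beta)      # projection probability p_n ~ n^{-beta}
        x = x - eta_n * stoch_grad(x, rng)   # unconstrained stochastic gradient step
        if rng.random() < p_n:               # loopless: project only occasionally
            x = affine_project(x, A, b, AAt_inv)
    return affine_project(x, A, b, AAt_inv)  # final projection to guarantee feasibility
```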
We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge 1}} \right)$ on the minimax regret of any scheme, where $\mathtt{SNR} := \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the {\em upper confidence bound} (UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.
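To illustrate the channel model, the sketch below combines a fixed power-constrained encoder with a plain UCB learner on the decoded rewards. It does not reproduce the multi-phase encoding refinement of $\mathtt{UE\text{-}UCB++}$; the simple scaling encoder, the bounded-reward assumption, and the confidence-bonus constant are all assumptions.

```python
# A minimal sketch of the power-constrained noisy feedback channel plus a
# plain UCB learner on decoded rewards (not the paper's phased protocol).
import numpy as np

def channel(reward, P, sigma, rng):
    """Scale so the second moment is at most P (valid for rewards in [0, 1],
    an assumption), then add N(0, sigma^2) noise."""
    g = np.sqrt(P)
    return g * reward + sigma * rng.normal(), g

def ucb_on_corrupted_rewards(arms, T, P, sigma, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    K = len(arms)
    counts = np.zeros(K)
    means = np.zeros(K)                    # running means of decoded rewards
    noise_var = sigma ** 2 / P             # decoded-noise variance ~ 1 / SNR
    for t in range(T):
        if t < K:
            a = t                          # pull each arm once
        else:
            bonus = np.sqrt(2 * (1 + noise_var) * np.log(t + 1) / counts)
            a = int(np.argmax(means + bonus))
        y, g = channel(arms[a](rng), P, sigma, rng)
        counts[a] += 1
        means[a] += (y / g - means[a]) / counts[a]
    return means, counts
```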
Time series clustering is a central machine learning task with applications in many fields. While the majority of the methods focus on real-valued time series, very few works consider series with a discrete response. In this paper, the problem of clustering ordinal time series is addressed. To this end, two novel distances between ordinal time series are introduced and used to construct fuzzy clustering procedures. Both metrics are functions of the estimated cumulative probabilities, thus automatically taking advantage of the ordering inherent to the series' range. The resulting clustering algorithms are computationally efficient and able to group series generated from similar stochastic processes, reaching accurate results even though the series come from a wide variety of models. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, thus enabling the procedures to locate each series in several clusters with different membership degrees. An extensive simulation study shows that the proposed methods outperform several alternative procedures. Weighted versions of the clustering algorithms are also presented and their advantages with respect to the original methods are discussed. Two specific applications involving economic time series illustrate the usefulness of the proposed approaches.
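As a simple illustration of a distance built from estimated cumulative probabilities, the sketch below compares the marginal cumulative distributions of two ordinal series. The paper's two metrics are richer than this marginal version; the squared-difference form shown here is only an assumed example.

```python
# One simple instance of a cumulative-probability-based distance between
# ordinal series (an illustrative assumption, not the paper's exact metrics).
import numpy as np

def cum_probs(series, n_categories):
    """Estimated marginal cumulative probabilities F_hat(k) = P(X_t <= k)."""
    series = np.asarray(series)
    return np.array([(series <= k).mean() for k in range(n_categories - 1)])

def ordinal_distance(series_a, series_b, n_categories):
    fa = cum_probs(series_a, n_categories)
    fb = cum_probs(series_b, n_categories)
    return float(np.sum((fa - fb) ** 2))

# Example: two ordinal series with 5 states, coded 0..4.
rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=200)
y = np.clip(rng.integers(0, 5, size=200) + 1, 0, 4)
print(ordinal_distance(x, y, n_categories=5))
```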
We deal with the problem of optimal estimation of linear functionals constructed from unobserved values of a continuous-time stochastic process with periodically correlated increments, based on past observations of this process. To solve the problem, we construct a sequence of stochastic functions corresponding to the process, which forms an infinite-dimensional vector stationary increment sequence. In the case of a known spectral density of the stationary increment sequence, we obtain formulas for calculating the mean square errors and the spectral characteristics of the optimal estimates of the functionals. Formulas determining the least favorable spectral densities and the minimax (robust) spectral characteristics of the optimal linear estimates of the functionals are derived in the case where the sets of admissible spectral densities are given.
In many forecasting settings, there is a specific interest in predicting the sign of an outcome variable correctly in addition to its magnitude. For instance, when forecasting armed conflicts, positive and negative log-changes in monthly fatalities represent escalation and de-escalation, respectively, and have very different implications. In the ViEWS forecasting challenge, a prediction competition on state-based violence, a novel evaluation score called targeted absolute deviation with direction augmentation (TADDA) has therefore been suggested, which accounts for both the sign and the magnitude of log-changes. While it has a straightforward intuitive motivation, the empirical results of the challenge show that a no-change model always predicting a log-change of zero outperforms all submitted forecasting models under the TADDA score. We provide a statistical explanation for this phenomenon. Analyzing the properties of TADDA, we find that in order to achieve good scores, forecasters often have an incentive to predict no or only modest log-changes. In particular, there is often an incentive to report conservative point predictions considerably closer to zero than the forecaster's actual predictive median or mean. In an empirical application, we demonstrate that a no-change model can be improved upon by tailoring predictions to the particularities of the TADDA score. We conclude by outlining some alternative scoring concepts.
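The toy computation below uses a stylized sign-augmented absolute-error score to show why shrinking predictions toward zero can pay off under scores of this type. The simplified scoring formula, the tolerance value, and the example data are assumptions and do not reproduce the official TADDA definition.

```python
# A stylized sign-augmented absolute-error score in the spirit of TADDA
# (simplified form; an assumption, not the official competition definition).
import numpy as np

def sign_augmented_score(y_pred, y_true, eps=0.05):
    """Mean absolute error plus a penalty of |y_pred| whenever predicted and
    observed signs disagree and the observed change exceeds a tolerance eps."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    sign_mismatch = (np.sign(y_pred) != np.sign(y_true)) & (np.abs(y_true) > eps)
    return np.mean(np.abs(y_pred - y_true) + np.abs(y_pred) * sign_mismatch)

# Observed log-changes with mixed signs: forecasting the sample mean incurs
# direction penalties, so the no-change forecast scores better here.
y = np.array([-0.6, 0.1, 0.5, -0.2, 0.9])
print(sign_augmented_score(np.full_like(y, y.mean()), y))  # mean forecast
print(sign_augmented_score(np.zeros_like(y), y))           # no-change forecast
```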
This paper presents a local-energy-distribution-based hyperparameter determination method for stochastic simulated annealing (SSA). SSA can solve combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines the hyperparameters based on the local energy distributions of spins (probabilistic bits). A spin is the basic computing element of SSA and is connected to other spins in a graph through weights. The distribution of the local energy can be estimated using the central limit theorem (CLT). The resulting CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity of the hyperparameter search from O(n^3) for the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.
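A minimal sketch of the CLT idea is given below: under an assumed model of independent uniform $\pm 1$ neighbor spins, each spin's local field is approximately normal with mean given by its bias and variance given by the sum of squared weights. Deriving a per-spin noise amplitude from that standard deviation is one plausible O(1)-per-spin rule; the specific scaling factor is an assumption, not the paper's exact prescription.

```python
# Sketch: CLT-based estimate of each spin's local-field distribution, used to
# set a per-spin noise amplitude (the scaling choice below is an assumption).
import numpy as np

def local_field_stats(W, b=None):
    """Assuming independent uniform +/-1 neighbor spins, the local field
    h_i = sum_j W[i, j] s_j + b_i is approximately N(b_i, sum_j W[i, j]^2)."""
    W = np.asarray(W, float)
    b = np.zeros(W.shape[0]) if b is None else np.asarray(b, float)
    mean = b                               # E[s_j] = 0 under the +/-1 assumption
    std = np.sqrt((W ** 2).sum(axis=1))    # Var[h_i] = sum_j W[i, j]^2
    return mean, std

# Example: a small random symmetric coupling matrix with zero diagonal.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
mean, std = local_field_stats(W)
noise_amplitude = 2.0 * std                # the factor 2.0 is an assumed choice
print(noise_amplitude)
```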
This paper considers the problem of learning temporal task specifications, e.g., automata and temporal logic, from expert demonstrations. Task specifications are a class of sparse memory-augmented rewards with explicit support for temporal and Boolean composition. Three features make learning temporal task specifications difficult: (1) the (countably) infinite number of tasks under consideration; (2) a priori ignorance of what memory is needed to encode the task; and (3) the discrete solution space, typically addressed by (brute force) enumeration. To overcome these hurdles, we propose Demonstration Informed Specification Search (DISS): a family of algorithms requiring only black-box access to a maximum entropy planner and a task sampler from labeled examples. DISS works by alternating between conjecturing labeled examples that make the provided demonstrations less surprising and sampling tasks consistent with the conjectured labeled examples. We provide a concrete implementation of DISS for tasks described by Deterministic Finite Automata and show that DISS is able to efficiently identify tasks from only one or two expert demonstrations.
Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, our approach yields a latent model of better quality in general. Moreover, we present experiments with a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
A novel stochastic optimization method called MAC is proposed. The method is based on evaluating the objective function at several random points, from which an empirical expected value and an empirical covariance matrix are calculated. The empirical expected value is proven to converge to the optimum of the problem. The MAC algorithm was implemented in Matlab and tested on 20 test problems. Its performance was compared with those of the interior point method (Matlab's fmincon), simplex, pattern search (PS), simulated annealing (SA), particle swarm optimization (PSO), and genetic algorithm (GA) methods. The MAC method failed on two test functions and provided inaccurate results on four others. On the remaining 14 test functions, however, it provided accurate results and required much less CPU time than the widely used optimization methods.
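For intuition, the sketch below implements a generic sampling-based mean/covariance update loop of the kind described above: sample trial points, evaluate the objective, and refit an empirical mean and covariance. The elite-selection weighting used here is a cross-entropy-method-style choice and is an assumption; it is not MAC's exact update rule.

```python
# Generic sampling-based mean/covariance update (cross-entropy-method-style
# elite selection; MAC's precise weighting and safeguards are assumptions).
import numpy as np

def mac_like_minimize(f, x0, n_iters=100, n_samples=64, elite_frac=0.25, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(x0, float)
    cov = np.eye(mean.size)
    n_elite = max(2, int(elite_frac * n_samples))
    for _ in range(n_iters):
        pts = rng.multivariate_normal(mean, cov, size=n_samples)  # random trial points
        vals = np.apply_along_axis(f, 1, pts)                     # objective at each point
        elite = pts[np.argsort(vals)[:n_elite]]                   # keep the best points
        mean = elite.mean(axis=0)                                 # empirical expected value
        cov = np.cov(elite, rowvar=False) + 1e-8 * np.eye(mean.size)  # empirical covariance
    return mean

# Example on a simple quadratic objective; the minimizer is near (3, 3).
print(mac_like_minimize(lambda x: np.sum((x - 3.0) ** 2), x0=np.zeros(2)))
```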
Reduced Order Models (ROMs) are of considerable importance in many areas of engineering in which computational time presents difficulties. Established approaches employ projection-based reduction, such as Proper Orthogonal Decomposition; however, such methods can become inefficient or fail in the case of parametric or strongly nonlinear models. These limitations are usually tackled via a library of local reduction bases, each of which is valid for a given parameter vector. The success of such methods, however, relies strongly on the scheme used to relate the parameter vectors to the local bases, which is typically achieved with clustering or interpolation methods. We propose replacing these methods with a Variational Autoencoder (VAE) used as a generative model that can infer the local basis corresponding to a given parameter vector in a probabilistic manner. The resulting VAE-boosted parametric ROM, \emph{VpROM}, retains the physical insight of a projection-based method while allowing better treatment of problems where model dependencies or excitation traits cause the dynamic behavior to span multiple response regimes. Moreover, the probabilistic treatment of the VAE representation allows for uncertainty quantification on the reduction bases, which may then be propagated to the ROM response. The performance of the proposed approach is validated on an open-source simulation benchmark featuring hysteresis and multi-parametric dependencies, and on a large-scale wind turbine tower characterised by nonlinear material behavior and model uncertainty.