We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as a quickest intervention problem, with the mutation modeled by a change-point; this generalizes quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while keeping the intervention cost low through optimally chosen intervention actions. The model and the proposed algorithms can be applied to pandemic prevention (such as COVID-19) and misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid-approximation approach to compute the optimal intervention policy, whose computational complexity can be very high when the number of grid points is large. To reduce this complexity, we further propose a low-complexity threshold-based policy derived from a first-order approximation of the value functions in the ``local intervention'' regime. Simulation results show that the low-complexity algorithm performs similarly to the grid approximation, and that both substantially outperform QCD-based algorithms.
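As a concrete illustration of the belief-state recursion that drives a threshold policy, the following sketch uses a discrete-time Shiryaev-type update with a geometric change-point prior and Gaussian pre-/post-change observation densities; the densities, the hazard rate `rho`, and the threshold are illustrative assumptions, not the paper's diffusion setting.

```python
import numpy as np
from scipy.stats import norm

def belief_update(p, x, rho, f0, f1):
    """One Shiryaev-type posterior update for the change-point belief.

    p    : current belief that the mutation has already occurred
    x    : new observation
    rho  : geometric prior hazard rate of the change-point
    f0/f1: pre-/post-change observation densities
    """
    prior = p + (1.0 - p) * rho            # belief after the prior transition
    num = prior * f1(x)
    den = num + (1.0 - prior) * f0(x)
    return num / den

def threshold_policy(xs, rho=0.01, thresh=0.9):
    """Intervene the first time the belief crosses a fixed threshold."""
    f0 = lambda x: norm.pdf(x, 0.0, 1.0)   # pre-change dynamics (assumed)
    f1 = lambda x: norm.pdf(x, 1.0, 1.0)   # post-change dynamics (assumed)
    p = 0.0
    for t, x in enumerate(xs):
        p = belief_update(p, x, rho, f0, f1)
        if p >= thresh:
            return t                       # first intervention time
    return None
```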
We propose a data-driven portfolio selection model that integrates side information, conditional estimation, and robustness within the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution within an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation problem with side information can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to a second-order cone or semidefinite program. Empirical studies in the US equity market demonstrate the advantage of our integrative framework over other benchmarks.
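Schematically, and with illustrative notation not taken from the paper, the conditional distributionally robust allocation problem has the form
\[
\min_{w \in \mathcal{W}} \;
\sup_{\mathbb{Q} \,\in\, \mathcal{B}_{\varepsilon}(\widehat{\mathbb{P}})}
\; \rho_{\mathbb{Q}}\!\left( w^{\top} R \;\middle|\; X = x \right),
\]
where $\widehat{\mathbb{P}}$ is the empirical joint distribution of the covariates $X$ and asset returns $R$, $\mathcal{B}_{\varepsilon}(\widehat{\mathbb{P}})$ is the optimal transport ball of radius $\varepsilon$ around it, and $\rho$ is the mean-variance or mean-CVaR risk-return criterion evaluated under the perturbed conditional distribution.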
Continual learning is a paradigm that learns tasks sequentially under resource constraints. Its key challenge is the stability-plasticity dilemma: it is difficult to simultaneously achieve the stability that prevents catastrophic forgetting of old tasks and the plasticity needed to learn new tasks well. In this paper, we propose a new continual learning approach, Advanced Null Space (AdNS), which balances stability and plasticity without storing any data from previous tasks. Specifically, to obtain better stability, AdNS uses low-rank approximation to construct a novel null space and projects the gradient onto this null space to prevent interference with past tasks. To control the generation of the null space, we introduce a non-uniform constraint strength that further reduces forgetting. Furthermore, we present a simple but effective method, intra-task distillation, to improve performance on the current task. Finally, we provide a theoretical analysis of the roles the null space plays in plasticity and stability. Experimental results show that the proposed method achieves better performance than state-of-the-art continual learning approaches.
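A minimal sketch of the gradient-projection mechanism (the general idea, not AdNS itself): collect layer inputs from past tasks, take a low-rank cut of their covariance via SVD, and project new-task gradients onto the orthogonal complement. The `rank_ratio` knob is illustrative and plays the role of a constraint strength.

```python
import numpy as np

def null_space_projector(features, rank_ratio=0.95):
    """Build a projector onto an (approximate) null space of past-task features.

    features   : (num_samples, dim) matrix of layer inputs from previous tasks
    rank_ratio : fraction of spectral energy treated as the 'occupied' subspace
    """
    cov = features.T @ features / len(features)
    U, S, _ = np.linalg.svd(cov)
    energy = np.cumsum(S) / np.sum(S)
    k = int(np.searchsorted(energy, rank_ratio)) + 1  # low-rank cut-off
    U_occ = U[:, :k]                                  # directions used by old tasks
    return np.eye(cov.shape[0]) - U_occ @ U_occ.T     # projector onto the residual

# Projecting the new task's gradient, G @ P, leaves outputs on old-task
# inputs unchanged to first order, which is what curbs forgetting.
```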
Biopharmaceutical manufacturing is a rapidly growing industry with impact on virtually all branches of medicine. Biomanufacturing processes require close monitoring and control, in the presence of complex bioprocess dynamics with many interdependent factors and extremely limited data, owing to the high cost of experiments and the novelty of personalized bio-drugs. We develop a novel model-based reinforcement learning framework that can achieve human-level control in such low-data environments. The model uses a dynamic Bayesian network to capture causal interdependencies between factors and to predict how the effects of different inputs propagate through the pathways of the bioprocess mechanisms. This enables the design of process control policies that are both interpretable and robust against model risk. We present a computationally efficient, provably convergent stochastic gradient method for optimizing such policies. Validation is conducted on a realistic application with a multi-dimensional, continuous state variable.
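To make the policy-search setup concrete, here is a toy sketch (all coefficients hypothetical): a linear-Gaussian network whose sparsity mask encodes the causal pathways, a linear feedback policy, and a simple stochastic-gradient search via symmetric finite differences. The paper's provably convergent method is more sophisticated; this only illustrates the model-based rollout-and-update loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian dynamic Bayesian network: each state node depends only
# on its parents (MASK) and on the control input; edge weights are assumed.
MASK = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]])
W = 0.3 * MASK                  # causal edge weights (hypothetical)
B = np.array([0.5, 0.0, 0.2])   # effect of the scalar control on each node

def rollout(theta, T=20, sigma=0.05):
    """Simulate one trajectory under a linear feedback policy u = theta . x."""
    x, reward = np.ones(3), 0.0
    for _ in range(T):
        u = float(theta @ x)
        x = W @ x + B * u + sigma * rng.standard_normal(3)
        reward -= x @ x + 0.01 * u**2       # quadratic stage cost
    return reward

def sgd_policy_search(iters=200, eps=1e-2, lr=1e-3):
    """Stochastic-gradient policy search via symmetric finite differences."""
    theta = np.zeros(3)
    for _ in range(iters):
        d = rng.standard_normal(3)          # random search direction
        g = (rollout(theta + eps * d) - rollout(theta - eps * d)) / (2 * eps) * d
        theta += lr * g
    return theta
```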
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integral of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of actor-critic algorithms for RL, in which value functions and policies are learned and updated simultaneously and alternately. The first type is based directly on the aforementioned representation, which involves future trajectories and is hence offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions, which are then enforced via stochastic approximation when updating policies. Finally, we demonstrate the algorithms through simulations in two concrete examples.
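Schematically, and up to technical conditions (the notation here is ours, not the paper's), the representation expresses the policy gradient as an expected integral against the score function of the policy,
\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}\!\left[\int_0^T
\frac{\partial \log \pi_\theta(a_t \mid t, X_t)}{\partial \theta}
\Big(\mathrm{d}\widehat{V}(t, X_t) + r_t\,\mathrm{d}t
- \lambda \log \pi_\theta(a_t \mid t, X_t)\,\mathrm{d}t\Big)\right],
\]
where $\widehat{V}$ is the current value estimate and $\lambda$ the entropy-regularization temperature. The integrand is the "auxiliary running reward" referred to above: it depends only on samples and the current value function, which is what turns PG into a PE problem.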
We develop a new approach to drifting games, a class of two-person games with many applications to boosting and online learning, including Prediction with Expert Advice and the Hedge game. Our approach involves (a) guessing an asymptotically optimal potential by solving an associated partial differential equation (PDE); then (b) justifying the guess by proving upper and lower bounds on the final-time loss whose difference scales like a negative power of the number of time steps. The proofs of our potential-based upper bounds are elementary, using little more than Taylor expansion. The proofs of our potential-based lower bounds are also rather elementary, combining Taylor expansion with probabilistic or combinatorial arguments. Most previous work on asymptotically optimal strategies has used potentials obtained by solving a discrete dynamic programming principle, and the arguments are complicated by their discrete nature. Our approach is facilitated by the fact that the potentials we use are explicit solutions of PDEs, so the arguments rest on basic calculus. Not only is our approach more elementary, but we also give new potentials and derive corresponding upper and lower bounds that match each other in the asymptotic regime.
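For orientation, a familiar example of a potential in this setting (the classical one behind exponential weights for Hedge, not one of the new PDE-based potentials constructed here) is
\[
\Phi_\eta(t, x) \;=\; \frac{1}{\eta} \log\!\Big( \sum_{i=1}^{N} e^{\eta x_i} \Big),
\]
where $x_i$ tracks the regret to expert $i$ and $\eta$ is a learning rate. Potentials of this explicit, smooth form are what make Taylor-expansion arguments feasible.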
The estimation of absorption time distributions of Markov jump processes is an important task in various branches of statistics and applied probability. While the time-homogeneous case is classic, the time-inhomogeneous case has recently received increased attention due to its added flexibility and advances in computational power. However, existing approaches assume commuting sub-intensity matrices, which in various cases limits the parsimony of the resulting representation. This paper develops the theory required to solve the general case through maximum likelihood estimation, in particular using the expectation-maximization algorithm. A reduction to a piecewise constant intensity matrix function is proposed in order to provide succinct representations, with a parametric linear model binding the intensities together. Practical aspects are discussed and illustrated through the estimation of notoriously demanding theoretical distributions as well as real data, from the perspective of matrix analytic methods.
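For reference, in the time-homogeneous case the absorption (phase-type) density has the classical closed form $f(t) = \boldsymbol{\alpha} e^{\boldsymbol{T} t} \boldsymbol{s}$ with exit rate vector $\boldsymbol{s} = -\boldsymbol{T}\boldsymbol{e}$, and under a piecewise constant reduction the product integral factorizes into ordinary matrix exponentials. A minimal sketch (function names are ours):

```python
import numpy as np
from scipy.linalg import expm

def phase_type_density(t, alpha, T):
    """Density of the absorption time, f(t) = alpha @ expm(T t) @ s.

    alpha : initial distribution over the transient states
    T     : sub-intensity matrix (transient-to-transient rates)
    """
    s = -T @ np.ones(T.shape[0])          # exit rates s = -T 1
    return float(alpha @ expm(T * t) @ s)

def inhomogeneous_survival(t, alpha, pieces):
    """Survival function under a piecewise constant intensity function.

    pieces : list of (duration, sub-intensity matrix) covering [0, t]
    """
    P, remaining = np.eye(len(alpha)), t
    for dt, T in pieces:
        step = min(dt, remaining)
        P = P @ expm(T * step)            # product integral factorizes
        remaining -= step
        if remaining <= 0:
            break
    return float(alpha @ P @ np.ones(len(alpha)))
```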
Learning controllers from data for stabilizing dynamical systems typically follows a two-step process of first identifying a model and then constructing a controller based on the identified model. However, learning a model means identifying a generic description of the system dynamics, which can require large amounts of data and extracts information that is unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model.
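This paper's construction is different, but for a sense of how a stabilizing gain can be computed directly from data without identifying $(A, B)$, here is a sketch in the style of the linear-matrix-inequality formula of De Persis and Tesi (2020); feasibility requires sufficiently rich data, and the variable names are ours.

```python
import cvxpy as cp
import numpy as np

def stabilizing_gain(X0, X1, U0, eps=1e-6):
    """Data-driven stabilizing feedback (De Persis & Tesi-style sketch).

    X0, X1 : n x T matrices of consecutive states, with X1 = A X0 + B U0
    U0     : m x T matrix of applied inputs
    Returns K such that A + B K is Schur-stable, if the SDP is feasible.
    """
    n, T = X0.shape
    Q = cp.Variable((T, n))
    P = cp.Variable((n, n), symmetric=True)      # Lyapunov certificate P = X0 Q
    lmi = cp.bmat([[P, X1 @ Q],
                   [(X1 @ Q).T, P]])             # encodes (A+BK) P (A+BK)^T < P
    prob = cp.Problem(cp.Minimize(0),
                      [X0 @ Q == P, lmi >> eps * np.eye(2 * n)])
    prob.solve()
    return U0 @ Q.value @ np.linalg.inv(X0 @ Q.value)
```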
In this paper we propose a new time-varying econometric model, called the Time-Varying Poisson AutoRegressive model with eXogenous covariates (TV-PARX), suited to model and forecast time series of counts. We show that the score-driven framework is particularly suitable for recovering the evolution of time-varying parameters and provides the flexibility required to model and forecast count time series characterized by complex nonlinear dynamics and structural breaks. We study the asymptotic properties of the TV-PARX model and prove that, under mild conditions, maximum likelihood estimation (MLE) yields strongly consistent and asymptotically normal parameter estimates. Finite-sample performance and forecasting accuracy are evaluated through Monte Carlo simulations. The empirical usefulness of the time-varying specification of the proposed TV-PARX model is shown by analyzing the number of new daily COVID-19 infections in Italy and the number of corporate defaults in the US.
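Stripped of exogenous covariates and of the paper's full specification, the score-driven mechanism amounts to the following filter (parameter names and the identity score scaling are illustrative):

```python
import numpy as np

def poisson_gas_filter(y, omega, alpha, beta):
    """Score-driven (GAS-type) filter for a Poisson intensity with log link.

    lambda_t = exp(f_t),  f_{t+1} = omega + beta * f_t + alpha * s_t,
    where s_t = y_t - lambda_t is the score of the Poisson log-likelihood
    with respect to f_t (identity scaling; other scalings are possible).
    """
    f = np.empty(len(y) + 1)
    f[0] = omega / (1.0 - beta)            # start at the unconditional level
    for t, yt in enumerate(y):
        lam = np.exp(f[t])
        f[t + 1] = omega + beta * f[t] + alpha * (yt - lam)
    return np.exp(f[:-1])                  # fitted intensities
```

In the full model, the parameters driving this recursion would themselves be estimated by MLE, together with coefficients on the exogenous covariates.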
Several cryptographic protocols built on less well-known algorithmic problems, such as those in non-commutative groups, group rings, and semigroups, that claim quantum security have been broken through classical reduction methods within their proposed platforms. A rigorous examination of the complexity of these algorithmic problems is therefore an important topic of research. In this paper, we present a cryptanalysis of a public key exchange system based on a decomposition-type problem in the so-called twisted group algebras of the dihedral group $D_{2n}$ over a finite field $\fq$. Our analysis relies on an algebraic reduction of the original problem to a set of equations over $\fq$ involving circulant matrices, and a subsequent solution of these equations. Our attack runs in polynomial time and succeeds with probability at least $90$ percent for the parameter values provided by the authors. We also show that the underlying algorithmic problem, while based on a non-commutative structure, may be formulated as a commutative semigroup action problem.
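To illustrate the structure the reduction exploits (this is not the attack itself): circulant matrices over $\fq$ form a commutative ring isomorphic to $\fq[x]/(x^n - 1)$, so products of circulants reduce to cyclic convolutions of their first rows.

```python
import numpy as np

def circulant_mul(a, b, q):
    """Multiply two n x n circulant matrices over F_q via their first rows.

    A circulant matrix is determined by its first row and corresponds to a
    polynomial in F_q[x]/(x^n - 1); matrix multiplication becomes cyclic
    convolution, which is what makes equations over circulants tractable.
    """
    n = len(a)
    c = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

# Commutativity of the circulant ring, one source of exploitable structure:
# circulant_mul(a, b, q) == circulant_mul(b, a, q) for all first rows a, b.
```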
We investigate the fine-grained and parameterized complexity of several generalizations of binary constraint satisfaction problems (BINARY-CSPs) that subsume variants of graph colouring problems. Our starting point is the observation that several algorithmic approaches yielding complexity upper bounds for these problems share a common structure. We therefore explore an algebraic approach based on semirings that unifies different generalizations of BINARY-CSPs (such as the counting, list, and weighted versions) and facilitates a general algorithmic approach to solving them efficiently. The latter is inspired by the (component) twin-width parameter introduced by Bonnet et al., which we generalize via edge-labelled graphs in order to extend it to arbitrary binary constraints. We consider input instances with bounded component twin-width, as well as constraint templates of bounded component twin-width, and obtain an FPT algorithm as well as an improved exponential-time algorithm for broad classes of binary constraints. We illustrate the advantages of this framework by instantiating our general algorithmic approach on several classes of problems (e.g., the $H$-coloring problem and its variants) and by showing that it improves the best known complexity upper bounds for several well-studied problems.
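The semiring unification can be made concrete with a brute-force evaluator (ours, purely illustrative; the twin-width-based algorithms above are far more efficient): swapping the semiring operations recovers the decision, counting, and weighted versions from one routine.

```python
from itertools import product

def csp_value(num_vars, domain, constraints, plus, times, one, zero):
    """Evaluate a binary CSP over an arbitrary commutative semiring.

    constraints : dict mapping (u, v) -> function domain x domain -> value.
    Sums (plus) over all assignments the product (times) of constraint values.
    Counting: (+, *, 1, 0); decision: (or, and, True, False);
    weighted (min-cost): (min, +, 0, inf).
    """
    total = zero
    for assign in product(domain, repeat=num_vars):
        val = one
        for (u, v), f in constraints.items():
            val = times(val, f(assign[u], assign[v]))
        total = plus(total, val)
    return total

# Counting proper 3-colourings of a triangle (expected answer: 6):
neq = lambda a, b: int(a != b)
count = csp_value(3, range(3),
                  {(0, 1): neq, (1, 2): neq, (0, 2): neq},
                  plus=lambda x, y: x + y,
                  times=lambda x, y: x * y, one=1, zero=0)
```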