We consider the problem of controlling a mutated diffusion process with an unknown mutation time. The problem is formulated as a quickest intervention problem, with the mutation modeled by a change-point; this generalizes quickest change-point detection (QCD). Our goal is to intervene in the mutated process as soon as possible while keeping the intervention cost low through optimally chosen intervention actions. The model and the proposed algorithms can be applied to pandemic prevention (such as COVID-19) and misinformation containment. We formulate the problem as a partially observed Markov decision process (POMDP) and convert it to an MDP through the belief state of the change-point. We first propose a grid-approximation approach to compute the optimal intervention policy, whose computational complexity can be very high when the number of grid points is large. To reduce this complexity, we further propose a low-complexity threshold-based policy derived from a first-order approximation of the value functions in the ``local intervention'' regime. Simulation results show that the low-complexity algorithm performs similarly to the grid approximation, and that both substantially outperform QCD-based algorithms.
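As a concrete illustration of the belief-state recursion that drives a threshold policy, the following sketch uses a discrete-time Shiryaev-type update with a geometric change-point prior and Gaussian pre-/post-change observation densities; the densities, the hazard rate `rho`, and the threshold are illustrative assumptions, not the paper's diffusion setting.

```python
import numpy as np
from scipy.stats import norm

def belief_update(p, x, rho, f0, f1):
    """One Shiryaev-type posterior update for the change-point belief.

    p    : current belief that the mutation has already occurred
    x    : new observation
    rho  : geometric prior hazard rate of the change-point
    f0/f1: pre-/post-change observation densities
    """
    prior = p + (1.0 - p) * rho            # belief after the prior transition
    num = prior * f1(x)
    den = num + (1.0 - prior) * f0(x)
    return num / den

def threshold_policy(xs, rho=0.01, thresh=0.9):
    """Intervene the first time the belief crosses a fixed threshold."""
    f0 = lambda x: norm.pdf(x, 0.0, 1.0)   # pre-change dynamics (assumed)
    f1 = lambda x: norm.pdf(x, 1.0, 1.0)   # post-change dynamics (assumed)
    p = 0.0
    for t, x in enumerate(xs):
        p = belief_update(p, x, rho, f0, f1)
        if p >= thresh:
            return t                       # first intervention time
    return None
```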
We propose a data-driven portfolio selection model that integrates side information, conditional estimation, and robustness within the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution within an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation problem with side information can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to a second-order cone or semidefinite program. Empirical studies in the US equity market demonstrate the advantage of our integrative framework over other benchmarks.
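Schematically, and with illustrative notation not taken from the paper, the conditional distributionally robust allocation problem has the form
\[
\min_{w \in \mathcal{W}} \;
\sup_{\mathbb{Q} \,\in\, \mathcal{B}_{\varepsilon}(\widehat{\mathbb{P}})}
\; \rho_{\mathbb{Q}}\!\left( w^{\top} R \;\middle|\; X = x \right),
\]
where $\widehat{\mathbb{P}}$ is the empirical joint distribution of the covariates $X$ and asset returns $R$, $\mathcal{B}_{\varepsilon}(\widehat{\mathbb{P}})$ is the optimal transport ball of radius $\varepsilon$ around it, and $\rho$ is the mean-variance or mean-CVaR risk-return criterion evaluated under the perturbed conditional distribution.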
Continual learning is a paradigm that learns tasks sequentially under resource constraints. Its key challenge is the stability-plasticity dilemma: it is difficult to simultaneously achieve the stability that prevents catastrophic forgetting of old tasks and the plasticity needed to learn new tasks well. In this paper, we propose a new continual learning approach, Advanced Null Space (AdNS), which balances stability and plasticity without storing any data from previous tasks. Specifically, to obtain better stability, AdNS uses low-rank approximation to construct a novel null space and projects the gradient onto this null space to prevent interference with past tasks. To control the generation of the null space, we introduce a non-uniform constraint strength that further reduces forgetting. Furthermore, we present a simple but effective method, intra-task distillation, to improve performance on the current task. Finally, we provide a theoretical analysis of the roles the null space plays in plasticity and stability. Experimental results show that the proposed method achieves better performance than state-of-the-art continual learning approaches.
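A minimal sketch of the gradient-projection mechanism (the general idea, not AdNS itself): collect layer inputs from past tasks, take a low-rank cut of their covariance via SVD, and project new-task gradients onto the orthogonal complement. The `rank_ratio` knob is illustrative and plays the role of a constraint strength.

```python
import numpy as np

def null_space_projector(features, rank_ratio=0.95):
    """Build a projector onto an (approximate) null space of past-task features.

    features   : (num_samples, dim) matrix of layer inputs from previous tasks
    rank_ratio : fraction of spectral energy treated as the 'occupied' subspace
    """
    cov = features.T @ features / len(features)
    U, S, _ = np.linalg.svd(cov)
    energy = np.cumsum(S) / np.sum(S)
    k = int(np.searchsorted(energy, rank_ratio)) + 1  # low-rank cut-off
    U_occ = U[:, :k]                                  # directions used by old tasks
    return np.eye(cov.shape[0]) - U_occ @ U_occ.T     # projector onto the residual

# Projecting the new task's gradient, G @ P, leaves outputs on old-task
# inputs unchanged to first order, which is what curbs forgetting.
```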
Biopharmaceutical manufacturing is a rapidly growing industry with impact on virtually all branches of medicine. Biomanufacturing processes require close monitoring and control, in the presence of complex bioprocess dynamics with many interdependent factors and extremely limited data, owing to the high cost of experiments and the novelty of personalized bio-drugs. We develop a novel model-based reinforcement learning framework that can achieve human-level control in such low-data environments. The model uses a dynamic Bayesian network to capture causal interdependencies between factors and to predict how the effects of different inputs propagate through the pathways of the bioprocess mechanisms. This enables the design of process control policies that are both interpretable and robust against model risk. We present a computationally efficient, provably convergent stochastic gradient method for optimizing such policies. Validation is conducted on a realistic application with a multi-dimensional, continuous state variable.
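To make the policy-search setup concrete, here is a toy sketch (all coefficients hypothetical): a linear-Gaussian network whose sparsity mask encodes the causal pathways, a linear feedback policy, and a simple stochastic-gradient search via symmetric finite differences. The paper's provably convergent method is more sophisticated; this only illustrates the model-based rollout-and-update loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian dynamic Bayesian network: each state node depends only
# on its parents (MASK) and on the control input; edge weights are assumed.
MASK = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]])
W = 0.3 * MASK                  # causal edge weights (hypothetical)
B = np.array([0.5, 0.0, 0.2])   # effect of the scalar control on each node

def rollout(theta, T=20, sigma=0.05):
    """Simulate one trajectory under a linear feedback policy u = theta . x."""
    x, reward = np.ones(3), 0.0
    for _ in range(T):
        u = float(theta @ x)
        x = W @ x + B * u + sigma * rng.standard_normal(3)
        reward -= x @ x + 0.01 * u**2       # quadratic stage cost
    return reward

def sgd_policy_search(iters=200, eps=1e-2, lr=1e-3):
    """Stochastic-gradient policy search via symmetric finite differences."""
    theta = np.zeros(3)
    for _ in range(iters):
        d = rng.standard_normal(3)          # random search direction
        g = (rollout(theta + eps * d) - rollout(theta - eps * d)) / (2 * eps) * d
        theta += lr * g
    return theta
```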
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integral of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of actor-critic algorithms for RL, in which value functions and policies are learned and updated simultaneously and alternately. The first type is based directly on the aforementioned representation, which involves future trajectories and is hence offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions, which are then enforced via stochastic approximation when updating policies. Finally, we demonstrate the algorithms through simulations in two concrete examples.
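Schematically, and up to technical conditions (the notation here is ours, not the paper's), the representation expresses the policy gradient as an expected integral against the score function of the policy,
\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}\!\left[\int_0^T
\frac{\partial \log \pi_\theta(a_t \mid t, X_t)}{\partial \theta}
\Big(\mathrm{d}\widehat{V}(t, X_t) + r_t\,\mathrm{d}t
- \lambda \log \pi_\theta(a_t \mid t, X_t)\,\mathrm{d}t\Big)\right],
\]
where $\widehat{V}$ is the current value estimate and $\lambda$ the entropy-regularization temperature. The integrand is the "auxiliary running reward" referred to above: it depends only on samples and the current value function, which is what turns PG into a PE problem.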
We develop a new approach to drifting games, a class of two-person games with many applications to boosting and online learning, including Prediction with Expert Advice and the Hedge game. Our approach involves (a) guessing an asymptotically optimal potential by solving an associated partial differential equation (PDE); then (b) justifying the guess by proving upper and lower bounds on the final-time loss whose difference scales like a negative power of the number of time steps. The proofs of our potential-based upper bounds are elementary, using little more than Taylor expansion. The proofs of our potential-based lower bounds are also rather elementary, combining Taylor expansion with probabilistic or combinatorial arguments. Most previous work on asymptotically optimal strategies has used potentials obtained by solving a discrete dynamic programming principle, and the arguments are complicated by their discrete nature. Our approach is facilitated by the fact that the potentials we use are explicit solutions of PDEs, so the arguments rest on basic calculus. Not only is our approach more elementary, but we also give new potentials and derive corresponding upper and lower bounds that match each other in the asymptotic regime.
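For orientation, a familiar example of a potential in this setting (the classical one behind exponential weights for Hedge, not one of the new PDE-based potentials constructed here) is
\[
\Phi_\eta(t, x) \;=\; \frac{1}{\eta} \log\!\Big( \sum_{i=1}^{N} e^{\eta x_i} \Big),
\]
where $x_i$ tracks the regret to expert $i$ and $\eta$ is a learning rate. Potentials of this explicit, smooth form are what make Taylor-expansion arguments feasible.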
The estimation of absorption time distributions of Markov jump processes is an important task in various branches of statistics and applied probability. While the time-homogeneous case is classic, the time-inhomogeneous case has recently received increased attention due to its added flexibility and advances in computational power. However, existing approaches assume commuting sub-intensity matrices, which in various cases limits the parsimony of the resulting representation. This paper develops the theory required to solve the general case through maximum likelihood estimation, in particular using the expectation-maximization algorithm. A reduction to a piecewise constant intensity matrix function is proposed in order to provide succinct representations, with a parametric linear model binding the intensities together. Practical aspects are discussed and illustrated through the estimation of notoriously demanding theoretical distributions as well as real data, from the perspective of matrix analytic methods.
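For reference, in the time-homogeneous case the absorption (phase-type) density has the classical closed form $f(t) = \boldsymbol{\alpha} e^{\boldsymbol{T} t} \boldsymbol{s}$ with exit rate vector $\boldsymbol{s} = -\boldsymbol{T}\boldsymbol{e}$, and under a piecewise constant reduction the product integral factorizes into ordinary matrix exponentials. A minimal sketch (function names are ours):

```python
import numpy as np
from scipy.linalg import expm

def phase_type_density(t, alpha, T):
    """Density of the absorption time, f(t) = alpha @ expm(T t) @ s.

    alpha : initial distribution over the transient states
    T     : sub-intensity matrix (transient-to-transient rates)
    """
    s = -T @ np.ones(T.shape[0])          # exit rates s = -T 1
    return float(alpha @ expm(T * t) @ s)

def inhomogeneous_survival(t, alpha, pieces):
    """Survival function under a piecewise constant intensity function.

    pieces : list of (duration, sub-intensity matrix) covering [0, t]
    """
    P, remaining = np.eye(len(alpha)), t
    for dt, T in pieces:
        step = min(dt, remaining)
        P = P @ expm(T * step)            # product integral factorizes
        remaining -= step
        if remaining <= 0:
            break
    return float(alpha @ P @ np.ones(len(alpha)))
```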
Learning controllers from data for stabilizing dynamical systems typically follows a two-step process of first identifying a model and then constructing a controller based on the identified model. However, learning a model means identifying a generic description of the system dynamics, which can require large amounts of data and extracts information that is unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model.
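This paper's construction is different, but for a sense of how a stabilizing gain can be computed directly from data without identifying $(A, B)$, here is a sketch in the style of the linear-matrix-inequality formula of De Persis and Tesi (2020); feasibility requires sufficiently rich data, and the variable names are ours.

```python
import cvxpy as cp
import numpy as np

def stabilizing_gain(X0, X1, U0, eps=1e-6):
    """Data-driven stabilizing feedback (De Persis & Tesi-style sketch).

    X0, X1 : n x T matrices of consecutive states, with X1 = A X0 + B U0
    U0     : m x T matrix of applied inputs
    Returns K such that A + B K is Schur-stable, if the SDP is feasible.
    """
    n, T = X0.shape
    Q = cp.Variable((T, n))
    P = cp.Variable((n, n), symmetric=True)      # Lyapunov certificate P = X0 Q
    lmi = cp.bmat([[P, X1 @ Q],
                   [(X1 @ Q).T, P]])             # encodes (A+BK) P (A+BK)^T < P
    prob = cp.Problem(cp.Minimize(0),
                      [X0 @ Q == P, lmi >> eps * np.eye(2 * n)])
    prob.solve()
    return U0 @ Q.value @ np.linalg.inv(X0 @ Q.value)
```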
In this paper we propose a new time-varying econometric model, called the Time-Varying Poisson AutoRegressive model with eXogenous covariates (TV-PARX), suited to model and forecast time series of counts. We show that the score-driven framework is particularly suitable for recovering the evolution of time-varying parameters and provides the flexibility required to model and forecast count time series characterized by complex nonlinear dynamics and structural breaks. We study the asymptotic properties of the TV-PARX model and prove that, under mild conditions, maximum likelihood estimation (MLE) yields strongly consistent and asymptotically normal parameter estimates. Finite-sample performance and forecasting accuracy are evaluated through Monte Carlo simulations. The empirical usefulness of the time-varying specification of the proposed TV-PARX model is shown by analyzing the number of new daily COVID-19 infections in Italy and the number of corporate defaults in the US.
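Stripped of exogenous covariates and of the paper's full specification, the score-driven mechanism amounts to the following filter (parameter names and the identity score scaling are illustrative):

```python
import numpy as np

def poisson_gas_filter(y, omega, alpha, beta):
    """Score-driven (GAS-type) filter for a Poisson intensity with log link.

    lambda_t = exp(f_t),  f_{t+1} = omega + beta * f_t + alpha * s_t,
    where s_t = y_t - lambda_t is the score of the Poisson log-likelihood
    with respect to f_t (identity scaling; other scalings are possible).
    """
    f = np.empty(len(y) + 1)
    f[0] = omega / (1.0 - beta)            # start at the unconditional level
    for t, yt in enumerate(y):
        lam = np.exp(f[t])
        f[t + 1] = omega + beta * f[t] + alpha * (yt - lam)
    return np.exp(f[:-1])                  # fitted intensities
```

In the full model, the parameters driving this recursion would themselves be estimated by MLE, together with coefficients on the exogenous covariates.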
Several cryptographic protocols built on less well-known algorithmic problems, such as those in non-commutative groups, group rings, and semigroups, that claim quantum security have been broken through classical reduction methods within their proposed platforms. A rigorous examination of the complexity of these algorithmic problems is therefore an important topic of research. In this paper, we present a cryptanalysis of a public key exchange system based on a decomposition-type problem in the so-called twisted group algebras of the dihedral group $D_{2n}$ over a finite field $\fq$. Our analysis relies on an algebraic reduction of the original problem to a set of equations over $\fq$ involving circulant matrices, and a subsequent solution of these equations. Our attack runs in polynomial time and succeeds with probability at least $90$ percent for the parameter values provided by the authors. We also show that the underlying algorithmic problem, while based on a non-commutative structure, may be formulated as a commutative semigroup action problem.
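To illustrate the structure the reduction exploits (this is not the attack itself): circulant matrices over $\fq$ form a commutative ring isomorphic to $\fq[x]/(x^n - 1)$, so products of circulants reduce to cyclic convolutions of their first rows.

```python
import numpy as np

def circulant_mul(a, b, q):
    """Multiply two n x n circulant matrices over F_q via their first rows.

    A circulant matrix is determined by its first row and corresponds to a
    polynomial in F_q[x]/(x^n - 1); matrix multiplication becomes cyclic
    convolution, which is what makes equations over circulants tractable.
    """
    n = len(a)
    c = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

# Commutativity of the circulant ring, one source of exploitable structure:
# circulant_mul(a, b, q) == circulant_mul(b, a, q) for all first rows a, b.
```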
We investigate the fine-grained and parameterized complexity of several generalizations of binary constraint satisfaction problems (BINARY-CSPs) that subsume variants of graph colouring problems. Our starting point is the observation that several algorithmic approaches yielding complexity upper bounds for these problems share a common structure. We therefore explore an algebraic approach based on semirings that unifies different generalizations of BINARY-CSPs (such as the counting, list, and weighted versions) and facilitates a general algorithmic approach to solving them efficiently. The latter is inspired by the (component) twin-width parameter introduced by Bonnet et al., which we generalize via edge-labelled graphs in order to extend it to arbitrary binary constraints. We consider input instances with bounded component twin-width, as well as constraint templates of bounded component twin-width, and obtain an FPT algorithm as well as an improved exponential-time algorithm for broad classes of binary constraints. We illustrate the advantages of this framework by instantiating our general algorithmic approach on several classes of problems (e.g., the $H$-coloring problem and its variants) and by showing that it improves the best known complexity upper bounds for several well-studied problems.
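The semiring unification can be made concrete with a brute-force evaluator (ours, purely illustrative; the twin-width-based algorithms above are far more efficient): swapping the semiring operations recovers the decision, counting, and weighted versions from one routine.

```python
from itertools import product

def csp_value(num_vars, domain, constraints, plus, times, one, zero):
    """Evaluate a binary CSP over an arbitrary commutative semiring.

    constraints : dict mapping (u, v) -> function domain x domain -> value.
    Sums (plus) over all assignments the product (times) of constraint values.
    Counting: (+, *, 1, 0); decision: (or, and, True, False);
    weighted (min-cost): (min, +, 0, inf).
    """
    total = zero
    for assign in product(domain, repeat=num_vars):
        val = one
        for (u, v), f in constraints.items():
            val = times(val, f(assign[u], assign[v]))
        total = plus(total, val)
    return total

# Counting proper 3-colourings of a triangle (expected answer: 6):
neq = lambda a, b: int(a != b)
count = csp_value(3, range(3),
                  {(0, 1): neq, (1, 2): neq, (0, 2): neq},
                  plus=lambda x, y: x + y,
                  times=lambda x, y: x * y, one=1, zero=0)
```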