In this paper, we propose a class of discrete-time approximation schemes for stochastic optimal control problems under the $G$-expectation framework. The proposed schemes are constructed recursively based on piecewise constant policy. We prove the convergence of the discrete schemes and determine the convergence rates. Several numerical examples are presented to illustrate the effectiveness of the obtained results.
We study a new class of NP search problems, those which can be proved total using standard combinatorial reasoning based on approximate counting. Our model for this kind of reasoning is the bounded arithmetic theory $\mathrm{APC}_2$ of [Je\v{r}\'abek 2009]. In particular, the Ramsey and weak pigeonhole search problems lie in the new class. We give a purely computational characterization of this class and show that, relative to an oracle, it does not contain the problem CPLS, a strengthening of PLS. As CPLS is provably total in the theory $T^2_2$, this shows that $\mathrm{APC}_2$ does not prove every $\forall \Sigma^b_1$ sentence which is provable in bounded arithmetic. This answers the question posed in [Buss, Ko{\l}odziejczyk, Thapen 2014] and represents some progress in the programme of separating the levels of the bounded arithmetic hierarchy by low-complexity sentences. Our main technical tool is an extension of the "fixing lemma" from [Pudl\'ak, Thapen 2017], a form of switching lemma, which we use to show that a random partial oracle from a certain distribution will, with high probability, determine an entire computation of a $\textrm{P}^{\textrm{NP}}$ oracle machine. The introduction to the paper is intended to make the statements and context of the results accessible to someone unfamiliar with NP search problems or with bounded arithmetic.
To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, K\"ohler and No\'e introduced stochastic normalizing flows which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives which allows to incorporate distributions without densities in a sound way. Further, we generalize the results for sampling from posterior distributions as required in inverse problems. The performance of the proposed conditional stochastic normalizing flow is demonstrated by numerical examples.
In this paper, we propose a deep learning based reduced order modeling method for stochastic underground flow problems in highly heterogeneous media. We aim to utilize supervised learning to build a reduced surrogate model from the stochastic parameter space that characterizes the possible highly heterogeneous media to the solution space of a stochastic flow problem to have fast online simulations. Dominant POD modes obtained from a well-designed spectral problem in a global snapshot space are used to represent the solution of the flow problem. Due to the small dimension of the solution, the complexity of the neural network is significantly reduced. We adopt the generalized multiscale finite element method (GMsFEM), in which a set of local multiscale basis functions that can capture the heterogeneity of the media and source information are constructed to efficiently generate globally defined snapshot space. Rigorous theoretical analyses are provided and extensive numerical experiments for linear and nonlinear stochastic flows are provided to verify the superior performance of the proposed method.
We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we show that the tracking efficiency of the proximal stochastic gradient method depends only logarithmically on the initialization quality, when equipped with a step-decay schedule. Numerical experiments illustrate our results.
In this work we consider the well-known Secretary Problem -- a number $n$ of elements, each having an adversarial value, are arriving one-by-one according to some random order, and the goal is to choose the highest value element. The decisions are made online and are irrevocable -- if the algorithm decides to choose or not to choose the currently seen element, based on the previously observed values, it cannot change its decision later regarding this element. The measure of success is the probability of selecting the highest value element, minimized over all adversarial assignments of values. We show existential and constructive upper bounds on approximation of the success probability in this problem, depending on the entropy of the randomly chosen arrival order, including the lowest possible entropy $O(\log\log (n))$ for which the probability of success could be constant. We show that below entropy level $\mathcal{H}<0.5\log\log n$, all algorithms succeed with probability $0$ if random order is selected uniformly at random from some subset of permutations, while we are able to construct in polynomial time a non-uniform distribution with entropy $\mathcal{H}$ resulting in success probability of at least $\Omega\left(\frac{1}{(\log\log n +3\log\log\log n -\mathcal{H})^{2+\epsilon}}\right)$, for any constant $\epsilon>0$. We also prove that no algorithm using entropy $\mathcal{H}=O((\log\log n)^a)$ can improve our result by more than polynomially, for any constant $0<a<1$. For entropy $\log\log (n)$ and larger, our analysis precisely quantifies both multiplicative and additive approximation of the success probability. In particular, we improve more than doubly exponentially on the best previously known additive approximation guarantee for the secretary problem.
This paper considers a novel multi-agent linear stochastic approximation algorithm driven by Markovian noise and general consensus-type interaction, in which each agent evolves according to its local stochastic approximation process which depends on the information from its neighbors. The interconnection structure among the agents is described by a time-varying directed graph. While the convergence of consensus-based stochastic approximation algorithms when the interconnection among the agents is described by doubly stochastic matrices (at least in expectation) has been studied, less is known about the case when the interconnection matrix is simply stochastic. For any uniformly strongly connected graph sequences whose associated interaction matrices are stochastic, the paper derives finite-time bounds on the mean-square error, defined as the deviation of the output of the algorithm from the unique equilibrium point of the associated ordinary differential equation. For the case of interconnection matrices being stochastic, the equilibrium point can be any unspecified convex combination of the local equilibria of all the agents in the absence of communication. Both the cases with constant and time-varying step-sizes are considered. In the case when the convex combination is required to be a straight average and interaction between any pair of neighboring agents may be uni-directional, so that doubly stochastic matrices cannot be implemented in a distributed manner, the paper proposes a push-sum-type distributed stochastic approximation algorithm and provides its finite-time bound for the time-varying step-size case by leveraging the analysis for the consensus-type algorithm with stochastic matrices and developing novel properties of the push-sum algorithm.
Machine learning (ML) models need to be frequently retrained on changing datasets in a wide variety of application scenarios, including data valuation and uncertainty quantification. To efficiently retrain the model, linear approximation methods such as influence function have been proposed to estimate the impact of data changes on model parameters. However, these methods become inaccurate for large dataset changes. In this work, we focus on convex learning problems and propose a general framework to learn to estimate optimized model parameters for different training sets using neural networks. We propose to enforce the predicted model parameters to obey optimality conditions and maintain utility through regularization techniques, which significantly improve generalization. Moreover, we rigorously characterize the expressive power of neural networks to approximate the optimizer of convex problems. Empirical results demonstrate the advantage of the proposed method in accurate and efficient model parameter estimation compared to the state-of-the-art.
Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to the Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in the Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameter over existing methods. We show that the algorithm is competitive against state-ofthe-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon the optimal control theory.
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.