Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The bound derived can be applied when building multiple surrogate models in a reasonable computing time.
Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems but also in modern machine learning, where it provides both a new perspective on algorithms analysis, and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by non smooth and non strongly convex functionals, prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm of which we analyze convergence and stability properties, even in the case where the original problem is unfeasible. The general results are illustrated considering the special case of sparse recovery with the $\ell_1$ penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.
The most scalable approaches to certifying neural network robustness depend on computing sound linear lower and upper bounds for the network's activation functions. Current approaches are limited in that the linear bounds must be handcrafted by an expert, and can be sub-optimal, especially when the network's architecture composes operations using, for example, multiplication such as in LSTMs and the recently popular Swish activation. The dependence on an expert prevents the application of robustness certification to developments in the state-of-the-art of activation functions, and furthermore the lack of tightness guarantees may give a false sense of insecurity about a particular model. To the best of our knowledge, we are the first to consider the problem of automatically computing tight linear bounds for arbitrary n-dimensional activation functions. We propose LinSyn, the first approach that achieves tight bounds for any arbitrary activation function, while only leveraging the mathematical definition of the activation function itself. Our approach leverages an efficient heuristic approach to synthesize bounds that are tight and usually sound, and then verifies the soundness (and adjusts the bounds if necessary) using the highly optimized branch-and-bound SMT solver, dReal. Even though our approach depends on an SMT solver, we show that the runtime is reasonable in practice, and, compared with state of the art, our approach often achieves 2-5X tighter final output bounds and more than quadruple certified robustness.
We are concerned with the problem of decomposing the parameter space of a parametric system of polynomial equations, and possibly some polynomial inequality constraints, with respect to the number of real solutions that the system attains. Previous studies apply a two step approach to this problem, where first the discriminant variety of the system is computed via a Groebner Basis (GB), and then a Cylindrical Algebraic Decomposition (CAD) of this is produced to give the desired computation. However, even on some reasonably small applied examples this process is too expensive, with computation of the discriminant variety alone infeasible. In this paper we develop new approaches to build the discriminant variety using resultant methods (the Dixon resultant and a new method using iterated univariate resultants). This reduces the complexity compared to GB and allows for a previous infeasible example to be tackled. We demonstrate the benefit by giving a symbolic solution to a problem from population dynamics - the analysis of the steady states of three connected populations which exhibit Allee effects - which previously could only be tackled numerically.
Phase-type (PH) distributions are a popular tool for the analysis of univariate risks in numerous actuarial applications. Their multivariate counterparts (MPH$^\ast$), however, have not seen such a proliferation, due to lack of explicit formulas and complicated estimation procedures. A simple construction of multivariate phase-type distributions -- mPH -- is proposed for the parametric description of multivariate risks, leading to models of considerable probabilistic flexibility and statistical tractability. The main idea is to start different Markov processes at the same state, and allow them to evolve independently thereafter, leading to dependent absorption times. By dimension augmentation arguments, this construction can be cast into the umbrella of MPH$^\ast$ class, but enjoys explicit formulas which the general specification lacks, including common measures of dependence. Moreover, it is shown that the class is still rich enough to be dense on the set of multivariate risks supported on the positive orthant, and it is the smallest known sub-class to have this property. In particular, the latter result provides a new short proof of the denseness of the MPH$^\ast$ class. In practice this means that the mPH class allows for modeling of bivariate risks with any given correlation or copula. We derive an EM algorithm for its statistical estimation, and illustrate it on bivariate insurance data. Extensions to more general settings are outlined.
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to efficiently explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}( H^6S A /\epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is fully \emph{decentralized}, in the sense that each agent has access to only its local information, and is completely oblivious to the presence of others. This way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from the exponential dependence on the number of agents.
In recent years, the literature on Bayesian high-dimensional variable selection has rapidly grown. It is increasingly important to understand whether these Bayesian methods can consistently estimate the model parameters. To this end, shrinkage priors are useful for identifying relevant signals in high-dimensional data. For multivariate linear regression models with Gaussian response variables, Bai and Ghosh (2018) proposed a multivariate Bayesian model with shrinkage priors (MBSP) for estimation and variable selection in high-dimensional settings. However, the proofs of posterior consistency for the MBSP method (Theorems 3 and 4 of Bai and Ghosh (2018) were incorrect. In this paper, we provide a corrected proof of Theorems 3 and 4 of Bai and Ghosh (2018). We leverage these new proofs to extend the MBSP model to multivariate generalized linear models (GLMs). Under our proposed model (MBSP-GLM), multiple responses belonging to the exponential family are simultaneously modeled and mixed-type responses are allowed. We show that the MBSP-GLM model achieves strong posterior consistency when $p$ grows at a subexponential rate with $n$. Furthermore, we quantify the posterior contraction rate at which the posterior shrinks around the true regression coefficients and allow the dimension of the responses $q$ to grow as $n$ grows. Thus, we strengthen the previous results on posterior consistency, which did not provide rate results. This greatly expands the scope of the MBSP model to include response variables of many data types, including binary and count data. To the best of our knowledge, this is the first posterior contraction result for multivariate Bayesian GLMs.
Penalized $M-$estimators for logistic regression models have been previously study for fixed dimension in order to obtain sparse statistical models and automatic variable selection. In this paper, we derive asymptotic results for penalized $M-$estimators when the dimension $p$ grows to infinity with the sample size $n$. Specifically, we obtain consistency and rates of convergence results, for some choices of the penalty function. Moreover, we prove that these estimators consistently select variables with probability tending to 1 and derive their asymptotic distribution.
Yedidia, Freeman, Weiss have shown in their reference article, "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms", that there is a variational principle underlying the General Belief Propagation, by introducing a region-based free energy approximation of the MaxEnt free energy, that we will call the Generalized Bethe free energy. They sketched a proof that fixed points of the General Belief Propagation are critical points of this free energy, this proof was completed in the thesis of Peltre. In this paper we identify a class of optimization problems defined as patching local optimization problems and associated message passing algorithms for which such correspondence between critical points and fix points of the algorithms holds. This framework holds many applications one of which being a PCA for filtered data and a region-based approximation of MaxEnT with stochastic compatibility constraints on the region probabilities. Such approach is particularly adapted for inference with multimodal integration, inference on scenes with multiple views.
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where the $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.