Continuous DR-submodular functions are a class of generally non-convex/non-concave functions that satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions. Existing work has studied monotone continuous DR-submodular maximization subject to a convex constraint and provided efficient algorithms with approximation guarantees. In many applications, such as computing the stability number of a graph, the monotone DR-submodular objective function has the additional property of being strongly concave along non-negative directions (i.e., strongly DR-submodular). In this paper, we consider a subclass of $L$-smooth monotone DR-submodular functions that are strongly DR-submodular and have a bounded curvature, and we show how to exploit such additional structure to obtain faster algorithms with stronger guarantees for the maximization problem. We propose a new algorithm that matches the provably optimal $1-\frac{c}{e}$ approximation ratio after only $\lceil\frac{L}{\mu}\rceil$ iterations, where $c\in[0,1]$ and $\mu\geq 0$ are the curvature and the strong DR-submodularity parameter, respectively. Furthermore, we study the Projected Gradient Ascent (PGA) method for this problem, and provide a refined analysis of the algorithm with an improved $\frac{1}{1+c}$ approximation ratio (compared to $\frac{1}{2}$ in prior works) and a linear convergence rate. Experimental results illustrate and validate the efficiency and effectiveness of our proposed algorithms.
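As a rough illustration of the Projected Gradient Ascent method mentioned above, the following minimal sketch runs PGA on a toy instance. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic objective $f(x) = a^\top x - \frac{1}{2}x^\top Hx$ with entrywise non-negative $H$, the constraint set $\{x \in [0,1]^n : \sum_i x_i \leq b\}$, the step size $1/L$, and the function names. With $H = [[1,2],[2,1]]$ the objective is non-concave ($H$ is indefinite) but strongly concave along non-negative directions, and it is monotone on $[0,1]^2$ for the chosen $a$.

```python
def project_box_budget(y, budget, tol=1e-12):
    """Euclidean projection onto {x : 0 <= x_i <= 1, sum(x) <= budget}."""
    def clip(v):
        return min(1.0, max(0.0, v))

    x = [clip(v) for v in y]
    if sum(x) <= budget:
        return x
    # Budget is active: bisect on a shift tau >= 0 so that
    # sum(clip(y_i - tau)) <= budget (the sum is non-increasing in tau).
    lo, hi = 0.0, max(y)
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if sum(clip(v - tau) for v in y) > budget:
            lo = tau
        else:
            hi = tau
    return [clip(v - hi) for v in y]


def objective(x, a, H):
    """Toy objective f(x) = a.x - 0.5 * x^T H x (illustrative assumption)."""
    n = len(x)
    quad = sum(H[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return sum(a[i] * x[i] for i in range(n)) - 0.5 * quad


def gradient(x, a, H):
    """Gradient a - Hx of the toy objective (H is symmetric)."""
    n = len(x)
    return [a[i] - sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]


def projected_gradient_ascent(a, H, budget, smoothness, num_iters=100):
    """Iterate x <- P(x + grad f(x) / L), starting from the origin."""
    x = [0.0] * len(a)
    for _ in range(num_iters):
        g = gradient(x, a, H)
        step = [xi + gi / smoothness for xi, gi in zip(x, g)]
        x = project_box_budget(step, budget)
    return x


# Hypothetical instance: all entries of H are non-negative (DR-submodular),
# and a - H x >= 0 on [0, 1]^2 (monotone). L = 3 is the spectral norm of H.
H = [[1.0, 2.0], [2.0, 1.0]]
a = [3.5, 3.0]
x_star = projected_gradient_ascent(a, H, budget=1.0, smoothness=3.0)
f_star = objective(x_star, a, H)
```

The projection is the only constraint-specific part; for a different convex constraint set, only `project_box_budget` would need to be replaced.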
In this paper we propose a deep-learning-based numerical scheme for strongly coupled FBSDEs stemming from stochastic control. It is a modification of the deep BSDE method in which the initial value of the backward equation is not a free parameter, and in which the new loss function is a weighted sum of the cost of the control problem and a variance term that coincides with the mean squared error in the terminal condition. We show by a numerical example that a direct extension of the classical deep BSDE method to FBSDEs fails for a simple linear-quadratic control problem, and motivate why the new method works. Under regularity and boundedness assumptions on the exact controls of time-continuous and time-discrete control problems, we provide an error analysis for our method. We show empirically that the method converges for three different problems, including the one for which the direct extension of the deep BSDE method failed.
In 1979, Miller proved that for a group $G$ of odd order, two minimal group codes in $\mathbb{F}_2G$ are $G$-equivalent if and only if they have identical weight distributions. In 2014, Ferraz-Guerreiro-Polcino Milies disproved Miller's result by giving an example of two non-$G$-equivalent minimal codes with identical weight distributions. In this paper, we give a characterization of finite abelian groups so that over a specific set of group codes, equality of important parameters of two codes implies the $G$-equivalence of these two codes. As a corollary, we prove that two minimal codes with the same weight distribution are $G$-equivalent if and only if for each prime divisor $p$ of $|G|$, the Sylow $p$-subgroup of $G$ is homocyclic.
Motivated by applications to the theory of rank-metric codes, we study the problem of estimating the number of common complements of a family of subspaces over a finite field in terms of the cardinality of the family and its intersection structure. We derive upper and lower bounds for this number, along with their asymptotic versions as the field size tends to infinity. We then use these bounds to describe the general behaviour of common complements with respect to sparseness and density, showing that the decisive property is whether or not the number of spaces to be complemented is negligible with respect to the field size. By specializing our results to matrix spaces, we obtain upper and lower bounds for the number of MRD codes in the rank metric. In particular, we answer an open question in coding theory, proving that MRD codes are sparse for all parameter sets as the field size grows, with only very few exceptions. We also investigate the density of MRD codes as their number of columns tends to infinity, obtaining a new asymptotic bound. Using properties of the Euler function from number theory, we then show that our bound improves on known results for most parameter sets. We conclude the paper by establishing general structural properties of the density function of rank-metric codes.
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.
Stochastic PDE eigenvalue problems often arise in the field of uncertainty quantification, whereby one seeks to quantify the uncertainty in an eigenvalue, or its eigenfunction. In this paper we present an efficient multilevel quasi-Monte Carlo (MLQMC) algorithm for computing the expectation of the smallest eigenvalue of an elliptic eigenvalue problem with stochastic coefficients. Each sample evaluation requires the solution of a PDE eigenvalue problem, and so tackling this problem in practice is notoriously computationally difficult. We speed up the approximation of this expectation in four ways: we use a multilevel variance reduction scheme to spread the work over a hierarchy of FE meshes and truncation dimensions; we use QMC methods to efficiently compute the expectations on each level; we exploit the smoothness in parameter space and reuse the eigenvector from a nearby QMC point to reduce the number of iterations of the eigensolver; and we utilise a two-grid discretisation scheme to obtain the eigenvalue on the fine mesh with a single linear solve. The full error analysis of a basic MLQMC algorithm is given in the companion paper [Gilbert and Scheichl, 2022], and so in this paper we focus on how to further improve the efficiency and provide theoretical justification for using nearby QMC points and two-grid methods. Numerical results are presented that show the efficiency of our algorithm, and also show that the four strategies we employ are complementary.
Stochastic PDE eigenvalue problems are useful models for quantifying the uncertainty in several applications from the physical sciences and engineering, e.g., structural vibration analysis, the criticality of a nuclear reactor, or photonic crystal structures. In this paper we present a simple multilevel quasi-Monte Carlo (MLQMC) method for approximating the expectation of the minimal eigenvalue of an elliptic eigenvalue problem with coefficients that are given as a series expansion of countably many stochastic parameters. The MLQMC algorithm is based on a hierarchy of discretisations of the spatial domain and truncations of the dimension of the stochastic parameter domain. To approximate the expectations, randomly shifted lattice rules are employed. This paper is primarily dedicated to giving a rigorous analysis of the error of this algorithm. A key step in the error analysis requires bounds on the mixed derivatives of the eigenfunction with respect to both the stochastic and spatial variables simultaneously. Under stronger smoothness assumptions on the parametric dependence, our analysis also extends to multilevel higher-order quasi-Monte Carlo rules. An accompanying paper [Gilbert and Scheichl, 2022] focusses on practical extensions of the MLQMC algorithm to improve efficiency, and presents numerical results.
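To make the randomly shifted lattice rules mentioned above concrete, here is a minimal single-level sketch. It is only an illustration under stated assumptions: the integrand is a simple stand-in (not an eigenvalue of a PDE), the generating vector is the classical two-dimensional Fibonacci choice $(1, 610)$ with $N = 987$ points, and the function name and parameters are hypothetical. Each random shift yields an unbiased estimate of the integral over $[0,1]^2$, and the spread across shifts gives a practical error estimate.

```python
import random


def shifted_lattice_estimate(integrand, n_points, gen_vector, n_shifts, seed=0):
    """Average a rank-1 lattice rule over independent uniform random shifts.

    Returns the mean over shifts and the standard error of that mean.
    Requires n_shifts >= 2 so that the sample variance is defined.
    """
    rng = random.Random(seed)
    dim = len(gen_vector)
    estimates = []
    for _ in range(n_shifts):
        shift = [rng.random() for _ in range(dim)]
        total = 0.0
        for i in range(n_points):
            # Lattice point frac(i * z / N + shift), computed coordinate-wise.
            point = [(((i * z) % n_points) / n_points + s) % 1.0
                     for z, s in zip(gen_vector, shift)]
            total += integrand(point)
        estimates.append(total / n_points)
    mean = sum(estimates) / n_shifts
    var = sum((e - mean) ** 2 for e in estimates) / (n_shifts - 1)
    return mean, (var / n_shifts) ** 0.5


# Stand-in integrand with known integral 1/4 over the unit square.
def product_integrand(p):
    return p[0] * p[1]


qmc_mean, qmc_std_err = shifted_lattice_estimate(
    product_integrand, n_points=987, gen_vector=(1, 610), n_shifts=16
)
```

In a multilevel setting, an estimator of this kind would be applied to the difference between approximations on consecutive levels rather than to a single integrand.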
The principle of majorization-minimization (MM) provides a general framework for eliciting effective algorithms to solve optimization problems. However, MM algorithms often suffer from slow convergence, especially in large-scale and high-dimensional data settings. This has drawn attention to acceleration schemes designed exclusively for MM algorithms, but many existing designs are either problem-specific or rely on approximations and heuristics loosely inspired by the optimization literature. We propose a novel, rigorous quasi-Newton method for accelerating any valid MM algorithm, cast as seeking a fixed point of the MM \textit{algorithm map}. The method does not require specific information or computation from the objective function or its gradient and enjoys a limited-memory variant amenable to efficient computation in high-dimensional settings. By connecting our approach to Broyden's classical root-finding methods, we establish convergence guarantees and identify conditions for linear and super-linear convergence. These results are validated numerically and compared to peer methods in a thorough empirical study, showing that the proposed method achieves state-of-the-art performance across a diverse range of problems.
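The idea of treating an MM algorithm as a fixed-point map $F$ and applying a Broyden-type root finder to the residual $F(x) - x$ can be sketched in one dimension, where Broyden's update reduces to the secant method. This is not the paper's algorithm: the toy objective $f(x) = \log(1 + e^x) - 0.3x$, its quadratic majorizer with curvature 1, and all function names are assumptions chosen only to illustrate the principle. Note that the accelerated routine only evaluates the MM map, never the objective or its gradient directly.

```python
import math


def mm_map(x, target=0.3):
    """One MM step for f(x) = log(1 + exp(x)) - target * x.

    Since f''(x) <= 1/4 <= 1, the quadratic
    f(y) + f'(y) * (x - y) + (x - y)**2 / 2 majorizes f at y,
    and minimizing it over x gives y - f'(y).
    """
    return x - (1.0 / (1.0 + math.exp(-x)) - target)


def plain_mm(x0, tol=1e-10, max_iter=10_000):
    """Iterate the MM map until successive iterates differ by less than tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = mm_map(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def secant_accelerated_mm(x0, tol=1e-10, max_iter=100):
    """Find a root of G(x) = mm_map(x) - x with secant (1-D Broyden) steps."""
    def residual(x):
        return mm_map(x) - x

    x_prev, g_prev = x0, residual(x0)
    x, evals = x0 + g_prev, 1  # the first step is a plain MM update
    for _ in range(max_iter):
        g = residual(x)
        evals += 1
        if abs(g) < tol or g == g_prev:
            break
        slope = (g - g_prev) / (x - x_prev)  # secant estimate of G'(x)
        x_prev, g_prev, x = x, g, x - g / slope
    return x, evals


x_mm, mm_evals = plain_mm(0.0)
x_acc, acc_evals = secant_accelerated_mm(0.0)
```

Both routines count evaluations of the MM map, so `mm_evals` and `acc_evals` can be compared directly on this toy problem.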
Statistical depths provide a fundamental generalization of quantiles and medians to data in higher dimensions. This paper proposes a new type of globally defined statistical depth, based upon control theory and eikonal equations, which measures the smallest amount of probability density that has to be passed through in a path to points outside the support of the distribution: for example, spatial infinity. This depth is easy to interpret and compute, expressively captures multi-modal behavior, and extends naturally to data that is non-Euclidean. We prove various properties of this depth, and provide discussion of computational considerations. In particular, we demonstrate that this notion of depth is robust under an approximate isometrically constrained adversarial model, a property which is not enjoyed by the Tukey depth. Finally, we give some illustrative examples in the context of two-dimensional mixture models and MNIST.
Hamilton and Moitra (2021) showed that, in certain regimes, it is not possible to accelerate Riemannian gradient descent in the hyperbolic plane if we restrict ourselves to algorithms which make queries in a (large) bounded domain and which receive gradients and function values corrupted by a (small) amount of noise. We show that acceleration remains unachievable for any deterministic algorithm which receives exact gradient and function-value information (unbounded queries, no noise). Our results hold for the classes of strongly and nonstrongly geodesically convex functions, and for a large class of Hadamard manifolds including hyperbolic spaces and the symmetric space $\mathrm{SL}(n) / \mathrm{SO}(n)$ of positive definite $n \times n$ matrices of determinant one. This cements a surprising gap between the complexity of convex optimization and geodesically convex optimization: for hyperbolic spaces, Riemannian gradient descent is optimal on the class of smooth and strongly geodesically convex functions, in the regime where the condition number scales with the radius of the optimization domain. The key idea for proving the lower bound consists of perturbing the hard functions of Hamilton and Moitra (2021) with sums of bump functions chosen by a resisting oracle.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is strongly convex and smooth, strongly convex only, smooth only, or merely convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
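The following minimal sketch illustrates how an accelerated gradient method on the dual of an affinely constrained consensus problem can run using only neighbor-to-neighbor communication. It is a toy instance under assumptions that are not from the paper: local costs $f_i(x) = \frac{1}{2}(x - b_i)^2$, a path graph whose Laplacian $W$ plays the role of the interaction matrix, the constant-momentum variant of Nesterov's method, and hypothetical function names. The consensus constraint is written as $\sqrt{W}\mathbf{x} = 0$, and the iteration is carried out in the variable $z = \sqrt{W}y$ so that only products with $W$ are needed.

```python
import math


def dual_accelerated_consensus(b, num_iters=200):
    """Accelerated dual ascent for min sum_i 0.5 * (x_i - b_i)^2 s.t. consensus.

    Nodes sit on a path graph; node i only exchanges values with i-1 and i+1.
    """
    n = len(b)
    neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

    # Extreme nonzero Laplacian eigenvalues of a path graph: 2 - 2cos(pi*k/n).
    lam_max = 2.0 - 2.0 * math.cos(math.pi * (n - 1) / n)
    lam_min = 2.0 - 2.0 * math.cos(math.pi / n)
    sqrt_kappa = math.sqrt(lam_max / lam_min)
    beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)  # momentum coefficient
    step = 1.0 / lam_max  # the dual gradient is lam_max-Lipschitz here

    z = [0.0] * n
    z_prev = [0.0] * n
    for _ in range(num_iters):
        # Nesterov extrapolation of the dual variable.
        v = [z[i] + beta * (z[i] - z_prev[i]) for i in range(n)]
        # Local primal minimizers of 0.5 * (x - b_i)^2 + v_i * x.
        x = [b[i] - v[i] for i in range(n)]
        # Dual gradient (W x)_i needs only the values held by neighbors of i.
        grad = [len(neighbors[i]) * x[i] - sum(x[j] for j in neighbors[i])
                for i in range(n)]
        z_prev = z
        z = [v[i] + step * grad[i] for i in range(n)]
    return [b[i] - z[i] for i in range(n)]


local_targets = [1.0, 2.0, 3.0, 6.0]
consensus_x = dual_accelerated_consensus(local_targets)
```

For these quadratic costs the constrained minimizer is the average of the `local_targets` at every node, which gives a simple check of the iteration; the ratio `lam_max / lam_min` is where the dependence on the spectral gap of the graph enters the convergence rate.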