We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized condition value-at-risk (CVaR) and average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3$\times$ faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
We present the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations to autonomously reproduce constrained skills in new scenarios. However, ICL suffers from an ill-posed nature, leading to inaccurate inference of constraints from demonstrations. To figure it out, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. Our method TCL additively decomposes the overall reward into a task reward and its residual as soft constraints, maximizing policy divergence between task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method and five baselines in three simulated environments, we show TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to a $72\%$ higher task-success rates with accurate decomposition compared to the next best approach in novel scenarios. Further, we demonstrate the robustness of TCL on two real-world robotic tasks.
We derive optimality conditions for the optimum sample allocation problem in stratified sampling, formulated as the determination of the fixed strata sample sizes that minimize the total cost of the survey, under the assumed level of variance of the stratified $\pi$ estimator of the population total (or mean) and one-sided upper bounds imposed on sample sizes in strata. In this context, we presume that the variance function is of some generic form that, in particular, covers the case of the simple random sampling without replacement design in strata. The optimality conditions mentioned above will be derived from the Karush-Kuhn-Tucker conditions. Based on the established optimality conditions, we provide a formal proof of the optimality of the existing procedure, termed here as LRNA, which solves the allocation problem considered. We formulate the LRNA in such a way that it also provides the solution to the classical optimum allocation problem (i.e. minimization of the estimator's variance under a fixed total cost) under one-sided lower bounds imposed on sample sizes in strata. In this context, the LRNA can be considered as a counterparty to the popular recursive Neyman allocation procedure that is used to solve the classical problem of an optimum sample allocation with added one-sided upper bounds. Ready-to-use R-implementation of the LRNA is available through our stratallo package, which is published on the Comprehensive R Archive Network (CRAN) package repository.
The multiobjective evolutionary optimization algorithm (MOEA) is a powerful approach for tackling multiobjective optimization problems (MOPs), which can find a finite set of approximate Pareto solutions in a single run. However, under mild regularity conditions, the Pareto optimal set of a continuous MOP could be a low dimensional continuous manifold that contains infinite solutions. In addition, structure constraints on the whole optimal solution set, which characterize the patterns shared among all solutions, could be required in many real-life applications. It is very challenging for existing finite population based MOEAs to handle these structure constraints properly. In this work, we propose the first model-based algorithmic framework to learn the whole solution set with structure constraints for multiobjective optimization. In our approach, the Pareto optimality can be traded off with a preferred structure among the whole solution set, which could be crucial for many real-world problems. We also develop an efficient evolutionary learning method to train the set model with structure constraints. Experimental studies on benchmark test suites and real-world application problems demonstrate the promising performance of our proposed framework.
Uniformly valid inference for cointegrated vector autoregressive processes has so far proven difficult due to certain discontinuities arising in the asymptotic distribution of the least squares estimator. We extend asymptotic results from the univariate case to multiple dimensions and show how inference can be based on these results. Furthermore, we show that lag augmentation and a recent instrumental variable procedure can also yield uniformly valid tests and confidence regions. We verify the theoretical findings and investigate finite sample properties in simulation experiments for two specific examples.
We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
Vitrimer is a new class of sustainable polymers with the ability of self-healing through rearrangement of dynamic covalent adaptive networks. However, a limited choice of constituent molecules restricts their property space, prohibiting full realization of their potential applications. Through a combination of molecular dynamics (MD) simulations and machine learning (ML), particularly a novel graph variational autoencoder (VAE) model, we establish a method for generating novel vitrimers and guide their inverse design based on desired glass transition temperature (Tg). We build the first vitrimer dataset of one million and calculate Tg on 8,424 of them by high-throughput MD simulations calibrated by a Gaussian process model. The proposed VAE employs dual graph encoders and a latent dimension overlapping scheme which allows for individual representation of multi-component vitrimers. By constructing a continuous latent space containing necessary information of vitrimers, we demonstrate high accuracy and efficiency of our framework in discovering novel vitrimers with desirable Tg beyond the training regime. The proposed vitrimers with reasonable synthesizability cover a wide range of Tg and broaden the potential widespread usage of vitrimeric materials.
Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
We study functional dependencies together with two different probabilistic dependency notions: unary marginal identity and unary marginal distribution equivalence. A unary marginal identity states that two variables x and y are identically distributed. A unary marginal distribution equivalence states that the multiset consisting of the marginal probabilities of all the values for variable x is the same as the corresponding multiset for y. We present a sound and complete axiomatization for the class of these dependencies and show that it has Armstrong relations. The axiomatization is infinite, but we show that there can be no finite axiomatization. The implication problem for the subclass that contains only functional dependencies and unary marginal identities can be simulated with functional dependencies and unary inclusion atoms, and therefore the problem is in polynomial-time. This complexity bound also holds in the case of the full class, which we show by constructing a polynomial-time algorithm.
We study threshold testing, an elementary probing model with the goal to choose a large value out of $n$ i.i.d. random variables. An algorithm can test each variable $X_i$ once for some threshold $t_i$, and the test returns binary feedback whether $X_i \ge t_i$ or not. Thresholds can be chosen adaptively or non-adaptively by the algorithm. Given the results for the tests of each variable, we then select the variable with highest conditional expectation. We compare the expected value obtained by the testing algorithm with expected maximum of the variables. Threshold testing is a semi-online variant of the gambler's problem and prophet inequalities. Indeed, the optimal performance of non-adaptive algorithms for threshold testing is governed by the standard i.i.d. prophet inequality of approximately $0.745+o(1)$ as $n \to \infty$. We show how adaptive algorithms can significantly improve upon this ratio. Our adaptive testing strategy guarantees a competitive ratio of at least $0.869-o(1)$. Moreover, we show that there are distributions that admit only a constant ratio $c < 1$, even when $n \to \infty$. Finally, when each box can be tested multiple times (with $n$ tests in total), we design an algorithm that achieves a ratio of $1-o(1)$.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.