In real-world decision-making, handling uncertainty is important yet difficult. Stochastic dominance provides a theoretically sound approach for comparing uncertain quantities, but optimization with stochastic dominance constraints is often computationally expensive, which limits practical applicability. In this paper, we develop a simple yet efficient approach to this problem, the Light Stochastic Dominance Solver (light-SD), which leverages useful properties of the Lagrangian. We recast the inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses the apparent intractability and leads to tractable updates or even closed-form solutions for gradient calculations. We prove convergence of the algorithm and test it empirically. The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
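For concreteness, a common way to pose such a problem (standard notation; not necessarily the exact formulation used by light-SD) constrains the random outcome G(x,\xi) of a decision x to second-order stochastically dominate a benchmark Y, and dualizes that constraint:
\[
\max_{x}\ \mathbb{E}\big[u\big(G(x,\xi)\big)\big] \quad \text{s.t.} \quad \mathbb{E}\big[(\eta - G(x,\xi))_+\big] \le \mathbb{E}\big[(\eta - Y)_+\big] \ \ \forall \eta,
\]
\[
\mathcal{L}(x,\lambda) = \mathbb{E}\big[u\big(G(x,\xi)\big)\big] - \int \lambda(\eta)\,\Big(\mathbb{E}\big[(\eta - G(x,\xi))_+\big] - \mathbb{E}\big[(\eta - Y)_+\big]\Big)\,\mathrm{d}\eta, \qquad \lambda \ge 0.
\]
The inner optimization over the multiplier function \lambda is the piece the abstract proposes to approximate with a learned surrogate.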
Graph Neural Networks (GNNs) have received extensive research attention for their promising performance in graph machine learning. Despite their extraordinary predictive accuracy, existing approaches, such as GCN and GPRGNN, are not robust in the face of homophily changes on test graphs, rendering these models vulnerable to graph structural attacks and limiting their capacity to generalize to graphs with varied homophily levels. Although many methods have been proposed to improve the robustness of GNN models, most of these techniques are restricted to the spatial domain and employ complicated defense mechanisms, such as learning new graph structures or calculating edge attentions. In this paper, we study the problem of designing simple and robust GNN models in the spectral domain. We propose EvenNet, a spectral GNN corresponding to an even-polynomial graph filter. Based on our theoretical analysis in both spatial and spectral domains, we demonstrate that EvenNet outperforms full-order models in generalizing across homophilic and heterophilic graphs, implying that ignoring odd-hop neighbors improves the robustness of GNNs. We conduct experiments on both synthetic and real-world datasets to demonstrate the effectiveness of EvenNet. Notably, EvenNet outperforms existing defense models against structural attacks without introducing additional computational costs and remains competitive in traditional node classification tasks on homophilic and heterophilic graphs.
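As a sketch of the underlying idea (with notation assumed here rather than quoted from the paper), an order-2K even-polynomial spectral filter applied to node features X via a normalized graph Laplacian L reads
\[
Z \;=\; \sum_{k=0}^{K} w_k\, L^{2k} X,
\]
so only even powers of L, and hence only even-hop propagation, contribute; a full-order filter \sum_{k} w_k L^{k} additionally mixes information from odd-hop neighbors, which is the part EvenNet discards.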
Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient boosted decision trees (GBDT) are popular in these settings due to their performance guarantees and low cost. However, in consequential decision-making, fairness is a foremost concern. Despite GBDT's popularity, existing in-processing Fair ML methods are either inapplicable to GBDT, incur significant training-time overhead, or are inadequate for problems with high class imbalance -- a typical issue in these domains. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we employ a "proxy-Lagrangian" formulation using smooth convex error-rate proxies to enable gradient-based optimization. Our implementation shows an order-of-magnitude speedup in training time compared with related work, a pivotal aspect for fostering the widespread adoption of FairGBM by real-world practitioners.
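Schematically (notation assumed here, not taken from the paper), such a constrained boosting problem is a min-max problem whose multipliers are updated by dual ascent:
\[
\min_{f}\ \max_{\lambda \ge 0}\ \ L_{\text{pred}}(f) + \sum_{j} \lambda_j\, \tilde{c}_j(f), \qquad \lambda_j \leftarrow \big[\lambda_j + \eta_{\lambda}\, \tilde{c}_j(f)\big]_+,
\]
where each \tilde{c}_j is a smooth convex proxy for an observational fairness constraint (e.g., a bound on the gap between groupwise surrogate error rates), and the gradient of the resulting Lagrangian with respect to f is what the GBDT tree-fitting step consumes.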
When training Convolutional Neural Networks (CNNs), much of the emphasis falls on designing efficient optimization algorithms and highly accurate networks. The state-of-the-art approach to optimizing these networks relies on gradient-descent algorithms, such as Stochastic Gradient Descent (SGD). However, gradient-descent methods have limitations, the major drawback being a lack of exploration and an over-reliance on exploitation. Hence, this research analyzes an alternative approach to optimizing neural network (NN) weights using population-based metaheuristic algorithms. A hybrid of the Grey Wolf Optimizer (GWO) and Genetic Algorithms (GA) is explored in conjunction with SGD, producing a Genetically Modified Wolf optimization algorithm boosted with SGD (GMW-SGD). This algorithm combines exploitation and exploration while also tackling the issue of high dimensionality, which degrades the performance of standard metaheuristic algorithms. The proposed algorithm was trained and tested on CIFAR-10, where it performs comparably to SGD, reaching high test accuracy, and significantly outperforms standard metaheuristic algorithms.
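As a toy illustration of how GWO-style position updates, GA-style mutation, and SGD refinement can be interleaved (a minimal sketch on a synthetic quadratic loss; the objective, population size, and schedule are placeholders and not the paper's CIFAR-10 setup):

```python
import numpy as np

# Stand-in objective for a network loss, and its gradient.
def loss(w):
    return np.sum((w - 3.0) ** 2)

def grad(w):
    return 2.0 * (w - 3.0)

rng = np.random.default_rng(0)
dim, n_wolves, T = 20, 12, 100
lr = 0.05                      # SGD refinement step size (illustrative)
mut_rate, mut_std = 0.1, 0.1   # GA-style mutation parameters (illustrative)

X = rng.normal(size=(n_wolves, dim))   # wolf positions = candidate weight vectors

for t in range(T):
    fitness = np.array([loss(x) for x in X])
    alpha, beta, delta = X[np.argsort(fitness)[:3]]   # three best wolves
    a = 2.0 - 2.0 * t / T                             # exploration coefficient decays linearly

    for i in range(n_wolves):
        new_pos = np.zeros(dim)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            D = np.abs(C * leader - X[i])
            new_pos += leader - A * D
        X[i] = new_pos / 3.0          # GWO leader-guided update

        # GA-style mutation injects extra exploration.
        mask = rng.random(dim) < mut_rate
        X[i][mask] += rng.normal(scale=mut_std, size=mask.sum())

        # SGD step provides local exploitation.
        X[i] -= lr * grad(X[i])

best = X[np.argmin([loss(x) for x in X])]
print("best loss:", loss(best))
```

Here the leader-guided averaging drives exploration, the mutation injects diversity, and the gradient step supplies local exploitation, mirroring the exploration/exploitation balance described above.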
Many real-world applications of reinforcement learning (RL) require making decisions in continuous action environments. In particular, determining the optimal dose level plays a vital role in developing medical treatment regimes. One challenge in adapting existing RL algorithms to medical applications, however, is that the popular infinite-support stochastic policies, e.g., the Gaussian policy, may assign dangerously high dosages and seriously harm patients. Hence, it is important to induce a policy class whose support contains only near-optimal actions, shrinking the action-search area for effectiveness and reliability. To achieve this, we develop a novel \emph{quasi-optimal learning algorithm}, which can be easily optimized in off-policy settings with guaranteed convergence under general function approximation. Theoretically, we analyze the consistency, sample complexity, adaptability, and convergence of the proposed algorithm. We evaluate our algorithm through comprehensive simulated experiments and a real-world dose-suggestion application on the Ohio Type 1 diabetes dataset.
We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons, which allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also introduce two approaches for handling the inherent uncertainty of learning from data. First, we characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and propose a more flexible formulation to deal with low-density regions and high-dimensional datasets. Then, we propose an ensemble learning approach that enforces constraint satisfaction over multiple bootstrapped estimators or multiple algorithms. In combination with domain-driven components, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both World Food Programme planning and chemotherapy optimization. The case studies illustrate the framework's ability to generate high-quality prescriptions as well as the value added by the trust region, the use of ensembles to control model robustness, the consideration of multiple machine learning methods, and the inclusion of multiple learned constraints.
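To make the embedding concrete (illustrative notation, not OptiCL's API), a learned linear constraint together with a convex-hull trust region over the observed points x^{(1)},\dots,x^{(n)} enters the mixed-integer formulation as
\[
\hat{w}^{\top} x + \hat{b} \le \tau, \qquad x = \sum_{i=1}^{n} \lambda_i\, x^{(i)}, \qquad \sum_{i=1}^{n} \lambda_i = 1, \qquad \lambda_i \ge 0,
\]
with decision trees, ensembles, and ReLU networks admitting analogous (if larger) mixed-integer encodings through indicator and big-M constraints; the column generation mentioned above can then introduce the \lambda_i variables incrementally instead of enumerating every observation up front.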
In this paper, I try to tame "Basu's elephants" (data with extreme selection on observables). I propose new practical large-sample and finite-sample methods for estimating and inferring heterogeneous causal effects (under unconfoundedness) in the empirically relevant context of limited overlap. I develop a general principle called "Stable Probability Weighting" (SPW) that can be used as an alternative to the widely used Inverse Probability Weighting (IPW) technique, which relies on strong overlap. I show that IPW (or its augmented version), when valid, is a special case of the more general SPW (or its doubly robust version), which adjusts for the extremeness of the conditional probabilities of the treatment states. The SPW principle can be implemented using several existing large-sample parametric, semiparametric, and nonparametric procedures for conditional moment models. In addition, I provide new finite-sample results that apply when unconfoundedness is plausible within fine strata. Since IPW estimation relies on the problematic reciprocal of the estimated propensity score, I develop a "Finite-Sample Stable Probability Weighting" (FPW) set-estimator that is unbiased in a sense. I also propose new finite-sample inference methods for testing a general class of weak null hypotheses. The associated computationally convenient methods, which can be used to construct valid confidence sets and to bound the finite-sample confidence distribution, are of independent interest. My large-sample and finite-sample frameworks extend to the setting of multivalued treatments.
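For reference, in standard notation (not the paper's), the IPW estimand for the average treatment effect divides by the propensity score e(X) = P(D = 1 \mid X):
\[
\tau_{\text{IPW}} \;=\; \mathbb{E}\!\left[ \frac{D\,Y}{e(X)} \;-\; \frac{(1-D)\,Y}{1 - e(X)} \right],
\]
which becomes unstable when e(X) is close to 0 or 1 (limited overlap); as described above, the SPW principle instead adjusts for the extremeness of these conditional probabilities rather than inverting them directly.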
Thanks to their easy implementation via Radial Basis Functions (RBFs), meshfree kernel methods have proved to be an effective tool for, e.g., scattered data interpolation, PDE collocation, and classification and regression tasks. Their accuracy can depend on a length-scale hyperparameter, which is often tuned via cross-validation schemes. Here we leverage approaches and tools from the machine learning community to introduce two-layered kernel machines, which generalize the classical RBF approaches that rely on a single hyperparameter. Indeed, the proposed learning strategy returns a kernel that is optimized not only along the Euclidean directions but that further incorporates kernel rotations. The kernel optimization is shown to be robust by using recently improved calculations of cross-validation scores. Finally, the use of greedy approaches, and specifically of the Vectorial Kernel Orthogonal Greedy Algorithm (VKOGA), allows us to construct an optimized basis that adapts to the data. Beyond a rigorous analysis of the convergence of the resulting two-layered (2L)-VKOGA, its benefits are highlighted on both synthetic and real benchmark data sets.
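Concretely, a two-layered kernel of the kind described above (notation assumed here) replaces the single length-scale \varepsilon of a radial kernel by a full matrix A learned in the first layer,
\[
k_{\varepsilon}(x, y) = \varphi\big(\varepsilon \|x - y\|\big) \quad \longrightarrow \quad k_{A}(x, y) = \varphi\big(\|A(x - y)\|\big),
\]
so that A encodes anisotropic scalings and rotations of the input space, while the second layer is the usual kernel expansion whose centers are selected greedily by VKOGA.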
Tuning optimizer hyperparameters, notably the learning rate, to a particular optimization instance is an important but nonconvex problem; iterative methods such as hypergradient descent therefore lack global optimality guarantees in general. We propose an online nonstochastic control methodology for mathematical optimization. The choice of hyperparameters for gradient-based methods, including the learning rate, momentum parameter, and preconditioner, is cast as feedback control. The optimal solution to this control problem is shown to encompass preconditioned adaptive gradient methods with varying acceleration and momentum parameters. Although the optimal control problem is itself nonconvex, we show how recent methods from online nonstochastic control based on convex relaxation can be applied to compete with the best offline solution. This guarantees that, in episodic optimization, we converge to the best optimization method in hindsight.
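Schematically (notation assumed here rather than taken from the paper), the iterates of a gradient-based method form a dynamical system driven by control inputs,
\[
x_{t+1} \;=\; x_t \;-\; \eta_t\, P_t\, \nabla f(x_t) \;+\; \beta_t\,(x_t - x_{t-1}),
\]
where the learning rate \eta_t, preconditioner P_t, and momentum \beta_t play the role of controls chosen by a feedback policy; the online nonstochastic control machinery then competes with the best fixed choice of these controls in hindsight.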
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations of differential equations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
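The prototypical example is the neural ordinary differential equation, with a residual network recovered as its explicit Euler discretisation:
\[
\frac{\mathrm{d}z(t)}{\mathrm{d}t} = f_{\theta}\big(t, z(t)\big), \quad z(0) = x, \qquad\Longrightarrow\qquad z_{k+1} = z_k + h\, f_{\theta}(t_k, z_k),
\]
which makes precise the sense in which residual networks are discretisations of differential equations.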
We propose a new method for the event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via a generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. The EE task benefits from these dynamic rewards because instances and labels vary in difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive a high gain -- whereas traditional RL models usually neglect such differences and pay equal attention to all instances. Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.
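In standard adversarial imitation learning (which this framework instantiates for EE; the exact reward shaping is assumed here, not quoted from the paper), the GAN discriminator D supplies the dynamic reward, for instance
\[
r(s, a) \;=\; \log D(s, a) \;-\; \log\big(1 - D(s, a)\big),
\]
with D scoring how expert-like a state-action pair appears, so that hard-but-correct decisions (e.g., an ambiguous yet correctly detected trigger) can earn larger gains than easy, unambiguous ones.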