This paper deals with the scenario approach to robust optimization. This relies on a random sampling of the possibly infinite number of constraints induced by uncertainties in the parameters of an optimization problem. Solving the resulting random program yields a solution for which the quality is measured in terms of the probability of violating the constraints for a random value of the uncertainties, typically unseen before. Another central issue is the determination of the sample complexity, i.e., the number of random constraints (or scenarios) that one must consider in order to guarantee a certain level of reliability. In this paper, we introduce the notion of margin to improve upon standard results in this field. In particular, using tools from statistical learning theory, we show that the sample complexity of a class of random programs does not explicitly depend on the number of variables. In addition, within the considered class, that includes polynomial constraints among others, this result holds for both convex and nonconvex instances with the same level of guarantees. We also derive a posteriori bounds on the probability of violation and sketch a regularization approach that could be used to improve the reliability of computed solutions on the basis of these bounds.
A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable objective function (that may be nonconvex) subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm is unique from other interior-point methods for solving smooth (nonconvex) optimization problems since the search directions are computed using stochastic gradient estimates. It is also unique in its use of inner neighborhoods of the feasible region -- defined by a positive and vanishing neighborhood-parameter sequence -- in which the iterates are forced to remain. It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings. The results of numerical experiments show that in both settings the algorithm can outperform a projected-(stochastic)-gradient method.
We present algorithms based on satisfiability problem (SAT) solving, as well as answer set programming (ASP), for solving the problem of determining inconsistency degrees in propositional knowledge bases. We consider six different inconsistency measures whose respective decision problems lie on the first level of the polynomial hierarchy. Namely, these are the contension inconsistency measure, the forgetting-based inconsistency measure, the hitting set inconsistency measure, the max-distance inconsistency measure, the sum-distance inconsistency measure, and the hit-distance inconsistency measure. In an extensive experimental analysis, we compare the SAT-based and ASP-based approaches with each other, as well as with a set of naive baseline algorithms. Our results demonstrate that overall, both the SAT-based and the ASP-based approaches clearly outperform the naive baseline methods in terms of runtime. The results further show that the proposed ASP-based approaches perform superior to the SAT-based ones with regard to all six inconsistency measures considered in this work. Moreover, we conduct additional experiments to explain the aforementioned results in greater detail.
To estimate causal effects, analysts performing observational studies in health settings utilize several strategies to mitigate bias due to confounding by indication. There are two broad classes of approaches for these purposes: use of confounders and instrumental variables (IVs). Because such approaches are largely characterized by untestable assumptions, analysts must operate under an indefinite paradigm that these methods will work imperfectly. In this tutorial, we formalize a set of general principles and heuristics for estimating causal effects in the two approaches when the assumptions are potentially violated. This crucially requires reframing the process of observational studies as hypothesizing potential scenarios where the estimates from one approach are less inconsistent than the other. While most of our discussion of methodology centers around the linear setting, we touch upon complexities in non-linear settings and flexible procedures such as target minimum loss-based estimation (TMLE) and double machine learning (DML). To demonstrate the application of our principles, we investigate the use of donepezil off-label for mild cognitive impairment (MCI). We compare and contrast results from confounder and IV methods, traditional and flexible, within our analysis and to a similar observational study and clinical trial.
We present a dimension-incremental algorithm for the nonlinear approximation of high-dimensional functions in an arbitrary bounded orthonormal product basis. Our goal is to detect a suitable truncation of the basis expansion of the function, where the corresponding basis support is assumed to be unknown. Our method is based on point evaluations of the considered function and adaptively builds an index set of a suitable basis support such that the approximately largest basis coefficients are still included. For this purpose, the algorithm only needs a suitable search space that contains the desired index set. Throughout the work, there are various minor modifications of the algorithm discussed as well, which may yield additional benefits in several situations. For the first time, we provide a proof of a detection guarantee for such an index set in the function approximation case under certain assumptions on the sub-methods used within our algorithm, which can be used as a foundation for similar statements in various other situations as well. Some numerical examples in different settings underline the effectiveness and accuracy of our method.
This paper presents an accelerated proximal gradient method for multiobjective optimization, in which each objective function is the sum of a continuously differentiable, convex function and a closed, proper, convex function. Extending first-order methods for multiobjective problems without scalarization has been widely studied, but providing accelerated methods with accurate proofs of convergence rates remains an open problem. Our proposed method is a multiobjective generalization of the accelerated proximal gradient method, also known as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), for scalar optimization. The key to this successful extension is solving a subproblem with terms exclusive to the multiobjective case. This approach allows us to demonstrate the global convergence rate of the proposed method ($O(1 / k^2)$), using a merit function to measure the complexity. Furthermore, we present an efficient way to solve the subproblem via its dual representation, and we confirm the validity of the proposed method through some numerical experiments.
Survival analysis is a crucial semi-supervised task in machine learning with numerous real-world applications, particularly in healthcare. Currently, the most common approach to survival analysis is based on Cox's partial likelihood, which can be interpreted as a ranking model optimized on a lower bound of the concordance index. This relation between ranking models and Cox's partial likelihood considers only pairwise comparisons. Recent work has developed differentiable sorting methods which relax this pairwise independence assumption, enabling the ranking of sets of samples. However, current differentiable sorting methods cannot account for censoring, a key factor in many real-world datasets. To address this limitation, we propose a novel method called Diffsurv. We extend differentiable sorting methods to handle censored tasks by predicting matrices of possible permutations that take into account the label uncertainty introduced by censored samples. We contrast this approach with methods derived from partial likelihood and ranking losses. Our experiments show that Diffsurv outperforms established baselines in various simulated and real-world risk prediction scenarios. Additionally, we demonstrate the benefits of the algorithmic supervision enabled by Diffsurv by presenting a novel method for top-k risk prediction that outperforms current methods.
The scene graph is a new data structure describing objects and their pairwise relationship within image scenes. As the size of scene graph in vision applications grows, how to losslessly and efficiently store such data on disks or transmit over the network becomes an inevitable problem. However, the compression of scene graph is seldom studied before because of the complicated data structures and distributions. Existing solutions usually involve general-purpose compressors or graph structure compression methods, which is weak at reducing redundancy for scene graph data. This paper introduces a new lossless compression framework with adaptive predictors for joint compression of objects and relations in scene graph data. The proposed framework consists of a unified prior extractor and specialized element predictors to adapt for different data elements. Furthermore, to exploit the context information within and between graph elements, Graph Context Convolution is proposed to support different graph context modeling schemes for different graph elements. Finally, a learned distribution model is devised to predict numerical data under complicated conditional constraints. Experiments conducted on labeled or generated scene graphs proves the effectiveness of the proposed framework in scene graph lossless compression task.
We consider the computations of the action ground state for a rotating nonlinear Schr\"odinger equation. It reads as a minimization of the action functional under the Nehari constraint. In the focusing case, we identify an equivalent formulation of the problem which simplifies the constraint. Based on it, we propose a normalized gradient flow method with asymptotic Lagrange multiplier and establish the energy-decaying property. Popular optimization methods are also applied to gain more efficiency. In the defocusing case, we prove that the ground state can be obtained by the unconstrained minimization. Then the direct gradient flow method and unconstrained optimization methods are applied. Numerical experiments show the convergence and accuracy of the proposed methods in both cases, and comparisons on the efficiency are discussed. Finally, the relation between the action and the energy ground states are numerically investigated.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.