Convex optimization with equality and inequality constraints is a ubiquitous problem in several optimization and control problems in large-scale systems. Recently there has been a lot of interest in establishing accelerated convergence of the loss function. A class of high-order tuners was recently proposed in an effort to lead to accelerated convergence for the case when no constraints are present. In this paper, we propose a new high-order tuner that can accommodate the presence of equality constraints. In order to accommodate the underlying box constraints, time-varying gains are introduced in the high-order tuner which leverage convexity and ensure anytime feasibility of the constraints. Numerical examples are provided to support the theoretical derivations.
The distributed computation of a Nash equilibrium in aggregative games is gaining increased traction in recent years. Of particular interest is the mediator-free scenario where individual players only access or observe the decisions of their neighbors due to practical constraints. Given the competitive rivalry among participating players, protecting the privacy of individual players becomes imperative when sensitive information is involved. We propose a fully distributed equilibrium-computation approach for aggregative games that can achieve both rigorous differential privacy and guaranteed computation accuracy of the Nash equilibrium. This is in sharp contrast to existing differential-privacy solutions for aggregative games that have to either sacrifice the accuracy of equilibrium computation to gain rigorous privacy guarantees, or allow the cumulative privacy budget to grow unbounded, hence losing privacy guarantees, as iteration proceeds. Our approach uses independent noises across players, thus making it effective even when adversaries have access to all shared messages as well as the underlying algorithm structure. The encryption-free nature of the proposed approach, also ensures efficiency in computation and communication. The approach is also applicable in stochastic aggregative games, able to ensure both rigorous differential privacy and guaranteed computation accuracy of the Nash equilibrium when individual players only have stochastic estimates of their pseudo-gradient mappings. Numerical comparisons with existing counterparts confirm the effectiveness of the proposed approach.
Stochastic optimization has found wide applications in minimizing objective functions in machine learning, which motivates a lot of theoretical studies to understand its practical success. Most of existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization is much lagging behind. This is especially the case for nonconvex and nonsmooth problems often encountered in practice. In this paper, we initialize a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection on the gap between population gradients and empirical gradients, which is then further extended to study the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, these quantitative connection between stability and generalization in terms of either gradients or Moreau envelopes have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these discussions to derive error bounds for stochastic gradient descent and its adaptive variant, where we show how to achieve an implicit regularization by tuning the step sizes and the number of iterations.
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual PC algorithm is a novel scheme to carry out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in terms of run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies for Gaussian copula models, and demonstrate its performance in that setting.
In conventional supervised classification, true labels are required for individual instances. However, it could be prohibitive to collect the true labels for individual instances, due to privacy concerns or unaffordable annotation costs. This motivates the study on classification from aggregate observations (CFAO), where the supervision is provided to groups of instances, instead of individual instances. CFAO is a generalized learning framework that contains various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses -- previous research failed to achieve this goal. Practically, our method works by weighing the importance of each label for each instance in the group, which provides purified supervision for the classifier to learn. Theoretically, our proposed method not only guarantees the risk consistency due to the unbiased risk estimator but also can be compatible with arbitrary losses. Extensive experiments on various problems of CFAO demonstrate the superiority of our proposed method.
Solving combinatorial optimization problems of the kind that can be codified by quadratic unconstrained binary optimization (QUBO) is a promising application of quantum computation. Some problems of this class suitable for practical applications such as the traveling salesman problem (TSP), the bin packing problem (BPP), or the knapsack problem (KP) have inequality constraints that require a particular cost function encoding. The common approach is the use of slack variables to represent the inequality constraints in the cost function. However, the use of slack variables considerably increases the number of qubits and operations required to solve these problems using quantum devices. In this work, we present an alternative method that does not require extra slack variables and consists of using an unbalanced penalization function to represent the inequality constraints in the QUBO. This function is characterized by larger penalization when the inequality constraint is not achieved than when it is. We evaluate our approach on the TSP, BPP, and KP, successfully encoding the optimal solution of the original optimization problem near the ground state cost Hamiltonian. Additionally, we employ D-Wave Advantage and D-Wave hybrid solvers to solve the BPP, surpassing the performance of the slack variables approach by achieving solutions for up to 29 items, whereas the slack variables approach only handles up to 11 items. This new approach can be used to solve combinatorial problems with inequality constraints with a reduced number of resources compared to the slack variables approach using quantum annealing or variational quantum algorithms.
We establish globally optimal solutions to a class of fractional optimization problems on a class of constraint sets, whose key characteristics are as follows: 1) The numerator and the denominator of the objective function are both convex, semi-algebraic, Lipschitz continuous and differentiable with Lipschitz continuous gradients on the constraint set. 2) The constraint set is closed, convex and semi-algebraic. Compared with Dinkelbach's approach, our novelty falls into the following aspects: 1) Dinkelbach's has to solve a concave maximization problem in each iteration, which is nontrivial to obtain a solution, while ours only needs to conduct one proximity gradient operation in each iteration. 2) Dinkelbach's requires at least one nonnegative point for the numerator to proceed the algorithm, but ours does not, which is available to a much wider class of situations. 3) Dinkelbach's requires a closed and bounded constraint set, while ours only needs the closedness but not necessarily the boundedness. Therefore, our approach is viable for many more practical models, like optimizing the Sharpe ratio (SR) or the Information ratio in mathematical finance. Numerical experiments show that our approach achieves the ground-truth solutions in two simple examples. For real-world financial data, it outperforms several existing approaches for SR maximization.
Integer linear programming (ILP) models a wide range of practical combinatorial optimization problems and has significant impacts in industry and management sectors. This work proposes new characterizations of ILP with the concept of boundary solutions. Motivated by the new characterizations, we develop an efficient local search solver, which is the first local search solver for general ILP validated on a large heterogeneous problem dataset. We propose a new local search framework that switches between three modes, namely Search, Improve, and Restore modes. We design tailored operators adapted to different modes, thus improving the quality of the current solution according to different situations. For the Search and Restore modes, we propose an operator named tight move, which adaptively modifies variables' values, trying to make some constraint tight. For the Improve mode, an efficient operator lift move is proposed to improve the quality of the objective function while maintaining feasibility. Putting these together, we develop a local search solver for integer linear programming called Local-ILP. Experiments conducted on the MIPLIB dataset show the effectiveness of our solver in solving large-scale hard integer linear programming problems within a reasonably short time. Local-ILP is competitive and complementary to the state-of-the-art commercial solver Gurobi and significantly outperforms the state-of-the-art non-commercial solver SCIP. Moreover, our solver establishes new records for 6 MIPLIB open instances. The theoretical analysis of our algorithm is also presented, which shows our algorithm could avoid visiting unnecessary regions and also maintain good connectivity of targeted solutions.
Network interference, where the outcome of an individual is affected by the treatment assignment of those in their social network, is pervasive in real-world settings. However, it poses a challenge to estimating causal effects. We consider the task of estimating the total treatment effect (TTE), or the difference between the average outcomes of the population when everyone is treated versus when no one is, under network interference. Under a Bernoulli randomized design, we provide an unbiased estimator for the TTE when network interference effects are constrained to low order interactions among neighbors of an individual. We make no assumptions on the graph other than bounded degree, allowing for well-connected networks that may not be easily clustered. We derive a bound on the variance of our estimator and show in simulated experiments that it performs well compared with standard estimators for the TTE. We also derive a minimax lower bound on the mean squared error of our estimator which suggests that the difficulty of estimation can be characterized by the degree of interactions in the potential outcomes model. We also prove that our estimator is asymptotically normal under boundedness conditions on the network degree and potential outcomes model. Central to our contribution is a new framework for balancing model flexibility and statistical complexity as captured by this low order interactions structure.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.