This work studies minimization problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth and possibly satisfies additional properties. We consider two kinds of zero-order projected gradient descent algorithms, which differ in the form of the gradient estimator. The first algorithm uses a gradient estimator based on randomization over the $\ell_2$ sphere due to Bach and Perchet (2016). We present an improved analysis of this algorithm on the class of highly smooth and strongly convex functions studied in the prior work, and we derive rates of convergence for two more general classes of non-convex functions. Namely, we consider highly smooth functions satisfying the Polyak-{\L}ojasiewicz condition and the class of highly smooth functions with no additional property. The second algorithm is based on randomization over the $\ell_1$ sphere, and it extends to the highly smooth setting the algorithm that was recently proposed for Lipschitz convex functions in Akhavan et al. (2022). We show that, in the case of noiseless oracle, this novel algorithm enjoys better bounds on bias and variance than the $\ell_2$ randomization and the commonly used Gaussian randomization algorithms, while in the noisy case both $\ell_1$ and $\ell_2$ algorithms benefit from similar improved theoretical guarantees. The improvements are achieved thanks to a new proof techniques based on Poincar\'e type inequalities for uniform distributions on the $\ell_1$ or $\ell_2$ spheres. The results are established under weak (almost adversarial) assumptions on the noise. Moreover, we provide minimax lower bounds proving optimality or near optimality of the obtained upper bounds in several cases.
The vertex cover problem is a fundamental and widely studied combinatorial optimization problem. It is known that its standard linear programming relaxation is integral for bipartite graphs and half-integral for general graphs. As a consequence, the natural rounding algorithm based on this relaxation computes an optimal solution for bipartite graphs and a $2$-approximation for general graphs. This raises the question of whether one can interpolate the rounding curve of the standard linear programming relaxation in a beyond the worst-case manner, depending on how close the graph is to being bipartite. In this paper, we consider a simple rounding algorithm that exploits the knowledge of an induced bipartite subgraph to attain improved approximation ratios. Equivalently, we suppose that we work with a pair $(G, S)$, consisting of a graph with an odd cycle transversal. If $S$ is a stable set, we prove a tight approximation ratio of $1 + 1/\rho$, where $2\rho -1$ denotes the odd girth (i.e., length of the shortest odd cycle) of the contracted graph $\tilde{G} := G /S$ and satisfies $\rho \in [2,\infty]$. If $S$ is an arbitrary set, we prove a tight approximation ratio of $\left(1+1/\rho \right) (1 - \alpha) + 2 \alpha$, where $\alpha \in [0,1]$ is a natural parameter measuring the quality of the set $S$. The technique used to prove tight improved approximation ratios relies on a structural analysis of the contracted graph $\tilde{G}$. Tightness is shown by constructing classes of weight functions matching the obtained upper bounds. As a byproduct of the structural analysis, we obtain improved tight bounds on the integrality gap and the fractional chromatic number of 3-colorable graphs. We also discuss algorithmic applications in order to find good odd cycle transversals and show optimality of the analysis.
Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, it is still an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are more scarce. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find better solutions close to optimality without requiring a large amounts of backtracks while being generic.
This work proposes a wavelet shrinkage rule under asymmetric LINEX loss function and a mixture of a point mass function at zero and the logistic distribution as prior distribution to the wavelet coefficients in a nonparametric regression model with gaussian error. Underestimation of a significant wavelet coefficient can lead to a bad detection of features of the unknown function such as peaks, discontinuities and oscillations. It can also occur under asymmetrically distributed wavelet coefficients. Thus the proposed rule is suitable when overestimation and underestimation have asymmetric losses. Statistical properties of the rule such as squared bias, variance, frequentist and bayesian risks are obtained. Simulation studies are conducted to evaluate the performance of the rule against standard methods and an application in a real dataset involving infrared spectra is provided.
In this work, we consider the problem of distributed computing of functions of structured sources, focusing on the classical setting of two correlated sources and one user that seeks the outcome of the function while benefiting from low-rate side information provided by a helper node. Focusing on the case where the sources are jointly distributed according to a very general mixture model, we here provide an achievable coding scheme that manages to substantially reduce the communication cost of distributed computing by exploiting the nature of the joint distribution of the sources, the side information, as well as the symmetry enjoyed by the desired functions. Our scheme -- which can readily apply in a variety of real-life scenarios including learning, combinatorics, and graph neural network applications -- is here shown to provide substantial reductions in the communication costs, while simultaneously providing computational savings by reducing the exponential complexity of joint decoding techniques to a complexity that is merely linear.
Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification of these algorithms. First, we formulate direct policy optimization in the optimization by continuation framework. The latter is a framework for optimizing nonconvex functions where a sequence of surrogate objective functions, called continuations, are locally optimized. Second, we show that optimizing affine Gaussian policies and performing entropy regularization can be interpreted as implicitly optimizing deterministic policies by continuation. Based on these theoretical results, we argue that exploration in policy-gradient algorithms consists in computing a continuation of the return of the policy at hand, and that the variance of policies should be history-dependent functions adapted to avoid local extrema rather than to maximize the return of the policy.
The selection of Gaussian kernel parameters plays an important role in the applications of support vector classification (SVC). A commonly used method is the k-fold cross validation with grid search (CV), which is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new approach is proposed to train SVC and optimize the selection of Gaussian kernel parameters. We first formulate the training and parameter selection of SVC as a minimax optimization problem named as MaxMin-L2-SVC-NCH, in which the minimization problem is an optimization problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem is an optimization problem of finding the optimal Gaussian kernel parameters. A lower time complexity can be expected in MaxMin-L2-SVC-NCH because CV is not needed. We then propose a projected gradient algorithm (PGA) for training L2-SVC-NCH. The famous sequential minimal optimization (SMO) algorithm is a special case of the PGA. Thus, the PGA can provide more flexibility than the SMO. Furthermore, the solution of the maximization problem is done by a gradient ascent algorithm with dynamic learning rate. The comparative experiments between MaxMin-L2-SVC-NCH and the previous best approaches on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while maintaining competitive test accuracy. These findings indicate that MaxMin-L2-SVC-NCH is a better choice for SVC tasks.
Randomized controlled trials (RCTs) are a cornerstone of comparative effectiveness because they remove the confounding bias present in observational studies. However, RCTs are typically much smaller than observational studies because of financial and ethical considerations. Therefore it is of great interest to be able to incorporate plentiful observational data into the analysis of smaller RCTs. Previous estimators developed for this purpose rely on unrealistic additional assumptions without which the added data can bias the effect estimate. Recent work proposed an alternative method (prognostic adjustment) that imposes no additional assumption and increases efficiency in the analysis of RCTs. The idea is to use the observational data to learn a prognostic model: a regression of the outcome onto the covariates. The predictions from this model, generated from the RCT subjects' baseline variables, are used as a covariate in a linear model. In this work, we extend this framework to work when conducting inference with nonparametric efficient estimators in trial analysis. Using simulations, we find that this approach provides greater power (i.e., smaller standard errors) than without prognostic adjustment, especially when the trial is small. We also find that the method is robust to observed or unobserved shifts between the observational and trial populations and does not introduce bias. Lastly, we showcase this estimator leveraging real-world historical data on a randomized blood transfusion study of trauma patients.
Images captured in poorly lit conditions are often corrupted by acquisition noise. Leveraging recent advances in graph-based regularization, we propose a fast Retinex-based restoration scheme that denoises and contrast-enhances an image. Specifically, by Retinex theory we first assume that each image pixel is a multiplication of its reflectance and illumination components. We next assume that the reflectance and illumination components are piecewise constant (PWC) and continuous piecewise planar (PWP) signals, which can be recovered via graph Laplacian regularizer (GLR) and gradient graph Laplacian regularizer (GGLR) respectively. We formulate quadratic objectives regularized by GLR and GGLR, which are minimized alternately until convergence by solving linear systems -- with improved condition numbers via proposed preconditioners -- via conjugate gradient (CG) efficiently. Experimental results show that our algorithm achieves competitive visual image quality while reducing computation complexity noticeably.
In this study, we propose an improvement to the direct mating method, a constraint handling approach for multi-objective evolutionary algorithms, by hybridizing it with local mating. Local mating selects another parent from the feasible solution space around the initially selected parent. The direct mating method selects the other parent along the optimal direction in the objective space after the first parent is selected, even if it is infeasible. It shows better exploration performance for constraint optimization problems with coupling NSGA-II, but requires several individuals along the optimal direction. Due to the lack of better solutions dominated by the optimal direction from the first parent, direct mating becomes difficult as the generation proceeds. To address this issue, we propose a hybrid method that uses local mating to select another parent from the neighborhood of the first selected parent, maintaining diversity around good solutions and helping the direct mating process. We evaluate the proposed method on three mathematical problems with unique Pareto fronts and two real-world applications. We use the generation histories of the averages and standard deviations of the hypervolumes as the performance evaluation criteria. Our investigation results show that the proposed method can solve constraint multi-objective problems better than existing methods while maintaining high diversity.
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.