This paper considers the regularization continuation method and the trust-region updating strategy for the nonlinearly equality-constrained optimization problem. Namely, it uses the inverse of the regularization quasi-Newton matrix as the pre-conditioner to improve its computational efficiency in the well-posed phase, and it adopts the inverse of the regularization two-sided projection of the Hessian as the pre-conditioner to improve its robustness in the ill-conditioned phase. Since it only solves a linear system of equations at every iteration and the sequential quadratic programming (SQP) needs to solve a quadratic programming subproblem at every iteration, it is faster than SQP. Numerical results also show that it is more robust and faster than SQP (the built-in subroutine fmincon.m of the MATLAB2020a environment and the subroutine SNOPT executed in GAMS v28.2 (2019) environment). The computational time of the new method is about one third of that of fmincon.m for the large-scale problem. Finally, the global convergence analysis of the new method is also given.
This paper develops projection-free algorithms for online convex optimization with stochastic constraints. We design an online primal-dual projection-free framework that can take any projection-free algorithms developed for online convex optimization with no long-term constraint. With this general template, we deduce sublinear regret and constraint violation bounds for various settings. Moreover, for the case where the loss and constraint functions are smooth, we develop a primal-dual conditional gradient method that achieves $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations. Furthermore, for the setting where the loss and constraint functions are stochastic and strong duality holds for the associated offline stochastic optimization problem, we prove that the constraint violation can be reduced to have the same asymptotic growth as the regret.
It is very well-known that when the exact line search gradient descent method is applied to a convex quadratic objective, the worst case rate of convergence (among all seed vectors) deteriorates as the condition number of the Hessian of the objective grows. By an elegant analysis by H. Akaike, it is generally believed -- but not proved -- that in the ill-conditioned regime the ROC for almost all initial vectors, and hence also the average ROC, is close to the worst case ROC. We complete Akaike's analysis using the theorem of center and stable manifolds. Our analysis also makes apparent the effect of an intermediate eigenvalue in the Hessian by establishing the following somewhat amusing result: In the absence of an intermediate eigenvalue, the average ROC gets arbitrarily fast -- not slow -- as the Hessian gets increasingly ill-conditioned. We discuss in passing some contemporary applications of exact line search GD to polynomial optimization problems arising from imaging and data sciences.
"Data is the new oil", in short, data would be the essential source of the ongoing fourth industrial revolution, which has led some commentators to assimilate too quickly the quantity of data to a source of wealth in itself, and consider the development of big data as an quasi direct cause of profit. Human resources management is not escaping this trend, and the accumulation of large amounts of data on employees is perceived by some entrepreneurs as a necessary and sufficient condition for the construction of predictive models of complex work behaviors such as absenteeism or job performance. In fact, the analogy is somewhat misleading: unlike oil, there are no major issues here concerning the production of data (whose flows are generated continuously and at low cost by various information systems), but rather their ''refining'', i.e. the operations necessary to transform this data into a useful product, namely into knowledge. This transformation is where the methodological challenges of data valuation lie, both for practitioners and for academic researchers. Considerations on the methods applicable to take advantage of the possibilities offered by these massive data are relatively recent, and often highlight the disruptive aspect of the current ''data deluge'' to point out that this evolution would be the source of a revival of empiricism in a ''fourth paradigm'' based on the intensive and ''agnostic'' exploitation of massive amounts of data in order to bring out new knowledge, following a purely inductive logic. Although we do not adopt this speculative point of view, it is clear that data-driven approaches are scarce in quantitative HRM studies. However, there are well-established methods, particularly in the field of data mining, which are based on inductive approaches. This area of quantitative analysis with an inductive aim is still relatively unexplored in HRM ( apart from typological analyses). The objective of this paper is first to give an overview of data driven methods that can be used for HRM research, before proposing an empirical illustration which consists in an exploratory research combining a latent profile analysis and an exploration by Gaussian graphical models.
Hybrid ensemble, an essential branch of ensembles, has flourished in the regression field, with studies confirming diversity's importance. However, previous ensembles consider diversity in the sub-model training stage, with limited improvement compared to single models. In contrast, this study automatically selects and weights sub-models from a heterogeneous model pool. It solves an optimization problem using an interior-point filtering linear-search algorithm. The objective function innovatively incorporates negative correlation learning as a penalty term, with which a diverse model subset can be selected. The best sub-models from each model class are selected to build the NCL ensemble, which performance is better than the simple average and other state-of-the-art weighting methods. It is also possible to improve the NCL ensemble with a regularization term in the objective function. In practice, it is difficult to conclude the optimal sub-model for a dataset prior due to the model uncertainty. Regardless, our method would achieve comparable accuracy as the potential optimal sub-models. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace diversity and accuracy.
Communication compression is an essential strategy for alleviating communication overhead by reducing the volume of information exchanged between computing nodes in large-scale distributed stochastic optimization. Although numerous algorithms with convergence guarantees have been obtained, the optimal performance limit under communication compression remains unclear. In this paper, we investigate the performance limit of distributed stochastic optimization algorithms employing communication compression. We focus on two main types of compressors, unbiased and contractive, and address the best-possible convergence rates one can obtain with these compressors. We establish the lower bounds for the convergence rates of distributed stochastic optimization in six different settings, combining strongly-convex, generally-convex, or non-convex functions with unbiased or contractive compressor types. To bridge the gap between lower bounds and existing algorithms' rates, we propose NEOLITHIC, a nearly optimal algorithm with compression that achieves the established lower bounds up to logarithmic factors under mild conditions. Extensive experimental results support our theoretical findings. This work provides insights into the theoretical limitations of existing compressors and motivates further research into fundamentally new compressor properties.
Quantile regression (QR) can be used to describe the comprehensive relationship between a response and predictors. Prior domain knowledge and assumptions in application are usually formulated as constraints of parameters to improve the estimation efficiency. This paper develops methods based on multi-block ADMM to fit general penalized QR with linear constraints of regression coefficients. Different formulations to handle the linear constraints and general penalty are explored and compared. The most efficient one has explicit expressions for each parameter and avoids nested-loop iterations in some existing algorithms. Additionally, parallel ADMM algorithm for big data is also developed when data are stored in a distributed fashion. The stopping criterion and convergence of the algorithm are established. Extensive numerical experiments and a real data example demonstrate the computational efficiency of the proposed algorithms. The details of theoretical proofs and different algorithm variations are presented in Appendix.
Inverse problems are in many cases solved with optimization techniques. When the underlying model is linear, first-order gradient methods are usually sufficient. With nonlinear models, due to nonconvexity, one must often resort to second-order methods that are computationally more expensive. In this work we aim to approximate a nonlinear model with a linear one and correct the resulting approximation error. We develop a sequential method that iteratively solves a linear inverse problem and updates the approximation error by evaluating it at the new solution. This treatment convexifies the problem and allows us to benefit from established convex optimization methods. We separately consider cases where the approximation is fixed over iterations and where the approximation is adaptive. In the fixed case we show theoretically under what assumptions the sequence converges. In the adaptive case, particularly considering the special case of approximation by first-order Taylor expansion, we show that with certain assumptions the sequence converges to a critical point of the original nonconvex functional. Furthermore, we show that with quadratic objective functions the sequence corresponds to the Gauss-Newton method. Finally, we showcase numerical results superior to the conventional model correction method. We also show, that a fixed approximation can provide competitive results with considerable computational speed-up.
An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed in the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, which means that running the algorithms to be compared in the same machine is not always possible. And even if they are available, some optimization algorithms might be costly to run, such as training large neural-networks in the cloud. In this paper, we consider the following problem: how do we compare the performance of a new optimization algorithm B with a known algorithm A in the literature if we only have the results (the objective values) and the runtime in each instance of algorithm A? Particularly, we present a methodology that enables a statistical analysis of the performance of algorithms executed in different machines. The proposed methodology has two parts. Firstly, we propose a model that, given the runtime of an algorithm in a machine, estimates the runtime of the same algorithm in another machine. This model can be adjusted so that the probability of estimating a runtime longer than what it should be is arbitrarily low. Secondly, we introduce an adaptation of the one-sided sign test that uses a modified \textit{p}-value and takes into account that probability. Such adaptation avoids increasing the probability of type I error associated with executing algorithms A and B in different machines.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.