Optimization problems are crucial in artificial intelligence. Optimization algorithms are generally used to adjust the performance of artificial intelligence models to minimize the error of mapping inputs to outputs. Current evaluation methods on optimization algorithms generally consider the performance in terms of quality. However, not all optimization algorithms for all test cases are evaluated equal from quality, the computation time should be also considered for optimization tasks. In this paper, we investigate the quality and computation time of optimization algorithms in optimization problems, instead of the one-for-all evaluation of quality. We select the well-known optimization algorithms (Bayesian optimization and evolutionary algorithms) and evaluate them on the benchmark test functions in terms of quality and computation time. The results show that BO is suitable to be applied in the optimization tasks that are needed to obtain desired quality in the limited function evaluations, and the EAs are suitable to search the optimal of the tasks that are allowed to find the optimal solution with enough function evaluations. This paper provides the recommendation to select suitable optimization algorithms for optimization problems with different numbers of function evaluations, which contributes to the efficiency that obtains the desired quality with less computation time for optimization problems.
Modern wireless cellular networks use massive multiple-input multiple-output (MIMO) technology. This technology involves operations with an antenna array at a base station that simultaneously serves multiple mobile devices which also use multiple antennas on their side. For this, various precoding and detection techniques are used, allowing each user to receive the signal intended for him from the base station. There is an important class of linear precoding called Regularized Zero-Forcing (RZF). In this work, we propose Adaptive RZF (ARZF) with a special kind of regularization matrix with different coefficients for each layer of multi-antenna users. These regularization coefficients are defined by explicit formulas based on SVD decompositions of user channel matrices. We study the optimization problem, which is solved by the proposed algorithm, with the connection to other possible problem statements. We also compare the proposed algorithm with state-of-the-art linear precoding algorithms on simulations with the Quadriga channel model. The proposed approach provides a significant increase in quality with the same computation time as in the reference methods.
The paper studies the multi-user precoding problem as a non-convex optimization problem for wireless multiple input and multiple output (MIMO) systems. In our work, we approximate the target Spectral Efficiency function with a novel computationally simpler function. Then, we reduce the precoding problem to an unconstrained optimization task using a special differential projection method and solve it by the Quasi-Newton L-BFGS iterative procedure to achieve gains in capacity. We are testing the proposed approach in several scenarios generated using Quadriga~-- open-source software for generating realistic radio channel impulse response. Our method shows monotonic improvement over heuristic methods with reasonable computation time. The proposed L-BFGS optimization scheme is novel in this area and shows a significant advantage over the standard approaches. The proposed method has a simple implementation and can be a good reference for other heuristic algorithms in this field.
We introduce Stochastic Asymptotical Regularization (SAR) methods for the uncertainty quantification of the stable approximate solution of ill-posed linear-operator equations, which are deterministic models for numerous inverse problems in science and engineering. We prove the regularizing properties of SAR with regard to mean-square convergence. We also show that SAR is an optimal-order regularization method for linear ill-posed problems provided that the terminating time of SAR is chosen according to the smoothness of the solution. This result is proven for both a priori and a posteriori stopping rules under general range-type source conditions. Furthermore, some converse results of SAR are verified. Two iterative schemes are developed for the numerical realization of SAR, and the convergence analyses of these two numerical schemes are also provided. A toy example and a real-world problem of biosensor tomography are studied to show the accuracy and the advantages of SAR: compared with the conventional deterministic regularization approaches for deterministic inverse problems, SAR can provide the uncertainty quantification of the quantity of interest, which can in turn be used to reveal and explicate the hidden information about real-world problems, usually obscured by the incomplete mathematical modeling and the ascendence of complex-structured noise.
We focus on Maximum Inner Product Search (MIPS), which is an essential problem in many machine learning communities. Given a query, MIPS finds the most similar items with the maximum inner products. Methods for Nearest Neighbor Search (NNS) which is usually defined on metric space don't exhibit the satisfactory performance for MIPS problem since inner product is a non-metric function. However, inner products exhibit many good properties compared with metric functions, such as avoiding vanishing and exploding gradients. As a result, inner product is widely used in many recommendation systems, which makes efficient Maximum Inner Product Search a key for speeding up many recommendation systems. Graph based methods for NNS problem show the superiorities compared with other class methods. Each data point of the database is mapped to a node of the proximity graph. Nearest neighbor search in the database can be converted to route on the proximity graph to find the nearest neighbor for the query. This technique can be used to solve MIPS problem. Instead of searching the nearest neighbor for the query, we search the item with maximum inner product with query on the proximity graph. In this paper, we propose a reinforcement model to train an agent to search on the proximity graph automatically for MIPS problem if we lack the ground truths of training queries. If we know the ground truths of some training queries, our model can also utilize these ground truths by imitation learning to improve the agent's search ability. By experiments, we can see that our proposed mode which combines reinforcement learning with imitation learning shows the superiorities over the state-of-the-art methods
We propose throughput and cost optimal job scheduling algorithms in cloud computing platforms offering Infrastructure as a Service. We first consider online migration and propose job scheduling algorithms to minimize job migration and server running costs. We consider algorithms that assume knowledge of job-size on arrival of jobs. We characterize the optimal cost subject to system stability. We develop a drift-plus-penalty framework based algorithm that can achieve optimal cost arbitrarily closely. Specifically this algorithm yields a trade-off between delay and costs. We then relax the job-size knowledge assumption and give an algorithm that uses readily offered service to the jobs. We show that this algorithm gives order-wise identical cost as the job size based algorithm. Later, we consider offline job migration that incurs migration delays. We again present throughput optimal algorithms that minimize server running cost. We illustrate the performance of the proposed algorithms and compare these to the existing algorithms via simulation.
The Extended Randomized Kaczmarz method is a well known iterative scheme which can find the Moore-Penrose inverse solution of a possibly inconsistent linear system and requires only one additional column of the system matrix in each iteration in comparison with the standard randomized Kaczmarz method. Also, the Sparse Randomized Kaczmarz method has been shown to converge linearly to a sparse solution of a consistent linear system. Here, we combine both ideas and propose an Extended Sparse Randomized Kaczmarz method. We show linear expected convergence to a sparse least squares solution in the sense that an extended variant of the regularized basis pursuit problem is solved. Moreover, we generalize the additional step in the method and prove convergence to a more abstract optimization problem. We demonstrate numerically that our method can find sparse least squares solutions of real and complex systems if the noise is concentrated in the complement of the range of the system matrix and that our generalization can handle impulsive noise.
The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, while CIDEr score ensures our captions are syntactically fluent. The PG method we propose improves on the prior MIXER approach, by using Monte Carlo rollouts instead of mixing MLE training with PG. We show empirically that our algorithm leads to easier optimization and improved results compared to MIXER. Finally, we show that using our PG method we can optimize any of the metrics, including the proposed SPIDEr metric which results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained to optimize MLE or the COCO metrics.
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.