Iterative solutions of sparse linear systems and sparse eigenvalue problems play a fundamental role in vital fields of scientific research and engineering. The crucial computing kernel for such iterative solutions is the multiplication of a sparse matrix by a dense vector. Efficient implementations of sparse matrix-vector multiplication (SpMV) and linear solvers are therefore essential and have been the subject of extensive research across a variety of computing architectures and accelerators, such as central processing units (CPUs), graphics processing units (GPUs), many integrated cores (MICs), and field-programmable gate arrays (FPGAs). Unleashing the full potential of an architecture or accelerator requires determining the factors that affect an efficient implementation of SpMV. This article presents a first-of-its-kind, in-depth survey covering over two hundred state-of-the-art optimization schemes for solving sparse iterative linear systems, with a focus on computing SpMV. A new taxonomy of iterative solutions and SpMV techniques common to all architectures is proposed. SpMV techniques for all architectures are reviewed and consolidated under this single taxonomy to encourage cross-architectural and heterogeneous-architecture developments, although the primary focus is on GPUs. The major contributions of the SpMV techniques, along with their primary, secondary, and tertiary contributions, are first highlighted using the taxonomy and then qualitatively compared. The current state of the research for each architecture is summarized separately. Finally, several open problems and key challenges for future research directions are outlined.
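For readers unfamiliar with the kernel itself, the following is a minimal sketch of SpMV for a matrix stored in the widely used compressed sparse row (CSR) format; the variable names and the toy matrix are ours, purely for illustration.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values  -- nonzero entries of A, listed row by row
    col_idx -- column index of each nonzero entry
    row_ptr -- index in `values` where each row starts (length n_rows + 1)
    x       -- dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Toy example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]], x = [1, 2, 3]
values  = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0])))  # [ 7.  6. 17.]
```

The indirect, irregular access to the dense vector (`x[col_idx[k]]`) is what makes SpMV memory-bound and highly sensitive to the storage format and target architecture, which is precisely what the surveyed optimization schemes address.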
Bayesian optimization (BO) is widely used to optimize black-box functions. It works by first building a surrogate for the objective and quantifying the uncertainty in that surrogate. It then decides where to sample by maximizing an acquisition function defined by the surrogate model. Prior approaches typically use randomly generated raw samples to initialize the acquisition function maximizer. However, this strategy is ill-suited for high-dimensional BO: because regions of high posterior uncertainty are vast in high dimensions, a randomly initialized acquisition function maximizer is likely to land in them, leading to over-exploration of areas that offer little gain. This paper provides the first comprehensive empirical study to reveal the importance of the initialization phase of acquisition function maximization. It proposes a better initialization approach that employs multiple heuristic optimizers to leverage the knowledge of already-evaluated samples and generate initial points for the acquisition function maximizer. We evaluate our approach on widely used synthetic test functions and real-world applications. Experimental results show that our techniques, while simple, significantly enhance standard BO and outperform state-of-the-art high-dimensional BO techniques by a large margin in most test cases.
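To make the idea concrete, the sketch below contrasts purely random initialization with initial points seeded from the best already-evaluated samples before a multi-start gradient-based maximization of expected improvement. This is our own minimal illustration of the general strategy, not the authors' implementation; the helper names and the 50/50 split between seeded and random starts are assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def expected_improvement(x, gp, y_best):
    """Standard EI acquisition (minimization). `gp` is any fitted surrogate
    exposing predict(X, return_std=True), e.g. sklearn's GaussianProcessRegressor."""
    mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
    sigma = max(sigma[0], 1e-12)
    z = (y_best - mu[0]) / sigma
    return (y_best - mu[0]) * norm.cdf(z) + sigma * norm.pdf(z)

def seed_from_incumbents(X, y, n_seeds, scale, bounds, rng):
    """Initial points obtained by perturbing the best evaluated samples,
    standing in for a heuristic optimizer that exploits known good regions."""
    best = np.argsort(y)[:n_seeds]
    seeds = X[best] + rng.normal(0.0, scale, size=(len(best), X.shape[1]))
    return np.clip(seeds, bounds[:, 0], bounds[:, 1])

def maximize_acquisition(gp, X, y, bounds, n_restarts=10, rng=None):
    """Multi-start L-BFGS-B maximization of EI with informed + random starts."""
    rng = rng or np.random.default_rng(0)
    dim = bounds.shape[0]
    starts = np.vstack([
        seed_from_incumbents(X, y, n_restarts // 2, 0.05, bounds, rng),
        rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_restarts // 2, dim)),
    ])
    best_x, best_val = None, -np.inf
    for x0 in starts:
        res = minimize(lambda v: -expected_improvement(v, gp, y.min()),
                       x0, bounds=bounds, method="L-BFGS-B")
        if -res.fun > best_val:
            best_x, best_val = res.x, -res.fun
    return best_x
```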
We present and analyze a parallel implementation of a parallel-in-time collocation method based on $\alpha$-circulant preconditioned Richardson iterations. While many papers explore this family of single-level, time-parallel "all-at-once" integrators from various perspectives, performance results of actual parallel runs are still scarce. This leaves a critical gap, because the efficiency and applicability of any parallel method heavily rely on the actual parallel performance, with only limited guidance from theoretical considerations. Further, challenges like selecting good parameters, finding suitable communication strategies, and performing a fair comparison to sequential time-stepping methods can easily be overlooked. In this paper, we first extend the original idea of these fixed-point iterative approaches based on $\alpha$-circulant preconditioners to high-order collocation methods, adding yet another level of parallelization in time "across the method". We derive an adaptive strategy that selects a new $\alpha$-circulant preconditioner for each iteration at runtime, balancing convergence rate, round-off error, and the inexactness of the inner system solves for the individual time steps. After addressing these more theoretical challenges, we present an open-source space- and time-parallel implementation and evaluate its performance for two different test problems.
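In schematic form (our notation, not reproduced from the paper), the underlying iteration of this family of methods is the preconditioned Richardson update
\[
\mathbf{u}^{k+1} \;=\; \mathbf{u}^{k} + \mathbf{C}_\alpha^{-1}\left(\mathbf{b} - \mathbf{A}\,\mathbf{u}^{k}\right),
\]
where $\mathbf{A}$ is the all-at-once system coupling all time steps and $\mathbf{C}_\alpha$ is its block $\alpha$-circulant approximation. Since $\mathbf{C}_\alpha$ can be block-diagonalized by a weighted Fourier transform, applying $\mathbf{C}_\alpha^{-1}$ decouples into independent solves, one per time step, which is the source of the parallelism in time; the choice of $\alpha$ trades convergence speed against round-off amplification, motivating the adaptive strategy described above.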
This paper proposes a new algorithm, referred to as GMAB, that combines concepts from the reinforcement learning domain of multi-armed bandits and random search strategies from the domain of genetic algorithms to solve discrete stochastic optimization problems via simulation. In particular, the focus is on noisy large-scale problems, which often involve a multitude of dimensions as well as multiple local optima. Our aim is to combine the ability of multi-armed bandits to cope with volatile simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces with an enormous number of feasible solutions. For this purpose, a multi-armed bandit framework serves as the foundation, and each simulation observation is incorporated into the memory of GMAB. Based on this memory, genetic operators guide the search, as they provide powerful tools for both exploration and exploitation. The empirical results demonstrate that GMAB achieves superior performance compared to benchmark algorithms from the literature on a large variety of test problems. In all experiments, GMAB required considerably fewer simulations to achieve similar or (far) better solutions than those generated by existing methods. At the same time, GMAB's runtime overhead is very small, owing to the suggested tree-based implementation of its memory. Furthermore, we prove its convergence to the set of global optima as the simulation effort goes to infinity.
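The following is a deliberately simplified sketch of how a bandit-style memory and genetic operators can interact in such a loop. It reflects our reading of the abstract only: the selection rule, operator probabilities, and dictionary-based memory are placeholders, not the actual GMAB design (which, for instance, uses a tree-based memory).

```python
import random
from collections import defaultdict

def gmab_like_search(simulate, dims, budget, pop_size=20, mut_prob=0.1, seed=0):
    """Illustrative bandit + genetic-operator loop for noisy discrete minimization.

    simulate -- noisy objective; each call returns one stochastic observation
    dims     -- number of feasible values per decision variable
    """
    rng = random.Random(seed)
    count = defaultdict(int)      # number of observations per solution (bandit memory)
    total = defaultdict(float)    # accumulated observations per solution

    def mean(sol):
        return total[sol] / count[sol]

    pop = [tuple(rng.randrange(d) for d in dims) for _ in range(pop_size)]
    for _ in range(budget):
        if count and rng.random() < 0.5:
            # Exploitation: re-simulate the currently best-looking solution.
            sol = min(count, key=mean)
        else:
            # Exploration: create a new candidate via crossover and mutation.
            a, b = rng.sample(pop, 2)
            child = tuple(a[i] if rng.random() < 0.5 else b[i] for i in range(len(dims)))
            sol = tuple(rng.randrange(dims[i]) if rng.random() < mut_prob else v
                        for i, v in enumerate(child))
            pop[rng.randrange(pop_size)] = sol
        count[sol] += 1                      # every observation enters the memory
        total[sol] += simulate(sol)
    return min(count, key=mean)
```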
In this paper, a general framework for linear secure distributed matrix multiplication (SDMM) is introduced. The model allows for a neat treatment of straggling and Byzantine servers via a star product interpretation, as well as simplified security proofs. Known properties of star products also immediately yield a lower bound for the recovery threshold as well as an upper bound for the number of colluding workers the system can tolerate. Another bound on the recovery threshold is given by the decodability condition, which generalizes a bound for GASP codes. The framework produces many of the known SDMM schemes as special cases, thereby unifying the previous literature on the topic. Furthermore, error behavior specific to SDMM is discussed, and interleaved codes are proposed as a suitable means for efficient error correction in this model. An analysis of the error correction capability under natural assumptions about the error distribution is also provided, largely based on well-known results on interleaved codes. Error detection and other error distributions are also discussed.
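As a reminder of the standard notion behind the star product interpretation (the specific bounds derived in the paper are not reproduced here): for linear codes $\mathcal{C}, \mathcal{D} \subseteq \mathbb{F}_q^n$, the star product is
\[
\mathcal{C} \star \mathcal{D} \;=\; \operatorname{span}_{\mathbb{F}_q}\{\, c \ast d : c \in \mathcal{C},\, d \in \mathcal{D} \,\}, \qquad (c \ast d)_i = c_i d_i ,
\]
i.e., the linear span of all componentwise products of codewords. In a linear SDMM scheme, the vector of honest server responses lies in the star product of the codes used to encode the two matrix factors, which is why the parameters of this product govern quantities such as the recovery threshold and the tolerable number of colluding workers.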
Movable antenna (MA) is a promising technology for improving wireless communication performance by varying the antenna position within a given finite region at the transceivers to create more favorable channel conditions. In this paper, we investigate the MA-enhanced multiple-access channel (MAC) for the uplink transmission from multiple users, each equipped with a single MA, to a base station (BS) with a fixed-position antenna (FPA) array. A field-response-based channel model is used to characterize the multi-path channel between the antenna array of the BS and each user's MA with a flexible position. To evaluate the MAC performance gain provided by MAs, we formulate an optimization problem for minimizing the total transmit power of users, subject to a minimum-achievable-rate requirement for each user, where the positions of the MAs and the transmit powers of the users, as well as the receive combining matrix at the BS, are jointly optimized. To solve this non-convex optimization problem involving intricately coupled variables, we develop two algorithms based on zero-forcing (ZF) and minimum mean square error (MMSE) combining, respectively. Specifically, for each algorithm, the combining matrix of the BS and the total transmit power of the users are expressed as functions of the MAs' position vectors, which are then optimized by the gradient descent method in an iterative manner. It is shown that the proposed ZF-based and MMSE-based algorithms converge to high-quality suboptimal solutions with low computational complexity. Simulation results demonstrate that the proposed solutions for MA-enhanced multiple-access systems can significantly decrease the total transmit power of users as compared to conventional FPA systems under both perfect and imperfect field-response information.
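A rough sketch of the ZF-based idea is given below, purely for illustration: with ZF receive combining, the power required to meet user $k$'s rate is proportional to the $k$-th diagonal entry of $(\mathbf{H}^H\mathbf{H})^{-1}$, so the total power becomes a function of the MA positions alone and can be descended numerically. The single-path placeholder channel, function names, and step sizes below are our assumptions and stand in for the paper's field-response model and algorithmic details.

```python
import numpy as np

def channel(pos, n_bs):
    """Placeholder single-path steering-vector channel for one user's MA at a
    scalar position `pos`; stands in for the field-response channel model."""
    theta = np.pi * np.tanh(pos)                      # illustrative position-to-angle map
    phase = np.pi * np.arange(n_bs) * np.sin(theta)   # half-wavelength ULA at the BS
    return np.exp(1j * phase)

def total_zf_power(positions, rates, noise_pow, n_bs):
    """Total uplink power needed to meet per-user rates under ZF combining."""
    H = np.stack([channel(p, n_bs) for p in positions], axis=1)   # N x K
    gram_inv = np.linalg.inv(H.conj().T @ H)
    powers = (2.0 ** np.asarray(rates) - 1.0) * noise_pow * np.real(np.diag(gram_inv))
    return powers.sum()

def optimize_positions(rates, noise_pow, n_bs, n_users, steps=200, lr=0.05, eps=1e-4):
    """Minimize total power over MA positions via finite-difference gradient descent."""
    pos = np.random.default_rng(0).uniform(-1.0, 1.0, n_users)
    for _ in range(steps):
        grad = np.zeros_like(pos)
        base = total_zf_power(pos, rates, noise_pow, n_bs)
        for i in range(n_users):
            shifted = pos.copy()
            shifted[i] += eps
            grad[i] = (total_zf_power(shifted, rates, noise_pow, n_bs) - base) / eps
        pos -= lr * grad
    return pos, total_zf_power(pos, rates, noise_pow, n_bs)
```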
Reconfigurable intelligent surfaces have recently emerged as a promising technology for shaping the wireless environment by leveraging massive numbers of low-cost reconfigurable elements. Prior works mainly focus on a single-layer metasurface, which lacks the capability to suppress multiuser interference. By contrast, we propose a stacked intelligent metasurface (SIM)-enabled transceiver design for multiuser multiple-input single-output downlink communications. Specifically, the SIM has a multilayer structure and is deployed at the base station to perform transmit beamforming directly in the electromagnetic wave domain. As a result, an SIM-enabled transceiver removes the need for digital beamforming and operates with low-resolution digital-to-analog converters and a moderate number of radio-frequency chains, which significantly reduces the hardware cost and energy consumption while substantially decreasing the precoding delay, thanks to the processing being performed in the wave domain. To leverage the benefits of SIM-enabled transceivers, we formulate an optimization problem for maximizing the sum rate of all users by jointly designing the transmit power allocated to them and the analog beamforming in the wave domain. Numerical results based on a customized alternating optimization algorithm corroborate the effectiveness of the proposed SIM-enabled analog beamforming design as compared with various benchmark schemes. Most notably, the proposed analog beamforming scheme is capable of substantially decreasing the precoding delay compared to its digital counterpart.
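Schematically (our notation; the exact propagation model may differ in the paper), the wave-domain response of an $L$-layer SIM can be viewed as the cascade
\[
\mathbf{G} \;=\; \boldsymbol{\Phi}^{(L)}\,\mathbf{W}^{(L)} \cdots \boldsymbol{\Phi}^{(2)}\,\mathbf{W}^{(2)}\,\boldsymbol{\Phi}^{(1)}\,\mathbf{W}^{(1)},
\]
where each $\boldsymbol{\Phi}^{(\ell)}$ is a diagonal matrix of tunable phase shifts applied by the meta-atoms of layer $\ell$ and each $\mathbf{W}^{(\ell)}$ is a fixed matrix modeling the propagation onto layer $\ell$. Tuning the phase shifts thus realizes the transmit beamforming in the wave domain, so that each radio-frequency chain only needs to feed one (low-resolution) data stream.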
Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for the precise execution of exactly defined tasks, such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks, such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts, leading to more robust, better-performing, more interpretable, more computationally efficient, and more data-efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable, so that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
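As one concrete instance of such a relaxation (our simplified sketch, in the spirit of differentiable sorting networks rather than their exact formulation), the hard conditional swap at the core of a sorting network can be smoothed so that gradients flow through it:

```python
import torch

def soft_cswap(a, b, beta=10.0):
    """Differentiable relaxation of the conditional swap (min(a, b), max(a, b)).

    A sigmoid of the scaled difference softly decides which input is "smaller";
    as beta -> infinity this recovers the hard swap, while for finite beta
    gradients flow through both branches.
    """
    p = torch.sigmoid(beta * (b - a))   # soft indicator of a <= b
    lo = p * a + (1 - p) * b            # soft minimum
    hi = p * b + (1 - p) * a            # soft maximum
    return lo, hi

def soft_odd_even_sort(x, beta=10.0):
    """Differentiable odd-even transposition sorting network built from soft swaps."""
    vals = list(x.unbind(-1))
    n = len(vals)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            vals[i], vals[i + 1] = soft_cswap(vals[i], vals[i + 1], beta)
    return torch.stack(vals, dim=-1)

# Gradients propagate through the (approximately) sorted output.
x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
soft_odd_even_sort(x, beta=20.0).sum().backward()
print(x.grad)
```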
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, along with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow readers to navigate through the various topics. We complement the theoretical concepts and applications covered with extensive lists of free or open-source software implementations and publicly available databases.
Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the training speed of GNNs. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical findings for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
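For reference, a linearized GNN can be thought of as a graph convolutional model with the nonlinearities removed; one common instantiation (our notation, given as an illustration only) is
\[
f(\mathbf{X}) \;=\; \hat{\mathbf{A}}^{L}\,\mathbf{X}\,\mathbf{W}^{(1)}\mathbf{W}^{(2)}\cdots\mathbf{W}^{(L)},
\]
where $\hat{\mathbf{A}}$ is the normalized adjacency matrix, $\mathbf{X}$ the node feature matrix, and $\mathbf{W}^{(\ell)}$ the layer weights. Although this map is linear in the input features, the loss remains non-convex in the weights because of the product $\mathbf{W}^{(1)}\cdots\mathbf{W}^{(L)}$, which is why a linear-rate convergence guarantee to a global minimum is non-trivial.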
Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical threshold for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. The study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with the problems that arise when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.