Efficient topology optimization based on adaptive auxiliary reduced model reanalysis (AARMR) is proposed to improve computational efficiency and scalability. In this method, a projection auxiliary reduced model (PARM) is integrated into the combined approximation reduced model (CARM) to reduce the model dimension in complementary ways. First, the CARM restricts the solution space to avoid large matrix factorizations. Second, the PARM is proposed to construct the CARM dynamically, saving computational cost. Furthermore, the multi-grid conjugate gradient method is suggested to update the PARM adaptively. Finally, several classic numerical examples show that the proposed method not only significantly improves computational efficiency but can also solve large-scale problems that direct solvers cannot handle due to memory limitations.
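To make the reduced-model idea concrete, here is a minimal sketch of classical combined-approximation (CA) reanalysis, the building block behind the CARM; the binomial-series basis recursion and the choice of four basis vectors are illustrative assumptions, not the paper's exact PARM/CARM construction.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def combined_approximation(K0, dK, f, m=4):
    """Combined-approximation reanalysis (sketch).

    K0: reference stiffness matrix (factorized once), dK: design change,
    f: load vector, m: number of reduced-basis vectors.
    """
    c = cho_factor(K0)                 # one factorization, reused throughout
    r = cho_solve(c, f)                # first basis vector r1 = K0^{-1} f
    basis = [r]
    for _ in range(m - 1):             # binomial-series recursion
        r = -cho_solve(c, dK @ r)
        basis.append(r)
    R, _ = np.linalg.qr(np.column_stack(basis))  # orthonormalize for stability
    K = K0 + dK
    y = np.linalg.solve(R.T @ K @ R, R.T @ f)    # small m-by-m reduced system
    return R @ y                       # approximate solution in the subspace
```

The point of the construction is that only the small m-by-m reduced system depends on the design change, so repeated design updates avoid refactorizing the full matrix.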
This paper presents an efficient solution to 3D-LiDAR-based Monte Carlo localization (MCL). MCL works robustly if particles are sampled close to the ground truth. An inertial navigation system (INS) can be used for accurate sampling, but many particles are still needed to solve the 3D localization problem even when INS is available. In particular, a huge number of particles is necessary when INS is unavailable, which makes 3D MCL infeasible in terms of computational cost. Scan matching (SM), i.e., optimization-based localization, works efficiently even without INS because SM can ignore the movement constraints of the robot and/or device in its optimization process. However, SM sometimes produces estimates that are infeasible with respect to the actual movement. We consider that MCL and SM have complementary advantages and disadvantages and propose a method that fuses them. Because SM can be regarded, in probabilistic terms, as optimization of a measurement model, we perform measurement model optimization as SM. The optimization result is then used to approximate the measurement model distribution, and the approximated distribution is used to sample particles. The sampled particles are fused with MCL via importance sampling. As a result, the advantages of MCL and SM can be exploited simultaneously while mitigating their disadvantages. Experiments are conducted on the KITTI dataset and two other open datasets. The results show that the presented method runs on a single CPU thread and performs accurate localization even when INS is not available.
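As a rough illustration of the fusion step, the sketch below samples particles from a Gaussian approximation of the measurement-model distribution centered at the scan-matching optimum and fuses them with the MCL particle set via importance sampling. The Gaussian approximation, the helper `measurement_likelihood`, and the equal scaling of the two particle sets are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fuse_mcl_with_sm(particles, weights, sm_pose, sm_cov,
                     measurement_likelihood, n_sm_samples=100):
    """Fuse MCL particles with a scan-matching (SM) result (sketch).

    sm_pose, sm_cov: Gaussian approximation of the measurement-model
    distribution around the SM optimum (an assumption of this sketch).
    measurement_likelihood(pose) -> p(z | pose), user supplied.
    """
    # Draw particles from the approximated measurement distribution.
    sm_particles = np.random.multivariate_normal(sm_pose, sm_cov, n_sm_samples)
    # Importance weights: the target is the measurement model, the
    # proposal is its Gaussian approximation, so w = p(z | x) / q(x).
    q = multivariate_normal(sm_pose, sm_cov)
    sm_weights = np.array([measurement_likelihood(x) for x in sm_particles])
    sm_weights /= np.maximum(q.pdf(sm_particles), 1e-300)
    # Merge both particle sets and renormalize; the relative scaling of
    # the two sets is a modeling choice glossed over here.
    fused = np.vstack([particles, sm_particles])
    fused_w = np.concatenate([weights, sm_weights])
    return fused, fused_w / fused_w.sum()
```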
We introduce a computationally efficient variant of the model-based ensemble Kalman filter (EnKF). We propose two changes to the original formulation. First, we phrase the setup in terms of precision matrices instead of covariance matrices and introduce a new prior that ensures the precision matrix is sparse. Second, we propose to split the state vector into several blocks and formulate an approximate updating procedure for each block. In a simulation example, we study the computational speedup and the approximation error resulting from the proposed approach. The speedup is substantial for high-dimensional state vectors, allowing the proposed filter to be run on much larger problems than the original formulation can handle. In the simulation example, the approximation error introduced by block updating is negligible compared to the Monte Carlo variability inherent in both the original and the proposed procedures.
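For intuition, here is a sketch of a stochastic EnKF analysis step that updates the state vector block by block. It uses the covariance formulation with perturbed observations, not the paper's precision-matrix formulation, so it only illustrates the block-splitting idea, not the proposed filter itself.

```python
import numpy as np

def enkf_block_update(X, y, H, R, blocks, seed=0):
    """Block-wise stochastic EnKF analysis step (illustrative sketch).

    X: (n_state, n_ens) forecast ensemble; y: observations; H: linear
    observation operator; R: observation error covariance; blocks: list
    of index arrays partitioning the state vector.
    """
    rng = np.random.default_rng(seed)
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)        # state anomalies
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)     # observed anomalies
    S = HA @ HA.T / (n_ens - 1) + R              # innovation covariance
    # Perturbed observations, shared across all blocks for consistency.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    innov = Y - HX
    Xa = X.copy()
    for idx in blocks:                           # update each block separately
        cross = A[idx] @ HA.T / (n_ens - 1)      # block cross-covariance
        K = cross @ np.linalg.inv(S)             # block Kalman gain
        Xa[idx] = X[idx] + K @ innov
    return Xa
```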
Dataset distillation aims to generate small datasets that retain as much information as large-scale datasets, thereby reducing storage and training costs. Recent state-of-the-art methods mainly constrain the sample generation process by matching synthetic images to the original ones in terms of gradients, embedding distributions, or training trajectories. Although the matching objectives vary, the current method for selecting original images is limited to naive random sampling. We argue that random sampling inevitably includes samples near the decision boundaries, which may provide large or noisy matching targets. Moreover, random sampling cannot guarantee the evenness and diversity of the sample distribution. Together, these factors cause large optimization oscillations and degrade matching efficiency. Accordingly, we propose a novel matching strategy named \textbf{D}ataset distillation by \textbf{RE}present\textbf{A}tive \textbf{M}atching (DREAM), in which only representative original images are selected for matching. DREAM can easily be plugged into popular dataset distillation frameworks and reduces the number of matching iterations by a factor of 10 without a performance drop. Given sufficient training time, DREAM provides further significant improvements and achieves state-of-the-art performance.
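One plausible way to realize representative selection is to cluster each class and match against the samples closest to the sub-cluster centers. The k-means criterion below is an illustrative stand-in; the exact selection rule used by DREAM may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(images_flat, labels, per_class, seed=0):
    """Pick representative samples per class via k-means sub-clusters.

    images_flat: (n, d) flattened images; labels: (n,) class labels.
    The real sample closest to each cluster center is selected.
    """
    selected = []
    for c in np.unique(labels):
        Xc = images_flat[labels == c]
        km = KMeans(n_clusters=per_class, n_init=10, random_state=seed).fit(Xc)
        for center in km.cluster_centers_:
            d = np.linalg.norm(Xc - center, axis=1)
            selected.append(Xc[np.argmin(d)])   # nearest real sample
    return np.stack(selected)
```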
In this paper, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as the LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as the time variation of the coefficient processes. Specifically, we employ the Huber loss and a truncation scheme to handle heavy-tailed observations, while $\ell_{1}$-regularization is adopted to overcome the curse of dimensionality under a sparse coefficient structure. To account for the time-varying coefficients, we estimate local high-dimensional coefficients, which are biased estimators due to the $\ell_{1}$-regularization. Thus, when estimating the integrated coefficients, we propose a debiasing scheme to recover the law-of-large-numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We show that the RED-LASSO estimator can achieve a near-optimal convergence rate with only a finite $\gamma$th moment for any $\gamma>2$. In an empirical study, we apply the RED-LASSO procedure to high-dimensional integrated coefficient estimation using high-frequency trading data.
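The core of such an estimator, before debiasing and thresholding, is an $\ell_1$-penalized Huber regression. A minimal proximal-gradient sketch follows; the step-size rule and iteration count are illustrative, and the paper's truncation, debiasing, and thresholding stages are omitted.

```python
import numpy as np

def huber_grad(r, tau):
    """Gradient of the Huber loss with threshold tau, w.r.t. residual r."""
    return np.where(np.abs(r) <= tau, r, tau * np.sign(r))

def huber_lasso(X, y, lam, tau, n_iter=500):
    """Proximal gradient for Huber loss + l1 penalty (sketch only; the
    paper's estimator adds truncation, debiasing, and thresholding)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        g = X.T @ huber_grad(X @ beta - y, tau) / n
        beta = beta - step * g
        # soft-thresholding = proximal operator of the l1 penalty
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return beta
```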
Conflict prediction is a vital component of path planning for autonomous vehicles. Prediction methods must be accurate for reliable navigation, but also computationally efficient to enable online path planning. Efficient prediction methods are especially crucial when testing large sets of candidate trajectories. We present a prediction method that has the same accuracy as existing methods but is up to an order of magnitude faster. This is achieved by rewriting the conflict prediction problem in terms of the first-passage time distribution using a dimension-reduction transform. First-passage time distributions are derived analytically for a subset of Gaussian processes describing vehicle motion. The proposed method is applicable to 2-D stochastic processes whose mean can be approximated by line segments and whose conflict boundary can be approximated by piecewise straight lines. The proposed method was tested in simulation and compared to two probability flow methods as well as a recent instantaneous conflict probability method. The results demonstrate a significant decrease in computation time.
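As a worked special case of the idea, the sketch below evaluates the probability that a 1-D Brownian motion with drift crosses a straight boundary within a horizon T, using the standard inverse-Gaussian first-passage formula; reducing the 2-D conflict geometry to this scalar problem (the role of the paper's dimension-reduction transform) is assumed to have been done already.

```python
import numpy as np
from scipy.stats import norm

def first_passage_cdf(T, d0, v, sigma):
    """P(first-passage time <= T) for 1-D Brownian motion with drift.

    d0: initial distance to the (straight) conflict boundary, v: closing
    speed toward the boundary, sigma: diffusion coefficient. Standard
    inverse-Gaussian first-passage formula.
    """
    sT = sigma * np.sqrt(T)
    return (norm.cdf((v * T - d0) / sT)
            + np.exp(2.0 * v * d0 / sigma**2) * norm.cdf(-(d0 + v * T) / sT))
```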
Federated optimization (FedOpt), which aims to collaboratively train a learning model across a large number of distributed clients, is vital for federated learning. The primary concerns in FedOpt are model divergence and communication efficiency, both of which significantly affect performance. In this paper, we propose a new method, LoSAC, to learn from heterogeneous distributed data more efficiently. Its key algorithmic insight is to locally update the estimate of the global full gradient after each regular local model update. Thus, LoSAC keeps clients' information refreshed in a more compact way. In particular, we study the convergence of LoSAC. A further benefit of LoSAC is its ability to defend against the information leakage exposed by the recent Deep Leakage from Gradients (DLG) technique. Finally, experiments verify the superiority of LoSAC over state-of-the-art FedOpt algorithms. Specifically, LoSAC significantly improves communication efficiency by more than $100\%$ on average, mitigates the model divergence problem, and provides a defense against DLG.
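The stated insight, locally refreshing an estimate of the global full gradient after each local update, resembles variance-reduced (SVRG/SARAH-style) local steps. The sketch below is one speculative reading of that idea under simplified assumptions; it is not LoSAC itself, and all names are hypothetical.

```python
import numpy as np

def local_steps_with_gradient_refresh(w, g_global, grad_fn, data, lr, n_steps):
    """Variance-reduced local updates that keep refreshing an estimate of
    the global full gradient (a speculative sketch, not LoSAC itself).

    w: parameters; g_global: current global full-gradient estimate;
    grad_fn(w, data) -> local gradient on this client's data.
    """
    g_est = g_global.copy()
    g_anchor = grad_fn(w, data)          # local gradient at the anchor point
    for _ in range(n_steps):
        g_local = grad_fn(w, data)
        # correct the local gradient by the anchored global estimate
        w = w - lr * (g_local - g_anchor + g_est)
        # locally refresh the global-gradient estimate after each step
        g_est = g_est + g_local - g_anchor
        g_anchor = g_local
    return w, g_est
```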
This work considers the low-rank approximation of a matrix $A(t)$ depending on a parameter $t$ in a compact set $D \subset \mathbb{R}^d$. Application areas that give rise to such problems include computational statistics and dynamical systems. Randomized algorithms are an increasingly popular approach to low-rank approximation; they usually proceed by multiplying the matrix with random dimension reduction matrices (DRMs). Applying such algorithms directly to $A(t)$ would involve different, independent DRMs for every $t$, which is not only expensive but also leads to inherently non-smooth approximations. In this work, we propose to use constant DRMs, that is, $A(t)$ is multiplied with the same DRM for every $t$. The resulting parameter-dependent extensions of two popular randomized algorithms, the randomized singular value decomposition and the generalized Nystr\"{o}m method, are computationally attractive, especially when $A(t)$ admits an affine linear decomposition with respect to $t$. We perform a probabilistic analysis for both algorithms, deriving bounds on the expected value as well as failure probabilities for the $L^2$ approximation error when using Gaussian random DRMs. Both the theoretical results and numerical experiments show that the use of constant DRMs does not impair their effectiveness; our methods reliably return quasi-best low-rank approximations.
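The constant-DRM idea is easy to state in code: draw one Gaussian sketching matrix and reuse it for every parameter value. The following is a minimal randomized-SVD sketch along those lines; the oversampling value and the per-$t$ recomputation of $A(t)\Omega$ (which an affine decomposition would avoid) are simplifications.

```python
import numpy as np

def rsvd_constant_drm(A_of_t, ts, n_cols, rank, oversample=10, seed=0):
    """Randomized SVD of a parameter-dependent matrix A(t) using one
    constant Gaussian DRM shared across all parameter values.

    A_of_t(t) -> (m, n_cols) matrix; ts: iterable of parameter values.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n_cols, rank + oversample))  # constant DRM
    factors = []
    for t in ts:
        A = A_of_t(t)
        Q, _ = np.linalg.qr(A @ Omega)    # sketch of the range of A(t)
        B = Q.T @ A                       # small projected matrix
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        factors.append((Q @ U[:, :rank], s[:rank], Vt[:rank]))
    return factors
```

Because Omega is fixed, the returned factors vary smoothly with $t$ whenever $A(t)$ does, which is exactly what independent per-$t$ DRMs would destroy.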
Due to the sweeping digitalization of processes, increasingly vast amounts of time series data are being produced. Accurate classification of such time series facilitates decision making in multiple domains. State-of-the-art classification accuracy is often achieved by ensemble learning, where results are synthesized from multiple base models. This characteristic implies that ensemble learning needs substantial computing resources, preventing its use in resource-limited environments such as edge devices. To extend the applicability of ensemble learning, we propose the LightTS framework, which compresses large ensembles into lightweight models while ensuring competitive accuracy. First, we propose adaptive ensemble distillation, which assigns adaptive weights to different base models so that their varying classification capabilities contribute purposefully to the training of the lightweight model. Second, we propose means of identifying Pareto-optimal settings w.r.t. model accuracy and model size, thus enabling users with a space budget to select the most accurate lightweight model. We report on experiments using 128 real-world time series datasets and different types of base models that justify key decisions in the design of LightTS and provide evidence that LightTS outperforms its competitors.
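A toy version of weighted ensemble distillation: blend softened teacher predictions into a single soft target with per-teacher weights. Using validation accuracy as the weighting signal is an assumption of this sketch; LightTS learns its adaptive weights rather than fixing them this way.

```python
import numpy as np

def adaptive_soft_targets(teacher_logits, teacher_acc, temp=2.0):
    """Blend softened teacher predictions into one soft target (sketch).

    teacher_logits: (n_teachers, n_samples, n_classes); teacher_acc:
    per-teacher validation accuracy, used here as a stand-in for the
    learned adaptive weights.
    """
    w = np.exp(teacher_acc)
    w /= w.sum()                                   # softmax teacher weights
    probs = np.exp(teacher_logits / temp)
    probs /= probs.sum(axis=-1, keepdims=True)     # temperature-softened preds
    return np.tensordot(w, probs, axes=1)          # weighted soft targets
```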
This paper deals with state estimation of stochastic models with linear state dynamics, continuous or discrete in time. The emphasis is on a numerical solution to state prediction by the time-update step of the grid-point-based point-mass filter (PMF), which is the most computationally demanding part of the PMF algorithm. A novel way of manipulating the grid, leading to a time-update step in the form of a convolution, is proposed. This reduces the PMF time complexity from quadratic to log-linear with respect to the number of grid points. Furthermore, the number of unique transition probability values is greatly reduced, significantly lowering the required data storage. The proposed PMF prediction step is verified in a numerical study.
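The convolution form of the prediction step maps directly onto an FFT-based convolution, which is where the log-linear complexity comes from. A minimal sketch, assuming the transition density has already been sampled on the same regular grid spacing as the filtering density:

```python
import numpy as np
from scipy.signal import fftconvolve

def pmf_time_update(pdf_grid, kernel):
    """PMF prediction step as a grid convolution (log-linear via FFT).

    pdf_grid: filtering density point masses on a regular grid;
    kernel: transition density sampled with the same grid spacing.
    """
    pred = fftconvolve(pdf_grid, kernel, mode='same')
    pred = np.maximum(pred, 0.0)   # clip tiny FFT round-off negatives
    return pred / pred.sum()       # renormalize the point masses
```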
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization, and compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.
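As a small taste of the surveyed compression methods, here is unstructured magnitude pruning, one of the simplest pruning schemes; the survey covers far more, including structured pruning, quantization, and NAS.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction of weights, keeping the array shape intact."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) > thresh, weights, 0.0)
```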