In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorizations of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable to both moderate and extreme compression regimes.
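To make the projection structure concrete, here is a minimal numpy sketch of projected gradient descent for a two-factor decomposition, assuming a per-column sparsity level `k` for one factor and a fixed scalar codebook `levels` for the other; the names, shapes, and hyperparameters are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def project_sparse(X, k):
    """Keep the k largest-magnitude entries in each column; zero the rest."""
    out = np.zeros_like(X)
    idx = np.argsort(-np.abs(X), axis=0)[:k]
    np.put_along_axis(out, idx, np.take_along_axis(X, idx, axis=0), axis=0)
    return out

def project_quantized(X, levels):
    """Snap every entry to its nearest value in the codebook `levels`."""
    flat = X.reshape(-1, 1)
    return levels[np.argmin(np.abs(flat - levels[None, :]), axis=1)].reshape(X.shape)

def factorize(W, rank, k, levels, steps=500, lr=1e-2):
    """Gradient steps on ||W - U @ V||_F^2, each followed by projecting
    U onto the sparse set and V onto the quantized set."""
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((W.shape[0], rank))
    V = 0.1 * rng.standard_normal((rank, W.shape[1]))
    for _ in range(steps):
        R = U @ V - W                               # reconstruction residual
        U = project_sparse(U - lr * (R @ V.T), k)
        V = project_quantized(V - lr * (U.T @ R), levels)
    return U, V
```

At inference time only the compressed factors need to be stored, and the product `U @ V` is materialized on the fly.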
This paper presents a lightweight, efficient calibration neural network model for denoising a low-cost microelectromechanical system (MEMS) gyroscope and estimating the attitude of a robot in real time. The key idea is to extract local and global features from a time window of inertial measurement unit (IMU) measurements and dynamically regress the output compensation components for the gyroscope. Following a carefully deduced mathematical calibration model, LGC-Net leverages depthwise separable convolution to capture sectional features and reduce the number of network parameters. Large kernel attention is designed to better learn long-range dependencies and feature representations. The proposed algorithm is evaluated on the EuRoC and TUM-VI datasets and achieves state-of-the-art performance on the (unseen) test sequences with a more lightweight model structure. The orientation estimated with our LGC-Net is comparable with that of top-ranked visual-inertial odometry systems, although it does not use vision sensors. We make our method open-source at: //github.com/huazai665/LGC-Net
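As an illustration of the core building block, the following PyTorch sketch shows a depthwise separable 1-D convolution over an IMU time window; the channel counts, kernel size, and window length are assumptions for the example, not LGC-Net's actual configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise
    conv; uses far fewer parameters than a standard Conv1d."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                 # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

# e.g. a window of 6-axis IMU readings: batch of 8, 200 time steps
x = torch.randn(8, 6, 200)
y = DepthwiseSeparableConv1d(6, 32, kernel_size=7)(x)   # -> (8, 32, 200)
```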
Spiking neural networks have significant potential utility in robotics due to their high energy efficiency on specialized hardware, but proof-of-concept implementations have typically not yet achieved performance competitive with conventional approaches. In this paper, we tackle one of the key practical challenges, scalability, by introducing a novel modular ensemble network approach in which compact, localized spiking networks each learn and are solely responsible for recognizing places in a local region of the environment only. This modular approach creates a highly scalable system. However, it comes at a high performance cost: a lack of global regularization at deployment time leads to hyperactive neurons that erroneously respond to places outside their learned regions. Our second contribution introduces a regularization approach that detects and removes these problematic hyperactive neurons during the initial environmental learning phase. We evaluate this new scalable modular system on the benchmark localization datasets Nordland and Oxford RobotCar, with comparisons to the standard techniques NetVLAD and SAD, as well as a previous spiking neural network system. Our system substantially outperforms the previous SNN system on its small dataset, maintains performance on benchmark datasets 27 times larger, where running the previous system is computationally infeasible, and performs competitively with the conventional localization systems.
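A minimal numpy sketch of the kind of hyperactivity check described above: a neuron is dropped if it responds too strongly to places outside the region it is responsible for. The mean-activation criterion and the threshold are assumptions for illustration, not the paper's exact regularizer.

```python
import numpy as np

def mask_hyperactive(responses, sample_regions, neuron_regions, thresh=0.5):
    """responses: (n_samples, n_neurons) activations recorded during learning;
    sample_regions: (n_samples,) region label of each training place;
    neuron_regions: (n_neurons,) region each ensemble member's neuron serves.
    Returns a boolean keep-mask that drops neurons whose mean activation on
    out-of-region samples exceeds `thresh`."""
    outside = sample_regions[:, None] != neuron_regions[None, :]
    out_act = (responses * outside).sum(axis=0) / np.maximum(outside.sum(axis=0), 1)
    return out_act <= thresh
```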
Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most previous obstacle detection works leverage only traditional stereo matching techniques to meet the computational constraints of real-time feedback. This paper proposes a computationally efficient method that leverages a deep neural network to detect occupancy from stereo images directly. Instead of learning the point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately within a range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computational cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a Jackal robot, which is used to fine-tune our model. The dataset, the code, and more visualizations are available at //lhy.xyz/stereovoxelnet/
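The coarse-to-fine idea can be sketched as a recursive octree query: cells the decoder marks as free (or not worth refining) are pruned without descending further. Here `predict(center, size)` is a hypothetical decoder interface assumed only for this sketch; the paper's network and stopping rule may differ.

```python
def refine_occupancy(predict, center, size, min_size):
    """Recursively query occupancy, subdividing only cells flagged for
    refinement; pruned subtrees cost no further computation."""
    prob, refine = predict(center, size)          # assumed decoder call
    if not refine or size <= min_size:
        return [(center, size, prob)]             # leaf cell
    leaves, half = [], size / 2.0
    for dx in (-size / 4, size / 4):              # 8 child-cell centers
        for dy in (-size / 4, size / 4):
            for dz in (-size / 4, size / 4):
                child = (center[0] + dx, center[1] + dy, center[2] + dz)
                leaves.extend(refine_occupancy(predict, child, half, min_size))
    return leaves
```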
We compare the solutions of two systems of partial differential equations (PDEs), seen as two different interpretations of the same model describing the formation of complex biological networks. Both approaches take into account the time evolution of the medium flowing through the network: we compute the solution of an elliptic-parabolic PDE system for the conductivity vector $m$, the conductivity tensor $\mathbb{C}$, and the pressure $p$. We use finite difference schemes on a uniform Cartesian grid in the spatially two-dimensional setting to solve the two systems, where the parabolic equation is solved by a semi-implicit scheme in time. Since the conductivity vector and tensor also appear in the Poisson equation for the pressure $p$, the elliptic equation depends implicitly on time. For this reason, at each time step we solve three linear systems in the case of the conductivity vector $m\in\mathbb{R}^2$, and four linear systems in the case of the symmetric conductivity tensor $\mathbb{C}\in\mathbb{R}^{2\times 2}$. To accelerate the simulations, we make use of the Alternating Direction Implicit (ADI) method. The choice of parameters plays an important role in the structure of the solutions. We provide numerous tests with various values of the parameters involved to highlight the differences between the solutions of the two systems.
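To fix ideas on the time stepping, below is a Peaceman-Rachford ADI step for a scalar heat equation on a uniform grid; it is a simplified stand-in (constant scalar diffusivity, homogeneous Dirichlet boundaries) for the semi-implicit solves of the coupled $m$, $\mathbb{C}$, $p$ system, showing why each sweep reduces to cheap tridiagonal solves.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import splu

def adi_step(u, dt, dx, D):
    """One Peaceman-Rachford ADI step for u_t = D*(u_xx + u_yy) on an
    n-by-n grid with homogeneous Dirichlet boundary conditions."""
    n = u.shape[0]
    r = D * dt / (2.0 * dx**2)
    L = diags([-r, 2.0 * r, -r], [-1, 0, 1], shape=(n, n), format='csc')
    I = identity(n, format='csc')
    lu = splu(I + L)                 # factor once, reuse for both sweeps
    rhs = ((I - L) @ u.T).T          # explicit sweep in y
    u = lu.solve(rhs)                # implicit (tridiagonal) solve in x
    rhs = (I - L) @ u                # explicit sweep in x
    u = lu.solve(rhs.T).T            # implicit solve in y
    return u
```

Each half step is implicit in one spatial direction only, so the same factorized tridiagonal system is reused across all rows or columns.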
Interval-censored multi-state data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We formulate the effects of potentially time-dependent covariates on multi-state processes through semiparametric proportional intensity models with random effects. We adopt nonparametric maximum likelihood estimation (NPMLE) under general interval censoring and develop a stable expectation-maximization (EM) algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.
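For concreteness, one plausible form of a proportional intensity model with random effects is sketched below; the exact specification (covariate coding, random-effect distribution) is assumed for illustration and may differ from the paper's.

```latex
% Transition intensity from state k to state l for subject i, given
% possibly time-dependent covariates Z_i(t) and a random effect b_i:
\lambda_{kl}\bigl(t \mid Z_i, b_i\bigr)
  = \lambda_{kl}(t)\,\exp\bigl\{\beta_{kl}^{\top} Z_i(t) + b_i\bigr\},
% where each baseline intensity \lambda_{kl}(t) is left unspecified and is
% estimated nonparametrically, jointly with \beta, via the EM-based NPMLE.
```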
In this paper, we study the offline change point localization problem in a sequence of dependent nonparametric random dot product graphs. Specifically, we assume that at every time point a network is generated from a nonparametric random dot product graph model \citep[see e.g.][]{athreya2017statistical}, where the latent positions are generated from unknown underlying distributions. The underlying distributions are piecewise constant in time and change at unknown locations, called change points. Most importantly, we allow for dependence among the networks generated between two consecutive change points. This setting incorporates both edge dependence within networks and temporal dependence between networks, and is the most flexible setting in the published literature. To consistently localize the change points, we propose a novel change point detection algorithm consisting of two steps. First, we estimate the latent positions of the random dot product model; our theoretical result refines the state-of-the-art results by allowing the dimension of the latent positions to diverge. Subsequently, we construct a nonparametric version of the CUSUM statistic \citep[e.g.][]{Page1954, padilla2019optimal} that allows for temporal dependence. Consistent localization is proved theoretically and supported by extensive numerical experiments, which illustrate state-of-the-art performance. We also provide an in-depth discussion of possible extensions to offer further understanding and insight.
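For readers unfamiliar with CUSUM, the classical one-dimensional statistic is sketched below; the paper's version replaces sample means with nonparametric quantities and adjusts for temporal dependence, so this is only the scaffolding.

```python
import numpy as np

def cusum(x, s, e, t):
    """Classical CUSUM statistic at candidate split t on the segment x[s:e]."""
    n1, n2 = t - s, e - t
    return np.sqrt(n1 * n2 / (n1 + n2)) * abs(x[s:t].mean() - x[t:e].mean())

def best_split(x, s, e):
    """Candidate change point: the split maximizing the CUSUM statistic."""
    return max(range(s + 1, e - 1), key=lambda t: cusum(x, s, e, t))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
print(best_split(x, 0, len(x)))     # close to the true change point at 100
```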
Neural architecture search (NAS) has become a common approach to developing and discovering new neural architectures for different target platforms and purposes. However, scanning the search space entails long training processes for many candidate architectures, which is costly in terms of computational resources and time. Regression algorithms are a common tool for predicting a candidate architecture's accuracy, which can dramatically accelerate the search procedure. We aim to propose a new baseline that will support the development of regression algorithms that can predict an architecture's accuracy from its scheme alone, or by training it for only a minimal number of epochs. Therefore, we introduce the NAAP-440 dataset of 440 neural architectures, which were trained on CIFAR10 using a fixed recipe. Our experiments indicate that by using off-the-shelf regression algorithms and running up to 10% of the training process, not only is it possible to predict an architecture's accuracy rather precisely, but the values predicted for the architectures also maintain their accuracy order, with a minimal number of monotonicity violations. This approach may serve as a powerful tool for accelerating NAS-based studies and thus dramatically increase their efficiency. The dataset and code used in the study have been made public.
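A minimal sketch of the intended workflow, with synthetic placeholder features standing in for NAAP-440's actual schema: fit an off-the-shelf regressor on scheme descriptors plus early-epoch accuracies, then count pairwise monotonicity violations on held-out architectures.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((440, 12))       # placeholder: scheme + early-epoch features
y = 0.5 + 0.4 * X[:, -1] + 0.05 * rng.standard_normal(440)  # synthetic target

reg = GradientBoostingRegressor().fit(X[:352], y[:352])     # 80/20 split
pred = reg.predict(X[352:])

def monotonicity_violations(y_true, y_pred):
    """Count index pairs whose predicted order disagrees with the true order."""
    n = len(y_true)
    return sum((y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j]) < 0
               for i in range(n) for j in range(i + 1, n))

print(monotonicity_violations(y[352:], pred))
```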
We study the problem of high-dimensional sparse linear regression in a distributed setting under both computational and communication constraints. Specifically, we consider a star topology network in which several machines are connected to a fusion center, with which they can exchange relatively short messages. Each machine holds noisy samples from a linear regression model with the same unknown sparse $d$-dimensional vector of regression coefficients $\theta$. The goal of the fusion center is to estimate the vector $\theta$ and its support using little computation and limited communication at each machine. In this work, we consider distributed algorithms based on Orthogonal Matching Pursuit (OMP) and theoretically study their ability to exactly recover the support of $\theta$. We prove that under certain conditions, even at low signal-to-noise ratios where individual machines are unable to detect the support of $\theta$, distributed-OMP methods correctly recover it with total communication sublinear in $d$. In addition, we present simulations that illustrate the performance of distributed OMP-based algorithms and show that they perform similarly to more sophisticated and computationally intensive methods, and in some cases even outperform them.
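The local building block is standard OMP; below is a compact numpy version, plus one plausible fusion rule in which machines send only their estimated supports and the center votes. The voting rule is an assumption for illustration, not necessarily the paper's protocol.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick k columns of A to fit y."""
    r, support, x_s = y.astype(float).copy(), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ r)))          # most correlated column
        if j not in support:
            support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s                  # orthogonalized residual
    return support, x_s

def fused_support(machines, k):
    """Each machine sends its locally estimated k-sparse support (short
    message); the center keeps the k most frequently voted indices."""
    votes = {}
    for A, y in machines:                            # machines: list of (A, y)
        for j in omp(A, y, k)[0]:
            votes[j] = votes.get(j, 0) + 1
    return sorted(sorted(votes, key=votes.get, reverse=True)[:k])
```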
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
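As a concrete instance of the basic trade-off such a survey discusses, here is a textbook symmetric uniform quantizer to 4-bit signed integers; it is one elementary scheme among the many methods covered.

```python
import numpy as np

def uniform_quantize(x, num_bits=4):
    """Symmetric uniform quantization: scale to the integer grid and round."""
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 7 for 4 bits
    scale = np.abs(x).max() / qmax                  # one scale per tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from the integers and stored scale."""
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).standard_normal(8).astype(np.float32)
q, s = uniform_quantize(x)
print(np.abs(x - dequantize(q, s)).max())           # quantization error
```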
Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) as an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram-based network embedding algorithms and the theory of graph Laplacians. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram-based network embedding methods, leading to a better understanding of latent network representation learning.
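A dense, small-graph sketch of the NetMF construction following the closed form stated above: build the matrix $\log\max\!\big(\frac{\mathrm{vol}(G)}{bT}\sum_{r=1}^{T}P^{r}D^{-1},\,1\big)$ and factorize it by truncated SVD. Variable names and the dense implementation are for illustration only; the paper also gives a scalable approximation.

```python
import numpy as np

def netmf_embed(A, dim=32, window=10, neg=1):
    """NetMF on a small dense adjacency matrix A (connected, no zero degrees)."""
    deg = A.sum(axis=1)
    vol = deg.sum()
    P = A / deg[:, None]                  # random-walk transition matrix D^-1 A
    S, Pr = np.zeros_like(P), np.eye(len(A))
    for _ in range(window):               # accumulate P^1 + ... + P^T
        Pr = Pr @ P
        S += Pr
    M = (vol / (neg * window)) * S / deg[None, :]   # right-multiply by D^-1
    logM = np.log(np.maximum(M, 1.0))     # truncated element-wise logarithm
    U, s, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(s[:dim])  # rank-dim embedding
```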