As LiDAR sensors have become ubiquitous, the need for efficient LiDAR data compression has grown. Modern LiDARs produce gigabytes of scan data per hour and are often used in applications with limited compute, bandwidth, and storage resources. We present a fast, lossless compression algorithm for LiDAR range and attribute scan sequences, including multiple-return range, signal, reflectivity, and ambient infrared. Our algorithm -- dubbed "Jiffy" -- achieves substantial compression by exploiting spatiotemporal redundancy and sparsity. Speed is achieved by maximizing the use of single-instruction, multiple-data (SIMD) instructions. In autonomous driving, infrastructure monitoring, drone inspection, and handheld mapping benchmarks, Jiffy consistently outcompresses competing lossless codecs while operating at speeds in excess of 65M points/sec on a single core. In a typical autonomous vehicle use case, single-threaded Jiffy achieves 6x compression of centimeter-precision range scans at 500+ scans per second. To ensure reproducibility and enable adoption, the software is freely available as an open-source library.
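As a hedged illustration of how spatiotemporal redundancy in range scans can be exploited (this is not the Jiffy codec itself, and the function names below are hypothetical), consecutive range images can be delta-coded so that the small temporal residuals compress far better than the raw ranges:

```python
# Illustrative sketch only -- not the actual Jiffy codec. Consecutive integer
# range images are delta-coded; the small residuals are then handed to a
# general-purpose entropy coder.
import numpy as np
import zlib

def compress_against_previous(prev_scan: np.ndarray, curr_scan: np.ndarray) -> bytes:
    """prev_scan, curr_scan: integer range images of identical shape."""
    residual = curr_scan.astype(np.int32) - prev_scan.astype(np.int32)
    return zlib.compress(residual.tobytes())

def decompress_against_previous(prev_scan: np.ndarray, blob: bytes) -> np.ndarray:
    residual = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(prev_scan.shape)
    return prev_scan.astype(np.int32) + residual
```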
To overcome the quadratic cost of self-attention, recent works have proposed various sparse attention modules, most of which fall under one of two groups: 1) sparse attention with hand-crafted patterns and 2) full attention followed by a sparse variant of softmax such as $\alpha$-entmax. Unfortunately, the first group lacks adaptability to data while the second still requires quadratic cost in training. In this work, we propose SBM-Transformer, a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model (SBM). Each attention head then data-adaptively samples a bipartite graph, whose adjacency matrix is used as an attention mask for each input. During backpropagation, a straight-through estimator is used to propagate gradients through the discrete sampling step and adjust the probabilities of sampled edges based on the predictive loss. The forward and backward costs are thus linear in the number of edges, which each attention head can also choose flexibly based on the input. By analyzing the distribution of sampled graphs, we theoretically show that SBM-Transformer is a universal approximator of arbitrary sequence-to-sequence functions in expectation. Empirical evaluations on the LRA and GLUE benchmarks demonstrate that our model outperforms previous efficient variants as well as the original Transformer with full attention. Our implementation can be found at //github.com/sc782/SBM-Transformer .
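A minimal sketch of the masking mechanism described above, assuming softmax-normalized cluster memberships per query/key and a learnable block matrix (the tensor names are illustrative; this dense version materializes all pairwise probabilities and only demonstrates the sampling and straight-through trick, not the linear-cost implementation):

```python
# Hedged sketch of the core SBM-masking idea, not the authors' code.
import torch

def sbm_attention_mask(q_memb, k_memb, block_logits):
    """q_memb: (n, c) softmax-normalized memberships, k_memb: (m, c), block_logits: (c, c)."""
    B = torch.sigmoid(block_logits)        # inter-cluster edge probabilities in [0, 1]
    probs = q_memb @ B @ k_memb.t()        # (n, m) edge probabilities (each <= 1)
    hard = torch.bernoulli(probs)          # sampled bipartite adjacency / attention mask
    # Straight-through estimator: forward pass uses the hard mask,
    # backward pass sees the differentiable `probs`.
    return hard + probs - probs.detach()
```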
Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more) networks during or post-training, is an essential step in understanding what they are learning (and what they are not), and in identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been used only sparingly so far. In this paper, we revisit a (less widely known) measure from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. We describe the steps necessary to deploy it for large-scale models -- this opens the door to a surprising array of applications, ranging from conditioning one deep model w.r.t. another and learning disentangled representations to optimizing diverse models that are directly more robust to adversarial attacks. Our experiments suggest a versatile regularizer (or constraint) with many advantages, which avoids some of the common difficulties one faces in such analyses. Code is at //github.com/zhenxingjian/Partial_Distance_Correlation.
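For concreteness, a minimal numpy sketch of the (biased, V-statistic) distance correlation between two feature matrices of different dimensions; this is the textbook estimator, not the authors' implementation:

```python
# Distance correlation between X: (n, p) and Y: (n, q); dimensions p and q may differ.
import numpy as np

def distance_correlation(X: np.ndarray, Y: np.ndarray) -> float:
    def centered_dist(Z):
        # Pairwise Euclidean distances, double-centered (rows, columns, grand mean).
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()
    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0
```

In the setting above, X and Y would for instance hold flattened feature activations of two networks on the same batch of inputs.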
Future communication systems increasingly aim at the steering and control of cyber-physical systems. Such systems can quickly become congested in environments like those found in Industry 4.0. In these scenarios, a plethora of sensor data is transmitted wirelessly to multiple in-network controllers that compute the control functions of the cyber-physical systems. In this paper, we present an implementation of network Functional Compression (FC) as a proof of concept to drastically reduce the data traffic in these scenarios. FC is a form of goal-oriented communication in which the objective of the sender-receiver pair is to transmit the minimum amount of information needed to compute a function at the receiver end. In our scenario, the senders transmit an encoded and compressed version of the sensor data to a destination, an in-network controller whose target function is a PID controller. We show that it is possible to achieve compression rates of over 50% in some cases by employing FC. We also show that using FC in a distributed cascade fashion can achieve even higher compression rates while reducing computational costs.
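As a sketch of the receiver-side target function in this scenario, a discrete PID controller computed from the (decoded) sensor readings; the gains and timestep below are illustrative, not values from the paper:

```python
# Minimal discrete PID controller, the kind of target function an in-network
# controller would compute from the decoded sensor data.
class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: controller = PID(kp=1.0, ki=0.1, kd=0.05, dt=0.01); u = controller.step(1.0, reading)
```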
Deep convolutional neural networks (CNNs) with a large number of parameters require intensive computational resources and are thus hard to deploy on resource-constrained platforms. Decomposition-based methods have therefore been used to compress CNNs in recent years. However, since the compression factor and performance are negatively correlated, state-of-the-art works either suffer from severe performance degradation or achieve relatively low compression factors. To overcome this problem, we propose to compress CNNs and alleviate performance degradation via joint matrix decomposition, in contrast to existing works that compress layers separately. The idea is inspired by the fact that CNNs contain many repeated modules. By projecting weights with the same structure into the same subspace, networks can be jointly compressed at larger ranks. In particular, three joint matrix decomposition schemes are developed, and the corresponding optimization approaches based on Singular Value Decomposition are proposed. Extensive experiments are conducted on three challenging compact CNNs and different benchmark data sets to demonstrate the superior performance of the proposed algorithms. As a result, our methods can compress ResNet-34 by 22x with milder accuracy degradation than several state-of-the-art methods.
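A hedged sketch of the joint idea, assuming weight matrices of identical shape (the paper's three schemes and their SVD-based optimization procedures are not reproduced): the matrices are stacked and factorized together so that they share one low-rank factor, instead of applying a separate truncated SVD per layer.

```python
# Illustrative joint truncated SVD over weight matrices with the same shape.
import numpy as np

def joint_truncated_svd(weights, rank):
    """weights: list of (m, n) arrays sharing a shape; returns a shared left factor
    and one per-layer right factor."""
    stacked = np.concatenate(weights, axis=1)             # (m, n * L)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    U_r = U[:, :rank]                                     # factor shared by all layers
    SVt_r = S[:rank, None] * Vt[:rank]                    # (rank, n * L)
    n = weights[0].shape[1]
    per_layer = [SVt_r[:, i * n:(i + 1) * n] for i in range(len(weights))]
    # Each weights[i] is approximated by U_r @ per_layer[i].
    return U_r, per_layer
```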
We introduce a new statistical test based on the observed spacings of ordered data. The statistic is sensitive to non-uniformity in random samples and to short-lived features in event time series. Under some conditions, this new test can outperform existing ones, such as the well-known Kolmogorov-Smirnov or Anderson-Darling tests, in particular when the number of samples is small and differences occur over a small quantile of the null hypothesis distribution. A detailed description of the test statistic is provided, including a discussion of the parameterization of its distribution via asymptotic bootstrapping as well as a novel per-quantile error estimate of the empirical distribution. Two example applications are provided: using the test to boost sensitivity in generic "bump hunting", and employing it to detect supernovae. The article is rounded off with an extended performance comparison to other established goodness-of-fit tests.
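The paper's statistic is not reproduced here; as a hedged point of reference, the classical Greenwood statistic below is one example of a test built from the spacings of ordered data after mapping the sample through the null CDF:

```python
# Classical Greenwood spacings statistic (for reference only, not the paper's test).
# Clustered events inflate the statistic relative to the uniform null.
import numpy as np

def greenwood_statistic(samples, null_cdf):
    """samples: 1-D array of observations; null_cdf: vectorized CDF of the null distribution."""
    u = np.sort(null_cdf(np.asarray(samples)))              # map to [0, 1] under the null
    spacings = np.diff(np.concatenate(([0.0], u, [1.0])))   # n + 1 spacings including the boundaries
    return float(np.sum(spacings ** 2))
```

Its null distribution can then be calibrated by Monte Carlo sampling, in the spirit of the bootstrapping described above.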
LiDAR sensors are an integral part of modern autonomous vehicles as they provide an accurate, high-resolution 3D representation of the vehicle's surroundings. However, it is computationally difficult to make use of the ever-increasing amounts of data from multiple high-resolution LiDAR sensors. As frame-rates, point cloud sizes and sensor resolutions increase, real-time processing of these point clouds must still extract semantics from this increasingly precise picture of the vehicle's environment. One deciding factor of the run-time performance and accuracy of deep neural networks operating on these point clouds is the underlying data representation and the way it is computed. In this work, we examine the relationship between the computational representations used in neural networks and their performance characteristics. To this end, we propose a novel computational taxonomy of LiDAR point cloud representations used in modern deep neural networks for 3D point cloud processing. Using this taxonomy, we perform a structured analysis of different families of approaches. Thereby, we uncover common advantages and limitations in terms of computational efficiency, memory requirements, and representational capacity as measured by semantic segmentation performance. Finally, we provide some insights and guidance for future developments in neural point cloud processing methods.
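As an illustration of one representation family that such a taxonomy covers, the sketch below projects an unordered point cloud onto a 2-D range image via spherical coordinates (the grid size and vertical field of view are hypothetical parameters, not values from the paper):

```python
# Illustrative spherical projection of a point cloud to a range image.
import numpy as np

def to_range_image(points: np.ndarray, h: int = 64, w: int = 1024,
                   fov_up: float = 15.0, fov_down: float = -25.0) -> np.ndarray:
    """points: (N, 3+) array of x, y, z (and optional attributes)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                                        # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).clip(0, h - 1).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r        # one range value per cell (last write wins)
    return image
```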
We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While this methodology has garnered significant attention from the computer science and optimization communities since the mid-80s, the advantages these trees offer over their axis-aligned counterparts remain only empirically justified, and explanations for their success are largely based on heuristics. Filling this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline for comparing and contrasting decision trees with other, less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade off interpretability for accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve predictive accuracy similar to that of neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools in the literature. Our results rely on (arguably surprising) connections between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms), which may be of independent theoretical interest.
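Purely as an illustration of what an oblique split looks like (not the paper's construction or the object of its theory), one can fit a least-squares direction on a node's data and then scan thresholds along that projection to minimize within-child squared error:

```python
# Illustrative oblique split for a regression tree node; the split rule is
# {x : x @ weights <= threshold}, i.e. a linear combination of covariates.
import numpy as np

def oblique_split(X: np.ndarray, y: np.ndarray):
    # Least-squares direction (with intercept) as an illustrative choice of hyperplane normal.
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    proj = X @ w[:-1]
    best_sse, best_t = np.inf, None
    for t in np.unique(proj)[:-1]:              # candidate thresholds keeping both children non-empty
        left, right = y[proj <= t], y[proj > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_t = sse, t
    return w[:-1], best_t
```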
A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BTs from computer game programming to many branches of AI and robotics. In this book, we first give an introduction to BTs and then describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy-to-use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing them using a state-space description of BTs. With these new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. We also show the use of BTs in automated planning and machine learning. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.
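A minimal sketch of the two classic composite nodes around which BTs are built, a Sequence and a Fallback (Selector), with their standard tick semantics:

```python
# Standard BT composites: a Sequence ticks children until one does not succeed,
# a Fallback ticks children until one does not fail.
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Sequence:
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:      # propagate FAILURE or RUNNING immediately
                return status
        return SUCCESS

class Fallback:
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:      # propagate SUCCESS or RUNNING immediately
                return status
        return FAILURE
```

Returning RUNNING instead of blocking is what makes the composition reactive: the whole tree is re-ticked from the root at every cycle, so higher-priority branches can preempt lower-priority ones.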
During the last decade, hyperspectral images have attracted increasing interest from researchers worldwide. They provide more detailed information about an observed area and allow more accurate target detection and more precise discrimination of objects than classical RGB and multispectral images. Despite the great potential of hyperspectral technology, the analysis and exploitation of such large data volumes remain a challenging task. The presence of irrelevant, redundant, and noisy images decreases classification accuracy. As a result, dimensionality reduction is a mandatory step in order to select a minimal and effective subset of images. In this paper, a new filter approach, normalized mutual synergy (NMS), is proposed to detect relevant bands that are complementary for class prediction and outperform the original hyperspectral cube data. The algorithm consists of two steps: image selection through normalized synergy information and pixel classification. The proposed approach measures the discriminative power of the selected bands based on a combination of their maximal normalized synergistic information, minimum redundancy, and maximal mutual information with the ground truth. A comparative study using the support vector machine (SVM) and k-nearest neighbor (KNN) classifiers is conducted to evaluate the proposed approach against state-of-the-art band selection methods. Experimental results on three benchmark hyperspectral images proposed by NASA ("Aviris Indiana Pine", "Salinas" and "Pavia University") demonstrate the robustness, effectiveness, and discriminative power of the proposed approach over approaches from the literature. Keywords: hyperspectral images; target detection; pixel classification; dimensionality reduction; band selection; information theory; mutual information; normalized synergy
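A hedged sketch of the general recipe (the exact NMS criterion is not reproduced here; the histogram discretization and the trade-off weight are illustrative choices): greedily pick bands that have high mutual information with the ground truth and low redundancy with the bands already selected.

```python
# Illustrative mutual-information-based greedy band selection (not the NMS criterion).
import numpy as np

def mutual_information(a, b, bins=32):
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def greedy_band_selection(cube, labels, n_bands, beta=0.5):
    """cube: (n_pixels, n_total_bands); labels: (n_pixels,) ground-truth classes."""
    relevance = [mutual_information(cube[:, j], labels) for j in range(cube.shape[1])]
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_bands:
        scores = []
        for j in range(cube.shape[1]):
            if j in selected:
                scores.append(-np.inf)
                continue
            redundancy = np.mean([mutual_information(cube[:, j], cube[:, s]) for s in selected])
            scores.append(relevance[j] - beta * redundancy)     # relevance minus redundancy penalty
        selected.append(int(np.argmax(scores)))
    return selected
```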
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped before reaching fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from the observation that energy-based models exhibit free-energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches that require particular architectures or distance computations. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty into each selection round. Meanwhile, by compactly aligning the free energy of target data around that of the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.
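A minimal sketch of the free-energy score that drives such a sampling strategy (the full EADA selection round and its regularizer are not reproduced; the helper names are illustrative): the free energy of an input under the classifier is -T * logsumexp(logits / T), and target samples with the highest free energy are queried for annotation.

```python
# Illustrative free-energy scoring for active query selection.
import torch

@torch.no_grad()
def free_energy(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # F(x) = -T * logsumexp(f(x) / T) over the class dimension.
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def select_queries(target_logits: torch.Tensor, budget: int) -> torch.Tensor:
    scores = free_energy(target_logits)           # higher score => less source-like
    return torch.topk(scores, k=budget).indices   # indices of target samples to annotate
```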