The rapidly growing traffic demands in fiber-optical networks require flexibility and accuracy in configuring lightpaths, for which fast and accurate quality of transmission (QoT) estimation is of pivotal importance. This paper introduces a machine learning (ML)-based QoT estimation approach that meets these requirements. The proposed gradient-boosting ML model uses precomputed per-channel self-channel-interference values as representative and condensed features to estimate non-linear interference in a flexible-grid network. With an enhanced Gaussian noise (GN) model simulation as the baseline, the ML model achieves a mean absolute signal-to-noise ratio error of approximately 0.1 dB, which is an improvement over the GN model. For three different network topologies and network planning approaches of varying complexities, a multi-period network planning study is performed in which ML and GN are compared as path computation elements (PCEs). The results show that the ML PCE matches or slightly improves on the performance of the GN PCE on all topologies while reducing the computation time of network planning by up to 70%.
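To make the evaluation metric concrete, the sketch below (hypothetical helper names `snr_db` and `mae_db`, not from the paper) shows how a lightpath's generalized SNR is formed from amplifier noise and the non-linear interference term that the ML model predicts, and how the mean absolute SNR error in dB is computed:

```python
import math

def snr_db(p_signal, p_ase, p_nli):
    # Generalized SNR of a lightpath in dB: signal power over the sum of
    # amplified spontaneous emission (ASE) noise and non-linear
    # interference (NLI); the ML model's task is to predict the NLI term
    # quickly instead of running a full GN-model computation.
    return 10.0 * math.log10(p_signal / (p_ase + p_nli))

def mae_db(estimated, baseline):
    # Mean absolute SNR error in dB, the accuracy metric quoted above
    # (approximately 0.1 dB against the enhanced GN simulation).
    return sum(abs(e - b) for e, b in zip(estimated, baseline)) / len(estimated)
```

All power values here are illustrative linear units; the paper's per-channel self-channel-interference features would feed the model that produces the NLI estimate.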
The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling; however, na\"ive exact computation of this model incurs quadratic time and memory complexities in the sequence length, hindering the training of long-sequence models. The critical bottlenecks are the computation of the partition functions in the denominator of the softmax function and the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and that an efficient KDE solver can then be utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, whereas all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over $4\times$ speedup. For ImageNet classification with T2T-ViT, KDEformer shows over $18\times$ speedup while the accuracy drop is less than $0.5\%$.
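The reduction of the softmax denominator to a kernel sum can be seen from its exact form; the minimal sketch below (hypothetical helper `partition_row`, written out naively for clarity) computes the quantity that a fast KDE solver would approximate:

```python
import math

def partition_row(q, keys):
    # Denominator of softmax attention for one query q:
    #   Z(q) = sum_j exp(q . k_j)
    # Rescaling each term by exp(-|q|^2/2) * exp(-|k_j|^2/2) turns this
    # into a Gaussian kernel-density sum over the keys, which is what
    # lets a sub-quadratic KDE solver replace the exact O(n) per-row sum.
    return sum(
        math.exp(sum(qi * ki for qi, ki in zip(q, k)))
        for k in keys
    )
```

Computing `partition_row` exactly for every query is the quadratic cost the paper avoids; the sketch only shows the target quantity, not the KDE acceleration itself.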
Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach in continuous state and action space for the online coverage path planning problem that can handle unknown environments. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path and simultaneously act on short-term obstacle detections. To account for large-scale environments, we propose a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.
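The total variation idea can be sketched on a binary coverage grid; `total_variation` below is a hypothetical illustration (the paper operates on a continuous coverage map, and the exact reward shaping is our assumption here):

```python
def total_variation(grid):
    # Anisotropic total variation of a 2D coverage grid: the sum of
    # absolute differences between horizontally and vertically adjacent
    # cells. Thin uncovered strips create many covered/uncovered
    # transitions, so adding a reward term that penalizes TV pushes the
    # agent toward compact, strip-free coverage.
    rows, cols = len(grid), len(grid[0])
    tv = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                tv += abs(grid[r][c] - grid[r + 1][c])
            if c + 1 < cols:
                tv += abs(grid[r][c] - grid[r][c + 1])
    return tv
```

A fully covered region has zero TV, while an uncovered column sandwiched between covered ones contributes a transition on each side, which is exactly the pattern the reward term discourages.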
Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields such as image or text generation: limited annotated data. Many existing methods for procedural level generation via machine learning require a secondary representation besides level images. However, current methods for obtaining such representations are laborious and time-consuming, which exacerbates this problem. In this work, we address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework converts gameplay video frames into an equivalent secondary representation, while its generation tail produces novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks leads to improved performance on both. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.
Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
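One IST round can be schematized in a few lines; `ist_round` and the interleaved coordinate partition below are purely illustrative choices of ours (the actual method partitions neurons into subnetworks and runs multiple local training steps per round):

```python
def ist_round(weights, n_workers, local_step):
    # Independent Subnetwork Training, schematically: partition the
    # parameter coordinates among workers, let each worker update only
    # its own block independently (no communication of the full model),
    # then reassemble the blocks into the next global iterate.
    new_w = list(weights)
    for k in range(n_workers):
        block = range(k, len(weights), n_workers)  # interleaved partition
        for i in block:
            new_w[i] = local_step(weights[i], i)
    return new_w
```

Because each worker holds and communicates only its block, IST addresses both the memory limits of pure data parallelism and the communication cost that compression methods target, which is the distinction the paper analyzes on a quadratic model.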
In the present work, we introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets. Reduced order models provide a real-time numerical approximation by simplifying the original model. The error introduced by such an operation is usually neglected and sacrificed in order to reach a fast computation. We propose to couple model reduction with machine learning-based residual learning, such that the above-mentioned error can be learned by a neural network and inferred for new predictions. We emphasize that the framework maximizes the exploitation of high-fidelity information, using it both for building the reduced order model and for learning the residual. In this work, we explore the integration of proper orthogonal decomposition (POD), and gappy POD for sensor data, with the recent DeepONet architecture. Numerical investigations for a parametric benchmark function and a nonlinear parametric Navier-Stokes problem are presented.
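The multi-fidelity correction reduces to one line; `corrected_prediction` and the toy `u_hf`/`u_rom`/`residual` functions below are hypothetical stand-ins (the paper builds the ROM with POD and learns the residual with a DeepONet):

```python
def corrected_prediction(rom_predict, residual_model, mu):
    # The ROM gives a fast low-fidelity prediction; a learned surrogate
    # adds back the residual  e(mu) = u_HF(mu) - u_ROM(mu)
    # that model reduction discards, recovering high-fidelity accuracy
    # at reduced-order cost.
    return rom_predict(mu) + residual_model(mu)

# Toy illustration: a "ROM" that truncates one term of the full model,
# and a residual model that has learned exactly the missing term.
u_hf = lambda mu: mu ** 2 + 0.1 * mu   # high-fidelity solution
u_rom = lambda mu: mu ** 2             # truncated reduced model
residual = lambda mu: 0.1 * mu         # learned correction
```

In practice the residual model is trained on the same high-fidelity snapshots used to build the POD basis, which is the dual use of high-fidelity data the abstract emphasizes.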
Due to the nature of the pure-tone audiometry test, hearing loss data often have a complicated correlation structure. Generalized estimating equations (GEE) are commonly used to investigate the association between exposures and hearing loss, because they are robust to misspecification of the correlation matrix. However, this robustness typically entails a moderate loss of estimation efficiency in finite samples. This paper proposes to model the correlation coefficients and to use second-order generalized estimating equations to estimate the correlation parameters. In simulation studies, we assessed the finite-sample performance of the proposed method and compared it with other methods, such as GEE with independent, exchangeable, and unstructured correlation structures. Our method achieves an efficiency gain that is larger for the coefficients of covariates capturing within-cluster variation (e.g., ear-level covariates) than for the coefficients of cluster-level covariates. The efficiency gain is also more pronounced when the within-cluster correlations are moderate to strong, or when compared to GEE with an unstructured correlation structure. As a real-world example, we applied the proposed method to data from the Audiology Assessment Arm of the Conservation of Hearing Study, and studied the association between a dietary adherence score and hearing loss.
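The idea of modeling the correlation coefficients, rather than fixing a single working value, can be sketched as follows; `modeled_corr` and the Fisher-z (tanh) link are our illustrative assumptions, not necessarily the link used in the paper:

```python
import math

def modeled_corr(gamma, pair_features):
    # Second-order GEE idea, schematically: each pairwise correlation
    # within a cluster (e.g., between two audiometric frequencies or
    # two ears of one subject) is regressed on pair-level covariates
    # through a link that keeps it in (-1, 1):
    #   rho_jk = tanh(gamma' x_jk)
    # The gamma parameters are then estimated from a second set of
    # estimating equations rather than treated as nuisance constants.
    eta = sum(g * x for g, x in zip(gamma, pair_features))
    return math.tanh(eta)
```

With `gamma = 0` this collapses to working independence, and with a single intercept it recovers an exchangeable structure, so the familiar GEE choices appear as special cases of the model.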
To let mobile robots travel flexibly through complicated environments, increasing attention has been paid to whole-body collision evaluation. Most existing works either opt for conservative corridor-based methods that impose strict requirements on corridor generation, or for ESDF-based methods that suffer from high computational overhead. Achieving fast and accurate whole-body collision evaluation remains a great challenge. In this paper, we propose a Robo-centric ESDF (RC-ESDF) that is pre-built in the robot body frame and can be seamlessly applied to mobile robots of any shape, even those with non-convex shapes. RC-ESDF enjoys lazy collision evaluation, retaining only the minimum information sufficient for whole-body safety constraints, which significantly speeds up trajectory optimization. Based on the analytical gradients provided by the RC-ESDF, we jointly optimize the position and rotation of the robot, taking whole-body safety, smoothness, and dynamical feasibility into account. Extensive simulation and real-world experiments verify the reliability and generalizability of our method.
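The key trick of a body-frame ESDF is that obstacle points are transformed into the robot frame before querying the pre-built field; the sketch below shows that transform (hypothetical helper `world_to_body`; the ESDF lookup and gradient computation themselves are omitted):

```python
import math

def world_to_body(px, py, rx, ry, theta):
    # Rigid-body transform of a world-frame obstacle point (px, py)
    # into the frame of a robot at position (rx, ry) with yaw theta.
    # Because the signed distance field is pre-built in the body frame,
    # whole-body clearance becomes a cheap lookup at this transformed
    # point instead of an online ESDF update of the whole map.
    dx, dy = px - rx, py - ry
    c, s = math.cos(-theta), math.sin(-theta)
    return c * dx - s * dy, s * dx + c * dy
```

Since the transform is differentiable in the robot pose, the chain rule yields the analytical position and rotation gradients that drive the joint trajectory optimization described above.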
Network compression is now a mature sub-field of neural network research: over the last decade, significant progress has been made towards reducing the size of models and speeding up inference, while maintaining the classification accuracy. However, many works have observed that focusing on just the overall accuracy can be misguided. For example, it has been shown that mismatches between the full and compressed models can be biased towards under-represented classes. This raises the important research question: can we achieve network compression while maintaining "semantic equivalence" with the original network? In this work, we study this question in the context of the "long tail" phenomenon in computer vision datasets observed by Feldman et al. They argue that memorization of certain inputs (appropriately defined) is essential to achieving good generalization. As compression limits the capacity of a network (and hence also its ability to memorize), we study the question: are mismatches between the full and compressed models correlated with the memorized training data? We present positive evidence in this direction for image classification tasks, by considering different base architectures and compression schemes.
In this article, we propose a 6N-dimensional stochastic differential equation (SDE) modelling the activity of N coupled populations of neurons in the brain. This equation extends the Jansen and Rit neural mass model, which was introduced to describe human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Our contributions are threefold: First, we introduce this stochastic N-population model and construct a reliable and efficient numerical method for its simulation, extending a splitting procedure for one neural population. Second, we present a modified Sequential Monte Carlo Approximate Bayesian Computation (SMC-ABC) algorithm to infer both the continuous and the discrete model parameters, the latter describing the coupling directions within the network. The proposed algorithm further develops a previous reference-table acceptance-rejection ABC method, initially proposed for the inference of one neural population. On the one hand, the considered SMC-ABC approach reduces the computational cost of the basic acceptance-rejection scheme. On the other hand, it is designed to account for both marginal and coupled interacting dynamics, making it possible to identify the directed connectivity structure. Third, we illustrate the derived algorithm on both simulated data and real multi-channel EEG data, aiming to infer the brain's connectivity structure during an epileptic seizure. The proposed algorithm may be used for parameter and network estimation in other multi-dimensional coupled SDEs for which a suitable numerical simulation method can be derived.
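For readers unfamiliar with ABC, the basic acceptance-rejection scheme that the SMC-ABC algorithm refines can be sketched in a few lines (`abc_rejection` and the scalar summary statistic are illustrative simplifications; the paper uses summaries of simulated EEG trajectories and iteratively shrinks the tolerance):

```python
import random

def abc_rejection(observed_stat, simulate, prior_sample, eps, n_draws, rng):
    # Basic likelihood-free ABC: draw a parameter from the prior,
    # simulate data under it, and keep the parameter whenever the
    # simulated summary statistic lands within eps of the observed one.
    # SMC-ABC improves on this by propagating the accepted particles
    # through a sequence of decreasing tolerances eps_1 > eps_2 > ...
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed_stat) <= eps:
            accepted.append(theta)
    return accepted
```

The accepted draws approximate the posterior; in the paper's setting, the simulator is the splitting-based numerical scheme for the coupled neural mass SDE, and the discrete coupling directions are sampled alongside the continuous parameters.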
Spiking Neural Networks (SNNs) have garnered widespread interest for their energy efficiency and brain-inspired event-driven properties. While recent methods like Spiking-YOLO have extended SNNs to more challenging object detection tasks, they often suffer from high latency and low detection accuracy, making them difficult to deploy on latency-sensitive mobile platforms. Furthermore, conversion methods from Artificial Neural Networks (ANNs) to SNNs struggle to preserve the complete structure of the ANNs, resulting in poor feature representation and high conversion errors. To address these challenges, we propose two methods: timestep compression and spike-time-dependent integrated (STDI) coding. The former reduces the timesteps required in ANN-SNN conversion by compressing information, while the latter sets a time-varying threshold to expand the information holding capacity. We also present an SNN-based ultra-low-latency and highly accurate object detection model (SUHD) that achieves state-of-the-art performance on nontrivial datasets like PASCAL VOC and MS COCO, with a remarkable roughly 750x reduction in timesteps and a 30% mean average precision (mAP) improvement compared to Spiking-YOLO on the MS COCO dataset. To the best of our knowledge, SUHD is the deepest spike-based object detection model to date that achieves ultra-low timesteps for lossless conversion.
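The effect of a time-varying threshold can be illustrated with a minimal integrate-and-fire sketch; `spike_train` and the geometric threshold decay are our own simplifications, loosely inspired by the STDI coding idea rather than reproducing it:

```python
def spike_train(inputs, base_threshold, decay):
    # Integrate-and-fire neuron whose firing threshold decays over
    # timesteps (thr_t = base_threshold * decay**t). A time-varying
    # threshold lets the same number of timesteps carry more information:
    # strong evidence spikes early, weaker evidence can still spike later.
    v, spikes = 0.0, []
    for t, x in enumerate(inputs):
        v += x
        thr = base_threshold * (decay ** t)
        if v >= thr:
            spikes.append(1)
            v -= thr  # soft reset keeps the residual membrane potential
        else:
            spikes.append(0)
    return spikes
```

With `decay = 1.0` this is a standard fixed-threshold neuron; with `decay < 1.0`, a sub-threshold input that would never fire can produce a spike at a later timestep, which is the kind of capacity expansion the coding scheme targets.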