In numerical simulations of cardiac mechanics, coupling the heart to a model of the circulatory system is essential for capturing physiological cardiac behavior. A popular and efficient technique is to use an electrical circuit analogy, known as a lumped parameter network or zero-dimensional (0D) fluid model, to represent blood flow throughout the cardiovascular system. Due to the strong physical interaction between the heart and the blood circulation, developing accurate and efficient numerical coupling methods remains an active area of research. In this work, we present a modular framework for implicitly coupling three-dimensional (3D) finite element simulations of cardiac mechanics to 0D models of blood circulation. The framework is modular in that the circulation model can be modified independently of the 3D finite element solver, and vice versa. The numerical scheme builds upon a previous work that combines 3D blood flow models with 0D circulation models (3D fluid - 0D fluid). Here, we extend it to couple 3D cardiac tissue mechanics models with 0D circulation models (3D structure - 0D fluid), showing that both mathematical problems can be solved within a unified coupling scheme. The effectiveness, temporal convergence, and computational cost of the algorithm are assessed through multiple examples relevant to the cardiovascular modeling community. Importantly, in an idealized left ventricle example, we show that the coupled model yields physiological pressure-volume loops and naturally recapitulates the isovolumic contraction and relaxation phases of the cardiac cycle without any additional numerical techniques. Furthermore, we provide a new derivation of the scheme inspired by the Approximate Newton Method of Chan (1985), explaining how the proposed numerical scheme combines the stability of monolithic approaches with the modularity and flexibility of partitioned approaches.
With the rapid advancements of sensor technology and deep learning, autonomous driving systems are providing safe and efficient access to intelligent vehicles as well as intelligent transportation. Among these equipped sensors, the radar sensor plays a crucial role in providing robust perception information in diverse environmental conditions. This review focuses on exploring different radar data representations utilized in autonomous driving systems. Firstly, we introduce the capabilities and limitations of the radar sensor by examining the working principles of radar perception and signal processing of radar measurements. Then, we delve into the generation process of five radar representations, including the ADC signal, radar tensor, point cloud, grid map, and micro-Doppler signature. For each radar representation, we examine the related datasets, methods, advantages and limitations. Furthermore, we discuss the challenges faced in these data representations and propose potential research directions. Above all, this comprehensive review offers an in-depth insight into how these representations enhance autonomous system capabilities, providing guidance for radar perception researchers. To facilitate retrieval and comparison of different data representations, datasets and methods, we provide an interactive website at //radar-camera-fusion.github.io/radar.
Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One possible way to help bridge this gap be to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference.
Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS for each individual frame. In 4D-GS, a novel explicit representation containing both 3D Gaussians and 4D neural voxels is proposed. A decomposed neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build Gaussian features from 4D neural voxels and then a lightweight MLP is applied to predict Gaussian deformations at novel timestamps. Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800$\times$800 resolution on an RTX 3090 GPU while maintaining comparable or better quality than previous state-of-the-art methods. More demos and code are available at //guanjunwu.github.io/4dgs/.
Masked reconstruction serves as a fundamental pretext task for self-supervised learning, enabling the model to enhance its feature extraction capabilities by reconstructing the masked segments from extensive unlabeled data. In human activity recognition, this pretext task employed a masking strategy centered on the time dimension. However, this masking strategy fails to fully exploit the inherent characteristics of wearable sensor data and overlooks the inter-channel information coupling, thereby limiting its potential as a powerful pretext task. To address these limitations, we propose a novel masking strategy called Channel Masking. It involves masking the sensor data along the channel dimension, thereby compelling the encoder to extract channel-related features while performing the masked reconstruction task. Moreover, Channel Masking can be seamlessly integrated with masking strategies along the time dimension, thereby motivating the self-supervised model to undertake the masked reconstruction task in both the time and channel dimensions. Integrated masking strategies are named Time-Channel Masking and Span-Channel Masking. Finally, we optimize the reconstruction loss function to incorporate the reconstruction loss in both the time and channel dimensions. We evaluate proposed masking strategies on three public datasets, and experimental results show that the proposed strategies outperform prior strategies in both self-supervised and semi-supervised scenarios.
The extremely large-scale massive multiple-input multiple-output (XL-MIMO) has the potential to achieve boosted spectral efficiency and refined spatial resolution for future wireless networks. However, channel estimation for XL-MIMO is challenging since the large number of antennas results in high computational complexity with the near-field effect. In this letter, we propose a low-complexity sequential angle-distance channel estimation (SADCE) method for near-field XL-MIMO systems equipped with uniformly planar arrays (UPA). Specifically, we first successfully decouple the angle and distance parameters, which allows us to devise a two-dimensional discrete Fourier transform (2D-DFT) method for angle parameters estimation. Then, a low-complexity distance estimation method is proposed with a closed-form solution. Compared with existing methods, the proposed method achieves significant performance gain with noticeably reduced computational complexity.Numerical results verify the superiority of the proposed near-field channel estimation algorithm.
A fundamental question is whether one can maintain a maximum independent set in polylogarithmic update time for a dynamic collection of geometric objects in Euclidean space. Already, for a set of intervals, it is known that no dynamic algorithm can maintain an exact maximum independent set in sublinear update time. Therefore, the typical objective is to explore the trade-off between update time and solution size. Substantial efforts have been made in recent years to understand this question for various families of geometric objects, such as intervals, hypercubes, hyperrectangles, and fat objects. We present the first fully dynamic approximation algorithm for disks of arbitrary radii in the plane that maintains a constant-factor approximate maximum independent set in polylogarithmic expected amortized update time. Moreover, for a fully dynamic set of $n$ disks of unit radius in the plane, we show that a $12$-approximate maximum independent set can be maintained with worst-case update time $O(\log n)$, and optimal output-sensitive reporting. This result generalizes to fat objects of comparable sizes in any fixed dimension $d$, where the approximation ratio depends on the dimension and the fatness parameter. Further, we note that, even for a dynamic set of disks of unit radius in the plane, it is impossible to maintain $O(1+\varepsilon)$-approximate maximum independent set in truly sublinear update time, under standard complexity assumptions.
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy to compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization. Generalization in deep learning is an extremely broad phenomenon, and therefore, it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue that the proposed approach is the most viable one on this basis.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.