Estimating tissue parameter maps with high accuracy and precision from highly undersampled measurements presents one of the major challenges in MR fingerprinting (MRF). Many existing works project the recovered voxel fingerprints onto the Bloch manifold to improve reconstruction performance. However, little research focuses on exploiting the latent manifold structure priors among fingerprints. To fill this gap, we propose a novel MRF reconstruction framework based on manifold structured data priors. Since it is difficult to directly estimate the fingerprint manifold structure, we model the tissue parameters as points on a low-dimensional parameter manifold. We reveal that the fingerprint manifold shares the same intrinsic topology as the parameter manifold, although being embedded in different Euclidean spaces. To exploit the non-linear and non-local redundancies in MRF data, we divide the MRF data into spatial patches, and the similarity measurement among data patches can be accurately obtained using the Euclidean distance between the corresponding patches in the parameter manifold. The measured similarity is then used to construct the graph Laplacian operator, which represents the fingerprint manifold structure. Thus, the fingerprint manifold structure is introduced in the reconstruction framework by using the low-dimensional parameter manifold. Additionally, we incorporate the locally low-rank prior in the reconstruction framework to further utilize the local correlations within each patch for improved reconstruction performance. We also adopt a GPU-accelerated NUFFT library to accelerate reconstruction in non-Cartesian sampling scenarios. Experimental results demonstrate that our method can achieve significantly improved reconstruction performance with reduced computational time over the state-of-the-art methods.
Channel Charting aims to construct a map of the radio environment by leveraging similarity relationships found in high-dimensional channel state information. Although resulting channel charts usually accurately represent local neighborhood relationships, even under conditions with strong multipath propagation, they often fall short in capturing global geometric features. On the other hand, classical model-based localization methods, such as triangulation and multilateration, can easily localize signal sources in the global coordinate frame. However, these methods rely heavily on the assumption of line-of-sight channels and distributed antenna deployments. Based on measured data, we compare classical source localization techniques to channel charts with respect to localization performance. We suggest and evaluate methods to enhance Channel Charting with model-based localization approaches: One approach involves using information derived from classical localization methods to map channel chart locations to physical positions after conventional training of the forward charting function. Foremost, though, we suggest to incorporate information from model-based approaches during the training of the forward charting function in what we call "augmented Channel Charting". We demonstrate that Channel Charting can outperform classical localization methods on the considered dataset.
Training Artificial Intelligence (AI) models on 3D images presents unique challenges compared to the 2D case: Firstly, the demand for computational resources is significantly higher, and secondly, the availability of large datasets for pre-training is often limited, impeding training success. This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images. Our method employs attention pooling to learn to assign each slice an importance weight and, by that, obtain a weighted average of all 2D slices. These weights directly quantify the contribution of each slice to the contribution and thus make the model prediction inspectable. We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods. Furthermore, we compare the in-built interpretability of our approach to HiResCam, a state-of-the-art retrospective interpretability approach.
This paper presents an aggressive trajectory tracking method for a small lightweight nano-quadrotor using nonlinear model predictive control (NMPC) based on acados. Controlling a nano quadrotor for accurate trajectory tracking at high speed in dynamic environments is challenging due to complex aerodynamic forces that introduce significant disturbances and large positional tracking errors. These aerodynamic effects are difficult to be identified and require feedback control that compensates for them in real time. NMPC allows the nano-quadrotor to control its motion in real time based on onboard sensor measurements, making it well-suited for tasks such as aggressive maneuvers and navigation in complex and dynamic environments. The software package acados enables the implementation of the NMPC algorithm on embedded systems, which is particularly important for nano-quadrotor due to its limited computational resources. Our autonomous navigation system is developed based on an AI-deck that is a GAP8-based parallel ultra-low power computing platform with onboard sensors of a multi-ranger deck and a flow deck. The proposed method of NMPC-based trajectory tracking control is tested in simulation and the results demonstrate its effectiveness in trajectory tracking while considering the dynamic environments. It is also tested on a real nano quadrotor hardware, 27-g Crazyflie 2.1, with a customized MCU running embedded NMPC, in which accurate trajectory tracking results are achieved in dynamic real-world environments.
Table Structure Recognition (TSR) aims at transforming unstructured table images into structured formats, such as HTML sequences. One type of popular solution is using detection models to detect components of a table, such as columns and rows, then applying a rule-based post-processing method to convert detection results into HTML sequences. However, existing detection-based studies often have the following limitations. First, these studies usually pay more attention to improving the detection performance, which does not necessarily lead to better performance regarding cell-level metrics, such as TEDS. Second, some solutions over-simplify the problem and can miss some critical information. Lastly, even though some studies defined the problem to detect more components to provide as much information as other types of solutions, these studies ignore the fact this problem definition is a multi-label detection because row, projected row header and column header can share identical bounding boxes. Besides, there is often a performance gap between two-stage and transformer-based detection models regarding the structure-only TEDS, even though they have similar performance regarding the COCO metrics. Therefore, we revisit the limitations of existing detection-based solutions, compare two-stage and transformer-based detection models, and identify the key design aspects for the success of a two-stage detection model for the TSR task, including the multi-class problem definition, the aspect ratio for anchor box generation, and the feature generation of the backbone network. We applied simple methods to improve these aspects of the Cascade R-CNN model, achieved state-of-the-art performance, and improved the baseline Cascade R-CNN model by 19.32%, 11.56% and 14.77% regarding the structure-only TEDS on SciTSR, FinTabNet, and PubTables1M datasets.
This research paper focuses on the integration of Artificial Intelligence (AI) into the currency trading landscape, positing the development of personalized AI models, essentially functioning as intelligent personal assistants tailored to the idiosyncrasies of individual traders. The paper posits that AI models are capable of identifying nuanced patterns within the trader's historical data, facilitating a more accurate and insightful assessment of psychological risk dynamics in currency trading. The PRI is a dynamic metric that experiences fluctuations in response to market conditions that foster psychological fragility among traders. By employing sophisticated techniques, a classifying decision tree is crafted, enabling clearer decision-making boundaries within the tree structure. By incorporating the user's chronological trade entries, the model becomes adept at identifying critical junctures when psychological risks are heightened. The real-time nature of the calculations enhances the model's utility as a proactive tool, offering timely alerts to traders about impending moments of psychological risks. The implications of this research extend beyond the confines of currency trading, reaching into the realms of other industries where the judicious application of personalized modeling emerges as an efficient and strategic approach. This paper positions itself at the intersection of cutting-edge technology and the intricate nuances of human psychology, offering a transformative paradigm for decision making support in dynamic and high-pressure environments.
Ray tracing (RT) is instrumental in 6G research in order to generate spatially-consistent and environment-specific channel impulse responses (CIRs). While acquiring accurate scene geometries is now relatively straightforward, determining material characteristics requires precise calibration using channel measurements. We therefore introduce a novel gradient-based calibration method, complemented by differentiable parametrizations of material properties, scattering and antenna patterns. Our method seamlessly integrates with differentiable ray tracers that enable the computation of derivatives of CIRs with respect to these parameters. Essentially, we approach field computation as a large computational graph wherein parameters are trainable akin to weights of a neural network (NN). We have validated our method using both synthetic data and real-world indoor channel measurements, employing a distributed multiple-input multiple-output (MIMO) channel sounder.
This work aims to provide an engagement decision support tool for Beyond Visual Range (BVR) air combat in the context of Defensive Counter Air (DCA) missions. In BVR air combat, engagement decision refers to the choice of the moment the pilot engages a target by assuming an offensive stance and executing corresponding maneuvers. To model this decision, we use the Brazilian Air Force's Aerospace Simulation Environment (\textit{Ambiente de Simula\c{c}\~ao Aeroespacial - ASA} in Portuguese), which generated 3,729 constructive simulations lasting 12 minutes each and a total of 10,316 engagements. We analyzed all samples by an operational metric called the DCA index, which represents, based on the experience of subject matter experts, the degree of success in this type of mission. This metric considers the distances of the aircraft of the same team and the opposite team, the point of Combat Air Patrol, and the number of missiles used. By defining the engagement status right before it starts and the average of the DCA index throughout the engagement, we create a supervised learning model to determine the quality of a new engagement. An algorithm based on decision trees, working with the XGBoost library, provides a regression model to predict the DCA index with a coefficient of determination close to 0.8 and a Root Mean Square Error of 0.05 that can furnish parameters to the BVR pilot to decide whether or not to engage. Thus, using data obtained through simulations, this work contributes by building a decision support system based on machine learning for BVR air combat.
Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature denoising networks substantially improve the state-of-the-art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. A network based on our method was ranked first in Competition on Adversarial Attacks and Defenses (CAAD) 2018 --- it achieved 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers, surpassing the runner-up approach by ~10%. Code and models will be made publicly available.
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.
This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext. We demonstrate that CipherGAN is capable of cracking language data enciphered using shift and Vigenere ciphers to a high degree of fidelity and for vocabularies much larger than previously achieved. We present how CycleGAN can be made compatible with discrete data and train in a stable way. We then prove that the technique used in CipherGAN avoids the common problem of uninformative discrimination associated with GANs applied to discrete data.