Magnetic recording devices are still competitive in the storage density race thanks to new technologies such as two-dimensional magnetic recording (TDMR). Error-prone patterns where a bit is surrounded by complementary bits at the four positions with Manhattan distance $1$ on the TDMR grid are called plus isolation (PIS) patterns. Recently, we introduced optimal plus LOCO (OP-LOCO) codes that prevent these patterns from being written. However, as the device ages, error-prone patterns where a bit is surrounded by complementary bits at only three positions with Manhattan distance $1$ emerge, and we call these incomplete PIS (IPIS) patterns. In this paper, we present capacity-achieving codes that forbid both PIS and IPIS patterns in TDMR systems with wide read heads. We collectively call these patterns rotated T isolation (RTIS) patterns, and we call the new codes optimal T LOCO (OT-LOCO) codes. We analyze OT-LOCO codes and derive their encoding-decoding rule. Simulation results demonstrate that OT-LOCO codes entirely eliminate media noise at practical TD densities. We suggest using OP-LOCO codes early in the device lifetime, then reconfiguring to OT-LOCO codes later on. Moreover, we introduce another coding scheme to remove RTIS patterns which offers lower complexity, lower error propagation, and track separation.
Face morphing attack detection is emerging as an increasingly challenging problem owing to advancements in high-quality and realistic morphing attack generation. Reliable detection of morphing attacks is essential because these attacks are targeted for border control applications. This paper presents a multispectral framework for differential morphing-attack detection (D-MAD). The D-MAD methods are based on using two facial images that are captured from the ePassport (also called the reference image) and the trusted device (for example, Automatic Border Control (ABC) gates) to detect whether the face image presented in ePassport is morphed. The proposed multispectral D-MAD framework introduce a multispectral image captured as a trusted capture to acquire seven different spectral bands to detect morphing attacks. Extensive experiments were conducted on the newly created Multispectral Morphed Datasets (MSMD) with 143 unique data subjects that were captured using both visible and multispectral cameras in multiple sessions. The results indicate the superior performance of the proposed multispectral framework compared to visible images.
Precise relative navigation is a critical enabler for distributed satellites to achieve new mission objectives impossible for a monolithic spacecraft. Carrier phase differential GPS (CDGPS) with integer ambiguity resolution (IAR) is a promising means of achieving cm-level accuracy for high-precision Rendezvous, Proximity-Operations and Docking (RPOD), In-Space Servicing, Assembly and Manufacturing (ISAM) as well as satellite formation flying and swarming. However, IAR is sensitive to received GPS signal noise, especially under severe multi-path or high thermal noise. This paper proposes a sensor-fusion approach to achieve IAR under such conditions in two coupling stages. A loose coupling stage fuses through an Extended Kalman Filter the CDGPS measurements with on-board sensor measurements such as range from cross-links, and vision-based bearing angles. A second tight-coupling stage augments the cost function of the integer weighted least-squares minimization with a soft constraint function using noise-weighted observed-minus-computed residuals from these external sensor measurements. Integer acceptance tests are empirically modified to reflect added constraints. Partial IAR is applied to graduate integer fixing. These proposed techniques are packaged into flight-capable software, with ground truths simulated by the Stanford Space Rendezvous Laboratory's S3 library using state-of-the-art force modelling with relevant sources of errors, and validated in two scenarios: (1) a high multi-path scenario involving rendezvous and docking in low Earth orbit, and (2) a high thermal noise scenario relying only on GPS side-lobe signals during proximity operations in geostationary orbit. This study demonstrates successful IAR in both cases, using the proposed sensor-fusion approach, thus demonstrating potential for high-precision state estimation under adverse signal-to-noise conditions.
In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data. Annotation projection has often been formulated as the task of transporting, on parallel corpora, the labels pertaining to a given span in the source language into its corresponding span in the target language. In this paper we present T-Projection, a novel approach for annotation projection that leverages large pretrained text-to-text language models and state-of-the-art machine translation technology. T-Projection decomposes the label projection task into two subtasks: (i) A candidate generation step, in which a set of projection candidates using a multilingual T5 model is generated and, (ii) a candidate selection step, in which the generated candidates are ranked based on translation probabilities. We conducted experiments on intrinsic and extrinsic tasks in 5 Indo-European and 8 low-resource African languages. We demostrate that T-projection outperforms previous annotation projection methods by a wide margin. We believe that T-Projection can help to automatically alleviate the lack of high-quality training data for sequence labeling tasks. Code and data are publicly available.
Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are anticipated to play crucial roles in the development of 5G-and-beyond wireless networks and the Internet of Things (IoT). In this paper, we propose a secure SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay, periodically receiving and relaying small data packets from a remote IoT device to its receiver in two hops with strict latency requirements, in the presence of an eavesdropper. This system requires careful optimization of important design parameters, such as the coding blocklengths of both hops, transmit powers, and the UAV's trajectory. While the overall optimization problem is nonconvex, we tackle it by applying a block successive convex approximation (BSCA) approach to divide the original problem into three subproblems and solve them separately. Then, an overall iterative algorithm is proposed to obtain the final design with guaranteed convergence. Our proposed low-complexity algorithm incorporates robust trajectory design and resource management to optimize the effective average secrecy throughput of the communication system over the course of the UAV-relay's mission. Simulation results demonstrate significant performance improvements compared to various benchmark schemes and provide useful design insights on the coding blocklengths and transmit powers along the trajectory of the UAV.
In the context of IoT deployments, a multitude of devices concurrently require network access to transmit data over a shared communication channel. Employing symmetric strategies can effectively facilitate the collaborative use of the communication medium among these devices. By adopting such strategies, devices collectively optimize their transmission parameters, resulting in minimized collisions and enhanced overall network throughput. Our primary focus centers on the formulation of symmetric (i.e., identical) strategies for the sensors, aiming to optimize a finite horizon team objective. The imposition of symmetric strategies introduces novel facets and complexities into the team problem. To address this, we embrace the common information approach and adapt it to accommodate the use of symmetric strategies. This adaptation yields a dynamic programming framework grounded in common information, wherein each step entails the minimization of a single function mapping from an agent's private information space to the space of probability distributions over possible actions. Our proposed policy/method incurs a reduced cumulative cost compared to other methods employing symmetric strategies, a point substantiated by our simulation results.
With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of frames, resulting in the inability to generate high-fidelity long videos during inference. Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. 1) We first analyze the impact of initial noise in video diffusion models. Then building upon the observation of noise, we propose FreeNoise, a tuning-free and time-efficient paradigm to enhance the generative capabilities of pretrained video diffusion models while preserving content consistency. Specifically, instead of initializing noises for all frames, we reschedule a sequence of noises for long-range correlation and perform temporal attention over them by window-based function. 2) Additionally, we design a novel motion injection method to support the generation of videos conditioned on multiple text prompts. Extensive experiments validate the superiority of our paradigm in extending the generative capabilities of video diffusion models. It is noteworthy that compared with the previous best-performing method which brought about 255% extra time cost, our method incurs only negligible time cost of approximately 17%. Generated video samples are available at our website: //haonanqiu.com/projects/FreeNoise.html.
With the ever-increasing execution scale of high performance computing (HPC) applications, vast amounts of data are being produced by scientific research every day. Error-bounded lossy compression has been considered a very promising solution to address the big-data issue for scientific applications because it can significantly reduce the data volume with low time cost meanwhile allowing users to control the compression errors with a specified error bound. The existing error-bounded lossy compressors, however, are all developed based on inflexible designs or compression pipelines, which cannot adapt to diverse compression quality requirements/metrics favored by different application users. In this paper, we propose a novel dynamic quality metric oriented error-bounded lossy compression framework, namely QoZ. The detailed contribution is three-fold. (1) We design a novel highly-parameterized multi-level interpolation-based data predictor, which can significantly improve the overall compression quality with the same compressed size. (2) We design the error-bounded lossy compression framework QoZ based on the adaptive predictor, which can auto-tune the critical parameters and optimize the compression result according to user-specified quality metrics during online compression. (3) We evaluate QoZ carefully by comparing its compression quality with multiple state-of-the-arts on various real-world scientific application datasets. Experiments show that, compared with the second-best lossy compressor, QoZ can achieve up to 70% compression ratio improvement under the same error bound, up to 150% compression ratio improvement under the same PSNR, or up to 270% compression ratio improvement under the same SSIM.
Neural image classifiers are known to undergo severe performance degradation when exposed to inputs that exhibit covariate shifts with respect to the training distribution. A general interventional data augmentation (IDA)mechanism that simulates arbitrary interventions over spurious variables has often been conjectured as a theoretical solution to this problem and approximated to varying degrees of success. In this work, we study how well modern Text-to-Image (T2I) generators and associated image editing techniques can solve the problem of IDA. We experiment across a diverse collection of benchmarks in domain generalization, ablating across key dimensions of T2I generation, including interventional prompts, conditioning mechanisms, and post-hoc filtering, showing that it substantially outperforms previously state-of-the-art image augmentation techniques independently of how each dimension is configured. We discuss the comparative advantages of using T2I for image editing versus synthesis, also finding that a simple retrieval baseline presents a surprisingly effective alternative, which raises interesting questions about how generative models should be evaluated in the context of domain generalization.
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, which typically demands high-resolution 3D feature grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we introduce Act3D, a manipulation policy transformer that represents the robot's workspace using a 3D feature field with adaptive resolutions dependent on the task at hand. The model lifts 2D pre-trained features to 3D using sensed depth, and attends to them to compute features for sampled 3D points. It samples 3D point grids in a coarse to fine manner, featurizes them using relative-position attention, and selects where to focus the next round of point sampling. In this way, it efficiently computes 3D action maps of high spatial resolution. Act3D sets a new state-of-the-art in RL-Bench, an established manipulation benchmark, where it achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLBench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. We quantify the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions in ablative experiments. Code and videos are available on our project website: //act3d.github.io/.
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.