The rhythm of bona fide speech is often difficult to replicate, so the fundamental frequency (F0) of synthetic speech differs significantly from that of real speech. The F0 feature is therefore expected to carry discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to model the F0 subband effectively and improve FSD performance, we propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net is used as the backbone network to capture multiscale information, and it is enhanced with a spatial reconstruction mechanism that avoids losing important information as channel groups are repeatedly stacked. In addition, local attention is designed to make the model focus on local information in the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method achieves an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, the state-of-the-art performance among all single systems.
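As an illustration of the kind of feature this abstract describes, the sketch below extracts a low-frequency subband of a log-magnitude spectrogram covering the typical F0 range. The 0-400 Hz cutoff, the STFT settings, and all names are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical sketch of an "F0 subband" feature for fake speech detection.
# The cutoff frequency and STFT parameters below are assumptions.
import librosa
import numpy as np

def f0_subband(wav_path, sr=16000, n_fft=512, f_max=400.0):
    y, sr = librosa.load(wav_path, sr=sr)
    spec = np.abs(librosa.stft(y, n_fft=n_fft))          # (freq_bins, frames)
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)  # bin center frequencies
    band = spec[freqs <= f_max, :]                       # keep bins covering the typical F0 range
    return np.log(band + 1e-8)                           # log-magnitude subband feature
```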
Latent variables are common in practical problems, for example when some variables are difficult or expensive to measure, or simply unknown. When latent variables are unaccounted for, structure learning for Gaussian graphical models can be blurred by the additional correlation between the observed variables that the latent variables induce. A standard approach to this problem is a latent version of the graphical lasso, which splits the inverse covariance matrix into a sparse part and a low-rank part that are penalized separately. In this paper we propose a generalization of this approach via the flexible Golazo penalty. This allows us to introduce latent versions of, for example, the adaptive lasso, positive dependence constraints, or predetermined sparsity patterns, as well as combinations thereof. We develop an algorithm for the latent Gaussian graphical model with the Golazo penalty and demonstrate it on simulated and real data.
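For context, the standard latent graphical lasso referred to above solves, for a sample covariance $\hat{\Sigma}$ (notation assumed here),
\[
\min_{S,\; L \succeq 0,\; S - L \succ 0} \; -\log\det(S - L) + \operatorname{tr}\big(\hat{\Sigma}(S - L)\big) + \lambda \sum_{i \neq j} |S_{ij}| + \gamma\, \operatorname{tr}(L),
\]
where $S$ is the sparse part, $L$ the low-rank part attributable to the latent variables, and $\lambda, \gamma$ their separate penalty weights. The Golazo generalization, as described, replaces the uniform $\ell_1$ term with flexible entrywise penalties, which is what admits adaptive weights, sign constraints, and fixed sparsity patterns.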
Gait asymmetry, a consequence of various neurological or physical conditions such as aging and stroke, detrimentally impacts bipedal locomotion: it causes biomechanical alterations, increases the risk of falls, and reduces quality of life. Addressing this critical issue, this paper introduces a novel diagnostic method for gait symmetry analysis using an assistive robotic Smart Walker equipped with an innovative asymmetry detection scheme. The method analyzes sensor measurements capturing the interaction torque between user and walker. By applying a seasonal-trend decomposition tool, we isolate gait-specific patterns within these data, allowing stride durations to be estimated and a symmetry index to be calculated. In experiments involving five participants, we demonstrate the Smart Walker's capability to detect and quantify gait asymmetry, achieving an accuracy of 84.9% in identifying asymmetric cases in a controlled testing environment. Further analysis explores the classification of these asymmetries by their underlying causes, providing valuable insights for gait assessment. The results underscore the potential of the device as a precise, ready-to-use monitoring tool for personalized rehabilitation, facilitating targeted interventions for enhanced patient outcomes.
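As a concrete illustration of the described pipeline (seasonal-trend decomposition of the interaction torque, stride segmentation, symmetry index), here is a minimal Python sketch. The sampling rate, STL period, left/right labeling of alternate strides, and the symmetry-index formula are all assumptions, not the paper's exact choices.

```python
# Minimal sketch: STL decomposition of interaction torque, stride
# segmentation from the seasonal component, and a common symmetry index.
import numpy as np
from statsmodels.tsa.seasonal import STL
from scipy.signal import find_peaks

def symmetry_index(torque, fs=100, stride_period_s=1.2):
    res = STL(torque, period=int(fs * stride_period_s)).fit()
    gait = res.seasonal                           # gait-specific oscillation
    peaks, _ = find_peaks(gait, distance=int(0.4 * fs))
    strides = np.diff(peaks) / fs                 # stride durations in seconds
    left, right = strides[0::2], strides[1::2]    # alternate strides (assumed labeling)
    n = min(len(left), len(right))
    l, r = left[:n].mean(), right[:n].mean()
    return 100.0 * abs(l - r) / (0.5 * (l + r))   # SI formula assumed, not the paper's
```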
We consider solving a possibly ill-conditioned linear operator equation in which the operator is not modeled by physical laws but is instead specified via training pairs (consisting of images and data) of its input-output relation. We derive a stable method for computing the operator, consisting of a Gram-Schmidt orthonormalization of the images followed by a principal component analysis of the data. This two-step algorithm provides a spectral decomposition of the linear operator. Moreover, we show that both the Gram-Schmidt step and the principal component analysis can be written as deep neural networks, which relates the procedure to encoder and decoder networks. We therefore call the two-step algorithm a linear algebra network. Finally, we provide numerical simulations showing that the strategy is feasible for reconstructing spectral functions and for solving operator equations without explicitly exploiting the physical model.
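A minimal numpy sketch of this two-step procedure, assuming training pairs stacked as columns of $X$ (images) and $Y$ (data) with $Y = AX$; shapes and names are illustrative assumptions.

```python
# Sketch of the "linear algebra network": Gram-Schmidt on the images,
# then PCA/SVD of the mapped data, giving a spectral decomposition of
# the operator on the span of the training images.
import numpy as np

def learn_operator(X, Y):
    """X: (dim_image, n) images; Y: (dim_data, n) data with Y = A @ X."""
    Q, R = np.linalg.qr(X)       # Gram-Schmidt: X = Q R, Q orthonormal columns
    B = Y @ np.linalg.inv(R)     # action of A on the basis: B = A Q
                                 # (assumes independent images; regularize otherwise)
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # PCA step
    return U, s, Q @ Vt.T        # A ~ U diag(s) (Q V)^T on span(X)

def apply_operator(U, s, W, x):
    return U @ (s * (W.T @ x))   # apply the learned spectral decomposition
```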
Human motion prediction is crucial for human-centric multimedia understanding and interaction. Current methods typically rely on ground-truth human poses as observed input, which is impractical in real-world scenarios where only raw visual sensor data is available. To deploy these methods in practice, a preliminary pose estimation stage is required. However, such two-stage approaches often suffer performance degradation from the accumulation of errors. Moreover, reducing raw visual data to sparse keypoint representations significantly lowers the density of information, resulting in the loss of fine-grained features. In this paper, we propose \textit{LiDAR-HMP}, the first single-LiDAR-based 3D human motion prediction approach, which takes the raw LiDAR point cloud as input and forecasts future 3D human poses directly. Building upon our novel structure-aware body feature descriptor, LiDAR-HMP adaptively maps the observed motion manifold to future poses and effectively models the spatio-temporal correlations of human motion to further refine the prediction results. Extensive experiments show that our method achieves state-of-the-art performance on two public benchmarks and demonstrates remarkable robustness and efficacy in real-world deployments.
Scientific applications are starting to explore the viability of quantum computing. This exploration typically begins with quantum simulations that can run on existing classical platforms, albeit without the performance advantages of real quantum resources. In the context of high-performance computing (HPC), simulation software can take advantage of powerful resources to scale up the simulation size. However, the configuration, installation, and operation of these quantum simulation packages on HPC resources can be rather daunting, increasing the friction of experimentation for scientific application developers. We describe a framework that streamlines access to quantum simulation software running on HPC resources. It includes an interface for circuit-based quantum computing tasks, as well as the resource management infrastructure needed to make effective use of the underlying HPC resources. The primary contributions of this work are a classification of different usage models for quantum simulation in an HPC context, a review of the software architecture of our approach, and a detailed description of a prototype implementation that experiments with these ideas using two different simulators (TNQVM \& NWQ-Sim). We include initial experimental results on the Frontier supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) using a synthetic workload generated via the SupermarQ quantum benchmarking framework.
We propose progressive radiance distillation, an inverse rendering method that combines physically-based rendering with Gaussian-based radiance field rendering using a distillation progress map. Taking multi-view images as input, our method starts from a pre-trained radiance field as guidance and distills physically-based light and material parameters from the radiance field using an image-fitting process. The distillation progress map is initialized to a small value, which favors radiance field rendering. During early iterations, when the fitted light and material parameters are far from convergence, the radiance field fallback ensures the sanity of image-loss gradients and avoids local minima that attract under-fitted states. As the fitted parameters converge, the physical model gradually takes over and the distillation progress increases correspondingly. In the presence of light paths unmodeled by the physical model, the distillation progress never finishes on the affected pixels, and the learned radiance field stays in the final rendering. With this designed tolerance for the limitations of the physical model, we prevent unmodeled color components from leaking into the light and material parameters, alleviating relighting artifacts. Meanwhile, the remaining radiance field compensates for the limitations of the physical model, guaranteeing high-quality novel view synthesis. Experimental results demonstrate that our method significantly outperforms state-of-the-art techniques in the quality of both novel view synthesis and relighting. The idea of progressive radiance distillation is not limited to Gaussian splatting: we show that it also benefits predominantly specular scenes when adapted to a mesh-based inverse rendering method.
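In equation form (notation assumed here), the per-pixel output of such a scheme can be written as a blend
\[
C(p) \;=\; \big(1 - w(p)\big)\, C_{\mathrm{rf}}(p) \;+\; w(p)\, C_{\mathrm{phys}}(p),
\]
where $w(p) \in [0,1]$ is the distillation progress at pixel $p$, initialized near zero and increased as the physically-based fit converges; pixels with unmodeled light paths retain $w(p) < 1$, so the radiance field term survives into the final rendering.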
Wireless Network-on-Chip (WNoC) is a promising concept for overcoming the scalability issues of prevailing networks-in-package for many-core processors. However, electromagnetic propagation inside the chip package leads to energy reverberation, resulting in Inter-Symbol Interference (ISI) with high delay spreads. Time Reversal (TR) is a technique that exploits the unique time-invariant channel with rich multipath effects to focus energy at the desired transceiver. TR mitigates both ISI and co-channel interference, hence enabling parallel communications in both space and time. TR is thus a versatile candidate for improving the aggregate bandwidth of wireless on-chip networks, provided that a Medium Access Control (MAC) protocol is used to share the wireless medium efficiently. In this paper, we explore a simple yet resilient TR-based MAC protocol (TR-MAC) design for WNoC. We propose to manage multiple parallel transmissions over simultaneous spatial channels in the same time slot with TR precoding and focused energy detection at the transceiver. Our results show that TR-MAC can be employed in massive computing architectures with improved latency and throughput while meeting the stringent requirements of the physical layer.
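The core TR idea is easy to demonstrate: prefiltering with the time-reversed conjugate of the channel impulse response (CIR) turns the effective channel into the CIR's autocorrelation, which concentrates energy in a single peak. The random exponentially-decaying CIR model below is an illustrative assumption.

```python
# Minimal numpy sketch of time-reversal (TR) precoding over a static
# multipath channel, as used conceptually by TR-MAC.
import numpy as np

rng = np.random.default_rng(0)
taps = 64
h = (rng.standard_normal(taps) + 1j * rng.standard_normal(taps)) \
    * np.exp(-np.arange(taps) / 16.0)          # assumed decaying multipath CIR

prefilter = np.conj(h[::-1])                   # TR prefilter: time-reversed conjugate CIR
received = np.convolve(prefilter, h)           # effective channel = autocorrelation of h

peak = np.abs(received).max()
sidelobe = np.sort(np.abs(received))[-2]       # largest residual ISI component
print(f"temporal focusing gain (peak/sidelobe): {peak / sidelobe:.1f}x")
```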
The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
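One simple spectral instantiation of the partitioning idea is to split tasks by the sign pattern of a top singular vector of the worker-by-task response matrix. This is an illustrative sketch under assumed conventions, not necessarily the paper's exact estimator.

```python
# Illustrative spectral partition of tasks into two types from the
# (n workers x d tasks) response matrix; conventions below are assumed.
import numpy as np

def partition_tasks(M):
    """M: (n, d) worker responses in {-1, +1}, 0 for missing entries."""
    M = M - M.mean(axis=1, keepdims=True)      # remove per-worker bias
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[1] >= 0   # sign of the 2nd right singular vector splits the two types
```

Labels can then be estimated per group, e.g. by running weighted majority vote separately on the tasks where `partition_tasks(M)` is `True` and where it is `False`.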
Digital twins (DTs), virtual environments that simulate, predict, and optimize the performance of their physical counterparts, hold great promise for revolutionizing next-generation wireless networks. While DTs have been extensively studied for wireless networks, their use in conjunction with autonomous vehicles featuring programmable mobility remains relatively under-explored. In this paper, we study DTs used as a development environment to design, deploy, and test artificial intelligence (AI) techniques that utilize real-world (RW) observations, e.g., radio key performance indicators, for vehicle trajectory and network optimization decisions in autonomous vehicle networks (AVNs). We first compare and contrast the use of simulation, digital twin (software-in-the-loop (SITL)), sandbox (hardware-in-the-loop (HITL)), and physical testbed (PT) environments for their suitability in developing and testing AI algorithms for AVNs. We then review representative use cases of DTs for AVN scenarios. Finally, we provide an example from the NSF AERPAW platform in which a DT is used to develop and test AI-aided solutions for autonomous unmanned aerial vehicles that localize a signal source based solely on link quality measurements. Our results on the physical testbed show that SITL DTs, when supplemented with data from RW measurements and simulations, can serve as an ideal environment for developing and testing innovative AI solutions for AVNs.
Incompleteness is a common problem in existing knowledge graphs (KGs), and KG completion, which aims to predict links between entities, is challenging. Most existing KG completion methods consider only the direct relations between nodes and ignore relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration, but they pay little attention to the order of relations within a path, which is important for reasoning. In addition, these path-based models tend to ignore the nonlinear contributions of path features to link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to preserve the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve link prediction performance. Experimental results on two benchmark datasets show that the proposed OPTransE model outperforms state-of-the-art methods.
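In TransE-style notation (assumed here; the paper's exact scoring function may differ), projecting the head and tail into relation-specific spaces amounts to a score of the form
\[
f_r(h, t) \;=\; \big\lVert \mathbf{M}_{r}^{h}\,\mathbf{h} + \mathbf{r} - \mathbf{M}_{r}^{t}\,\mathbf{t} \big\rVert_{2}^{2},
\]
so that composing relations along a path no longer commutes: because the head projection $\mathbf{M}_{r}^{h}$ and tail projection $\mathbf{M}_{r}^{t}$ differ, swapping the order of two relations in a path changes the composed score, which is what lets the model respect relation order.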