亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This study addresses the application of deep learning techniques in joint sound signal classification and localization networks. Current state-of-the-art sound source localization deep learning networks lack feature aggregation within their architecture. Feature aggregation enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. This is particularly important in SSL networks, which must differentiate direct and indirect acoustic signals. To address this gap, we adapt feature aggregation techniques from computer vision neural networks to signal detection neural networks. Additionally, we propose the Scale Encoding Network (SEN) for feature aggregation to encode features from various scales, compressing the network for more computationally efficient aggregation. To evaluate the efficacy of feature aggregation in SSL networks, we integrated the following computer vision feature aggregation sub-architectures into a SSL control architecture: Path Aggregation Network (PANet), Weighted Bi-directional Feature Pyramid Network (BiFPN), and SEN. These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. PANet and BiFPN are established aggregators in computer vision models, while the proposed SEN is a more compact aggregator. The results suggest that models incorporating feature aggregations outperformed the control model, the Sound Event Localization and Detection network (SELDnet), in both sound signal classification and localization. The feature aggregation techniques enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.

相關內容

Networking:IFIP International Conferences on Networking。 Explanation:國際網絡會議。 Publisher:IFIP。 SIT:

The concept of fluid antennas (FAs) has emerged as a promising solution to enhance the spectral efficiency of wireless networks, achieved by introducing additional degrees of freedom, including reconfigurability and flexibility. In this paper, we investigate the use of index-modulated (IM) transmissions within the framework of FA systems, where an FA position is activated during each transmission interval. This approach is motivated by the common characteristics exhibited by FAs and IM transmissions, which entails the use of a single radio-frequency chain. From this perspective, we derive a closed-form expression for the bit error rate of IM-FAs considering spatial correlation, and demonstrating superior performance compared to conventional IM systems. To enhance the performance of IM-FAs in correlated conditions, channel coding techniques are applied. We first analyze a set partition coding (SPC) scheme for IM-FAs to spatially separate the FA ports, and provide a tight performance bound over correlated channels. Furthermore, the spatial SPC scheme is extended to turbo-coded modulation where the performance is analyzed for low and high signal-to-noise ratios. Our results reveal that through the implementation of channel coding techniques designed for FAs and IM transmission, the performance of coded IM-FAs exhibits notable enhancements, particularly in high correlation scenarios.

In millimeter-wave (mmWave) integrated sensing and communication networks, users may be within the coverage of multiple access points (AP), which typically employ large-scale antenna arrays to mitigate obstacle occlusion and path loss. However, large-scale arrays generate pencil-shaped beams, which necessitate a higher number of training beams to cover the desired space. Furthermore, as the antenna aperture increases, users are more likely to be situated in the near-field region of the AP antenna array. This motivates our investigation into the near-field beam training problem to achieve effective positioning services. To address the high complexity and low identification accuracy of existing beam training techniques, we propose an efficient hashing multi-arm beam (HMB) training scheme for the near-field scenario. Specifically, we first construct a near-field single-beam training codebook for the uniform planar arrays. Then, the hash functions are chosen independently to construct the multi-arm beam training codebooks for each AP. All APs traverse the predefined multi-arm beam training codeword simultaneously and the multi-AP superimposed signals at the user are recorded. Finally, the soft decision and voting methods are applied to obtain the correctly aligned beams only based on the signal powers. In addition, we logically prove that the traversal complexity is at the logarithmic level. Simulation results show that our proposed near-field HMB training method can achieve 96.4% identification accuracy of the exhaustive beam training method and greatly reduce the training overhead. Furthermore, we verify its applicability under the far-field scenario as well.

We study energy-efficient offloading strategies in a large-scale MEC system with heterogeneous mobile users and network components. The system is considered with enabled user-task handovers that capture the mobility of various mobile users. We focus on a long-run objective and online algorithms that are applicable to realistic systems. The problem is significantly complicated by the large problem size, the heterogeneity of user tasks and network components, and the mobility of the users, for which conventional optimizers cannot reach optimum with a reasonable amount of computational and storage power. We formulate the problem in the vein of the restless multi-armed bandit process that enables the decomposition of high-dimensional state spaces and then achieves near-optimal algorithms applicable to realistically large problems in an online manner. Following the restless bandit technique, we propose two offloading policies by prioritizing the least marginal costs of selecting the corresponding computing and communication resources in the edge and cloud networks. This coincides with selecting the resources with the highest energy efficiency. Both policies are scalable to the offloading problem with a great potential to achieve proved asymptotic optimality - approach optimality as the problem size tends to infinity. With extensive numerical simulations, the proposed policies are demonstrated to clearly outperform baseline policies with respect to power conservation and robust to the tested heavy-tailed lifespan distributions of the offloaded tasks.

Dynamic graph neural networks (DyGNNs) currently struggle with handling distribution shifts that are inherent in dynamic graphs. Existing work on DyGNNs with out-of-distribution settings only focuses on the time domain, failing to handle cases involving distribution shifts in the spectral domain. In this paper, we discover that there exist cases with distribution shifts unobservable in the time domain while observable in the spectral domain, and propose to study distribution shifts on dynamic graphs in the spectral domain for the first time. However, this investigation poses two key challenges: i) it is non-trivial to capture different graph patterns that are driven by various frequency components entangled in the spectral domain; and ii) it remains unclear how to handle distribution shifts with the discovered spectral patterns. To address these challenges, we propose Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts (SILD), which can handle distribution shifts on dynamic graphs by capturing and utilizing invariant and variant spectral patterns. Specifically, we first design a DyGNN with Fourier transform to obtain the ego-graph trajectory spectrums, allowing the mixed dynamic graph patterns to be transformed into separate frequency components. We then develop a disentangled spectrum mask to filter graph dynamics from various frequency components and discover the invariant and variant spectral patterns. Finally, we propose invariant spectral filtering, which encourages the model to rely on invariant patterns for generalization under distribution shifts. Experimental results on synthetic and real-world dynamic graph datasets demonstrate the superiority of our method for both node classification and link prediction tasks under distribution shifts.

Motor imagery classification based on electroencephalography (EEG) signals is one of the most important brain-computer interface applications, although it needs further improvement. Several methods have attempted to obtain useful information from EEG signals by using recent deep learning techniques such as transformers. To improve the classification accuracy, this study proposes a novel EEG-based motor imagery classification method with three key features: generation of a topological map represented as a two-dimensional image from EEG signals with coordinate transformation based on t-SNE, use of the InternImage to extract spatial features, and use of spatiotemporal pooling inspired by PoolFormer to exploit spatiotemporal information concealed in a sequence of EEG images. Experimental results using the PhysioNet EEG Motor Movement/Imagery dataset showed that the proposed method achieved the best classification accuracy of 88.57%, 80.65%, and 70.17% on two-, three-, and four-class motor imagery tasks in cross-individual validation.

Public transportation systems often suffer from unexpected fluctuations in demand and disruptions, such as mechanical failures and medical emergencies. These fluctuations and disruptions lead to delays and overcrowding, which are detrimental to the passengers' experience and to the overall performance of the transit service. To proactively mitigate such events, many transit agencies station substitute (reserve) vehicles throughout their service areas, which they can dispatch to augment or replace vehicles on routes that suffer overcrowding or disruption. However, determining the optimal locations where substitute vehicles should be stationed is a challenging problem due to the inherent randomness of disruptions and due to the combinatorial nature of selecting locations across a city. In collaboration with the transit agency of Nashville, TN, we address this problem by introducing data-driven statistical and machine-learning models for forecasting disruptions and an effective randomized local-search algorithm for selecting locations where substitute vehicles are to be stationed. Our research demonstrates promising results in proactive disruption management, offering a practical and easily implementable solution for transit agencies to enhance the reliability of their services. Our results resonate beyond mere operational efficiency: by advancing proactive strategies, our approach fosters more resilient and accessible public transportation, contributing to equitable urban mobility and ultimately benefiting the communities that rely on public transportation the most.

Silent Speech Interfaces (SSIs) offer a noninvasive alternative to brain-computer interfaces for soundless verbal communication. We introduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions--cross-contrast (crossCon) and supervised temporal contrast (supTcon)--to train a multimodal model with a shared latent representation. This architecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recognition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Adjustment (LISA) significantly improves recognition accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA performs best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demonstrating that SSIs can be a viable alternative to automatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possibilities in human-computer interaction, demonstrating the potential of cross-modal approaches in noisy and data-limited regimes.

Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Extensive experiments clearly demonstrate the effectiveness of our method on multiple distribution generalization benchmarks compared with state-of-the-art counterparts. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we show the effectiveness of our method compared with state-of-the-art counterparts.

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.

In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.

北京阿比特科技有限公司