亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Sound event detection (SED) is an active area of audio research that aims to detect the temporal occurrence of sounds. In this paper, we apply SED to engine fault detection by introducing a multimodal SED framework that detects fine-grained engine faults of automobile engines using audio and accelerometer-recorded vibration. We first introduce the problem of engine fault SED on a dataset collected from a large variety of vehicles with expertly-labeled engine fault sound events. Next, we propose a SED model to temporally detect ten fine-grained engine faults that occur within vehicle engines and further explore a pretraining strategy using a large-scale weakly-labeled engine fault dataset. Through multiple evaluations, we show our proposed framework is able to effectively detect engine fault sound events. Finally, we investigate the interaction and characteristics of each modality and show that fusing features from audio and vibration improves overall engine fault SED capabilities.

相關內容

《工程》是中國工程院(CAE)于2015年推出的國際開放存取期刊。其目的是提供一個高水平的平臺,傳播和分享工程研發的前沿進展、當前主要研究成果和關鍵成果;報告工程科學的進展,討論工程發展的熱點、興趣領域、挑戰和前景,在工程中考慮人與環境的福祉和倫理道德,鼓勵具有深遠經濟和社會意義的工程突破和創新,使之達到國際先進水平,成為新的生產力,從而改變世界,造福人類,創造新的未來。 期刊鏈接: · Backbone · 設計 · Performer · Continuity ·
2024 年 4 月 29 日

To realize a global quantum Internet, there is a need for communication between quantum subnetworks. To accomplish this task, there have been multiple design proposals for a quantum backbone network and quantum subnetworks. In this work, we elaborate on the design that uses entanglement and quantum teleportation to build the quantum backbone between packetized quantum networks. We design a network interface to interconnect packetized quantum networks with entanglement-based quantum backbone networks and, moreover, design a scheme to accomplish data transmission over this hybrid quantum network model. We analyze the use of various implementations of the backbone network, focusing our study on backbone networks that use satellite links to continuously distribute entanglement resources. For feasibility, we analyze various system parameters via simulation to benchmark the performance of the overall network.

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.

Smishing, also known as SMS phishing, is a type of fraudulent communication in which an attacker disguises SMS communications to deceive a target into providing their sensitive data. Smishing attacks use a variety of tactics; however, they have a similar goal of stealing money or personally identifying information (PII) from a victim. In response to these attacks, a wide variety of anti-smishing tools have been developed to block or filter these communications. Despite this, the number of phishing attacks continue to rise. In this paper, we developed a test bed for measuring the effectiveness of popular anti-smishing tools against fresh smishing attacks. To collect fresh smishing data, we introduce Smishtank.com, a collaborative online resource for reporting and collecting smishing data sets. The SMS messages were validated by a security expert and an in-depth qualitative analysis was performed on the collected messages to provide further insights. To compare tool effectiveness, we experimented with 20 smishing and benign messages across 3 key segments of the SMS messaging delivery ecosystem. Our results revealed significant room for improvement in all 3 areas against our smishing set. Most anti-phishing apps and bulk messaging services didn't filter smishing messages beyond the carrier blocking. The 2 apps that blocked the most smish also blocked 85-100\% of benign messages. Finally, while carriers did not block any benign messages, they were only able to reach a 25-35\% blocking rate for smishing messages. Our work provides insights into the performance of anti-smishing tools and the roles they play in the message blocking process. This paper would enable the research community and industry to be better informed on the current state of anti-smishing technology on the SMS platform.

Causal inference methods for observational data are highly regarded due to their wide applicability. While there are already numerous methods available for de-confounding bias, these methods generally assume that covariates consist solely of confounders or make naive assumptions about the covariates. Such assumptions face challenges in both theory and practice, particularly when dealing with high-dimensional covariates. Relaxing these naive assumptions and identifying the confounding covariates that truly require correction can effectively enhance the practical significance of these methods. Therefore, this paper proposes a General Causal Inference (GCI) framework specifically designed for cross-sectional observational data, which precisely identifies the key confounding covariates and provides corresponding identification algorithm. Specifically, based on progressive derivations of the Markov property on Directed Acyclic Graph, we conclude that the key confounding covariates are equivalent to the common root ancestors of the treatment and the outcome variable. Building upon this conclusion, the GCI framework is composed of a novel Ancestor Set Identification (ASI) algorithm and de-confounding inference methods. Firstly, the ASI algorithm is theoretically supported by the conditional independence properties and causal asymmetry between variables, enabling the identification of key confounding covariates. Subsequently, the identified confounding covariates are used in the de-confounding inference methods to obtain unbiased causal effect estimation, which can support informed decision-making. Extensive experiments on synthetic datasets demonstrate that the GCI framework can effectively identify the critical confounding covariates and significantly improve the precision, stability, and interpretability of causal inference in observational studies.

We consider communication over the Gaussian multiple-access channel in the regime where the number of users grows linearly with the codelength. We investigate coded CDMA schemes where each user's information is encoded via a linear code before being modulated with a signature sequence. We propose an efficient approximate message passing (AMP) decoder that can be tailored to the structure of the linear code, and provide an exact asymptotic characterization of its performance. Based on this result, we consider a decoder that integrates AMP and belief propagation and characterize the tradeoff between spectral efficiency and signal-to-noise ratio, for a given target error rate. Simulation results are provided to demonstrate the benefits of the concatenated scheme at finite lengths.

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.

Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can learn to manipulate articulated objects without demonstrations. We combine the strengths of 2D segmentation and 3D RL to improve the efficiency of RL policy training. To improve the stability of the policy on real robots, we design a Frame-consistent Uncertainty-aware Sampling (FUS) strategy to get a condensed and hierarchical 3D representation. In addition, a single versatile RL policy can be trained on multiple articulated object manipulation tasks simultaneously in simulation and shows great generalizability to novel categories and instances. Experimental results demonstrate the effectiveness of our framework in both simulation and real-world settings. Our code is available at //github.com/THU-VCLab/Part-Guided-3D-RL-for-Sim2Real-Articulated-Object-Manipulation.

Tremendous efforts have been devoted to automating software debugging, a time-consuming process involving fault localization and repair generation. Recently, Large Language Models (LLMs) have shown great potential in automated debugging. However, we identified three challenges posed to traditional and LLM-based debugging tools: 1) the upstream imperfection of fault localization affects the downstream repair, 2) the deficiency in handling complex logic errors, and 3) the ignorance of program contexts. In this context, we propose the first automated, unified debugging framework, FixAgent, via LLM agent synergy. FixAgent can perform end-to-end localization, repair, and analysis of bugs. Our insight is that LLMs can benefit from general software engineering principles recognized by human developers in debugging, such as rubber duck debugging, enabling a better understanding of program functionality and logic bugs. Hence, we create three designs inspired by rubber ducking to address these challenges. They are agent specialization and synergy, key variable tracking, and program context comprehension, which request LLMs to provide explicit explanations and force them to focus on crucial program logic information. Experiments on the widely used dataset QuixBugs show that FixAgent correctly fixes 79 out of 80 bugs, 9 of which have never been fixed. It also plausibly patches 1.9X more defects than the best-performing repair tool on CodeFlaws, even with no bug location information and fewer than 0.6% sampling times. On average, FixAgent increases about 20% plausible and correct fixes compared to its base model using different LLMs, showing the effectiveness of our designs. Moreover, the correctness rate of FixAgent reaches remarkably 97.26%, indicating that FixAgent can potentially overcome the overfitting issue of the existing approaches.

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.

Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations include: Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matrix of the original feature is used to adaptively adjust the weight of information in different regions of the newly generated features, resulting in more flexible and effective processing. To prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module to provide block-level and input-level information respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. We conducted extensive evaluations on several challenging benchmarks, and the experimental results indicate the effectiveness of our proposed solutions. For example, in the cross-subject evaluation of the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy compared to the baseline while reducing Flops by nearly 70%; a heavier version has also been introduced to further boost accuracy.

北京阿比特科技有限公司