
This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) that leverages speech representations produced by any trained Automatic Speech Recognition (ASR) model. Moving away from the resource-intensive trends prevalent in recent literature, our method distills knowledge from a trained Conformer-based ASR model, achieving competitive performance on standard VSR benchmarks with significantly lower resource utilization. Using only unlabeled audio-visual data, our baseline model achieves a word error rate (WER) of 47.4% on the LRS2 test benchmark and 54.7% on LRS3. After fine-tuning with limited labeled data, the WER drops to 35% (LRS2) and 45.7% (LRS3). Our model can be trained on a single consumer-grade GPU within a few days and performs real-time end-to-end VSR on dated hardware, suggesting a path towards more accessible and resource-efficient VSR methodologies.
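The recipe sketched below is a minimal illustration of the cross-modal distillation idea the abstract describes: a student video encoder is trained on unlabeled audio-visual pairs to regress the frame-level representations of a frozen, pre-trained ASR encoder. The module names and the L1 feature-matching loss are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of cross-modal distillation for VSR: a student video
# encoder learns to match frame-level representations of a frozen,
# pre-trained ASR encoder on unlabeled audio-visual pairs. Module names
# and the L1 loss are illustrative assumptions.
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Toy stand-in for a Conformer-style visual front-end."""
    def __init__(self, in_dim=512, hid=256, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, out_dim))

    def forward(self, x):          # x: (batch, frames, in_dim)
        return self.net(x)

def distillation_step(video_feats, audio, asr_encoder, student, optimizer):
    with torch.no_grad():                       # teacher stays frozen
        target = asr_encoder(audio)             # (batch, frames, out_dim)
    pred = student(video_feats)
    loss = nn.functional.l1_loss(pred, target)  # feature-matching loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher's representations already encode phonetic content, a student trained this way can then be fine-tuned with only a small amount of labeled data, as the abstract reports.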

Related content

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and transcribe it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It integrates knowledge and research from computer science, linguistics, and computer engineering.

Platforms such as GitHub and GitLab introduce Issue Report Templates (IRTs) to enable more effective issue management and better alignment with developer expectations. However, these templates are not widely adopted in most repositories, and there is currently no tool available to aid developers in generating them. In this work, we introduce GIRT-Model, an assistant language model that automatically generates IRTs based on the developer's instructions regarding the structure and necessary fields. We create GIRT-Instruct, a dataset comprising pairs of instructions and IRTs, with the IRTs sourced from GitHub repositories. We use GIRT-Instruct to instruction-tune a T5-base model, yielding GIRT-Model. In our experiments, GIRT-Model outperforms general language models (T5 and Flan-T5 at several parameter sizes) on IRT generation, achieving significantly higher ROUGE, BLEU, and METEOR scores as well as better human-evaluation ratings. Additionally, we assess the effectiveness of GIRT-Model in a user study in which participants wrote short IRTs with the model. Our results show that the participants find GIRT-Model useful for the automated generation of templates. We hope that the use of GIRT-Model encourages more developers to adopt IRTs in their repositories. We publicly release our code, dataset, and model at //github.com/ISE-Research/girt-model.
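As a rough sketch of what instruction-tuning a T5-base model on instruction/IRT pairs looks like, the snippet below uses the Hugging Face transformers API; the prompt format, hyperparameters, and the toy training pair are illustrative assumptions, not the authors' GIRT-Instruct recipe.

```python
# Hedged sketch of instruction-tuning T5-base on (instruction, IRT)
# pairs, in the spirit of GIRT-Instruct. The toy pair and learning rate
# are illustrative assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optim = torch.optim.AdamW(model.parameters(), lr=3e-4)

pairs = [  # (instruction, target issue report template)
    ("Create a bug report template with fields: title, steps, expected.",
     "---\nname: Bug report\n---\n**Title**\n**Steps**\n**Expected**"),
]

model.train()
for instruction, irt in pairs:
    inputs = tok(instruction, return_tensors="pt", truncation=True)
    labels = tok(irt, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss  # teacher-forced seq2seq loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```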

ChatGPT is becoming a new reality. In this paper, we demonstrate a method for distinguishing ChatGPT-generated publications from those produced by scientists. The objective of this work is to introduce a newly designed supervised network-driven algorithm for predicting machine-generated content. The premise is that ChatGPT content exhibits distinctive behavior that can be set apart from scientific articles. The algorithm was trained and tested on three disease-specific sets of publications, with each model constructed from 100 abstracts. Additionally, the algorithm underwent k-fold calibration (depending on the availability of the data) to establish a lower-upper acceptance range. The network model trained on ChatGPT content showed fewer nodes and more edges than the models of real article abstracts. Executed in single mode to predict the class of one dataset at a time, the algorithm achieved an accuracy above 94%. It was also executed in multi-mode on mixed documents of ChatGPT and PubMed abstracts, where it predicted real articles remarkably well, with a precision of 100% and, on rare occasions, 96%-98%. ChatGPT content, however, was often misclassified as real publications, at rates of up to 88% across the datasets of the three diseases. Our results also showed that the publication year of the real articles mixed with ChatGPT-generated content may be a factor in detecting the correct class: the older the publication, the better the prediction.
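The node/edge comparison in the abstract suggests a graph built over the text of each corpus. The sketch below shows one plausible construction, a sliding-window word co-occurrence graph built with networkx; the paper's actual graph definition may differ, so treat this purely as an illustration of how such counts arise.

```python
# Illustrative sketch: build a word co-occurrence graph per corpus and
# compare node/edge counts. A simple sliding-window construction is
# assumed; the paper's exact graph definition is not specified here.
import itertools
import networkx as nx

def cooccurrence_graph(abstracts, window=2):
    g = nx.Graph()
    for text in abstracts:
        tokens = text.lower().split()
        for i in range(len(tokens) - window + 1):
            for u, v in itertools.combinations(tokens[i:i + window], 2):
                if u != v:
                    g.add_edge(u, v)
    return g

human_g = cooccurrence_graph(["patients were randomized into two arms"])
gpt_g = cooccurrence_graph(["the study shows the results of the study"])
# Repetitive text yields fewer distinct nodes but more edges, the
# pattern the paper reports for ChatGPT-trained models:
print(human_g.number_of_nodes(), human_g.number_of_edges())
print(gpt_g.number_of_nodes(), gpt_g.number_of_edges())
```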

This paper proposes a mechanism to fine-tune convex approximations of probabilistic reachable sets (PRS) of uncertain dynamic systems. We consider the case of unbounded uncertainties, for which it may be impossible to find a bounded reachable set of the system. Instead, we seek a PRS that bounds the system states with high confidence. Our data-driven approach builds on a kernel density estimator (KDE) accelerated by a fast Fourier transform (FFT), which is customized to model the uncertainties and obtain the PRS efficiently. However, the non-convex shape of the PRS can make it impractical for subsequent optimal designs. Motivated by this, we formulate a mixed-integer nonlinear programming (MINLP) problem whose solution is an optimal $n$-sided convex polygon approximating the PRS. Leveraging this formulation, we propose a heuristic algorithm that finds this convex set efficiently while ensuring accuracy. The algorithm is tested on comprehensive case studies that demonstrate its near-optimality, accuracy, efficiency, and robustness. These benefits pave the way for promising applications to safety-critical, real-time motion planning of uncertain dynamic systems.
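The FFT acceleration of the KDE can be illustrated in one dimension: bin the samples on a regular grid and convolve the histogram with a Gaussian kernel via the FFT, which turns the naive kernel sum into an $O(m \log m)$ operation on an $m$-point grid. The grid size and bandwidth below are illustrative choices, not the paper's settings.

```python
# One-dimensional FFT-accelerated KDE sketch: bin samples on a regular
# grid, then circularly convolve the histogram with a Gaussian kernel
# via the FFT. Grid size and bandwidth are illustrative assumptions.
import numpy as np

def fft_kde(samples, grid_min, grid_max, n_grid=1024, bandwidth=0.1):
    grid = np.linspace(grid_min, grid_max, n_grid)
    hist, _ = np.histogram(samples, bins=n_grid,
                           range=(grid_min, grid_max), density=True)
    dx = grid[1] - grid[0]
    # Gaussian kernel sampled at the grid spacing, centered at zero.
    half = n_grid // 2
    k = np.exp(-0.5 * ((np.arange(-half, half) * dx) / bandwidth) ** 2)
    k /= k.sum()
    # Circular convolution: shift the kernel so lag zero sits at index 0.
    density = np.real(np.fft.ifft(
        np.fft.fft(hist) * np.fft.fft(np.fft.ifftshift(k))))
    return grid, density

samples = np.random.default_rng(0).normal(0.0, 1.0, 10_000)
grid, density = fft_kde(samples, -5, 5)
# A (1 - alpha) PRS can then be read off as a highest-density region.
```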

This paper considers an active reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. We aim to maximize the radar signal-to-interference-plus-noise ratio (SINR) by jointly optimizing the beamforming matrix at the dual-function radar-communication (DFRC) base station (BS) and the reflecting coefficients at the active RIS, subject to the quality-of-service (QoS) constraints of the communication users (UEs) and the transmit power constraints of the active RIS and the DFRC BS. To tackle the optimization problem, the majorization-minimization (MM) algorithm is applied to the nonconvex radar SINR objective, and the resulting quartic problem is solved by developing a semidefinite relaxation (SDR)-based approach. Moreover, we derive the scaling order of the radar SINR for a large number of reflecting elements. Next, the transmit power allocation problem and the deployment strategy of the active RIS are studied for a moderate number of reflecting elements. Finally, we validate the potential of the active RIS in ISAC systems compared to a passive RIS, and we discuss several open problems that remain for future research.
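For readers unfamiliar with the MM step, the principle is standard (stated here for a generic maximization; the paper applies it to the radar SINR): at iterate $t$, the nonconvex objective $f(\mathbf{w})$ is replaced by a surrogate $g(\mathbf{w} \mid \mathbf{w}^{(t)})$ that minorizes it and touches it at the current point,

$$g(\mathbf{w} \mid \mathbf{w}^{(t)}) \le f(\mathbf{w}) \;\; \forall \mathbf{w}, \qquad g(\mathbf{w}^{(t)} \mid \mathbf{w}^{(t)}) = f(\mathbf{w}^{(t)}), \qquad \mathbf{w}^{(t+1)} = \arg\max_{\mathbf{w}} \, g(\mathbf{w} \mid \mathbf{w}^{(t)}),$$

which guarantees $f(\mathbf{w}^{(t+1)}) \ge f(\mathbf{w}^{(t)})$, i.e., monotone ascent; the quartic surrogate subproblem is then the piece handled by the SDR-based approach.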

We present BertRLFuzzer, a novel BERT- and Reinforcement Learning (RL)-based fuzzer aimed at finding security vulnerabilities in Web applications. BertRLFuzzer works as follows: given a set of seed inputs, the fuzzer performs grammar-adhering and attack-provoking mutation operations on them to generate candidate attack vectors. The key insight of BertRLFuzzer is the use of RL with a BERT model as the agent, guiding the fuzzer to efficiently learn grammar-adhering and attack-provoking mutation operators. To establish the efficacy of BertRLFuzzer, we compare it against 13 black-box and white-box fuzzers over a benchmark of 9 victim websites with over 16K LOC. Relative to the nearest competing tool, we observed significant improvements in time to first attack (54% less), new vulnerabilities found (17 new vulnerabilities), and attack rate (4.4% more attack vectors generated).
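The RL loop the abstract describes can be caricatured as a bandit over mutation operators: the agent picks an operator, the mutated input is sent to the victim, and a reward is issued when an attack is provoked. The sketch below replaces the BERT policy with a toy epsilon-greedy value table; the operator names, oracle, and rewards are illustrative assumptions.

```python
# Simplified fuzzing loop: an agent selects grammar-adhering mutation
# operators and is rewarded when a mutated input provokes an attack.
# The BERT policy is replaced by a toy epsilon-greedy value table.
import random

OPERATORS = ["insert_quote", "append_or_clause", "encode_payload"]
q = {op: 0.0 for op in OPERATORS}   # estimated value of each operator
counts = {op: 0 for op in OPERATORS}

def mutate(seed, op):
    return seed + {"insert_quote": "'", "append_or_clause": " OR 1=1",
                   "encode_payload": "%27"}[op]

def is_attack(candidate):           # stand-in for the victim-site oracle
    return "OR 1=1" in candidate

for step in range(1000):
    op = (random.choice(OPERATORS) if random.random() < 0.1
          else max(q, key=q.get))                  # epsilon-greedy
    reward = 1.0 if is_attack(mutate("SELECT *", op)) else 0.0
    counts[op] += 1
    q[op] += (reward - q[op]) / counts[op]         # incremental mean
```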

This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method for generating a semi-synthetic dataset that can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine, and data science more generally who are interested in estimating causal quantities from non-traditional data.
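For reference, the partially linear model underlying the DML framework takes the form

$$Y = \theta_0 D + g_0(X) + \varepsilon, \qquad D = m_0(X) + \nu, \qquad \mathbb{E}[\varepsilon \mid D, X] = \mathbb{E}[\nu \mid X] = 0,$$

where $D$ is the treatment, $\theta_0$ the causal parameter of interest, and $X$ the confounders, here encoded from text and images by the proposed neural network. With cross-fitted nuisance estimates $\hat{\ell}(X) \approx \mathbb{E}[Y \mid X]$ and $\hat{m}(X) \approx \mathbb{E}[D \mid X]$, the orthogonalized estimator regresses outcome residuals on treatment residuals:

$$\hat{\theta} = \frac{\frac{1}{n}\sum_{i=1}^{n} \big(D_i - \hat{m}(X_i)\big)\big(Y_i - \hat{\ell}(X_i)\big)}{\frac{1}{n}\sum_{i=1}^{n} \big(D_i - \hat{m}(X_i)\big)^2}.$$

This is the standard partially linear DML setup; the paper's contribution lies in the networks that supply $\hat{\ell}$ and $\hat{m}$ from multimodal inputs.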

We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
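One way to make the selective-incorporation idea concrete is a deferral rule fit on held-out data: among groups of inputs that "look the same" to the predictor, defer to the human only where human accuracy empirically beats the model's. The cell construction and threshold below are illustrative assumptions, not the paper's algorithm.

```python
# Toy deferral rule: on a held-out split, find cells of indistinguishable
# inputs where the human's accuracy beats the model's, and defer to the
# human only on those cells. Cell definition and the minimum-count
# threshold are illustrative assumptions.
from collections import defaultdict

def fit_deferral_rule(records):
    """records: iterable of (cell_id, model_pred, human_pred, label)."""
    stats = defaultdict(lambda: [0, 0, 0])  # cell -> [n, model_ok, human_ok]
    for cell, m, h, y in records:
        s = stats[cell]
        s[0] += 1
        s[1] += int(m == y)
        s[2] += int(h == y)
    return {cell for cell, (n, mo, ho) in stats.items()
            if n >= 5 and ho > mo}

def predict(cell, model_pred, human_pred, defer_cells):
    return human_pred if cell in defer_cells else model_pred
```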

Combining the strengths of many existing predictors to obtain a Mixture of Experts that is superior to its individual components is an effective way to improve performance without developing new architectures or training a model from scratch. Surprisingly, however, we find that naïvely combining expert object detectors, in a manner similar to Deep Ensembles, can often lead to degraded performance. We identify that the primary cause of this issue is that the experts' confidences do not match their performance, a phenomenon referred to as miscalibration. Consequently, the most confident detector dominates the final predictions, preventing the mixture from leveraging all the experts' predictions appropriately. To address this, when constructing the Mixture of Experts, we propose to combine their predictions in a manner which reflects the individual performance of the experts; an objective we achieve by first calibrating the predictions before filtering and refining them. We term this approach the Mixture of Calibrated Experts and demonstrate its effectiveness through extensive experiments on 5 different detection tasks using a variety of detectors, showing that it: (i) improves object detectors on COCO and instance segmentation methods on LVIS by up to $\sim 2.5$ AP; (ii) reaches state-of-the-art on COCO test-dev with $65.1$ AP and on DOTA with $82.62$ $\mathrm{AP_{50}}$; (iii) outperforms single models consistently on recent detection tasks such as Open Vocabulary Object Detection.
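The calibration step can be sketched as a per-expert isotonic map from raw detection confidence to empirical precision on a validation set, so that equal scores mean roughly equal reliability across detectors before their outputs are merged. The fusion, filtering, and refinement stages are omitted; this only illustrates the "calibrate first" part, with assumed inputs.

```python
# Hedged sketch of per-expert confidence calibration: fit a monotone map
# from raw scores to empirical correctness on a validation set, so one
# detector's 0.8 means roughly the same as another's before fusion.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_calibrators(experts_val):
    """experts_val: {name: (raw_scores, is_true_positive 0/1 array)}."""
    calibrators = {}
    for name, (scores, hits) in experts_val.items():
        iso = IsotonicRegression(out_of_bounds="clip", y_min=0.0, y_max=1.0)
        iso.fit(scores, hits)             # monotone score -> precision
        calibrators[name] = iso
    return calibrators

rng = np.random.default_rng(0)
val = {"det_a": (rng.uniform(size=200), rng.integers(0, 2, 200)),
       "det_b": (rng.uniform(size=200), rng.integers(0, 2, 200))}
cal = fit_calibrators(val)
calibrated = cal["det_a"].predict(np.array([0.9, 0.5, 0.1]))
```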

In the development of science, accurate and reproducible documentation of the experimental process is crucial. Automatically recognizing the actions performed in experiments from video would help experimenters by complementing their records of the experiments. Towards this goal, we propose FineBio, a new fine-grained video dataset of people performing biological experiments. The dataset consists of multi-view videos of 32 participants performing mock biological experiments, with a total duration of 14.5 hours. Each experiment forms a hierarchical structure, where a protocol consists of several steps, each further decomposed into a set of atomic operations. A unique property of biological experiments is that while they require strict adherence to the steps described in each protocol, there is freedom in the order of atomic operations. We provide hierarchical annotations on protocols, steps, atomic operations, object locations, and their manipulation states, posing new challenges for structured activity understanding and hand-object interaction recognition. To identify the challenges of activity understanding in biological experiments, we introduce baseline models and results on four tasks: (i) step segmentation, (ii) atomic operation detection, (iii) object detection, and (iv) manipulated/affected object detection. The dataset and code are available from //github.com/aistairc/FineBio.
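As a plain-Python illustration of the annotation hierarchy (protocol, then steps, then atomic operations), the data classes below mirror the structure described above; the field names are assumptions for illustration, not the dataset's actual schema.

```python
# Illustrative data model of FineBio's hierarchy: a protocol is a
# sequence of steps, each decomposed into timed atomic operations.
# Field names are hypothetical, not the dataset's real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AtomicOperation:
    verb: str             # e.g. "pick up", "pour"
    target_object: str    # e.g. "pipette", "tube"
    start_sec: float
    end_sec: float

@dataclass
class Step:
    description: str      # e.g. "transfer reagent to the tube"
    operations: List[AtomicOperation] = field(default_factory=list)

@dataclass
class Protocol:
    name: str
    steps: List[Step] = field(default_factory=list)
```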

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). FDN first decomposes the basic features extracted by a backbone network into a set of facial action-aware latent features to model expression similarities. FRN then captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (CK+, MMI, and Oulu-CASIA) and in-the-wild databases (RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
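A compact way to picture the decompose-then-reconstruct pipeline is the PyTorch sketch below: K linear projections produce action-aware latent features (the FDN role), learned relation weights score them, and a weighted sum reconstructs the expression feature (a much-simplified stand-in for FRN). The real intra-/inter-feature relation modules are richer than this; dimensions and layer choices are assumptions.

```python
# Simplified FDN/FRN sketch: decompose a backbone feature into K latent
# features, weight them with learned relation scores, and sum them back
# into a single expression feature for classification.
import torch
import torch.nn as nn

class FDRLSketch(nn.Module):
    def __init__(self, in_dim=512, latent_dim=64, num_latents=8, classes=7):
        super().__init__()
        self.decompose = nn.ModuleList(
            [nn.Linear(in_dim, latent_dim) for _ in range(num_latents)])
        self.relation = nn.Linear(latent_dim, 1)   # per-latent importance
        self.classifier = nn.Linear(latent_dim, classes)

    def forward(self, feat):                       # feat: (batch, in_dim)
        latents = torch.stack([p(feat) for p in self.decompose], dim=1)
        weights = torch.softmax(self.relation(latents), dim=1)
        expression = (weights * latents).sum(dim=1)  # reconstruction
        return self.classifier(expression)

logits = FDRLSketch()(torch.randn(4, 512))  # (4, 7) class logits
```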
