Chien-yu Huang,Wei-Chih Chen,Shu-wen Yang,Andy T. Liu,Chen-An Li,Yu-Xiang Lin,Wei-Cheng Tseng,Anuj Diwan,Yi-Jen Shih,Jiatong Shi,William Chen,Xuanjun Chen,Chi-Yuan Hsiao,Puyuan Peng,Shih-Heng Wang,Chun-Yi Kuan,Ke-Han Lu,Kai-Wei Chang,Chih-Kai Yang,Fabian Ritter-Gutierrez,Ming To Chuang,Kuan-Po Huang,Siddhant Arora,You-Kuan Lin,Eunjung Yeo,Kalvin Chang,Chung-Ming Chien,Kwanghee Choi,Cheng-Hsiu Hsieh,Yi-Cheng Lin,Chee-En Yu,I-Hsiang Chiu,Heitor R. Guimarães,Jionghao Han,Tzu-Quan Lin,Tzu-Yuan Lin,Homu Chang,Ting-Wu Chang,Chun Wei Chen,Shou-Jen Chen,Yu-Hua Chen,Hsi-Chun Cheng,Kunal Dhawan,Jia-Lin Fang,Shi-Xin Fang,Kuan-Yu Fang Chiang,Chi An Fu,Hsien-Fu Hsiao,Ching Yu Hsu,Shao-Syuan Huang,Lee Chen Wei,Hsi-Che Lin,Hsuan-Hao Lin,Hsuan-Ting Lin,Jian-Ren Lin,Ting-Chun Liu,Li-Chun Lu,Tsung-Min Pai,Ankita Pasad,Shih-Yun Shan Kuan,Suwon Shon,Yuxun Tang,Yun-Shao Tsai,Jui-Chiang Wei,Tzu-Chieh Wei,Chengxi Wu,Dien-Ruei Wu,Chao-Han Huck Yang,Chieh-Chi Yang,Jia Qi Yip,Shao-Xiang Yuan,Vahid Noroozi,Zhehuai Chen,Haibin Wu,Karen Livescu,David Harwath,Shinji Watanabe,Hung-yi Lee

Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.

Related content

MoDELS

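The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Attendees of MODELS come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Segment Anything · Image segmentation · Diversity · Spanned subspace

December 20, 2024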

Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Jun Ma,Feifei Li,Sumin Kim,Reza Asakereh,Bao-Hiep Le,Dang-Khoa Nguyen-Vu,Alexander Pfefferle,Muxin Wei,Ruochen Gao,Donghang Lyu,Songxiao Yang,Lennart Purucker,Zdravko Marinov,Marius Staring,Haisheng Lu,Thuy Thanh Dao,Xincheng Ye,Zhi Li,Gianluca Brugnara,Philipp Vollmuth,Martha Foltyn-Dumitru,Jaeyoung Cho,Mustafa Ahmed Mahmutoglu,Martin Bendszus,Irada Pflüger,Aditya Rastogi,Dong Ni,Xin Yang,Guang-Quan Zhou,Kaini Wang,Nicholas Heller,Nikolaos Papanikolopoulos,Christopher Weight,Yubing Tong,Jayaram K Udupa,Cahill J. Patrick,Yaqi Wang,Yifan Zhang,Francisco Contijoch,Elliot McVeigh,Xin Ye,Shucheng He,Robert Haase,Thomas Pinetz,Alexander Radbruch,Inga Krause,Erich Kobler,Jian He,Yucheng Tang,Haichun Yang,Yuankai Huo,Gongning Luo,Kaisar Kushibar,Jandos Amankulov,Dias Toleshbayev,Amangeldi Mukhamejan,Jan Egger,Antonio Pepe,Christina Gsaxner,Gijs Luijten,Shohei Fujita,Tomohiro Kikuchi,Benedikt Wiestler,Jan S. Kirschke,Ezequiel de la Rosa,Federico Bolelli,Luca Lumetti,Costantino Grana,Kunpeng Xie,Guomin Wu,Behrus Puladi,Carlos Martín-Isla,Karim Lekadir,Victor M. Campello,Wei Shao,Wayne Brisbane,Hongxu Jiang,Hao Wei,Wu Yuan,Shuangle Li,Yuyin Zhou,Bo Wang

from arxiv, CVPR 2024 MedSAM on Laptop Competition Summary: //www.codabench.org/competitions/1847/

Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.

Performer · Learning · Scenario · Reinforcement learning · Identifiable

December 20, 2024

Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

Brett Barkley,David Fridovich-Keil

Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process -- the backbone of Dyna-style algorithms -- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, like many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
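To make the role of synthetic rollouts concrete, here is a minimal sketch of classic tabular Dyna-Q on a toy chain environment. It is not the deep DMBRL setup evaluated in the paper: the environment, the function name `dyna_q`, and all hyperparameters are assumptions chosen for illustration. Each real transition updates the Q-table and a learned model, and the model is then replayed for extra "planning" updates.

```python
import random


def dyna_q(n_states=6, episodes=50, planning_steps=10, alpha=0.5,
           gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a deterministic chain (illustrative toy setup).

    States are 0..n_states-1, actions are 0 (left) and 1 (right),
    and the only reward is 1.0 for reaching the last state.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def step(s, a):
        # Real environment dynamics.
        s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == goal else 0.0), s2

    def update(s, a, r, s2):
        # One-step Q-learning update; the goal state is terminal.
        target = r + (0.0 if s2 == goal else gamma * max(q[s2]))
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            r, s2 = step(s, a)
            update(s, a, r, s2)          # learn from the real transition
            model[(s, a)] = (r, s2)      # remember it in the model
            for _ in range(planning_steps):
                # Synthetic rollout: replay a transition from the model.
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Setting `planning_steps=0` reduces the loop to plain Q-learning, which is the kind of with-and-without-rollouts comparison the abstract describes, although on a far simpler problem.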

MoDELS · Inference · Statistic · Example · Marginalization

December 20, 2024

Prior-Posterior Derived-Predictive Consistency Checks for Post-Estimation Calculated Quantities of Interest (QOI-Check)

Holger Sennhenn-Reulen

With flexible modeling software, such as the probabilistic programming language Stan, growing in popularity, quantities of interest (QOIs) calculated post-estimation are increasingly desired and custom-implemented, both by statistical software developers and applied scientists. Examples of QOIs include the marginal expectation of a multilevel model with a non-linear link function, or an ANOVA decomposition of a bivariate regression spline. For this, the QOI-Check is introduced, a systematic approach to ensure proper calibration and correct interpretation of QOIs. It contributes to Bayesian Workflow, and aims to improve the interpretability of, and trust in, post-estimation conclusions based on QOIs. The QOI-Check builds upon Simulation Based Calibration (SBC) and the Holdout Predictive Check (HPC). SBC verifies the computational reliability of Bayesian inference algorithms by checking the consistency of the posterior with the prior when the posterior is estimated on prior-predicted data, while HPC ensures robust inference by assessing the consistency of model predictions with holdout data. SBC and HPC are combined in QOI-Checking to validate post-estimation QOI calculation and interpretation in the context of a (hypothetical) population definition underlying the QOI.
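The SBC ingredient can be sketched in a few lines. The example below is an illustration only and covers SBC for a single parameter, not the full QOI-Check: it assumes a conjugate normal model (prior N(0, 1), unit-variance normal likelihood) so that the exact posterior is available in closed form, and the function name `sbc_ranks` and its defaults are hypothetical. If inference is correct, the rank of the prior draw among the posterior draws is uniform on 0..`n_draws`.

```python
import random


def sbc_ranks(n_sims=1000, n_obs=5, n_draws=9, seed=0):
    """Simulation-based calibration ranks for a conjugate normal model.

    Assumed model (illustrative): mu ~ N(0, 1), y_i | mu ~ N(mu, 1).
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu = rng.gauss(0.0, 1.0)                         # draw from the prior
        y = [rng.gauss(mu, 1.0) for _ in range(n_obs)]   # prior-predicted data
        # Exact conjugate posterior: precision 1 + n_obs.
        post_var = 1.0 / (n_obs + 1)
        post_mean = sum(y) * post_var
        draws = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]
        # Rank statistic: number of posterior draws below the prior draw.
        ranks.append(sum(d < mu for d in draws))
    return ranks
```

A histogram of the returned ranks that departs clearly from uniform would indicate a miscalibrated posterior. Applying the same ranking to a derived quantity instead of `mu` is the direction the abstract points in, but how the QOI-Check does this is not specified here.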

Knowledge · MoDELS · Language modeling · Integration · Performer

December 20, 2024

Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks

Brian J Chan,Chao-Ting Chen,Jui-Hung Cheng,Hen-Hsen Huang

Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. However, RAG introduces challenges such as retrieval latency, potential errors in document selection, and increased system complexity. With the advent of large language models (LLMs) featuring significantly extended context windows, this paper proposes an alternative paradigm, cache-augmented generation (CAG), that bypasses real-time retrieval. Our method involves preloading all relevant resources, especially when the documents or knowledge for retrieval are of a limited and manageable size, into the LLM's extended context and caching its runtime parameters. During inference, the model utilizes these preloaded parameters to answer queries without additional retrieval steps. Comparative analyses reveal that CAG eliminates retrieval latency and minimizes retrieval errors while maintaining context relevance. Performance evaluations across multiple benchmarks highlight scenarios where long-context LLMs either outperform or complement traditional RAG pipelines. These findings suggest that, for certain applications, particularly those with a constrained knowledge base, CAG provides a streamlined and efficient alternative to RAG, achieving comparable or superior results with reduced complexity.
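The preload-once, reuse-per-query pattern can be shown with a toy stand-in for a model. Everything in this sketch is hypothetical: `ToyModel` is not a real LLM, and its "cache" is just the list of tokens already processed, playing the role that cached key-value states play in an actual long-context model. The sketch only illustrates the control flow and the saved work, not answer quality.

```python
class ToyModel:
    """Hypothetical stand-in for an LLM that counts processed tokens."""

    def __init__(self):
        self.tokens_processed = 0

    def extend(self, cache, text):
        # "Process" new text on top of an existing cache and return a new
        # cache; the input cache is left untouched so it can be reused.
        tokens = text.split()
        self.tokens_processed += len(tokens)
        return cache + tokens


def answer_without_cache(model, documents, queries):
    # Baseline: the documents are re-processed for every query.
    return [model.extend(model.extend([], documents), q) for q in queries]


def answer_with_cache(model, documents, queries):
    # CAG-style: process the documents once, then reuse that prefix.
    prefix = model.extend([], documents)
    return [model.extend(prefix, q) for q in queries]
```

With a 4-token document and queries of 2 and 3 tokens, the baseline processes 13 tokens while the cached variant processes 9, and both end in the same per-query state. The gap grows with the size of the preloaded knowledge and the number of queries.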

MoDELS · Performer · Dataset · Less · Robustness

December 19, 2024

On the Use of Deep Learning Models for Semantic Clone Detection

Subroto Nag Pinku,Debajyoti Mondal,Chanchal K. Roy

from arxiv, Accepted at the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME 2024)

Detecting and tracking code clones can ease various software development and maintenance tasks when changes in a code fragment should be propagated over all its copies. Several deep learning-based clone detection models have appeared in the literature for detecting syntactic and semantic clones, widely evaluated with the BigCloneBench dataset. However, class imbalance and the small number of semantic clones make BigCloneBench less ideal for interpreting model performance. Researchers also use other datasets such as GoogleCodeJam, OJClone, and SemanticCloneBench to understand model generalizability. To overcome the limitations of existing datasets, the GPT-assisted semantic and cross-language clone dataset GPTCloneBench has been released. However, how these models compare across datasets remains unclear. In this paper, we propose a multi-step evaluation approach for five state-of-the-art clone detection models leveraging existing benchmark datasets, including GPTCloneBench, and using mutation operators to study model ability. Specifically, we examine three highly-performing single-language models (ASTNN, GMN, CodeBERT) on BigCloneBench, SemanticCloneBench, and GPTCloneBench, testing their robustness with mutation operations. Additionally, we compare them against cross-language models (C4, CLCDSA) known for detecting semantic clones. While single-language models show high F1 scores for BigCloneBench, their performance on SemanticCloneBench varies (up to 20%). Interestingly, the cross-language model (C4) shows superior performance (around 7%) on SemanticCloneBench over other models and performs similarly on BigCloneBench and GPTCloneBench. On mutation-based datasets, C4 has more robust performance (less than 1% difference) compared to single-language models, which show high variability.

MoDELS · Latent · state-of-the-art · Scenario · Design

December 19, 2024

EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space

Jianrong Zhang,Hehe Fan,Yi Yang

from arxiv, Project page: //jiro-zhang.github.io/EnergyMoGen/

Diffusion models, particularly latent diffusion models, have demonstrated remarkable success in text-driven human motion generation. However, it remains challenging for latent diffusion models to effectively compose multiple semantic concepts into a single, coherent motion sequence. To address this issue, we propose EnergyMoGen, which includes two spectrums of Energy-Based Models: (1) We interpret the diffusion model as a latent-aware energy-based model that generates motions by composing a set of diffusion models in latent space; (2) We introduce a semantic-aware energy model based on cross-attention, which enables semantic composition and adaptive gradient descent for text embeddings. To overcome the challenges of semantic inconsistency and motion distortion across these two spectrums, we introduce Synergistic Energy Fusion. This design allows the motion latent diffusion model to synthesize high-quality, complex motions by combining multiple energy terms corresponding to textual descriptions. Experiments show that our approach outperforms existing state-of-the-art models on various motion generation tasks, including text-to-motion generation, compositional motion generation, and multi-concept motion generation. Additionally, we demonstrate that our method can be used to extend motion datasets and improve the text-to-motion task.

Networking · Learning · Adversarial learning · Identifiable · Lecture notes

December 18, 2024

A Review of the Duality of Adversarial Learning in Network Intrusion: Attacks and Countermeasures

Shalini Saini,Anitha Chennamaneni,Babatunde Sawyerr

from arxiv, 23 pages, 2 figures, 5 tables

Deep learning solutions are instrumental in cybersecurity, harnessing their ability to analyze vast datasets, identify complex patterns, and detect anomalies. However, malevolent actors can exploit these capabilities to orchestrate sophisticated attacks, posing significant challenges to defenders and traditional security measures. Adversarial attacks, particularly those targeting vulnerabilities in deep learning models, present a nuanced and substantial threat to cybersecurity. Our study delves into adversarial learning threats such as Data Poisoning, Test Time Evasion, and Reverse Engineering, specifically impacting Network Intrusion Detection Systems. Our research explores the intricacies and countermeasures of attacks to deepen understanding of network security challenges amidst adversarial threats. In our study, we present insights into the dynamic realm of adversarial learning and its implications for network intrusion. The intersection of adversarial attacks and defenses within network traffic data, coupled with advances in machine learning and deep learning techniques, represents a relatively underexplored domain. Our research lays the groundwork for strengthening defense mechanisms to address the potential breaches in network security and privacy posed by adversarial attacks. Through our in-depth analysis, we identify domain-specific research gaps, such as the scarcity of real-life attack data and the evaluation of AI-based solutions for network traffic. Our focus on these challenges aims to stimulate future research efforts toward the development of resilient network defense strategies.

Language modeling · MoDELS · Code · CodeBERT · Engineering

December 18, 2024

On the Compression of Language Models for Code: An Empirical Study on CodeBERT

Giordano d'Aloisio,Luca Traini,Federica Sarro,Antinisca Di Marco

Language models have proven successful across a wide range of software engineering tasks, but their significant computational costs often hinder their practical adoption. To address this challenge, researchers have begun applying various compression strategies to improve the efficiency of language models for code. These strategies aim to optimize inference latency and memory usage, though often at the cost of reduced model effectiveness. However, there is still a significant gap in understanding how these strategies influence the efficiency and effectiveness of language models for code. Here, we empirically investigate the impact of three well-known compression strategies -- knowledge distillation, quantization, and pruning -- across three different classes of software engineering tasks: vulnerability detection, code summarization, and code search. Our findings reveal that the impact of these strategies varies greatly depending on the task and the specific compression method employed. Practitioners and researchers can use these insights to make informed decisions when selecting the most appropriate compression strategy, balancing both efficiency and effectiveness based on their specific needs.
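Of the three strategies, knowledge distillation is the least self-explanatory, so a small sketch may help. The abstract does not say which distillation objective the study uses; the code below assumes the common soft-target formulation, where the student is trained to match the teacher's temperature-softened output distribution. The function names and the default temperature are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures in the usual soft-target setup.
    """
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice it is usually combined with the ordinary task loss on ground-truth labels.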

Confidence · Processing (programming language) · Estimation/Estimator · MoDELS · Functional

December 18, 2024

Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling

Jinzong Dong,Zhaohui Jiang,Dong Pan,Haoyang Yu

from arxiv, Accepted by AAAI-25

Confidence calibration of classification models is a technique to estimate the true posterior probability of the predicted class, which is critical for ensuring reliable decision-making in practical applications. Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data or fit a user-defined calibration function, but often overlook fully mining and utilizing the prior distribution behind the calibration curve. However, a well-informed prior distribution can provide valuable insights beyond the empirical data when data are limited or in low-density regions of confidence scores. To fill this gap, this paper proposes a new method that integrates the prior distribution behind the calibration curve with empirical data to estimate a continuous calibration curve, which is realized by modeling the sampling process of calibration data as a binomial process and maximizing the likelihood function of the binomial process. We prove that the calibration curve estimation method is Lipschitz continuous with respect to the data distribution and requires a sample size of $3/B$ of that required for histogram binning, where $B$ represents the number of bins. Also, a new calibration metric ($TCE_{bpm}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed. $TCE_{bpm}$ is proven to be a consistent calibration measure. Furthermore, realistic calibration datasets can be generated by the binomial process modeling from a preset true calibration curve and confidence score distribution, which can serve as a benchmark to measure and compare the discrepancy between existing calibration metrics and the true calibration error. The effectiveness of our calibration method and metric is verified on real-world and simulated data.
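For context on the histogram-binning baseline that the abstract compares against, the sketch below computes the standard binned expected calibration error (ECE). It is not the paper's binomial-process method or its $TCE_{bpm}$ metric; the function name and the equal-width binning scheme are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE with equal-width bins over [0, 1].

    confidences: predicted confidence of the chosen class, one per sample.
    correct: 1/True if the prediction was right, 0/False otherwise.
    """
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            # Weight each bin's |accuracy - confidence| gap by its share.
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece
```

Each bin estimates one point of the calibration curve from only the samples that fall into it, which is why binned estimates become noisy when data are scarce. That is the sample-efficiency problem the abstract says a prior-informed continuous curve is meant to ease.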

MoDELS · Transformer model · Transformation · Inference · Model evaluation

June 23, 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li,Eric Wallace,Sheng Shen,Kevin Lin,Kurt Keutzer,Dan Klein,Joseph E. Gonzalez

from arxiv, ICML 2020

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
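The two compression techniques named above can be illustrated on a plain list of weights. This is a minimal sketch under simple assumptions (unstructured magnitude pruning and symmetric uniform quantization with a single scale); the paper's exact procedures may differ, and the function names are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits):
    """Round weights to a symmetric `bits`-bit grid and map them back."""
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    scale = max(abs(w) for w in weights) / levels
    if scale == 0.0:
        return list(weights)
    # Quantize to integers in [-levels, levels], then dequantize.
    return [round(w / scale) * scale for w in weights]
```

For example, `magnitude_prune([0.1, -2.0, 0.5, 3.0], 0.5)` keeps only the two largest-magnitude weights. For `quantize_uniform`, the rounding error per weight is at most half of one quantization step.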