一级a视频免费一区二区,亚洲AV综合色无码国产精品区卡,成人在线一区二区

The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.

相關內容

噪聲

關注 0

Continuity · MoDELS · HTTPS · Performance · 多樣性 ·

2024 年 1 月 25 日

Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

Jannis Vamvas,No?mi Aepli,Rico Sennrich

from arxiv, First Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)

Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments at //github.com/ZurichNLP/swiss-german-text-encoders

數據集 · Extensibility · Performer · HTTPS · Engineering ·

2024 年 1 月 25 日

Benchmarking the Sim-to-Real Gap in Cloth Manipulation

David Blanco-Mulero,Oriol Barbany,Gokhan Alcan,Adrià Colomé,Carme Torras,Ville Kyrki

from arxiv, Accepted to IEEE Robotics and Automation Letters (RA-L). 8 pages, 6 figures. Supplementary material available at //sites.google.com/view/cloth-sim2real-benchmark

Realistic physics engines play a crucial role for learning to manipulate deformable objects such as garments in simulation. By doing so, researchers can circumvent challenges such as sensing the deformation of the object in the realworld. In spite of the extensive use of simulations for this task, few works have evaluated the reality gap between deformable object simulators and real-world data. We present a benchmark dataset to evaluate the sim-to-real gap in cloth manipulation. The dataset is collected by performing a dynamic as well as a quasi-static cloth manipulation task involving contact with a rigid table. We use the dataset to evaluate the reality gap, computational time, and simulation stability of four popular deformable object simulators: MuJoCo, Bullet, Flex, and SOFA. Additionally, we discuss the benefits and drawbacks of each simulator. The benchmark dataset is open-source. Supplementary material, videos, and code, can be found at //sites.google.com/view/cloth-sim2real-benchmark.

知識 (knowledge) · Performer · MoDELS · Learning · 語言模型化 ·

2024 年 1 月 25 日

Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods

Mohammed Sabry,Anya Belz

from arxiv, Accepted to Findings of EACL 2024. Camera ready version

As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that degree of success depends on type of PEFT and on differences between originating and receiving pretrained models.

目標函數 · 泛函 · 估計/估計量 · 噪聲 · SimPLe ·

2024 年 1 月 25 日

Theoretical Analysis of Explicit Averaging and Novel Sign Averaging in Comparison-Based Search

Daiki Morinaga,Youhei Akimoto

from arxiv, 13 pages, 1 figures

In black-box optimization, noise in the objective function is inevitable. Noise disrupts the ranking of candidate solutions in comparison-based optimization, possibly deteriorating the search performance compared with a noiseless scenario. Explicit averaging takes the sample average of noisy objective function values and is widely used as a simple and versatile noise-handling technique. Although it is suitable for various applications, it is ineffective if the mean is not finite. We theoretically reveal that explicit averaging has a negative effect on the estimation of ground-truth rankings when assuming stably distributed noise without a finite mean. Alternatively, sign averaging is proposed as a simple but robust noise-handling technique. We theoretically prove that the sign averaging estimates the order of the medians of the noisy objective function values of a pair of points with arbitrarily high probability as the number of samples increases. Its advantages over explicit averaging and its robustness are also confirmed through numerical experiments.

估計/估計量 · 線性的 · MoDELS · INFORMS · Analysis ·

2024 年 1 月 24 日

Bayesian Analysis of the Beta Regression Model Subject to Linear Inequality Restrictions with Application

Solmaz Seifollahi,Hossein Bevrani,Kristofer Mansson

from arxiv, 16 pages, 7 tables

ReRecent studies in machine learning are based on models in which parameters or state variables are bounded restricted. These restrictions are from prior information to ensure the validity of scientific theories or structural consistency based on physical phenomena. The valuable information contained in the restrictions must be considered during the estimation process to improve estimation accuracy. Many researchers have focused on linear regression models subject to linear inequality restrictions, but generalized linear models have received little attention. In this paper, the parameters of beta Bayesian regression models subjected to linear inequality restrictions are estimated. The proposed Bayesian restricted estimator, which is demonstrated by simulated studies, outperforms ordinary estimators. Even in the presence of multicollinearity, it outperforms the ridge estimator in terms of the standard deviation and the mean squared error. The results confirm that the proposed Bayesian restricted estimator makes sparsity in parameter estimating without using the regularization penalty. Finally, a real data set is analyzed by the new proposed Bayesian estimation method.

Integration · 可約的 · QoS · 評論員 · Networking ·

2024 年 1 月 24 日

Quantifying the Impact of Frame Preemption on Combined TSN Shapers

Rubi Debnath,Philipp Hortig,Luxi Zhao,Sebastian Steinhorst

from arxiv, Accepted in IEEE/IFIP Network Operations and Management Symposium (NOMS) 2024

Different scheduling mechanisms in Time Sensitive Networking (TSN) can be integrated together to design and support complex architectures with enhanced capabilities for mixed critical networks. Integrating Frame Preemption (FP) with Credit-Based Shaper (CBS) and Gate Control List (GCL) opens up different modes and configuration choices resulting in a complex evaluation of several possibilities and their impact on the Quality of Service (QoS). In this paper, we implement and quantify the integration of preemptive CBS with GCL by incorporating FP into the architecture. Our experiments show that the end-to-end delay of Audio Video Bridging (AVB) flows shaped by CBS reduces significantly (up to 40\%) when AVB flows are set to preemptable class. We further show that the jitter of Time Triggered (TT) traffic remains unaffected in "with Hold/Release" mode. Furthermore, we propose to introduce Guardband (GB) in the "without Hold/Release" to reduce the jitter of the TT flow. We compare all the different integration modes, starting with CBS with GCL, extending it further to FP. We evaluate all feasible combinations in both synthetic and realistic scenarios and offer recommendations for practical configuration methods.

估計/估計量 · 隨機梯度下降 · 推斷 · SGD · Markovian ·

2024 年 1 月 23 日

On the Utility of Equal Batch Sizes for Inference in Stochastic Gradient Descent

Rahul Singh,Abhinek Shukla,Dootika Vats

from arxiv, 43 pages, 8 figures

Stochastic gradient descent (SGD) is an estimation tool for large data employed in machine learning and statistics. Due to the Markovian nature of the SGD process, inference is a challenging problem. An underlying asymptotic normality of the averaged SGD (ASGD) estimator allows for the construction of a batch-means estimator of the asymptotic covariance matrix. Instead of the usual increasing batch-size strategy employed in ASGD, we propose a memory efficient equal batch-size strategy and show that under mild conditions, the estimator is consistent. A key feature of the proposed batching technique is that it allows for bias-correction of the variance, at no cost to memory. Since joint inference for high dimensional problems may be undesirable, we present marginal-friendly simultaneous confidence intervals, and show through an example how covariance estimators of ASGD can be employed in improved predictions.

MoDELS · 評論員 · 縮放 · 可理解性 · GPT-3 ·

2021 年 8 月 18 日

On the Opportunities and Risks of Foundation Models

Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri Chatterji,Annie Chen,Kathleen Creel,Jared Quincy Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Kohd,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang

from arxiv, Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Performer · Machine Learning · 模型性能 · MoDELS · Processing（編程語言） ·

2021 年 8 月 2 日

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu,Luwei Xiao,Yixuan Sun,Junhang Zhang,Tianlong Ma,Liang He

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.

Faster R-CNN · domain shift · R-CNN · 目標檢測 · 可約的 ·

2018 年 3 月 8 日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Yuhua Chen,Wen Li,Christos Sakaridis,Dengxin Dai,Luc Van Gool

from arxiv, Accepted to CVPR 2018

Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.