Foundation models, including Vision Language Models (VLMs) and Large Language Models (LLMs), possess the $generality$ to handle diverse distributions and tasks, which stems from their extensive pre-training datasets. Fine-tuning foundation models is common practice to enhance task performance or align a model's behavior with human expectations, allowing it to gain $speciality$. However, the small datasets used for fine-tuning may not adequately cover the diverse distributions and tasks encountered during pre-training. Consequently, the pursuit of speciality during fine-tuning can lead to a loss of $generality$ in the model, which is related to catastrophic forgetting (CF) in deep learning. In this study, we demonstrate this phenomenon in both VLMs and LLMs. For instance, fine-tuning VLMs such as CLIP on ImageNet results in a loss of generality in handling diverse distributions, and fine-tuning LLMs such as Galactica in the medical domain leads to a loss in instruction following and common sense. To address the trade-off between speciality and generality, we investigate multiple regularization methods from continual learning, the weight averaging method Wise-FT from out-of-distribution (OOD) generalization, which interpolates parameters between pre-trained and fine-tuned models, and parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA). Our findings show that both continual learning and Wise-FT methods effectively mitigate the loss of generality, with Wise-FT exhibiting the strongest performance in balancing speciality and generality.
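
Wise-FT, as described above, interpolates parameters between the pre-trained and fine-tuned models. The following is a minimal sketch. It assumes both models share identical parameter names and shapes and are stored as dictionaries of NumPy arrays. The function name `wise_ft` and the mixing coefficient `alpha` are illustrative and are not taken from the paper.

```python
import numpy as np

def wise_ft(pretrained, finetuned, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * pretrained + alpha * finetuned.

    alpha = 0 returns the pre-trained weights (generality),
    alpha = 1 returns the fine-tuned weights (speciality).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")
    merged = {}
    for name, w_pre in pretrained.items():
        w_ft = finetuned[name]
        if w_pre.shape != w_ft.shape:
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = (1.0 - alpha) * w_pre + alpha * w_ft
    return merged

# Toy example with a single weight matrix and bias vector (hypothetical values).
pre = {"w": np.zeros((2, 2)), "b": np.array([1.0, 1.0])}
ft = {"w": np.ones((2, 2)), "b": np.array([3.0, 5.0])}
mid = wise_ft(pre, ft, alpha=0.5)
print(mid["b"])  # [2. 3.]
```

Sweeping `alpha` from 0 to 1 moves the merged model from the pre-trained weights toward the fine-tuned ones. This sketch leaves open how a particular `alpha` would be selected.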

Related content

MoDELS

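The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Estimation/Estimator · Mixing · Sample · Statistical theory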

November 17, 2023

Co-variance Operator of Banach Valued Random Elements: U-Statistic Approach

Suprio Bhar, Subhra Sankar Dhar

from arXiv, this revised version contains an updated literature review and an expanded appendix on some technical topics on Banach spaces

This article proposes a co-variance operator for Banach-valued random elements using the concept of a $U$-statistic. We then study the asymptotic distribution of the proposed co-variance operator along with related large-sample properties. Moreover, specifically for Hilbert-space-valued random elements, the asymptotic distribution of the proposed estimator is derived even for dependent data under some mixing conditions.
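
For intuition, consider the finite-dimensional case. Here a covariance estimator can be written as an order-two U-statistic with kernel h(x, y) = (x - y)(x - y)^T / 2, and averaging this kernel over all pairs recovers the usual unbiased sample covariance. The sketch below assumes this standard kernel for R^d-valued data. The abstract does not state the exact kernel the authors use for Banach-valued elements, where the outer product would be replaced by a tensor-product operator. The name `u_stat_covariance` is hypothetical.

```python
from itertools import combinations
import numpy as np

def u_stat_covariance(X):
    """Order-2 U-statistic covariance estimate for the rows of X (shape n x d).

    Kernel: h(x, y) = (x - y)(x - y)^T / 2, averaged over all unordered pairs.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if n < 2:
        raise ValueError("need at least two observations")
    total = np.zeros((d, d))
    for i, j in combinations(range(n), 2):
        diff = X[i] - X[j]
        total += np.outer(diff, diff) / 2.0
    return total / (n * (n - 1) / 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
C_u = u_stat_covariance(X)
# In finite dimensions the U-statistic coincides with the unbiased (ddof=1) sample covariance.
print(np.allclose(C_u, np.cov(X, rowvar=False)))  # True
```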

Mutually independent · Sample complexity · Vectorization · CASE · Optimizer

November 16, 2023

Dual Induction CLT for High-dimensional m-dependent Data

Heejong Bong, Arun Kumar Kuchibhotla, Alessandro Rinaldo

from arXiv, 25 pages

We derive novel and sharp high-dimensional Berry--Esseen bounds for the sum of $m$-dependent random vectors over the class of hyper-rectangles exhibiting only a poly-logarithmic dependence in the dimension. Our results hold under minimal assumptions, such as non-degenerate covariances and finite third moments, and yield a sample complexity of order $\sqrt{m/n}$, aside from logarithmic terms, matching the optimal rates established in the univariate case. When specialized to the sums of independent non-degenerate random vectors, we obtain sharp rates under the weakest possible conditions. On the technical side, we develop an inductive relationship between anti-concentration inequalities and Berry--Esseen bounds, inspired by the classical Lindeberg swapping method and the concentration inequality approach for dependent data, that may be of independent interest.
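
The m-dependence assumption above can be made concrete with a standard construction. It is used here purely for illustration and is not taken from the paper. A moving sum of m + 1 consecutive i.i.d. vectors is m-dependent, because X_t and X_s share no underlying noise term whenever |t - s| > m. The sketch below builds such a sequence together with the normalized sum S_n = n^{-1/2} (X_1 + ... + X_n), whose Gaussian approximation over hyper-rectangles the Berry--Esseen bounds quantify. The helper names are hypothetical.

```python
import numpy as np

def m_dependent_sequence(n, d, m, rng):
    """Return an (n x d) array whose row t is Z_t + Z_{t+1} + ... + Z_{t+m}.

    The Z's are i.i.d. standard normal vectors, so rows more than m apart
    are built from disjoint noise terms and are therefore independent.
    """
    Z = rng.standard_normal(size=(n + m, d))
    # Sliding-window sum of length m + 1 via cumulative sums along time.
    csum = np.vstack([np.zeros((1, d)), np.cumsum(Z, axis=0)])
    return csum[m + 1:] - csum[:n]

def noise_indices(t, m):
    """Indices of the underlying Z's used to build row t."""
    return set(range(t, t + m + 1))

rng = np.random.default_rng(1)
n, d, m = 1000, 5, 3
X = m_dependent_sequence(n, d, m, rng)
S_n = X.sum(axis=0) / np.sqrt(n)  # normalized sum in R^d

# Rows t and t + m + 1 share no noise terms; rows t and t + m do.
print(noise_indices(0, m).isdisjoint(noise_indices(m + 1, m)))  # True
print(noise_indices(0, m).isdisjoint(noise_indices(m, m)))      # False
```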

MoDELS · Performer · Model performance · Language modeling · Identifiable

November 16, 2023

A Framework for Monitoring and Retraining Language Models in Real-World Applications

Jaykumar Kasundra, Claudia Schulz, Melicaalsadat Mirsafian, Stavroula Skylaki

In the Machine Learning (ML) model development lifecycle, training candidate models using an offline holdout dataset and identifying the best model for the given task is only the first step. After the selected model is deployed, many real-world applications require continuous model monitoring and model retraining. There are multiple reasons for retraining, including data or concept drift, which may be reflected in model performance as monitored by an appropriate metric. Another motivation for retraining is the acquisition of increasing amounts of data over time, which may be used to retrain and improve the model even in the absence of drift. We examine the impact of various retraining decision points on crucial factors, such as model performance and resource utilization, in the context of multilabel classification models. We explain our key decision points and propose a reference framework for designing an effective model retraining strategy.
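
The text above names two retraining triggers: a drop in the monitored metric, which may signal data or concept drift, and the accumulation of new labelled data. Both can be combined into a simple decision rule. The sketch below is hypothetical. The function names, the choice of micro-averaged F1 as the multilabel metric, the label names, and the thresholds `max_drop` and `min_growth` are illustrative assumptions, not the paper's framework.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multilabel data given as lists of label sets."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def should_retrain(baseline_f1, current_f1, n_train, n_new,
                   max_drop=0.05, min_growth=0.2):
    """Return (decision, reason) for a single monitoring checkpoint (hypothetical rule)."""
    if baseline_f1 - current_f1 > max_drop:
        return True, "metric drop (possible drift)"
    if n_new >= min_growth * n_train:
        return True, "enough new data accumulated"
    return False, "keep current model"

# Hypothetical labelled sample collected after deployment.
gold = [{"tax", "law"}, {"law"}, {"finance"}]
pred = [{"tax"}, {"law", "finance"}, {"finance"}]
current = micro_f1(gold, pred)
print(round(current, 3))  # 0.75
print(should_retrain(0.85, current, n_train=10_000, n_new=500))
```

The two thresholds are where a performance-versus-resource trade-off would be expressed. A lower `max_drop` or `min_growth` triggers retraining more often.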

Knowledge · Question answering · Dataset · Integration · Graph

November 16, 2023

FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives

Jiaju Chen, Yuxuan Lu, Shao Zhang, Bingsheng Yao, Yuanzhe Dong, Ying Xu, Yunyao Li, Qianwen Wang, Dakuo Wang, Yuling Sun

AI models (including LLMs) often rely on narrative question-answering (QA) datasets to provide customized QA functionality for downstream children's education applications. However, existing datasets only include QA pairs grounded in the given storybook content, whereas children can learn more when teachers relate the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce the FairytaleCQA dataset, annotated by children's education experts, which supplements 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narratives but also contain commonsense knowledge grounded in an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned on FairytaleCQA reliably outperforms much larger prompt-engineered LLMs (e.g., GPT-4) on this new QA-pair generation (QAG) task. This result suggests that 1) our dataset brings novel challenges to existing LLMs, and 2) data annotation by human experts remains critical, as experts hold nuanced knowledge of the children's education domain that LLMs lack.

Biased · Language modeling · MoDELS · Neurons · Identifiable

November 16, 2023

CRISPR: Eliminating Bias Neurons from an Instruction-following Language Model

Nakyeong Yang, Taegwan Kang, Kyomin Jung

from arXiv, 5 pages, 1 figure

Large language models (LLMs) executing tasks through instruction-based prompts often face challenges stemming from distribution differences between user instructions and training instructions. This leads to distractions and biases, especially when dealing with inconsistent dynamic labels. In this paper, we introduce a novel bias mitigation method, CRISPR, designed to alleviate instruction-label biases in LLMs. CRISPR uses attribution methods to identify bias neurons that influence biased outputs and employs pruning to eliminate them. Experimental results demonstrate the method's effectiveness in mitigating biases in instruction-based prompting, enhancing language model performance on social bias benchmarks without compromising pre-existing knowledge. CRISPR proves highly practical and model-agnostic, offering flexibility in adapting to evolving social biases.
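
The two CRISPR steps named above are attributing biased outputs to individual neurons and then pruning those neurons. Both can be illustrated on a toy one-hidden-layer network. This is a sketch under stated assumptions. It uses activation-times-gradient as the attribution score and zero-masking as pruning, a common choice that stands in for whichever attribution method the paper actually uses. The network, the "biased" label index, and `k` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, n_labels = 8, 16, 3
W1 = rng.normal(size=(d_hidden, d_in))      # input -> hidden
W2 = rng.normal(size=(n_labels, d_hidden))  # hidden -> label logits

def hidden(X, mask=None):
    """ReLU hidden activations; `mask` zeroes out pruned neurons."""
    H = np.maximum(X @ W1.T, 0.0)
    return H if mask is None else H * mask

def attribution(X, biased_label):
    """Mean activation x gradient of the biased logit w.r.t. each hidden neuron.

    For logit = W2[biased_label] @ h, the gradient w.r.t. h is W2[biased_label].
    """
    H = hidden(X)
    return (H * W2[biased_label]).mean(axis=0)

def prune_top_k(scores, k):
    """Mask the (at most k) neurons with the largest positive attribution."""
    mask = np.ones_like(scores)
    top = np.argsort(scores)[-k:]
    mask[top[scores[top] > 0]] = 0.0
    return mask

X = rng.normal(size=(32, d_in))  # toy stand-in for bias-inducing prompts
biased_label = 1
scores = attribution(X, biased_label)
mask = prune_top_k(scores, k=2)

before = (hidden(X) @ W2.T)[:, biased_label].mean()
after = (hidden(X, mask) @ W2.T)[:, biased_label].mean()
print(after <= before)  # True
```

Only neurons with positive scores are pruned. In this linear read-out, the mean biased logit equals the sum of the scores, so removing them can only lower that logit.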

Regularization term · Maximal · Path · CASE · SICOMP

November 15, 2023

Semantic Tree-Width and Path-Width of Conjunctive Regular Path Queries

Diego Figueira, Rémi Morvan

from arXiv, journal version (v3), submitted to an LMCS special issue, of the ICDT'23 paper "Approximation and Semantic Tree-width of Conjunctive Regular Path Queries" (v2). 55 pages and 17 figures

We show that the problem of whether a query is equivalent to a query of tree-width $k$ is decidable for the class of Unions of Conjunctive Regular Path Queries with two-way navigation (UC2RPQs). A previous result by Barceló, Romero, and Vardi [SIAM Journal on Computing, 2016] showed decidability for the case $k=1$; here we extend this result, showing that decidability in fact holds for any arbitrary $k\geq 1$. The algorithm is in 2ExpSpace, but for the restricted but practically relevant case where all regular expressions of the query are of the form $a^*$ or $(a_1 + \dotsb + a_n)$, we show that the complexity of the problem drops to $\Pi^P_2$. We also investigate the related problem of approximating a UC2RPQ by queries of small tree-width. We exhibit an algorithm which, for any fixed number $k$, builds the maximal under-approximation of tree-width $k$ of a UC2RPQ. The maximal under-approximation of tree-width $k$ of a query $q$ is a query $q'$ of tree-width $k$ which is contained in $q$ in a maximal and unique way, that is, such that for every query $q''$ of tree-width $k$, if $q''$ is contained in $q$ then $q''$ is also contained in $q'$. Our approach is shown to be robust in three senses: it also allows testing equivalence with queries of a given path-width, it covers the previously known result for $k=1$, and it allows testing whether a (one-way) UCRPQ is equivalent to a UCRPQ of a given tree-width (or path-width).
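
The restricted query class above can be made concrete through query semantics. A UC2RPQ is a union of conjunctive queries whose atoms have the form x -[L]-> y. Such an atom asks for a path from x to y whose label word lies in the regular language L. Two-way navigation lets a path traverse an a-edge backwards, written here as "a-". The sketch below evaluates only the two atom shapes singled out above, a* and (a_1 + ... + a_n), on a small edge-labelled graph. It then joins two atoms into a conjunctive query. It illustrates query semantics only, not the paper's tree-width decision procedure, and the graph, labels, and function names are hypothetical.

```python
from collections import deque
from itertools import product

# Edge-labelled graph as (source, label, target) triples; hypothetical data.
EDGES = {("u", "a", "v"), ("v", "a", "w"), ("w", "b", "u"), ("x", "c", "w")}
NODES = {n for (s, _, t) in EDGES for n in (s, t)}

def step(node, label):
    """Nodes reachable in one step; a trailing '-' means traverse the edge backwards."""
    if label.endswith("-"):
        base = label[:-1]
        return {s for (s, l, t) in EDGES if l == base and t == node}
    return {t for (s, l, t) in EDGES if l == label and s == node}

def eval_star(label, nodes):
    """Pairs (x, y) such that y is reachable from x by zero or more `label` steps."""
    pairs = set()
    for x in nodes:
        seen, queue = {x}, deque([x])
        while queue:
            for y in step(queue.popleft(), label):
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        pairs |= {(x, y) for y in seen}
    return pairs

def eval_union(labels, nodes):
    """Pairs (x, y) joined by a single edge whose label is one of `labels`."""
    return {(x, y) for x in nodes for l in labels for y in step(x, l)}

# Conjunctive query: q(x, z) <- x -[a*]-> y AND y -[(b + c-)]-> z
atom1 = eval_star("a", NODES)
atom2 = eval_union(["b", "c-"], NODES)
answers = {(x, z) for (x, y1), (y2, z) in product(atom1, atom2) if y1 == y2}
print(sorted(answers))
# [('u', 'u'), ('u', 'x'), ('v', 'u'), ('v', 'x'), ('w', 'u'), ('w', 'x')]
```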

Graph · CASE · Edge · STOC · CASES

November 15, 2023

Counting Small Induced Subgraphs with Edge-monotone Properties

Simon Döring, Dániel Marx, Philip Wellnitz

We study the parameterized complexity of #IndSub($\Phi$), where given a graph $G$ and an integer $k$, the task is to count the number of induced subgraphs on $k$ vertices that satisfy the graph property $\Phi$. Focke and Roth [STOC 2022] completely characterized the complexity for each $\Phi$ that is a hereditary property (that is, closed under vertex deletions): #IndSub($\Phi$) is #W[1]-hard except in the degenerate cases when every graph satisfies $\Phi$ or only finitely many graphs satisfy $\Phi$. We complement this result with a classification for each $\Phi$ that is edge monotone (that is, closed under edge deletions): #IndSub($\Phi$) is #W[1]-hard except in the degenerate case when there are only finitely many integers $k$ such that $\Phi$ is nontrivial on $k$-vertex graphs. Our result generalizes earlier results for specific properties $\Phi$ that are related to the connectivity or density of the graph. Further, we extend the #W[1]-hardness result by a lower bound which shows that #IndSub($\Phi$) cannot be solved in time $f(k) \cdot |V(G)|^{o(\sqrt{\log k/\log\log k})}$ for any function $f$, unless the Exponential-Time Hypothesis (ETH) fails. For many natural properties, we even obtain a tight bound $f(k) \cdot |V(G)|^{o(k)}$; for example, this is the case for every property $\Phi$ that is nontrivial on $k$-vertex graphs for each $k$ greater than some $k_0$.
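
As a baseline for reference, #IndSub($\Phi$) has a direct brute-force algorithm. It inspects all C(n, k) vertex subsets, which takes roughly |V(G)|^k time. The lower bounds above show that, for non-degenerate edge-monotone $\Phi$, no algorithm runs in $f(k) \cdot |V(G)|^{o(\sqrt{\log k/\log\log k})}$ time unless ETH fails. The sketch assumes a graph given as a vertex list and an edge list. It uses triangle-freeness as an example of an edge-monotone property, since deleting edges cannot create a triangle. The function names and the example graph are illustrative.

```python
from itertools import combinations

def count_induced(vertices, edges, k, phi):
    """Count k-vertex subsets S such that the induced subgraph G[S] satisfies phi.

    Brute force over all C(n, k) subsets, so roughly n^k work.
    """
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for subset in combinations(vertices, k):
        induced = [e for e in combinations(subset, 2) if frozenset(e) in edge_set]
        if phi(subset, induced):
            count += 1
    return count

def triangle_free(subset, induced_edges):
    """Edge-monotone property: removing edges can never create a triangle."""
    adj = {v: set() for v in subset}
    for u, v in induced_edges:
        adj[u].add(v)
        adj[v].add(u)
    # An edge (u, v) with a common neighbour closes a triangle.
    return not any(adj[u] & adj[v] for u, v in induced_edges)

# 4-cycle with one chord: edges 0-1, 1-2, 2-3, 3-0 and chord 0-2.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(count_induced(V, E, 3, triangle_free))  # 2
```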

Correlation coefficient · Inference · Recommender systems · Directed · Taxonomy

August 26, 2022

Causal Inference in Recommender Systems: A Survey and Future Directions

Chen Gao, Yu Zheng, Wenjie Wang, Fuli Feng, Xiangnan He, Yong Li

from arXiv, under peer review

Existing recommender systems extract user preferences by learning correlations in data, such as behavioral correlation in collaborative filtering and feature-feature or feature-behavior correlation in click-through rate prediction. However, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, a recommender system may recommend a battery charger to a user who has bought a phone. Here the phone purchase can serve as the cause of the charger purchase, and this causal relation cannot be reversed. To address this, researchers in recommender systems have recently begun to use causal inference to extract causality and thereby enhance recommender systems. In this survey, we comprehensively review the literature on causal inference-based recommendation. First, we present the fundamental concepts of both recommendation and causal inference as the basis for later content. We then raise the typical issues faced by non-causal recommendation. Afterward, we comprehensively review existing work on causal inference-based recommendation, based on a taxonomy of the kinds of problems causal inference addresses. Finally, we discuss open problems in this important research area, along with promising directions for future work.

Learned · Deep learning · Identifiable · MoDELS · Object tracking

July 31, 2019

Deep Learning in Video Multi-Object Tracking: A Survey

Gioele Ciaparrone, Francisco Luque Sánchez, Siham Tabik, Luigi Troiano, Roberto Tagliaferri, Francisco Herrera

from arXiv, new in v2: corrected typos and various minor mistakes. Submitted to Neurocomputing. Main text: 25 pages, 5 figures, 6 tables. Summary table in appendix at the end of the paper

The problem of Multiple Object Tracking (MOT) consists of following the trajectories of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey of works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.

Discriminator · Performer · Dimensionality reduction · Convolutional neural network · Multi-task learning

January 25, 2018

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Yuan Gao, Qi She, Jiayi Ma, Mingbo Zhao, Wei Liu, Alan L. Yuille

from arXiv, 11 pages, 5 figures, 7 tables

State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split at a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may harm the final performance of one or more tasks. In this paper, we propose a novel CNN structure for MTL that enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks along their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform a detailed ablation analysis of different configurations for training the network. Experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
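
The fusing step described above has three parts. Per-task feature maps are concatenated along the channel axis, then passed through a 1x1 convolution followed by batch normalization, and this can be sketched in NumPy. A 1x1 convolution is a per-pixel linear map over channels, which the sketch implements with `einsum`. Several details are assumptions for illustration: the shapes, the random weights, the use of one such layer per task (only task A is shown), and the training-mode batch norm. Weight decay, which acts on the 1x1 weights during training, is omitted.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x is (N, C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,nchw->nohw", weight, x)

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over the N, H, W axes for each channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

def nddr_layer(feat_a, feat_b, weight, gamma, beta):
    """Fuse two tasks' features: concat on channels -> 1x1 conv -> BN."""
    fused_in = np.concatenate([feat_a, feat_b], axis=1)  # (N, 2C, H, W)
    return batch_norm(conv1x1(fused_in, weight), gamma, beta)

rng = np.random.default_rng(0)
N, C, H, W = 4, 8, 5, 5
feat_a = rng.normal(size=(N, C, H, W))  # task A features at this layer
feat_b = rng.normal(size=(N, C, H, W))  # task B features at this layer
# Task A's fusing layer reduces the 2C concatenated channels back to C.
w_a = rng.normal(size=(C, 2 * C)) * 0.1
out_a = nddr_layer(feat_a, feat_b, w_a, np.ones(C), np.zeros(C))
print(out_a.shape)  # (4, 8, 5, 5)
```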