Yujia Qin,Shengding Hu,Yankai Lin,Weize Chen,Ning Ding,Ganqu Cui,Zheni Zeng,Yufei Huang,Chaojun Xiao,Chi Han,Yi Ren Fung,Yusheng Su,Huadong Wang,Cheng Qian,Runchu Tian,Kunlun Zhu,Shihao Liang,Xingyu Shen,Bokai Xu,Zhen Zhang,Yining Ye,Bowen Li,Ziwei Tang,Jing Yi,Yuzhang Zhu,Zhenning Dai,Lan Yan,Xin Cong,Yaxi Lu,Weilin Zhao,Yuxiang Huang,Junxi Yan,Xu Han,Xian Sun,Dahai Li,Jason Phang,Cheng Yang,Tongshuang Wu,Heng Ji,Zhiyuan Liu,Maosong Sun

Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be as adept at tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still no comprehensive understanding of the key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each subtask by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.

Related content

TOOLS

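This new edition of the TOOLS conference series revives the tradition of the 50 conferences held from 1989 to 2012. TOOLS originally stood for "Technology of Object-Oriented Languages and Systems" and later grew to cover all innovative aspects of software technology. Many of today's most important software concepts were first introduced there. TOOLS 50+1, held near Kazan, Russia, in 2019, continued the series with the same spirit of innovation, enthusiasm for all things software, combination of scientific soundness and industrial applicability, and openness to all trends and communities in the field. Official website: · Commutative · Omega · Same · Paper ·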

August 7, 2023

Strong Byzantine Agreement with Adaptive Word Complexity

Pierre Civit,Seth Gilbert,Rachid Guerraoui,Jovan Komatovic,Manuel Vidigueira

The strong Byzantine agreement (SBA) problem is defined among n processes, out of which t < n can be faulty and behave arbitrarily. SBA allows correct (non-faulty) processes to agree on a common value. Moreover, if all correct processes have proposed the same value, only that value can be agreed upon. It has been known for a long time that any solution to the SBA problem incurs quadratic worst-case word complexity; additionally, the bound was known to be tight. However, no existing protocol achieves adaptive word complexity, where the number of exchanged words depends on the actual number of faults, and not on the upper bound. Therefore, it is still unknown whether SBA with adaptive word complexity exists. This paper answers the question in the affirmative. Namely, we introduce STRONG, a synchronous protocol that solves SBA among n = (2 + Omega(1))t + 1 processes and achieves adaptive word complexity. We show that the fundamental challenge of adaptive SBA lies in efficiently solving certification, the problem of obtaining a constant-sized, locally-verifiable proof that a value can safely be decided.
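The two safety properties stated in the abstract (agreement and strong validity) can be sketched as a checker over one finished execution. This is only an illustration of the problem definition, not of the STRONG protocol; the function name `check_sba` and the dictionary layout are hypothetical, and termination is not modeled.

```python
from typing import Dict, Hashable


def check_sba(proposals: Dict[int, Hashable],
              decisions: Dict[int, Hashable]) -> bool:
    """Check agreement and strong validity for one execution.

    Both dictionaries are keyed by the ids of CORRECT processes only
    (an assumption of this sketch); faulty processes are ignored.
    """
    decided = set(decisions.values())
    # Agreement: all correct processes decide one common value.
    if len(decided) != 1:
        return False
    proposed = set(proposals.values())
    # Strong validity: if every correct process proposed the same value,
    # only that value may be decided.
    if len(proposed) == 1 and decided != proposed:
        return False
    return True
```

For example, `check_sba({1: "a", 2: "a"}, {1: "b", 2: "b"})` is rejected because the unanimous proposal `"a"` was not the decided value, whereas differing proposals place no constraint on which common value is decided.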

Functional · Generalization theory · Language modeling · MoDELS · Scaling ·

August 7, 2023

Studying Large Language Model Generalization with Influence Functions

Roger Grosse,Juhan Bae,Cem Anil,Nelson Elhage,Alex Tamkin,Amirhossein Tajdini,Benoit Steiner,Dustin Li,Esin Durmus,Ethan Perez,Evan Hubinger,Kamilė Lukošiūtė,Karina Nguyen,Nicholas Joseph,Sam McCandlish,Jared Kaplan,Samuel R. Bowman

from arxiv, 119 pages, 47 figures, 22 tables

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.

Neural Networks · Networking · Networks · Optimizer · Reducible ·

August 6, 2023

Local Randomized Neural Networks Methods for Interface Problems

Yunlong Li,Fei Wang

from arxiv, 22 pages, 15 figures

Accurate modeling of complex physical problems, such as fluid-structure interaction, requires multiphysics coupling across the interface, which often has intricate geometry and dynamic boundaries. Conventional numerical methods face challenges in handling interface conditions. Deep neural networks offer a mesh-free and flexible alternative, but they suffer from drawbacks such as time-consuming optimization and local optima. In this paper, we propose a mesh-free approach based on Randomized Neural Networks (RNNs), which avoid optimization solvers during training, making them more efficient than traditional deep neural networks. Our approach, called Local Randomized Neural Networks (LRNNs), uses different RNNs to approximate solutions in different subdomains. We discretize the interface problem into a linear system at randomly sampled points across the domain, boundary, and interface using a finite difference scheme, and then solve it by a least-squares method. For time-dependent interface problems, we use a space-time approach based on LRNNs. We show the effectiveness and robustness of the LRNN methods through numerical examples of elliptic and parabolic interface problems. We also demonstrate that our approach can handle high-dimensional interface problems. Compared to conventional numerical methods, our approach achieves higher accuracy with fewer degrees of freedom, eliminates the need for complex interface meshing and fitting, and significantly reduces training time, outperforming deep neural networks.

Online · Oracle · Buffer (company) · Scenario · Extensibility ·

August 5, 2023

Online Algorithms with Randomly Infused Advice

Yuval Emek,Yuval Gil,Maciej Pacut,Stefan Schmid

from arxiv, Appeared at ESA 2023

We introduce a novel method for the rigorous quantitative evaluation of online algorithms that relaxes the "radical worst-case" perspective of classic competitive analysis. In contrast to prior work, our method, referred to as randomly infused advice (RIA), does not make any probabilistic assumptions about the input sequence and does not rely on the development of designated online algorithms. Rather, it can be applied to existing online randomized algorithms, introducing a means to evaluate their performance in scenarios that lie outside the radical worst-case regime. More concretely, an online algorithm ALG with RIA benefits from pieces of advice generated by an omniscient but not entirely reliable oracle. The crux of the new method is that the advice is provided to ALG by writing it into the buffer B from which ALG normally reads its random bits, hence allowing us to augment it through a very simple and non-intrusive interface. The (un)reliability of the oracle is captured via a parameter $0 \le \alpha \le 1$ that determines the probability (per round) that the advice is successfully infused by the oracle; if the advice is not infused, which occurs with probability $1 - \alpha$, then the buffer B contains fresh random bits (as in the classic online setting). The applicability of the new RIA method is demonstrated by applying it to three extensively studied online problems: paging, uniform metrical task systems, and online set cover. For these problems, we establish new upper bounds on the competitive ratio of classic online algorithms that improve as the infusion parameter $\alpha$ increases. These are complemented with (often tight) lower bounds on the competitive ratio of online algorithms with RIA for the three problems.
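The per-round infusion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the buffer B is modeled as a list of bits, the advice is assumed to have the same length as the random bits it replaces, and the function name `fill_buffer` is hypothetical rather than taken from the paper.

```python
import random
from typing import List


def fill_buffer(advice_bits: List[int], alpha: float,
                rng: random.Random) -> List[int]:
    """Return the contents of buffer B for one round.

    With probability alpha the oracle's advice is infused into B;
    otherwise (probability 1 - alpha) B holds fresh random bits,
    as in the classic online setting.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if rng.random() < alpha:
        return list(advice_bits)  # advice successfully infused
    return [rng.getrandbits(1) for _ in advice_bits]  # fresh random bits
```

The algorithm ALG itself is unchanged in this picture: it keeps reading B exactly as before, which is what makes the interface non-intrusive. Setting `alpha=0.0` recovers the classic randomized setting, and `alpha=1.0` gives fully reliable advice.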

Transformation · Estimation/Estimator · Hausdorff distance · Manifold · Projection ·

August 4, 2023

Ridge Estimation with Nonlinear Transformations

Zheng Zhai,Hengchao Chen,Zhigang Yao

Ridge estimation is an important manifold learning technique. The goal of this paper is to examine the effects of nonlinear transformations on the ridge sets. The main result proves the inclusion relationship between ridges: $\mathcal{R}(f\circ p)\subseteq \mathcal{R}(p)$, provided that the transformation $f$ is strictly increasing and concave on the range of the function $p$. Additionally, given an underlying true manifold $\mathcal{M}$, we show that the Hausdorff distance between $\mathcal{R}(f\circ p)$ and its projection onto $\mathcal{M}$ is smaller than the Hausdorff distance between $\mathcal{R}(p)$ and the corresponding projection. This motivates us to apply an increasing and concave transformation before the ridge estimation. Specifically, we show that the power transformations $f^{q}(y)=y^q/q,-\infty<q\leq 1$ are increasing and concave on $\mathbb{R}_+$, and thus we can use such power transformations when $p$ is strictly positive. Numerical experiments demonstrate the advantages of the proposed methods.
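The claim about the power transformations can be checked numerically. For $q \neq 0$, the derivatives of $y^q/q$ are $y^{q-1} > 0$ and $(q-1)y^{q-2} \le 0$ when $q \le 1$ and $y > 0$. The sketch below verifies this on a grid with finite differences; it is an illustration, not a proof, and it excludes $q = 0$ (where $y^q/q$ is undefined) as an assumption of the sketch. The function names are hypothetical.

```python
from typing import List


def power_transform(y: float, q: float) -> float:
    """Power transformation y**q / q for y > 0 and q <= 1, q != 0."""
    if y <= 0:
        raise ValueError("y must be strictly positive")
    if q == 0 or q > 1:
        raise ValueError("this sketch assumes q <= 1 and q != 0")
    return y ** q / q


def is_increasing_and_concave(q: float, ys: List[float],
                              tol: float = 1e-9) -> bool:
    """Finite-difference check on an increasing grid of positive points."""
    vals = [power_transform(y, q) for y in ys]
    slopes = [(vals[i + 1] - vals[i]) / (ys[i + 1] - ys[i])
              for i in range(len(ys) - 1)]
    increasing = all(s > 0 for s in slopes)
    # Concavity: successive slopes never increase (up to rounding error).
    concave = all(slopes[i + 1] <= slopes[i] + tol
                  for i in range(len(slopes) - 1))
    return increasing and concave


grid = [0.5 + 0.1 * i for i in range(50)]
```

Calling `is_increasing_and_concave(q, grid)` for values such as `q = 0.5`, `q = -1.0`, or the boundary case `q = 1.0` (the identity map) returns `True` on this grid.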

Functional · surge · Performer · Statistic · Precision/Accuracy ·

August 3, 2023

Functional Data Regression Reconciles with Excess Bases

Tomoya Wakayama,Hidetoshi Matsui

As the development of measuring instruments and computers has accelerated the collection of massive data, functional data analysis (FDA) has gained a surge of attention. FDA is a methodology that treats longitudinal data as a function and performs inference, including regression. Functionalizing data typically involves fitting it with basis functions. However, the number of basis functions is commonly chosen to be smaller than the sample size. This paper casts doubt on this convention. Recent statistical theory has witnessed a phenomenon (the so-called double descent) in which excess parameters overcome overfitting and lead to precise interpolation. If we transfer this idea to the choice of the number of bases for functional data, providing an excess number of bases can lead to accurate predictions. We have explored this phenomenon in a functional regression problem and examined its validity through numerical experiments. In addition, through application to real-world datasets, we demonstrated that double descent goes beyond theory and numerical experiments: it is also important for practical use.

contrastive · Performer · Transformation · Information extraction · INFORMS ·

February 4, 2021

Contrastive Triple Extraction with Generative Transformer

Hongbin Ye,Ningyu Zhang,Shumin Deng,Mosha Chen,Chuanqi Tan,Fei Huang,Huajun Chen

from arxiv, Accepted by AAAI 2021

Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention-masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than the baselines.

entity · Graph · Knowledge graph · MoDELS · Similarity ·

September 11, 2019

Domain Representation for Knowledge Graph Embedding

Cunxiang Wang,Feiliang Ren,Zhichao Lin,Chenxv Zhao,Tian Xie,Yue Zhang

from arxiv, Accepted by NLPCC 2019

Embedding entities and relations into a continuous multi-dimensional vector space has become the dominant method for knowledge graph embedding in representation learning. However, most existing models fail to represent hierarchical knowledge, such as the similarities and dissimilarities of entities in one domain. We propose to learn domain representations over existing knowledge graph embedding models, such that entities that have similar attributes are organized into the same domain. Such hierarchical knowledge of domains can give further evidence in link prediction. Experimental results show that domain embeddings give a significant improvement over the most recent state-of-the-art baseline knowledge graph embedding models.

Outlier · Anomaly detection · CIFAR-10 · Extensibility · Performance ·

December 21, 2018

Deep Anomaly Detection with Outlier Exposure

Dan Hendrycks,Mantas Mazeika,Thomas G. Dietterich

from arxiv, ICLR 2019; PyTorch code available at //github.com/hendrycks/outlier-exposure

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
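The abstract does not spell out the training objective. The sketch below assumes one common form for classifiers: the usual cross-entropy on in-distribution examples plus a weighted term that pushes predictions on auxiliary outliers toward the uniform distribution. The weight `lam`, its default value, and all function names are assumptions of this illustration, not details confirmed by the text above.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard classification loss for one in-distribution example."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits: Sequence[float]) -> float:
    """Cross-entropy from the uniform distribution to the softmax output.

    Minimized (value log K for K classes) when the prediction is uniform.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def oe_loss(in_logits: List[Sequence[float]], in_labels: List[int],
            out_logits: List[Sequence[float]], lam: float = 0.5) -> float:
    """In-distribution loss plus lam times the outlier term (assumed form)."""
    in_term = sum(cross_entropy(l, y)
                  for l, y in zip(in_logits, in_labels)) / len(in_logits)
    out_term = sum(uniform_cross_entropy(l)
                   for l in out_logits) / len(out_logits)
    return in_term + lam * out_term
```

Under this assumed objective, a confident prediction on an outlier is penalized more than a uniform one, which is what encourages the detector to assign low confidence to unseen anomalies.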

Event extraction · Learned · Inverse reinforcement learning · GAN · Estimation/Estimator ·

April 21, 2018

Event Extraction with Generative Adversarial Imitation Learning

Tongtao Zhang,Heng Ji

We propose a new method for the event extraction (EE) task based on an imitation learning framework, specifically, inverse reinforcement learning (IRL) via a generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent among complicated states in the environment. The EE task benefits from these dynamic rewards because instances and labels vary in difficulty and the gains are expected to be diverse -- e.g., an ambiguous but correctly detected trigger or argument should receive high gains -- while traditional RL models usually neglect such differences and pay equal attention to all instances. Moreover, our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.