Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam establish convergence rates that can also be achieved by non-adaptive algorithms such as SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates, in which the updates of an optimizer are chosen by an online learner. With this framework, designing a good optimizer reduces to designing a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of Adam's algorithmic components from the online learning perspective.
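One way to see the Adam/FTRL correspondence is sketched below under simplifying assumptions (scalar parameter, ε = 0, no bias correction). It is an illustration, not the paper's exact construction, and the function names are hypothetical. Under these assumptions, Adam's step -α·m_t/√v_t equals the FTRL step argmin_Δ [G_t·Δ + Δ²/(2η_t)] = -η_t·G_t, where G_t = Σ_{s≤t} β1^(t-s)·g_s is a discounted sum of past linear losses and η_t = α(1-β1)/√v_t acts as an adaptive regularization strength.

```python
import math


def adam_updates(grads, lr=0.1, beta1=0.9, beta2=0.999):
    """Recursive Adam updates for a scalar parameter (assumes eps = 0, no bias correction)."""
    m, v, updates = 0.0, 0.0, []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g      # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
        updates.append(-lr * m / math.sqrt(v))
    return updates


def ftrl_updates(grads, lr=0.1, beta1=0.9, beta2=0.999):
    """Closed-form FTRL step: argmin_d [G_t * d + d**2 / (2 * eta_t)] = -eta_t * G_t."""
    updates = []
    for t in range(len(grads)):
        # Discounted sum of past linear losses (the "leader" term).
        G = sum(beta1 ** (t - s) * grads[s] for s in range(t + 1))
        # Same second-moment statistic as Adam, written as an explicit discounted sum.
        v = (1 - beta2) * sum(beta2 ** (t - s) * grads[s] ** 2 for s in range(t + 1))
        eta = lr * (1 - beta1) / math.sqrt(v)  # adaptive regularization strength
        updates.append(-eta * G)
    return updates


grads = [1.0, -0.5, 2.0, 0.3, -1.2]
for a, f in zip(adam_updates(grads), ftrl_updates(grads)):
    print(f"{a:+.6f} {f:+.6f}")  # the two columns agree
```

Adding bias correction or ε > 0 only rescales η_t, so the discounted-sum form of the step is unchanged.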

Related content

Adam

Attention · Transformation · MoDELS · Layer · Sparse

March 15, 2024

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du

from arXiv; ICLR'24 camera ready. Improved Theorems 3 and 4; polished writing and added code link.

We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework for understanding the training procedure of multilayer Transformer architectures. This is achieved by integrating out the self-attention layer in Transformers, which yields modified dynamics involving only the MLP layers. JoMA removes unrealistic assumptions made in previous analyses (e.g., the lack of residual connections) and predicts that, in the presence of nonlinear activations, attention first becomes sparse (to learn salient tokens) and then dense (to learn less salient tokens); in the linear case, it is consistent with existing works showing that attention becomes sparse over time. We leverage JoMA to qualitatively explain how tokens are combined to form hierarchies in multilayer Transformers when the input tokens are generated by a latent hierarchical generative model. Experiments on models trained on real-world datasets (Wikitext2/Wikitext103) and on various pre-trained models (OPT, Pythia) verify our theoretical findings. Code can be found at https://github.com/facebookresearch/luckmatters/tree/yuandong3.

Estimation/Estimator · SimPLe · Example · Functional · Transformation

March 14, 2024

Enhanced One-Step Neville Algorithm with Access to the Convergence Rate

U. D. Jentschura, L. T. Giorgini

from arXiv, 13 pages; RevTeX

The recursive Neville algorithm allows one to calculate interpolating functions recursively. Upon a judicious choice of the abscissas used for the interpolation (and extrapolation), this algorithm leads to a method for convergence acceleration. For example, one can use the Neville algorithm in order to successively eliminate inverse powers of the upper limit of the summation from the partial sums of a given, slowly convergent input series. Here, we show that, for a particular choice of the abscissas used for the extrapolation, one can replace the recursive Neville scheme by a simple one-step transformation, while also obtaining access to subleading terms for the transformed series after convergence acceleration. The matrix-based, unified formulas allow one to estimate the rate of convergence of the partial sums of the input series to their limit. In particular, Bethe logarithms for hydrogen are calculated to 100 decimal digits.
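The sketch below shows the classical recursive Neville scheme used for convergence acceleration, not the authors' one-step transformation, which is not reproduced here. It assumes abscissas x_n = 1/n and extrapolates partial sums of the illustrative test series Σ 1/k² (limit π²/6) to x = 0; the test series and the function name are assumptions chosen for the example.

```python
import math


def neville_at_zero(xs, ys):
    """Value at x = 0 of the polynomial interpolating (xs[i], ys[i]), via Neville's recursion."""
    p = list(ys)  # p[i] starts as the degree-0 interpolant P_{i,i}(0) = ys[i]
    n = len(xs)
    for k in range(1, n):
        for i in range(n - k):
            j = i + k
            # P_{i,j}(0) = ((0 - x_j) P_{i,j-1}(0) - (0 - x_i) P_{i+1,j}(0)) / (x_i - x_j)
            p[i] = (-xs[j] * p[i] + xs[i] * p[i + 1]) / (xs[i] - xs[j])
    return p[0]


# Slowly convergent test series: S_n = sum_{k=1}^n 1/k^2, whose error decays like 1/n.
ns = range(10, 16)
xs = [1.0 / n for n in ns]  # abscissas: inverse powers of the upper summation limit
ys = [sum(1.0 / k**2 for k in range(1, n + 1)) for n in ns]

exact = math.pi**2 / 6
print(abs(ys[-1] - exact))                   # error of the last raw partial sum
print(abs(neville_at_zero(xs, ys) - exact))  # error after extrapolation to x = 0
```

Each additional Neville level eliminates one more inverse power of n from the partial sums, which is the mechanism the abstract describes.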

Model Evaluation · MoDELS · AI · Chatbot · Identifiable

March 14, 2024

AI on AI: Exploring the Utility of GPT as an Expert Annotator of AI Publications

Autumn Toney-Wails, Christian Schoeberl, James Dunham

Identifying scientific publications within a dynamic field of research often requires costly annotation by subject-matter experts. Resources such as widely accepted classification criteria or field taxonomies are unavailable for a domain like artificial intelligence (AI), which spans emerging topics and technologies. We address these challenges by inferring a functional definition of AI research from existing expert labels and then evaluating state-of-the-art chatbot models on the task of expert data annotation. Using the arXiv publication database as ground truth, we experiment with prompt engineering for GPT chatbot models to identify an alternative, automated expert annotation pipeline that assigns AI labels with 94% accuracy. For comparison, we fine-tune SPECTER, a transformer language model pre-trained on scientific publications, which achieves 96% accuracy (only 2 percentage points higher than GPT) on classifying AI publications. Our results indicate that, with effective prompt engineering, chatbots can be used as reliable data annotators even where subject-area expertise is required. To evaluate the utility of chatbot-annotated datasets for downstream classification tasks, we train a new classifier on GPT-labeled data and compare its performance to that of the arXiv-trained model. The classifier trained on GPT-labeled data outperforms the arXiv-trained model by nine percentage points, achieving 82% accuracy.

Greedy · Design · Category · SOSP · binary

March 13, 2024

An Algorithmic Theory of Simplicity in Mechanism Design

Diodato Ferraioli, Carmine Ventre

A growing body of work in economics and computation focuses on the trade-off between implementability and simplicity in mechanism design. The goal is to develop a theory that not only allows one to design an incentive structure that is easy to grasp for imperfectly rational agents, but also to understand the ensuing limitations on the class of mechanisms that enforce it. In this context, the concept of obviously strategy-proof (OSP) mechanisms has assumed a prominent role, since such mechanisms provably account for the absence of contingent reasoning skills, a specific cognitive limitation. For single-dimensional agents, it is known that OSP mechanisms need to use certain greedy algorithms. In this work, we introduce a notion, interpolating between OSP and the more stringent strongly obviously strategy-proof (SOSP) notion, under which agents plan only a subset of their own future moves. We provide an algorithmic characterization of this novel class of mechanisms for single-dimensional domains and binary allocation problems that precisely measures the interplay between simplicity and implementability. Building on this, we show that mechanisms based on reverse greedy algorithms (a.k.a. deferred acceptance auctions) are algorithmically more robust to imperfect rationality than those adopting greedy algorithms.
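To make the two algorithm families concrete, the sketch below contrasts a greedy allocation with a reverse greedy one (the rejection-based structure underlying deferred acceptance auctions) on a hypothetical knapsack-style feasibility constraint. Ranking bidders directly by bid, the capacity constraint, the data, and the function names are illustrative assumptions; the sketch omits payments and does not model the OSP/SOSP analysis itself.

```python
def forward_greedy(bids, weights, capacity):
    """Greedy: accept bidders in decreasing bid order while the accepted set stays feasible."""
    accepted, load = [], 0
    for name in sorted(bids, key=bids.get, reverse=True):
        if load + weights[name] <= capacity:
            accepted.append(name)
            load += weights[name]
    return sorted(accepted)


def reverse_greedy(bids, weights, capacity):
    """Reverse greedy: repeatedly reject the lowest remaining bidder until the set is feasible."""
    active = set(bids)
    while sum(weights[name] for name in active) > capacity:
        active.remove(min(active, key=bids.get))
    return sorted(active)


# Hypothetical single-dimensional bidders with binary (win/lose) outcomes.
bids = {"A": 10, "B": 9, "C": 1}
weights = {"A": 6, "B": 5, "C": 4}
print(forward_greedy(bids, weights, capacity=10))  # ['A', 'C']
print(reverse_greedy(bids, weights, capacity=10))  # ['A']
```

Greedy commits to high bidders first and then fills leftover capacity, while reverse greedy only ever rejects the currently weakest bidder, so the two can return different allocations on the same input.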

Mutually Independent · MoDELS · Annotation · Criterion · Learning

March 12, 2024

Towards Independence Criterion in Machine Unlearning of Features and Labels

Ling Han, Nanqing Luo, Hao Huang, Jing Chen, Mary-Anne Hartley

from arXiv, 10 pages, 1 figure

This work delves into the complexities of machine unlearning in the face of distributional shifts, particularly focusing on the challenges posed by non-uniform feature and label removal. With the advent of regulations like the GDPR emphasizing data privacy and the right to be forgotten, machine learning models face the daunting task of unlearning sensitive information without compromising their integrity or performance. Our research introduces a novel approach that leverages influence functions and principles of distributional independence to address these challenges. By proposing a comprehensive framework for machine unlearning, we aim to ensure privacy protection while maintaining model performance and adaptability across varying distributions. Our method not only facilitates efficient data removal but also dynamically adjusts the model to preserve its generalization capabilities. Through extensive experimentation, we demonstrate the efficacy of our approach in scenarios characterized by significant distributional shifts, making substantial contributions to the field of machine unlearning. This research paves the way for developing more resilient and adaptable unlearning techniques, ensuring models remain robust and accurate in the dynamic landscape of data privacy and machine learning.
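Influence functions, one ingredient named above, estimate how a model's parameters change when training data are removed. The minimal sketch below uses a hypothetical one-dimensional ridge regression and removes a single sample with one Newton step that uses the Hessian of the remaining data; because the loss is quadratic, this step reproduces retraining exactly. Classical influence-function approximations use the full-data Hessian instead. The data and function names are illustrative, and the sketch does not implement the paper's distributional-independence criterion or feature/label removal.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimizer of sum_i (theta*x_i - y_i)**2 / 2 + lam * theta**2 / 2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def unlearn_point(theta, xs, ys, lam, j):
    """Remove sample j via one Newton step from theta on the remaining objective."""
    grad_j = (theta * xs[j] - ys[j]) * xs[j]  # gradient of sample j's loss at theta
    # Hessian of the objective without sample j (classical influence functions use the full one).
    hess_rest = sum(x * x for x in xs) - xs[j] ** 2 + lam
    return theta + grad_j / hess_rest


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
lam = 0.5
theta = fit_ridge_1d(xs, ys, lam)
approx = unlearn_point(theta, xs, ys, lam, j=2)
retrained = fit_ridge_1d(xs[:2] + xs[3:], ys[:2] + ys[3:], lam)
print(approx, retrained)  # equal up to floating-point rounding
```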

INTERACT · Controller · PDE · CASES · Lyapunov

March 12, 2024

The Exponential Stabilization of a Heat and Piezoelectric Beam Interaction with Static or Hybrid Feedback Controllers

Ahmet Ozkan Ozer, Ibrahim Khalilullah, Uthman Rasaq

from arXiv, 1 figure

This study investigates a strongly coupled system of partial differential equations (PDEs) governing heat transfer in a copper rod, longitudinal vibrations, and total charge accumulation at the electrodes of a magnetizable piezoelectric beam. Conducted within the transmission line framework, the analysis reveals profound interactions between traveling electromagnetic and mechanical waves in magnetizable piezoelectric beams, despite the disparity in their velocities. The findings suggest that, in the open-loop case, the coupled heat and beam dynamics lack exponential stability when only thermal effects are considered. To address this, two types of boundary state feedback controllers are proposed: (i) purely static feedback controllers and (ii) a hybrid approach in which the electrical controller dynamically augments the system dynamics. In both cases, exponential stability of the PDE solutions is established through carefully constructed Lyapunov functions with diverse multipliers. The proposed proof technique provides a robust foundation for proving the exponential stability of finite-difference-based model reductions as the discretization parameter approaches zero.

Processing (programming language) · Inference · NLP · Computational Linguistics · Estimation/Estimator

September 2, 2021

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Amir Feder, Katherine A. Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E. Roberts, Brandon M. Stewart, Victor Veitch, Diyi Yang

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.

Generalization Theory · INFORMS · Estimation/Estimator · Mutual Information · Generalization Error

June 18, 2021

A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization

Xinjie Lan, Kenneth Barner

from arXiv, to appear in the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI

Recently, mutual information (MI) has attracted attention as a tool for bounding the generalization error of deep neural networks (DNNs). However, accurately estimating MI in DNNs is intractable, so most previous works have had to relax the MI bound, which in turn weakens the information-theoretic explanation for generalization. To address this limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating MI. Leveraging the proposed MI estimator, we validate the information-theoretic explanation for generalization and derive a tighter generalization bound than the state-of-the-art relaxations.
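For reference, the quantity being estimated is I(X;Y) = Σ p(x,y)·log[p(x,y)/(p(x)p(y))]. The sketch below is the textbook plug-in estimator for two discrete variables, computed from empirical frequencies; it is not the paper's probabilistic representation of DNNs, and the function name is hypothetical. Plug-in estimates like this become unreliable for high-dimensional continuous representations, which is part of why MI is hard to estimate in DNNs.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))) with empirical frequencies; counts simplify to c*n/(n_x*n_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


xs = [0, 0, 1, 1]
print(plugin_mutual_information(xs, xs))            # ln 2: Y is a copy of X
print(plugin_mutual_information(xs, [0, 1, 0, 1]))  # 0.0: independent in this sample
```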

Skip Connections · Neural Networks · Optimizer · Linear · Graph

May 10, 2021

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi

Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that, despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect GNN training speed. Our results show that GNN training is implicitly accelerated by skip connections, greater depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Learned · Big Data · Same · Artificial Intelligence · Statistical Methods

May 5, 2020

A Survey of Learning Causality with Data: Problems and Methods

Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, Huan Liu

from arXiv, 35 pages, accepted by ACM CSUR

This work considers how convenient access to copious data affects our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from, or the same as, learning it in the traditional setting? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causality and relations, along with the connections between causality and machine learning. This work points out, on a case-by-case basis, how big data facilitates, complicates, or motivates each approach.