A级日本乱理伦片免费入口-欧美丰满美乳XXⅩ高潮WWW

Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its empirical success. We focus, in particular, on understanding the role played by normalization, a key component of the SAM updates. We theoretically and empirically study the effect of normalization in SAM for both convex and non-convex functions, revealing two key roles played by normalization: i) it helps in stabilizing the algorithm; and ii) it enables the algorithm to drift along a continuum (manifold) of minima -- a property identified by recent theoretical works that is the key to better performance. We further argue that these two properties of normalization make SAM robust against the choice of hyper-parameters, supporting the practicality of SAM. Our conclusions are backed by various experiments.

相關內容

規范化的

關注 2

語言模型化 · Performer · Better · MoDELS · Continuity ·

2023 年 7 月 12 日

A Survey on Evaluation of Large Language Models

Yupeng Chang,Xu Wang,Jindong Wang,Yuan Wu,Kaijie Zhu,Hao Chen,Linyi Yang,Xiaoyuan Yi,Cunxiang Wang,Yidong Wang,Wei Ye,Yue Zhang,Yi Chang,Philip S. Yu,Qiang Yang,Xing Xie

from arxiv, 24 pages; code is at //github.com/MLGroupJLU/LLM-eval-survey

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, educations, natural and social sciences, agent applications, and other areas. Secondly, we answer the `where' and `how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: //github.com/MLGroupJLU/LLM-eval-survey.

Engineering · TOOLS · MoDELS · AI · Extensibility ·

2023 年 7 月 12 日

Navigating the Complexity of Generative AI Adoption in Software Engineering

Daniel Russo

In this paper, the adoption patterns of Generative Artificial Intelligence (AI) tools within software engineering are investigated. Influencing factors at the individual, technological, and societal levels are analyzed using a mixed-methods approach for an extensive comprehension of AI adoption. An initial structured interview was conducted with 100 software engineers, employing the Technology Acceptance Model (TAM), the Diffusion of Innovations theory (DOI), and the Social Cognitive Theory (SCT) as guiding theories. A theoretical model named the Human-AI Collaboration and Adaptation Framework (HACAF) was deduced using the Gioia Methodology, characterizing AI adoption in software engineering. This model's validity was subsequently tested through Partial Least Squares - Structural Equation Modeling (PLS-SEM), using data collected from 183 software professionals. The results indicate that the adoption of AI tools in these early integration stages is primarily driven by their compatibility with existing development workflows. This finding counters the traditional theories of technology acceptance. Contrary to expectations, the influence of perceived usefulness, social aspects, and personal innovativeness on adoption appeared to be less significant. This paper yields significant insights for the design of future AI tools and supplies a structure for devising effective strategies for organizational implementation.

Facebook AI Research · Extensibility · 評論員 · 分解的 · 回合 ·

2023 年 7 月 12 日

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

Tianyu Zhao,Mojtaba Taherisadr,Salma Elmalaki

Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.

重要性采樣 · 樣本 · 優化器 · 近似 · 教程 ·

2023 年 7 月 10 日

Importance Sampling for Minimization of Tail Risks: A Tutorial

Anand Deo,Karthyek Murthy

This paper provides an introductory overview of how one may employ importance sampling effectively as a tool for solving stochastic optimization formulations incorporating tail risk measures such as Conditional Value-at-Risk. Approximating the tail risk measure by its sample average approximation, while appealing due to its simplicity and universality in use, requires a large number of samples to be able to arrive at risk-minimizing decisions with high confidence. This is primarily due to the rarity with which the relevant tail events get observed in the samples. In simulation, Importance Sampling is among the most prominent methods for substantially reducing the sample requirement while estimating probabilities of rare events. Can importance sampling be used for optimization as well? If so, what are the ingredients required for making importance sampling an effective tool for optimization formulations involving rare events? This tutorial aims to provide an introductory overview of the two key ingredients in this regard, namely, (i) how one may arrive at an importance sampling change of measure prescription at every decision, and (ii) the prominent techniques available for integrating such a prescription within a solution paradigm for stochastic optimization formulations.

泛化誤差 · 泛化理論 · 樣本 · 經驗風險 · MNIST (數據集) ·

2023 年 7 月 10 日

The Value of Out-of-Distribution Data

Ashwin De Silva,Rahul Ramesh,Carey E. Priebe,Pratik Chaudhari,Joshua T. Vogelstein

from arxiv, Previous versions of this work have been presented at the Out-of-Distribution Generalization in Computer Vision (OOD-CV) Workshop (ECCV 2022) and the Workshop on Distribution Shifts (NeurIPS 2022)

We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task improves before deteriorating beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealistic setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risk. While its practical utility is limited, this does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show how a number of go-to strategies such as data-augmentation, hyper-parameter optimization, and pre-training are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.

語言模型化 · MoDELS · HER · Processing（編程語言） · 情景 ·

2023 年 7 月 9 日

On the Creativity of Large Language Models

Giorgio Franceschelli,Mirco Musolesi

Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence. One of the most remarkable applications is creative writing, e.g., poetry or storytelling: the generated outputs are often of astonishing quality. However, a natural question arises: can LLMs be really considered creative? In this article we firstly analyze the development of LLMs under the lens of creativity theories, investigating the key open questions and challenges. In particular, we focus our discussion around the dimensions of value, novelty and surprise as proposed by Margaret Boden in her work. Then, we consider different classic perspectives, namely product, process, press and person. We discuss a set of ``easy'' and ``hard'' problems in machine creativity, presenting them in relation to LLMs. Finally, we examine the societal impact of these technologies with a particular focus on the creative industries, analyzing the opportunities offered by them, the challenges arising by them and the potential associated risks, from both legal and ethical points of view.

MoDELS · 黑盒子 · 線性的 · Neural Networks · 優化器 ·

2023 年 7 月 8 日

Achieving Efficiency in Black Box Simulation of Distribution Tails with Self-structuring Importance Samplers

Anand Deo,Karthyek Murthy

from arxiv, 51 pages, 6 Figures

This paper presents a novel Importance Sampling (IS) scheme for estimating distribution tails of performance measures modeled with a rich set of tools such as linear programs, integer linear programs, piecewise linear/quadratic objectives, feature maps specified with deep neural networks, etc. The conventional approach of explicitly identifying efficient changes of measure suffers from feasibility and scalability concerns beyond highly stylized models, due to their need to be tailored intricately to the objective and the underlying probability distribution. This bottleneck is overcome in the proposed scheme with an elementary transformation which is capable of implicitly inducing an effective IS distribution in a variety of models by replicating the concentration properties observed in less rare samples. This novel approach is guided by developing a large deviations principle that brings out the phenomenon of self-similarity of optimal IS distributions. The proposed sampler is the first to attain asymptotically optimal variance reduction across a spectrum of multivariate distributions despite being oblivious to the specifics of the underlying model. Its applicability is illustrated with contextual shortest path and portfolio credit risk models informed by neural networks

Extensibility · Continuity · Analysis · 穩健性 · Medium ·

2023 年 7 月 6 日

Recovery of Multiple Parameters in Subdiffusion from One Lateral Boundary Measurement

Siyu Cen,Bangti Jin,Yikan Liu,Zhi Zhou

from arxiv, 28 pages

This work is concerned with numerically recovering multiple parameters simultaneously in the subdiffusion model from one single lateral measurement on a part of the boundary, while in an incompletely known medium. We prove that the boundary measurement corresponding to a fairly general boundary excitation uniquely determines the order of the fractional derivative and the polygonal support of the diffusion coefficient, without knowing either the initial condition or the source. The uniqueness analysis further inspires the development of a robust numerical algorithm for recovering the fractional order and diffusion coefficient. The proposed algorithm combines small-time asymptotic expansion, analytic continuation of the solution and the level set method. We present extensive numerical experiments to illustrate the feasibility of the simultaneous recovery. In addition, we discuss the uniqueness of recovering general diffusion and potential coefficients from one single partial boundary measurement, when the boundary excitation is more specialized.

Extensibility · INTERACT · INFORMS · TEAM · 優化器 ·

2022 年 2 月 21 日

The Role of Heterogeneity in Autonomous Perimeter Defense Problems

Aviv Adler,Oscar Mickelin,Ragesh K. Ramachandran,Gaurav S. Sukhatme,Sertac Karaman

from arxiv, 27 pages, 9 figures

When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.

多樣性 · 學成 · state-of-the-art · MoDELS · 張成子空間 ·

2021 年 3 月 14 日

Modelling Behavioural Diversity for Learning in Open-Ended Games

Nicolas Perez Nieves,Yaodong Yang,Oliver Slumbers,David Henry Mguni,Jun Wang

from arxiv, corresponds to <[email protected]>

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.