99视频在线播放喷射-在线看片日中文福利免费

Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that an $\epsilon$-stationary point exists in the final iterates of SGDs, given a large enough total iteration budget, $T$, not just anywhere in the entire range of iterates -- a much stronger result than the existing one. Additionally, our analyses allow us to measure the density of the $\epsilon$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.

相關內容

非凸

關注 0

分解的 · 可辨認的 · 相互獨立的 · 潛在 · 統計量 ·

2023 年 12 月 5 日

On the Identifiability of Quantized Factors

Vitória Barin-Pacela,Kartik Ahuja,Simon Lacoste-Julien,Pascal Vincent

Disentanglement aims to recover meaningful latent ground-truth factors from the observed distribution solely, and is formalized through the theory of identifiability. The identifiability of independent latent factors is proven to be impossible in the unsupervised i.i.d. setting under a general nonlinear map from factors to observations. In this work, however, we demonstrate that it is possible to recover quantized latent factors under a generic nonlinear diffeomorphism. We only assume that the latent factors have independent discontinuities in their density, without requiring the factors to be statistically independent. We introduce this novel form of identifiability, termed quantized factor identifiability, and provide a comprehensive proof of the recovery of the quantized factors.

INFORMS · 可理解性 · GROUP · Learning · 設計 ·

2023 年 12 月 5 日

Mapping the Information Journey: Unveiling the Documentation Experience of Software Developers in China

Zhijun Gao,Jiangying Wang,Meina Wang

from arxiv, 27 pages

This research delves into understanding the behaviors and characteristics of Chinese developers in relation to their use of technical documentation, which is crucial for creating high-quality developer documentation. We conducted interviews with 25 software developers and surveyed 177 participants, using the preliminary interview findings to inform the survey design. Our approach encompassed traditional user research methods, including persona and user journey mapping, to develop typical personas and information journeys based on the qualitative data from the interviews and quantitative results from the survey. Our results revealed distinct characteristics and differences between junior and senior developers in terms of their use of technical documentation, broadly categorized into personality traits, learning habits, and working habits. We observed that the information journey of both groups typically encompasses four stages: Exploration, Understanding, Practice, and Application. Consequently, we created two distinct personas and information journey maps to represent these two developer groups. Our findings highlight that developers prioritize the content, organization, and maintenance aspects of documentation. In conclusion, we recommend organizing documentation content to align with developers' information journeys, tailoring documentation to meet the needs of developers at various levels, and focusing on the content, organization, and maintenance aspects of documentation.

MoDELS · Performer · Atom（文本編輯器） · Analysis · Learning ·

2023 年 12 月 4 日

On the Expressive Power of Behavior Structure

Cheng Wang,Hangyu Zhu,Yuhang Lin,Changjun Jiang

Efforts toward a comprehensive description of behavior have indeed facilitated the development of representation-based approaches that utilize deep learning to capture behavioral information. As behavior complexity increases, the expressive power of these models reaches a bottleneck. We coin the term ``behavioral molecular structure" and propose a new model called the Behavioral Molecular Structure (BMS). The model characterizes behaviors at the atomic level, analogizes behavioral attributes to atoms, and concretizes interrelations at the granularity of atoms using graphs. Here, we design three different downstream tasks to test the performance of the BMS model on public datasets. Additionally, we provide a preliminary theoretical analysis demonstrating that the BMS model can offer effective expressiveness for complex behaviors.

語言模型化 · MoDELS · 大語言模型 · 推斷 · Prompt ·

2023 年 12 月 3 日

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao,Dian Yu,Jeffrey Zhao,Izhak Shafran,Thomas L. Griffiths,Yuan Cao,Karthik Narasimhan

from arxiv, NeurIPS 2023 camera ready version. Code repo with all prompts: //github.com/princeton-nlp/tree-of-thought-llm

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: //github.com/princeton-nlp/tree-of-thought-llm.

Things · Performer · 評論員 · Analysis · 泛函 ·

2023 年 12 月 1 日

Scalable and Lightweight Post-Quantum Authentication for Internet of Things

Attila A. Yavuz,Saleh Darzi,Saif E. Nouma

from arxiv, 10 pages

Internet of Things (IoT) applications are composed of massive quantities of resource-limited devices that collect sensitive data with long-term operational and security requirements. With the threat of emerging quantum computers, Post-Quantum Cryptography (PQC) is a critical requirement for IoTs. In particular, digital signatures offer scalable authentication with non-repudiation and are an essential tool for IoTs. However, as seen in NIST PQC standardization, post-quantum signatures are extremely costly for resource-limited IoTs. Hence, there is a significant need for quantum-safe signatures that respect the processing, memory, and bandwidth limitations of IoTs. In this paper, we created a new lightweight quantum-safe digital signature referred to as INFinity-HORS (INF-HORS), which is (to the best of our knowledge) the first signer-optimal hash-based signature with (polynomially) unbounded signing capability. INF-HORS enables a verifier to non-interactively construct one-time public keys from a master public key via encrypted function evaluations. This strategy avoids the performance bottleneck of hash-based standards (e.g., SPHINCS+) by eliminating hyper-tree structures. It also does not require a trusted party or non-colliding servers to distribute public keys. Our performance analysis confirms that INF-HORS is magnitudes of times more signer computation efficient than selected NIST PQC schemes (e.g., SPHINCS+, Dilithium, Falcon) with a small memory footprint.

2023 年 12 月 1 日

Modeling the Ratio of Correlated Biomarkers Using Copula Regression

Moritz Berger,Nadja Klein,Michael Wagner,Matthias Schmid

from arxiv, 32 pages, 6 figures, 5 tables

Modeling the ratio of two dependent components as a function of covariates is a frequently pursued objective in observational research. Despite the high relevance of this topic in medical studies, where biomarker ratios are often used as surrogate endpoints for specific diseases, existing models are based on oversimplified assumptions, assuming e.g.\@ independence or strictly positive associations between the components. In this paper, we close this gap in the literature and propose a regression model where the marginal distributions of the two components are linked by Frank copula. A key feature of our model is that it allows for both positive and negative correlations between the components, with one of the model parameters being directly interpretable in terms of Kendall's rank correlation coefficient. We study our method theoretically, evaluate finite sample properties in a simulation study and demonstrate its efficacy in an application to diagnosis of Alzheimer's disease via ratios of amyloid-beta and total tau protein biomarkers.

MoDELS · 講稿 · Learning · Sphering · 表示 ·

2023 年 11 月 2 日

A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

Hang Chen,Keqing Du,Chenguang Li,Xinyu Yang

from arxiv, under review

The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

Processing（編程語言） · Taxonomy · Cognition · Attention · 蒸餾 ·

2023 年 9 月 27 日

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Zheng Chu,Jingchang Chen,Qianglong Chen,Weijiang Yu,Tao He,Haotian Wang,Weihua Peng,Ming Liu,Bing Qin,Ting Liu

from arxiv, Resources are available at //github.com/zchuz/CoT-Reasoning-Survey

Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realm of artificial intelligence and natural language processing. However, there still remains a lack of a comprehensive survey for this arena. To this end, we take the first step and present a thorough survey of this research field carefully and widely. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to the taxonomies of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT with frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss some future directions, including faithfulness, multi-modal, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.

MoDELS · 評論員 · 縮放 · 可理解性 · GPT-3 ·

2021 年 8 月 18 日

On the Opportunities and Risks of Foundation Models

Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri Chatterji,Annie Chen,Kathleen Creel,Jared Quincy Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Kohd,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang

from arxiv, Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

學成 · 大數據 · 相同 · 人工智能 · 統計方法 ·

2020 年 5 月 5 日

A Survey of Learning Causality with Data: Problems and Methods

Ruocheng Guo,Lu Cheng,Jundong Li,P. Richard Hahn,Huan Liu

from arxiv, 35 pages, accepted by ACM CSUR

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.