
from arxiv, 21 pages, 4 figures. Submitted to TMLR. Updated to TMLR format. Minor corrections in Figure 1 description, statements of Theorem 14, 18 and Corollary 17. Minor clarification in statement of Theorem 10, 21. Moved most proofs to appendix and added sketches, moved remarks within proofs into main body

Fractional derivatives are a well-studied generalization of integer order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited both in the methods analyzed and the settings analyzed. This paper aims to fill in these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. Then, these bounds will be applied to the aforementioned settings to prove $O(1/T)$ convergence for smooth and convex functions and linear convergence for smooth and strongly convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as the challenges of predicting which will be faster in general.
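The abstract does not spell out an update rule. The sketch below shows one variant common in the fractional gradient descent literature, which is an assumption here and not necessarily one of the variants this paper analyzes. It truncates the Caputo derivative of order alpha (lower terminal c) to its first-order term, f'(x)·|x - c|^(1-alpha)/Γ(2-alpha), applied coordinate-wise.

The terminal c, step size and test problem are illustrative choices. The terminal is kept away from the iterates, because otherwise the scaling factor shrinks toward zero. With alpha = 1 the factor equals 1, so the same code runs standard gradient descent for comparison.

```python
import math

import numpy as np


def truncated_caputo_grad(grad, x, c, alpha):
    """First-order truncation of the Caputo derivative of order alpha in (0, 1]
    with lower terminal c, applied coordinate-wise (an assumed variant):
        D^alpha f(x) ~= f'(x) * |x - c|**(1 - alpha) / Gamma(2 - alpha)
    For alpha = 1 the scale is exactly 1, i.e. the ordinary gradient."""
    scale = np.abs(x - c) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return grad(x) * scale


def fractional_gd(grad, x0, c, alpha, lr, steps):
    """Plain iteration x <- x - lr * (truncated fractional gradient)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * truncated_caputo_grad(grad, x, c, alpha)
    return x


# Smooth, strongly convex test problem: f(x) = 0.5 * sum(a * (x - b)**2)
a = np.array([1.0, 4.0])
b = np.array([2.0, -1.0])


def grad(x):
    return a * (x - b)


c = np.array([-10.0, -10.0])  # terminal chosen well away from the minimizer b
x_frac = fractional_gd(grad, np.zeros(2), c, alpha=0.5, lr=0.05, steps=500)
x_gd = fractional_gd(grad, np.zeros(2), c, alpha=1.0, lr=0.05, steps=500)
print(x_frac, x_gd)
```

Both runs converge to b here. The fractional run rescales each coordinate's step by |x - c|^(1/2)/Γ(1.5), which acts like a position-dependent step size. Whether that is faster than standard gradient descent depends on c, the step size and the problem, echoing the abstract's point that the winner is hard to predict.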

Related content

Smoothing


Performer · Understandability · Better · MoDELS · Feasible ·

January 24, 2024

Towards Better Understanding of User Satisfaction in Open-Domain Conversational Search

Zhumin Chu, Qingyao Ai, Zhihong Wang, Yiqun Liu, Yingye Huang, Rui Zhang, Min Zhang, Shaoping Ma

from arxiv, 25 pages

With the increasing popularity of conversational search, how to evaluate the performance of conversational search systems has become an important question in the IR community. Existing work on conversational search evaluation falls mainly into two streams: (1) constructing metrics based on semantic similarity (e.g., BLEU, METEOR and BERTScore), or (2) directly evaluating the response ranking performance of the system using traditional search metrics (e.g., nDCG, RBP and nERR). However, these methods either ignore the user's information need or ignore the mixed-initiative property of conversational search. This raises the question of how to accurately model user satisfaction in conversational search scenarios. Since explicitly asking users for satisfaction feedback is difficult, traditional IR studies often rely on the Cranfield paradigm (i.e., third-party annotation) and on user behavior modeling to estimate user satisfaction in search. However, the feasibility and effectiveness of these two approaches have not been fully explored in conversational search. In this paper, we study the evaluation of conversational search from the perspective of user satisfaction. We build a novel conversational search experimental platform and construct a Chinese open-domain conversational search behavior dataset containing rich annotations and search behavior data. We also collect third-party satisfaction annotations at the session level and turn level to investigate the feasibility of the Cranfield paradigm in the conversational search scenario. Experimental results show some consistency, but also considerable differences, between user satisfaction annotations and third-party annotations. We also propose dialog continuation or ending behavior models (DCEBM) to capture session-level user satisfaction based on turn-level information.
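For reference, the sketch below computes nDCG, one of the traditional ranking metrics the abstract contrasts with satisfaction-based evaluation. It uses the common log2 rank discount with linear gain, which is an assumption: other gain functions such as 2^rel - 1 are also in use. The relevance grades are made up for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..n."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


ranking = [3, 2, 3, 0, 1, 2]  # graded relevance of returned responses, in rank order
print(round(dcg(ranking), 3), round(ndcg(ranking), 3))
```

nDCG only looks at the ranked list of responses. It has no notion of the clarifying questions or turn-taking that make conversational search mixed-initiative, which is the gap the paper targets.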

Specialization · Estimation/estimator · Linear regression · Linear · Square matrix ·

January 24, 2024

The Fragility of Sparsity

Michal Kolesár, Ulrich K. Müller, Sebastian T. Roelsgaard

from arxiv, 44 pages, including appendices

We show, using three empirical applications, that linear regression estimates which rely on the assumption of sparsity are fragile in two ways. First, we document that different choices of the regressor matrix that do not impact ordinary least squares (OLS) estimates, such as the choice of baseline category with categorical controls, can move sparsity-based estimates two standard errors or more. Second, we develop two tests of the sparsity assumption based on comparing sparsity-based estimators with OLS. The tests tend to reject the sparsity assumption in all three applications. Unless the number of regressors is comparable to or exceeds the sample size, OLS yields more robust results at little efficiency cost.
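The first kind of fragility can be illustrated on simulated data (not the paper's applications). Changing the baseline category of a categorical control leaves the OLS treatment coefficient unchanged, because both design matrices span the same column space. It does, however, change the control coefficients and their L1 norm. The L1 norm is what lasso-type sparsity estimators penalize, so such estimators can respond to a choice that OLS ignores.

The helper names `design` and `ols`, and the simulated effects, are illustrative assumptions. No sparsity-based estimator is implemented here.

```python
import numpy as np


def design(d, cat, baseline, n_levels):
    """Intercept, treatment d, and one dummy per category except `baseline`."""
    cols = [np.ones_like(d), d]
    cols += [(cat == lvl).astype(float) for lvl in range(n_levels) if lvl != baseline]
    return np.column_stack(cols)


def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 600
cat = rng.integers(0, 3, size=n)
d = rng.normal(size=n) + 0.5 * cat                 # treatment correlated with the control
y = 2.0 * d + np.array([0.0, 5.0, 5.0])[cat] + rng.normal(scale=0.5, size=n)

coef_a = ols(design(d, cat, baseline=0, n_levels=3), y)
coef_b = ols(design(d, cat, baseline=2, n_levels=3), y)

# Identical treatment estimate under both parameterizations ...
print(coef_a[1], coef_b[1])
# ... but different L1 norms of the control coefficients (roughly 10 vs 5 here).
print(np.abs(coef_a[2:]).sum(), np.abs(coef_b[2:]).sum())
```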

Facebook AI Research · Ghost (blogging platform) · Mutually independent · Proportional · Scenario ·

January 23, 2024

The Fairness of Redistricting Ghost

Jia-Wei Liang, Nina Amenta

We explore the fairness of a redistricting game introduced by Mixon and Villar, which provides a two-party protocol for dividing a state into electoral districts, without the participation of an independent authority. We analyze the game in an abstract setting that ignores the geographic distribution of voters and assumes that voter preferences are fixed and known. We show that the minority player can always win at least $p-1$ districts, where $p$ is proportional to the percentage of minority voters. We give an upper bound on the number of districts won by the minority based on a "cracking" strategy for the majority.

Associative memory · CASES · Statistic · Analysis · Exact ·

January 22, 2024

The Exponential Capacity of Dense Associative Memories

Carlo Lucibello, Marc Mézard

from arxiv, version accepted in Physical Review Letters

Recent generalizations of the Hopfield model of associative memories are able to store a number $P$ of random patterns that grows exponentially with the number $N$ of neurons, $P=\exp(\alpha N)$. Besides the huge storage capacity, another interesting feature of these networks is their connection to the attention mechanism which is part of the Transformer architectures widely applied in deep learning. In this work, we study a generic family of pattern ensembles using a statistical mechanics analysis which gives exact asymptotic thresholds for the retrieval of a typical pattern, $\alpha_1$, and lower bounds for the maximum of the load $\alpha$ for which all patterns can be retrieved, $\alpha_c$, as well as sizes of attraction basins. We discuss in detail the cases of Gaussian and spherical patterns, and show that they display rich and qualitatively different phase diagrams.
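The link to attention can be made concrete with one retrieval step. The cue's overlaps with all stored patterns pass through a softmax, the patterns are averaged with those weights, and a sign maps the result back to ±1 neuron states. This is an attention-style update from the "modern Hopfield" literature, used here as an assumption. It is not the statistical-mechanics analysis of the paper, and beta = 1 is an illustrative choice.

The example stores P = 1000 random patterns in N = 100 neurons, far more than N. It then recovers one pattern from a cue with 10% of its bits flipped.

```python
import numpy as np


def retrieve(patterns, cue, beta=1.0):
    """One attention-style update: softmax over overlaps, weighted sum, then sign.
    patterns: (P, N) array of +/-1 entries; cue: (N,) array of +/-1 entries."""
    scores = beta * (patterns @ cue)     # overlap of the cue with every stored pattern
    scores -= scores.max()               # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum()
    return np.sign(weights @ patterns)   # map back to +/-1 neuron states


rng = np.random.default_rng(0)
N, P = 100, 1000                          # many more patterns than neurons
patterns = rng.choice([-1.0, 1.0], size=(P, N))
target = patterns[0]

cue = target.copy()
flipped = rng.choice(N, size=10, replace=False)
cue[flipped] *= -1                        # corrupt 10% of the bits

recovered = retrieve(patterns, cue)
print(int((recovered == target).sum()), "of", N, "bits match")
```

Retrieval works here because the target's overlap with the cue (80) far exceeds the typical overlap with a random pattern (standard deviation about 10). The exponential weighting then suppresses every competitor. How large P can grow before this breaks down is the kind of threshold the paper quantifies.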

Transformation · Automator · Script · ONCE · Example ·

January 22, 2024

Towards Automatic Transformations of Coq Proof Scripts

Nicolas Magaud

from arxiv, In Proceedings ADG 2023, arXiv:2401.10725

Proof assistants like Coq are increasingly popular tools for helping mathematicians carry out proofs of the results they conjecture. However, formal proofs remain highly technical and are especially difficult to reuse. In this paper, we present a framework for carrying out a posteriori script transformations. These transformations are meant to be applied as an automated post-processing step once the proof has been completed. As an example, we present a transformation that takes an arbitrarily large proof script and produces an equivalent single-line proof script, which Coq can execute in a single step. Other applications, such as fully expanding a proof script (for debugging purposes) or removing all named hypotheses, could also be developed within this framework. We apply our tool to various Coq proof scripts, including some from the GeoCoq library.
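The single-line transformation relies on Coq's tactic combinators. `t; u` runs u on every goal produced by t, and `t; [ u1 | u2 ]` runs u_i on the i-th generated subgoal. The Python sketch below folds an already-known proof tree into such an expression. The `ProofNode` class and `to_single_line` function are hypothetical illustrations and not the paper's framework, which works on the Coq side. The example proves `forall A B : Prop, A /\ B -> B /\ A`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProofNode:
    """Hypothetical proof-tree node: a tactic plus one child per subgoal it leaves."""
    tactic: str
    children: list[ProofNode] = field(default_factory=list)


def to_single_line(node: ProofNode) -> str:
    """Compose a proof tree into a single Coq tactic expression."""
    if not node.children:
        return node.tactic                        # tactic closes its goal
    if len(node.children) == 1:
        # Exactly one subgoal: plain sequencing is enough.
        return f"{node.tactic}; {to_single_line(node.children[0])}"
    # Several subgoals: the i-th branch is run on the i-th subgoal.
    branches = " | ".join(to_single_line(c) for c in node.children)
    return f"{node.tactic}; [ {branches} ]"


proof = ProofNode("intros A B H", [
    ProofNode("destruct H as [HA HB]", [
        ProofNode("split", [ProofNode("exact HB"), ProofNode("exact HA")]),
    ]),
])
print(to_single_line(proof) + ".")
```

Sequencing with `;` is left-associative, so a chain of single-subgoal steps followed by a branching step still routes the branches to the goals of the last tactic.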

Model evaluation · Tensor · Controller · Data imputation · Reducible ·

January 21, 2024

Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control

Zhigang Wang, Xu Zhang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu

from arxiv, 13 pages

Transformer-based models have recently become deeper and larger. For better scalability, a standard training solution in industry is to split billions of parameters (tensors) into many tasks and then run them across homogeneous accelerators (e.g., GPUs). However, such dedicated compute clusters are prohibitively expensive for academia and mid-sized companies. An economical alternative is to aggregate existing heterogeneous devices and share resources among multiple tenants. Nevertheless, static hardware configurations and dynamic resource contention inevitably cause straggling tasks, which severely slow down overall training. Existing work is mainly tailored to traditional data parallelism and does not work well for the newer tensor parallelism because of its strict communication and correctness constraints. In this paper we first present ZERO-resizing, a novel dynamic workload balancing technique that requires no data migration. We tune workloads in real time by temporarily resizing the matrices involved in core tensor-related computations. We design data imputation and priority selection policies to, respectively, satisfy the consistency constraint required by normal training and reduce the accuracy loss. We also give a lightweight data migration technique, without loss of accuracy, to cope with heavy heterogeneity. Our final SEMI-migration solution is built on top of these two techniques and adaptively distinguishes their respective balancing missions to achieve overall gains in both efficiency and accuracy. Extensive experiments on the representative Colossal-AI platform validate the effectiveness of our proposals.
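As background for "resizing matrices," the sketch below simulates column-wise tensor parallelism for Y = X @ W on devices of unequal speed. W is split by columns in proportion to each device's assumed relative throughput, each device computes its own slice of the output, and concatenating the slices recovers the full product. This is a generic uneven-sharding sketch with hypothetical names (`split_columns`, `column_parallel_matmul`). It is not ZERO-resizing, which temporarily shrinks matrices on stragglers and imputes the missing part.

```python
import numpy as np


def split_columns(total_cols, throughputs):
    """Split total_cols across devices in proportion to relative throughput,
    using largest-remainder rounding so the shares sum exactly to total_cols."""
    total = float(sum(throughputs))
    raw = [total_cols * t / total for t in throughputs]
    sizes = [int(r) for r in raw]                 # floor of each share
    leftover = total_cols - sum(sizes)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def column_parallel_matmul(x, w, throughputs):
    """Simulate Y = X @ W with W split column-wise across 'devices' of unequal speed."""
    sizes = split_columns(w.shape[1], throughputs)
    bounds = np.cumsum([0] + sizes)
    # Each simulated device computes its own column slice of the output.
    parts = [x @ w[:, bounds[i]:bounds[i + 1]] for i in range(len(sizes))]
    return np.concatenate(parts, axis=1), sizes


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w = rng.normal(size=(16, 1024))
y, sizes = column_parallel_matmul(x, w, throughputs=[1.0, 1.0, 0.5])  # third device is slower
print(sizes)
```

A static split like this cannot react when contention changes at run time, which is the situation the paper's dynamic balancing addresses.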

Maximum likelihood estimation · Maximum likelihood · Estimation/estimator · Likelihood · CASES ·

January 20, 2024

Maximum Likelihood Estimators of Quantum Probabilities

Mirko Navara, Jan ?evic

Classical probability theory is based on assumptions that are often violated in practice. Therefore, quantum probability has been proposed as an alternative not only in quantum physics but also in other sciences. However, so far it has mostly criticized the classical approach without suggesting a working alternative. Maximum likelihood estimators have received very little attention in this context. We show that they can be correctly defined and that computing them in closed form is feasible, at least in some cases.

SAT · Amazon Web Services · Vectorization · Mean · Approximation ·

January 19, 2024

DRAT Proofs of Unsatisfiability for SAT Modulo Monotonic Theories

Nick Feng, Alan J. Hu, Sam Bayless, Syed M. Iqbal, Patrick Trentin, Mike Whalen, Lee Pike, John Backes

Generating proofs of unsatisfiability is a valuable capability of most SAT solvers, and is an active area of research for SMT solvers. This paper introduces the first method to efficiently generate proofs of unsatisfiability specifically for an important subset of SMT: SAT Modulo Monotonic Theories (SMMT), which includes many useful finite-domain theories (e.g., bit vectors and many graph-theoretic properties) and is used in production at Amazon Web Services. Our method uses propositional definitions of the theory predicates, from which it generates compact Horn approximations of the definitions, which lead to efficient DRAT proofs, leveraging the large investment the SAT community has made in DRAT. In experiments on practical SMMT problems, our proof generation overhead is minimal (7.41% geometric mean slowdown, 28.8% worst-case), and we can generate and check proofs for many problems that were previously intractable.
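Checking a DRAT proof amounts to verifying each added lemma against the clauses so far. The simplest criterion is reverse unit propagation (RUP): assume every literal of the lemma is false, unit-propagate, and require a conflict. A refutation is accepted once the empty clause is derived this way.

The sketch below implements only this RUP subset, using DIMACS-style integer literals. Full DRAT checking, as done by real checkers, also admits RAT lemmas and clause deletions, both omitted here. The function names are illustrative.

```python
def propagate(clauses, assignment):
    """Unit propagation over clauses (lists of nonzero ints, DIMACS-style literals).
    `assignment` is a set of literals taken to be true; it is extended in place.
    Returns False as soon as some clause has every literal false (a conflict)."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                          # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assignment]
            if not open_lits:
                return False                      # every literal is false: conflict
            if len(open_lits) == 1:
                assignment.add(open_lits[0])      # unit clause forces this literal
                changed = True
    return True


def is_rup(clauses, lemma):
    """Reverse unit propagation: assume the lemma is false and look for a conflict."""
    return not propagate(clauses, {-lit for lit in lemma})


def check_rup_proof(clauses, lemmas):
    """Check each lemma by RUP in order; a refutation must derive the empty clause."""
    active = [list(c) for c in clauses]
    for lemma in lemmas:
        if not is_rup(active, lemma):
            return False
        active.append(list(lemma))
    return any(len(lemma) == 0 for lemma in lemmas)


# (x1 or x2), (not x1 or x2), (x1 or not x2), (not x1 or not x2) is unsatisfiable.
cnf = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
print(check_rup_proof(cnf, [[2], []]))
```

The intermediate lemma [2] is what makes the empty clause reachable by propagation alone. Compact, propagation-friendly lemmas like this are what keep proof checking cheap, which is presumably why the paper aims for compact Horn approximations of the theory definitions.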

Language modeling · MoDELS · Generalization theory · Identifiable · Continuity ·

July 12, 2023

A Comprehensive Overview of Large Language Models

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, Ajmal Mian

Large Language Models (LLMs) have shown excellent generalization capabilities, which have led to the development of numerous models. These models propose various new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.

Data augmentation · Graph · Graphics processing unit (GPU) · Performer · Neural Networks ·

December 2, 2020

Data Augmentation for Graph Neural Networks

Tong Zhao, Yozen Liu, Leonardo Neves, Oliver Woodford, Meng Jiang, Neil Shah

from arxiv, AAAI 2021. This complete version contains the Appendix

Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
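The edge-prediction idea can be sketched in a modified-graph style: score every node pair with an edge predictor, add the most likely missing edges, and drop the least likely existing ones. Here the predictor is assumed to be a fixed inner-product decoder, sigmoid(z_i · z_j), on given embeddings. In GAug the edge predictor is learned, and the function name and toy embeddings are illustrative.

In the toy graph, two classes of three nodes are joined by one inter-class edge. The sketch removes that edge and closes the missing intra-class edges.

```python
import numpy as np


def gaug_m_style(adj, emb, n_add, n_remove):
    """Add the n_add most likely non-edges and drop the n_remove least likely edges.
    Edge probabilities come from an assumed inner-product decoder sigmoid(z_i . z_j)."""
    n = adj.shape[0]
    probs = 1.0 / (1.0 + np.exp(-(emb @ emb.T)))
    rows, cols = np.triu_indices(n, k=1)          # each undirected pair once
    scores = probs[rows, cols]
    is_edge = adj[rows, cols] > 0

    non_edges = np.flatnonzero(~is_edge)
    to_add = non_edges[np.argsort(-scores[non_edges], kind="stable")[:n_add]]
    edges = np.flatnonzero(is_edge)
    to_remove = edges[np.argsort(scores[edges], kind="stable")[:n_remove]]

    new_adj = adj.copy()
    for k in to_add:                              # promote likely (intra-class) edges
        new_adj[rows[k], cols[k]] = new_adj[cols[k], rows[k]] = 1
    for k in to_remove:                           # demote unlikely (inter-class) edges
        new_adj[rows[k], cols[k]] = new_adj[cols[k], rows[k]] = 0
    return new_adj


# Two classes of three nodes each (0-2 and 3-5), plus one inter-class edge (2-3).
emb = np.array([[2.0, 0.0]] * 3 + [[0.0, 2.0]] * 3)
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1

new_adj = gaug_m_style(adj, emb, n_add=2, n_remove=1)
print(new_adj)
```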