
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such a data collection mechanism often introduces complexities into the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimators that remedies this issue. The method builds on the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that, in the context of multi-armed bandits, it retains the non-asymptotic performance of the least squares estimator while attaining the asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.
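
To make the setting concrete, here is a minimal sketch of adaptively collected two-armed bandit data (via epsilon-greedy), the OLS (sample-mean) estimate, and an ALEE-style estimate solving a weighted estimating equation with a predictable stabilizing weight. The epsilon-greedy collector and the $1/\sqrt{N}$ weight are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_epsilon_greedy(T=5000, eps=0.1, means=(0.0, 0.3)):
    """Adaptively collect two-armed bandit data with an epsilon-greedy policy."""
    arms, rewards = np.empty(T, dtype=int), np.empty(T)
    sums, counts = np.zeros(2), np.ones(2)   # counts start at 1 to avoid 0-division
    for t in range(T):
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(sums / counts))
        r = means[a] + rng.normal()
        arms[t], rewards[t] = a, r
        sums[a] += r
        counts[a] += 1
    return arms, rewards

def alee_arm_estimate(arms, rewards, a):
    """Solve sum_t w_t 1{a_t = a} (y_t - theta) = 0 for theta, with the
    (illustrative) predictable stabilizing weight w_t = 1 / sqrt(N_a(t-1) + 1)."""
    n_a, num, den = 0, 0.0, 0.0
    for a_t, y_t in zip(arms, rewards):
        if a_t == a:
            w = 1.0 / np.sqrt(n_a + 1)   # depends only on past pulls of arm a
            num, den, n_a = num + w * y_t, den + w, n_a + 1
    return num / den

arms, rewards = collect_epsilon_greedy()
print("OLS / sample mean, arm 0:  ", rewards[arms == 0].mean())
print("ALEE-style estimate, arm 0:", alee_arm_estimate(arms, rewards, 0))
```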

Related Content

State-of-the-art language models are becoming increasingly large in an effort to achieve the highest performance on large corpora of available textual data. However, the sheer size of the Transformer architectures makes it difficult to deploy models within computational, environmental or device-specific constraints. We explore data-driven compression of existing pretrained models as an alternative to training smaller models from scratch. To do so, we scale Kronecker-factored curvature approximations of the target loss landscape to large language models. In doing so, we can compute both the dynamic allocation of structures that can be removed as well as updates of remaining weights that account for the removal. We provide a general framework for unstructured, semi-structured and structured pruning and improve upon weight updates to capture more correlations between weights, while remaining computationally efficient. Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance, and achieve state-of-the-art results in unstructured and semi-structured pruning of large language models.
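
For orientation, the sketch below shows the classical Optimal Brain Surgeon step that curvature-based pruning methods of this kind generalize: given an inverse curvature estimate, remove the weight whose deletion least increases a local quadratic model of the loss and compensate the remaining weights in closed form. The toy Hessian is hypothetical, and methods at LLM scale make $H^{-1}$ tractable via Kronecker factorization rather than dense inversion.

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step:
       cost_q  = w_q^2 / (2 [H^{-1}]_{qq})        (loss increase if q removed)
       delta_w = -(w_q / [H^{-1}]_{qq}) H^{-1} e_q (compensating update)
    """
    d = np.diag(H_inv)
    costs = w ** 2 / (2.0 * d)
    q = int(np.argmin(costs))                 # cheapest weight to remove
    w_new = w - (w[q] / d[q]) * H_inv[:, q]   # update all remaining weights
    w_new[q] = 0.0                            # enforce exact removal
    return w_new, q

# Toy usage with a hypothetical curvature estimate:
w = np.array([0.5, -1.2, 0.05, 0.8])
H = np.array([[2.0, 0.1, 0.0, 0.0],
              [0.1, 1.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.2],
              [0.0, 0.0, 0.2, 1.8]])
print(obs_prune_step(w, np.linalg.inv(H)))
```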

Linear arrangements of graphs are a well-known type of graph labeling and are found at the heart of many important computational problems, such as the Minimum Linear Arrangement Problem ($\texttt{minLA}$). A linear arrangement is usually defined as a permutation of the $n$ vertices of a graph. An intuitive geometric setting is that of vertices lying on consecutive integer positions in the real line, starting at 1; edges are often drawn as semicircles above the real line. In this paper we study the Maximum Linear Arrangement problem ($\texttt{MaxLA}$), the maximization variant of $\texttt{minLA}$. We devise a new characterization of maximum arrangements of general graphs, and prove that $\texttt{MaxLA}$ can be solved for cycle graphs in constant time, and for $k$-linear trees ($k\le2$) in time $O(n)$. We present two constrained variants of $\texttt{MaxLA}$ we call $\texttt{bipartite MaxLA}$ and $\texttt{1-thistle MaxLA}$. We prove that the former can be solved in $O(n)$ time for any bipartite graph, and the latter by an algorithm that typically runs in $O(n^3)$ time on unlabelled trees. The combination of the two variants has two promising characteristics. First, it solves $\texttt{MaxLA}$ for almost all trees consisting of a few tens of nodes. Second, it produces a high-quality approximation to $\texttt{MaxLA}$ for trees where the algorithm fails to solve $\texttt{MaxLA}$ exactly. Furthermore, we conjecture that $\texttt{bipartite MaxLA}$ solves $\texttt{MaxLA}$ for at least $50\%$ of all free trees.
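
As a reference point for the definitions above, here is a minimal sketch of the arrangement cost and an exact brute-force $\texttt{MaxLA}$ solver; enumerating all $n!$ arrangements is feasible only for tiny graphs and is not one of the paper's algorithms.

```python
from itertools import permutations

def arrangement_cost(edges, pi):
    """Sum of edge lengths |pi[u] - pi[v]| under arrangement pi (vertex -> position)."""
    return sum(abs(pi[u] - pi[v]) for u, v in edges)

def max_la_bruteforce(n, edges):
    """Exact MaxLA by enumerating all n! arrangements (tiny n only)."""
    best = None
    for perm in permutations(range(1, n + 1)):
        pi = dict(enumerate(perm))          # vertex i sits at position perm[i]
        c = arrangement_cost(edges, pi)
        if best is None or c > best[0]:
            best = (c, pi)
    return best

# Cycle C4: the paper proves MaxLA for cycle graphs is solvable in constant time.
print(max_la_bruteforce(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```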

The context of this work is cooperative scheduling, a concurrency paradigm in which task execution is not arbitrarily preempted. Instead, language constructs exist that let a task voluntarily yield the right to execute to another task. The inquiry is the design of provably fair schedulers and suitable notions of fairness for cooperative scheduling languages. To the best of our knowledge, this problem has not been addressed so far. Our approach is to study fairness independently from syntactic constructs or environments, purely from the point of view of the semantics of programming languages, i.e., we consider fairness criteria using the formal definition of a program execution. We develop our concepts for classic structural operational semantics (SOS) as well as for the recent locally abstract, globally concrete (LAGC) semantics. The latter is a highly modular approach to semantics ensuring the separation of concerns between local statement evaluation and scheduling decisions. The new knowledge contributed by our work is threefold: first, we show that a new fairness notion, called quiescent fairness, is needed to characterize fairness adequately in the context of cooperative scheduling; second, we define a provably fair scheduler for cooperative scheduling languages; third, a qualitative comparison between the SOS and LAGC versions shows that the latter, while requiring higher initial effort, is more amenable to proving fairness and scales better under language extensions than SOS. The grounding of our work is a detailed formal proof of quiescent fairness for the scheduler defined in LAGC semantics. The importance of our work is that it provides a formal foundation for the implementation of fair schedulers for cooperative scheduling languages, an increasingly popular paradigm (for example: akka/Scala, JavaScript, async Rust). Being based solely on semantics, our ideas are widely applicable. Further, our work makes clear that the standard notion of fairness in concurrent languages needs to be adapted for cooperative scheduling and, more generally, for any language that combines atomic execution sequences with some form of preemption.
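
To illustrate the setting, the sketch below models cooperative tasks as Python generators, where `yield` marks the voluntary yield point, and runs them under a round-robin scheduler that requeues every non-terminated task, so each one is scheduled again after every yield. This is an illustrative model only, not the paper's SOS or LAGC formalization.

```python
from collections import deque

def task(name, steps):
    """A cooperative task: between yields it runs atomically, unpreempted."""
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # voluntary yield point; control returns to the scheduler

def round_robin(tasks):
    """A simple fair scheduler: every non-terminated task is requeued after
    each yield, hence scheduled infinitely often until it terminates."""
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # run the task up to its next yield
            queue.append(t)  # requeue: guarantees it runs again
        except StopIteration:
            pass             # task finished; drop it

round_robin([task("A", 2), task("B", 3)])
```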

We propose a unified multinomial link model for analyzing categorical responses. It not only covers the existing multinomial logistic models and their extensions as a special class, but also allows observations with NA or Unknown responses to be incorporated as a special category in the data analysis. We provide explicit formulae for computing the likelihood gradient and Fisher information matrix, as well as detailed algorithms for finding the maximum likelihood estimates of the model parameters. Our algorithms resolve the infeasibility issue that existing statistical software encounters when estimating parameters of cumulative link models. Applications to real datasets show that the proposed multinomial link models fit the data significantly better, and the corresponding analysis may correct misleading conclusions caused by missing data.
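
As a minimal illustration of the baseline-category special case, the sketch below fits a multinomial logit by gradient ascent on the log-likelihood (gradient $X^\top(Y - P)$), treating "Unknown" as one more response category. The toy data and learning-rate settings are hypothetical; the paper's actual algorithms use the explicit gradient and Fisher information formulae for a broader class of link models.

```python
import numpy as np

def fit_multinomial_logit(X, y, n_classes, lr=0.5, iters=5000):
    """Baseline-category multinomial logit via gradient ascent; class
    n_classes-1 is the baseline. NA/Unknown responses are simply encoded
    as one more category in y."""
    n, p = X.shape
    B = np.zeros((p, n_classes - 1))          # coefficients per non-baseline class
    Y = np.eye(n_classes)[y][:, :-1]          # one-hot with baseline column dropped
    for _ in range(iters):
        E = np.exp(X @ B)
        P = E / (1.0 + E.sum(axis=1, keepdims=True))
        B += lr / n * (X.T @ (Y - P))         # log-likelihood gradient step
    return B

# Toy data: 3 observed categories plus one "Unknown" category (index 3).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = rng.integers(0, 4, size=300)
print(fit_multinomial_logit(X, y, n_classes=4))
```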

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence. This paradigm considers language models as agents capable of observing, acting, and receiving feedback iteratively from external entities. Specifically, language models in this context can: (1) interact with humans for better understanding and addressing user needs, personalizing responses, aligning with human values, and improving the overall user experience; (2) interact with knowledge bases for enriching language representations with factual knowledge, enhancing the contextual relevance of responses, and dynamically leveraging external information to generate more accurate and informed responses; (3) interact with models and tools for effectively decomposing and addressing complex tasks, leveraging specialized expertise for specific subtasks, and fostering the simulation of social behaviors; and (4) interact with environments for learning grounded representations of language, and effectively tackling embodied tasks such as reasoning, planning, and decision-making in response to environmental observations. This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept. We then provide a systematic classification of iNLP, dissecting its various components, including interactive objects, interaction interfaces, and interaction methods. We proceed to delve into the evaluation methodologies used in the field, explore its diverse applications, scrutinize its ethical and safety issues, and discuss prospective research directions. This survey serves as an entry point for researchers who are interested in this rapidly evolving area and offers a broad view of the current landscape and future trajectory of iNLP.
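
The observe-act-feedback view of iNLP can be captured in a small interface sketch; the `model` and `environment` callables below are placeholders for a language model and any external entity (human, knowledge base, tool, or embodied environment), not components from any surveyed system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Interaction:
    observation: str
    action: str
    feedback: str

def inlp_loop(model: Callable[[str], str],
              environment: Callable[[str], str],
              prompt: str, max_turns: int = 3) -> list[Interaction]:
    """Minimal observe-act-feedback loop: the model proposes an action from
    its current observation; the external entity returns feedback, which
    becomes the next observation."""
    history, obs = [], prompt
    for _ in range(max_turns):
        action = model(obs)
        feedback = environment(action)
        history.append(Interaction(obs, action, feedback))
        obs = feedback
    return history

# Toy stand-ins for a language model and an external tool:
echo_model = lambda obs: f"ACT[{obs}]"
echo_tool = lambda act: f"FB[{act}]"
for turn in inlp_loop(echo_model, echo_tool, "user request", max_turns=2):
    print(turn)
```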

Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate adversarial robustness. In practice, attack algorithms are artificially selected and tuned by human experts to break an ML system. However, manual selection of attackers tends to be sub-optimal, leading to a mistaken assessment of model security. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms and their hyper-parameters from a candidate pool of \textbf{32 base attackers}. We design a search space where an attack policy is represented as an attacking sequence, i.e., the output of the previous attacker is used as the initialization input for successors. The multi-objective NSGA-II genetic algorithm is adopted for finding the strongest attack policy with minimum complexity. Experimental results show that CAA beats 10 top attackers on 11 diverse defenses with less elapsed time (\textbf{6 $\times$ faster than AutoAttack}), and achieves the new state-of-the-art on $l_{\infty}$, $l_{2}$ and unrestricted adversarial attacks.
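
The core of the search space, chaining attackers so that each one is initialized with its predecessor's output, can be sketched as follows. The random-noise attacker is a stand-in (real base attackers such as FGSM or PGD use model gradients), and the policy shown is hypothetical rather than one found by NSGA-II.

```python
import numpy as np

def compose_attacks(x, y, policy):
    """Run an attack policy as a sequence: each attacker is initialized
    with the previous attacker's output, mirroring CAA's search space."""
    x_adv = x.copy()
    for attack, params in policy:        # policy = ordered (attacker, hyper-params)
        x_adv = attack(x_adv, y, **params)
    return x_adv

def random_sign_perturb(x, y, eps=0.01, seed=0):
    """Stand-in attacker: adds a clipped signed perturbation. A real base
    attacker would compute the sign of the loss gradient w.r.t. x."""
    rng = np.random.default_rng(seed)
    return np.clip(x + eps * np.sign(rng.standard_normal(x.shape)), 0.0, 1.0)

policy = [(random_sign_perturb, {"eps": 0.03}), (random_sign_perturb, {"eps": 0.01})]
x_adv = compose_attacks(np.full((8, 8), 0.5), y=0, policy=policy)
print(np.abs(x_adv - 0.5).max())
```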

The aim of this work is to develop a fully-distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method is able to exploit the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to perform inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents in order to match the graph describing data relationships. An extensive set of numerical results validates our proposal. To the best of our knowledge, this is the first work combining graph convolutional neural networks with distributed optimization.
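
A minimal sketch of the distributed inference step: each agent evaluates one GCN layer only for the rows of the normalized adjacency corresponding to its own nodes, so it needs only neighbor features exchanged over the network. Here the exchange is simulated by a shared array, and the adjacency and partition are toy assumptions.

```python
import numpy as np

def distributed_gcn_layer(A_hat, H, W, partitions):
    """One GCN layer ReLU(A_hat @ H @ W), computed agent by agent: each
    agent evaluates only the rows of A_hat for its own node set, using
    neighbor features it received over the communication network."""
    H_next = np.zeros((A_hat.shape[0], W.shape[1]))
    for agent_nodes in partitions:                       # each agent's nodes
        H_next[agent_nodes] = np.maximum(A_hat[agent_nodes] @ H @ W, 0.0)
    return H_next

# Toy example: 4 nodes split between 2 agents, hypothetical normalized adjacency.
A_hat = np.eye(4) + 0.25 * np.ones((4, 4))
H = np.random.default_rng(2).normal(size=(4, 3))
W = np.random.default_rng(3).normal(size=(3, 2))
print(distributed_gcn_layer(A_hat, H, W, partitions=[[0, 1], [2, 3]]))
```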

Embedding models for deterministic Knowledge Graphs (KG) have been extensively studied, with the purpose of capturing latent semantic relations between entities and incorporating the structured knowledge into machine learning. However, many KGs model uncertain knowledge, typically representing the inherent uncertainty of relation facts with a confidence score, and embedding such uncertain knowledge remains an unresolved challenge. Capturing uncertain knowledge will benefit many knowledge-driven applications, such as question answering and semantic search, by providing a more natural characterization of the knowledge. In this paper, we propose a novel uncertain KG embedding model, UKGE, which aims to preserve both structural and uncertainty information of relation facts in the embedding space. Unlike previous models that characterize relation facts with binary classification techniques, UKGE learns embeddings according to the confidence scores of uncertain relation facts. To further enhance the precision of UKGE, we also introduce probabilistic soft logic to infer confidence scores for unseen relation facts during training. We propose and evaluate two variants of UKGE based on different learning objectives. Experiments are conducted on three real-world uncertain KGs via three tasks, i.e. confidence prediction, relation fact ranking, and relation fact classification. UKGE shows effectiveness in capturing uncertain knowledge, achieving promising results and consistently outperforming the baselines on these tasks.
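
A minimal sketch of the idea of regressing onto confidence scores: a DistMult-style raw score is mapped into $[0,1]$ (a logistic mapping, one of the choices behind the model's variants) and trained with squared error against the KG's confidence rather than a binary label. The embedding dimension and data below are toy assumptions.

```python
import numpy as np

def plausibility(h, r, t):
    """DistMult-style raw score for a triple (h, r, t)."""
    return float(np.sum(h * r * t))

def confidence(score, w=1.0, b=0.0):
    """Logistic mapping of the raw score into a [0, 1] confidence."""
    return 1.0 / (1.0 + np.exp(-(w * score + b)))

def uncertain_loss(h, r, t, gold_conf):
    """Squared error against the KG's confidence score: the embeddings are
    fit to confidences instead of classifying facts as true/false."""
    return (confidence(plausibility(h, r, t)) - gold_conf) ** 2

rng = np.random.default_rng(4)
h, r, t = (rng.normal(size=8) for _ in range(3))
print(uncertain_loss(h, r, t, gold_conf=0.85))
```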

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
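
The lattice construction itself is easy to sketch: match every lexicon word against contiguous character spans, and feed each matched word cell into the character LSTM through an extra gated path (the gating is omitted here). The toy lexicon below uses the classic 南京市长江大桥 example.

```python
def build_lattice(chars, lexicon, max_word_len=4):
    """Find all lexicon words matching character spans [i, j); these word
    cells are the extra lattice paths alongside the character sequence."""
    spans = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_word_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                spans.append((i, j, word))
    return spans

lexicon = {"南京", "南京市", "长江", "长江大桥", "大桥"}
print(build_lattice(list("南京市长江大桥"), lexicon))
```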

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
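
For reference, the scaled dot-product attention at the core of the architecture, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top/\sqrt{d_k})V$, in minimal NumPy form (single head, no masking).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: the operation the Transformer uses
    in place of recurrence and convolution."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(5)
Q, K, V = rng.normal(size=(3, 4)), rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```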
