国产乱伦对白刺激视频_国产精品大秀视频日韩无码_男女做爰猛烈吃奶摸A片_偷拍初高中女奶头动态图图片_免费看三级中文字幕_99热动漫这里只有精品_精品丝袜国产自在线拍AV婷婷

Xuetong Li,Yuan Gao,Hong Chang,Danyang Huang,Yingying Ma,Rui Pan,Haobo Qi,Feifei Wang,Shuyuan Wu,Ke Xu,Jing Zhou,Xuening Zhu,Yingqiu Zhu,Hansheng Wang

This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation, where the dataset size is too huge to be comfortably handled by one single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns about the situation, where the sample size of dataset is small enough to be placed on one single computer but too large to be easily processed by its memory as a whole. The last class of literature studies those minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models.

相關內容

統計(ji)量

關注 3

情景 · Extensibility · 可辨認的 · motivation · Analysis ·

2024 年 4 月 27 日

Curve Stabbing Depth: Data Depth for Plane Curves

Stephane Durocher,Alexandre Leblanc,Spencer Szabados

from arxiv, Preprint

Measures of data depth have been studied extensively for point data. Motivated by recent work on analysis, clustering, and identifying representative elements in sets of trajectories, we introduce {\em curve stabbing depth} to quantify how deeply a given curve $Q$ is located relative to a given set $\cal C$ of curves in $\mathbb{R}^2$. Curve stabbing depth evaluates the average number of elements of $\cal C$ stabbed by rays rooted along the length of $Q$. We describe an $O(n^3 + n^2 m\log^2m+nm^2\log^2 m)$-time algorithm for computing curve stabbing depth when $Q$ is an $m$-vertex polyline and $\cal C$ is a set of $n$ polylines, each with $O(m)$ vertices.

MoDELS · 變換 · 語言模型化 · INFORMS · 可理解性 ·

2024 年 4 月 25 日

A Short Survey of Human Mobility Prediction in Epidemic Modeling from Transformers to LLMs

Christian N. Mayemba,D'Jeff K. Nkashama,Jean Marie Tshimula,Maximilien V. Dialufuma,Jean Tshibangu Muabila,Mbuyi Mukendi Didier,Hugues Kanda,René Manassé Galekwa,Heber Dibwe Fita,Serge Mundele,Kalonji Kalala,Aristarque Ilunga,Lambert Mukendi Ntobo,Dominique Muteba,Aaron Aruna Abedi

This paper provides a comprehensive survey of recent advancements in leveraging machine learning techniques, particularly Transformer models, for predicting human mobility patterns during epidemics. Understanding how people move during epidemics is essential for modeling the spread of diseases and devising effective response strategies. Forecasting population movement is crucial for informing epidemiological models and facilitating effective response planning in public health emergencies. Predicting mobility patterns can enable authorities to better anticipate the geographical and temporal spread of diseases, allocate resources more efficiently, and implement targeted interventions. We review a range of approaches utilizing both pretrained language models like BERT and Large Language Models (LLMs) tailored specifically for mobility prediction tasks. These models have demonstrated significant potential in capturing complex spatio-temporal dependencies and contextual patterns in textual data.

分解的 · 潛在 · 估計/估計量 · 統計量 · Facebook AI Research ·

2024 年 4 月 25 日

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Jing Ouyang,Chengyu Cui,Kean Ming Tan,Gongjun Xu

In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).

Taxonomy · 語言模型化 · MoDELS · 評論員 · 大語言模型 ·

2024 年 4 月 25 日

Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark

Elizabeth Fons,Rachneet Kaur,Soham Palande,Zhen Zeng,Svitlana Vyetrenko,Tucker Balch

Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, which is a critical task across many domains, spanning healthcare, finance, climate, energy, and many more. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, encompassing both univariate and multivariate forms. We introduce a comprehensive taxonomy of time series features, a critical framework that delineates various characteristics inherent in time series data. Leveraging this taxonomy, we have systematically designed and synthesized a diverse dataset of time series, embodying the different outlined features. This dataset acts as a solid foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models readily comprehend effectively and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of points queried within a series and the overall time series length.

簇 · 近似 · 圖 · Better · Analysis ·

2024 年 4 月 24 日

Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better

Vicente Balmaseda,Ying Xu,Yixin Cao,Nate Veldt

Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.

Learning · 簇 · Taxonomy · 表示學習 · 聚類方法 ·

2022 年 6 月 15 日

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Sheng Zhou,Hongjia Xu,Zhuonan Zheng,Jiawei Chen,Zhao li,Jiajun Bu,Jia Wu,Xin Wang,Wenwu Zhu,Martin Ester

from arxiv, Github Repo: //github.com/zhoushengisnoob/DeepClustering

Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As the data become increasingly complicated and complex, the shallow (traditional) clustering methods can no longer handle the high-dimensional data type. With the huge success of deep learning, especially the deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, in this paper we conduct a comprehensive survey on deep clustering by proposing a new taxonomy of different state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering. Moreover, this survey also provides the popular benchmark datasets, evaluation metrics and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics deserving further investigations as future directions.

數據增強 · Processing（編程語言） · Better · 多樣性 · 測試數據 ·

2021 年 10 月 5 日

Data Augmentation Approaches in Natural Language Processing: A Survey

Bohan Li,Yutai Hou,Wanxiang Che

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges.

Processing（編程語言） · 推斷 · NLP · Computational Linguistics · 估計/估計量 ·

2021 年 9 月 2 日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Amir Feder,Katherine A. Keith,Emaad Manzoor,Reid Pryzant,Dhanya Sridhar,Zach Wood-Doughty,Jacob Eisenstein,Justin Grimmer,Roi Reichart,Margaret E. Roberts,Brandon M. Stewart,Victor Veitch,Diyi Yang

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.

知識庫 · 基 · 自動問答 · INFORMS · Next ·

2021 年 5 月 25 日

A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions

Yunshi Lan,Gaole He,Jinhao Jiang,Jing Jiang,Wayne Xin Zhao,Ji-Rong Wen

from arxiv, Accepted by IJCAI 2021 Survey Track

Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Recently, a large number of studies focus on semantically or syntactically complicated questions. In this paper, we elaborately summarize the typical challenges and solutions for complex KBQA. We begin with introducing the background about the KBQA task. Next, we present the two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. We then review the advanced methods comprehensively from the perspective of the two categories. Specifically, we explicate their solutions to the typical challenges. Finally, we conclude and discuss some promising directions for future research.

INFORMS · 圖 · 可約的 · 知識圖譜 · 可辨認的 ·

2018 年 8 月 29 日

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Yi Luan,Luheng He,Mari Ostendorf,Hannaneh Hajishirzi

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.