
Research articles are being shared in increasing numbers on multiple online platforms. Although the scholarly impact of these articles has been widely studied, the online interest determined by how long the research articles are shared online remains unclear. Knowing how long a research article is mentioned online could be valuable to researchers. In this paper, we analyzed multiple social media platforms on which users share and/or discuss scholarly articles. We grouped papers with publication dates ranging from 1920 to 2016 into three clusters based on their number of yearly online mentions. Using the online social media metrics for each of these three clusters, we built machine learning models to predict the long-term online interest in research articles. We addressed the prediction task with two different approaches: regression and classification. For the regression approach, the Multi-Layer Perceptron model performed best, and for the classification approach, the tree-based models performed better than other models. We found that old articles are most evident in the contexts of economics and industry (i.e., patents). In contrast, recently published articles are most evident in research platforms (i.e., Mendeley) followed by social media platforms (i.e., Twitter).

Related content

Cluster · Processing (programming language) · MoDELS · Composite data · Model complexity

October 25, 2022

Bayesian mixture models (in)consistency for the number of clusters

Louise Alamichel,Daria Bystrova,Julyan Arbel,Guillaume Kon Kam King

Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, their application for clustering has some limitations. Miller and Harrison (2014) proved posterior inconsistency in the number of clusters when the true number of clusters is finite for Dirichlet process and Pitman--Yor process mixture models. In this work, we extend this result to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations of them. The latter include the Dirichlet multinomial process and the recently proposed Pitman--Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced by Guha et al. (2021) for the Dirichlet process extends to more general models and provides a consistent method to estimate the number of components.

INFORMS · Estimation/estimator · Identifiable · Paper · Predictor/decision function

October 24, 2022

Predicting Long-Term Citations from Short-Term Linguistic Influence

Sandeep Soni,David Bamman,Jacob Eisenstein

from arxiv, 17 pages, 3 figures, to appear in the Findings of EMNLP 2022

A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of future citations: the estimate of linguistic influence from the two years after a paper's publication is correlated with and predictive of its citation count in the following three years. This is demonstrated using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.

TOOLS · Diversity · INTERACT · Comprehensibility · Processing (programming language)

October 24, 2022

Evaluation of Argo Scholar with Observational Study

Kevin Li,Haoyang Yang,Evan Montoya,Anish Upadhayay,Zhiyan Zhou,Jon Saad-Falcon,Duen Horng Chau

from arxiv, VIS IEEE 22

Discovering and making sense of relevant literature is fundamental in any scientific field. Node-link diagram-based visualization tools can aid this process; however, existing tools have been evaluated only on small scales. This paper evaluates Argo Scholar, an open-source visualization tool designed for interactive exploration of literature and easy sharing of exploration results. A large-scale user study of 122 participants from diverse backgrounds and experiences showed that Argo Scholar is effective at helping users find related work and understand paper connections, and incremental graph-based exploration is effective across diverse disciplines. Based on the user study and user feedback, we provide design considerations and feature suggestions for future work.

Cluster · INFORMS · ICS · Statistic · Estimation/estimator

October 24, 2022

Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data

Samuel Anyaso-Samuel,Somnath Datta

from arxiv, 19 pages, 4 figures, 2 tables

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of the marginal functions of the multistate model, and fitting of the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.
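
As a minimal illustration of the inverse cluster size reweighting mentioned in the abstract above (this is not the authors' pseudo-value procedure), the Python sketch below compares a pooled mean, in which large clusters dominate, with a mean in which every observation is weighted by the reciprocal of its cluster's size. The function names and the toy data are hypothetical, and every cluster is assumed to be non-empty.

```python
from statistics import fmean


def pooled_mean(clusters):
    """Mean over all observations; large clusters dominate the result."""
    values = [x for cluster in clusters for x in cluster]
    return fmean(values)


def inverse_cluster_size_mean(clusters):
    """Weight each observation by 1 / (size of its cluster).

    This is equivalent to averaging the per-cluster means, so every
    cluster contributes equally regardless of its size.
    Assumes every cluster contains at least one observation.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for cluster in clusters:
        w = 1.0 / len(cluster)
        weighted_sum += w * sum(cluster)
        total_weight += w * len(cluster)  # each cluster adds a total weight of 1
    return weighted_sum / total_weight


# Toy data (hypothetical): the largest cluster has systematically lower
# outcomes, so cluster size carries information about the outcome.
clusters = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [4.0, 4.0], [5.0]]
print(pooled_mean(clusters))                # 19/9, about 2.11
print(inverse_cluster_size_mean(clusters))  # (1 + 4 + 5)/3, about 3.33
```

In this toy example the pooled mean is pulled toward the large cluster, while the reweighted mean describes a typical member of a typical cluster.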

Cluster · Separated · Inference · Processing (programming language) · Statistic

October 24, 2022

Post-clustering difference testing: valid inference and practical considerations

Benjamin Hivert,Denis Agniel,Rodolphe Thiébaut,Boris P Hejblum

Clustering is an unsupervised analysis method that groups samples into homogeneous and separate subgroups of observations, also called clusters. To interpret the clusters, statistical hypothesis testing is often used to infer the variables that significantly separate the estimated clusters from each other. However, the hypotheses considered in this inference process are data-driven, since they are derived from the clustering results. This double use of the data causes traditional hypothesis tests to fail to control the Type I error rate, particularly because of uncertainty in the clustering process and the artificial differences it can create. We propose three novel statistical hypothesis tests which account for the clustering process. Our tests efficiently control the Type I error rate by identifying only variables that contain a true signal separating groups of observations.
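
The Type I error inflation described in the abstract above can be reproduced with a small simulation. The Python sketch below is only an illustration under stated assumptions, not the authors' tests: it draws data with no real group structure, "clusters" them by splitting at the median (a stand-in for a two-cluster algorithm in one dimension), and then applies a naive two-sample z-test to the resulting groups. The function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def naive_post_clustering_rejection_rate(n=50, n_reps=500, alpha=0.05, seed=0):
    """Share of simulations in which a naive test rejects under the null.

    Every sample comes from a single N(0, 1) population, so any
    rejection is a false positive.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_reps):
        x = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        # "Clustering" step: split the sorted sample at the median.
        low, high = x[: n // 2], x[n // 2:]
        # Naive test step: compare the two groups as if they were given
        # in advance, ignoring that they were built from the same data.
        se = (variance(low) / len(low) + variance(high) / len(high)) ** 0.5
        z = (fmean(high) - fmean(low)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_reps


rate = naive_post_clustering_rejection_rate()
print(rate)  # far above the nominal level alpha = 0.05
```

Because the groups are constructed to be as different as possible, the naive test rejects in almost every replication even though there is no true signal.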

Controller · FAST · state-of-the-art · Predictor/decision function · CC

October 23, 2022

The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control

Jasin Machkour,Michael Muma,Daniel P. Palomar

from arxiv, 32 pages, 24 figures, 2 tables, R packages 'TRexSelector' and 'tlars' on CRAN

We propose the Terminating-Random Experiments (T-Rex) selector, a fast variable selection method for high-dimensional data. The T-Rex selector controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables. This is achieved by fusing the solutions of multiple early terminated random experiments. The experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors. A finite sample proof based on martingale theory for the FDR control property is provided. Numerical simulations confirm that the FDR is controlled at the target level while allowing for a high power. We prove under mild conditions that the dummies can be sampled from any univariate probability distribution with finite expectation and variance. The computational complexity of the proposed method is linear in the number of variables. The T-Rex selector outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its sequential computation time is more than two orders of magnitude lower than that of the strongest benchmark methods. The open source R package TRexSelector containing the implementation of the T-Rex selector is available on CRAN.

Controller · Statistic · Identifiable · Data splitting · Square matrix

October 22, 2022

Model-free variable selection in sufficient dimension reduction via FDR control

Yixin Han,Xu Guo,Changliang Zou

from arxiv, 55 pages, 5 figures, 5 tables

Simultaneously identifying contributory variables and controlling the false discovery rate (FDR) in high-dimensional data is an important statistical problem. In this paper, we propose a novel model-free variable selection procedure in sufficient dimension reduction via a data-splitting technique. The variable selection problem is first connected with a least-squares procedure with several response transformations. We construct a series of statistics with a global symmetry property and then utilize the symmetry to derive a data-driven threshold to achieve error rate control. This method can achieve finite-sample and asymptotic FDR control under some mild conditions. Numerical experiments indicate that our procedure has satisfactory FDR control and higher power compared with existing methods.
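
The abstract above does not give the threshold formula, so the following Python sketch shows only one common form of a symmetry-based, data-driven threshold, as used in knockoff-style and data-splitting procedures; treating it as the authors' rule is an assumption. Given statistics W_j that are symmetric about zero for null variables, it picks the smallest t for which the ratio #{W_j <= -t} / max(#{W_j >= t}, 1) is at most the target level q. The function names and the example statistics are hypothetical.

```python
def symmetric_threshold(stats, q):
    """Smallest t > 0 with #{W_j <= -t} / max(#{W_j >= t}, 1) <= q.

    Null statistics are assumed symmetric about zero, so the number of
    large negative values estimates the number of false positives among
    the large positive values. Returns float('inf') if no t qualifies.
    """
    candidates = sorted({abs(w) for w in stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in stats if w <= -t)
        positives = sum(1 for w in stats if w >= t)
        if negatives / max(positives, 1) <= q:
            return t
    return float("inf")


def select(stats, q):
    """Indices of the variables whose statistic reaches the threshold."""
    t = symmetric_threshold(stats, q)
    return [j for j, w in enumerate(stats) if w >= t]


# Hypothetical statistics: large positive values suggest signal variables,
# small values of either sign look like noise.
W = [5.1, 4.2, 3.9, 3.0, 2.5, 0.4, -0.3, 0.2, -0.6, 0.1, -0.2, 2.8]
print(symmetric_threshold(W, q=0.2))  # 0.4
print(select(W, q=0.2))               # [0, 1, 2, 3, 4, 5, 11]
```

At t = 0.4 there is one statistic at or below -0.4 and seven at or above 0.4, giving an estimated false discovery proportion of 1/7, which is the first candidate below q = 0.2.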

Inference · Statistic · Estimation/estimator · Sample · Frequentist school

October 22, 2022

Inference on the Best Policies with Many Covariates

Waverly Wei,Yuqing Zhou,Zeyu Zheng,Jingshen Wang

from arxiv, Accepted by The Journal of Econometrics

Understanding the impact of the most effective policies or treatments on a response variable of interest is desirable in many empirical works in economics, statistics and other disciplines. Due to the widespread winner's curse phenomenon, conventional statistical inference assuming that the top policies are chosen independent of the random sample may lead to overly optimistic evaluations of the best policies. In recent years, given the increased availability of large datasets, such an issue can be further complicated when researchers include many covariates to estimate the policy or treatment effects in an attempt to control for potential confounders. In this manuscript, to simultaneously address the above-mentioned issues, we propose a resampling-based procedure that not only lifts the winner's curse in evaluating the best policies observed in a random sample, but also is robust to the presence of many covariates. The proposed inference procedure yields accurate point estimates and valid frequentist confidence intervals that achieve the exact nominal level as the sample size goes to infinity for multiple best policy effect sizes. We illustrate the finite-sample performance of our approach through Monte Carlo experiments and two empirical studies, evaluating the most effective policies in charitable giving and the most beneficial group of workers in the National Supported Work program.
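
The winner's curse mentioned in the abstract above is easy to see in a small simulation. The Python sketch below is an illustration under simple assumptions, not the authors' resampling procedure: every policy has a true effect of exactly zero, each effect is estimated with standard normal noise, and the policy with the largest estimate is reported as "the best". The function name and parameter values are hypothetical.

```python
import random
from statistics import fmean


def simulate_winners_curse(n_policies=10, n_reps=2000, seed=0):
    """Average estimated effect of the apparent best policy.

    All true effects are zero, so the returned value is exactly the
    upward bias caused by selecting the maximum of noisy estimates.
    """
    rng = random.Random(seed)
    best_estimates = []
    for _ in range(n_reps):
        estimates = [rng.gauss(0.0, 1.0) for _ in range(n_policies)]
        best_estimates.append(max(estimates))
    return fmean(best_estimates)


bias = simulate_winners_curse()
print(bias)  # clearly positive although no policy has any effect
```

The bias grows with the number of candidate policies, which is why inference that ignores the selection step tends to be overly optimistic about the best policies.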

Decomposed · Identical · MoDELS · SimPLe · Estimation/estimator

October 21, 2022

Bayes factors and posterior estimation: Two sides of the very same coin

Harlan Campbell,Paul Gustafson

from arxiv, 12 pages

Recently, several researchers have claimed that conclusions obtained from a Bayes factor (or the posterior odds) may contradict those obtained from Bayesian posterior estimation. In this short paper, we wish to point out that no such "incompatibility" exists if one is willing to consistently define one's priors and posteriors. The key for compatibility is that the (implied) prior model odds used for testing are the same as those used for estimation. Our recommendation is simple: If one reports a Bayes factor comparing two models, then one should also report posterior estimates which appropriately acknowledge the uncertainty with regard to which of the two models is correct.
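
The link between testing and estimation described in the abstract above rests on a standard identity: posterior odds equal the Bayes factor times the prior odds. The Python sketch below applies that identity and then forms a model-averaged posterior mean that acknowledges uncertainty about which model is correct. It is a minimal sketch with illustrative numbers, not an example taken from the paper; the function names, the Bayes factor of 3, and the within-model posterior means are all hypothetical.

```python
def posterior_model_prob(bayes_factor_10, prior_prob_m1):
    """P(M1 | data) from the Bayes factor BF10 and the prior P(M1).

    Uses posterior odds = Bayes factor * prior odds.
    Assumes 0 < prior_prob_m1 < 1.
    """
    prior_odds = prior_prob_m1 / (1.0 - prior_prob_m1)
    posterior_odds = bayes_factor_10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def model_averaged_mean(post_prob_m1, mean_under_m0, mean_under_m1):
    """Posterior mean of a parameter, averaged over the two models."""
    return (1.0 - post_prob_m1) * mean_under_m0 + post_prob_m1 * mean_under_m1


# Illustrative numbers: equal prior model odds and BF10 = 3.
p1 = posterior_model_prob(bayes_factor_10=3.0, prior_prob_m1=0.5)
print(p1)  # 0.75

# M0 fixes the parameter at 0; under M1 its posterior mean is 0.4.
print(model_averaged_mean(p1, mean_under_m0=0.0, mean_under_m1=0.4))  # about 0.3
```

Using the same prior model probability in both steps is what keeps the testing conclusion and the reported estimate consistent with each other.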

Identifiable · Cognition · Decomposed · Biased · GROUP

October 21, 2022

Approaches to Identify Vulnerabilities to Misinformation: A Research Agenda

Nattapat Boonprakong,Benjamin Tag,Tilman Dingler

from arxiv, Position paper to a CHI 2022 workshop: Designing Credibility Tools To Combat Mis/Disinformation

Given the prevalence of online misinformation and our scarce cognitive capacity, Internet users have been shown to frequently fall victim to such information. While some studies have investigated the psychological factors that make people susceptible to believing or sharing misinformation, ongoing research puts these findings into practice by objectively identifying when and which users are vulnerable to misinformation. In this position paper, we highlight two ongoing avenues of research to identify vulnerable users: detecting cognitive biases and exploring misinformation spreaders. We also discuss the potential implications of these objective approaches: discovering more cohorts of vulnerable users and prompting interventions to more effectively address the right group of users. Lastly, we point out two understudied contexts for misinformation vulnerability research as opportunities for future research.