
Retracted papers often circulate widely on social media, online news outlets and other websites before their official retraction. The spread of potentially inaccurate or misleading results from retracted papers can harm the scientific community and the public. Here we quantify the amount and type of attention 3,985 retracted papers received over time on different online platforms, ranging from social media to knowledge repositories. By comparing them with a set of non-retracted control papers, we show that retracted papers receive more attention after publication. This tendency seems to be more pronounced on news outlets and knowledge repositories. This finding indicates that untrustworthy research penetrates even curated platforms and is often shared uncritically, amplifying the negative impact on the public. At the same time, we find that posts on Twitter tend to express more uncertainty about retracted than about control papers, suggesting that these posts could help identify potentially flawed scientific findings. We also find that, around the time they are retracted, papers generate discussions that are mostly about the retraction incident rather than about the results of the paper, showing that by this point papers have exhausted attention to their findings and highlighting the limited effect of retractions in reducing uncritical conversations. Our findings reveal the extent to which retracted papers are discussed on different online platforms and identify at scale audience skepticism towards them. They also show that retractions come too late, which has implications for efforts to better time retraction notices.

Related content


Dataset · Model evaluation · Continuity · INFORMS · Performer

December 7, 2021

Presenting a Larger Up-to-date Movie Dataset and Investigating the Effects of Pre-released Attributes on Gross Revenue

Arnab Sen Sharma,Tirtha Roy,Sadique Ahmmod Rifat,Maruf Ahmed Mridul

Movie-making has become one of the most costly and risky endeavors in the entertainment industry. Continuous change in audience preferences makes it harder to predict what kind of movie will be financially successful at the box office. So it is no wonder that cautious, intelligent stakeholders and large production houses want to know the probable revenue a movie will generate before making an investment. Researchers have been working on finding an optimal strategy to help investors make the right decisions, but the lack of a large, up-to-date dataset makes their work harder. In this work, we introduce an up-to-date, richer, and larger dataset that we have prepared by scraping IMDb for researchers and data analysts to work with. The compiled dataset contains the summary data of 7.5 million titles and detailed information on more than 200K movies. Additionally, we apply several statistical analyses to our dataset to find out how a movie's revenue is affected by different pre-release attributes such as budget, runtime, release month, content rating, and genre. In our analysis, we have found that having a star cast or director has a positive impact on generated revenue. We introduce a novel approach for calculating the star power of a movie. Based on our analysis, we select a set of attributes as features and train different machine learning algorithms to predict a movie's expected revenue. Based on generated revenue, we classified the movies into 10 categories and achieved a one-class-away accuracy of almost 60% (bingo accuracy of 30%). All the generated datasets and analysis code are available online. We also made the source code of our scraper bots public, so that researchers interested in extending this work can easily modify these bots as needed and prepare their own up-to-date datasets.
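The two accuracy figures quoted above can be illustrated with a minimal sketch. The abstract does not define the metrics, so this assumes that "bingo" accuracy means an exact class match and that "one-class-away" accuracy counts a prediction as correct when it lands within one ordinal revenue class of the true class. The function names and the toy labels are hypothetical and are not taken from the authors' code.

```python
def bingo_accuracy(true_classes, predicted_classes):
    """Fraction of predictions that hit the exact revenue class (assumed meaning of 'bingo')."""
    hits = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return hits / len(true_classes)


def one_class_away_accuracy(true_classes, predicted_classes):
    """Fraction of predictions within one ordinal class of the true class (assumed definition)."""
    hits = sum(abs(t - p) <= 1 for t, p in zip(true_classes, predicted_classes))
    return hits / len(true_classes)


# Hypothetical labels: ten revenue classes numbered 0..9, ordered by revenue.
true_classes = [0, 3, 5, 9, 7, 2]
predicted_classes = [1, 3, 8, 8, 7, 5]

print(bingo_accuracy(true_classes, predicted_classes))
print(one_class_away_accuracy(true_classes, predicted_classes))
```

Under this reading, one-class-away accuracy can never be lower than bingo accuracy, which is consistent with the reported figures of roughly 60% and 30%.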

Processing (programming language) · contrastive · Decomposed · Harbin Institute of Technology (HIT) · Sparse

December 7, 2021

How do scientific papers with different levels of journals spread online? Exploring the temporal dynamics in the diffusion processes

Renmeng Cao,Xiaoke Xu,Yunxue Cui,Zhizhao Fang,Xianwen Wang

Social media has become an important channel for publicizing academic research, which provides an opportunity for each scientific paper to become a hit. Employing a dataset of about 10 million tweets of 584,264 scientific papers from 2012 to 2018, this study investigates the differential diffusion of elite and non-elite journal papers (divided by average journal impact factor percentile). We find that non-elite journal papers are diffused deeper and farther than elite journal papers, showing a diffusion pattern of multiple rounds of sparse, short-duration, small-scale bursts. In contrast, the diffusion of elite journal papers is characterized by a small number of persistent, dense, large-scale bursts. We also discover that elite journal papers are more inclined to broadcast diffusion while non-elite journal papers prefer viral diffusion. Elite journal papers are generally disseminated to many loosely connected communities, while non-elite journal papers are diffused to several densely connected communities.

MSMARCO · Identifiable · Performer · Scenario · Topic

December 6, 2021

A Sensitivity Analysis of the MSMARCO Passage Collection

Joel Mackenzie,Matthias Petri,Alistair Moffat

The recent MSMARCO passage retrieval collection has allowed researchers to develop highly tuned retrieval systems. One aspect of this data set that makes it distinctive compared to traditional corpora is that most of the topics only have a single answer passage marked relevant. Here we carry out a "what if" sensitivity study, asking whether a set of systems would still have the same relative performance if more passages per topic were deemed to be "relevant", exploring several mechanisms for identifying sets of passages to be so categorized. Our results show that, in general, while run scores can vary markedly if additional plausible passages are presumed to be relevant, the derived system ordering is relatively insensitive to additional relevance, providing support for the methodology that was used at the time the MSMARCO passage collection was created.
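The kind of "what if" comparison described above can be sketched on toy data. The abstract does not name an evaluation metric, so this sketch assumes mean reciprocal rank; the system names, topic and passage identifiers, and relevance judgments are all invented for illustration and do not come from the paper.

```python
def reciprocal_rank(ranking, relevant):
    """Return 1/rank of the first relevant passage in the ranking, or 0.0 if none is found."""
    for position, passage_id in enumerate(ranking, start=1):
        if passage_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(run, qrels):
    """Average the reciprocal rank over all judged topics."""
    return sum(reciprocal_rank(run[topic], qrels[topic]) for topic in qrels) / len(qrels)


def system_ordering(runs, qrels):
    """Sort system names from best to worst score under the given judgments."""
    return sorted(runs, key=lambda name: mean_reciprocal_rank(runs[name], qrels), reverse=True)


# Hypothetical runs: each system ranks three passages for two topics.
runs = {
    "system_a": {"t1": ["p1", "p2", "p3"], "t2": ["p5", "p4", "p6"]},
    "system_b": {"t1": ["p2", "p1", "p3"], "t2": ["p6", "p5", "p4"]},
}
# One relevant passage per topic, then a hypothetical expanded set.
sparse_qrels = {"t1": {"p1"}, "t2": {"p4"}}
expanded_qrels = {"t1": {"p1", "p2"}, "t2": {"p4", "p5"}}

for label, qrels in (("sparse", sparse_qrels), ("expanded", expanded_qrels)):
    scores = {name: mean_reciprocal_rank(run, qrels) for name, run in runs.items()}
    print(label, scores, system_ordering(runs, qrels))
```

In this toy case both systems score higher once extra passages are treated as relevant, yet their relative order stays the same, which mirrors the pattern the abstract reports.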

Latent variable · Estimation/estimator · End-to-end · Question answering · INFORMS

December 4, 2021

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Devendra Singh Sachan,Siva Reddy,William Hamilton,Chris Dyer,Dani Yogatama

from arxiv, NeurIPS 2021 camera-ready version

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

INFORMS · IR · Information retrieval · Rank · Learned

November 27, 2021

Pre-training Methods in Information Retrieval

Yixing Fan,Xiaohui Xie,Yinqiong Cai,Jia Chen,Xinyu Ma,Xiangsheng Li,Ruqing Zhang,Jiafeng Guo,Yiqun Liu

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. Recently, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since a large number of works have been dedicated to the application of PTMs in IR, we believe it is the right time to summarize the current status, learn from existing methods, and gain some insights for future development. In this survey, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and envision some promising directions, with the hope of inspiring more work on these topics in future research.

Automator · Processing (programming language) · Identifiable · INFORMS · AIM

August 26, 2021

A Survey on Automated Fact-Checking

Zhijiang Guo,Michael Schlichtkrull,Andreas Vlachos

from arxiv, 27 pages, 15 pages of references

Fact-checking has become increasingly important due to the speed with which both information and misinformation can spread in the modern media ecosystem. Therefore, researchers have been exploring how fact-checking can be automated, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the veracity of claims. In this paper, we survey automated fact-checking stemming from natural language processing, and discuss its connections to related tasks and disciplines. In this process, we present an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts. Finally, we highlight challenges for future research.

MoDELS · Collaborative filtering · INFORMS · Neural Networks · Networks

April 27, 2021

A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation

Le Wu,Xiangnan He,Xiang Wang,Kun Zhang,Meng Wang

from arxiv, In submission

Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review on neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize existing methods based on the taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which could be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data they used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content enriched models, which additionally utilize the side information associated with users and items, like user profile and item knowledge graph; and 3) context enriched models, which account for the contextual information associated with an interaction, such as time, location, and the past interactions. After reviewing representative works for each type, we finally discuss some promising directions in this field, including benchmarking recommender systems, graph reasoning based recommendation models, and explainable and fair recommendations for social good.

Facebook · Social Graph · Inversion · tuning · MoDELS

June 20, 2020

Embedding-based Retrieval in Facebook Search

Jui-Ting Huang,Ashish Sharma,Shuying Sun,Li Xia,David Zhang,Philip Pronin,Janani Padmanabhan,Giuseppe Ottaviano,Linjun Yang

from arxiv, 9 pages, 3 figures, 3 tables, to be published in KDD '20

Search in social networks such as Facebook poses different challenges than classical web search: besides the query text, it is important to take into account the searcher's context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced topics about modeling. We evaluated EBR on verticals for Facebook Search, with significant metric gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people develop embedding-based retrieval systems in search engines.
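The basic idea of embedding-based retrieval can be shown with a minimal sketch: the query and the documents are mapped to vectors, and the documents closest to the query are returned. This sketch assumes cosine similarity and an exhaustive scan, used here as a simple stand-in for the approximate nearest neighbor (ANN) search the abstract mentions. The function names, document identifiers, and two-dimensional embeddings are hypothetical and do not reflect Facebook's system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (both assumed to be nonzero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, document_embeddings, k):
    """Score every document against the query and return the k most similar document ids."""
    scored = [
        (cosine_similarity(query_embedding, embedding), doc_id)
        for doc_id, embedding in document_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; a real system would produce these with a trained model.
document_embeddings = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
query_embedding = [0.9, 0.1]

print(retrieve_top_k(query_embedding, document_embeddings, k=2))
```

An exhaustive scan like this costs time proportional to the number of documents for every query, which is why large-scale systems replace it with an ANN index and then tune its parameters.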

INFORMS · MoDELS · Rank · INTERACT · Performer

May 2, 2018

Leveraging Long and Short-term Information in Content-aware Movie Recommendation

Wei Zhao,Benyou Wang,Jianbo Ye,Min Yang,Zhou Zhao,Xiaojun Chen

Movie recommendation systems provide users with ranked lists of movies based on individuals' preferences and constraints. Two types of models are commonly used to generate ranking results: long-term models and session-based models. While long-term models represent the interactions between users and movies that are supposed to change slowly over time, session-based models encode users' interests and the changing dynamics of movies' attributes in the short term. In this paper, we propose an LSIC model, leveraging Long and Short-term Information in Content-aware movie recommendation using adversarial training. In the adversarial process, we train a generator as a reinforcement learning agent that recommends the next movie to a user sequentially. We also train a discriminator that attempts to distinguish the generated list of movies from the real records. The poster information of movies is integrated to further improve the performance of movie recommendation, which is especially important when few ratings are available. The experiments demonstrate that the proposed model has robust superiority over competitors and sets the state of the art. We will release the source code of this work after publication.

Topic model · MoDELS · Topic · Identifiable · INFORMS

April 5, 2018

Topic Modelling of Everyday Sexism Project Entries

Sophie Melville,Kathryn Eccles,Taha Yasseri

from arxiv, preprint, under review

The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modelling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we come up with a map of topics for two different resolutions of our topic model and discuss the connection between the identified topics. In the low resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another.