
Controlled topical vocabularies (CVs) are built into information systems to aid browsing and retrieval of items that may be unfamiliar, but it is unclear how this feature should be integrated with standard keyword searching. Few systems or scholarly prototypes have attempted this, and none have used the most widely used CV, the Library of Congress Subject Headings (LCSH), which organizes monograph collections in academic libraries throughout the world. This paper describes a working prototype of a Web application that concurrently allows topic exploration using an outline tree view of the LCSH hierarchy and natural language keyword searching of a real-world Science and Engineering bibliographic collection. Pilot testing shows the system is functional, and work to fit the complex LCSH structure into a usable hierarchy is ongoing. This study contributes to knowledge of the practical design decisions required when developing linked interactions between topical hierarchy browsing and natural language searching, which promise to facilitate information discovery and exploration.

Related content

INFORMS


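The journal 《計算機信息》 (Computer Information) publishes high-quality papers that broaden the scope of operations research and computing. It seeks original research papers on theory, methods, experiments, systems, and applications; novel survey and tutorial papers; and papers describing new and useful software tools.

Data visualization · Directed · INFORMS · Seven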

September 8, 2021

Towards Natural Language Interfaces for Data Visualization: A Survey

Leixian Shen,Enya Shen,Yuyu Luo,Xiaocong Yang,Xuming Hu,Xiongshuai Zhang,Zhiwei Tai,Jianmin Wang

from arxiv, 20 pages, 15 figures

Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than worrying about operating the interface to visualization tools. In the past two decades, leveraging advanced natural language processing technologies, numerous V-NLI systems have been developed both within academic research and commercial software, especially in recent years. In this article, we conduct a comprehensive review of the existing V-NLIs. In order to classify each paper, we develop categorical dimensions based on a classic information visualization pipeline with the extension of a V-NLI layer. The following seven stages are used: query understanding, data transformation, visual mapping, view transformation, human interaction, context management, and presentation. Finally, we also shed light on several promising directions for future work in the community.

Rank · Language modeling · MoDELS · state-of-the-art · Extensibility

June 25, 2021

Pre-trained Language Model based Ranking in Baidu Search

Lixin Zou,Shengqiang Zhang,Hengyi Cai,Dehong Ma,Suqi Cheng,Daiting Shi,Zhifan Zhu,Weiyue Su,Shuaiqiang Wang,Zhicong Cheng,Dawei Yin

from arxiv, 9 pages, 3 figures, 7 tables, SIGKDD 2021 accepted paper

As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web-document, prohibit their deployments in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system. In this work, we contribute a series of successfully applied techniques in tackling these exposed issues when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in the online search engine system. We first articulate a novel practice to cost-efficiently summarize the web document and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture. Then we endow an innovative paradigm to finely exploit the large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored for the online ranking system, aiming to stabilize the ranking signals across various online components. Extensive offline and online experimental results show that the proposed techniques significantly boost the search engine's performance.

Integration · Graph · Knowledge graph · Language modeling · Performer

December 21, 2020

CSKG: The CommonSense Knowledge Graph

Filip Ilievski,Pedro Szekely,Bin Zhang

from arxiv, arXiv admin note: substantial text overlap with arXiv:2006.06114

Sources of commonsense knowledge aim to support applications in natural language understanding, computer vision, and knowledge graphs. These sources contain knowledge that is complementary to each other, which makes their integration desirable. Yet, such integration is not trivial because of their different foci, modeling approaches, and sparse overlap. In this paper, we propose to consolidate commonsense knowledge by following five principles. We apply these principles to combine seven key sources into a first integrated CommonSense Knowledge Graph (CSKG). We perform an analysis of CSKG and its various text and graph embeddings, showing that CSKG is a well-connected graph and that its embeddings provide a useful entry point to the graph. Moreover, we show the impact of CSKG as a source for reasoning evidence retrieval, and for pre-training language models for generalizable downstream reasoning. CSKG and all its embeddings are made publicly available to support further research on commonsense knowledge integration and reasoning.

Facebook · Social Graph · Inversion · tuning · MoDELS

June 20, 2020

Embedding-based Retrieval in Facebook Search

Jui-Ting Huang,Ashish Sharma,Shuying Sun,Li Xia,David Zhang,Philip Pronin,Janani Padmanabhan,Giuseppe Ottaviano,Linjun Yang

from arxiv, 9 pages, 3 figures, 3 tables, to be published in KDD '20

Search in social networks such as Facebook poses different challenges than classical web search: besides the query text, it is important to take into account the searcher's context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced topics about modeling. We evaluated EBR on verticals for Facebook Search with significant metrics gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people develop embedding-based retrieval systems in search engines.
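As a rough illustration of the general idea behind embedding-based retrieval (not the system described in the paper), the sketch below ranks documents by cosine similarity between a query vector and document vectors. The vectors, the document ids, and the function names `cosine` and `top_k` are hypothetical, and the exhaustive scan stands in for the approximate nearest-neighbor (ANN) search a production system would use.

```python
import math

# Hypothetical 2-dimensional embeddings; a real system would learn these.
DOCS = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query (brute force)."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    print(top_k([1.0, 0.1], DOCS))
```

A Boolean matching model would only return documents that contain the query terms, whereas this kind of scoring can surface documents whose vectors are close to the query even when the terms differ.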

MINE · Comprehensibility · Taxonomy · Extensibility · Tencent QQ

May 21, 2019

A User-Centered Concept Mining System for Query and Document Understanding at Tencent

Bang Liu,Weidong Guo,Di Niu,Chaoyue Wang,Shunnan Xu,Jinghong Lin,Kunfeng Lai,Yu Xu

from arxiv, Accepted by KDD 2019

Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests, by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feed recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.

Chinese character recognition · Text recognition · Attribute space · Category · Convolutional neural network

August 27, 2018

Open Set Chinese Character Recognition using Multi-typed Attributes

Sheng He,Lambert Schomaker

from arxiv, 29 pages, submitted to Pattern Recognition

Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval methods, but new unseen classes can also be expected under open learning conditions (even for CNN). Chinese character recognition with zero or a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method using multi-type attributes, which are based on the pronunciation, structure and radicals of Chinese characters, applied to character recognition in historical books. This intermediate attribute code has a strong advantage over the common 'one-hot' class representation because it allows for understanding complex and unseen patterns symbolically using attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: Pinyin label, layout structure, number of strokes, three different input methods such as Cangjie, Zhengma and Wubi, as well as a four-corner encoding method. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be easily recognized by these attributes using a distance metric and a complete lexicon that is encoded in attribute space. We evaluate the proposed method on two open sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and one closed set (handwritten Chinese characters). Experimental results show a good general classification of seen classes but also a very promising generalization ability to unseen characters.
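The final recognition step described above, matching predicted attributes against a lexicon encoded in attribute space, can be sketched as a nearest-neighbor lookup. This is only an illustration under stated assumptions: the abstract does not say which distance metric is used, so Hamming distance is assumed here, and the four-bit attribute codes and the `LEXICON`, `hamming` and `recognize` names are made up for the example.

```python
# Hypothetical attribute codes; the real method uses attributes predicted by a CNN
# (Pinyin, layout structure, stroke count, input-method codes, four-corner code).
LEXICON = {
    "木": (1, 0, 0, 1),
    "林": (1, 1, 0, 1),
    "森": (1, 1, 1, 1),
}


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(predicted, lexicon=LEXICON):
    """Return the lexicon character whose attribute code is closest to the prediction."""
    return min(lexicon, key=lambda ch: hamming(lexicon[ch], predicted))


if __name__ == "__main__":
    # A noisy prediction that matches no entry exactly still resolves to the nearest one.
    print(recognize((1, 1, 1, 0)))
```

Because the lexicon can contain codes for characters that never appeared in training, a lookup of this kind is what allows unseen classes to be recognized at all.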

Sentiment analysis · Recommender systems · Benchmark · Dataset · Mixed

March 18, 2018

Sentiment Analysis of Code-Mixed Indian Languages: An Overview of SAIL_Code-Mixed Shared Task @ICON-2017

Braja Gopal Patra,Dipankar Das,Amitava Das

Sentiment analysis is essential in many real-world applications such as stance detection, review analysis, recommendation systems, and so on. Sentiment analysis becomes more difficult when the data is noisy and collected from social media. India is a multilingual country; people use more than one language to communicate among themselves. Switching between languages is called code-switching or code-mixing, depending upon the type of mixing. This paper presents an overview of the shared task on sentiment analysis of code-mixed data pairs of Hindi-English and Bengali-English collected from different social media platforms. The paper describes the task, dataset, evaluation, baseline and participants' systems.

Processing (programming language) · NLP · INFORMS · Question answering · Attention mechanism

August 17, 2017

Natural Language Processing: State of The Art, Current Trends and Challenges

Diksha Khurana,Aditya Koli,Kiran Khatter,Sukhdev Singh

from arxiv, 25 pages

Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally. Its applications have spread to various fields such as machine translation, email spam detection, information extraction, summarization, medicine, and question answering. The paper distinguishes four phases: it discusses the different levels of NLP and the components of Natural Language Generation (NLG), then presents the history and evolution of NLP, the state of the art across the various applications of NLP, and current trends and challenges.

Big data · Comprehensibility · Composite data · Processing (programming language) · Better

January 15, 2016

Big Data: Understanding Big Data

Kevin Taylor-Sakyi

from arxiv, 8 pages, Big Data Analytics, Data Storage, MapReduce, Knowledge-Space, Big Data Inconsistencies

Steve Jobs, one of the greatest visionaries of our time, was quoted in 1996 as saying "a lot of times, people do not know what they want until you show it to them" [38], indicating that he advocated developing products based on human intuition rather than research. With the advancements of mobile devices, social networks and the Internet of Things, enormous amounts of complex data, both structured and unstructured, are being captured in the hope of allowing organizations to make better business decisions, as data is now vital for an organization's success. These enormous amounts of data are referred to as Big Data, which enables a competitive advantage over rivals when processed and analyzed appropriately. However, Big Data Analytics has a few concerns, including Management of Data-lifecycle, Privacy & Security, and Data Representation. This paper reviews the fundamental concept of Big Data, the Data Storage domain, and the MapReduce programming paradigm used in processing these large datasets, and focuses on two case studies showing the effectiveness of Big Data Analytics and how it could be of greater good in the future if handled appropriately.

Discriminator · Color · Oracle · Multimedia · Pattern recognition

November 20, 2012

Content based video retrieval

B. V. Patel,B. B. Meshram

Content based video retrieval is an approach for facilitating the searching and browsing of large image collections over the World Wide Web. In this approach, video analysis is conducted on low level visual properties extracted from video frames. We believed that in order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique which employs multiple features for indexing and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate this claim, content based indexing and retrieval systems were implemented using color histogram, various texture features and other approaches. Videos were stored in Oracle 9i Database and a user study measured correctness of response.