
To enable materials databases supporting computational and experimental research, it is critical to develop platforms that both facilitate access to the data and provide the tools used to generate and analyze it, all while considering the diversity of users' experience levels and usage needs. The recently formulated FAIR principles (Findable, Accessible, Interoperable, and Reusable) establish a common framework to aid these efforts. This article describes aflow.org, a web ecosystem developed to provide FAIR-compliant access to the AFLOW databases. Graphical and programmatic retrieval methods are offered, ensuring accessibility for all experience levels and data needs. aflow.org goes beyond data access by providing applications for important features of the AFLOW software, assisting users in their own calculations without the need to install the entire high-throughput framework. Outreach commitments to provide AFLOW tutorials and materials science education to a global and diverse audience are also presented.
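To make the programmatic route concrete, the sketch below composes an AFLUX-style search request in Python. The base URL, the property keywords in the matchbook, and the response fields are assumptions for illustration and should be checked against the current AFLOW API documentation.

```python
import requests

# Assumed AFLUX-style endpoint; verify the current base URL and the
# supported property keywords against the AFLOW documentation.
AFLUX_BASE = "http://aflow.org/API/aflux/"

# AFLUX "matchbook": binary compounds containing Si with a band gap
# between 1 and 5 eV, first page of results (keywords illustrative).
matchbook = "species(Si),nspecies(2),Egap(1*,*5),$paging(1)"

resp = requests.get(AFLUX_BASE + "?" + matchbook, timeout=30)
resp.raise_for_status()

for entry in resp.json():   # each entry is a dict of requested properties
    print(entry.get("compound"), entry.get("Egap"))
```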

Related Content

Science, Technology and Innovation (STI) decision-makers often need a clear vision of what is researched and by whom in order to design effective policies. Such a vision is provided by effective and comprehensive mappings of the research activities carried out within their institutional boundaries. A major challenge in this context is the difficulty of accessing the relevant data and of combining information coming from different sources: traditionally, STI data has been confined within closed data sources and, when available, is categorised with different taxonomies. Here, we present a proof-of-concept study using Open Resources to map the research landscape on Sustainable Development Goal (SDG) 13 (Climate Action) for an entire country, Denmark, and we map it onto the 25 ERC panels.

As web archives' holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms: Archive-It, Conifer, the Croatian Web Archive (HAW), the Internet Archive's user account web archives, Library of Congress (LC), PANDORA, Trove, and the UK Web Archive (UKWA). We note a plethora of different approaches to web archive collection structures. Some web archive collections support sub-collections and some permit embargoes. Curatorial decisions may be attributed to a single organization or to many. Archived web pages are known by many names: mementos, copies, captures, or snapshots. Some platforms restrict a memento to a single collection and others allow mementos to cross collections. Knowledge of collection structures has implications for many different applications and users. Visitors need to understand how to navigate collections. Future archivists need to understand what options are available for designing collections. Platform designers need to know what possibilities exist. Developers of tools that consume collections need to understand collection structures so they can meet the needs of their users.
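To make these structural options concrete, the sketch below captures them in a small Python data model; the class and field names are our own distillation of the review, not the schema of any reviewed platform.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Memento:
    """One archived capture (also called a copy or snapshot) of a live URI."""
    original_uri: str
    capture_time: datetime

@dataclass
class Collection:
    name: str
    curators: list[str]                       # one organization or many
    embargo_until: Optional[datetime] = None  # some platforms permit embargoes
    subcollections: list["Collection"] = field(default_factory=list)
    mementos: list[Memento] = field(default_factory=list)

# Whether a memento may belong to more than one collection is a platform
# policy; this model permits sharing a Memento object by reference.
```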

Various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing for efficient querying. This work proposes a definition and components of a new secure and efficient outsourced database system, which answers various types of queries with different privacy guarantees in different security models. The work starts with a survey of five order-revealing encryption schemes that can be used directly in many database indices and five range query protocols with various security/efficiency tradeoffs. The survey systematizes the state-of-the-art range query solutions in a snapshot adversary setting and offers some non-obvious observations regarding the efficiency of the constructions. In $\mathcal{E}\text{psolute}$, a secure range query engine, security is achieved in a setting with a much stronger adversary, who can continuously observe everything on the server and for whom leaking even the result size can enable a reconstruction attack. $\mathcal{E}\text{psolute}$ offers a definition, construction, analysis, and experimental evaluation of a system that provably hides both access pattern and communication volume while remaining efficient. The work concludes with $k\text{-anon}$, a secure similarity search engine in a snapshot adversary model. Here, the security of $k\text{NN}$ queries is achieved similarly to OPE/ORE solutions: the input is encrypted with an approximate Distance Comparison Preserving Encryption scheme, so that the inputs, points in a hyperspace, are perturbed, but the query algorithm still produces accurate results. We use TREC datasets and queries for the search and track ranking-quality metrics such as MRR and nDCG. For the attacks, we build an LSTM model that trains on the correlation between a sentence and its embedding and then predicts words from the embedding.
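A toy sketch can illustrate the approximate Distance Comparison Preserving Encryption idea: points are scaled, rotated, and perturbed with bounded noise, so that distances differing by more than the noise bound keep their order. The construction and parameters below are illustrative simplifications, not the scheme analyzed in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

def keygen(dim, s=100.0, beta=1.0):
    # Secret key: scaling factor, random rotation, and a noise bound.
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return {"s": s, "rot": q, "beta": beta}

def encrypt(key, x):
    # Scale and rotate the point, then add noise bounded by s * beta / 4.
    noise = rng.uniform(-1, 1, size=x.shape) * key["s"] * key["beta"] / 4
    return key["s"] * (key["rot"] @ x) + noise

# If two original distances differ by more than roughly beta, their
# encrypted counterparts compare in the same order.
key = keygen(dim=3)
a, b, c = rng.normal(size=(3, 3))
plain = np.linalg.norm(a - b) < np.linalg.norm(a - c)
cipher = (np.linalg.norm(encrypt(key, a) - encrypt(key, b))
          < np.linalg.norm(encrypt(key, a) - encrypt(key, c)))
print(plain, cipher)
```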

E-commerce queries are often short and ambiguous. Consequently, query understanding often uses query rewriting to disambiguate user-input queries. While using e-commerce search tools, users tend to enter multiple searches before purchasing; we call these history searches the context. These searches contain contextual insights about users' true shopping intents, so modeling such contextual information is critical to a better query rewriting model. However, existing query rewriting models ignore users' history behaviors and consider only the instant search query, which is often a short string offering limited information about the true shopping intent. We propose an end-to-end context-aware query rewriting model to bridge this gap, which takes the search context into account. Specifically, our model builds a session graph using the history search queries and their contained words. We then employ a graph attention mechanism that models cross-query relations and computes contextual information of the session. The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network. The session representations are then decoded to generate rewritten queries. Empirically, we demonstrate the superiority of our method over state-of-the-art approaches under various metrics. On in-house data from an online shopping platform, introducing contextual information lets our model achieve an 11.6% improvement under the MRR (Mean Reciprocal Rank) metric and a 20.1% improvement under the HIT@16 metric (a hit-rate metric) over the best baseline method (a Transformer-based model).
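A minimal sketch may help fix the session-graph idea: queries and their words become nodes, query-word containment becomes edges, and an attention step pools neighbor information. The embeddings, scoring function, and aggregation below are stand-ins, not the paper's exact architecture.

```python
import numpy as np

# Nodes: history queries plus the words they contain (example session).
session = ["red running shoes", "nike shoes men"]
words = sorted({w for q in session for w in q.split()})
nodes = session + words
idx = {n: i for i, n in enumerate(nodes)}

# Edges: query <-> contained word; cross-query relations emerge through
# shared word nodes such as "shoes".
adj = np.zeros((len(nodes), len(nodes)))
for q in session:
    for w in q.split():
        adj[idx[q], idx[w]] = adj[idx[w], idx[q]] = 1.0

rng = np.random.default_rng(0)
emb = rng.normal(size=(len(nodes), 16))   # stand-in node embeddings

# One attention step: softmax over neighbors, then pool their embeddings.
scores = emb @ emb.T
scores[adj == 0] = -np.inf
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
context = weights @ emb                   # contextual node representations
```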

A research division plays an important role in driving innovation in an organization. Drawing insights, following trends, keeping abreast of new research, and formulating strategies are increasingly becoming more challenging for both researchers and executives as the amount of information grows in both velocity and volume. In this paper we present a use case of how a corporate research community, IBM Research, utilizes Semantic Web technologies to induce a unified Knowledge Graph from both structured and textual data obtained by integrating various applications used by the community related to research projects, academic papers, datasets, achievements, and recognition. In order to make the Knowledge Graph more accessible to application developers, we identified a set of common patterns for exploiting the induced knowledge and exposed them as APIs. Those patterns were born out of user research which identified the most valuable use cases and user pain points to be alleviated. We outline two distinct scenarios: recommendation and analytics for business use. We discuss these scenarios in detail and provide an empirical evaluation on entity recommendation specifically. The methodology used and the lessons learned from this work can be applied to other organizations facing similar challenges.
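As a purely hypothetical illustration of exposing such a pattern as an API, a "related entities" lookup might be called roughly as follows; the endpoint, parameters, and response shape are invented for this sketch and are not IBM's actual services.

```python
import requests

# Hypothetical endpoint and parameters for one common pattern,
# "entities related to a given entity"; these names are not IBM's.
resp = requests.get(
    "https://kg.example.com/api/v1/related",
    params={"entity": "paper:1234", "relation": "usesDataset", "top_k": 5},
    timeout=10,
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(hit["entity"], hit["score"])
```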

We introduce two new tools to assess the validity of statistical distributions. These tools are based on components derived from a new statistical quantity, the $comparison$ $curve$. The first tool is a graphical representation of these components on a $bar$ $plot$ (B plot), which can provide a detailed appraisal of the validity of the statistical model, in particular when supplemented by acceptance regions related to the model. The knowledge gained from this representation can sometimes suggest an existing $goodness$-$of$-$fit$ test to supplement this visual assessment with a control of the type I error. Otherwise, an adaptive test may be preferable, and the second tool is the combination of these components to produce a powerful $\chi^2$-type goodness-of-fit test. Because the number of these components can be large, we introduce a new selection rule to decide, in a data-driven fashion, how many of them to take into consideration. In simulations, our goodness-of-fit tests are seen to be power-wise competitive with the best solutions that have been recommended in the context of a fully specified model, as well as when some parameters must be estimated. Practical examples show how to use these tools to derive principled information about where the model departs from the data.
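For intuition, a classical data-driven smooth test follows the same recipe of accumulating squared components and penalizing their number. The sketch below uses normalized Legendre components of probability-integral-transformed data with a Schwarz-type selection rule, in the Neyman/Ledwina tradition; the paper's comparison-curve components and selection rule differ in detail.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import norm

def components(u, k_max):
    """Normalized Legendre components of PIT values u in [0, 1]."""
    n = len(u)
    return np.array([
        np.sqrt(2 * j + 1) * eval_legendre(j, 2 * u - 1).sum() / np.sqrt(n)
        for j in range(1, k_max + 1)
    ])

def data_driven_statistic(x, cdf, k_max=10):
    u = cdf(x)                    # probability integral transform
    c = components(u, k_max)
    cum = np.cumsum(c ** 2)       # T_k: sum of first k squared components
    penalty = np.arange(1, k_max + 1) * np.log(len(x))
    k = int(np.argmax(cum - penalty)) + 1   # Schwarz-type selection rule
    return cum[k - 1], k

# Example: test a sample against the standard normal model.
rng = np.random.default_rng(1)
stat, k = data_driven_statistic(rng.standard_normal(200), norm.cdf)
print(f"T = {stat:.2f} with k = {k} data-driven components")
```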

Main subjects usually exist in images or videos, as they are the objects that the photographer wants to highlight. Human viewers can easily identify them, but algorithms often confuse them with other objects. Detecting the main subjects is an important technique to help machines understand the content of images and videos. We present a new dataset with the goal of training models to understand the layout of the objects and the context of the image, and then to find the main subjects among them. This is achieved in three aspects. By gathering images from movie shots created by directors with professional shooting skills, we collect a dataset with strong diversity; specifically, it contains 107,700 images from 21,540 movie shots. We labeled them with bounding-box labels for two classes: subject and non-subject foreground object. We present a detailed analysis of the dataset and compare the task with saliency detection and object detection. ImageSubject is the first dataset that tries to localize the subject in an image that the photographer wants to highlight. Moreover, we find that the transformer-based detection model offers the best results among popular model architectures. Finally, we discuss the potential applications and conclude with the importance of the dataset.

Transparency in Machine Learning (ML) attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance the human factors engineering goals of human-centered AI for target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e., a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users is critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging due to the limited availability of and access to end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during either development or evaluation. Additionally, the lack of user research and the sporadic validation of transparency claims put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users and, thus, clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design, to understand user needs and domain requirements. Following this process produces evidence to support design choices and, ultimately, increases the likelihood that the algorithms afford transparency.

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) work that improves model performance through data processing, (2) work that improves model performance through interventional model training, and (3) the design of independent human-in-the-loop systems. Using this categorization, we summarize the major approaches in the field along with their technical strengths and weaknesses, and we briefly classify and discuss applications in natural language processing, computer vision, and other domains. In addition, we describe open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop research and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.
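As a concrete instance of the first category, where humans supply training data where the model needs it most, the sketch below implements uncertainty-based active learning; `ask_human` is a placeholder for a real annotation interface.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ask_human(x):
    # Placeholder oracle standing in for a human annotator.
    return int(x.sum() > 0)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
labeled = list(range(20))                 # small seed set labeled upfront
y = {i: ask_human(X[i]) for i in labeled}

for _ in range(5):                        # five human-feedback rounds
    model = LogisticRegression().fit(X[labeled], [y[i] for i in labeled])
    pool = [i for i in range(len(X)) if i not in y]
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)   # least-confident sampling
    pick = pool[int(np.argmax(uncertainty))]
    y[pick] = ask_human(X[pick])          # the human supplies this label
    labeled.append(pick)

print(f"labeled {len(labeled)} of {len(X)} examples")
```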

Deep neural models have in recent years been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding heavy computational power and failing to be deployed on edge devices. Besides, the performance boost is highly dependent on large amounts of labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called `Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which have been actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey of the recent progress of KD methods, together with S-T frameworks, typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potential and open challenges of existing methods and prospect the future directions of KD and S-T learning.
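The canonical S-T setup can be summarized in a few lines: the student is trained against softened teacher predictions mixed with the usual supervised loss. The sketch below is the standard Hinton-style distillation loss, shown for orientation rather than as any specific surveyed method.

```python
import torch
import torch.nn.functional as F

# Classic S-T distillation loss: the student matches softened teacher
# outputs via KL divergence, mixed with cross-entropy on true labels.
# Temperature T and mixing weight alpha are the usual hyperparameters.
def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                   # rescale gradients to match hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage inside a training step (teacher frozen, student trainable):
# loss = kd_loss(student(x), teacher(x).detach(), y)
```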
