
Various stakeholders, such as researchers, government agencies, businesses, and research laboratories, require a large volume of reliable scientific research outcomes, including research articles and patent data, to support their work. These data are crucial for a variety of applications, such as advancing scientific research, conducting business evaluations, and undertaking policy analysis. However, collecting such data is often a time-consuming and laborious task. Consequently, many users turn to openly accessible data for their research. However, existing open dataset releases typically suffer from a lack of relationships between different data sources and limited temporal coverage. To address this issue, we present a new open dataset, the Intelligent Innovation Dataset (IIDS), which comprises six interrelated datasets spanning nearly 120 years, encompassing paper information, paper citation relationships, patent details, patent legal statuses, and funding information. The extensive contextual and temporal coverage of the IIDS dataset will provide researchers, practitioners, and policy makers with comprehensive data support, enabling them to conduct in-depth scientific research and comprehensive data analyses.
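To give a feel for how the interrelated tables in such a release could be combined, here is a minimal pandas sketch; the file names and column names (e.g., `paper_id`, `cited_id`) are assumptions for illustration, since the IIDS schema is not described above.

```python
# Hypothetical sketch: linking the paper table with the citation table.
# File and column names are assumed, not taken from the actual IIDS release.
import pandas as pd

papers = pd.read_csv("iids_papers.csv")        # assumed columns: paper_id, title, year, ...
citations = pd.read_csv("iids_citations.csv")  # assumed columns: citing_id, cited_id

# Count how often each paper is cited, then attach the count to the paper table.
cite_counts = citations.groupby("cited_id").size().rename("citation_count")
papers = papers.merge(cite_counts, left_on="paper_id", right_index=True, how="left")
papers["citation_count"] = papers["citation_count"].fillna(0).astype(int)
```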

Related Content

A data set (or dataset), also called a data collection, is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to a given member of the data set in question, listing a value for each of the variables, such as the height and weight of an object or values of random numbers. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.
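The row/column structure described above can be made concrete with a tiny example; the numbers below are made up purely for illustration.

```python
# Each column is a variable (height, weight); each row is one member of the data set;
# each cell holds a single datum. Values are invented for illustration.
import pandas as pd

data = pd.DataFrame(
    {"height_cm": [172.0, 168.5, 181.2],
     "weight_kg": [70.4, 59.8, 84.1]},
    index=["member_1", "member_2", "member_3"],
)
print(data)
print(data.shape)  # (3, 2): three members, two variables
```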

In many research fields, researchers aim to identify significant associations between a set of explanatory variables and a response while controlling the false discovery rate (FDR). To this end, we develop a fully Bayesian generalization of the classical model-X knockoff filter. The knockoff filter introduces controlled noise into the model in the form of cleverly constructed copies of the predictors, used as auxiliary variables. In our approach, we consider the joint model of the covariates and the response and incorporate the conditional independence structure of the covariates into the prior distribution of the auxiliary knockoff variables. We further incorporate the estimation of a graphical model among the covariates, which in turn aids knockoff generation and improves the estimation of the covariate effects on the response. We use a modified spike-and-slab prior on the regression coefficients, which avoids the increase in model dimension that is typical of the classical knockoff filter. Our model performs variable selection using an upper bound on the posterior probability of non-inclusion. We show how our model construction leads to valid model-X knockoffs and demonstrate that the proposed characterization is sufficient for controlling the Bayesian FDR (BFDR) at an arbitrary level, in finite samples. We also show that the model selection is robust to the estimation of the precision matrix. We use simulated data to demonstrate that our proposal increases the stability of the selection with respect to classical knockoff methods, as it relies on the entire posterior distribution of the knockoff variables instead of a single sample. With respect to Bayesian variable selection methods, we show that our selection procedure achieves comparable or better performance while maintaining control over the FDR. Finally, we show the usefulness of the proposed model with an application to real data.
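As background for what the paper generalizes, the selection step of the classical (non-Bayesian) knockoff filter can be sketched as follows; the knockoff statistics `W` (e.g., W_j = |beta_j| - |beta_j_knockoff|) are assumed to have been computed beforehand, and this is not the Bayesian procedure proposed above.

```python
# Minimal sketch of the knockoff+ selection threshold used by the classical
# model-X knockoff filter to control the FDR at level q.
import numpy as np

def knockoff_select(W, q=0.1):
    """W: knockoff statistics, one per covariate; returns indices selected at target FDR q."""
    candidates = np.sort(np.abs(W[W != 0]))       # candidate thresholds, ascending
    threshold = np.inf
    for t in candidates:
        # Estimated false discovery proportion at threshold t (knockoff+ form).
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            threshold = t
            break
    return np.where(W >= threshold)[0]
```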

A yet unmet challenge in algorithmic fairness is the problem of intersectionality, that is, achieving fairness across the intersection of multiple groups -- and verifying that such fairness has been attained. Because intersectional groups tend to be small, verifying whether a model is fair raises statistical as well as moral-methodological challenges. This paper (1) elucidates the problem of intersectionality in algorithmic fairness, (2) develops desiderata to clarify the challenges underlying the problem and guide the search for potential solutions, (3) illustrates the desiderata and potential solutions by sketching a proposal using simple hypothesis testing, and (4) evaluates, partly empirically, this proposal against the proposed desiderata.
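To make the flavor of point (3) concrete, here is a hedged sketch of one simple hypothesis-testing setup over intersectional subgroups; this is an illustration of the general idea, not necessarily the paper's exact proposal, and the function and variable names are invented.

```python
# For each intersection of two group attributes, test whether the subgroup's
# positive-prediction rate differs from the overall rate (exact binomial test).
import numpy as np
from scipy import stats

def subgroup_rate_tests(y_pred, group_a, group_b):
    """y_pred: 0/1 predictions (np.ndarray); group_a, group_b: per-sample group labels."""
    overall = y_pred.mean()
    results = {}
    for a in np.unique(group_a):
        for b in np.unique(group_b):
            mask = (group_a == a) & (group_b == b)
            n = int(mask.sum())
            if n == 0:
                continue
            k = int(y_pred[mask].sum())
            p_value = stats.binomtest(k, n, overall).pvalue
            results[(a, b)] = {"rate": k / n, "n": n, "p_value": p_value}
    return results
```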

This chapter seeks to support software engineering (SE) researchers and educators in teaching the importance of theory as well as the theorizing process. Drawing on insights from other fields, the chapter presents 12 intermediate products of theorizing and what they mean in an SE context. These intermediate products serve different roles: some are theory products that frame research studies, some are theory generators, and others are components of theory. While the SE domain does not have many theories of its own, these intermediate products of theorizing can be found widely. The chapter aims to help readers recognize these intermediate products, their roles, and how they can help in the theorizing process within SE research. To illustrate their utility, the chapter then applies the set of intermediate theorizing products to the software architecture research field. The chapter ends with a suggested structure for a 12-week course on theorizing in SE, which can be readily adapted by educators.

Large Language Models (LLMs) often encounter conflicts between their learned internal knowledge (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK). Understanding how LLMs prioritize one knowledge source over the other remains a challenge. In this paper, we propose a novel probing framework to explore the mechanisms governing the selection between PK and CK in LLMs. Using controlled prompts designed to contradict the model's PK, we demonstrate that specific model activations are indicative of the knowledge source employed. We evaluate this framework on various LLMs of different sizes and demonstrate that mid-layer activations, particularly those related to relations in the input, are crucial in predicting knowledge source selection, paving the way for more reliable models capable of handling knowledge conflicts effectively.
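A minimal sketch of what such a probe could look like is shown below; the layer choice, the pre-extracted activation files, and the label convention are assumptions for illustration, not details from the paper.

```python
# Train a linear probe on mid-layer activations to predict which knowledge source
# the model used (0 = parametric knowledge, 1 = contextual knowledge).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.load("mid_layer_activations.npy")    # assumed: (n_prompts, hidden_dim), extracted beforehand
y = np.load("knowledge_source_labels.npy")  # assumed: 0/1 label per prompt

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```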

Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. High-performing DNNs, however, rely heavily on intensive resource consumption. For example, training a DNN requires high dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on a cloud server with a large number of super-computers, a high-bandwidth communication bus, a shared storage infrastructure, and a high-power supply. Recently, new emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices often have a rather small amount of resources. To deploy DNNs on edge devices, we need to reduce the size of DNNs, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we study four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and develop different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario.
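As one concrete (and generic) example of shrinking a model for edge deployment, post-training dynamic quantization in PyTorch trades a little accuracy for much smaller storage and faster CPU inference; this is a standard technique shown for illustration, not one of the dissertation's own methods.

```python
# Post-training dynamic quantization: replace Linear layers with int8 versions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
# `quantized` stores the Linear weights as int8, reducing size and speeding up CPU inference.
```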

Since the 1950s, machine translation (MT) has become one of the important tasks of AI research and development, and has experienced several different periods and stages of development, including rule-based methods, statistical methods, and recently proposed neural network-based learning methods. Accompanying these staged leaps is the research and development of MT evaluation, especially the important role of evaluation methods in statistical translation and neural translation research. The evaluation task of MT is not only to assess the quality of machine translation, but also to give timely feedback to machine translation researchers on the problems existing in machine translation itself, how to improve it, and how to optimise it. In some practical application fields, such as in the absence of reference translations, the quality estimation of machine translation plays an important role as an indicator revealing the credibility of automatically translated target languages. This report mainly includes the following contents: a brief history of machine translation evaluation (MTE), the classification of research methods on MTE, and the cutting-edge progress, including human evaluation, automatic evaluation, and evaluation of evaluation methods (meta-evaluation). Human evaluation and automatic evaluation each include reference-translation-based and reference-translation-independent approaches; automatic evaluation methods include traditional n-gram string matching, models applying syntax and semantics, and deep learning models; evaluation of evaluation methods includes estimating the credibility of human evaluations, the reliability of the automatic evaluation, the reliability of the test set, etc. Advances in cutting-edge evaluation methods include task-based evaluation, using pre-trained language models based on big data, and lightweight optimisation models using distillation techniques.
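To illustrate the traditional n-gram string matching mentioned above, the core ingredient of BLEU-style metrics is a clipped n-gram precision; the sketch below computes it against a single reference and omits the brevity penalty and the averaging over several n-gram orders.

```python
# Clipped n-gram precision of a hypothesis against one reference translation.
from collections import Counter

def ngram_precision(hypothesis, reference, n=2):
    hyp_tokens, ref_tokens = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
    return overlap / max(1, sum(hyp_ngrams.values()))

print(ngram_precision("the cat sat on the mat", "the cat is on the mat", n=2))  # 0.6
```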

Graph neural networks (GNNs) are widely used to learn powerful representations of graph-structured data. Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks can further improve graph representations. However, there is an inherent gap between self-supervised tasks and downstream tasks in terms of optimization objective and training data. Conventional pre-training methods may not be effective enough for knowledge transfer since they do not make any adaptation for downstream tasks. To solve such problems, we propose a new transfer learning paradigm for GNNs which can effectively leverage self-supervised tasks as auxiliary tasks to help the target task. Our methods adaptively select and combine different auxiliary tasks with the target task in the fine-tuning stage. We design an adaptive auxiliary loss weighting model to learn the weights of auxiliary tasks by quantifying the consistency between auxiliary tasks and the target task. In addition, we learn the weighting model through meta-learning. Our methods can be applied to various transfer learning approaches and perform well not only in multi-task learning but also in pre-training and fine-tuning. Comprehensive experiments on multiple downstream tasks demonstrate that the proposed methods can effectively combine auxiliary tasks with the target task and significantly improve performance compared to state-of-the-art methods.
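The weighted combination of the target loss with auxiliary losses can be sketched in plain PyTorch as below; the meta-learning update of the weighting model is omitted, and this is only an illustration of the weighted-loss idea, not the paper's full method.

```python
# Learnable weighting of auxiliary self-supervised losses added to the target loss.
import torch
import torch.nn as nn

class WeightedAuxLoss(nn.Module):
    def __init__(self, num_aux_tasks):
        super().__init__()
        # Unconstrained logits, mapped through softmax so the weights stay positive.
        self.logits = nn.Parameter(torch.zeros(num_aux_tasks))

    def forward(self, target_loss, aux_losses):
        """target_loss: scalar tensor; aux_losses: list of scalar tensors."""
        weights = torch.softmax(self.logits, dim=0)
        return target_loss + torch.sum(weights * torch.stack(aux_losses))
```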

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models, and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.
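The three key components named above compose into a simple pipeline; the sketch below only shows how the stages fit together, with placeholder functions standing in for real models.

```python
# Schematic TTS pipeline: text analysis -> acoustic model -> vocoder.
# All three functions are placeholders that only illustrate the interfaces.
import numpy as np

def text_analysis(text):
    # Real front ends do normalization, grapheme-to-phoneme conversion, prosody tagging.
    return text.lower().split()

def acoustic_model(units):
    # Real acoustic models predict a mel-spectrogram from the linguistic units.
    return np.zeros((len(units) * 10, 80))   # (frames, mel bins), dummy values

def vocoder(mel):
    # Real vocoders synthesize a waveform from the mel-spectrogram.
    return np.zeros(mel.shape[0] * 256)      # assume a hop size of 256 samples per frame

waveform = vocoder(acoustic_model(text_analysis("Text to speech example.")))
```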

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
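For reference, the standard contrastive (InfoNCE) lower bound that such decompositions build on can be sketched as below; DEMI would sum bounds of this kind over unconditional and conditional terms, but the code shows only the generic bound.

```python
# InfoNCE-style lower bound on MI between paired views; the estimate is capped at log K.
import torch
import torch.nn.functional as F

def infonce_bound(z_x, z_y, temperature=0.1):
    """z_x, z_y: (batch, dim) paired representations of two views of the same context."""
    z_x, z_y = F.normalize(z_x, dim=-1), F.normalize(z_y, dim=-1)
    logits = z_x @ z_y.t() / temperature          # (batch, batch) similarity scores
    labels = torch.arange(z_x.size(0))            # positives lie on the diagonal
    # log K minus the cross-entropy loss is the InfoNCE estimate of MI.
    return torch.log(torch.tensor(float(z_x.size(0)))) - F.cross_entropy(logits, labels)
```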

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget requirement, compared with randomized controlled trials. Embracing the rapidly developing machine learning field, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are summarized, which will help researchers and practitioners explore, evaluate, and apply the causal inference methods.
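As a small worked example of an estimator under the potential outcome framework, the sketch below implements inverse propensity weighting (IPW) for the average treatment effect; it presupposes the standard identification assumptions and is a generic illustration rather than any specific method from the survey.

```python
# Inverse propensity weighting (IPW) estimate of the average treatment effect (ATE).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, treatment, outcome):
    """X: covariates; treatment: 0/1 assignment; outcome: observed response (np.ndarrays)."""
    propensity = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]
    propensity = np.clip(propensity, 0.01, 0.99)   # avoid extreme weights
    treated = treatment * outcome / propensity
    control = (1 - treatment) * outcome / (1 - propensity)
    return np.mean(treated - control)
```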
