亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.

相關內容

ACM/IEEE第23屆模型驅動工程語言和系統國際會議,是模型驅動軟件和系統工程的首要會議系列,由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來,模型涵蓋了建模的各個方面,從語言和方法到工具和應用程序。模特的參加者來自不同的背景,包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇,參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會,并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。 官網鏈接: · 得分 · 查準率/準確率 · Less · Better ·
2023 年 8 月 30 日

A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.

Pre-trained Language Models (PLMs) which are trained on large text corpus via self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.

Random linear codes (RLCs) are well known to have nice combinatorial properties and near-optimal parameters in many different settings. However, getting explicit constructions matching the parameters of RLCs is challenging, and RLCs are hard to decode efficiently. This motivated several previous works to study the problem of partially derandomizing RLCs, by applying certain operations to an explicit mother code. Among them, one of the most well studied operations is random puncturing, where a series of works culminated in the work of Guruswami and Mosheiff (FOCS' 22), which showed that a random puncturing of a low-biased code is likely to possess almost all interesting local properties of RLCs. In this work, we provide an in-depth study of another, dual operation of random puncturing, known as random shortening, which can be viewed equivalently as random puncturing on the dual code. Our main results show that for any small $\varepsilon$, by starting from a mother code with certain weaker conditions (e.g., having a large distance) and performing a random (or even pseudorandom) shortening, the new code is $\varepsilon$-biased with high probability. Our results hold for any field size and yield a shortened code with constant rate. This can be viewed as a complement to random puncturing, and together, we can obtain codes with properties like RLCs from weaker initial conditions. Our proofs involve several non-trivial methods of estimating the weight distribution of codewords, which may be of independent interest.

Crossed random effects structures arise in many scientific contexts. They raise severe computational problems with likelihood and Bayesian computations scaling like $N^{3/2}$ or worse for $N$ data points. In this paper we develop a composite likelihood approach for crossed random effects probit models. For data arranged in rows and columns, one likelihood uses marginal distributions of the responses as if they were independent, another uses a hierarchical model capturing all within row dependence as if the rows were independent and the third model reverses the roles of rows and columns. We find that this method has a cost that grows as $\mathrm{O}(N)$ in crossed random effects settings where using the Laplace approximation has cost that grows superlinearly. We show how to get consistent estimates of the probit slope and variance components by maximizing those three likelihoods. The algorithm scales readily to a data set of five million observations from Stitch Fix.

With the rise of foundation models, a new artificial intelligence paradigm has emerged, by simply using general purpose foundation models with prompting to solve problems instead of training a separate machine learning model for each problem. Such models have been shown to have emergent properties of solving problems that they were not initially trained on. The studies for the effectiveness of such models are still quite limited. In this work, we widely study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems, namely aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotions intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection. We introduce a framework to evaluate the ChatGPT models on regression-based problems, such as intensity ranking problems, by modelling them as pairwise ranking classification. We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers. The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems, where GPT-3.5 and especially GPT-4 have shown strong performance on many problems, particularly the ones related to sentiment, emotions, or toxicity. The ChatGPT models fell short for problems with implicit signals, such as engagement measurement and subjectivity detection.

Multiclass classification is a fundamental and challenging task in machine learning. The existing techniques of multiclass classification can be categorized as (i) decomposition into binary (ii) extension from binary and (iii) hierarchical classification. Decomposing multiclass classification into a set of binary classifications that can be efficiently solved by using binary classifiers, called class binarization, which is a popular technique for multiclass classification. Neuroevolution, a general and powerful technique for evolving the structure and weights of neural networks, has been successfully applied to binary classification. In this paper, we apply class binarization techniques to a neuroevolution algorithm, NeuroEvolution of Augmenting Topologies (NEAT), that is used to generate neural networks for multiclass classification. We propose a new method that applies Error-Correcting Output Codes (ECOC) to design the class binarization strategies on the neuroevolution for multiclass classification. The ECOC strategies are compared with the class binarization strategies of One-vs-One and One-vs-All on three well-known datasets Digit, Satellite, and Ecoli. We analyse their performance from four aspects of multiclass classification degradation, accuracy, evolutionary efficiency, and robustness. The results show that the NEAT with ECOC performs high accuracy with low variance. Specifically, it shows significant benefits in a flexible number of binary classifiers and strong robustness.

As conversational models become increasingly available to the general public, users are engaging with this technology in social interactions. Such unprecedented interaction experiences may pose considerable social and psychological risks to the users unless the technology is properly controlled. This highlights the need for scalable and robust evaluation metrics for conversational chatbots. Existing evaluation metrics aim to automate offline user evaluation and approximate human judgment of pre-curated dialogs. However, they are limited in their ability to capture subjective perceptions of users who actually interact with the bots and might not generalize to real-world settings. To address this limitation, we propose an approach to approximate online human evaluation leveraging large language models (LLMs) from the GPT family. We introduce a new Dialog system Evaluation framework based on Prompting (DEP), which enables a fully automatic evaluation pipeline that replicates live user studies and achieves an impressive correlation with human judgment (up to Pearson r=0.95 on a system level). The DEP approach involves collecting synthetic chat logs of evaluated bots with an LLM in the other-play setting, where the LLM is carefully conditioned to follow a specific scenario. We further explore different prompting approaches to produce evaluation scores with the same LLM. The best performing prompts, which contain few-shot demonstrations and instructions, show outstanding performance on the tested dataset and demonstrate the ability to generalize to other dialog corpora.

Continuous integration at scale is costly but essential to software development. Various test optimization techniques including test selection and prioritization aim to reduce the cost. Test batching is an effective alternative, but overlooked technique. This study evaluates parallelization's effect by adjusting machine count for test batching and introduces two novel approaches. We establish TestAll as a baseline to study the impact of parallelism and machine count on feedback time. We re-evaluate ConstantBatching and introduce DynamicBatching, which adapts batch size based on the remaining changes in the queue. We also propose TestCaseBatching, enabling new builds to join a batch before full test execution, thus speeding up continuous integration. Our evaluations utilize Ericsson's results and 276 million test outcomes from open-source Chrome, assessing feedback time, execution reduction, and providing access to Chrome project scripts and data. The results reveal a non-linear impact of test parallelization on feedback time, as each test delay compounds across the entire test queue. ConstantBatching, with a batch size of 4, utilizes up to 72% fewer machines to maintain the actual average feedback time and provides a constant execution reduction of up to 75%. Similarly, DynamicBatching maintains the actual average feedback time with up to 91% fewer machines and exhibits variable execution reduction of up to 99%. TestCaseBatching holds the line of the actual average feedback time with up to 81% fewer machines and demonstrates variable execution reduction of up to 67%. We recommend practitioners use DynamicBatching and TestCaseBatching to reduce the required testing machines efficiently. Analyzing historical data to find the threshold where adding more machines has minimal impact on feedback time is also crucial for resource-effective testing.

Decentralized and incomplete data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints, and the presence of missing values within them can potentially introduce bias to the causal estimands. We introduce a new approach for federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources. Our approach disentangles the loss function into multiple components, each corresponding to a specific data source with missing values. Our approach accounts for the missing data under the missing at random assumption, while also estimating higher-order statistics of the causal estimands. Our method recovers the conditional distribution of missing confounders given the observed confounders from the decentralized data sources to identify causal effects. Our framework estimates heterogeneous causal effects without the sharing of raw training data among sources, which helps to mitigate privacy risks. The efficacy of our approach is demonstrated through a collection of simulated and real-world instances, illustrating its potential and practicality.

What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioural analysis focusing on opening play, including qualitative analysis from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero's representations, and make the resulting behavioural and representational analyses available online.

北京阿比特科技有限公司