亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Although text style transfer has witnessed rapid development in recent years, there is as yet no established standard for evaluation, which is performed using several automatic metrics, lacking the possibility of always resorting to human judgement. We focus on the task of formality transfer, and on the three aspects that are usually evaluated: style strength, content preservation, and fluency. To cast light on how such aspects are assessed by common and new metrics, we run a human-based evaluation and perform a rich correlation analysis. We are then able to offer some recommendations on the use of such metrics in formality transfer, also with an eye to their generalisability (or not) to related tasks.

相關內容

Search engines and recommendation systems attempt to continually improve the quality of the experience they afford to their users. Refining the ranker that produces the lists displayed in response to user requests is an important component of this process. A common practice is for the service providers to make changes (e.g. new ranking features, different ranking models) and A/B test them on a fraction of their users to establish the value of the change. An alternative approach estimates the effectiveness of the proposed changes offline, utilising previously collected clickthrough data on the old ranker to posit what the user behaviour on ranked lists produced by the new ranker would have been. A majority of offline evaluation approaches invoke the well studied inverse propensity weighting to adjust for biases inherent in logged data. In this paper, we propose the use of parametric estimates for these propensities. Specifically, by leveraging well known learning-to-rank methods as subroutines, we show how accurate offline evaluation can be achieved when the new rankings to be evaluated differ from the logged ones.

We introduce a new type of Krasnoselskii's result. Using a simple differentiability condition, we relax the nonexpansive condition in Krasnoselskii's theorem. More clearly, we analyze the convergence of the sequence $x_{n+1}=\frac{x_n+g(x_n)}{2}$ based on some differentiability condition of $g$ and present some fixed point results. We introduce some iterative sequences that for any real differentiable function $g$ and any starting point $x_0\in \mathbb [a,b]$ converge monotonically to the nearest root of $g$ in $[a,b]$ that lay to the right or left side of $x_0$. Based on this approach, we present an efficient and novel method for finding the real roots of real functions. We prove that no root will be missed in our method. It is worth mentioning that our iterative method is free from the derivative evaluation which can be regarded as an advantage of this method in comparison with many other methods. Finally, we illustrate our results with some numerical examples.

This article argues that the Situation theory and the Channel theory can be used as a general framework for Imperfect Information Management. Different kinds of imperfections are uncertainty, imprecision, vagueness, incompleteness, inconsistency, and context-dependency which can be handled pretty well by our brain. Basic approaches like probability theory and standard logic are intrinsically inefficient in modeling fallacious minds. The generalized probability and nonstandard logic theories have epistemological motivations to provide better models for information integration in cognitive agents. Among many models of them, possibility theory and probabilistic logic theory are the best approaches. I argue, based on a review of different approaches to Imperfect Information Management, that a good framework for it is the Situation theory of Barwise and the Channel theory of Barwise-Seligman. These theories have relied on a powerful and unique epistemological-based notion of information to refer to partiality. These frameworks have a proper approach for context modeling to handle common knowledge and incomplete information. Also, they distinguish belief from knowledge clearly to model the non-monotonic and dynamic nature of knowledge. They discern the logic of the world from information flow in the mind. The objectification process in these theories reveals to us the nature of default or probabilistic rules in perceptions. The concept of the channel can be used to represent those types of reasoning mechanisms that move from one model or logic to another one. The imprecision in our perceptions causes fuzziness in reasoning and vagueness in communication that can be represented by some suitable classifications connected by some channels. This new framework like a network framework can provide a scalable and open framework to cover different models based on a relativistic notion of truth.

A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected explanation methods. To address these challenges and aid user study design, we introduce Use-Case-Grounded Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that Simevals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that SimEvals can be used to efficiently screen an important set of user study design decisions, e.g. selecting which explanations should be presented to the user, before running a potentially costly user study.

Application of interpretable machine learning techniques on medical datasets facilitate early and fast diagnoses, along with getting deeper insight into the data. Furthermore, the transparency of these models increase trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of machine learning techniques. In this paper we present a family of prototype-based (PB) interpretable models which are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. However, unlike ensemble based models, which have to compromise on easy interpretation, the PB models here do not. Moreover we propose a strategy of harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All the models were evaluated on a synthetic (publicly available dataset) in addition to detailed analyses of two real-world medical datasets (one publicly available). Results indicated that the models and strategies we introduced addressed the challenges of real-world medical data, while remaining computationally inexpensive and transparent, as well as similar or superior in performance compared to their alternatives.

There are many examples of cases where access to improved models of human behavior and cognition has allowed creation of robots which can better interact with humans, and not least in road vehicle automation this is a rapidly growing area of research. Human-robot interaction (HRI) therefore provides an important applied setting for human behavior modeling - but given the vast complexity of human behavior, how complete and accurate do these models need to be? Here, we outline some possible ways of thinking about this problem, starting from the suggestion that modelers need to keep the right end goal in sight: A successful human-robot interaction, in terms of safety, performance, and human satisfaction. Efforts toward model completeness and accuracy should be focused on those aspects of human behavior to which interaction success is most sensitive. We emphasise that identifying which those aspects are is a difficult scientific objective in its own right, distinct for each given HRI context. We propose and exemplify an approach to formulating a priori hypotheses on this matter, in cases where robots are to be involved in interactions which currently take place between humans, such as in automated driving. Our perspective also highlights some possible risks of overreliance on machine-learned models of human behavior in HRI, and how to mitigate against those risks.

In this work, we evaluate various existing dialogue relevance metrics, find strong dependency on the dataset, often with poor correlation with human scores of relevance, and propose modifications to reduce data requirements and domain sensitivity while improving correlation. Our proposed metric achieves state-of-the-art performance on the HUMOD dataset while reducing measured sensitivity to dataset by 37%-66%. We achieve this without fine-tuning a pretrained language model, and using only 3,750 unannotated human dialogues and a single negative example. Despite these limitations, we demonstrate competitive performance on four datasets from different domains. Our code, including our metric and experiments, is open sourced.

A multitude of explainability methods and associated fidelity performance metrics have been proposed to help better understand how modern AI systems make decisions. However, much of the current work has remained theoretical -- without much consideration for the human end-user. In particular, it is not yet known (1) how useful current explainability methods are in practice for more real-world scenarios and (2) how well associated performance metrics accurately predict how much knowledge individual explanations contribute to a human end-user trying to understand the inner-workings of the system. To fill this gap, we conducted psychophysics experiments at scale to evaluate the ability of human participants to leverage representative attribution methods for understanding the behavior of different image classifiers representing three real-world scenarios: identifying bias in an AI system, characterizing the visual strategy it uses for tasks that are too difficult for an untrained non-expert human observer as well as understanding its failure cases. Our results demonstrate that the degree to which individual attribution methods help human participants better understand an AI system varied widely across these scenarios. This suggests a critical need for the field to move past quantitative improvements of current attribution methods towards the development of complementary approaches that provide qualitatively different sources of information to human end-users.

This short paper introduces the u-index, a simple and objective metric to evaluate the impact and relevance of academic research output, as a possible alternative to widespread metrics such as the h-index or the i10-index. The proposed index is designed to address possible issues with standard metrics such as inflated ratings resulting from self-citations or by means of extended co-authorship numbers where every citation has the same impact as for works written by smaller groups, despite limited individual contributions. The new index makes also possible to differentiate scholars who would otherwise fall into the same h-index group, hence providing further insights into the actual impact of a specific individual researcher.

As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing the current researches of transfer learning by using deep neural network and its applications. We defined deep transfer learning, category and review the recent research works based on the techniques used in deep transfer learning.

北京阿比特科技有限公司