成人艳情一二三区按摩_国产乱人弄视频免费观看_中文人妻黄色网站_碰碰女人公开免费视频_2020色精品极品在线视频_无遮挡无码免费视频_一本一本久久久精品综合不卡

To assess the potential applications and limitations of chatbot GPT-3 Davinci-003, this study explored the temporal reliability of personality questionnaires applied to the chatbot and its personality profile. Psychological questionnaires were administered to the chatbot on two separate occasions, followed by a comparison of the responses to human normative data. The findings revealed varying levels of agreement in the chatbot's responses over time, with some scales displaying excellent while others demonstrated poor agreement. Overall, Davinci-003 displayed a socially desirable and pro-social personality profile, particularly in the domain of communion. However, the underlying basis of the chatbot's responses, whether driven by conscious self-reflection or predetermined algorithms, remains uncertain.

知識薈萃

精品入門和進階教程、論文和代碼整理等

查看相關VIP內容、論文、資訊等

INTERACT · 機器人 · Extensibility · 設計 · Robot ·

2023 年 7 月 28 日

We are all Individuals: The Role of Robot Personality and Human Traits in Trustworthy Interaction

Mei Yii Lim,José David Aguas Lopes,David A. Robb,Bruce W. Wilson,Meriam Moujahid,Emanuele De Pellegrin,Helen Hastie

from arxiv, 8 pages, RO-MAN'22, 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), August 2022, Naples, Italy

As robots take on roles in our society, it is important that their appearance, behaviour and personality are appropriate for the job they are given and are perceived favourably by the people with whom they interact. Here, we provide an extensive quantitative and qualitative study exploring robot personality but, importantly, with respect to individual human traits. Firstly, we show that we can accurately portray personality in a social robot, in terms of extroversion-introversion using vocal cues and linguistic features. Secondly, through garnering preferences and trust ratings for these different robot personalities, we establish that, for a Robo-Barista, an extrovert robot is preferred and trusted more than an introvert robot, regardless of the subject's own personality. Thirdly, we find that individual attitudes and predispositions towards robots do impact trust in the Robo-Baristas, and are therefore important considerations in addition to robot personality, roles and interaction context when designing any human-robot interaction study.

MoDELS · 評論員 · 有偏 · 語言模型化 · 可辨認的 ·

2023 年 7 月 28 日

A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI

Arash Hajikhani,Carolyn Cole

from arxiv, 17 pages, 6 figures, 6 tables

This paper examines the comparative effectiveness of a specialized compiled language model and a general-purpose model like OpenAI's GPT-3.5 in detecting SDGs within text data. It presents a critical review of Large Language Models (LLMs), addressing challenges related to bias and sensitivity. The necessity of specialized training for precise, unbiased analysis is underlined. A case study using a company descriptions dataset offers insight into the differences between the GPT-3.5 and the specialized SDG detection model. While GPT-3.5 boasts broader coverage, it may identify SDGs with limited relevance to the companies' activities. In contrast, the specialized model zeroes in on highly pertinent SDGs. The importance of thoughtful model selection is emphasized, taking into account task requirements, cost, complexity, and transparency. Despite the versatility of LLMs, the use of specialized models is suggested for tasks demanding precision and accuracy. The study concludes by encouraging further research to find a balance between the capabilities of LLMs and the need for domain-specific expertise and interpretability.

MoDELS · 冪法 · Performer · 交叉驗證 · Machine Learning ·

2023 年 7 月 28 日

Is this model reliable for everyone? Testing for strong calibration

Jean Feng,Alexej Gossmann,Romain Pirracchio,Nicholas Petrick,Gene Pennello,Berkman Sahiner

In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, the task of auditing a model for strong calibration is well-known to be difficult -- particularly for machine learning (ML) algorithms -- due to the sheer number of potential subgroups. As such, common practice is to only assess calibration with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the data or fail to divide the data at all. We introduce a new testing procedure based on the following insight: if we can reorder observations by their expected residuals, there should be a change in the association between the predicted and observed residuals along this sequence if a poorly calibrated subgroup exists. This lets us reframe the problem of calibration testing into one of changepoint detection, for which powerful methods already exist. We begin with introducing a sample-splitting procedure where a portion of the data is used to train a suite of candidate models for predicting the residual, and the remaining data are used to perform a score-based cumulative sum (CUSUM) test. To further improve power, we then extend this adaptive CUSUM test to incorporate cross-validation, while maintaining Type I error control under minimal assumptions. Compared to existing methods, the proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model.

Networking · Git · 在線 · INTERACT · 評論員 ·

2023 年 7 月 27 日

Measuring Centralization of Online Platforms Through Size and Interconnection of Communities

Milo Z. Trujillo,Laurent Hébert-Dufresne,James Bagrow

from arxiv, 11 pages, 5 figures, 6 pages and 9 figures of supplemental material

Decentralized architecture offers a robust and flexible structure for online platforms, since centralized moderation and computation can be easy to disrupt with targeted attacks. However, a platform offering a decentralized architecture does not guarantee that users will use it in a decentralized way, and measuring the centralization of socio-technical networks is not an easy task. In this paper we introduce a method of characterizing community influence in terms of how many edges between communities would be disrupted by a community's removal. Our approach provides a careful definition of "centralization" appropriate in bipartite user-community socio-technical networks, and demonstrates the inadequacy of more trivial methods for interrogating centralization such as examining the distribution of community sizes. We use this method to compare the structure of multiple socio-technical platforms -- Mastodon, git code hosting servers, BitChute, Usenet, and Voat -- and find a range of structures, from interconnected but decentralized git servers to an effectively centralized use of Mastodon servers, as well as multiscale hybrid network structures of disconnected Voat subverses. As the ecosystem of socio-technical platforms diversifies, it becomes critical to not solely focus on the underlying technologies but also consider the structure of how users interact through the technical infrastructure.

MoDELS · 黑盒 · Performer · Extensibility · AIM ·

2023 年 7 月 27 日

Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability

Usha Bhalla,Suraj Srinivas,Himabindu Lakkaraju

With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting features that are critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them. Specifically, it is nontrivial to evaluate if a given attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture, meaning their explanations are naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models that naturally yield faithful and verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method to build verifiable models and feature attributions out of fully trained black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.

Performer · 泛函 · 優化器 · 經驗風險最小化 · 可約的 ·

2023 年 7 月 27 日

From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms

Omar Besbes,Will Ma,Omar Mouchtaki

In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affects the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade-off between an underage and an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under ``close by'' contexts come from close by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts, as well as granular insights on the learning curve of algorithms.

語言模型化 · MoDELS · Performance · 情景 · SimPLe ·

2023 年 7 月 27 日

In-Context Retrieval-Augmented Language Models

Ori Ram,Yoav Levine,Itay Dalmedigos,Dor Muhlgay,Amnon Shashua,Kevin Leyton-Brown,Yoav Shoham

from arxiv, Accepted for publication in Transactions of the Association for Computational Linguistics (TACL). pre-MIT Press publication version

Retrieval-Augmented Language Modeling (RALM) methods, which condition a language model (LM) on relevant documents from a grounding corpus during generation, were shown to significantly improve language modeling performance. In addition, they can mitigate the problem of factually inaccurate text generation and provide natural source attribution mechanism. Existing RALM approaches focus on modifying the LM architecture in order to facilitate the incorporation of external information, significantly complicating deployment. This paper considers a simple alternative, which we dub In-Context RALM: leaving the LM architecture unchanged and prepending grounding documents to the input, without any further training of the LM. We show that In-Context RALM that builds on off-the-shelf general purpose retrievers provides surprisingly large LM gains across model sizes and diverse corpora. We also demonstrate that the document retrieval and ranking mechanism can be specialized to the RALM setting to further boost performance. We conclude that In-Context RALM has considerable potential to increase the prevalence of LM grounding, particularly in settings where a pretrained LM must be used without modification or even via API access.

ChatGPT · Prompt · Performer · 可辨認的 · state-of-the-art ·

2023 年 7 月 26 日

Is ChatGPT a Good Personality Recognizer? A Preliminary Study

Yu Ji,Wen Wu,Hong Zheng,Yi Hu,Xi Chen,Liang He

from arxiv, 15 pages, 13 figures, 5 tables

In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability in recognizing personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. The experimental results on two representative real-world datasets reveal that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability and is capable to provide natural language explanations through text-based logical reasoning. Furthermore, by employing the level-oriented prompting strategy to optimize zero-shot chain-of-thought prompting, the performance gap between ChatGPT and corresponding state-of-the-art model has been narrowed even more. However, we observe that ChatGPT shows unfairness towards certain sensitive demographic attributes such as gender and age. Additionally, we discover that eliciting the personality recognition ability of ChatGPT helps improve its performance on personality-related downstream tasks such as sentiment classification and stress prediction.

MoDELS · SimPLe · Performer · 梯度提升決策樹 · Learning ·

2023 年 7 月 26 日

TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

Yury Gorishniy,Ivan Rubachev,Nikolay Kartashev,Daniil Shlenskii,Akim Kotelnikov,Artem Babenko

from arxiv, Code: //github.com/yandex-research/tabular-dl-tabr

Deep learning (DL) models for tabular data problems are receiving increasingly more attention, while the algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following the recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have been recently proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as the nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over the properly tuned simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question. We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on the performance on tabular data problems, but that were not explored in prior work. As a result, we design TabR -- a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed ``GBDT-friendly'' benchmark (see the first figure).

知識 (knowledge) · 語言模型化 · Performer · MoDELS · state-of-the-art ·

2023 年 5 月 8 日

Augmented Large Language Models with Parametric Knowledge Guiding

Ziyang Luo,Can Xu,Pu Zhao,Xiubo Geng,Chongyang Tao,Jing Ma,Qingwei Lin,Daxin Jiang

Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities. However, their performance may be suboptimal for long-tail or domain-specific tasks due to limited exposure to domain-specific knowledge and vocabulary. Additionally, the lack of transparency of most state-of-the-art (SOTA) LLMs, which can only be accessed via APIs, impedes further fine-tuning with custom data. Moreover, data privacy is a significant concern. To address these challenges, we propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge at runtime without altering the LLMs' parameters. Our PKG is based on open-source "white-box" small language models, allowing offline storage of any knowledge that LLMs require. We demonstrate that our PKG framework can enhance the performance of "black-box" LLMs on a range of long-tail and domain-specific downstream tasks requiring factual, tabular, medical, and multimodal knowledge.