
Many important computer vision applications are naturally formulated as regression problems. Within medical imaging, accurate regression models have the potential to automate various tasks, helping to lower costs and improve patient outcomes. Such safety-critical deployment does however require reliable estimation of model uncertainty, also under the wide variety of distribution shifts that might be encountered in practice. Motivated by this, we set out to investigate the reliability of regression uncertainty estimation methods under various real-world distribution shifts. To that end, we propose an extensive benchmark of 8 image-based regression datasets with different types of challenging distribution shifts. We then employ our benchmark to evaluate many of the most common uncertainty estimation methods, as well as two state-of-the-art uncertainty scores from the task of out-of-distribution detection. We find that while methods are well calibrated when there is no distribution shift, they all become highly overconfident on many of the benchmark datasets. This uncovers important limitations of current uncertainty estimation methods, and the proposed benchmark therefore serves as a challenge to the research community. We hope that our benchmark will spur more work on how to develop truly reliable regression uncertainty estimation methods. Code is available at //github.com/fregu856/regression_uncertainty.
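One common way to check whether regression uncertainty is well calibrated is to measure how often the true target falls inside the predicted interval. The sketch below assumes each prediction comes with a lower and an upper bound at some nominal level (for example 90%); the function name and the toy numbers are hypothetical, and this is not necessarily the exact metric used in the benchmark.

```python
def interval_coverage(targets, lowers, uppers):
    """Fraction of targets that fall inside their predicted interval."""
    if not targets or not (len(targets) == len(lowers) == len(uppers)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(targets, lowers, uppers) if lo <= y <= hi)
    return hits / len(targets)


# Toy data (made up): the third interval misses its target.
targets = [1.0, 2.0, 3.0, 4.0]
lowers = [0.5, 1.5, 3.5, 3.0]
uppers = [1.5, 2.5, 4.5, 5.0]
coverage = interval_coverage(targets, lowers, uppers)
```

If the empirical coverage on shifted test data falls well below the nominal level, the intervals are too narrow, which is what overconfidence means in this setting.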

Related content

Estimation/Estimator


Performer · MoDELS · Processing (programming language) · Controller · Lecture notes

December 28, 2023

Why Do Probabilistic Clinical Models Fail To Transport Between Sites?

Thomas A. Lasko, Eric V. Strobl, William W. Stead

from arxiv, 20 pages, 3 figures

The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we present common sources for this failure to transport, which we divide into sources under the control of the experimenter and sources inherent to the clinical data-generating process. Of the inherent sources we look a little deeper into site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.

MoDELS · CASES · Spanned subspace · Pair · Multimodal

December 26, 2023

What You See is What You Read? Improving Text-Image Alignment Evaluation

Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

from arxiv, Accepted to NeurIPS 2023. Website: //wysiwyr-itm.github.io/

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to-image and image-to-text generation tasks, with human judgements for whether a given text-image pair is semantically aligned. We then describe two automatic methods to determine alignment: the first involving a pipeline based on question generation and visual question answering models, and the second employing an end-to-end classification approach by finetuning multimodal pretrained models. Both methods surpass prior approaches in various text-image alignment tasks, with significant improvements in challenging cases that involve complex composition or unnatural images. Finally, we demonstrate how our approaches can localize specific misalignments between an image and a given text, and how they can be used to automatically re-rank candidates in text-to-image generation.

ChatGPT · Inference · Estimation/Estimator · Biased · AI

December 26, 2023

Can ChatGPT Read Who You Are?

Erik Derner, Dalibor Kučera, Nuria Oliver, Jan Zahálka

The interplay between artificial intelligence (AI) and psychology, particularly in personality assessment, represents an important emerging area of research. Accurate personality trait estimation is crucial not only for enhancing personalization in human-computer interaction but also for a wide variety of applications ranging from mental health to education. This paper analyzes the capability of a generic chatbot, ChatGPT, to effectively infer personality traits from short texts. We report the results of a comprehensive user study featuring texts written in Czech by a representative population sample of 155 participants. Their self-assessments based on the Big Five Inventory (BFI) questionnaire serve as the ground truth. We compare the personality trait estimations made by ChatGPT against those by human raters and report ChatGPT's competitive performance in inferring personality traits from text. We also uncover a 'positivity bias' in ChatGPT's assessments across all personality dimensions and explore the impact of prompt composition on accuracy. This work contributes to the understanding of AI capabilities in psychological assessment, highlighting both the potential and limitations of using large language models for personality inference. Our research underscores the importance of responsible AI development, considering ethical implications such as privacy, consent, autonomy, and bias in AI applications.

INFORMS · YouTube · Understandability · Analysis · Identifiable

December 26, 2023

YouTube Video Analytics for Patient Health Literacy: Evidence from Colonoscopy Preparation Videos

Yawen Guo, Xiao Liu, Anjana Susarla, Rema Padman

from arxiv, The 30th WORKSHOP ON INFORMATION TECHNOLOGIES AND SYSTEMS

Videos can be an effective way to deliver contextualized, just-in-time medical information for patient education. However, video analysis, from topic identification and retrieval to the extraction and analysis of medical information and understandability from a patient perspective, is extremely challenging. This study utilizes data analysis methods to retrieve medical information from YouTube videos concerning colonoscopy to manage health conditions. We first use the YouTube Data API to collect metadata of desired videos for selected search keywords, and use the Google Video Intelligence API to analyze text, frame, and object data. We then annotate the YouTube videos for medical information, video understandability, and recommendation. We develop a bidirectional long short-term memory (BLSTM) model to identify medical terms in videos and build three classifiers to group videos based on the level of encoded medical information, the video understandability level, and whether the videos are recommended. Our study provides healthcare practitioners and patients with guidelines for generating new educational video content and enabling management of health conditions.

Biased · Score · MoDELS · GPT-3.5 · Training data

December 26, 2023

AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?

Ehsan Latif, Xiaoming Zhai, Lei Liu

from arxiv, An answer to the questions regarding AI gender bias, Submitted to Educational Technology & Society

This study examines gender issues in artificial intelligence (AI), specifically within automatic scoring systems for student-written responses. The primary objective is to investigate gender bias, disparity, and fairness in AI scoring outcomes when models are trained on mixed-gender datasets. Utilizing a fine-tuned version of BERT and GPT-3.5, this research analyzes more than 1000 human-graded student responses from male and female participants across six assessment items. The study employs three distinct techniques for bias analysis: scoring accuracy difference to evaluate bias, mean score gap by gender (MSG) to evaluate disparity, and Equalized Odds (EO) to evaluate fairness. The results indicate that scoring accuracy for mixed-trained models does not differ significantly from that of either male- or female-trained models, suggesting no significant scoring bias. Consistently across BERT and GPT-3.5, mixed-trained models generated smaller MSG and non-disparate predictions compared to humans. In contrast, gender-specifically trained models yielded larger MSG than humans, indicating that unbalanced training data may produce models that enlarge gender disparities. The EO analysis suggests that mixed-trained models generated fairer outcomes than gender-specifically trained models. Collectively, the findings suggest that gender-unbalanced data do not necessarily generate scoring bias but can enlarge gender disparities and reduce scoring fairness.
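The two group-level measures named above can be illustrated with a small sketch. It assumes MSG is the difference between the two groups' mean scores and that Equalized Odds is summarized as the larger of the true-positive-rate and false-positive-rate gaps on binary labels; both definitions, the function names, and the toy data are assumptions made for illustration and may differ from the study's exact formulation.

```python
def mean(values):
    return sum(values) / len(values)


def mean_score_gap(scores, groups, group_a, group_b):
    """Mean score of group_a minus mean score of group_b."""
    a = [s for s, g in zip(scores, groups) if g == group_a]
    b = [s for s, g in zip(scores, groups) if g == group_b]
    return mean(a) - mean(b)


def positive_rate(preds, labels, groups, group, label_value):
    """Share of positive predictions among one group's samples whose true
    label equals label_value (TPR for label 1, FPR for label 0)."""
    selected = [p for p, y, g in zip(preds, labels, groups)
                if g == group and y == label_value]
    return sum(selected) / len(selected)


def equalized_odds_gap(preds, labels, groups, group_a, group_b):
    """Larger of the absolute TPR gap and the absolute FPR gap."""
    gaps = []
    for label_value in (1, 0):
        rate_a = positive_rate(preds, labels, groups, group_a, label_value)
        rate_b = positive_rate(preds, labels, groups, group_b, label_value)
        gaps.append(abs(rate_a - rate_b))
    return max(gaps)


# Toy data (made up): four responses per group.
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
scores = [3, 2, 4, 3, 2, 2, 3, 1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 1, 1, 0, 0, 1, 0]

msg = mean_score_gap(scores, groups, "m", "f")
eo_gap = equalized_odds_gap(preds, labels, groups, "m", "f")
```

Under these assumed definitions, a value of zero for either measure would mean the two groups are treated identically on that criterion.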

MoDELS · False positive · INFORMS · TOOLS · Less

December 25, 2023

Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt

from arxiv, To appear in Conference on Applied Machine Learning for Information Security 2023

Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives. However, academic research is often restricted to public datasets on the order of ten thousand samples, which are too small to detect improvements that may be relevant to industry. Working within these constraints, we devise an approach to generate a benchmark of configurable difficulty from a pool of available samples. This is done by leveraging malware family information from tools like AVClass to construct training/test splits that have different generalization rates, as measured by a secondary model. Our experiments demonstrate that using a less accurate secondary model with disparate features is effective at producing benchmarks for a more sophisticated target model that is under evaluation. We also ablate against alternative designs to show the need for our approach.
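As a rough illustration of how family labels can make a split harder, the sketch below holds out entire families so that no test family appears in training. It is only the simplest family-aware split, not the paper's procedure (which additionally uses a secondary model to rank splits by generalization rate); the function name, sample IDs, and family names are hypothetical.

```python
def family_disjoint_split(samples, held_out_families):
    """Split (sample_id, family) pairs so held-out families appear only in test."""
    held = set(held_out_families)
    train = [s for s in samples if s[1] not in held]
    test = [s for s in samples if s[1] in held]
    return train, test


# Toy pool (made up): sample IDs paired with a family label.
pool = [("a", "fam1"), ("b", "fam1"), ("c", "fam2"), ("d", "fam3")]
train, test = family_disjoint_split(pool, ["fam2"])

train_families = {family for _, family in train}
test_families = {family for _, family in test}
```

Because the two family sets never overlap, a model cannot score well on the test set merely by recognizing families it saw during training.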

Learning · Image segmentation · Deep model · Lecture notes · Reviewer

July 28, 2022

Learning with Limited Annotations: A Survey on Deep Semi-Supervised Learning for Medical Image Segmentation

Rushi Jiao, Yichi Zhang, Le Ding, Rong Cai, Jicong Zhang

Medical image segmentation is a fundamental and critical step in many image-guided clinical approaches. The recent success of deep learning-based segmentation methods usually relies on a large amount of labeled data, which is particularly difficult and costly to obtain, especially in the medical imaging domain where only experts can provide reliable and accurate annotations. Semi-supervised learning has emerged as an appealing strategy and has been widely applied to medical image segmentation tasks to train deep models with limited annotations. In this paper, we present a comprehensive review of recently proposed semi-supervised learning methods for medical image segmentation and summarize both the technical novelties and empirical results. Furthermore, we analyze and discuss the limitations and several unsolved problems of existing approaches. We hope this review will inspire the research community to explore solutions for this challenge and further promote developments in the medical image segmentation field.

Ground truth · Identifiable · Dataset · HTTPS · Computational learning theory

December 15, 2021

Do Feature Attribution Methods Correctly Attribute Features?

Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, Julie Shah

from arxiv, AAAI 2022. Video summary at //www.youtube.com/watch?v=kAodFw6jvvo

Feature attribution methods are popular in interpretable machine learning. These methods compute the attribution of each input feature to represent its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation, complicated in particular by the lack of ground truth attribution. To address this, we propose a dataset modification procedure to induce such ground truth. Using this procedure, we evaluate three common methods: saliency maps, rationales, and attentions. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods applied to datasets in the wild. We further discuss possible avenues for remedy and recommend new attribution methods to be tested against ground truth before deployment. The code is available at //github.com/YilunZhou/feature-attribution-evaluation.

AdderNet · Neural Networks · Networking · Convolution · Model evaluation

December 31, 2019

AdderNet: Do We Really Need Multiplications in Deep Learning?

Hanting Chen, Yunhe Wang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu

Compared with the cheap addition operation, multiplication has much higher computational complexity. The widely used convolutions in deep neural networks are in fact cross-correlations that measure the similarity between the input feature and the convolution filters, which involves massive multiplications between float values. In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between filters and input feature as the output response. The influence of this new similarity measure on the optimization of neural networks has been thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolution layers.
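The idea of using an $\ell_1$ distance as the filter response can be sketched in one dimension. The example below is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, and negating the distance (so that a larger response means a closer match) is an assumption that the abstract does not spell out.

```python
def adder_response(patch, filt):
    """Negative L1 distance between an input patch and a filter."""
    if len(patch) != len(filt):
        raise ValueError("patch and filter must have the same length")
    # Only subtractions, absolute values, and additions are used here;
    # there are no multiplications between inputs and weights.
    return -sum(abs(x - f) for x, f in zip(patch, filt))


def adder_conv1d(signal, filt):
    """Slide the filter over a 1-D signal (stride 1, no padding)."""
    k = len(filt)
    return [adder_response(signal[i:i + k], filt)
            for i in range(len(signal) - k + 1)]


# Toy input (made up): the first window matches the filter exactly.
responses = adder_conv1d([1, 2, 3, 4], [1, 2])
```

The window that equals the filter gets the highest response (zero), and the response decreases as the window moves further from the filter in $\ell_1$ distance.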

Text classification · Language modeling · BERT · state-of-the-art · MoDELS

May 14, 2019

How to Fine-Tune BERT for Text Classification?

Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification tasks and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely studied text classification datasets.