
Human interactions are deeply rooted in the interplay of thoughts, beliefs, and desires made possible by Theory of Mind (ToM): our cognitive ability to understand the mental states of ourselves and others. Although ToM may come naturally to us, emulating it presents a challenge to even the most advanced Large Language Models (LLMs). Recent improvements to LLMs' reasoning capabilities from simple yet effective prompting techniques such as Chain-of-Thought have seen limited applicability to ToM. In this paper, we turn to the prominent cognitive science theory "Simulation Theory" to bridge this gap. We introduce SimToM, a novel two-stage prompting framework inspired by Simulation Theory's notion of perspective-taking. To implement this idea on current ToM benchmarks, SimToM first filters context based on what the character in question knows before answering a question about their mental state. Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods, and our analysis reveals the importance of perspective-taking to Theory-of-Mind capabilities. Our findings suggest perspective-taking as a promising direction for future research into improving LLMs' ToM capabilities.
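The two-stage idea described above (first filter the context down to what the character knows, then answer from that filtered context) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `simtom_answer`, the prompt wording, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions made for the example.

```python
from typing import Callable


def simtom_answer(
    llm: Callable[[str], str],  # hypothetical: any prompt -> completion function
    story: str,
    character: str,
    question: str,
) -> str:
    """Two-stage perspective-taking prompt in the spirit of SimToM."""
    # Stage 1 (perspective-taking): ask the model to keep only the events
    # that the character in question knows about.
    filter_prompt = (
        f"The following is a sequence of events:\n{story}\n\n"
        f"Which events does {character} know about? "
        "List only those events."
    )
    perspective = llm(filter_prompt)

    # Stage 2 (question answering): answer using the filtered context only,
    # so events hidden from the character never reach this prompt.
    answer_prompt = (
        f"{perspective}\n\n"
        f"You are {character}. Based on the above information, "
        f"answer the following question:\n{question}"
    )
    return llm(answer_prompt)
```

Because both stages are plain prompts, a sketch like this needs no additional training; the quality of the result depends entirely on how well the model performs the filtering step.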

Related content

Cognition

Cognition: International Journal of Cognitive Science. Publisher: Elsevier.

TOOLS · Analysis · Correlation coefficient · Score · Model evaluation

January 10, 2024

Enhanced Muscle and Fat Segmentation for CT-Based Body Composition Analysis: A Comparative Study

Benjamin Hou, Tejas Sudharshan Mathai, Jianfei Liu, Christopher Parnell, Ronald M. Summers

Purpose: Body composition measurements from routine abdominal CT can yield personalized risk assessments for asymptomatic and diseased patients. In particular, attenuation and volume measures of muscle and fat are associated with important clinical outcomes, such as cardiovascular events, fractures, and death. This study evaluates the reliability of an Internal tool for the segmentation of muscle and fat (subcutaneous and visceral) as compared to the well-established public TotalSegmentator tool. Methods: We assessed the tools across 900 CT series from the publicly available SAROS dataset, focusing on muscle, subcutaneous fat, and visceral fat. The Dice score was employed to assess accuracy in subcutaneous fat and muscle segmentation. Due to the lack of ground truth segmentations for visceral fat, Cohen's Kappa was utilized to assess segmentation agreement between the tools. Results: Our Internal tool achieved a 3% higher Dice (83.8 vs. 80.8) for subcutaneous fat and a 5% improvement (87.6 vs. 83.2) for muscle segmentation. A Wilcoxon signed-rank test showed that these differences were statistically significant (p<0.01). For visceral fat, the Cohen's Kappa score of 0.856 indicated near-perfect agreement between the two tools. Our Internal tool also showed very strong correlations for muscle volume (R^2=0.99), muscle attenuation (R^2=0.93), and subcutaneous fat volume (R^2=0.99), with a moderate correlation for subcutaneous fat attenuation (R^2=0.45). Conclusion: Our findings indicate that our Internal tool outperformed TotalSegmentator in measuring subcutaneous fat and muscle. The high Cohen's Kappa score for visceral fat suggests a reliable level of agreement between the two tools. These results demonstrate the potential of our tool in advancing the accuracy of body composition analysis.
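For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how a Dice score and Cohen's Kappa are commonly computed. It is not the study's evaluation code: the function names are illustrative, and masks are assumed to be flattened sequences of integer labels (binary for Dice) rather than 3D CT volumes.

```python
from collections import Counter
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Kappa = (p_o - p_e) / (1 - p_e) for two label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: fraction of positions with identical labels.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)
```

Dice needs a ground-truth mask, which is why the abstract falls back to Kappa for visceral fat: Kappa only measures agreement between the two tools' outputs, corrected for chance.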

MS · Reducible · MoDELS · Representation · Extensibility

January 10, 2024

CrossDiff: Exploring Self-Supervised Representation of Pansharpening via Cross-Predictive Diffusion Model

Yinghui Xing, Litao Qu, ShiZhou Zhang, Xiuwei Zhang, Yanning Zhang

Fusion of a panchromatic (PAN) image and the corresponding multispectral (MS) image, also known as pansharpening, aims to combine the abundant spatial details of PAN with the spectral information of MS. Due to the absence of high-resolution MS images, available deep-learning-based methods usually follow the paradigm of training at reduced resolution and testing at both reduced and full resolution. When taking original MS and PAN images as inputs, they always obtain sub-optimal results due to the scale variation. In this paper, we propose to explore the self-supervised representation of pansharpening by designing a cross-predictive diffusion model, named CrossDiff. Training proceeds in two stages. In the first stage, we introduce a cross-predictive pretext task to pre-train the UNet structure based on conditional DDPM. In the second stage, the encoders of the UNets are frozen to directly extract spatial and spectral features from PAN and MS, and only the fusion head is trained to adapt to the pansharpening task. Extensive experiments show the effectiveness and superiority of the proposed model compared with state-of-the-art supervised and unsupervised methods. In addition, cross-sensor experiments verify the generalization ability of the proposed self-supervised representation learners on datasets from other satellites. We will release our code for reproducibility.

Language modeling · MoDELS · Performer · Guidance · Precision/Accuracy

January 10, 2024

ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain

Bingchao Wang

Recently, various evaluation datasets for Large Language Models (LLMs) have emerged, but most of them suffer from distorted rankings and make it difficult to analyze model capabilities. Addressing these concerns, this paper introduces ANGO, a Chinese multi-choice question evaluation benchmark. ANGO proposes the Keypoint categorization standard for the first time: each question in ANGO can correspond to multiple keypoints, effectively enhancing the interpretability of evaluation results. Based on the performance of real humans, we build a quantifiable question difficulty standard and divide ANGO questions into 9 difficulty levels, which provide more precise guidance for model training. To minimize the impact of data leakage and fully leverage ANGO's innovative features, we have engineered exclusive sampling strategies and a new evaluation framework that support swift test set iteration. Our experiments demonstrate that ANGO poses a stronger challenge to models and reveals more details in evaluation results compared to existing benchmarks.

MoDELS · Learning · Machine learning modeling · ML · UniFormer

January 9, 2024

PhilEO Bench: Evaluating Geo-Spatial Foundation Models

Casper Fibaek, Luke Camilleri, Andreas Luyts, Nikolaos Dionelis, Bertrand Le Saux

From arXiv: 6 pages, 5 figures; submitted to IGARSS 2024

Massive amounts of unlabelled data are captured by Earth Observation (EO) satellites, with the Sentinel-2 constellation generating 1.6 TB of data daily. This makes Remote Sensing a data-rich domain well suited to Machine Learning (ML) solutions. However, a bottleneck in applying ML models to EO is the lack of annotated data, as annotation is a labour-intensive and costly process. As a result, research in this domain has focused on Self-Supervised Learning and Foundation Model approaches. This paper addresses the need to evaluate different Foundation Models on a fair and uniform benchmark by introducing the PhilEO Bench, a novel evaluation framework for EO Foundation Models. The framework comprises a testbed and a novel 400 GB Sentinel-2 dataset containing labels for three downstream tasks: building density estimation, road segmentation, and land cover classification. We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE, at multiple n-shots and convergence rates.

Analysis · DeepFakes · Discriminator · Identifiable · Discretization

January 9, 2024

Improving Video Deepfake Detection: A DCT-Based Approach with Patch-Level Analysis

Luca Guarnera, Salvatore Manganello, Sebastiano Battiato

A new algorithm for the detection of deepfakes in digital videos is presented. Only the I-frames were extracted, which provides faster computation and analysis than approaches described in the literature. To identify the discriminating regions within individual video frames, the entire frame, background, face, eyes, nose, mouth, and face frame were analyzed separately. From the Discrete Cosine Transform (DCT), the Beta components were extracted from the AC coefficients and used as input to standard classifiers. Experimental results show that the eye and mouth regions are the most discriminative and are able to determine the nature of the video under analysis.
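The abstract does not define the Beta components, so the sketch below rests on an assumption: that each AC coefficient position of an 8x8 block DCT is modeled with a zero-centred Laplacian distribution, whose scale can be estimated as β = σ/√2 from the standard deviation σ of that coefficient across blocks. The function names and the plain nested-list image format are illustrative only; this is not the authors' pipeline, which also involves I-frame extraction and region cropping.

```python
import math
from typing import List

N = 8  # block size, assumed here to be the usual 8x8 DCT block


def _dct_matrix(n: int = N) -> List[List[float]]:
    """Orthonormal DCT-II basis matrix C, so that dct2(X) = C X C^T."""
    return [
        [
            (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


_C = _dct_matrix()


def dct2_block(block: List[List[float]]) -> List[List[float]]:
    """2D DCT-II of a single N x N block."""
    # tmp = C @ block
    tmp = [[sum(_C[k][i] * block[i][j] for i in range(N)) for j in range(N)]
           for k in range(N)]
    # out = tmp @ C^T
    return [[sum(tmp[k][j] * _C[l][j] for j in range(N)) for l in range(N)]
            for k in range(N)]


def beta_features(image: List[List[float]]) -> List[float]:
    """Return 63 Laplacian scale estimates, one per AC coefficient position."""
    height = len(image) - len(image) % N
    width = len(image[0]) - len(image[0]) % N
    if height == 0 or width == 0:
        raise ValueError("image must contain at least one full 8x8 block")
    samples: List[List[float]] = [[] for _ in range(N * N)]
    for r in range(0, height, N):
        for c in range(0, width, N):
            block = [row[c:c + N] for row in image[r:r + N]]
            coeffs = dct2_block(block)
            for k in range(N):
                for l in range(N):
                    samples[k * N + l].append(coeffs[k][l])
    betas = []
    for values in samples[1:]:  # index 0 is the DC coefficient, skipped
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        betas.append(math.sqrt(var / 2.0))  # Laplacian: var = 2 * beta^2
    return betas
```

Under this assumption, each analyzed region (eyes, mouth, and so on) would yield a 63-dimensional feature vector that could be passed to a standard classifier.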

state-of-the-art · Directed · Things · Reducible · Design

January 9, 2024

Breaking the Interference and Fading Gridlock in Backscatter Communications: State-of-the-Art, Design Challenges, and Future Directions

Bowen Gu, Dong Li, Haiyang Ding, Gongpu Wang, Chintha Tellambura

As the Internet of Things (IoT) advances by leaps and bounds, a multitude of devices are becoming interconnected, marking the onset of an era where all things are connected. While this growth opens up opportunities for novel products and applications, it also leads to increased energy demand and battery reliance for IoT devices, creating a significant bottleneck that hinders sustainable progress. At this juncture, backscatter communication (BackCom), as a low-power and passive communication method, emerges as one of the promising solutions to this energy impasse by reducing the manufacturing costs and energy consumption of IoT devices. However, BackCom systems face challenges such as complex interference environments, including direct link interference (DLI) and mutual interference (MI) between tags, which can severely disrupt the efficiency of BackCom networks. Moreover, double-path fading is another major issue that leads to degraded system performance. To fully unleash the potential of BackCom, this paper provides a comprehensive review of existing solutions, with a focus on combatting these specific interference challenges and overcoming double-path fading, and offers an analysis and comparison of various strategies for mitigating these issues. Specifically, we begin by introducing the preliminaries for BackCom, including its history, operating mechanisms, and main architectures, providing a foundational understanding of the field. Then, we delve into fundamental issues related to BackCom systems, such as solutions for the DLI, the MI, and the double-path fading. This paper presents state-of-the-art advances for each case, particularly highlighting how the latest innovations in theoretical approaches and system design can strategically address these challenges.

Cluster · Language modeling · MoDELS · Pseudo-label · Text classification

January 8, 2024

IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification

Abdullah Alsuhaibani, Hamad Zogan, Imran Razzak, Shoaib Jameel, Guandong Xu

From arXiv: published in The 17th ACM International Conference on Web Search and Data Mining

Language models such as Bidirectional Encoder Representations from Transformers (BERT) have been very effective in various Natural Language Processing (NLP) and text mining tasks including text classification. However, some tasks still pose challenges for these models, including text classification with limited labels. This can result in a cold-start problem. Although some approaches have attempted to address this problem through single-stage clustering as an intermediate training step coupled with a pre-trained language model, which generates pseudo-labels to improve classification, these methods are often error-prone due to the limitations of the clustering algorithms. To overcome this, we have developed a novel two-stage intermediate clustering with subsequent fine-tuning that models the pseudo-labels reliably, resulting in reduced prediction errors. The key novelty in our model, IDoFew, is that the two-stage clustering coupled with two different clustering algorithms helps exploit the advantages of the complementary algorithms that reduce the errors in generating reliable pseudo-labels for fine-tuning. Our approach has shown significant improvements compared to strong comparative models.

INTERACT · INFORMS · state-of-the-art · HTTPS · MoDELS

January 8, 2024

Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting

Haowei Du, Dinghao Zhang, Chen Li, Yang Li, Dongyan Zhao

From arXiv: Findings of EMNLP 2023 (short)

Recent approaches to Incomplete Utterance Rewriting (IUR) fail to capture the source of important words, which is crucial for editing the incomplete utterance, and they introduce words from irrelevant utterances. We propose a novel and effective multi-task information interaction framework, including context selection, edit matrix construction, and relevance merging, to capture the multi-granularity of semantic information. By fetching the relevant utterances and identifying the important words, our approach outperforms existing state-of-the-art models on two benchmark datasets in this field, Restoration-200K and CANAND. Code will be provided at //github.com/yanmenxue/QR.

AIGC · Extensibility · state-of-the-art · AIM · HTTPS

January 8, 2024

AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan

The burgeoning field of Artificial Intelligence Generated Content (AIGC) is witnessing rapid advancements, particularly in video generation. This paper introduces AIGCBench, a pioneering comprehensive and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. AIGCBench tackles the limitations of existing benchmarks, which suffer from a lack of diverse datasets, by including a varied and open-domain image-text dataset that evaluates different state-of-the-art algorithms under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance. These dimensions are control-video alignment, motion effects, temporal consistency, and video quality. These metrics are both reference video-dependent and video-free, ensuring a comprehensive evaluation strategy. The evaluation standard proposed correlates well with human judgment, providing insights into the strengths and weaknesses of current I2V algorithms. The findings from our extensive experiments aim to stimulate further research and development in the I2V field. AIGCBench represents a significant step toward creating standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. We have open-sourced the dataset and evaluation code on the project website: //www.benchcouncil.org/AIGCBench.

Neuron · MoDELS · Attention · BLEU · Annotation

January 5, 2024

MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning

Alfirsa Damasyifa Fauzulhaq, Wahyu Parwitayasa, Joseph Ananda Sugihdharma, M. Fadli Ridhani, Novanto Yudistira

Neuron labeling is an approach to visualize the behaviour and response of a given neuron to the patterns that activate it. Neuron labeling extracts information about the features captured by particular neurons in a deep neural network; one way to do so uses the encoder-decoder image captioning approach. The encoder can be a pretrained CNN-based model, and the decoder an RNN-based model for text generation. Previous work, namely MILAN (Mutual Information-guided Linguistic Annotation of Neurons), visualized neuron behaviour using a modified Show, Attend, and Tell (SAT) model in the encoder and an LSTM with Bahdanau attention in the decoder. MILAN performs well on short-sequence neuron captioning but not on long-sequence neuron captioning. In this work, we aim to improve on MILAN by using different kinds of attention mechanisms and by combining several attention results into one, in order to merge the advantages of the individual mechanisms. Using our compound dataset, our proposed model obtained higher BLEU and F1-Score, achieving 17.742 and 0.4811 respectively. At the point where the model converges at its peak, it obtained a BLEU of 21.2262 and a BERTScore F1-Score of 0.4870.