
Visual quality measures (VQMs) are designed to support analysts by automatically detecting and quantifying patterns in visualizations. We propose a new VQM for visual grouping patterns in scatterplots, called ClustML, which is trained on previously collected human subject judgments. Our model encodes scatterplots in the parametric space of a Gaussian Mixture Model and uses a classifier trained on human judgment data to estimate the perceptual complexity of grouping patterns; this complexity is quantified by the numbers of initial mixture components and final combined groups. It improves on existing VQMs, first, by better estimating human judgments on two-Gaussian cluster patterns and, second, by giving higher accuracy when ranking general cluster patterns in scatterplots. We use it to analyze kinship data for genome-wide association studies, in which experts rely on the visual analysis of large sets of scatterplots. We make the benchmark datasets and the new VQM available for practical use and further improvements.
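
To make the pipeline concrete, the following minimal sketch fits a Gaussian Mixture Model to a scatterplot, uses the fitted parameters as features, and scores them with a classifier standing in for one trained on human judgments. The feature construction, the classifier choice, and the synthetic labels are illustrative assumptions, not ClustML's actual implementation.

```python
# Minimal ClustML-style sketch: GMM parameters as scatterplot features,
# scored by a classifier trained on (here: synthetic) grouping judgments.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.ensemble import GradientBoostingClassifier

def gmm_features(points, n_components=2, seed=0):
    """Encode a 2D scatterplot as the parameter vector of a fitted GMM."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(points)
    order = np.argsort(gmm.means_[:, 0])  # canonical component ordering
    return np.concatenate([gmm.weights_[order],
                           gmm.means_[order].ravel(),
                           gmm.covariances_[order].ravel()])

rng = np.random.default_rng(0)

def synthetic_plot(separation):
    """Two Gaussian blobs whose separation controls how 'clustered' the plot looks."""
    a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(150, 2))
    b = rng.normal(loc=(separation, 0.0), scale=0.5, size=(150, 2))
    return np.vstack([a, b])

# Stand-in for human judgments: well-separated plots are labelled "two groups" (1).
separations = rng.uniform(0.0, 4.0, size=200)
X = np.array([gmm_features(synthetic_plot(s)) for s in separations])
y = (separations > 2.0).astype(int)

clf = GradientBoostingClassifier().fit(X, y)
probe = gmm_features(synthetic_plot(3.5)).reshape(1, -1)
print("P(two groups) for a well-separated plot:", clf.predict_proba(probe)[0, 1])
```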

Related Content

Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet-scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. AutoRT leverages vision-language models (VLMs) for scene understanding and grounding, and further uses large language models (LLMs) for proposing diverse and novel instructions to be performed by a fleet of robots. Guiding data collection by tapping into the knowledge of foundation models enables AutoRT to effectively reason about autonomy tradeoffs and safety while significantly scaling up data collection for robot learning. We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies. We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows the instruction-following data-collection robots to align with human preferences.
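
A hypothetical sketch of such a data-collection loop is given below: a VLM describes the scene, an LLM proposes candidate instructions, lightweight rules filter them, and surviving tasks are dispatched either to an autonomous policy or to teleoperation. All function names and rules here are placeholders for illustration, not AutoRT's actual components.

```python
# Placeholder sketch of a VLM-grounded, LLM-driven data-collection loop.
from dataclasses import dataclass

@dataclass
class Task:
    instruction: str
    mode: str  # "autonomous" or "teleoperation"

def describe_scene(image) -> str:
    # Placeholder for a VLM call that grounds the scene in language.
    return "a table with a sponge, a mug, and a closed drawer"

def propose_instructions(scene: str, n: int = 5) -> list[str]:
    # Placeholder for an LLM prompted with the scene description.
    return [f"pick up the sponge ({i})" for i in range(n)]

def passes_safety_rules(instruction: str) -> bool:
    # Toy stand-in for rule-based safety filtering.
    banned = ("knife", "human", "outlet")
    return not any(word in instruction for word in banned)

def collect_episode(robot_id: int, image=None) -> Task | None:
    scene = describe_scene(image)
    for instruction in propose_instructions(scene):
        if passes_safety_rules(instruction):
            # Simple tasks go to an autonomous policy, others to teleoperation.
            mode = "autonomous" if "pick up" in instruction else "teleoperation"
            return Task(instruction, mode)
    return None

for robot in range(3):
    print(robot, collect_episode(robot))
```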

Entity alignment (EA), a pivotal process in integrating multi-source Knowledge Graphs (KGs), seeks to identify equivalent entity pairs across these graphs. Most existing approaches regard EA as a graph representation learning task, concentrating on enhancing graph encoders. However, the decoding process in EA - essential for effective operation and alignment accuracy - has received limited attention and remains tailored to specific datasets and model architectures, necessitating both entity and additional explicit relation embeddings. This specificity limits its applicability, particularly in GNN-based models. To address this gap, we introduce a novel, generalized, and efficient decoding approach for EA, relying solely on entity embeddings. Our method optimizes the decoding process by minimizing the Dirichlet energy, which induces a gradient flow within the graph that promotes graph homophily. The discretization of the gradient flow produces a fast and scalable approach, termed Triple Feature Propagation (TFP). TFP innovatively channels gradient flow through three views: entity-to-entity, entity-to-relation, and relation-to-entity. This generalized gradient flow enables TFP to harness the multi-view structural information of KGs. Rigorous experimentation on diverse real-world datasets demonstrates that our approach significantly enhances various EA methods. Notably, the approach achieves these advancements with less than 6 seconds of additional computational time, establishing a new benchmark in efficiency and adaptability for future EA methods.
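
The core idea can be illustrated with a minimal sketch of Dirichlet-energy-driven propagation on the entity-to-entity view: the Dirichlet energy E(X) = 1/2 * sum over edges (i, j) of ||x_i - x_j||^2 has gradient flow dX/dt = -LX with graph Laplacian L, and an explicit Euler discretization gives the update below, with pre-aligned seed entities kept fixed. This is an assumption-laden illustration of the general recipe, not the paper's TFP implementation, which additionally propagates over entity-to-relation and relation-to-entity views.

```python
# Discretized Dirichlet-energy gradient flow (feature propagation) on one view.
import numpy as np

def propagate(X, edges, anchors, steps=20, tau=0.5):
    """X: (n, d) entity embeddings; edges: list of (i, j); anchors: indices kept fixed."""
    n = X.shape[0]
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    X = X.copy()
    fixed = X[anchors].copy()
    for _ in range(steps):
        X = X - tau * (X - A @ X / deg)   # X <- X - tau * L_rw X, with L_rw = I - D^-1 A
        X[anchors] = fixed                # seed (pre-aligned) entities stay pinned
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
smoothed = propagate(X, edges, anchors=[0, 5])
print(np.round(smoothed, 2))
```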

Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding, as one-off explanations may occasionally fall short in providing sufficient information to the user. Current solutions for dialogue-based explanations, however, require many dependencies and are not easily transferable to tasks they were not designed for. With LLMCheckup, we present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior. We enable LLMs to generate all explanations by themselves and take care of intent recognition without fine-tuning, by connecting them with a broad spectrum of Explainable AI (XAI) tools, e.g., feature attributions, embedding-based similarity, and prompting strategies for counterfactual and rationale generation. LLM (self-)explanations are presented as an interactive dialogue that supports follow-up questions and generates suggestions. LLMCheckup provides tutorials for the operations available in the system, catering to individuals with varying levels of expertise in XAI, and supports multiple input modalities. We introduce a new parsing strategy, multi-prompt parsing, which substantially enhances the parsing accuracy of LLMs. Finally, we showcase the tasks of fact checking and commonsense question answering.
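
One plausible reading of multi-prompt parsing is sketched below: the same utterance is mapped to a supported XAI operation with several prompt formulations, and the final operation is chosen by majority vote. The prompt templates and the query_llm stub are assumptions for illustration, not LLMCheckup's actual prompts or API.

```python
# Hedged sketch of intent parsing via multiple prompts and majority voting.
from collections import Counter

OPERATIONS = ["feature_attribution", "counterfactual", "rationale", "similarity"]

PROMPT_TEMPLATES = [
    "Classify the user request into one of {ops}. Request: {utterance}. Answer:",
    "Which operation does the user want ({ops})? '{utterance}' ->",
    "User said: '{utterance}'. The matching tool from {ops} is:",
]

def query_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; here a keyword-based fake answer.
    return "counterfactual" if "instead" in prompt or "what if" in prompt else "feature_attribution"

def parse_intent(utterance: str) -> str:
    votes = []
    for template in PROMPT_TEMPLATES:
        prompt = template.format(ops=", ".join(OPERATIONS), utterance=utterance)
        answer = query_llm(prompt).strip().lower()
        if answer in OPERATIONS:
            votes.append(answer)
    return Counter(votes).most_common(1)[0][0] if votes else "feature_attribution"

print(parse_intent("what if the word 'great' were removed from the input?"))
```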

Semi-supervised domain adaptation (SSDA) presents a critical hurdle in computer vision, especially given the frequent scarcity of labeled data in real-world settings. This scarcity often causes foundation models, trained on extensive datasets, to underperform when applied to new domains. AdaEmbed, our newly proposed methodology for SSDA, offers a promising solution to these challenges. Leveraging the potential of unlabeled data, AdaEmbed facilitates the transfer of knowledge from a labeled source domain to an unlabeled target domain by learning a shared embedding space. By generating accurate and uniform pseudo-labels based on the established embedding space, the model overcomes the limitations of conventional SSDA, thus enhancing performance significantly. Our method's effectiveness is validated through extensive experiments on benchmark datasets such as DomainNet, Office-Home, and VisDA-C, where AdaEmbed consistently outperforms all the baselines, setting a new state of the art for SSDA. With its straightforward implementation and high data efficiency, AdaEmbed stands out as a robust and pragmatic solution for real-world scenarios, where labeled data is scarce. To foster further research and application in this area, we are sharing the codebase of our unified framework for semi-supervised domain adaptation.
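
As an illustration of the general recipe (not AdaEmbed's exact procedure), the sketch below builds one prototype per class in a shared embedding space and pseudo-labels unlabeled target embeddings by nearest-prototype similarity, keeping a fixed number of the most confident samples per class so the pseudo-labels stay roughly uniform across classes.

```python
# Prototype-based, class-balanced pseudo-labelling in a shared embedding space.
import numpy as np

def l2_normalize(Z):
    return Z / np.linalg.norm(Z, axis=1, keepdims=True).clip(min=1e-12)

def pseudo_label(labeled_Z, labels, unlabeled_Z, num_classes, per_class=2):
    labeled_Z, unlabeled_Z = l2_normalize(labeled_Z), l2_normalize(unlabeled_Z)
    prototypes = np.stack([labeled_Z[labels == c].mean(axis=0) for c in range(num_classes)])
    prototypes = l2_normalize(prototypes)
    sims = unlabeled_Z @ prototypes.T            # (n_unlabeled, num_classes)
    preds, conf = sims.argmax(axis=1), sims.max(axis=1)
    selected = []
    for c in range(num_classes):                 # class-balanced selection
        idx = np.where(preds == c)[0]
        idx = idx[np.argsort(-conf[idx])][:per_class]
        selected.extend((int(i), c) for i in idx)
    return selected                              # list of (sample index, pseudo-label)

rng = np.random.default_rng(0)
labeled = np.vstack([rng.normal(0, 1, (5, 8)), rng.normal(3, 1, (5, 8))])
labels = np.array([0] * 5 + [1] * 5)
unlabeled = np.vstack([rng.normal(0, 1, (10, 8)), rng.normal(3, 1, (10, 8))])
print(pseudo_label(labeled, labels, unlabeled, num_classes=2))
```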

Formal Modelling is often used as part of the design and testing process of software development to ensure that components operate within suitable bounds even in unexpected circumstances. In this paper, we use predictive formal modelling (PFM) at runtime in a human-swarm mission and show that this integration can be used to improve the performance of human-swarm teams. We recruited 60 participants to operate a simulated aerial swarm to deliver parcels to target locations. In the PFM condition, operators were informed of the estimated completion times given the number of drones deployed, whereas in the No-PFM condition, operators did not have this information. The operators could control the mission by adding or removing drones, thereby increasing or decreasing the overall mission cost. The evaluation of human-swarm performance relied on four key metrics: the time taken to complete tasks, the number of agents involved, the total number of tasks accomplished, and the overall cost associated with the human-swarm task. Our results show that PFM modelling at runtime improves mission performance without significantly affecting the operator's workload or the system's usability.

Emotion recognition in software engineering texts is critical for understanding developer expressions and improving collaboration. This paper presents a comparative analysis of state-of-the-art Pre-trained Language Models (PTMs) for fine-grained emotion classification on two benchmark datasets from GitHub and Stack Overflow. We evaluate six transformer models (BERT, RoBERTa, ALBERT, DeBERTa, CodeBERT, and GraphCodeBERT) against the current best-performing tool, SEntiMoji. Our analysis reveals consistent improvements ranging from 1.17% to 16.79% in terms of macro-averaged and micro-averaged F1 scores, with general-domain models outperforming specialized ones. To further enhance PTMs, we incorporate polarity features in the attention layer during training, demonstrating additional average gains of 1.0% to 10.23% over the baseline PTM approaches. Our work provides strong evidence for the advancements afforded by PTMs in recognizing nuanced emotions like Anger, Love, Fear, Joy, Sadness, and Surprise in software engineering contexts. Through comprehensive benchmarking and error analysis, we also outline the scope for improvements to address contextual gaps.
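
One plausible way to inject polarity into an attention layer is sketched below: a lexicon-derived polarity score per token is added as a bias to the attention logits so that emotion-bearing tokens receive extra weight. This is a hedged illustration of the idea, not necessarily the exact mechanism used in the paper.

```python
# Scaled dot-product attention with a token-level polarity bias on the logits.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def polarity_attention(Q, K, V, polarity, alpha=1.0):
    """Q, K, V: (seq, d); polarity: (seq,) scores in [-1, 1] from a sentiment lexicon."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)                         # standard attention logits
    logits = logits + alpha * np.abs(polarity)[None, :]   # bias toward polar tokens
    return softmax(logits, axis=-1) @ V

rng = np.random.default_rng(0)
seq, d = 5, 8
Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))
polarity = np.array([0.0, 0.9, 0.0, -0.8, 0.1])   # e.g. "love" and "hate" tokens
print(polarity_attention(Q, K, V, polarity).shape)  # (5, 8)
```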

Explainable artificial intelligence techniques are becoming increasingly important with the rise of deep learning applications in various domains. These techniques aim to provide a better understanding of complex "black box" models and enhance user trust while maintaining high learning performance. While many studies have focused on explaining deep learning models in computer vision for image input, video explanations remain relatively unexplored due to the temporal dimension's complexity. In this paper, we present a unified framework for local agnostic explanations in the video domain. Our contributions include: (1) Extending a fine-grained explanation framework tailored for computer vision data, (2) Adapting six existing explanation techniques to work on video data by incorporating temporal information and enabling local explanations, and (3) Conducting an evaluation and comparison of the adapted explanation methods using different models and datasets. We discuss the possibilities and choices involved in the removal-based explanation process for visual data. The adaptation of six explanation methods for video is explained, with comparisons to existing approaches. We evaluate the performance of the methods using automated metrics and user-based evaluation, showing that 3D RISE, 3D LIME, and 3D Kernel SHAP outperform other methods. By decomposing the explanation process into manageable steps, we facilitate the study of each choice's impact and allow for further refinement of explanation methods to suit specific datasets and models.
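
The removal-based idea behind 3D RISE can be sketched as follows: random spatio-temporal masks occlude parts of the video, the classifier is queried on each masked video, and the masks are averaged with weights given by the resulting scores. The mask generation is simplified here (RISE upsamples coarse random grids), and the model is a toy stand-in for a video classifier.

```python
# Simplified RISE-style removal-based saliency for video (spatio-temporal masks).
import numpy as np

def rise_3d(video, model, n_masks=200, keep_prob=0.5, seed=0):
    """video: (T, H, W, C); model: callable mapping a video to a class score in [0, 1]."""
    rng = np.random.default_rng(seed)
    T, H, W, _ = video.shape
    saliency = np.zeros((T, H, W))
    for _ in range(n_masks):
        mask = (rng.random((T, H, W)) < keep_prob).astype(float)  # spatio-temporal mask
        score = model(video * mask[..., None])                    # occluded prediction
        saliency += score * mask        # masks that keep the evidence score higher
    return saliency / (n_masks * keep_prob)

def toy_model(video):
    # Pretend the class evidence lives in the centre of the middle frames.
    return float(video[3:5, 4:8, 4:8].mean())

video = np.zeros((8, 12, 12, 3))
video[3:5, 4:8, 4:8] = 1.0
sal = rise_3d(video, toy_model)
print(sal.shape, sal[4, 6, 6] > sal[0, 0, 0])  # the centre should be more salient
```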

Recent advancements in learning-based Multi-View Stereo (MVS) methods have prominently featured transformer-based models with attention mechanisms. However, existing approaches have not thoroughly investigated the profound influence of transformers on different MVS modules, resulting in limited depth estimation capabilities. In this paper, we introduce MVSFormer++, a method that carefully exploits the inherent characteristics of attention to enhance various components of the MVS pipeline. Formally, our approach involves infusing cross-view information into the pre-trained DINOv2 model to facilitate MVS learning. Furthermore, we employ different attention mechanisms for the feature encoder and cost volume regularization, focusing on feature and spatial aggregations respectively. Additionally, we uncover that some design details would substantially impact the performance of transformer modules in MVS, including normalized 3D positional encoding, adaptive attention scaling, and the position of layer normalization. Comprehensive experiments on DTU, Tanks-and-Temples, BlendedMVS, and ETH3D validate the effectiveness of the proposed method. Notably, MVSFormer++ achieves state-of-the-art performance on the challenging DTU and Tanks-and-Temples benchmarks.
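
As an example of one such design detail, the sketch below shows a common formulation of adaptive attention scaling: attention logits are rescaled by a log-of-token-count factor so that attention entropy stays comparable when test-time inputs contain many more tokens than training inputs. This is an assumption for illustration and not necessarily the exact scaling used in MVSFormer++.

```python
# Attention with an optional log(n)-based adaptive scaling of the logits.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, train_len=None):
    n, d = K.shape
    scale = 1.0 / np.sqrt(d)
    if train_len is not None:                   # adaptive part: compensate when the
        scale *= np.log(n) / np.log(train_len)  # test sequence is longer than in training
    weights = softmax(Q @ K.T * scale)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4096, 32)) for _ in range(3))
_, w_plain = attention(Q[:4], K, V)
_, w_adapt = attention(Q[:4], K, V, train_len=1024)
entropy = lambda w: float(-(w * np.log(w + 1e-12)).sum(axis=-1).mean())
print(f"entropy without scaling: {entropy(w_plain):.2f}, with scaling: {entropy(w_adapt):.2f}")
```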

Contour-based instance segmentation has been actively studied, thanks to its flexibility and elegance in processing visual objects within complex backgrounds. In this work, we propose a novel deep network architecture, i.e., PolySnake, for generic contour-based instance segmentation. Motivated by the classic Snake algorithm, the proposed PolySnake achieves superior and robust segmentation performance with an iterative and progressive contour refinement strategy. Technically, PolySnake introduces a recurrent update operator to estimate the object contour iteratively. It maintains a single estimate of the contour that is progressively deformed toward the object boundary. At each iteration, PolySnake builds a semantic-rich representation for the current contour and feeds it to the recurrent operator for further contour adjustment. Through the iterative refinements, the contour progressively converges to a stable state that tightly encloses the object instance. Beyond the scope of general instance segmentation, extensive experiments are conducted to validate the effectiveness and generalizability of our PolySnake in two additional specific task scenarios, including scene text detection and lane detection. The results demonstrate that the proposed PolySnake outperforms the existing advanced methods on multiple prevalent benchmarks across the three tasks. The code and pre-trained models are available at //github.com/fh2019ustc/PolySnake
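
The iterative refinement loop can be illustrated with a toy sketch: a single contour estimate is deformed step by step by a per-vertex update. The learned recurrent operator and the semantic contour features are replaced here by a hand-written update that pulls vertices toward a circular boundary, so only the structure of the loop mirrors PolySnake.

```python
# Toy iterative contour refinement: one contour, repeatedly deformed by per-vertex offsets.
import numpy as np

def init_contour(n_points=32, radius=3.0):
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.stack([radius * np.cos(t), radius * np.sin(t)], axis=1)  # (N, 2)

def update_operator(contour, center=np.zeros(2), target_radius=1.0, step=0.3):
    # Stand-in for the learned recurrent operator: offsets toward the object boundary.
    vec = contour - center
    dist = np.linalg.norm(vec, axis=1, keepdims=True).clip(min=1e-9)
    return step * (target_radius - dist) * (vec / dist)   # (N, 2) per-vertex offsets

contour = init_contour()                     # coarse initial contour
for it in range(10):                         # iterative, progressive refinement
    contour = contour + update_operator(contour)
radii = np.linalg.norm(contour, axis=1)
print(f"mean radius after refinement: {radii.mean():.3f} (target 1.0)")
```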

Recommendation systems have become popular and effective tools that help users discover items of interest by modeling user preferences and item properties based on implicit interactions (e.g., purchasing and clicking). Humans perceive the world by processing modality signals (e.g., audio, text, and image), which has inspired researchers to build recommender systems that can understand and interpret data from different modalities. Such models can capture hidden relations between modalities and possibly recover complementary information that cannot be captured by a uni-modal approach or by implicit interactions alone. The goal of this survey is to provide a comprehensive review of recent research efforts on multimodal recommendation. Specifically, it shows a clear pipeline with commonly used techniques in each step and classifies the models by the methods used. Additionally, we have designed a code framework that helps researchers new to this area understand the principles and techniques and easily run the SOTA models. Our framework is located at: //github.com/enoche/MMRec
