Graphs are often used to model relationships between entities. Identifying and visualizing clusters in graphs enables insight discovery in many application areas, such as the life sciences and social sciences. Force-directed graph layouts promote the visual saliency of clusters, as they bring adjacent nodes closer together and push non-adjacent nodes apart. Matrices can also show clusters effectively when a suitable row/column ordering is applied, but they are less appealing to untrained users because they do not provide an intuitive node-link metaphor. It is thus worth exploring layouts that combine the strengths of the node-link metaphor and node ordering. In this work, we study the impact of node ordering on the visual saliency of clusters in orderable node-link diagrams, namely radial diagrams, arc diagrams and symmetric arc diagrams. Through a crowdsourced controlled experiment, we show that users can count clusters consistently more accurately, and to a large extent faster, with orderable node-link diagrams than with three state-of-the-art force-directed layout algorithms, i.e., 'Linlog', 'Backbone' and 'sfdp'. The measured advantage is greater in cases of low cluster separability and/or low compactness. A free copy of this paper and all supplemental materials are available at //osf.io/kc3dg/.

Related content

Cluster

Mutual information · INFORMS · Model evaluation · Performer · MoDELS

October 2, 2024

An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings

Soham Govande

There is a growing need for pluralistic alignment methods that can steer language models towards individual attributes and preferences. One such method, Self-Supervised Alignment with Mutual Information (SAMI), uses conditional mutual information to encourage the connection between behavioral preferences and model responses. We conduct two experiments exploring SAMI in multi-task settings. First, we compare SAMI to Direct Preference Optimization (DPO) on a multi-task benchmark (MT-Bench), using a stronger model to generate training data for a weaker one across diverse categories (humanities, STEM, extraction, coding, math, reasoning, and roleplay). Our results indicate that one iteration of SAMI has a 57% win rate against DPO, with significant variation in performance between task categories. Second, we examine SAMI's impact on mathematical accuracy (GSM-8K) relative to supervised fine-tuning (SFT). While SAMI increases zero-shot performance by 1.1%, SFT is more effective with a 3.2% boost. However, SAMI shows interesting scaling trends. When given 10 attempts, SAMI improves accuracy by 3.9%, while SFT achieves a 10.1% increase. Combining SAMI with SFT yields an additional improvement of 1.3% in multi-attempt settings, though single-attempt accuracy remains unchanged.

Neural Networks · Networking · Learning · Recurrent neural networks · Backpropagation

October 1, 2024

Gradient-Free Training of Recurrent Neural Networks using Random Perturbations

Jesus Garcia Fernandez, Sander Keemink, Marcel van Gerven

Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle to propagate gradient information for long sequences, leading to vanishing gradients. An alternative to gradient-based methods like BPTT is to stochastically approximate gradients through perturbation-based methods. This learning approach is exceptionally simple, requiring only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. We subsequently conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability compared to BPTT, strongly outperforming standard node and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs, one that can be ideally suited for neuromorphic computing applications.
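To make the idea of perturbation-based learning concrete, here is a minimal sketch of plain weight perturbation, the standard baseline the abstract mentions, and not the authors' time-domain ANP method. The toy linear model, the data, and the values of `sigma` and `lr` are assumptions chosen only for illustration.

```python
import random

def loss(w, data):
    """Mean squared error of a 1-D linear model y = w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def weight_perturbation_step(w, data, rng, sigma=1e-3, lr=0.05):
    """One update built from two forward passes and a scalar loss difference."""
    base = loss(w, data)                          # clean forward pass
    noise = [rng.gauss(0.0, sigma) for _ in w]    # random perturbation of every weight
    perturbed = [wi + ni for wi, ni in zip(w, noise)]
    delta = loss(perturbed, data) - base          # global reinforcement signal
    # Stochastic gradient estimate: delta * noise / sigma^2 (no backward pass needed).
    return [wi - lr * delta * ni / sigma ** 2 for wi, ni in zip(w, noise)]

def train(steps=2000, seed=0):
    """Fit the toy target y = 2x - 1 (hypothetical data) with weight perturbation."""
    rng = random.Random(seed)
    data = [(x / 10.0, 2.0 * (x / 10.0) - 1.0) for x in range(-10, 11)]
    w = [0.0, 0.0]
    for _ in range(steps):
        w = weight_perturbation_step(w, data, rng)
    return w, loss(w, data)
```

Calling `train()` returns the learned weights and the final loss. Each update direction is random, so the estimate is noisy even though it matches the true gradient on average, which is the inefficiency the abstract refers to.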

MoDELS · Attention · entity · Correlation coefficient · Mask

October 1, 2024

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models

Changming Xiao, Qi Yang, Feng Zhou, Changshui Zhang

from arxiv, A revised version of this paper will be published in Neurocomputing, see //doi.org/10.1016/j.neucom.2024.128437

Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another perspective, these generative models imply clues about the precise correlation between words and pixels. In this work, a simple but effective method is proposed to utilize the attention mechanism in the denoising network of text-to-image diffusion models. Without retraining or inference-time optimization, the semantic grounding of phrases can be attained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly-supervised semantic segmentation setting, and our method achieves superior performance to prior methods. In addition, the acquired word-pixel correlation is found to generalize to the learned text embedding of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task called "personalized referring image segmentation" with a new dataset. Experiments in various situations demonstrate the advantages of our method compared to strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.

Understandability · TOOLS · 3D · Representation · Robotics

September 30, 2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Qiaojun Yu, Siyuan Huang, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu

Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we constructed a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, dataset, and code are published on the project website at //sites.google.com/view/uni-aff/home

Learning · Graph · Representation · Networking · Generalization theory

September 30, 2024

Whole-Graph Representation Learning For the Classification of Signed Networks

Noé Cecillon, Vincent Labatut, Richard Dufour, Nejat Arınık

Graphs are ubiquitous for modeling complex systems involving structured data and relationships. Consequently, graph representation learning, which aims to automatically learn low-dimensional representations of graphs, has drawn a lot of attention in recent years. The overwhelming majority of existing methods handle unsigned graphs. However, signed graphs appear in an increasing number of application domains to model systems involving two types of opposed relationships. Several authors took an interest in signed graphs and proposed methods for providing vertex-level representations, but only one exists for whole-graph representations, and it can handle only fully connected graphs. In this article, we tackle this issue by proposing two approaches to learning whole-graph representations of general signed graphs. The first is SG2V, a signed generalization of the whole-graph embedding method Graph2vec that relies on a modification of the Weisfeiler-Lehman relabelling procedure. The second is WSGCN, a whole-graph generalization of the signed vertex embedding method SGCN that relies on the introduction of master nodes into the GCN. We propose several variants of both these approaches. A bottleneck in the development of whole-graph-oriented methods is the lack of data. We assemble a benchmark composed of three collections of signed graphs with corresponding ground truths. We assess our methods on this benchmark, and our results show that the signed whole-graph methods learn better representations for this task. Overall, the baseline obtains an F-measure score of 58.57, whereas SG2V and WSGCN reach 73.01 and 81.20, respectively. Our source code and benchmark dataset are both publicly available online.
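For readers unfamiliar with Weisfeiler-Lehman relabelling, the sketch below shows one round of the standard procedure, plus a hypothetical signed variant in which each neighbour label is paired with the sign of the connecting edge. The abstract does not describe how SG2V actually modifies the procedure, so the `use_signs` option is only an illustrative assumption, as are the function name and the toy graph.

```python
def wl_relabel(adj, labels, iterations=1, use_signs=True):
    """Weisfeiler-Lehman relabelling on a graph with signed edges.

    adj:    dict mapping node -> list of (neighbour, sign), with sign in {+1, -1}
    labels: dict mapping node -> initial integer label
    """
    labels = dict(labels)
    for _ in range(iterations):
        signatures = {}
        for v, edges in adj.items():
            # Multiset of neighbour labels; the signed variant also keeps the edge sign.
            neigh = sorted((sign if use_signs else 1, labels[u]) for u, sign in edges)
            signatures[v] = (labels[v], tuple(neigh))
        # Compress every distinct signature into a fresh integer label.
        table = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        labels = {v: table[signatures[v]] for v in adj}
    return labels

# Toy path a - b - c: the edge a-b is positive, the edge b-c is negative.
path = {"a": [("b", 1)], "b": [("a", 1), ("c", -1)], "c": [("b", -1)]}
start = {v: 0 for v in path}
unsigned = wl_relabel(path, start, use_signs=False)
signed = wl_relabel(path, start, use_signs=True)
```

When signs are ignored, the two end nodes receive the same label. When signs are kept, they are told apart, which is the kind of extra structural information a signed generalization can exploit.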

Graph · Identifiable · Knowledge · Analysis · Statistic

September 27, 2024

Explainable Enrichment-Driven GrAph Reasoner (EDGAR) for Large Knowledge Graphs with Applications in Drug Repurposing

Olawumi Olasunkanmi, Evan Morris, Yaphet Kebede, Harlin Lee, Stanley Ahalt, Alexander Tropsha, Chris Bizon

from arxiv, 10 pages, 5 figures, 4 tables

Knowledge graphs (KGs) represent connections and relationships between real-world entities. We propose a link prediction framework for KGs named Enrichment-Driven GrAph Reasoner (EDGAR), which infers new edges by mining entity-local rules. This approach leverages enrichment analysis, a well-established statistical method used to identify mechanisms common to sets of differentially expressed genes. EDGAR's inference results are inherently explainable and rankable, with p-values indicating the statistical significance of each enrichment-based rule. We demonstrate the framework's effectiveness on a large-scale biomedical KG, ROBOKOP, focusing on drug repurposing for Alzheimer disease (AD) as a case study. Initially, we extracted 14 known drugs from the KG and identified 20 contextual biomarkers through enrichment analysis, revealing functional pathways relevant to shared drug efficacy for AD. Subsequently, using the top 1000 enrichment results, our system identified 1246 additional drug candidates for AD treatment. The top 10 candidates were validated using evidence from medical literature. EDGAR is deployed within ROBOKOP, complete with a web user interface. This is the first study to apply enrichment analysis to large graph completion and drug repurposing.
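Enrichment analysis is commonly carried out with a hypergeometric (one-sided Fisher) test, and the sketch below shows how a p-value of that kind can be computed. The abstract does not state which test EDGAR uses, so the choice of test, the function name and the numbers in the usage note are assumptions for illustration.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: total number of entities in the background set
    annotated:  entities in the background that carry the property of interest
    sample:     size of the query set (for example, a set of known drugs)
    overlap:    entities in the query set that carry the property
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, with a hypothetical background of 10 entities of which 3 are annotated, drawing a query set of 3 that contains all 3 annotated entities gives `enrichment_p_value(10, 3, 3, 3)`, which equals 1/120. Smaller p-values indicate that the shared property is unlikely to be a chance overlap, which is what makes the resulting rules rankable.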

Speech recognition · Understandability · Large language models · MoDELS · Extensibility

September 27, 2024

Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM

Robin Shing-Hei Yuen, Timothy Tin-Long Tse, Jian Zhu

from arxiv, Corrected style from final to preprint

Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, excelling in tasks related to these domains. However, their ability to handle direct speech-to-speech conversations remains notably constrained. These models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech into text for processing before generating audio responses, which introduces latency and loses audio features. We propose a method that implicitly internalizes ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. Our approach reduces latency and improves the model's native understanding of speech, paving the way for more efficient and natural real-time audio interactions. We also release a large-scale synthetic conversational dataset to facilitate further research.

Controller · Regularization term · Discretization · Extensibility · Linear

September 26, 2024

Galerkin Method of Regularized Stokeslets for Procedural Fluid Flow with Control Curves

Ryusuke Sugimoto, Jeff Lait, Christopher Batty, Toshiya Hachisuka

from arxiv, Accepted to ACM SIGGRAPH Asia 2024 Technical Communications. See //rsugimoto.net/GalerkinMRS/ for updates

We present a new procedural incompressible velocity field authoring tool, which lets users design a volumetric flow by directly specifying velocity along control curves. Our method combines the Method of Regularized Stokeslets with Galerkin discretization. Based on the highly viscous Stokes flow assumption, we find the force along a given set of curves that satisfies the velocity constraints along them. We can then evaluate the velocity anywhere inside the surrounding infinite 2D or 3D domain. We also show the extension of our method to control the angular velocity along control curves. Compared to a collocation discretization, our method is not very sensitive to the vertex sampling rate along control curves and only requires a small linear system solve.
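As a rough illustration of the two steps described above (find a force that satisfies a velocity constraint, then evaluate the velocity elsewhere), the sketch below uses a single point force and one widely used 3D regularized Stokeslet kernel, u(x) = [f (r^2 + 2 eps^2) + (f . r) r] / (8 pi mu (r^2 + eps^2)^(3/2)). The kernel choice, the single-point setup and all names are assumptions. The paper's Galerkin discretization along curves is not reproduced here.

```python
from math import pi

def regularized_stokeslet_velocity(x, x0, f, eps, mu=1.0):
    """Velocity at x induced by a regularized point force f located at x0 (3D)."""
    r = [a - b for a, b in zip(x, x0)]
    r2 = sum(c * c for c in r)
    denom = 8.0 * pi * mu * (r2 + eps * eps) ** 1.5
    f_dot_r = sum(fc * rc for fc, rc in zip(f, r))
    return [(fc * (r2 + 2.0 * eps * eps) + f_dot_r * rc) / denom
            for fc, rc in zip(f, r)]

def force_for_velocity_at_source(u_target, eps, mu=1.0):
    """Force whose induced velocity at its own location equals u_target.

    At r = 0 the kernel reduces to u = f / (4 pi mu eps), so the one-point
    constraint is solved in closed form instead of through a linear system.
    """
    return [4.0 * pi * mu * eps * uc for uc in u_target]
```

With several control points, the same kernel yields a dense linear system relating the unknown forces to the prescribed velocities. Once the forces are known, `regularized_stokeslet_velocity` can be summed over all of them to sample the flow at any point in the domain.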

Video captioning · INFORMS · Performer · Distillation · Extensibility

March 31, 2020

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

from arxiv, CVPR 2020

Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.

Visual question answering · Question answering · MoDELS · Identifiable · Attention mechanism

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
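One way to see why soft attention can make counting hard is that normalized attention weights sum to one, so the pooled feature is a weighted average that does not change when an object is duplicated. The abstract does not spell out the problem in detail, so the sketch below is an illustrative assumption with made-up feature vectors and logits.

```python
from math import exp

def soft_attention_pool(features, logits):
    """Softmax-normalize the logits and return the weighted average feature."""
    weights = [exp(l) for l in logits]
    total = sum(weights)
    dim = len(features[0])
    return [sum(w / total * f[i] for w, f in zip(weights, features))
            for i in range(dim)]

cat = [1.0, 0.0]                                   # hypothetical object feature
one_cat = soft_attention_pool([cat], [2.0])
two_cats = soft_attention_pool([cat, cat], [2.0, 2.0])
```

Here `one_cat` and `two_cats` are the same vector, so any classifier applied after the pooling step cannot distinguish one object from two. A dedicated counting component that works on object proposals avoids relying on this averaged feature.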