成人午夜性影院视频,尹人香蕉网在线视频观看

This report introduces the Attention Visualizer package, which is crafted to visually illustrate the significance of individual words in encoder-only transformer-based models. In contrast to other methods that center on tokens and self-attention scores, our approach will examine the words and their impact on the final embedding representation. Libraries like this play a crucial role in enhancing the interpretability and explainability of neural networks. They offer the opportunity to illuminate their internal mechanisms, providing a better understanding of how they operate and can be enhanced. You can access the code and review examples on the following GitHub repository: //github.com/AlaFalaki/AttentionVisualizer.

相關內容

Attention

關注 1

3D · NeRF · Extensibility · EASE · motivation ·

2023 年 10 月 16 日

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu,Yan-Pei Cao,Jay Zhangjie Wu,Weijia Mao,Yuchao Gu,Rui Zhao,Jussi Keppo,Ying Shan,Mike Zheng Shou

from arxiv, Project Page: //showlab.github.io/DynVideo-E/

Despite remarkable research advances in diffusion-based video editing, existing methods are limited to short-length videos due to the contradiction between long-range consistency and frame-wise editing. Recent approaches attempt to tackle this challenge by introducing video-2D representations to degrade video editing to image editing. However, they encounter significant difficulties in handling large-scale motion- and view-change videos especially for human-centric videos. This motivates us to introduce the dynamic Neural Radiance Fields (NeRF) as the human-centric video representation to ease the video editing problem to a 3D space editing task. As such, editing can be performed in the 3D spaces and propagated to the entire video via the deformation field. To provide finer and direct controllable editing, we propose the image-based 3D space editing pipeline with a set of effective designs. These include multi-view multi-pose Score Distillation Sampling (SDS) from both 2D personalized diffusion priors and 3D diffusion priors, reconstruction losses on the reference image, text-guided local parts super-resolution, and style transfer for 3D background space. Extensive experiments demonstrate that our method, dubbed as DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50% ~ 95% in terms of human preference. Compelling video comparisons are provided in the project page //showlab.github.io/DynVideo-E/. Our code and data will be released to the community.

圖像降噪 · 去噪 · BSNS · Attention · Networking ·

2023 年 10 月 16 日

PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

Hyemi Jang,Junsung Park,Dahuin Jung,Jaihyun Lew,Ho Bae,Sungroh Yoon

from arxiv, Accepted to NeurIPS 2023

Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised denoising model from learning identical mapping, each output pixel should not be influenced by its corresponding input pixel; This requirement is known as J-invariance. Blind-spot networks (BSNs) have been a prevalent choice to ensure J-invariance in self-supervised image denoising. However, constructing variations of BSNs by injecting additional operations such as downsampling can expose blinded information, thereby violating J-invariance. Consequently, convolutions designed specifically for BSNs have been allowed only, limiting architectural flexibility. To overcome this limitation, we propose PUCA, a novel J-invariant U-Net architecture, for self-supervised denoising. PUCA leverages patch-unshuffle/shuffle to dramatically expand receptive fields while maintaining J-invariance and dilated attention blocks (DABs) for global context incorporation. Experimental results demonstrate that PUCA achieves state-of-the-art performance, outperforming existing methods in self-supervised image denoising.

向量化 · 自動問答 · 模態 · MoDELS · 語言模型化 ·

2023 年 10 月 13 日

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen,Oleg Sinavski,Jan Hünermann,Alice Karnsund,Andrew James Willmott,Danny Birch,Daniel Maund,Jamie Shotton

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high quality control commands collected with RL agent and question answer pairs generated by teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.

塑造 · 離散化 · 模型評估 · 表示 · 優化器 ·

2023 年 10 月 13 日

Medial Skeletal Diagram: A Generalized Medial Axis Approach for Compact 3D Shape Representation

Minghao Guo,Bohan Wang,Wojciech Matusik

from arxiv, 22 pages, 28 figures

We propose the Medial Skeletal Diagram, a novel skeletal representation that tackles the prevailing issues around compactness and reconstruction accuracy in existing skeletal representations. Our approach augments the continuous elements in the medial axis representation to effectively shift the complexity away from discrete elements. To that end, we introduce generalized enveloping primitives, an enhancement of the standard primitives in medial axis, which ensures efficient coverage of intricate local features of the input shape and substantially reduces the number of discrete elements required. Moreover, we present a computational framework that constructs a medial skeletal diagram from an arbitrary closed manifold mesh. Our optimization pipeline ensures that the resulting medial skeletal diagram comprehensively covers the input shape with the fewest primitives. Additionally, each optimized primitive undergoes a post-refinement process to guarantee an accurate match with the source mesh in both geometry and tessellation. We validate our approach on a comprehensive benchmark of 100 shapes, demonstrating its compactness of the discrete elements and superior reconstruction accuracy across a variety of cases. Furthermore, we exemplify the versatility of our representation in downstream applications such as shape optimization, shape generation, mesh decomposition, mesh alignment, mesh compression, and user-interactive design.

Learning · 表示 · 強化學習 · 移動平均 · HTTPS ·

2023 年 10 月 13 日

H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

Yanjie Ze,Yuyao Liu,Ruizhe Shi,Jiaxin Qin,Zhecheng Yuan,Jiashun Wang,Huazhe Xu

from arxiv, NeurIPS 2023. Code and videos: //yanjieze.com/H-InDex

Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human $\textbf{H}$and$\textbf{-In}$formed visual representation learning framework to solve difficult $\textbf{Dex}$terous manipulation tasks ($\textbf{H-InDex}$) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages only modify $0.36\%$ parameters of the pre-trained representation in total, ensuring the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and the recent visual foundation models for motor control. Code is available at //yanjieze.com/H-InDex .

控制器 · Guidance · MoDELS · state-of-the-art · Conformer ·

2023 年 10 月 12 日

OmniControl: Control Any Joint at Any Time for Human Motion Generation

Yiming Xie,Varun Jampani,Lei Zhong,Deqing Sun,Huaizu Jiang

from arxiv, Project page: //neu-vi.github.io/omnicontrol/

We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model based on the diffusion process. Unlike previous methods that can only control the pelvis trajectory, OmniControl can incorporate flexible spatial control signals over different joints at different times with only one model. Specifically, we propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals. At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion. Both the spatial and realism guidance are essential and they are highly complementary for balancing control accuracy and motion realism. By combining them, OmniControl generates motions that are realistic, coherent, and consistent with the spatial constraints. Experiments on HumanML3D and KIT-ML datasets show that OmniControl not only achieves significant improvement over state-of-the-art methods on pelvis control but also shows promising results when incorporating the constraints over other joints.

LIDAR · 數據集 · 真實值 · 傳感器 · state-of-the-art ·

2023 年 10 月 12 日

MUN-FRL: A Visual Inertial LiDAR Dataset for Aerial Autonomous Navigation and Mapping

Ravindu G. Thalagala,Sahan M. Gunawardena,Oscar De Silva,Awantha Jayasiri,Arthur Gubbels,George K. I Mann,Raymond G. Gosine

This paper presents a unique outdoor aerial visual-inertial-LiDAR dataset captured using a multi-sensor payload to promote the global navigation satellite system (GNSS)-denied navigation research. The dataset features flight distances ranging from 300m to 5km, collected using a DJI M600 hexacopter drone and the National Research Council (NRC) Bell 412 Advanced Systems Research Aircraft (ASRA). The dataset consists of hardware synchronized monocular images, IMU measurements, 3D LiDAR point-clouds, and high-precision real-time kinematic (RTK)-GNSS based ground truth. Ten datasets were collected as ROS bags over 100 mins of outdoor environment footage ranging from urban areas, highways, hillsides, prairies, and waterfronts. The datasets were collected to facilitate the development of visual-inertial-LiDAR odometry and mapping algorithms, visual-inertial navigation algorithms, object detection, segmentation, and landing zone detection algorithms based upon real-world drone and full-scale helicopter data. All the datasets contain raw sensor measurements, hardware timestamps, and spatio-temporally aligned ground truth. The intrinsic and extrinsic calibrations of the sensors are also provided along with raw calibration datasets. A performance summary of state-of-the-art methods applied on the datasets is also provided.

語言模型化 · MoDELS · 知識 (knowledge) · Extensibility · INTERACT ·

2023 年 10 月 12 日

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

Licheng Wen,Daocheng Fu,Xin Li,Xinyu Cai,Tao Ma,Pinlong Cai,Min Dou,Botian Shi,Liang He,Yu Qiao

Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an interactive environment, a driver agent, as well as a memory component to address this question. Leveraging large language models with emergent abilities, we propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge and evolve continuously. Extensive experiments prove DiLu's capability to accumulate experience and demonstrate a significant advantage in generalization ability over reinforcement learning-based methods. Moreover, DiLu is able to directly acquire experiences from real-world datasets which highlights its potential to be deployed on practical autonomous driving systems. To the best of our knowledge, we are the first to instill knowledge-driven capability into autonomous driving systems from the perspective of how humans drive.

INFORMS · 6G · Wireless Networks · 可約的 · 可交換的 ·

2023 年 10 月 12 日

Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications

Wensheng Lin,Yuna Yan,Lixin Li,Zhu Han,Tad Matsumoto

This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.

Performer · 判別器 · 正例 · 假陽性 · 監督 ·

2018 年 5 月 24 日

DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction

Pengda Qin,Weiran Xu,William Yang Wang

Distant supervision can effectively label data for relation extraction, but suffers from the noise labeling problem. Recent works mainly perform soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making a hard decision of false positive samples in sentence level. In this paper, we introduce an adversarial learning framework, which we named DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as the negative samples to train the discriminator. The optimal generator is obtained until the discrimination ability of the discriminator has the greatest decline. We adopt the generator to filter distant supervision training dataset and redistribute the false positive instances into the negative set, in which way to provide a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction comparing to state-of-the-art systems.