This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code will be released soon.

Related content

DeepFakes

Continuity · Regularization term · Graph · Manifold · MoDELS

March 7, 2024

MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment

Kanglei Zhou,Liyuan Wang,Xingxing Zhang,Hubert P. H. Shum,Frederick W. B. Li,Jianguo Li,Xiaohui Liang

Action Quality Assessment (AQA) evaluates diverse skills but models struggle with non-stationary data. We propose Continual AQA (CAQA) to refine models using sparse new data. Feature replay preserves memory without storing raw inputs. However, the misalignment between static old features and the dynamically changing feature manifold causes severe catastrophic forgetting. To address this novel problem, we propose Manifold-Aligned Graph Regularization (MAGR), which first aligns deviated old features to the current feature manifold, ensuring representation consistency. It then constructs a graph jointly arranging old and new features aligned with quality scores. Experiments show MAGR outperforms recent strong baselines with up to 6.56%, 5.66%, 15.64%, and 9.05% correlation gains on the MTL-AQA, FineDiving, UNLV-Dive, and JDM-MSA split datasets, respectively. This validates MAGR for continual assessment challenges arising from non-stationary skill variations.

Equivariance · Feasible · MoDELS · Processing (programming language) · Gaussian mixture (model)

March 7, 2024

Towards Feasible Dynamic Grasping: Leveraging Gaussian Process Distance Field, SE(3) Equivariance and Riemannian Mixture Models

Ho Jin Choi,Nadia Figueroa

from arxiv, 7 pages, 7 figures

This paper introduces a novel approach to improve robotic grasping in dynamic environments by integrating Gaussian Process Distance Fields (GPDF), SE(3) equivariant networks, and Riemannian Mixture Models. The aim is to enable robots to grasp moving objects effectively. Our approach comprises three main components: object shape reconstruction, grasp sampling, and implicit grasp pose selection. GPDF accurately models the shape of objects, which is essential for precise grasp planning. SE(3) equivariance ensures that the sampled grasp poses are equivariant to the object's pose changes, enhancing robustness in dynamic scenarios. Riemannian Gaussian Mixture Models are employed to assess reachability, providing feasible and adaptable grasping strategies. Feasible grasp poses are targeted by novel task or joint space reactive controllers formulated using Gaussian Mixture Models and Gaussian Processes. This method resolves the challenge of discrete grasp pose selection, enabling smoother grasping execution. Experimental validation confirms the effectiveness of our approach in generating feasible grasp poses and achieving successful grasps in dynamic environments. By integrating these advanced techniques, we present a promising solution for enhancing robotic grasping capabilities in real-world scenarios.

LIDAR · Processing (programming language) · state-of-the-art · 3D · CARS

March 6, 2024

Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving

Riccardo Pieroni,Simone Specchia,Matteo Corno,Sergio Matteo Savaresi

from arxiv, Published at IEEE European Control Conference 2024

This paper presents a novel multi-modal Multi-Object Tracking (MOT) algorithm for self-driving cars that combines camera and LiDAR data. Camera frames are processed with a state-of-the-art 3D object detector, whereas classical clustering techniques are used to process LiDAR observations. The proposed MOT algorithm comprises a three-step association process, an Extended Kalman filter for estimating the motion of each detected dynamic obstacle, and a track management phase. The EKF motion model requires the current measured relative position and orientation of the observed object and the longitudinal and angular velocities of the ego vehicle as inputs. Unlike most state-of-the-art multi-modal MOT approaches, the proposed algorithm does not rely on maps or knowledge of the ego global pose. Moreover, it uses a 3D detector exclusively for cameras and is agnostic to the type of LiDAR sensor used. The algorithm is validated both in simulation and with real-world data, with satisfactory results.
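The abstract does not spell out the three association steps, so the following is only a minimal sketch of one generic step: greedy nearest-neighbour matching with a distance gate. The function name `associate`, the 2D point representation, and the `gate` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def associate(
    tracks: Dict[int, Point], detections: List[Point], gate: float
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Greedy gated nearest-neighbour association (illustrative only).

    Returns (matches, unmatched_track_ids, unmatched_detection_indices),
    where matches maps a track id to the index of its detection.
    """
    # Every track/detection pair, closest first.
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
    )
    matches: Dict[int, int] = {}
    used = set()
    for distance, tid, j in pairs:
        if distance > gate:
            break  # all remaining pairs are even farther apart
        if tid in matches or j in used:
            continue  # each track and each detection is matched at most once
        matches[tid] = j
        used.add(j)
    unmatched_tracks = [tid for tid in tracks if tid not in matches]
    unmatched_detections = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched_tracks, unmatched_detections
```

In a tracker of this kind, matched pairs would typically feed the filter update, while the two unmatched lists would be handed to track management (creating new tracks, ageing out stale ones).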

Large language model · Language modeling · MoDELS · Performer · state-of-the-art

March 6, 2024

SaulLM-7B: A pioneering Large Language Model for Law

Pierre Colombo,Telmo Pessoa Pires,Malik Boudiaf,Dominic Culver,Rui Melo,Caio Corro,Andre F. T. Martins,Fabrizio Esposito,Vera Lúcia Raposo,Sofia Morgado,Michael Desa

In this paper, we introduce SaulLM-7B, a large language model (LLM) tailored for the legal domain. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens. SaulLM-7B exhibits state-of-the-art proficiency in understanding and processing legal documents. Additionally, we present a novel instructional fine-tuning method that leverages legal datasets to further enhance SaulLM-7B's performance in legal tasks. SaulLM-7B is released under the CC-BY-SA-4.0 License.

Pruning · Tokenizer · Multimodal · Reducible · Transform

March 5, 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao,Peng Ye,Shengze Li,Chong Yu,Yansong Tang,Jiwen Lu,Tao Chen

from arxiv, 19 pages, 9 figures, Published in CVPR2024

Vision-Language Transformers (VLTs) have shown great success recently, but they come with heavy computation costs, largely due to the large number of visual and language tokens. Existing token pruning research for compressing VLTs mainly follows a single-modality-based scheme yet ignores the critical role of aligning different modalities for guiding the token pruning process, causing the important tokens for one modality to be falsely pruned in another modality branch. Meanwhile, existing VLT pruning works also lack the flexibility to dynamically compress each layer based on different input samples. To this end, we propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs. Specifically, we first introduce a well-designed Multi-modality Alignment Guidance (MAG) module that can align features of the same semantic concept from different modalities, to ensure the pruned tokens are less important for all modalities. We further design a novel Dynamic Token Pruning (DTP) module, which can adaptively adjust the token compression ratio in each layer based on different input instances. Extensive experiments on various benchmarks demonstrate that MADTP significantly reduces the computational complexity of various multimodal models while preserving competitive performance. Notably, when applied to the BLIP model on the NLVR2 dataset, MADTP can reduce the GFLOPs by 80% with less than 4% performance degradation.
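To make the idea of an input-dependent compression ratio concrete, here is a minimal sketch under a simple assumed criterion: keep the smallest set of highest-scoring tokens that covers a fixed share of the total importance. The function `prune_tokens`, the `mass` parameter, and the use of non-negative per-token scores are hypothetical; this is not the DTP module described in the paper.

```python
from typing import List, Sequence


def prune_tokens(scores: Sequence[float], mass: float = 0.9) -> List[int]:
    """Return the indices of the tokens to keep, in their original order.

    Keeps the fewest top-scoring tokens whose scores sum to at least
    `mass` of the total (scores are assumed to be non-negative).
    """
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    total = sum(scores)
    if total <= 0:
        return list(range(len(scores)))  # nothing to rank by: keep everything
    # Token indices from most to least important.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    covered = 0.0
    for i in order:
        kept.append(i)
        covered += scores[i]
        if covered >= mass * total:
            break
    return sorted(kept)
```

With this rule, an input whose importance is concentrated in a few tokens is pruned aggressively, while an input with evenly spread importance keeps most of its tokens, so the effective ratio varies per sample.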

Approximation · DNN · Adam · Edge · Paper

March 5, 2024

AdAM: Adaptive Fault-Tolerant Approximate Multiplier for Edge DNN Accelerators

Mahdi Taheri,Natalia Cherezova,Samira Nazari,Ahsan Rafiq,Ali Azarpeyvand,Tara Ghasempouri,Masoud Daneshtalab,Jaan Raik,Maksim Jenihhin

In this paper, we propose an architecture of a novel adaptive fault-tolerant approximate multiplier tailored for ASIC-based DNN accelerators.

SCAN · 3D · Processing (programming language) · Point cloud · Reducible

March 5, 2024

3D-BBS: Global Localization for 3D Point Cloud Scan Matching Using Branch-and-Bound Algorithm

Koki Aoki,Kenji Koide,Shuji Oishi,Masashi Yokozuka,Atsuhiko Banno,Junichi Meguro

from arxiv, IEEE International Conference on Robotics and Automation (ICRA2024)

This paper presents an accurate and fast 3D global localization method, 3D-BBS, that extends the existing branch-and-bound (BnB)-based 2D scan matching (BBS) algorithm. To reduce memory consumption, we utilize a sparse hash table for storing hierarchical 3D voxel maps. To improve the processing cost of BBS in 3D space, we propose an efficient roto-translational space branching. Furthermore, we devise a batched BnB algorithm to fully leverage GPU parallel processing. Through experiments in simulated and real environments, we demonstrated that the 3D-BBS enabled accurate global localization with only a 3D LiDAR scan roughly aligned in the gravity direction and a 3D pre-built map. This method required only 878 msec on average to perform global localization and outperformed state-of-the-art global registration methods in terms of accuracy and processing speed.
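As a rough illustration of storing hierarchical voxel maps sparsely, the sketch below keeps only occupied voxels in hash sets and derives each coarser level by halving the integer indices. All names (`voxel_index`, `build_voxel_hierarchy`, `match_score`) and the occupancy-count score are assumptions chosen for brevity; the paper's actual hash-table layout and scoring are not given in the abstract.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_index(point: Point, resolution: float) -> Voxel:
    """Integer index of the voxel containing `point` (floor handles negatives)."""
    x, y, z = point
    return (
        math.floor(x / resolution),
        math.floor(y / resolution),
        math.floor(z / resolution),
    )


def build_voxel_hierarchy(
    points: Iterable[Point], base_resolution: float, num_levels: int
) -> List[Set[Voxel]]:
    """Level 0 is the finest map; level k uses voxels 2**k times larger."""
    levels = [{voxel_index(p, base_resolution) for p in points}]
    for _ in range(1, num_levels):
        # A coarse voxel is occupied if any of its children is occupied.
        levels.append({(x // 2, y // 2, z // 2) for (x, y, z) in levels[-1]})
    return levels


def match_score(scan: Iterable[Point], level: Set[Voxel], resolution: float) -> int:
    """Number of scan points that fall into an occupied voxel of this level."""
    return sum(voxel_index(p, resolution) in level for p in scan)
```

Because every occupied fine voxel lies inside an occupied coarse voxel, the score of a scan at a coarser level is never lower than at a finer one. Branch-and-bound search relies on this kind of optimistic estimate to discard candidate poses early; memory stays proportional to the number of occupied voxels rather than to the volume of the map.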

Score function · Functional · Score · MoDELS · Unsupervised

March 5, 2024

DISYRE: Diffusion-Inspired SYnthetic REstoration for Unsupervised Anomaly Detection

Sergio Naval Marimont,Matthew Baugh,Vasilis Siomos,Christos Tzelepis,Bernhard Kainz,Giacomo Tarroni

from arxiv, 5 pages, 3 figures. Accepted for publication in ISBI 2024

Unsupervised Anomaly Detection (UAD) techniques aim to identify and localize anomalies without relying on annotations, only leveraging a model trained on a dataset known to be free of anomalies. Diffusion models learn to modify an input $x$ to increase the probability of it belonging to a desired distribution, i.e., they model the score function $\nabla_x \log p(x)$. Such a score function is potentially relevant for UAD, since $\nabla_x \log p(x)$ is itself a pixel-wise anomaly score. However, diffusion models are trained to invert a corruption process based on Gaussian noise and the learned score function is unlikely to generalize to medical anomalies. This work addresses the problem of how to learn a score function relevant for UAD and proposes DISYRE: Diffusion-Inspired SYnthetic REstoration. We retain the diffusion-like pipeline but replace the Gaussian noise corruption with a gradual, synthetic anomaly corruption so the learned score function generalizes to medical, naturally occurring anomalies. We evaluate DISYRE on three common Brain MRI UAD benchmarks and substantially outperform other methods in two out of the three tasks.
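As an illustrative special case (an assumption made for exposition, not the model used in the paper), take independent Gaussian pixels with hypothetical per-pixel mean $\mu_i$ and variance $\sigma_i^2$. The score then has a closed form whose magnitude grows with each pixel's deviation from its typical value, which is one way to see why $\nabla_x \log p(x)$ can be read as a pixel-wise anomaly score:

$$
\log p(x) = -\sum_i \frac{(x_i - \mu_i)^2}{2\sigma_i^2} + \text{const},
\qquad
\frac{\partial \log p(x)}{\partial x_i} = -\frac{x_i - \mu_i}{\sigma_i^2}.
$$

Each component also points from $x_i$ back towards $\mu_i$, so following the score moves an atypical pixel towards the normal distribution.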

End-to-end · MoDELS · Predictor/decision function · Performer · AIM

March 3, 2024

PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Tianhua Qi,Wenming Zheng,Cheng Lu,Yuan Zong,Hailun Lian

from arxiv, Accepted to ICASSP2024

In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception. To improve the content naturalness of converted audio, we have developed an end-to-end EVC architecture inspired by the high audio quality of VITS. By seamlessly integrating an acoustic converter and vocoder, we effectively address the common issue of mismatch between emotional prosody training and run-time conversion that is prevalent in existing EVC models. To further enhance the emotional naturalness, we introduce an emotion descriptor to model the subtle prosody variations of different speech emotions. Additionally, we propose a prosody predictor, which predicts prosody features from text based on the provided emotion label. Notably, we introduce a prosody alignment loss to establish a connection between latent prosody features from two distinct modalities, ensuring effective training. Experimental results show that the performance of PAVITS is superior to the state-of-the-art EVC methods. Speech samples are available at //jeremychee4.github.io/pavits4EVC/.

Knowledge · Graph · Knowledge graph · Dataset · Vine

May 22, 2023

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Yuqi Zhu,Xiaohan Wang,Jing Chen,Shuofei Qiao,Yixin Ou,Yunzhi Yao,Shumin Deng,Huajun Chen,Ningyu Zhang

from arxiv, Work in progress

This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We employ eight distinct datasets that encompass aspects including entity, relation and event extraction, link prediction, and question answering. Empirically, our findings suggest that GPT-4 outperforms ChatGPT in the majority of tasks and even surpasses fine-tuned models in certain reasoning and question-answering datasets. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, which culminates in the presentation of the Virtual Knowledge Extraction task and the development of the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs for KG construction and reasoning, which aims to chart the future of this field and offer exciting opportunities for advancement. We anticipate that our research can provide invaluable insights for future undertakings of KG. Code and datasets will be available at //github.com/zjunlp/AutoKG.