露脸视频一区二区三区在线播放-一级A婬片试看28分钟

In 2021, the pioneering work on TypeNet showed that keystroke dynamics verification could scale to hundreds of thousands of users with minimal performance degradation. Recently, the KVC-onGoing competition has provided an open and robust experimental protocol for evaluating keystroke dynamics verification systems of such scale, including considerations of algorithmic fairness. This article describes Type2Branch, the model and techniques that achieved the lowest error rates at the KVC-onGoing, in both desktop and mobile scenarios. The novelty aspects of the proposed Type2Branch include: i) synthesized timing features emphasizing user behavior deviation from the general population, ii) a dual-branch architecture combining recurrent and convolutional paths with various attention mechanisms, iii) a new loss function named Set2set that captures the global structure of the embedding space, and iv) a training curriculum of increasing difficulty. Considering five enrollment samples per subject of approximately 50 characters typed, the proposed Type2Branch achieves state-of-the-art performance with mean per-subject EERs of 0.77% and 1.03% on evaluation sets of respectively 15,000 and 5,000 subjects for desktop and mobile scenarios. With a uniform global threshold for all subjects, the EERs are 3.25% for desktop and 3.61% for mobile, outperforming previous approaches by a significant margin.

相關內容

Performer

關注 10

Networking · Neural Networks · 路徑 · 約束 · CASES ·

2024 年 6 月 13 日

Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Xin Li,Wenyang Gan,Pang Wen,Daqi Zhu

To deal with the task assignment problem of multi-AUV systems under kinematic constraints, which means steering capability constraints for underactuated AUVs or other vehicles likely, an improved task assignment algorithm is proposed combining the Dubins Path algorithm with improved SOM neural network algorithm. At first, the aimed tasks are assigned to the AUVs by improved SOM neural network method based on workload balance and neighborhood function. When there exists kinematic constraints or obstacles which may cause failure of trajectory planning, task re-assignment will be implemented by change the weights of SOM neurals, until the AUVs can have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. AUV's yaw angle is limited, which result in new assignments to the targets. Computation flow is designed so that the algorithm in MATLAB and Python can realizes the path planning to multiple targets. Finally, simulation results prove that the proposed algorithm can effectively accomplish the task assignment task for multi-AUV system.

DeepFakes · Processing（編程語言） · MoDELS · 數據集 · 大語言模型 ·

2024 年 6 月 12 日

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Yi Lu,Yuankun Xie,Ruibo Fu,Zhengqi Wen,Jianhua Tao,Zhiyong Wang,Xin Qi,Xuefei Liu,Yongwei Li,Yukun Liu,Xiaopeng Wang,Shuchen Shi

from arxiv, Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose Codecfake dataset, which is generated by seven representative neural codec methods. Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.

Networking · 3D · 標注 · state-of-the-art · 模型評估 ·

2024 年 6 月 11 日

EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

Yining Shi,Kun Jiang,Ke Wang,Kangan Qian,Yunlong Wang,Jiusi Li,Tuopu Wen,Mengmeng Yang,Yiliang Xu,Diange Yang

from arxiv, preprint under review

3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupancy networks (occnets) are both computationally heavy and label-hungry. In terms of model complexity, occnets are commonly composed of heavy Conv3D modules or transformers on the voxel level. In terms of label annotations requirements, occnets are supervised with large-scale expensive dense voxel labels. Model and data inefficiency, caused by excessive network parameters and label annotations requirement, severely hinder the onboard deployment of occnets. This paper proposes an efficient 3d occupancy network (EFFOcc), that targets the minimal network complexity and label requirement while achieving state-of-the-art accuracy. EFFOcc only uses simple 2D operators, and improves Occ accuracy to the state-of-the-art on multiple large-scale benchmarks: Occ3D-nuScenes, Occ3D-Waymo, and OpenOccupancy-nuScenes. On Occ3D-nuScenes benchmark, EFFOcc has only 18.4M parameters, and achieves 50.46 in terms of mean IoU (mIoU), to our knowledge, it is the occnet with minimal parameters compared with related occnets. Moreover, we propose a two-stage active learning strategy to reduce the requirements of labelled data. Active EFFOcc trained with 6\% labelled voxels achieves 47.19 mIoU, which is 95.7% fully supervised performance. The proposed EFFOcc also supports improved vision-only occupancy prediction with the aid of region-decomposed distillation. Code and demo videos will be available at //github.com/synsin0/EFFOcc.

估計/估計量 · 估計誤差 · 方差 · 控制器 · MoDELS ·

2024 年 6 月 10 日

Model-Free Error Assessment for Breadth-First Studies, with Applications to Cell-Perturbation Experiments

Jackson Loper,Jeffrey Regier

from arxiv, Submitted to JASA

With the advent of high-throughput screenings, it has become increasingly common for studies to devote limited resources to estimating many parameters imprecisely rather than to estimating a few parameters well. In these studies, only two or three independent replicates measure each parameter, and therefore it is challenging to assess the variance of these measurements. One solution is to pool variance estimates across different parameters using a parametric model of estimator error. However, such models are difficult to specify correctly, especially in the presence of ``batch effects.'' In this paper, we propose new model-free methods for assessing and controlling estimator error. Our focus is on type S error, which is of particular importance in many settings. To produce tight confidence intervals without making unrealistic assumptions, we improve on Hoeffding's bounds for sums of bounded random variables and obtain the tightest possible Chernoff-Cram\'er bound. Our methods compare favorably with existing practice for high-throughput screenings, such as methods based on the Irreproducible Discovery Rate (IDR) and the Benjamini-Hochberg procedure. Existing practices fail to control error at the nominal level in some cases and are needlessly conservative in others.

語音增強 · 數據集 · 情景 · 在線 · Performer ·

2024 年 6 月 10 日

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter,Yi-Chiao Wu,Steven Krenn,Simon Welker,Bunlong Lay,Shinji Watanabe,Alexander Richard,Timo Gerkmann

from arxiv, Accepted at Interspeech 2024

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and automatic evaluation server can be found online.

分離的 · HTTPS · contrastive · Learning · Prompt ·

2024 年 6 月 9 日

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Mark Hamilton,Andrew Zisserman,John R. Hershey,William T. Freeman

from arxiv, Computer Vision and Pattern Recognition 2024

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the ``meaning'' of words and the ``location'' of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types of associations without supervision. We show that DenseAV's localization abilities arise from a new multi-head feature aggregation operator that directly compares dense image and audio representations for contrastive learning. In contrast, many other systems that learn ``global'' audio and video representations cannot localize words and sound. Finally, we contribute two new datasets to improve the evaluation of AV representations through speech and sound prompted semantic segmentation. On these and other datasets we show DenseAV dramatically outperforms the prior art on speech and sound prompted semantic segmentation. DenseAV outperforms the previous state-of-the-art, ImageBind, on cross-modal retrieval using fewer than half of the parameters. Project Page: \href{//aka.ms/denseav}{//aka.ms/denseav}

INTERACT · INFORMS · Performer · 值域 · Networking ·

2024 年 6 月 7 日

EAIA: An Efficient and Anonymous Identity Authentication Scheme in 5G-V2V

Qianmin Du,Jianhong Zhou,Maode Ma

Vehicle Ad-hoc Networks (VANETs) have experienced significant development in recent years, playing a crucial role in enhancing the driving experience by enabling safer and more efficient inter-vehicle interactions through information exchange. Vehicle-to-vehicle (V2V) communication is particularly vital as it not only helps to prevent collisions and improve traffic efficiency but also provides essential situational awareness to drivers or autonomous driving systems. Communication is typically supported by Roadside Units (RSUs); however, in practical applications, vehicles may exceed the communication range of RSUs, thus exposing them to various malicious attacks. Additionally, considering the limited computational resources of onboard units (OBUs) in vehicles, there is a high demand for designing lightweight security protocols that support V2V communication. To address this issue, this paper proposes an efficient anonymous V2V identity authentication protocol tailored for scenarios that lack RSU support. The proposed protocol has been formally assessed using the Scyther tool, demonstrating its capability to withstand major typical malicious attacks. Performance evaluations indicate that the proposed protocol is efficient in terms of communication and computational overhead, making it a viable solution for V2V vehicle communication.

向量化 · Performer · 編譯器 · 環 · 代碼 ·

2024 年 6 月 7 日

LLM-Vectorizer: LLM-based Verified Loop Vectorizer

Jubi Taneja,Avery Laird,Cong Yan,Madan Musuvathi,Shuvendu K. Lahiri

Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x as compared to the state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.

contrastive · CLIP · 損失 · 模態 · Performer ·

2024 年 6 月 6 日

It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap

Abrar Fahim,Alex Murphy,Alona Fyshe

Multi-modal contrastive models such as CLIP achieve state-of-the-art performance in zero-shot classification by embedding input images and texts on a joint representational space. Recently, a modality gap has been reported in two-encoder contrastive models like CLIP, meaning that the image and text embeddings reside in disjoint areas of the latent space. Previous studies suggest that this gap exists due to 1) the cone effect, 2) mismatched pairs in the dataset, and 3) insufficient training. We show that, even when accounting for all these factors, and even when using the same modality, the contrastive loss actually creates a gap during training. As a result, We propose that the modality gap is inherent to the two-encoder contrastive loss and rename it the contrastive gap. We present evidence that attributes this contrastive gap to low uniformity in CLIP space, resulting in embeddings that occupy only a small portion of the latent space. To close the gap, we adapt the uniformity and alignment properties of unimodal contrastive loss to the multi-modal setting and show that simply adding these terms to the CLIP loss distributes the embeddings more uniformly in the representational space, closing the gap. In our experiments, we show that the modified representational space achieves better performance than default CLIP loss in downstream tasks such as zero-shot image classification and multi-modal arithmetic.

解碼 · 大語言模型 · 語言模型化 · Performer · ReQuEST ·

2024 年 6 月 6 日

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving

Yinmin Zhong,Shengyu Liu,Junda Chen,Jianbo Hu,Yibo Zhu,Xuanzhe Liu,Xin Jin,Hao Zhang

from arxiv, OSDI 2024

DistServe improves the performance of large language models (LLMs) serving by disaggregating the prefill and decoding computation. Existing LLM serving systems colocate the two phases and batch the computation of prefill and decoding across all users and requests. We find that this strategy not only leads to strong prefill-decoding interferences but also couples the resource allocation and parallelism plans for both phases. LLM applications often emphasize individual latency for each phase: time to first token (TTFT) for the prefill phase and time per output token (TPOT) of each request for the decoding phase. In the presence of stringent latency requirements, existing systems have to prioritize one latency over the other, or over-provision compute resources to meet both. DistServe assigns prefill and decoding computation to different GPUs, hence eliminating prefill-decoding interferences. Given the application's TTFT and TPOT requirements, DistServe co-optimizes the resource allocation and parallelism strategy tailored for each phase. DistServe also places the two phases according to the serving cluster's bandwidth to minimize the communication caused by disaggregation. As a result, DistServe significantly improves LLM serving performance in terms of the maximum rate that can be served within both TTFT and TPOT constraints on each GPU. Our evaluations show that on various popular LLMs, applications, and latency requirements, DistServe can serve 7.4x more requests or 12.6x tighter SLO, compared to state-of-the-art systems, while staying within latency constraints for > 90% of requests.