The availability of large-scale authentic face databases has been crucial to the significant advances made in face recognition research over the past decade. However, legal and ethical concerns have led to the recent retraction of many of these databases by their creators, raising questions about the continuity of future face recognition research without one of its key resources. Synthetic datasets have emerged as a promising alternative to privacy-sensitive authentic data for face recognition development. However, recent synthetic datasets used to train face recognition models suffer from limitations in either intra-class diversity or cross-class (identity) discrimination, leading to suboptimal accuracies that fall far short of those achieved by models trained on authentic data. This paper addresses this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. Through extensive evaluations, our proposed synthetic-based face recognition approach pushes the limits of state-of-the-art performance, achieving, for example, 98.00% accuracy on the Labeled Faces in the Wild (LFW) benchmark, far ahead of recent synthetic-based face recognition solutions at 95.40% and bridging the gap to authentic-based face recognition at 99.82% accuracy.

Related content

Model evaluation

Evaluation criteria for machine learning system design

Speech recognition · Continuity · Pseudo-labeling · Performer · MoDELS ·

September 29, 2023

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

Andrew Rouditchenko, Ronan Collobert, Tatiana Likhomanenko

from arxiv, Under review

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. Our models are trained for speech recognition from audio-visual inputs and can perform speech recognition using both audio and visual modalities, or only one modality. Our method uses the same audio-visual model for both supervised training and pseudo-label generation, mitigating the need for external speech recognition models to generate pseudo-labels. AV-CPL obtains significant improvements in VSR performance on the LRS3 dataset while maintaining practical ASR and AVSR performance. Finally, using visual-only speech data, our method is able to leverage unlabeled visual speech to improve VSR.
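
The idea of continuously regenerating pseudo-labels with the same model that is being trained can be illustrated with a toy example. The sketch below is not the AV-CPL implementation: it assumes a hypothetical one-dimensional nearest-centroid classifier in place of the audio-visual model, and the function names (`fit_centroids`, `predict`, `continuous_pseudo_labeling`) are invented for illustration.

```python
def fit_centroids(points, labels):
    """Fit a toy 'model': the mean of the 1-D points for each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def continuous_pseudo_labeling(labeled_x, labeled_y, unlabeled_x, rounds=3):
    """Alternate between pseudo-labeling and retraining with one model.

    labeled_x, labeled_y, and unlabeled_x are plain Python lists.
    """
    # Supervised warm-up on the labeled data only.
    centroids = fit_centroids(labeled_x, labeled_y)
    pseudo = []
    for _ in range(rounds):
        # The current model regenerates pseudo-labels for the unlabeled data.
        pseudo = [predict(centroids, x) for x in unlabeled_x]
        # The same model is then refit on labeled plus pseudo-labeled data.
        centroids = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo)
    return centroids, pseudo


centroids, pseudo = continuous_pseudo_labeling(
    [0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0]
)
print(pseudo)
```

No external labeling model appears anywhere in the loop, which mirrors the property the abstract highlights.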

Networking · Residual network · Model evaluation · Round · Subspace ·

September 28, 2023

An Enhanced Low-Resolution Image Recognition Method for Traffic Environments

Zongcai Tan, Zhenhai Gao

Currently, low-resolution image recognition is confronted with a significant challenge in the field of intelligent traffic perception. Compared to high-resolution images, low-resolution images suffer from small size, low quality, and lack of detail, leading to a notable decrease in the accuracy of traditional neural network recognition algorithms. The key to low-resolution image recognition lies in effective feature extraction. Therefore, this paper delves into the fundamental dimensions of residual modules and their impact on feature extraction and computational efficiency. Based on experiments, we introduce a dual-branch residual network structure that leverages the basic architecture of residual networks and a common feature subspace algorithm. Additionally, it incorporates the utilization of intermediate-layer features to enhance the accuracy of low-resolution image recognition. Furthermore, we employ knowledge distillation to reduce network parameters and computational overhead. Experimental results validate the effectiveness of this algorithm for low-resolution image recognition in traffic environments.
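
The abstract does not say which distillation objective is used. As a minimal sketch, assuming the common temperature-scaled soft-target formulation (KL divergence between softened teacher and student distributions), the loss could be computed as follows; the function names and the default temperature are illustrative choices, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    This is an assumed, generic soft-target loss, not the paper's exact one.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

A smaller student network trained against this loss is pushed to reproduce the teacher's full output distribution rather than only its top prediction.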

contrastive · Weight · Learning · Attention · Contrastive learning ·

September 28, 2023

ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

The recent increase in remote work, online meetings, and tele-operation tasks has made people realize that gestures for avatars and communication robots are more important than previously thought. Gesture is one of the key factors in achieving smooth and natural communication between humans and AI systems and has been intensively researched. Current gesture generation methods are mostly based on deep neural networks that take text, audio, and other information as input; however, they generate gestures mainly based on audio, which are called beat gestures. Although beat gestures account for more than 70% of actual human gestures, content-based gestures sometimes play an important role in making avatars more realistic and human-like. In this paper, we propose attention-based contrastive learning for text-to-gesture (ACT2G), in which generated gestures represent the content of the text by estimating an attention weight for each word of the input text. In this method, text and gesture features calculated with the attention weights are mapped to the same latent space by contrastive learning, so once text is given as input, the network outputs a feature vector that can be used to generate gestures related to the content. A user study confirmed that the gestures generated by ACT2G were better than those of existing methods. In addition, it was demonstrated that a wide variety of gestures could be generated from the same text when creators changed the attention weights.
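
Mapping paired text and gesture features into a shared latent space is typically done with a contrastive objective. The abstract does not specify the loss, so the sketch below assumes a generic InfoNCE-style loss over cosine similarities; `info_nce` and its `temperature` parameter are hypothetical names, and the feature vectors are toy values.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(text_feats, gesture_feats, temperature=0.1):
    """Average text-to-gesture InfoNCE loss; row i of each list is a pair."""
    texts = [normalize(v) for v in text_feats]
    gestures = [normalize(v) for v in gesture_feats]
    loss = 0.0
    for i, t in enumerate(texts):
        logits = [dot(t, g) / temperature for g in gestures]
        peak = max(logits)
        # Log-sum-exp over all gestures in the batch (positives and negatives).
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        loss += log_denominator - logits[i]
    return loss / len(texts)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

The loss is small when each text feature lies closest to its own gesture feature and large when the pairs are swapped, which is the property that lets a text-only input retrieve a content-related gesture feature at inference time.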

Specialization · Networking · Extensibility · Neural Networks · Inference ·

September 27, 2023

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah

from arxiv, Code is available at //github.com/Deeplite/activ-sparse

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency. It is known that unstructured sparsity results in lower accuracy degradation with respect to structured sparsity but the former needs extensive inference engine changes to get latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a speed improvement of $1.25 \times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that solely employ structured pruning techniques.

MoDELS · Estimation/Estimator · Anomaly detection · Normalized · Neck ·

September 27, 2023

Human Kinematics-inspired Skeleton-based Video Anomaly Detection

Jian Xiao, Tianyuan Liu, Genlin Ji

Previous approaches to detecting human anomalies in videos have typically relied on implicit modeling by directly applying the model to video or skeleton data, potentially resulting in inaccurate modeling of motion information. In this paper, we conduct an exploratory study and introduce a new idea called HKVAD (Human Kinematic-inspired Video Anomaly Detection) for video anomaly detection, which involves the explicit use of human kinematic features to detect anomalies. To validate the effectiveness and potential of this perspective, we propose a pilot method that leverages the kinematic features of the skeleton pose, with a specific focus on the walking stride, skeleton displacement at feet level, and neck level. Following this, the method employs a normalizing flow model to estimate density and detect anomalies based on the estimated density. Based on the number of kinematic features used, we have devised three straightforward variant methods and conducted experiments on two highly challenging public datasets, ShanghaiTech and UBnormal. Our method achieves good results with minimal computational resources, validating its effectiveness and potential.

Stream · Layer · Channel · Networking · INFORMS ·

September 27, 2023

Grain-128PLE: Generic Physical-Layer Encryption for IoT Networks

Marcus de Ree, Georgios Mantas, Jonathan Rodriguez

from arxiv, Paper accepted to the GLOBECOM 2023 conference

Physical layer security (PLS) encompasses techniques proposed at the physical layer to achieve information security objectives while requiring a minimal resource footprint. The channel coding-based secrecy and signal modulation-based encryption approaches are reliant on certain channel conditions or a certain communications protocol stack to operate on, which prevents them from being a generic solution. This paper presents Grain-128PLE, a lightweight physical layer encryption (PLE) scheme that is derived from the Grain-128AEAD v2 stream cipher. The Grain-128PLE stream cipher performs encryption and decryption at the physical layer, in between the channel coding and signal modulation processes. This placement, like that of the A5 stream cipher that had been used in the GSM communications standard, makes it a generic solution for providing data confidentiality in IoT networks. The design of Grain-128PLE maintains the structure of the main building blocks of the original Grain-128AEAD v2 stream cipher, evaluated for its security strength during NIST's recent Lightweight Cryptography competition, and is therefore expected to achieve similar levels of security.
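
The placement of the cipher between channel coding and signal modulation can be sketched with a toy pipeline. Everything below is an illustrative assumption: the keystream comes from a tiny 4-bit LFSR that is not secure and is not Grain-128AEAD v2, the channel code is a 3x repetition code, and the modulation is BPSK over a noiseless link.

```python
def lfsr_keystream(seed, taps, nbits):
    """Toy Fibonacci LFSR keystream. NOT secure; a stand-in for a real cipher."""
    state = list(seed)
    out = []
    for _ in range(nbits):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return out


def xor_bits(bits, keystream):
    return [b ^ k for b, k in zip(bits, keystream)]


def repetition_encode(bits, n=3):
    return [b for b in bits for _ in range(n)]


def repetition_decode(bits, n=3):
    # Majority vote over each group of n received bits.
    return [int(sum(bits[i:i + n]) > n // 2) for i in range(0, len(bits), n)]


def bpsk_modulate(bits):
    return [1.0 if b else -1.0 for b in bits]


def bpsk_demodulate(symbols):
    return [int(s > 0) for s in symbols]


def transmit(data_bits, seed, taps):
    coded = repetition_encode(data_bits)             # 1. channel coding
    keystream = lfsr_keystream(seed, taps, len(coded))
    encrypted = xor_bits(coded, keystream)           # 2. physical-layer encryption
    return bpsk_modulate(encrypted)                  # 3. signal modulation


def receive(symbols, seed, taps):
    bits = bpsk_demodulate(symbols)                  # 1. demodulation
    keystream = lfsr_keystream(seed, taps, len(bits))
    decrypted = xor_bits(bits, keystream)            # 2. decryption
    return repetition_decode(decrypted)              # 3. channel decoding


data = [1, 0, 1, 1]
symbols = transmit(data, seed=[1, 0, 0, 1], taps=(0, 3))
print(receive(symbols, seed=[1, 0, 0, 1], taps=(0, 3)))
```

Because the XOR step only sees a bit stream, it does not depend on which channel code or modulation surrounds it, which is the sense in which such a placement is generic.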

Microsoft Surface · MoDELS · ASSETS · Learning · 3D ·

September 27, 2023

Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation

Rui Chen, Yongwei Chen, Ningxin Jiao, Kui Jia

from arxiv, Accepted by ICCV 2023. Project page: //fantasia3d.github.io/

Automatic 3D content creation has achieved rapid progress recently due to the availability of pre-trained large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple the geometry and appearance via volume rendering and are suboptimal in terms of recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method, Fantasia3D, for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation, and propose to encode the surface normals extracted from the representation as the input of the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task, and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source code: //fantasia3d.github.io/.

Online · Learning · Reviewer · Outlier · Robot ·

September 27, 2023

GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion

Jiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang

Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are organized online to eliminate redundant and outlier grasping poses, and the result can be encoded as a grasping pose observation state for reinforcement learning. Moreover, fusing the grasping poses on the fly enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses.

Loss · Networking · Decoding · Extensibility · Reducible ·

September 27, 2023

GRACE++: Loss-Resilient Real-Time Video through Neural Codecs

Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Xu Zhang, Francis Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang

In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed -- encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE++, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE++'s enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE++ achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE++ exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE++ reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE++ registers a 38% higher mean opinion score (MOS) than other baselines.

INTERACT · Link prediction · entity · Extensibility · Graph ·

November 1, 2019

InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, Nilesh Agrawal, Partha Talukdar

from arxiv, 11 pages

Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas -- feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR and YAGO3-10 datasets respectively. The results validate our central hypothesis -- that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.
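
Of the three ideas named in the abstract, circular convolution is the easiest to show in isolation: the input wraps around at its borders instead of being zero-padded, so border cells interact with cells on the opposite edge. The following is a minimal pure-Python sketch of a 2D circular convolution (written as cross-correlation, as is usual in deep learning); the function name and the toy 3x3 inputs are illustrative and not taken from the InteractE code.

```python
def circular_conv2d(image, kernel):
    """2D cross-correlation with wrap-around (circular) padding.

    The output has the same shape as the input; indices that fall outside
    the image wrap to the opposite side via the modulo operation.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r = (i + a - kh // 2) % h  # wrap rows
                    c = (j + b - kw // 2) % w  # wrap columns
                    acc += image[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# This kernel picks the left neighbour, so the first column wraps to the last.
left_neighbour = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
print(circular_conv2d(image, left_neighbour)[0])
```

With zero padding, the first entry of that printed row would be 0; with circular padding it takes the value from the opposite edge of the row, which is how the wrap-around increases the number of feature interactions captured at the borders.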