Although human action recognition has been widely applied in many areas, it still faces many challenging problems, such as viewpoint variation, occlusion, lighting conditions, differences in human body size and the speed of action execution. To tackle these challenges, the Kinect depth sensor was developed to record real-time depth sequences, which are insensitive to the color of human clothes and to illumination conditions. Many methods for recognizing human actions have been reported in the literature, such as HON4D, HOPC, RBD and HDG, which use 4D surface normals, point clouds, a skeleton-based model and depth gradients, respectively, to capture discriminative information from depth videos or skeleton data. In this research project, the performance of the four aforementioned algorithms is analyzed and evaluated on five benchmark datasets, which cover challenging issues such as noise, changes of viewpoint, background clutter and occlusion. We also implemented and improved the HDG algorithm and applied it to cross-view action recognition using the UWA3D Multiview Activity dataset. Moreover, we used different combinations of the individual feature vectors in HDG for performance evaluation. The experimental results show that our improved version of HDG outperforms the other three state-of-the-art algorithms for cross-view action recognition.
Reference-based video object segmentation is an emerging topic that aims to segment the target object in each video frame referred to by a given reference, such as a language expression or a photo mask. However, language expressions can be vague in conveying an intended concept and ambiguous when similar objects in one frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and less practical to provide in a real application. This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation. We build on STCN, a popular baseline for the semi-supervised VOS task, and evaluate which design is most effective for incorporating a sketch reference. Experimental results show that sketches are more effective yet more annotation-efficient than other references, such as photo masks, language and scribbles.
DAG-based consensus has attracted significant interest due to its high throughput in asynchronous network settings. However, existing protocols such as DAG-Rider (Keidar et al., PODC 2021) and ``Narwhal and Tusk'' (Danezis et al., EuroSys 2022) face two undesirable practical issues: (1) high transaction latency and (2) high cost to verify transaction outcomes. To address (1), this work introduces a novel commit rule based on the Unspent Transaction Output (UTXO) data model, which allows a node to predict transaction results before triggering the commitment. We propose a new consensus algorithm named ``Board and Clerk'', which reduces the transaction latency by half for roughly 50% of transactions; as the fault-tolerance threshold increases, more transactions benefit from this latency reduction. To tackle (2), we also propose the Hyper-Block Model with two flexible proposing strategies: blocking and non-blocking. Under either strategy, each node first predicts the transaction results that would follow if its proposal were committed and packs this prediction, as a commitment, into its proposal. The hyper-block then packs the proposal's signature and the outputs of the consensus layer together in order to prove the transaction results.
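To make the UTXO-based prediction idea concrete, the toy Python sketch below (with an invented `predictable_outcomes` helper and simplified transaction tuples that are not part of the paper) illustrates the property the commit rule exploits: when no two pending transactions spend the same output, a node can determine each transaction's outcome before the final commit order is fixed.

```python
# Toy illustration of the UTXO-model property the commit rule exploits.
# The actual "Board and Clerk" rule involves DAG rounds and certificates not shown here.
from collections import Counter

def predictable_outcomes(pending_txs, utxo_set):
    """pending_txs: list of (tx_id, inputs, outputs); utxo_set: set of unspent output ids.
       Returns {tx_id: 'commit' | 'reject' | 'depends-on-order'}."""
    spend_counts = Counter(i for _, inputs, _ in pending_txs for i in inputs)
    results = {}
    for tx_id, inputs, _ in pending_txs:
        if any(i not in utxo_set for i in inputs):
            results[tx_id] = 'reject'            # spends a missing or already-spent output
        elif any(spend_counts[i] > 1 for i in inputs):
            results[tx_id] = 'depends-on-order'  # double-spend race: must wait for ordering
        else:
            results[tx_id] = 'commit'            # outcome is independent of commit order
    return results

# Example: tx "b" and "c" race for output "o2", so only "a" is predictable.
print(predictable_outcomes(
    [("a", {"o1"}, {"o3"}), ("b", {"o2"}, {"o4"}), ("c", {"o2"}, {"o5"})],
    {"o1", "o2"}))
```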
Parallel programs are frequently modeled as dependency or cost graphs, which can be used to detect various bugs, or simply to visualize the parallel structure of the code. However, such graphs reflect just one particular execution and are typically constructed in a post-hoc manner. Graph types, which were introduced recently to mitigate this problem, can be assigned statically to a program by a type system and compactly represent the family of all graphs that could result from the program. Unfortunately, prior work is restricted in its treatment of futures, an increasingly common and especially dynamic form of parallelism. In short, each instance of a future must be statically paired with a vertex name. Previously, this led to the restriction that futures could not be placed in collections or be used to construct data structures. Doing so is not a niche exercise: such structures form the basis of numerous algorithms that use forms of pipelining to achieve performance not attainable without futures. All but the most limited of these examples are out of reach of prior graph type systems. In this paper, we propose a graph type system that allows for almost arbitrary combinations of futures and recursive data types. We do so by indexing datatypes with a type-level vertex structure, a codata structure that supplies unique vertex names to the futures in a data structure. We prove the soundness of the system in a parallel core calculus annotated with vertex structures and associated operations. Although the calculus is annotated, this is merely for convenience in defining the type system. We prove that it is possible to annotate arbitrary recursive types with vertex structures, and show using a prototype inference engine that these annotations can be inferred from OCaml-like source code for several complex parallel algorithms.
Multimodal recommendation aims to model user and item representations comprehensively, with the involvement of multimedia content, for effective recommendations. Existing research has shown that combining (user- and item-) ID embeddings with multimodal salient features benefits recommendation performance, indicating the value of IDs. However, the literature lacks a thorough analysis of ID embeddings in terms of feature semantics. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study of their semantics, which we recognize as subtle features of content and structure. We then propose a novel recommendation model that incorporates ID embeddings to enhance the semantic features of both content and structure. Specifically, we put forward a hierarchical attention mechanism to incorporate ID embeddings in modality fusion, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolutional network for each modality to amalgamate neighborhood and ID embeddings, improving structural representations. Finally, the content and structure representations are combined to form the ultimate item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
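The following PyTorch sketch illustrates the two components named above: an ID-guided attention over per-modality item features and a parameter-free, LightGCN-style propagation step. Layer sizes, the fusion rule and the helper names (`IDGuidedFusion`, `light_gcn_layer`) are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDGuidedFusion(nn.Module):
    """Attend over per-modality item features using the ID embedding as the query."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, id_emb, modality_feats):   # (N, d), (N, M, d)
        attn = torch.einsum('nd,nmd->nm', self.q(id_emb), self.k(modality_feats))
        attn = F.softmax(attn / id_emb.size(-1) ** 0.5, dim=-1)
        content = torch.einsum('nm,nmd->nd', attn, modality_feats)
        return content + id_emb                   # fuse content with ID semantics

def light_gcn_layer(emb, norm_adj):
    """Parameter-free neighborhood propagation (LightGCN-style): E' = A_hat @ E."""
    return torch.sparse.mm(norm_adj, emb)
```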
In this work, a cut high-dimensional model representation (cut-HDMR) expansion based on multiple anchors is constructed via a clustering method. Specifically, a set of random input realizations is drawn from the parameter space and grouped by the centroidal Voronoi tessellation (CVT) method. Then, for each cluster, the centroid is taken as the reference point, so the corresponding zeroth-order term can be determined directly. For the non-zero-order terms of each cut-HDMR, a set of discrete points is selected for each input component, and the Lagrange interpolation method is applied. For a new input, the cut-HDMR corresponding to the nearest centroid is used to compute its response. Numerical experiments with a high-dimensional integral and an elliptic stochastic partial differential equation as test problems show that the CVT-based multiple-anchor cut-HDMR alleviates the negative impact of a single inappropriate anchor point and achieves higher accuracy than the average of several single-anchor expansions.
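A minimal numerical sketch of this construction is given below, restricted to a first-order expansion. It uses k-means as a stand-in for the CVT clustering, Chebyshev nodes for the per-component interpolation points, and a toy test function; all of these choices, and the helper names, are assumptions for illustration rather than the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def model(x):                      # hypothetical expensive model f(x)
    return np.sum(x**2) + np.prod(np.cos(x[:2]))

d, n_anchors, n_nodes = 6, 4, 5
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(200, d))

# 1) Cluster input realizations; the centroids serve as the cut-HDMR anchors.
km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(samples)
anchors = km.cluster_centers_

# 2) For each anchor c, precompute the zeroth-order term f(c) and, per input
#    component, the model values at the interpolation nodes (others fixed at c).
nodes = np.cos((2 * np.arange(n_nodes) + 1) / (2 * n_nodes) * np.pi)   # Chebyshev on [-1, 1]
zeroth = np.array([model(c) for c in anchors])
first = np.empty((n_anchors, d, n_nodes))
for a, c in enumerate(anchors):
    for i in range(d):
        for k, t in enumerate(nodes):
            x = c.copy(); x[i] = t
            first[a, i, k] = model(x)

def lagrange(t, xs, ys):
    """1D Lagrange interpolation of the data (xs, ys), evaluated at t."""
    val = 0.0
    for j in range(len(xs)):
        w = np.prod([(t - xs[m]) / (xs[j] - xs[m]) for m in range(len(xs)) if m != j])
        val += ys[j] * w
    return val

def cut_hdmr_predict(x):
    """Evaluate the first-order expansion attached to the nearest anchor:
       f(x) ~ f(c) + sum_i [ f(x_i, c_{-i}) - f(c) ]."""
    a = np.argmin(np.linalg.norm(anchors - x, axis=1))
    f0 = zeroth[a]
    return f0 + sum(lagrange(x[i], nodes, first[a, i]) - f0 for i in range(d))

x_test = rng.uniform(-1.0, 1.0, size=d)
print(model(x_test), cut_hdmr_predict(x_test))
```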
Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
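As one way to picture the center-node fusion step, the sketch below solves, in closed form, for a single fused weight matrix that reproduces each concept-specific weight's outputs on that concept's cached input activations. The least-squares formulation and the helper `fuse_weights` are illustrative assumptions; Mix-of-Show's actual gradient fusion procedure and update schedule may differ.

```python
import torch

def fuse_weights(concept_weights, concept_inputs):
    """concept_weights: list of (out, in) weight matrices W_i (e.g., LoRA-updated layers).
       concept_inputs:  list of (in, n_i) cached activation matrices X_i for each concept.
       Returns W minimizing sum_i ||W X_i - W_i X_i||_F^2 in closed form:
       W = (sum_i W_i X_i X_i^T) (sum_i X_i X_i^T)^+."""
    d_out, d_in = concept_weights[0].shape
    A = torch.zeros(d_in, d_in)
    B = torch.zeros(d_out, d_in)
    for W_i, X_i in zip(concept_weights, concept_inputs):
        A += X_i @ X_i.T
        B += W_i @ X_i @ X_i.T
    return B @ torch.linalg.pinv(A)
```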
The surge in counterfeit signatures has caused widespread inconvenience and formidable challenges for both individuals and organizations. This paper introduces SigScatNet, a solution that combines a Siamese deep learning network with scattering wavelets to detect signature forgery and assess signature similarity. The Siamese network determines the authenticity of signatures through a similarity index, enabling validation and comparison, while the integration of scattering wavelets makes the model efficient enough to run on cost-effective hardware. To validate the approach, extensive experiments were conducted on two open-source datasets: the ICDAR SigComp Dutch dataset and the CEDAR dataset. The results demonstrate the practicality of the proposed SigScatNet, which achieves an Equal Error Rate (EER) of 3.689% on the ICDAR SigComp Dutch dataset and 0.0578% on the CEDAR dataset. With these results, SigScatNet sets a new state-of-the-art in signature analysis in terms of EER and computational efficiency, offering an accessible solution for detecting forgery and quantifying signature similarity. By combining a Siamese deep network with scattering wavelets, we provide a robust framework for secure and efficient signature verification systems.
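A rough PyTorch sketch of such a pipeline is shown below: fixed scattering-wavelet features (here via kymatio's Scattering2D) feed a small trainable embedding head, and the cosine similarity of two embeddings serves as the similarity index. The image size, network depth and helper names are assumptions, not SigScatNet's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from kymatio.torch import Scattering2D

class SigScatEmbed(nn.Module):
    """Scattering features followed by a small embedding head (shared Siamese branch)."""
    def __init__(self, img_size=128, J=3, emb_dim=128):
        super().__init__()
        self.scatter = Scattering2D(J=J, shape=(img_size, img_size))
        with torch.no_grad():   # probe the scattering output size once
            n_feats = self.scatter(torch.zeros(1, 1, img_size, img_size)).numel()
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(n_feats, emb_dim))

    def forward(self, x):                      # x: (B, 1, H, W) grayscale signature crops
        return F.normalize(self.head(self.scatter(x)), dim=-1)

def similarity(model, a, b):
    """Cosine similarity between two signature images, used as the similarity index."""
    return (model(a) * model(b)).sum(dim=-1)
```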
Brain-Computer Interfaces (BCIs) are used in various application scenarios, allowing direct communication between the brain and computers. Specifically, electroencephalography (EEG) is one of the most common techniques for obtaining evoked potentials resulting from external stimuli, such as the P300 potential elicited by known images. The combination of Machine Learning (ML) and P300 potentials is promising for authenticating subjects, since the brain waves generated by each person when facing a particular stimulus are unique. However, existing authentication solutions do not extensively explore P300 potentials and do not thoroughly analyze which processing and ML-based classification techniques are most suitable. Thus, this work proposes i) a framework for authenticating BCI users using the P300 potential; ii) the validation of the framework on ten subjects in an experimental scenario employing a non-invasive EEG-based BCI; and iii) the evaluation of the framework's performance with two experiments (binary and multiclass ML classification) and three testing configurations, incrementally analyzing the performance of different processing techniques and the differences between classifying with epochs or with statistical values. For the best classifier, the framework achieved an F1-score close to 100% in both experiments, highlighting its effectiveness in accurately authenticating users and demonstrating the feasibility of EEG-based authentication using P300 potentials.
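The sketch below mimics the binary (genuine vs. impostor) configuration and the epochs-versus-statistical-values comparison on synthetic data; the SVM classifier, feature statistics and data shapes are assumptions for illustration and are not the framework's actual choices.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_epochs, n_channels, n_samples = 200, 8, 128
X_epochs = rng.normal(size=(n_epochs, n_channels, n_samples))   # placeholder EEG epochs
y = rng.integers(0, 2, size=n_epochs)                           # 1 = target subject

# Option A: classify directly on flattened epochs.
X_flat = X_epochs.reshape(n_epochs, -1)

# Option B: classify on per-channel statistical values (mean, std, peak amplitude).
X_stats = np.concatenate([X_epochs.mean(-1), X_epochs.std(-1), X_epochs.max(-1)], axis=1)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
for name, X in [('epochs', X_flat), ('statistics', X_stats)]:
    print(name, cross_val_score(clf, X, y, cv=5, scoring='f1').mean())
```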
Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and is therefore labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation or inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structure of the object by predicting the relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: https://github.com/JDAI-CV/LIO.
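As a rough illustration of the spatial-context idea, the sketch below predicts the offset of every feature-map cell relative to a reference cell from their features alone; the layer sizes, the offset target and the class name `RelativePositionHead` are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class RelativePositionHead(nn.Module):
    """Predict the (dx, dy) offset of every feature-map cell relative to a reference cell."""
    def __init__(self, channels, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * channels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, feats, ref_idx):                    # feats: (B, C, H, W), ref_idx: (B,)
        B, C, H, W = feats.shape
        cells = feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        ref = cells[torch.arange(B), ref_idx].unsqueeze(1).expand_as(cells)
        return self.mlp(torch.cat([cells, ref], dim=-1))  # (B, H*W, 2) predicted offsets
```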
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
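The sketch below shows the top-down weighting step over a set of bottom-up region features (e.g., k pre-extracted Faster R-CNN region vectors), with a task-specific query such as the partial caption or question state; the hidden size and scoring network are assumptions rather than the paper's exact attention module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Soft attention over bottom-up region features, weighted by a top-down query."""
    def __init__(self, region_dim, query_dim, hidden=512):
        super().__init__()
        self.proj = nn.Linear(region_dim + query_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, query):                     # regions: (B, k, Dr), query: (B, Dq)
        q = query.unsqueeze(1).expand(-1, regions.size(1), -1)
        a = self.score(torch.tanh(self.proj(torch.cat([regions, q], dim=-1))))
        w = F.softmax(a, dim=1)                            # one weight per image region
        return (w * regions).sum(dim=1)                    # attended image feature
```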