嘘嘘中国免费观看网站-国产三级精品一区在线观看

Existing gesture interfaces only work with a fixed set of gestures defined either by interface designers or by users themselves, which introduces learning or demonstration efforts that diminish their naturalness. Humans, on the other hand, understand free-form gestures by synthesizing the gesture, context, experience, and common sense. In this way, the user does not need to learn, demonstrate, or associate gestures. We introduce GestureGPT, a free-form hand gesture understanding framework that mimics human gesture understanding procedures to enable a natural free-form gestural interface. Our framework leverages multiple Large Language Model agents to manage and synthesize gesture and context information, then infers the interaction intent by associating the gesture with an interface function. More specifically, our triple-agent framework includes a Gesture Description Agent that automatically segments and formulates natural language descriptions of hand poses and movements based on hand landmark coordinates. The description is deciphered by a Gesture Inference Agent through self-reasoning and querying about the interaction context (e.g., interaction history, gaze data), which is managed by a Context Management Agent. Following iterative exchanges, the Gesture Inference Agent discerns the user's intent by grounding it to an interactive function. We validated our framework offline under two real-world scenarios: smart home control and online video streaming. The average zero-shot Top-1/Top-5 grounding accuracies are 44.79%/83.59% for smart home tasks and 37.50%/73.44% for video streaming tasks. We also provide an extensive discussion that includes rationale for model selection, generalizability, and future research directions for a practical system etc.

相關內容

Agent

關注 15

Learning · 監督 · NeurIPS 2019 · 目標檢測 · 可辨認的 ·

2024 年 12 月 13 日

UN-DETR: Promoting Objectness Learning via Joint Supervision for Unknown Object Detection

Haomiao Liu,Hao Xu,Chuhuai Yue,Bo Ma

from arxiv, Accepted by AAAI-2025;15 pages, 11figures

Unknown Object Detection (UOD) aims to identify objects of unseen categories, differing from the traditional detection paradigm limited by the closed-world assumption. A key component of UOD is learning a generalized representation, i.e. objectness for both known and unknown categories to distinguish and localize objects from the background in a class-agnostic manner. However, previous methods obtain supervision signals for learning objectness in isolation from either localization or classification information, leading to poor performance for UOD. To address this issue, we propose a transformer-based UOD framework, UN-DETR. Based on this, we craft Instance Presence Score (IPS) to represent the probability of an object's presence. For the purpose of information complementarity, IPS employs a strategy of joint supervised learning, integrating attributes representing general objectness from the positional and the categorical latent space as supervision signals. To enhance IPS learning, we introduce a one-to-many assignment strategy to incorporate more supervision. Then, we propose Unbiased Query Selection to provide premium initial query vectors for the decoder. Additionally, we propose an IPS-guided post-process strategy to filter redundant boxes and correct classification predictions for known and unknown objects. Finally, we pretrain the entire UN-DETR in an unsupervised manner, in order to obtain objectness prior. Our UN-DETR is comprehensively evaluated on multiple UOD and known detection benchmarks, demonstrating its effectiveness and achieving state-of-the-art performance.

Learning · 樣本 · 小樣本學習 · Engineering · Performer ·

2024 年 12 月 13 日

MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning

Jiliang Li,Yifan Zhang,Yu Huang,Kevin Leach

Recent growth and proliferation of malware has tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning that achieves high accuracy with sparse training data. We present a novel domain-knowledge-aware technique for augmenting malware feature representations, enhancing few-shot performance of semi-supervised malware family classification. We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings. Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware feature augmentation methods and highlights the capabilities of similar semi-supervised classifiers in addressing malware classification issues.

MoDELS · Neural Networks · Networking · 講稿 · PDE ·

2024 年 12 月 12 日

Finite-PINN: A Physics-Informed Neural Network Architecture for Solving Solid Mechanics Problems with General Geometries

Haolin Li,Yuyang Miao,Zahra Sharif Khodaei,M. H. Aliabadi

PINN models have demonstrated impressive capabilities in addressing fluid PDE problems, and their potential in solid mechanics is beginning to emerge. This study identifies two key challenges when using PINN to solve general solid mechanics problems. These challenges become evident when comparing the limitations of PINN with the well-established numerical methods commonly used in solid mechanics, such as the finite element method (FEM). Specifically: a) PINN models generate solutions over an infinite domain, which conflicts with the finite boundaries typical of most solid structures; and b) the solution space utilised by PINN is Euclidean, which is inadequate for addressing the complex geometries often present in solid structures. This work proposes a PINN architecture used for general solid mechanics problems, termed the Finite-PINN model. The proposed model aims to effectively address these two challenges while preserving as much of the original implementation of PINN as possible. The unique architecture of the Finite-PINN model addresses these challenges by separating the approximation of stress and displacement fields, and by transforming the solution space from the traditional Euclidean space to a Euclidean-topological joint space. Several case studies presented in this paper demonstrate that the Finite-PINN model provides satisfactory results for a variety of problem types, including both forward and inverse problems, in both 2D and 3D contexts. The developed Finite-PINN model offers a promising tool for addressing general solid mechanics problems, particularly those not yet well-explored in current research.

Prompt · Learning · SOFT · 詞元分析器 · MoDELS ·

2024 年 12 月 12 日

ATPrompt: Textual Prompt Learning with Embedded Attributes

Zheng Li,Yibing Song,Penghai Zhao,Ming-Ming Cheng,Xiang Li,Jian Yang

from arxiv, Technical Report. Project Page: //zhengli97.github.io/ATPrompt/

Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align image and text (category) spaces for downstream tasks. However, current training is restricted to aligning images with predefined known categories and cannot be associated with unknown categories. In this work, we propose utilizing universal attributes as a bridge to enhance the alignment between images and unknown categories. Specifically, we introduce an Attribute-embedded Textual Prompt learning method for vision-language models, named ATPrompt. This approach expands the learning space of soft prompts from the original one-dimensional category level into the multi-dimensional attribute level by incorporating multiple universal attribute tokens into the learnable soft prompts. Through this modification, we transform the text prompt from a category-centric form to an attribute-category hybrid form. To finalize the attributes for downstream tasks, we propose a differentiable attribute search method that learns to identify representative and suitable attributes from a candidate pool summarized by a large language model. As an easy-to-use plug-in technique, ATPrompt can seamlessly replace the existing prompt format of textual-based methods, offering general improvements at a negligible computational cost. Extensive experiments on 11 datasets demonstrate the effectiveness of our method.

有偏 · MoDELS · 語言模型化 · Facebook AI Research · Notability ·

2024 年 12 月 12 日

SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration

Xin Guan,Nathaniel Demchak,Saloni Gupta,Ze Wang,Ediz Ertekin Jr.,Adriano Koshiyama,Emre Kazim,Zekun Wu

from arxiv, COLING 2025 Main Conference

The development of unbiased large language models is widely recognized as crucial, yet existing benchmarks fall short in detecting biases due to limited scope, contamination, and lack of a fairness baseline. SAGED(bias) is the first holistic benchmarking pipeline to address these problems. The pipeline encompasses five core stages: scraping materials, assembling benchmarks, generating responses, extracting numeric features, and diagnosing with disparity metrics. SAGED includes metrics for max disparity, such as impact ratio, and bias concentration, such as Max Z-scores. Noticing that metric tool bias and contextual bias in prompts can distort evaluation, SAGED implements counterfactual branching and baseline calibration for mitigation. For demonstration, we use SAGED on G20 Countries with popular 8b-level models including Gemma2, Llama3.1, Mistral, and Qwen2. With sentiment analysis, we find that while Mistral and Qwen2 show lower max disparity and higher bias concentration than Gemma2 and Llama3.1, all models are notably biased against countries like Russia and (except for Qwen2) China. With further experiments to have models role-playing U.S. presidents, we see bias amplifies and shifts in heterogeneous directions. Moreover, we see Qwen2 and Mistral not engage in role-playing, while Llama3.1 and Gemma2 role-play Trump notably more intensively than Biden and Harris, indicating role-playing performance bias in these models.

WEB · Performance · Performer · Networking · 異常檢測 ·

2024 年 12 月 12 日

WT-CFormer: High-Performance Web Traffic Anomaly Detection Based on Spatiotemporal Analysis

Yundi He,Runhua Shi,Boyan Wang

Web traffic (WT) refers to time-series data that captures the volume of data transmitted to and from a web server during a user's visit to a website. However, web traffic has different distributions coming from various sources as well as the imbalance between normal and abnormal categories, it is difficult to accurately and efficiently identify abnormal web traffic. Deep neural network approaches for web traffic anomaly detection have achieved cutting-edge classification performance. In order to achieve high-performance spatiotemporal detection of network attacks, we innovatively design WT-CFormer, which integrates Transformer and CNN, effectively capturing the temporal and spatial characteristics. We conduct a large numbr of experiments to evaluate the method we proposed. The results show that WT-CFormer has the highest performance, obtaining a recall as high as 96.79%, a precision of 97.35%, an F1 score of 97.07%, and an accuracy of 99.43%, which is 7.09%,1.15%, 4.77%, and 0.83% better than the state-of-the-art method, followed by C-LSTM, CTGA, random forest, and KNN algorithms. In addition, we find that the classification performance of WT-CFormer with only 50 training epochs outperforms C-LSTM with 500 training epochs, which greatly improves the convergence performance. Finally, we perform ablation experiments to demonstrate the necessity of each component within WT-CFormer.

設計 · 知識 (knowledge) · 基 · 知識庫 · 語言模型化 ·

2024 年 12 月 12 日

DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models

Mingyue Yuan,Jieshan Chen,Zhenchang Xing,Aaron Quigley,Yuyu Luo,Tianqi Luo,Gelareh Mohammadi,Qinghua Lu,Liming Zhu

from arxiv, 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)

The rise of Large Language Models (LLMs) has streamlined frontend interface creation through tools like Vercel's V0, yet surfaced challenges in design quality (e.g., accessibility, and usability). Current solutions, often limited by their focus, generalisability, or data dependency, fall short in addressing these complexities. Moreover, none of them examine the quality of LLM-generated UI design. In this work, we introduce DesignRepair, a novel dual-stream design guideline-aware system to examine and repair the UI design quality issues from both code aspect and rendered page aspect. We utilised the mature and popular Material Design as our knowledge base to guide this process. Specifically, we first constructed a comprehensive knowledge base encoding Google's Material Design principles into low-level component knowledge base and high-level system design knowledge base. After that, DesignRepair employs a LLM for the extraction of key components and utilizes the Playwright tool for precise page analysis, aligning these with the established knowledge bases. Finally, we integrate Retrieval-Augmented Generation with state-of-the-art LLMs like GPT-4 to holistically refine and repair frontend code through a strategic divide and conquer approach. Our extensive evaluations validated the efficacy and utility of our approach, demonstrating significant enhancements in adherence to design guidelines, accessibility, and user experience metrics.

估計/估計量 · CASES · CASE · 機器人 · 支持向量機 ·

2024 年 12 月 12 日

Hybrid Model-Data Fault Diagnosis for Wafer Handler Robots: Tilt and Broken Belt Cases

Tim van Esch,Farhad Ghanipoor,Carlos Murguia,Nathan van de Wouw

This work proposes a hybrid model- and data-based scheme for fault detection, isolation, and estimation (FDIE) for a class of wafer handler (WH) robots. The proposed hybrid scheme consists of: 1) a linear filter that simultaneously estimates system states and fault-induced signals from sensing and actuation data; and 2) a data-driven classifier, in the form of a support vector machine (SVM), that detects and isolates the fault type using estimates generated by the filter. We demonstrate the effectiveness of the scheme for two critical fault types for WH robots used in the semiconductor industry: broken-belt in the lower arm of the WH robot (an abrupt fault) and tilt in the robot arms (an incipient fault). We derive explicit models of the robot motion dynamics induced by these faults and test the diagnostics scheme in a realistic simulation-based case study. These case study results demonstrate that the proposed hybrid FDIE scheme achieves superior performance compared to purely data-driven methods.

可約的 · 區塊鏈 · 可行 · 結點 · Better ·

2024 年 12 月 11 日

Pioplat: A Scalable, Low-Cost Framework for Latency Reduction in Ethereum Blockchain

Ke Wang,Qiao Wang,Yue Li,Zhi Guan,Zhong Chen

from arxiv, 12 pages, 5 figures

As decentralized applications on permissionless blockchains are prevalent, more and more latency-sensitive usage scenarios emerged, where the lower the latency of sending and receiving messages, the better the chance of earning revenue. To reduce latency, we present Pioplat, a feasible, customizable, and low-cost latency reduction framework consisting of multiple relay nodes on different continents and at least one instrumented variant of a full node. The node selection strategy of Pioplat and the low-latency communication protocol offer an elastic way to reduce latency effectively. We demonstrate Pioplat's feasibility with an implementation running on five continents and show that Pioplat can significantly reduce the latency of receiving blocks/transactions and sending transactions, thus fulfilling the requirements of most latency-sensitive use cases. Furthermore, we provide the complete implementation of Pioplat to promote further research and allow people to apply the framework to more blockchain systems.

Analysis · tuning · 多峰值 · MoDELS · 數據集 ·

2024 年 12 月 11 日

M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis

Ao Li,Longwei Xu,Chen Ling,Jinghui Zhang,Pengwei Wang

Sentiment analysis and emotion recognition are crucial for applications such as human-computer interaction and depression detection. Traditional unimodal methods often fail to capture the complexity of emotional expressions due to conflicting signals from different modalities. Current Multimodal Large Language Models (MLLMs) also face challenges in detecting subtle facial expressions and addressing a wide range of emotion-related tasks. To tackle these issues, we propose M2SE, a Multistage Multitask Sentiment and Emotion Instruction Tuning Strategy for general-purpose MLLMs. It employs a combined approach to train models on tasks such as multimodal sentiment analysis, emotion recognition, facial expression recognition, emotion reason inference, and emotion cause-pair extraction. We also introduce the Emotion Multitask dataset (EMT), a custom dataset that supports these five tasks. Our model, Emotion Universe (EmoVerse), is built on a basic MLLM framework without modifications, yet it achieves substantial improvements across these tasks when trained with the M2SE strategy. Extensive experiments demonstrate that EmoVerse outperforms existing methods, achieving state-of-the-art results in sentiment and emotion tasks. These results highlight the effectiveness of M2SE in enhancing multimodal emotion perception. The dataset and code are available at //github.com/xiaoyaoxinyi/M2SE.