蜜桃少妇AV久久久久久久,欧美老妇与ZOZOZ0交,91大神精品一区二区在线

The ability to anticipate possible future human actions is essential for a wide range of applications, including autonomous driving and human-robot interaction. Consequently, numerous methods have been introduced for action anticipation in recent years, with deep learning-based approaches being particularly popular. In this work, we review the recent advances of action anticipation algorithms with a particular focus on daily-living scenarios. Additionally, we classify these methods according to their primary contributions and summarize them in tabular form, allowing readers to grasp the details at a glance. Furthermore, we delve into the common evaluation metrics and datasets used for action anticipation and provide future directions with systematical discussions.

相關內容

Learning

關注 12

束搜索 · 解碼 · 可約的 · Performer · 語言模型化 ·

2023 年 11 月 14 日

Improved Beam Search for Hallucination Mitigation in Abstractive Summarization

Arvind Krishna Sridhar,Erik Visser

from arxiv, 8 pages, 2 figures

Advancement in large pretrained language models has significantly improved their performance for conditional language generation tasks including summarization albeit with hallucinations. To reduce hallucinations, conventional methods proposed improving beam search or using a fact checker as a postprocessing step. In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation. We propose an NLI-assisted beam re-ranking mechanism by computing entailment probability scores between the input context and summarization model-generated beams during saliency-enhanced greedy decoding. Moreover, a diversity metric is introduced to compare its effectiveness against vanilla beam search. Our proposed algorithm significantly outperforms vanilla beam decoding on XSum and CNN/DM datasets.

ForCES · CCC · 平穩的 · 值域 · 傳感器 ·

2023 年 11 月 14 日

An Intraoperative Force Perception and Signal Decoupling Method on Capsulorhexis Forceps

Heng Zhang

from arxiv, 12pages, 17 figures

Force perception on medical instruments is critical for understanding the mechanism between surgical tools and tissues for feeding back quantized force information, which is essential for guidance and supervision in robotic autonomous surgery. Especially for continuous curvilinear capsulorhexis (CCC), it always lacks a force measuring method, providing a sensitive, accurate, and multi-dimensional measurement to track the intraoperative force. Furthermore, the decoupling matrix obtained from the calibration can decorrelate signals with acceptable accuracy, however, this calculating method is not a strong way for thoroughly decoupling under some sensitive measuring situations such as the CCC. In this paper, a three-dimensional force perception method on capsulorhexis forceps by installing Fiber Bragg Grating sensors (FBGs) on prongs and a signal decoupling method combined with FASTICA is first proposed to solve these problems. According to experimental results, the measuring range is up to 1 N (depending on the range of wavelength shifts of sensors) and the resolution on x, y, and z axial force is 0.5, 0.5, and 2 mN separately. To minimize the coupling effects among sensors on measuring multi-axial forces, by unitizing the particular parameter and scaling the corresponding vector in the mixing matrix and recovered signals from FastICA, the signals from sensors can be decorrelated and recovered with the errors on axial forces decreasing up to 50% least. The calibration and calculation can also be simplified with half the parameters involved in the calculation. Experiments on thin sheets and in vitro porcine eyes were performed, and it was found that the tearing forces were stable and the time sequence of tearing forceps was stationary or first-order difference stationary during roughly circular crack propagating.

離散化 · Performer · 假陽性 · Networking · anchor ·

2023 年 11 月 13 日

Boundary Discretization and Reliable Classification Network for Temporal Action Detection

Zhenying Fang

from arxiv, 12 pages, Source code: //github.com/zhenyingfang/BDRC-Net

Temporal action detection aims to recognize the action category and determine the starting and ending time of each action instance in untrimmed videos. The mixed methods have achieved remarkable performance by simply merging anchor-based and anchor-free approaches. However, there are still two crucial issues in the mixed framework: (1) Brute-force merging and handcrafted anchors design affect the performance and practical application of the mixed methods. (2) A large number of false positives in action category predictions further impact the detection performance. In this paper, we propose a novel Boundary Discretization and Reliable Classification Network (BDRC-Net) that addresses the above issues by introducing boundary discretization and reliable classification modules. Specifically, the boundary discretization module (BDM) elegantly merges anchor-based and anchor-free approaches in the form of boundary discretization, avoiding the handcrafted anchors design required by traditional mixed methods. Furthermore, the reliable classification module (RCM) predicts reliable action categories to reduce false positives in action category predictions. Extensive experiments conducted on different benchmarks demonstrate that our proposed method achieves favorable performance compared with the state-of-the-art. For example, BDRC-Net hits an average mAP of 68.6% on THUMOS'14, outperforming the previous best by 1.5%. The code will be released at //github.com/zhenyingfang/BDRC-Net.

圖形處理器 · Networking · Neural Networks · 圖 · INFORMS ·

2023 年 11 月 13 日

Collaborative Goal Tracking of Multiple Mobile Robots Based on Geometric Graph Neural Network

Qingquan Lin,Weining Lu

Multi-robot systems are widely used in spatially distributed tasks, and their collaborative path planning is of great significance for working efficiency. Currently, different multi-robot collaborative path planning methods have been proposed, but how to process the sensory information of neighboring robots at different locations from a local perception perspective in real environment to make better decisions is still a major difficulty. To address this problem, this paper proposes a multi-robot collaborative path planning method based on geometric graph neural network (GeoGNN). GeoGNN introduces the relative position information of neighboring robots into each interaction layer of the graph neural network to better integrate neighbor sensing information. An expert data generation method is designed for the robot to advance in a single step, by which expert data are generated in ROS to train the network. Experimental results show that the accuracy of the proposed method is improved by about 5% compared to the model based only on CNN on the expert data set. In ROS simulation environment path planning test, the success rate is improved by about 4% compared to CNN and flowtime increase is reduced about 8%, which outperforms other graph neural network models.

視頻分類 · Automator · Networking · Neural Networks · Neck ·

2023 年 11 月 10 日

Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition

Takuro Fujii,Hayato Nakagawa,Teppei Takeshima,Yasushi Yumura,Tomoki Hamagami

from arxiv, Accepted at Winter Conference on Applications of Computer Vision (WACV) 2024

Infertility is a global health problem, and an increasing number of couples are seeking medical assistance to achieve reproduction, at least half of which are caused by men. The success rate of assisted reproductive technologies depends on sperm assessment, in which experts determine whether sperm can be used for reproduction based on morphology and motility of sperm. Previous sperm assessment studies with deep learning have used datasets comprising images that include only sperm heads, which cannot consider motility and other morphologies of sperm. Furthermore, the labels of the dataset are one-hot, which provides insufficient support for experts, because assessment results are inconsistent between experts, and they have no absolute answer. Therefore, we constructed the video dataset for sperm assessment whose videos include sperm head as well as neck and tail, and its labels were annotated with soft-label. Furthermore, we proposed the sperm assessment framework and the neural network, RoSTFine, for sperm video recognition. Experimental results showed that RoSTFine could improve the sperm assessment performances compared to existing video recognition models and focus strongly on important sperm parts (i.e., head and neck).

INTERACT · Agent · 語言模型化 · 回合 · Next ·

2023 年 8 月 6 日

Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park,Joseph C. O'Brien,Carrie J. Cai,Meredith Ringel Morris,Percy Liang,Michael S. Bernstein

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.

數據增強 · Taxonomy · 文本分類 · Machine Learning · 訓練數據 ·

2021 年 7 月 7 日

A Survey on Data Augmentation for Text Classification

Markus Bayer,Marc-André Kaufhold,Christian Reuter

from arxiv, 35 pages, 6 figures, 8 tables

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).

MoDELS · 圖卷積神經網絡/圖卷積網絡 · 圖 · 圖卷積 · Networking ·

2020 年 12 月 14 日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Dong Wang,Di Hu,Xingjian Li,Dejing Dou

from arxiv, Accepted by the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Temporal relational modeling in video is essential for human action understanding, such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, it is still a challenge to apply graph convolution networks on long video sequences effectively. The main reason is that large number of nodes (i.e., video frames) makes GCNs hard to capture and model temporal relations in videos. To tackle this problem, in this paper, we introduce an effective GCN module, Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations via constructing multi-level dilated temporal graphs where the nodes represent frames from different moments in video. Moreover, to enhance temporal reasoning ability of the proposed model, an auxiliary self-supervised task is proposed to encourage the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at //github.com/redwang/DTGRM.

離散化 · 圖 · 圖形處理器 · Neural Networks · Networking ·

2019 年 3 月 28 日

Learning Discrete Structures for Graph Neural Networks

Luca Franceschi,Mathias Niepert,Massimiliano Pontil,Xiao He

from arxiv, 18 pages

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

塑造 · 可辨認的 · Better · 目標檢測 · state-of-the-art ·

2018 年 1 月 10 日

From Superpixel to Human Shape Modelling for Carried Object Detection

Farnoosh Ghadiri,Robert Bergevin,Guillaume-Alexandre Bilodeau

Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.