Accurate 3D tracking of hand and finger movements poses significant challenges in computer vision. The potential applications span multiple domains, including human-computer interaction, virtual reality, industry, and medicine. While gesture recognition has achieved remarkable accuracy, quantifying fine movements remains a hurdle, particularly in clinical applications where the assessment of hand dysfunctions and rehabilitation training outcomes necessitates precise measurements. Several novel and lightweight frameworks based on Deep Learning have emerged to address this issue; however, their performance in accurately and reliably measuring finger movements requires validation against well-established gold standard systems. In this paper, the aim is to validate the hand-tracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D, that exploits the depth estimation of an RGB-Depth camera to achieve more accurate tracking of 3D movements. Three dynamic exercises commonly administered by clinicians to assess hand dysfunctions, namely Hand Opening-Closing, Single Finger Tapping, and Multiple Finger Tapping, are considered. Results demonstrate high temporal and spectral consistency of both frameworks with the gold standard. However, the enhanced GMH-D framework exhibits superior accuracy in spatial measurements compared to the baseline GMH, for both slow and fast movements. Overall, our study contributes to the advancement of hand-tracking technology, establishes a validation procedure as good practice for proving the efficacy of deep-learning-based hand tracking, and demonstrates the effectiveness of GMH-D as a reliable framework for assessing 3D hand movements in clinical applications.
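To make the role of depth concrete, here is a minimal sketch of how a 2D landmark plus a depth reading can be lifted to a metric 3D point with the standard pinhole camera model. The abstract does not describe how GMH-D actually combines landmarks and depth, so this is only an illustration under assumptions: the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the landmark pixel values are all hypothetical.

```python
import math

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to camera-frame 3D coordinates (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def distance_3d(p, q):
    """Euclidean distance between two 3D points, e.g. thumb tip and index tip."""
    return math.dist(p, q)

# Hypothetical intrinsics and landmark pixels, chosen only for illustration.
fx = fy = 600.0
cx, cy = 320.0, 240.0
thumb_tip = backproject(320.0, 240.0, 0.50, fx, fy, cx, cy)
index_tip = backproject(380.0, 240.0, 0.50, fx, fy, cx, cy)
tip_distance = distance_3d(thumb_tip, index_tip)
print(tip_distance)
```

Tracking a quantity such as `tip_distance` over time is one plausible way to quantify a finger-tapping exercise; the measures actually used in the study are not stated in the abstract.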

Related content

TAP

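ACM Transactions on Applied Perception (TAP) aims to strengthen the synergy between computer science and psychology/perception by publishing high-quality papers that help unify research in these fields. The journal publishes interdisciplinary research of significant and lasting value in any topic area that spans both computer science and perceptual psychology. All papers must include both a perception component and a computer science component. Topics include, but are not limited to: Visual perception: computer graphics, scientific/data/information visualization, digital imaging, computer vision, stereo and 3D display technology. Auditory perception: auditory display and interfaces, perceptual auditory coding, spatial sound, speech synthesis and recognition. Haptics: haptic rendering, haptic input and perception. Sensorimotor perception: gesture input, body movement input. Sensory perception: sensory integration, multimodal rendering and interaction. Official website: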

September 22, 2023

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

Naohiro Tawara,Marc Delcroix,Atsushi Ando,Atsunori Ogawa

from arxiv, 5 pages, 5 figures, Submitted to ICASSP 2024

This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies end-to-end neural diarization with vector clustering (EEND-VC) to each channel separately. It integrates the diarization results obtained from each channel using diarization output voting error reduction plus overlap (DOVER-LAP). To harness the knowledge from the target domain and the results integrated across all channels, we apply self-supervised adaptation for each session by retraining the EEND-VC with pseudo-labels derived from DOVER-LAP. The proposed system was incorporated into NTT's submission for the distant automatic speech recognition task in the CHiME-7 challenge. Our system achieved 65% and 62% relative improvements on the development and eval sets compared to the organizer-provided VC-based baseline diarization system, securing third place in diarization performance.

September 22, 2023

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

Xiaoheng Jiang,Kaiyi Guo,Yang Lu,Feng Yan,Hao Liu,Jiale Cao,Mingliang Xu,Dacheng Tao

Surface defect inspection is of great importance for industrial manufacturing and production. Though defect inspection methods based on deep learning have made significant progress, these methods still face challenges, such as weak defects that are hard to distinguish and defect-like interference in the background. To address these issues, we propose a transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation, which is a UNet-like structure named CINFormer. CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer network in the encoder. This preserves both the CNN's strength in capturing detailed features and the transformer's strength in suppressing background noise, which facilitates accurate defect detection. In addition, CINFormer presents a Top-K self-attention module to focus on tokens with more important information about the defects, so as to further reduce the impact of the redundant background. Extensive experiments conducted on the surface defect datasets DAGM 2007, Magnetic tile, and NEU show that the proposed CINFormer achieves state-of-the-art performance in defect detection.
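The idea of attending only to the most informative tokens can be sketched as generic top-k sparse attention for a single query: score every key, keep the k highest scores, and apply softmax over those alone. The abstract gives no details of CINFormer's Top-K module, so the function name `topk_attention`, the scaling, and the toy inputs below are assumptions for illustration, not the authors' implementation.

```python
import math

def topk_attention(query, keys, values, k):
    """Single-query attention that keeps only the k highest-scoring keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Indices of the k largest scores; every other token gets zero weight.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0 for i in range(len(keys))]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights

# Toy example: four tokens, keep the two most relevant to the query.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.0, 2.0]]
output, weights = topk_attention(query, keys, values, k=2)
print(weights)
```

Tokens outside the top k contribute nothing to the output, which is how such a module can limit the influence of redundant background tokens.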

September 21, 2023

PEFTT: Parameter-Efficient Fine-Tuning for low-resource Tibetan pre-trained language models

Zhou Mingjun,Daiqing Zhuoma,Qun Nuo,Nyima Tashi

In this era of large language models (LLMs), the traditional training of models has become increasingly unimaginable for regular users and institutions. The exploration of efficient fine-tuning for high-resource languages on these models is an undeniable trend that is gradually gaining popularity. However, there has been very little exploration for various low-resource languages, such as Tibetan. Research in Tibetan NLP is inherently scarce and limited. While there is currently no existing large language model for Tibetan due to its low-resource nature, that day will undoubtedly arrive. Therefore, research on efficient fine-tuning for low-resource language models like Tibetan is highly necessary. Our research can serve as a reference to fill this crucial gap. Efficient fine-tuning strategies for pre-trained language models (PLMs) in Tibetan have seen minimal exploration. We conducted three types of efficient fine-tuning experiments on the publicly available TNCC-title dataset: "prompt-tuning," "Adapter lightweight fine-tuning," and "prompt-tuning + Adapter fine-tuning." The experimental results demonstrate significant improvements using these methods, providing valuable insights for advancing Tibetan language applications in the context of pre-trained models.

September 21, 2023

Dissipative WENO stabilization of high-order discontinuous Galerkin methods for hyperbolic problems

Joshua Vedral

We present a new approach to stabilizing high-order Runge-Kutta discontinuous Galerkin (RKDG) schemes using weighted essentially non-oscillatory (WENO) reconstructions in the context of hyperbolic conservation laws. In contrast to RKDG schemes that overwrite finite element solutions with WENO reconstructions, our approach employs the reconstruction-based smoothness sensor presented by Kuzmin and Vedral (J. Comput. Phys. 487:112153, 2023) to control the amount of added numerical dissipation. Incorporating a dissipation-based WENO stabilization term into a discontinuous Galerkin (DG) discretization, the proposed methodology achieves high-order accuracy while effectively capturing discontinuities in the solution. As such, our approach offers an attractive alternative to WENO-based slope limiters for DG schemes. The reconstruction procedure that we use performs Hermite interpolation on stencils composed of a mesh cell and its neighboring cells. The amount of numerical dissipation is determined by the relative differences between the partial derivatives of reconstructed candidate polynomials and those of the underlying finite element approximation. The employed smoothness sensor takes all derivatives into account to properly assess the local smoothness of a high-order DG solution. Numerical experiments demonstrate the ability of our scheme to capture discontinuities sharply. Optimal convergence rates are obtained for all polynomial degrees.

September 21, 2023

Stochastic stiffness identification and response estimation of Timoshenko beams via physics-informed Gaussian processes

Gledson Rodrigo Tondo,Sebastian Rau,Igor Kavrakov,Guido Morgenthal

Machine learning models trained with structural health monitoring data have become a powerful tool for system identification. This paper presents a physics-informed Gaussian process (GP) model for Timoshenko beam elements. The model is constructed as a multi-output GP with covariance and cross-covariance kernels analytically derived based on the differential equations for deflections, rotations, strains, bending moments, shear forces and applied loads. Stiffness identification is performed in a Bayesian format by maximising a posterior model through a Markov chain Monte Carlo method, yielding a stochastic model for the structural parameters. The optimised GP model is further employed for probabilistic predictions of unobserved responses. Additionally, an entropy-based method for physics-informed sensor placement optimisation is presented, exploiting heterogeneous sensor position information and structural boundary conditions built into the GP model. Results demonstrate that the proposed approach is effective at identifying structural parameters and is capable of fusing data from heterogeneous and multi-fidelity sensors. Probabilistic predictions of structural responses and internal forces are in closer agreement with measured data. We validate our model with an experimental setup and discuss the quality and uncertainty of the obtained results. The proposed approach has potential applications in the field of structural health monitoring (SHM) for both mechanical and structural systems.

September 21, 2023

DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

Seah Kim,Hyoukjun Kwon,Jinook Song,Jihyuck Jo,Yu-Hsin Chen,Liangzhen Lai,Vikas Chandra

from arxiv, 14 pages

Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors at various granularities: task, model, and layers within a model. Such dynamic behaviors introduce new challenges to the system software in an ML system since the overall system load is not completely predictable, unlike traditional ML workloads. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under such circumstances, an effective scheduler becomes more important for making better use of the underlying hardware given the unique characteristics of RTMM workloads. Therefore, we propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads targeting multi-accelerator systems. DREAM quantifies the unique requirements for RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. DREAM utilizes tunable parameters that provide fast and effective adaptivity to dynamic workload changes. In our evaluation of five scenarios of RTMM workload, DREAM reduces the overall UXCost, which is an equivalent metric of the energy-delay product (EDP) for RTMM defined in the paper, by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.

September 20, 2023

Gradient constrained sharpness-aware prompt learning for vision-language models

Liangchen Liu,Nannan Wang,Dawei Zhou,Xinbo Gao,Decheng Liu,Xi Yang,Tongliang Liu

from arxiv, 19 pages 11 figures

This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLM), i.e., improving the performance on unseen classes while maintaining the performance on seen classes. Compared with existing generalizable methods that neglect the degradation on seen classes, this problem setting is stricter and fits practical applications more closely. To solve this problem, we start from the optimization perspective, and leverage the relationship between loss landscape geometry and model generalization ability. By analyzing the loss landscapes of the state-of-the-art method and the vanilla Sharpness-aware Minimization (SAM) based method, we conclude that the trade-off performance correlates with both loss value and loss sharpness, and that both are indispensable. However, we find that the optimizing gradient of existing methods cannot maintain high relevance to both loss value and loss sharpness during optimization, which severely affects their trade-off performance. To this end, we propose a novel SAM-based method for prompt learning, denoted as Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp), which dynamically constrains the optimizing gradient and thus pursues both optimization objectives simultaneously. Extensive experiments verify the effectiveness of GCSCoOp in the trade-off problem.
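For readers unfamiliar with the vanilla SAM baseline mentioned above, one update can be sketched as follows: take the gradient at the current weights, step a distance `rho` along its normalized direction to a nearby "worst-case" point, then descend using the gradient evaluated there. This is plain SAM on a toy quadratic, not GCSCoOp, whose gradient constraint is not specified in the abstract; `sam_step`, `loss`, `grad`, and the hyperparameter values are hypothetical names and choices.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One vanilla SAM update: perturb along the gradient, then descend from w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # Already at a stationary point; nothing to do.
    # Ascent step of radius rho toward higher loss in the neighborhood of w.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend using the gradient taken at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def loss(w):
    """Toy quadratic with one flat and one sharp direction."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    return [w[0], 10.0 * w[1]]

w = [1.0, 1.0]
for _ in range(50):
    w = sam_step(w, grad)
print(loss(w))
```

Because the descent direction comes from the perturbed point, sharp directions of the loss are penalized more strongly than flat ones, which is the link between loss sharpness and generalization that the abstract builds on.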

September 20, 2023

Space and move-optimal Arbitrary Pattern Formation on infinite rectangular grid by Oblivious Robot Swarm

Avisek Sharma,Satakshi Ghosh,Pritam Goswami,Buddhadeb Sau

Arbitrary Pattern Formation (APF) is a fundamental coordination problem in swarm robotics. It requires a set of autonomous robots (mobile computing units) to form any arbitrary pattern (given as input) starting from any initial pattern. The APF problem is well-studied in both continuous and discrete settings. This work concerns the discrete version of the problem. A set of robots is placed on the nodes of an infinite rectangular grid graph embedded in a Euclidean plane. Each robot's movement is restricted to one of the four grid nodes neighboring its current position. The robots are autonomous, anonymous, identical, and homogeneous, and operate in Look-Compute-Move cycles. Here we consider the classical $\mathcal{OBLOT}$ robot model, i.e., the robots have no persistent memory and no explicit means of communication. The robots have full unobstructed visibility. This work proposes an algorithm that solves the APF problem under a fully asynchronous scheduler in this setting, assuming the initial configuration is asymmetric. The performance measures considered are the space and the number of moves required by the robots. The algorithm is asymptotically move-optimal. A definition of the space complexity is presented here. We observe an obvious lower bound $\mathcal{D}$ on the space complexity and show that the proposed algorithm has space complexity $\mathcal{D}+4$. Comparing with previous related works, we show that this is the first algorithm proposed for the $\mathcal{OBLOT}$ robot model that is asymptotically move-optimal and has the least space complexity, which is almost optimal.

September 20, 2023

Vehicle-to-Grid and ancillary services: a profitability analysis under uncertainty

Federico Bianchi,Alessandro Falsone,Riccardo Vignali

from arxiv, Accepted by IFAC for publication under a Creative Commons Licence CC-BY-NC-ND

The rapid and massive diffusion of electric vehicles poses new challenges to the electric system, which must be able to supply these new loads, but at the same time opens up new opportunities thanks to the possible provision of ancillary services. Indeed, in the so-called Vehicle-to-Grid (V2G) set-up, the charging power can be modulated throughout the day so that a fleet of vehicles can absorb an excess of power from the grid or provide extra power during a shortage. To this end, many works in the literature focus on optimizing each vehicle's daily charging profile to offer the requested ancillary services while guaranteeing a charged battery for each vehicle at the end of the day. However, the size of the economic benefits related to the provision of ancillary services varies significantly with the modeling approach, the assumptions made, and the scenarios considered. In this paper we propose a profitability analysis with reference to a recently proposed framework for V2G optimal operation in the presence of uncertainty. We provide necessary and sufficient conditions for profitability in a simplified case and we show via simulation that they also hold for the general case.

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
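The two alternating sublayers described above can be sketched in a few lines. This is a simplified illustration under assumptions, not the released code: it omits biases and any normalization layers, assumes a GELU activation in the feed-forward part, and uses hypothetical names (`resmlp_block`, `w_patch`, `w1`, `w2`) with plain Python lists standing in for tensors.

```python
import math

def gelu(x):
    """Exact GELU activation (assumed here; the abstract does not name one)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def resmlp_block(x, w_patch, w1, w2):
    """x is N patches x d channels; w_patch is N x N, w1 is d x h, w2 is h x d."""
    # (i) Cross-patch linear layer: the same N x N matrix mixes patches
    #     independently and identically for every channel, with a residual.
    x = add(x, matmul(w_patch, x))
    # (ii) Two-layer feed-forward network: channels interact within each
    #      patch separately, again with a residual connection.
    hidden = [[gelu(h) for h in row] for row in matmul(x, w1)]
    return add(x, matmul(hidden, w2))

# Toy input: 3 patches with 2 channels each, hidden width 4.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_patch = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
w1 = [[0.5, -0.5, 0.1, 0.0], [0.2, 0.3, -0.1, 0.4]]
w2 = [[0.1, 0.0], [0.0, 0.1], [0.2, -0.2], [0.3, 0.3]]
y = resmlp_block(x, w_patch, w1, w2)
print(len(y), len(y[0]))
```

The block preserves the input shape, so blocks can be stacked; no attention is involved, only linear maps and a pointwise nonlinearity.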