We describe a comprehensive methodology for developing user-voice personalized automatic speech recognition (ASR) models by training them directly on mobile phones, so that user data and models are stored and used locally. To achieve this, we propose a resource-aware, sub-model-based training approach that accounts for the RAM and battery capabilities of mobile phones. By monitoring the evaluation metric together with the phone's resource constraints, we can train efficiently and halt the process at the appropriate point. To simulate real users, we use speakers with various accents. The entire on-device training and evaluation framework was then tested on mobile phones from several brands. We show that fine-tuning the models and selecting hyperparameter values involves a trade-off among the lowest achievable performance metric, on-device training time, and memory consumption. Overall, our methodology offers a comprehensive solution for developing personalized ASR models that leverages the capabilities of mobile phones while balancing accuracy against resource constraints.
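The halting logic described above can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the abstract gives no thresholds, so `ResourceBudget`, `should_halt`, and the use of word error rate (WER) as the evaluation metric are hypothetical names and choices.

```python
from dataclasses import dataclass


@dataclass
class ResourceBudget:
    """Hypothetical limits; the abstract does not state concrete thresholds."""
    max_ram_mb: float       # stop if training uses more RAM than this
    min_battery_pct: float  # stop if the battery falls below this level
    patience: int           # number of recent epochs inspected for progress
    min_delta: float        # smallest WER improvement that counts as progress


def should_halt(wer_history, ram_mb, battery_pct, budget):
    """Return the reason training should stop, or None to keep going.

    wer_history holds one evaluation score per completed epoch
    (lower is better; WER is an assumed choice of metric).
    """
    if ram_mb > budget.max_ram_mb:
        return "ram"
    if battery_pct < budget.min_battery_pct:
        return "battery"
    if len(wer_history) > budget.patience:
        best_before = min(wer_history[:-budget.patience])
        best_recent = min(wer_history[-budget.patience:])
        # No meaningful gain over the last `patience` epochs: plateau.
        if best_before - best_recent < budget.min_delta:
            return "plateau"
    return None
```

A training loop would call `should_halt` after each evaluation pass and stop on the first non-`None` result, which keeps the resource checks and the metric check in one place.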

Related content

MoDELS

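The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Samples · Generalization theory · Robustness · Diversity

January 2, 2024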

GBSS: a global building semantic segmentation dataset for large-scale remote sensing building extraction

Yuping Hu, Xin Huang, Jiayi Li, Zhen Zhang

From arXiv, 5 pages, 6 figures

Semantic segmentation techniques for extracting building footprints from high-resolution remote sensing images have been widely used in many fields such as urban planning. However, large-scale building extraction demands higher diversity in training samples. In this paper, we construct a Global Building Semantic Segmentation (GBSS) dataset (the dataset will be released), which comprises 116.9k pairs of samples (about 742k buildings) from six continents. The building samples vary significantly in size and style, so the dataset can serve as a more challenging benchmark for evaluating the generalization and robustness of building semantic segmentation models. We validated the dataset through quantitative and qualitative comparisons with other datasets, and further confirmed its potential for transfer learning through experiments on subsets.

Correlation coefficient · MoDELS · Score · Mean squared error · Performer

January 2, 2024

HAAQI-Net: A non-intrusive neural music quality assessment model for hearing aids

Dyah A. M. G. Wisnu, Epri Pratiwi, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

This paper introduces HAAQI-Net, a non-intrusive deep learning model for music quality assessment tailored to hearing aid users. In contrast to traditional methods like the Hearing Aid Audio Quality Index (HAAQI), HAAQI-Net utilizes a Bidirectional Long Short-Term Memory (BLSTM) with attention. It takes an assessed music sample and a hearing loss pattern as input, generating a predicted HAAQI score. The model employs the pre-trained Bidirectional Encoder representation from Audio Transformers (BEATs) for acoustic feature extraction. Comparing predicted scores with ground truth, HAAQI-Net achieves a Linear Correlation Coefficient (LCC) of 0.9257, Spearman's Rank Correlation Coefficient (SRCC) of 0.9394, and Mean Squared Error (MSE) of 0.0080. Notably, this high performance comes with a substantial reduction in inference time: from 62.52 seconds (by HAAQI) to 2.71 seconds (by HAAQI-Net), serving as an efficient music quality assessment model for hearing aid users.
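The three reported metrics can be computed from paired lists of predicted and ground-truth scores, as in the dependency-free sketch below. It reads LCC as the Pearson linear correlation, which is an assumption about the abbreviation, and the function names are illustrative rather than taken from the paper's code.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    """Mean squared error between predictions and targets."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

Because SRCC only looks at ranks, it rewards a model that orders samples correctly even when its scores are not linearly related to the reference, which is why the two correlations are usually reported together.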

Networking · Neural Networks · Inference · Edge · FPGA

January 2, 2024

Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge

Alessio Carpegna, Alessandro Savino, Stefano Di Carlo

Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Network (SNN) accelerators on FPGA for inference at the edge. Spiker+ presents a configurable multi-layer hardware SNN, a library of highly efficient neuron architectures, and a design framework, enabling the development of complex neural network accelerators with few lines of Python code. Spiker+ is tested on two benchmark datasets, MNIST and Spiking Heidelberg Digits (SHD). On MNIST, it demonstrates competitive performance compared to state-of-the-art SNN accelerators. It outperforms them in resource allocation, requiring 7,612 logic cells and 18 Block RAMs (BRAMs), which lets it fit in very small FPGAs, and in power consumption, drawing only 180 mW for a complete inference on an input image. The latency, at 780 µs per image, is comparable to that observed in the state of the art. To the authors' knowledge, Spiker+ is the first SNN accelerator tested on SHD. In this case, the accelerator requires 18,268 logic cells and 51 BRAMs, with an overall power consumption of 430 mW and a latency of 54 µs for a complete inference on input data. This underscores the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution for deploying configurable and tunable SNN architectures in resource- and power-constrained edge applications.
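For readers unfamiliar with spiking neurons, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron in software. It is a generic textbook model, not the Spiker+ Python interface or one of its hardware neuron architectures; the choice of LIF, the decay factor `beta`, the threshold, and the hard reset are all assumptions made for illustration.

```python
def lif_run(input_currents, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron over discrete time steps.

    Returns (spikes, trace): a 0/1 spike per step and the membrane
    potential after each step. beta and threshold are illustrative.
    """
    membrane = 0.0
    spikes, trace = [], []
    for current in input_currents:
        # Leak the previous potential, then integrate the new input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing (one common choice)
        else:
            spikes.append(0)
        trace.append(membrane)
    return spikes, trace
```

With `beta=0.5`, a constant input of 0.6 makes the neuron fire on the third step, once the accumulated potential crosses the threshold. An accelerator evaluates many such updates in parallel per layer, which is where logic-cell and BRAM budgets come into play.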

LVM · Prompt · SimPLe · Optimizer · Origin

January 2, 2024

SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM

Weijin Cheng, Jianzhi Liu, Jiawen Deng, Fuji Ren

From arXiv, 10 pages, 8 figures

Recently, text-to-image (T2I) synthesis has undergone significant advancements, particularly with the emergence of Large Language Models (LLM) and their enhancement in Large Vision Models (LVM), greatly enhancing the instruction-following capabilities of traditional T2I models. Nevertheless, previous methods focus on improving generation quality but introduce unsafe factors into prompts. We find that appending specific camera descriptions to prompts can enhance safety performance. Consequently, we propose a simple and safe prompt engineering method (SSP) to improve image generation quality by providing optimal camera descriptions. Specifically, we create a dataset of original prompts drawn from multiple datasets. To select the optimal camera, we design an optimal camera matching approach and implement a classifier that performs this matching automatically for original prompts. Appending camera descriptions to original prompts generates optimized prompts for further LVM image generation. Experiments demonstrate that SSP improves semantic consistency by an average of 16% compared to others and safety metrics by 48.9%.
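The classify-then-append flow can be pictured with a toy sketch. Nothing below comes from the paper: the categories, keyword lists, and camera descriptions are hypothetical placeholders, and a keyword lookup stands in for the trained classifier the abstract mentions.

```python
# Hypothetical category-to-camera table; the paper's actual camera
# descriptions and prompt categories are not given in the abstract.
CAMERA_BY_CATEGORY = {
    "portrait": "shot with an 85mm portrait lens, shallow depth of field",
    "landscape": "shot with a wide-angle lens, deep depth of field",
}
DEFAULT_CAMERA = "shot with a standard 50mm lens"

# Stand-in for a learned classifier: simple keyword matching.
KEYWORDS = {
    "portrait": ("face", "person", "portrait"),
    "landscape": ("mountain", "valley", "landscape"),
}


def classify(prompt):
    """Return a hypothetical prompt category, or None if nothing matches."""
    words = prompt.lower().split()
    for category, keywords in KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return category
    return None


def optimize_prompt(prompt):
    """Append the matched camera description to the original prompt."""
    camera = CAMERA_BY_CATEGORY.get(classify(prompt), DEFAULT_CAMERA)
    return f"{prompt.rstrip('. ')}, {camera}"
```

The original prompt text is left untouched and only a suffix is added, which matches the abstract's point that the method avoids rewriting prompts in ways that could introduce unsafe content.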

Optimizer · Reducible · Design · Decomposable · Applied statistics

December 31, 2023

Radiation design in computed tomography via convex optimization

Anatoli Juditsky, Arkadi Nemirovski, Michael Zibulevsky

From arXiv, 11 pages, 4 figures

Proper X-ray radiation design (via dynamic fluence field modulation, FFM) makes it possible to reduce the effective radiation dose in computed tomography without compromising image quality. It takes into account patient anatomy, radiation sensitivity of different organs and tissues, and location of regions of interest. We account for all these factors within a general convex optimization framework.

INFORMS · Networking · MoDELS · Object detection · FAST

December 31, 2023

Motion-aware Memory Network for Fast Video Salient Object Detection

Xing Zhao, Haoran Liang, Peipei Li, Guodao Sun, Dongdong Zhao, Ronghua Liang, Xiaofei He

From arXiv, 13 pages, 10 figures

Previous methods based on 3DCNN, convLSTM, or optical flow have achieved great success in video salient object detection (VSOD). However, they still suffer from high computational costs or poor quality of the generated saliency maps. To solve these problems, we design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD. Furthermore, previous methods only considered single-frame prediction without temporal association. As a result, the model may not focus sufficiently on temporal information. Thus, we introduce, for the first time, object motion prediction between frames into VSOD. Our model follows the standard encoder-decoder architecture. In the encoding stage, we generate high-level temporal features by using high-level features from the current frame and its adjacent frames. This approach is more efficient than optical flow-based methods. In the decoding stage, we propose an effective fusion strategy for the spatial and temporal branches. The semantic information of the high-level features is used to fuse the object details in the low-level features, and then the spatiotemporal features are obtained step by step to reconstruct the saliency maps. Moreover, inspired by the boundary supervision commonly used in image salient object detection (ISOD), we design a motion-aware loss for predicting object boundary motion and simultaneously perform multitask learning for VSOD and object motion prediction, which further helps the model extract spatiotemporal features accurately and maintain object integrity. Extensive experiments on several datasets demonstrate the effectiveness of our method, which achieves state-of-the-art metrics on some datasets. The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
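The core operation of a space-time memory network is a key-value read: each location in the current frame compares its key against keys stored from adjacent frames and retrieves a softmax-weighted mix of their values. The sketch below shows that generic read on plain Python lists; it is not the authors' network, and the function names and flat list layout are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query_keys, memory_keys, memory_values):
    """STM-style read.

    query_keys:    one key vector per location in the current frame
    memory_keys:   one key vector per location in the adjacent frames
    memory_values: one value vector per memory location
    Returns one retrieved value vector per query location.
    """
    value_dim = len(memory_values[0])
    retrieved = []
    for query in query_keys:
        # Dot-product similarity between this query and every memory key.
        scores = [sum(q * k for q, k in zip(query, key)) for key in memory_keys]
        weights = softmax(scores)
        retrieved.append([
            sum(w * value[d] for w, value in zip(weights, memory_values))
            for d in range(value_dim)
        ])
    return retrieved
```

The read needs only matrix products and a softmax, which is consistent with the abstract's claim that the temporal branch avoids the cost of computing optical flow.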

INTERACT · VR · Design · Comprehensibility · prototype

December 29, 2023

VR interaction for efficient virtual manufacturing: mini map for multi-user VR navigation platform

Huizhong Cao, Henrik Söderlund, Mélanie Despeisse, Francisco Garcia Rivera, Björn Johansson

Over the past decade, the value and potential of VR applications in manufacturing have gained significant attention in accordance with the rise of Industry 4.0 and beyond. Its efficacy in layout planning, virtual design reviews, and operator training has been well-established in previous studies. However, many functional requirements and interaction parameters of VR for manufacturing remain ambiguously defined. One area awaiting exploration is spatial recognition and learning, crucial for understanding navigation within the virtual manufacturing system and processing spatial data. This is particularly vital in multi-user VR applications where participants' spatial awareness in the virtual realm significantly influences the efficiency of meetings and design reviews. This paper investigates the interaction parameters of multi-user VR, focusing on interactive positioning maps for virtual factory layout planning and exploring the user interaction design of digital maps as a navigation aid. A literature study was conducted to establish frequently used techniques and interactive maps from the VR gaming industry. Multiple demonstrators of different interactive maps were implemented in a multi-user VR platform using the Unity game engine to provide a comprehensive A/B test. Five different prototypes of interactive maps were tested, evaluated, and graded by 20 participants, and 40 validated data streams were collected. The most efficient interaction design of interactive maps is thus analyzed and discussed in the study.

Feature selection · Long short-term memory network · Ensemble learning · Learning · Ensemble

December 29, 2023

Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting

Raquel Espinosa, Fernando Jiménez, José Palma

Time series forecasting plays a crucial role in diverse fields, necessitating the development of robust models that can effectively handle complex temporal patterns. In this article, we present a novel feature selection method embedded in Long Short-Term Memory networks, leveraging a multi-objective evolutionary algorithm. Our approach optimizes the weights and biases of the LSTM in a partitioned manner, with each objective function of the evolutionary algorithm targeting the root mean square error in a specific data partition. The set of non-dominated forecast models identified by the algorithm is then utilized to construct a meta-model through stacking-based ensemble learning. Furthermore, our proposed method provides an avenue for attribute importance determination, as the frequency of selection for each attribute in the set of non-dominated forecasting models reflects their significance. This attribute importance insight adds an interpretable dimension to the forecasting process. Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs, effectively reducing overfitting. Comparative analyses against state-of-the-art CancelOut and EAR-FS methods highlight the superior performance of our approach.
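Two ideas in this abstract are easy to show concretely: keeping only the non-dominated models when each objective is the RMSE on one data partition, and scoring attribute importance by how often each attribute is selected among those models. The sketch below assumes each candidate is a pair of (per-partition RMSE tuple, 0/1 feature mask); that representation and the function names are illustrative, not the authors' code, and the evolutionary search itself is omitted.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and
    strictly better somewhere (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(candidates):
    """Keep candidates whose RMSE vector no other candidate dominates.

    Each candidate is (rmse_per_partition, feature_mask).
    """
    return [
        cand for cand in candidates
        if not any(dominates(other[0], cand[0])
                   for other in candidates if other is not cand)
    ]


def attribute_importance(front):
    """Fraction of non-dominated models that select each attribute."""
    n_features = len(front[0][1])
    return [
        sum(mask[j] for _, mask in front) / len(front)
        for j in range(n_features)
    ]
```

For example, among models with RMSE vectors (0.5, 0.9), (0.8, 0.4), and (0.9, 0.95), the third is dominated by the first and dropped, while the first two trade off against each other and both stay in the set used for stacking.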

MoDELS · Language modeling · Comprehensibility · anchor · Vision

December 28, 2023

3VL: using Trees to teach Vision & Language models compositional concepts

Nir Yellinek, Leonid Karlinsky, Raja Giryes

Vision-Language models (VLMs) have proved effective at aligning image and text representations, producing superior zero-shot results when transferred to many downstream tasks. However, these representations suffer some key shortcomings in Compositional Language Concepts (CLC) understanding such as recognizing objects' attributes, states, and relations between different objects. Moreover, VLMs typically have poor interpretability, making it challenging to debug and mitigate compositional-understanding failures. In this work, we introduce the Tree-augmented Vision-Language (3VL) model architecture and training technique accompanied by our proposed Anchor inference method and Differential Relevance (DiRe) interpretability tool. By expanding the text of an arbitrary image-text pair into a hierarchical tree structure using language analysis tools, 3VL allows inducing this structure into the visual representation learned by the model, enhancing its interpretability and compositional reasoning. Additionally, we show how Anchor, a simple technique for text unification, can be employed to filter nuisance factors while increasing CLC understanding performance, e.g., on the fundamental VL-Checklist benchmark. We also exhibit how DiRe, which performs a differential comparison between VLM relevancy maps, enables us to generate compelling visualizations of the reasons for a model's success or failure.

Image classification · Feedforward network · INTERACT · Networking · Feedforward

May 7, 2021

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
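The two alternating sublayers can be sketched on plain Python lists, with the input stored as one row per patch and one column per channel. This is a simplified illustration, not the released Timm-based code: it omits any normalization or scaling layers, and the GELU activation (tanh approximation) and the weight names `patch_mix`, `w1`, and `w2` are assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def gelu(v):
    """tanh approximation of GELU (assumed activation)."""
    return 0.5 * v * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def resmlp_block(x, patch_mix, w1, w2):
    """One simplified residual block.

    x:         (patches, channels) input
    patch_mix: (patches, patches) linear layer; patches interact, with
               the same weights applied to every channel
    w1, w2:    (channels, hidden) and (hidden, channels) weights of the
               two-layer feed-forward network applied to each patch
    """
    # (i) cross-patch linear sublayer with residual connection
    z = add(x, matmul(patch_mix, x))
    # (ii) per-patch two-layer feed-forward sublayer with residual connection
    hidden = [[gelu(v) for v in row] for row in matmul(z, w1)]
    return add(z, matmul(hidden, w2))
```

Multiplying on the left by `patch_mix` mixes rows (patches) without touching columns, while multiplying on the right by `w1` and `w2` mixes columns (channels) within each row, which is exactly the separation the abstract describes.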