一级片免费电影看黄片免费-国产三级精品一区在线观看

Shaojin Ding,Qiu David,David Rim,Yanzhang He,Oleg Rybakov,Bo Li,Rohit Prabhavalkar,Weiran Wang,Tara N. Sainath,Shivani Agrawal,Zhonglin Han,Jian Li,Amir Yazdanbakhsh

from arxiv, Accepted by ICASSP 2024. Preprint

End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with a low-bit quantization and N:M structured sparsity aware paradigm on the model weights, reducing the model complexity from parameter precision and matrix topology perspectives. We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method. A series of ablation studies validate the effectiveness of up to int4 quantization and 2:4 sparsity. However, a single compression technique fails to recover the performance well under extreme setups including int2 quantization and 1:4 sparsity. By contrast, our proposed method can compress the model to have 9.4% of the size, at the cost of only 7.3% relative word error rate (WER) regressions. We also provided in-depth analyses on the results and discussions on the limitations and potential solutions, which would be valuable for future studies.

相關內容

特化

關注 0

MoDELS · Agent · INTERACT · FM · 知識 (knowledge) ·

2024 年 2 月 2 日

Foundation Model Sherpas: Guiding Foundation Models through Knowledge and Reasoning

Debarun Bhattacharjya,Junkyu Lee,Don Joven Agravante,Balaji Ganesan,Radu Marinescu

from arxiv, 9 pages

Foundation models (FMs) such as large language models have revolutionized the field of AI by showing remarkable performance in various tasks. However, they exhibit numerous limitations that prevent their broader adoption in many real-world systems, which often require a higher bar for trustworthiness and usability. Since FMs are trained using loss functions aimed at reconstructing the training corpus in a self-supervised manner, there is no guarantee that the model's output aligns with users' preferences for a specific task at hand. In this survey paper, we propose a conceptual framework that encapsulates different modes by which agents could interact with FMs and guide them suitably for a set of tasks, particularly through knowledge augmentation and reasoning. Our framework elucidates agent role categories such as updating the underlying FM, assisting with prompting the FM, and evaluating the FM output. We also categorize several state-of-the-art approaches into agent interaction protocols, highlighting the nature and extent of involvement of the various agent roles. The proposed framework provides guidance for future directions to further realize the power of FMs in practical AI systems.

SCAN · INFORMS · 模型評估 · 圖 · Learning ·

2024 年 2 月 2 日

Learning Which Side to Scan: Multi-View Informed Active Perception with Side Scan Sonar for Autonomous Underwater Vehicles

Advaith V. Sethuraman,Philip Baldoni,Katherine A. Skinner,James McMahon

Autonomous underwater vehicles often perform surveys that capture multiple views of targets in order to provide more information for human operators or automatic target recognition algorithms. In this work, we address the problem of choosing the most informative views that minimize survey time while maximizing classifier accuracy. We introduce a novel active perception framework for multi-view adaptive surveying and reacquisition using side scan sonar imagery. Our framework addresses this challenge by using a graph formulation for the adaptive survey task. We then use Graph Neural Networks (GNNs) to both classify acquired sonar views and to choose the next best view based on the collected data. We evaluate our method using simulated surveys in a high-fidelity side scan sonar simulator. Our results demonstrate that our approach is able to surpass the state-of-the-art in classification accuracy and survey efficiency. This framework is a promising approach for more efficient autonomous missions involving side scan sonar, such as underwater exploration, marine archaeology, and environmental monitoring.

TransAct · state-of-the-art · Atom（文本編輯器） · FAST · 結點 ·

2024 年 2 月 1 日

Dumbo-NG: Fast Asynchronous BFT Consensus with Throughput-Oblivious Latency

Yingzi Gao,Yuan Lu,Zhenliang Lu,Qiang Tang,Jing Xu,Zhenfeng Zhang

Despite recent progresses of practical asynchronous Byzantine fault tolerant (BFT) consensus, the state-of-the-art designs still suffer from suboptimal performance. Particularly, to obtain maximum throughput, most existing protocols with guaranteed linear amortized communication complexity require each participating node to broadcast a huge batch of transactions, which dramatically sacrifices latency. Worse still, the f slowest nodes' broadcasts might never be agreed to output and thus can be censored (where f is the number of faults). Implementable mitigation to the threat either uses computationally costly threshold encryption or incurs communication blow-up, thus causing further efficiency issues. We present Dumbo-NG, a novel asynchronous BFT consensus (atomic broadcast) to solve the remaining practical issues. Its technical core is a non-trivial direct reduction from asynchronous atomic broadcast to multi-valued validated Byzantine agreement (MVBA) with quality property. Most interestingly, the new protocol structure empowers completely concurrent execution of transaction dissemination and asynchronous agreement. This brings about two benefits: (i) the throughput-latency tension is resolved to approach peak throughput with minimal increase in latency; (ii) the transactions broadcasted by any honest node can be agreed to output, thus conquering the censorship threat with no extra cost. We implement Dumbo-NG and compare it to the state-of-the-art asynchronous BFT with guaranteed censorship resilience including Dumbo (CCS'20) and Speeding-Dumbo (NDSS'22). We also apply the techniques from Speeding-Dumbo to DispersedLedger (NSDI'22) and obtain an improved variant of DispersedLedger called sDumbo-DL for comprehensive comparison. Extensive experiments reveal: Dumbo-NG realizes better peak throughput performance and its latency can almost remain stable when throughput grows.

極小點 · 獎勵函數 · 泛函 · Automator · 共軛 ·

2024 年 2 月 1 日

RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway

Biswajit Sadhu,Trijit Sadhu,S. Anand

from arxiv, 12 pages, 7 main figures, code link (GitHub)

Recent advancements in deep reinforcement learning (DRL) techniques have sparked its multifaceted applications in the automation sector. Managing complex decision-making problems with DRL encourages its use in the nuclear industry for tasks such as optimizing radiation exposure to the personnel during normal operating conditions and potential accidental scenarios. However, the lack of efficient reward function and effective exploration strategy thwarted its implementation in the development of radiation-aware autonomous unmanned aerial vehicle (UAV) for achieving maximum radiation protection. Here, in this article, we address these intriguing issues and introduce a deep Q-learning based architecture (RadDQN) that operates on a radiation-aware reward function to provide time-efficient minimum radiation-exposure pathway in a radiation zone. We propose a set of unique exploration strategies that fine-tune the extent of exploration and exploitation based on the state-wise variation in radiation exposure during training. Further, we benchmark the predicted path with grid-based deterministic method. We demonstrate that the formulated reward function in conjugation with adequate exploration strategy is effective in handling several scenarios with drastically different radiation field distributions. When compared to vanilla DQN, our model achieves a superior convergence rate and higher training stability.

估計/估計量 · Legged Robot · 狀態估計 · 卡爾曼濾波 · 門控 ·

2024 年 1 月 31 日

OptiState: State Estimation of Legged Robots using Gated Networks with Transformer-based Vision and Kalman Filtering

Alexander Schperberg,Yusuke Tanaka,Saviz Mowlavi,Feng Xu,Bharathan Balaji,Dennis Hong

from arxiv, Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA), May 13-17, in Yokohama, Japan. 7 pages, 5 figures, 1 table

State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioception and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimation is further refined through Gated Recurrent Units, which also considers semantic insights and robot height from a Vision Transformer autoencoder applied on depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can minimize the nonlinear errors that arise from sensor measurements and model simplifications through learning. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement on the Root Mean Squared Error compared to our VIO SLAM baseline. Code example: //github.com/AlexS28/OptiState

MoDELS · Attention · 掩碼 · 可約的 · 損失 ·

2024 年 1 月 31 日

Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

Zhipeng Bao,Yijun Li,Krishna Kumar Singh,Yu-Xiong Wang,Martial Hebert

Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for the multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. While previous research efforts have individually tackled these issues, we assert that a holistic approach is paramount. Thus, we propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores, respectively. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability. Comprehensive evaluations demonstrate the superior performance of our model in terms of image realism, text-image alignment, and adaptability, notably outperforming prominent baselines. Ultimately, this research paves the way for T2I diffusion models with enhanced compositional capacities and broader applicability.

Projection · 分解的 · 可理解性 · INTERACT · TEAM ·

2024 年 1 月 31 日

Challenges in Understanding the Relationship between Teamwork Quality and Project Success in Large-Scale Agile Projects

Torgeir Dings?yr,Phillip Schneider,Gunnar Rye Bergersen,Yngve Lindsj?rn

A number of methods for large-scale agile development have recently been suggested. Much of the advice in agile methods focuses on teamwork. Prior research has established that teamwork quality influences project success both for traditional software development teams and agile teams. Further, prior studies have also suggested that teamwork quality may play out differently in large projects compared to small. We investigated the relationship between teamwork quality and project success with a survey of 196 project participants across 34 teams in four projects, replicating a previous study on single teams. The new data do not fit the previously established theoretical model, which raises several concerns. The observed effect of teamwork quality on project success operates differently across projects. We discuss possible reasons, which include disagreements on what characterises success in large-scale agile development, "concept drift" of teamwork quality factors, the possibility that interteam factors might have more influence on project success than intrateam factors, and finally, that our study design does not capture all relevant levels and functions. We conclude with a call for more studies on the quality and frequency of interaction between teams in addition to internal team factors to further advance theory and practice within large-scale agile software development.

數據集 · GROUP · Elevate · 評論員 · 生物特征識別 ·

2022 年 11 月 3 日

Expanding Accurate Person Recognition to New Altitudes and Ranges: The BRIAR Dataset

David Cornett III,Joel Brogan,Nell Barber,Deniz Aykac,Seth Baird,Nick Burchfield,Carl Dukes,Andrew Duncan,Regina Ferrell,Jim Goddard,Gavin Jager,Matt Larson,Bart Murphy,Christi Johnson,Ian Shelley,Nisha Srinivas,Brandon Stockwell,Leanne Thompson,Matt Yohe,Robert Zhang,Scott Dolvin,Hector J. Santos-Villalobos,David S. Bolme

Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-rage R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.

深度學習 · 估計/估計量 · 學成 · Hinton · ACM ·

2020 年 6 月 10 日

Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies

Yu Huang,Yue Chen

Since DARPA Grand Challenges (rural) in 2004/05 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. Almost at the same time, deep learning has made breakthrough by several pioneers, three of them (also called fathers of deep learning), Hinton, Bengio and LeCun, won ACM Turin Award in 2019. This is a survey of autonomous driving technologies with deep learning methods. We investigate the major fields of self-driving systems, such as perception, mapping and localization, prediction, planning and control, simulation, V2X and safety etc. Due to the limited space, we focus the analysis on several key areas, i.e. 2D and 3D object detection in perception, depth estimation from cameras, multiple sensor fusion on the data, feature and task level respectively, behavior modelling and prediction of vehicle driving and pedestrian trajectories.

Networking · Neural Networks · 縮放 · 方差 · 卷積神經網絡 ·

2018 年 4 月 2 日

SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection

Xiaowei Hu,Xuemiao Xu,Yongjie Xiao,Hao Chen,Shengfeng He,Jing Qin,Pheng-Ann Heng

from arxiv, Submitted to IEEE Transactions On Intelligent Transportation Systems (T-ITS)

Vision-based vehicle detection approaches achieve incredible success in recent years with the development of deep convolutional neural network (CNN). However, existing CNN based algorithms suffer from the problem that the convolutional features are scale-sensitive in object detection task but it is common that traffic images and videos contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity, and reveal two key issues: 1) existing RoI pooling destroys the structure of small scale objects, 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detecting vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but prominent detection accuracy improvement. The proposed techniques can be equipped with any deep network architectures and keep them trained end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.