
The end-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks. However, the optimization of E2E models lacks an intuitive method for handling decoding shifts, especially in scenarios with many domain-specific rare words that carry specific, important meanings. Furthermore, the absence of knowledge-intensive speech datasets in academia has been a significant limiting factor, and commonly used speech corpora differ markedly from realistic conversation. To address these challenges, we present Medical Interview (MED-IT), a multi-turn consultation speech dataset that contains a substantial number of knowledge-intensive named entities. We also explore methods to enhance the recognition of rare words in E2E models. We propose a novel approach, post-decoder biasing, which constructs a transform probability matrix based on the distribution of training transcriptions, guiding the model to prioritize recognizing words in the biasing list. In our experiments, for subsets of rare words appearing in the training speech 10 to 20 times and 1 to 5 times, the proposed method achieves relative improvements of 9.3% and 5.1%, respectively.
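
As a rough illustration of the idea, the sketch below re-ranks per-step decoder posteriors with a diagonal bias transform that favors biasing-list tokens. The `boost` hyperparameter and the toy vocabulary are our own assumptions for illustration; the paper derives its transform matrix from training-transcription statistics rather than a fixed boost.

```python
# Minimal sketch of post-decoder biasing (our reading of the abstract,
# not the authors' released code).
import numpy as np

def build_bias_matrix(vocab_size, biasing_ids, boost=2.0):
    """Diagonal transform boosting tokens that belong to biasing-list words.

    `boost` is a hypothetical hyperparameter; the paper instead estimates
    the transform from the distribution of training transcriptions.
    """
    bias = np.ones(vocab_size)
    bias[biasing_ids] = boost
    return bias

def post_decoder_bias(log_probs, bias):
    """Re-rank decoder posteriors: add log-bias, then renormalize."""
    scores = log_probs + np.log(bias)  # (T, V) biased log-scores
    scores -= np.logaddexp.reduce(scores, axis=-1, keepdims=True)
    return scores

# Toy usage: 3 decoding steps over a 10-token vocabulary.
rng = np.random.default_rng(0)
log_probs = np.log(rng.dirichlet(np.ones(10), size=3))
biased = post_decoder_bias(log_probs, build_bias_matrix(10, biasing_ids=[4, 7]))
```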

Related Content

Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to recover images with out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that combines multiple common degradations such as blur, resize, noise, and JPEG compression. We then introduce robust training for a degradation-aware CLIP model to extract enriched image content features that assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task show that the proposed posterior sampling improves image generation quality across various degradations.
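
The degradation types named in the abstract suggest a pipeline along the lines of the sketch below, written with Pillow and NumPy. The parameter ranges are guesses, not the paper's settings, and the input is assumed to be an RGB image.

```python
# Hedged sketch of a blur -> resize -> noise -> JPEG degradation pipeline.
import io, random
from PIL import Image, ImageFilter
import numpy as np

def degrade(img: Image.Image) -> Image.Image:
    img = img.convert("RGB")  # JPEG saving below assumes an RGB image
    w, h = img.size
    # 1) Gaussian blur with a random radius.
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 3.0)))
    # 2) Downscale then upscale to simulate resolution loss.
    scale = random.uniform(0.25, 0.75)
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    img = img.resize((w, h), Image.BICUBIC)
    # 3) Additive Gaussian noise with a random standard deviation.
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, random.uniform(1, 15), arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # 4) JPEG compression at a random quality.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(30, 90))
    return Image.open(io.BytesIO(buf.getvalue()))
```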

Automated Driving (AD) systems have the potential to increase safety, comfort, and energy efficiency. Recently, major automotive companies have started testing and validating AD systems (ADS) on public roads. Nevertheless, commercial deployment and wide adoption of ADS have been moderate, partially due to functional insufficiencies (FIs) that undermine passenger safety and lead to hazardous situations on the road. FIs are defined in ISO 21448, Safety Of The Intended Functionality (SOTIF), as insufficiencies in sensors, actuators, and algorithm implementations, including neural networks and probabilistic calculations. Examples of FIs in ADS include inaccurate ego-vehicle localization on the road, incorrect prediction of a cyclist's maneuver, and unreliable detection of a pedestrian. The main goal of our study is to formulate a generic architectural design pattern, compatible with existing methods and ADS, that improves FI mitigation and enables faster commercial deployment of ADS. First, we studied the 2021 autonomous vehicle disengagement reports published by the California Department of Motor Vehicles (DMV). The data clearly show that disengagements are five times more often caused by FIs than by system faults. We then compiled a comprehensive list of insufficiencies and their characteristics by analyzing over 10 hours of publicly available road test videos. In particular, we identified insufficiency types in four major categories: world model, motion plan, traffic rule, and operational design domain. This characterization helps make SOTIF analyses of triggering conditions more systematic and comprehensive. Based on our FI characterization, simulation experiments, and a literature survey, we define a novel generic architectural design pattern, Daruma, which dynamically selects the channel least likely to have an FI at the moment.
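
The abstract does not detail Daruma's internals, so the following is only a speculative sketch of the core selection logic as we read it: redundant channels each expose a runtime monitor estimating FI likelihood, and the least risky channel's plan is used. The `Channel` type and monitors are hypothetical placeholders.

```python
# Speculative sketch of dynamic channel selection by estimated FI risk.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Channel:
    name: str
    plan: Callable[[Any], Any]       # produces a motion plan for the scene
    fi_risk: Callable[[Any], float]  # runtime monitor: estimated FI likelihood

def select_channel(channels: list[Channel], scene) -> tuple[str, Any]:
    """Pick the channel whose monitor reports the lowest FI risk right now."""
    best = min(channels, key=lambda ch: ch.fi_risk(scene))
    return best.name, best.plan(scene)
```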

We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private best subset selection method with strong utility properties, adopting the well-known exponential mechanism for selecting the best model. We propose an efficient Metropolis-Hastings algorithm and establish that it enjoys polynomial mixing time to its stationary distribution. Furthermore, we establish approximate differential privacy for the estimates of the mixed Metropolis-Hastings chain. Finally, we present illustrative experiments that demonstrate the strong utility of our algorithm.
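
For intuition, here is a generic sketch (not the paper's algorithm) of a Metropolis-Hastings chain whose stationary distribution is the exponential mechanism over feature subsets, pi(S) proportional to exp(eps * u(S) / (2 * sensitivity)). The least-squares utility and single-flip proposal are our own illustrative choices.

```python
# Generic exponential-mechanism-over-subsets sampler via Metropolis-Hastings.
import numpy as np

def utility(S, X, y):
    """Negative residual sum of squares of least squares on columns S."""
    if not S:
        return -np.sum(y ** 2)
    Xs = X[:, sorted(S)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.sum((y - Xs @ beta) ** 2)

def mh_subset(X, y, eps, sensitivity, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    S = set()
    for _ in range(steps):
        j = int(rng.integers(p))  # propose flipping membership of feature j
        S_new = S ^ {j}           # symmetric proposal: add or remove j
        log_accept = eps * (utility(S_new, X, y) - utility(S, X, y)) / (2 * sensitivity)
        if np.log(rng.random()) < log_accept:
            S = S_new
    return S
```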

Object manipulation has been extensively studied in the context of fixed-base and mobile manipulators. However, the overactuated locomotion modality employed by snake robots allows for a unique blend of object manipulation through locomotion, referred to as loco-manipulation. This work presents an optimization approach to the loco-manipulation problem based on non-impulsive, implicit-contact path planning for our snake robot COBRA. We present the mathematical framework and show high-fidelity simulation results and experiments that demonstrate the effectiveness of our approach.

In the era of cutting-edge autonomous systems, Unmanned Aerial Vehicles (UAVs) are becoming an essential part of solutions to numerous complex challenges. This paper evaluates UAV peer-to-peer telemetry communication, highlights its security vulnerabilities, and explores a transition to a heterogeneous multi-hop mesh all-to-all communication architecture to increase inter-swarm connectivity and reliability. Additionally, we propose a symmetric key-agreement and data-encryption mechanism for inter-swarm communication, ensuring data integrity and confidentiality without compromising performance.
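
The abstract does not name specific primitives, but one standard way to realize such a mechanism is an ephemeral Diffie-Hellman key agreement followed by authenticated encryption, as sketched below with the Python `cryptography` library. Treat the primitive choices (X25519, HKDF-SHA256, AES-GCM) as our assumptions, not the paper's.

```python
# Sketch: key agreement + authenticated telemetry encryption between two UAVs.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each UAV generates an ephemeral keypair and exchanges public keys.
a_priv, b_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
shared_a = a_priv.exchange(b_priv.public_key())
shared_b = b_priv.exchange(a_priv.public_key())
assert shared_a == shared_b  # both sides derive the same secret

# Derive a symmetric session key from the shared secret.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b"uav-swarm-session").derive(shared_a)

# AES-GCM gives both confidentiality and integrity for telemetry frames.
nonce = os.urandom(12)  # must be unique per message under the same key
ciphertext = AESGCM(key).encrypt(nonce, b'{"lat":52.1,"lon":4.3}', b"telemetry-v1")
plaintext = AESGCM(key).decrypt(nonce, ciphertext, b"telemetry-v1")
```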

The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their unified potential has not yet been explored for medical image segmentation. To adapt SAM to medical imaging, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. Initially, SAM is used for part-based segmentation within the image, followed by CLIP to retrieve the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks. Finally, SAM is prompted with the retrieved ROI to segment a specific organ. Thus, SaLIP is training- and fine-tuning-free and does not rely on domain expertise or labeled data for prompt engineering. Our method shows substantial enhancements in zero-shot segmentation, with notable improvements in DICE scores across diverse segmentation tasks such as brain (63.46%), lung (50.11%), and fetal head (30.82%), when compared to unprompted SAM. Code and text prompts will be available online.
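
A condensed sketch of the three-stage pipeline as described, using the public `segment_anything` and `clip` packages. The checkpoint path, mask-scoring heuristic, and text prompt are placeholders of ours; the authors' released code and prompts define the actual setup.

```python
# Sketch of SAM parts -> CLIP retrieval -> SAM ROI prompt (SaLIP-style).
import numpy as np, torch, clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth").to(device)  # placeholder path
clip_model, preprocess = clip.load("ViT-B/32", device=device)

def salip_segment(image_rgb: np.ndarray, organ_prompt: str) -> np.ndarray:
    # 1) SAM: unprompted, part-based segmentation of the whole image.
    masks = SamAutomaticMaskGenerator(sam).generate(image_rgb)
    # 2) CLIP: score each mask's crop against the organ text prompt.
    text = clip.tokenize([organ_prompt]).to(device)
    def score(m):
        x, y, w, h = (int(v) for v in m["bbox"])
        crop = preprocess(Image.fromarray(image_rgb[y:y+h, x:x+w])).unsqueeze(0).to(device)
        with torch.no_grad():
            return (clip_model.encode_image(crop) @ clip_model.encode_text(text).T).item()
    roi = max(masks, key=score)
    # 3) SAM again, prompted with the retrieved ROI's bounding box.
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)
    x, y, w, h = roi["bbox"]
    final_masks, _, _ = predictor.predict(box=np.array([x, y, x + w, y + h]))
    return final_masks[0]
```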

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Recently, there has been a surge of interest in the delicate refinement of text prompts: users assign weights or alter the injection time steps of certain words in their prompts to improve the quality of generated images. However, the success of such fine-control prompts depends on the accuracy of the text prompts and the careful selection of weights and time steps, which requires significant manual intervention. To address this, we introduce the Prompt Auto-Editing (PAE) method. Besides refining the original prompts for image generation, we employ an online reinforcement learning strategy to explore the weight and injection time step of each word, yielding dynamic fine-control prompts. The reward function during training encourages the model to consider aesthetic score, semantic consistency, and user preferences. Experimental results demonstrate that our proposed method effectively improves the original prompts, generating visually more appealing images while maintaining semantic alignment. Code is available at //github.com/Mowenyii/PAE.
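
A reward combining those three terms could have the shape sketched below. The weights and scorer interfaces are hypothetical placeholders, not values from the paper; see the linked repository for the real reward.

```python
# Hypothetical shape of a PAE-style RL reward over generated images.
def pae_reward(image, prompt, aesthetic_model, clip_score, preference_model,
               w_aes=1.0, w_sem=1.0, w_pref=1.0):
    """Scalar reward: aesthetics + text-image consistency + user preference."""
    r_aes = aesthetic_model(image)            # e.g., a LAION-style aesthetic predictor
    r_sem = clip_score(image, prompt)         # CLIP image-text similarity
    r_pref = preference_model(image, prompt)  # e.g., a learned preference scorer
    return w_aes * r_aes + w_sem * r_sem + w_pref * r_pref
```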

We present DeepIPCv2, an autonomous driving model that perceives the environment with a LiDAR sensor for more robust drivability, especially under poor illumination, where objects are not clearly visible. DeepIPCv2 takes a set of LiDAR point clouds as its main perception input. Since point clouds are unaffected by illumination changes, they provide a clear view of the surroundings regardless of conditions, resulting in better scene understanding and more stable features from the perception module to support the controller module in estimating navigational controls. To evaluate its performance, we conduct several tests by deploying the model to predict a set of driving records and to perform real automated driving under three different conditions. We also conduct ablation and comparative studies with recent models to justify its performance. Furthermore, to support future research, we will upload the code and data to //github.com/oskarnatan/DeepIPCv2. Based on the experimental results, DeepIPCv2 shows robust performance, achieving the best drivability in all driving scenarios.
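
Purely for orientation, the skeleton below shows one way a point-cloud perception module could feed a controller head in PyTorch. The architecture, layer sizes, and two-dimensional control output are invented for illustration and are not DeepIPCv2's design; the linked repository has the actual model.

```python
# Invented skeleton of a LiDAR-perception -> controller pipeline.
import torch
import torch.nn as nn

class LidarDrivingModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared per-point MLP + max pool: a PointNet-style global scene feature.
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 256), nn.ReLU())
        # Controller head: steering and throttle from the scene feature.
        self.controller = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                                        nn.Linear(64, 2), nn.Tanh())

    def forward(self, points):  # points: (B, N, 3) xyz coordinates
        feat = self.point_mlp(points).max(dim=1).values
        return self.controller(feat)  # (B, 2): steer, throttle in [-1, 1]
```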

Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static; rather, they evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge as the graph grows. Motivated by this, we delve into an expanding field of KG embedding: lifelong KG embedding. We consider knowledge transfer and retention when learning on growing snapshots of a KG, without having to relearn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and updating, an embedding transfer strategy that injects the learned knowledge into new entity and relation embeddings, and an embedding regularization method that avoids catastrophic forgetting. To investigate the impact of different aspects of KG growth, we construct four datasets for evaluating lifelong KG embedding. Experimental results show that the proposed model outperforms state-of-the-art inductive and lifelong embedding baselines.
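
One common form of the embedding-regularization idea is an L2 anchor that penalizes drift of embeddings learned in earlier snapshots, as sketched below; the paper's exact regularizer may differ, and the `weight` value is a placeholder.

```python
# Sketch: anchor-style regularization against catastrophic forgetting.
import torch

def retention_loss(new_emb: torch.Tensor, old_emb: torch.Tensor,
                   old_ids: torch.Tensor, weight: float = 0.1) -> torch.Tensor:
    """Penalize drift of embeddings for entities/relations from prior snapshots."""
    drift = new_emb[old_ids] - old_emb[old_ids]
    return weight * (drift ** 2).sum(dim=-1).mean()
```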

Object tracking is challenging because target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms that rely on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, because these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance from the camera view. In this paper, we propose learning multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate to locate target objects precisely, taking into account the appropriate size of the surrounding context and the feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position to predict scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance, and we use the output responses of this long-term filter to determine whether a tracking failure has occurred. In the case of tracking failure, we apply an incrementally learned detector to recover the target position in a sliding-window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness.
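
To make the short-term vs. long-term memory idea concrete, here is a compact MOSSE-style correlation filter (a simpler relative of the kernelized filter the paper uses): the same update rule yields an aggressive short-term filter or a conservative long-term one depending only on the learning rate. The learning-rate values below are illustrative.

```python
# MOSSE-style correlation filter sketch: memory length set by learning rate.
import numpy as np

class CorrelationFilter:
    def __init__(self, lr, lam=1e-2):
        self.lr, self.lam = lr, lam
        self.A = self.B = None  # running numerator / denominator in Fourier domain

    def update(self, patch, target_response):
        F = np.fft.fft2(patch)
        G = np.fft.fft2(target_response)  # desired response, e.g., a Gaussian peak
        A_new, B_new = G * np.conj(F), F * np.conj(F) + self.lam
        if self.A is None:
            self.A, self.B = A_new, B_new
        else:  # exponential moving average over frames
            self.A = (1 - self.lr) * self.A + self.lr * A_new
            self.B = (1 - self.lr) * self.B + self.lr * B_new

    def respond(self, patch):
        """Correlation response map; its peak locates the target."""
        return np.real(np.fft.ifft2((self.A / self.B) * np.fft.fft2(patch)))

short_term = CorrelationFilter(lr=0.10)  # adapts fast, may drift on noisy updates
long_term = CorrelationFilter(lr=0.01)   # conservative memory of target appearance
```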
