
This report provides a concise overview of the proposed North system, which aims to achieve automatic word/syllable recognition for Taiwanese Hakka (Sixian). The report outlines three key components of the system: the acquisition, composition, and utilization of the training data; the architecture of the model; and the hardware specifications and operational statistics. The demonstration of the system can be found at //asrvm.iis.sinica.edu.tw/hakka_sixian.

Related content

Speech recognition

Followers: 753

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and translate it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It integrates knowledge and research from computer science, linguistics, and computer engineering.

Analysis · Sentiment analysis · MoDELS · Performer · AIM

November 21, 2023

LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language

Aunabil Chakma,Masum Hasan

from arxiv, Accepted at BLP Workshop @EMNLP2023

This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms. Our primary aim is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus, using various strategies including fine-tuning, dropping random tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variations. Our system ranked 3rd overall on the test set among 30 participating teams, with a score of 0.718. Additionally, we discuss promising systems that did not perform well, namely task-adaptive pretraining and paraphrasing using BanglaT5. The training code and external datasets used for our system are publicly available at //github.com/Aunabil4602/bnlp-workshop-task2-2023

Learning · Processing (programming language) · MoDELS · HTTPS · Comprehensibility

November 20, 2023

Solving Math Word Problems with Reexamination

Yi Bin,Wenhao Shi,Yujuan Ding,Yang Yang,See-Kiong Ng

from arxiv, To appear at the NeurIPS 2023 Workshop on MATH-AI

Math word problem (MWP) solving aims to understand a descriptive math problem and calculate the result; previous efforts have mostly been devoted to upgrading different technical modules. This paper brings a different perspective, a reexamination process during training, by introducing a pseudo-dual task to enhance MWP solving. We propose a pseudo-dual (PseDual) learning scheme to model this process, which is model-agnostic and can thus be adapted to any existing MWP solver. The pseudo-dual task is specifically defined as filling the numbers in the expression back into the original word problem with the numbers masked. To facilitate effective joint learning of the two tasks, we further design a scheduled fusion strategy for the number infilling task, which smoothly switches the input from the ground-truth math expressions to the predicted ones. Our pseudo-dual learning scheme has been tested and proven effective when equipped in several representative MWP solvers through empirical studies. The code and trained models are available at: //github.com/steven640pixel/PsedualMWP.
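To make the number infilling setup concrete, here is a minimal sketch of how a masked word problem could be prepared. The `[NUM]` token, the regular expression, and the function name `mask_numbers` are illustrative assumptions; the abstract does not specify how the authors mask numbers.

```python
import re

# Assumption: numbers are plain integers or decimals written with digits.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def mask_numbers(problem, mask_token="[NUM]"):
    """Return the problem with each number replaced by mask_token,
    plus the list of numbers that were removed, in order."""
    numbers = NUMBER.findall(problem)
    masked = NUMBER.sub(mask_token, problem)
    return masked, numbers

# Hypothetical word problem used only for illustration.
problem = "Tom has 3 apples and buys 4.5 kg more."
masked, numbers = mask_numbers(problem)
print(masked)
print(numbers)
```

In a pseudo-dual setup of this kind, the masked text and a math expression would be the input, and the removed numbers would be the infilling targets.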

Visual question answering · Multimodal · Question answering · Vision · Multimodal learning

November 20, 2023

UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering

Triet M. Thai,Anh T. Vo,Hao K. Tieu,Linh N. P. Bui,Thien T. B. Nguyen

from arxiv, ImageCLEF2023 published version: //ceur-ws.org/Vol-3497/paper-129.pdf

In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical images based on a given question and providing precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out a visual question answering task in the gastrointestinal domain, which includes gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge by proposing a multimodal learning method with image enhancement to improve VQA performance on gastrointestinal images. The multimodal architecture is set up with a BERT encoder and different pre-trained vision models based on convolutional neural network (CNN) and Transformer architectures for feature extraction from the question and the endoscopy image. The results of this study highlight the dominance of Transformer-based vision models over CNNs and demonstrate the effectiveness of the image enhancement process, with six out of the eight vision models achieving a better F1-Score. Our best method, which takes advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score on the development test set, while also producing a good result on the private test set with an accuracy of 82.01%.

Performer · AIM · Generalization theory · Paper · Pattern recognition

November 17, 2023

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

Pietro Melzi,Ruben Tolosana,Ruben Vera-Rodriguez,Minchul Kim,Christian Rathgeb,Xiaoming Liu,Ivan DeAndres-Tame,Aythami Morales,Julian Fierrez,Javier Ortega-Garcia,Weisong Zhao,Xiangyu Zhu,Zheyu Yan,Xiao-Yu Zhang,Jinlin Wu,Zhen Lei,Suvidha Tripathi,Mahak Kothari,Md Haider Zama,Debayan Deb,Bernardo Biesseck,Pedro Vidal,Roger Granada,Guilherme Fickel,Gustavo Führ,David Menotti,Alexander Unnervik,Anjith George,Christophe Ecabert,Hatef Otroshi Shahreza,Parsa Rahimi,Sébastien Marcel,Ioannis Sarridis,Christos Koutlis,Georgia Baltsou,Symeon Papadopoulos,Christos Diou,Nicolò Di Domenico,Guido Borghi,Lorenzo Pellegrini,Enrique Mas-Candela,Ángela Sánchez-Pérez,Andrea Atzori,Fadi Boutros,Naser Damer,Gianni Fenu,Mirko Marras

from arxiv, 10 pages, 1 figure, WACV 2024 Workshops

Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.

Performer · Layer · Scaling · Graph · Scenario

November 17, 2023

Parallel Verification of Natural Deduction Proof Graphs

James T. Oswald,Brandon Rozek

from arxiv, In Proceedings LFMTP 2023, arXiv:2311.09918

Graph-based interactive theorem provers offer a visual representation of proofs, explicitly representing the dependencies and inferences between each of the proof steps in a graph or hypergraph format. The number and complexity of these dependency links can determine how long it takes to verify the validity of the entire proof. To this end, we present a set of parallel algorithms for the formal verification of graph-based natural-deduction (ND) style proofs. We introduce a definition of layering that captures dependencies between the proof steps (nodes). Nodes in each layer can then be verified in parallel as long as prior layers have been verified. To evaluate the performance of our algorithms on proof graphs, we propose a framework for finding the performance bounds and patterns using directed acyclic network topologies (DANTs). This framework allows us to create concrete instances of DANTs for empirical evaluation of our algorithms. With this, we compare our set of parallel algorithms against a serial implementation in two experiments: one scaling the problem size and the other scaling the number of threads. Our findings show that parallelization results in improved verification performance for certain DANT instances. We also show that our algorithms scale for certain DANT instances with respect to the number of threads.
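The layering idea can be sketched as follows. This is not the authors' algorithm: it assumes the common longest-path layering (a node's layer is one more than the deepest layer among its dependencies), and the names `layer_nodes`, `verify_in_layers`, and `check_step` are hypothetical. The graph is assumed to be acyclic and small enough for simple recursion.

```python
from concurrent.futures import ThreadPoolExecutor

def layer_nodes(deps):
    """Group nodes so that every dependency of a node sits in an earlier layer.

    deps maps each node to the list of nodes it depends on (assumed acyclic).
    """
    depth = {}

    def depth_of(node):
        if node not in depth:
            # Nodes without dependencies get depth 0.
            depth[node] = 1 + max((depth_of(d) for d in deps[node]), default=-1)
        return depth[node]

    layers = {}
    for node in deps:
        layers.setdefault(depth_of(node), []).append(node)
    return [layers[i] for i in sorted(layers)]

def verify_in_layers(deps, check_step, max_workers=4):
    """Verify layer by layer; steps within one layer are submitted concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for layer in layer_nodes(deps):
            if not all(pool.map(check_step, layer)):
                return False
    return True

# Hypothetical proof graph: two assumptions, two derived steps, one conclusion.
proof = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"], "E": ["C", "D"]}
print(layer_nodes(proof))
print(verify_in_layers(proof, lambda step: True))
```

A later layer is only started once every step in the current layer has passed, which mirrors the condition that prior layers must be verified first. In CPython, threads only give a real speedup if `check_step` releases the global interpreter lock, so this illustrates the scheduling rather than the performance.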

Multimodal · Comprehensibility · Prompt · INFORMS · MoDELS

November 17, 2023

Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding

Hao Sun,Ziwei Niu,Xinyao Yu,Jiaqing Liu,Yen-Wei Chen,Lanfen Lin

Understanding human perceptions presents a formidable multimodal challenge for computers, encompassing aspects such as sentiment tendencies and sense of humor. While various methods have recently been introduced to extract modality-invariant and specific information from diverse modalities, with the goal of enhancing the efficacy of multimodal learning, few works emphasize this aspect in large language models. In this paper, we introduce a novel multimodal prompt strategy tailored for tuning large language models. Our method assesses the correlation among different modalities and isolates the modality-invariant and specific components, which are then utilized for prompt tuning. This approach enables large language models to efficiently and effectively assimilate information from various modalities. Furthermore, our strategy is designed with scalability in mind, allowing the integration of features from any modality into pretrained large language models. Experimental results on public datasets demonstrate that our proposed method significantly improves performance compared to previous methods.

Transformation · Vision · Recognizable · Taxonomy · Prompt

January 24, 2022

Transformers in Medical Imaging: A Survey

Fahad Shamshad,Salman Khan,Syed Waqas Zamir,Muhammad Haris Khan,Munawar Hayat,Fahad Shahbaz Khan,Huazhu Fu

from arxiv, 41 pages, //github.com/fahadshamshad/awesome-transformers-in-medical-imaging

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers, which can capture global context, in contrast to CNNs with local receptive fields. Inspired by this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop a taxonomy, identify application-specific challenges, provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges and open problems, and an outline of promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at //github.com/fahadshamshad/awesome-transformers-in-medical-imaging.

Performer · Learning · Boosting (a model-training acceleration method) · MoDELS · Recognizable

December 22, 2021

Hybrid Curriculum Learning for Emotion Recognition in Conversation

Lin Yang,Yi Shen,Yue Mao,Longjun Cai

from arxiv, Accepted by AAAI-2022

Emotion recognition in conversation (ERC) aims to detect the emotion label for each utterance. Motivated by recent studies which have proven that feeding training examples in a meaningful order rather than considering them randomly can boost the performance of models, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) conversation-level curriculum (CC); and (2) utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on "emotion shift" frequency within a conversation, then the conversations are scheduled in an "easy to hard" schema according to the difficulty score returned by the difficulty measurer. For UC, it is implemented from an emotion-similarity perspective, which progressively strengthens the model's ability in identifying the confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models and we are able to achieve new state-of-the-art results on four public ERC datasets.
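As a rough illustration of the conversation-level curriculum, the sketch below scores each conversation by how often the emotion label changes between consecutive utterances and orders conversations from easy to hard. The exact difficulty formula is not given in the abstract; the normalization by the number of utterance transitions, the function name `emotion_shift_rate`, and the sample data are assumptions.

```python
def emotion_shift_rate(labels):
    """Fraction of consecutive utterance pairs whose emotion labels differ."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)

# Hypothetical conversations, each a list of per-utterance emotion labels.
conversations = {
    "c1": ["joy", "joy", "joy", "joy"],
    "c2": ["joy", "anger", "joy", "sad"],
    "c3": ["neutral", "neutral", "sad", "sad"],
}

# "Easy to hard": fewer emotion shifts first.
ordered = sorted(conversations, key=lambda cid: emotion_shift_rate(conversations[cid]))
print(ordered)
```

A training loop could then feed conversations in this order, or widen the pool of eligible conversations as training progresses.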

Neural Networks · Networking · Reducible · Continuity · Inference

June 21, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami,Sehoon Kim,Zhen Dong,Zhewei Yao,Michael W. Mahoney,Kurt Keutzer

from arxiv, Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
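The mapping from continuous values to a small discrete set can be sketched with uniform (min-max) quantization, one of the simplest schemes in this family. The function names and sample weights are hypothetical, and the scheme is chosen for illustration rather than taken from the survey. Assuming a 32-bit floating-point baseline, storing values in 4 bits corresponds to an 8x reduction and 2 bits to 16x.

```python
def quantize(values, num_bits=4):
    """Uniform asymmetric quantization of floats to unsigned num_bits codes."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1          # highest integer code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate real values."""
    return [lo + c * scale for c in codes]

# Hypothetical weights used only for illustration.
weights = [-1.0, -0.4, 0.0, 0.35, 0.5]
codes, scale, lo = quantize(weights, num_bits=4)
restored = dequantize(codes, scale, lo)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
compression = 32 / 4  # assumed 32-bit float baseline versus 4-bit codes

print(codes)
print(max_err, compression)
```

Rounding to the nearest level keeps the reconstruction error of each value within half a quantization step, which is the accuracy side of the trade-off against the number of bits.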

Named entity recognition · entity · Learning · Deep learning · Recognizable

December 22, 2018

A Survey on Deep Learning for Named Entity Recognition

Jing Li,Aixin Sun,Jianglei Han,Chenliang Li

from arxiv, 20 pages, 15 figures

Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recently applied deep learning techniques in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.