In this work we perform a scoping review of the current literature on the detection of throat cancer from speech recordings using machine learning and artificial intelligence. We find 22 papers within this area and discuss their methods and results. We split these papers into two groups: nine performing binary classification, and 13 performing multi-class classification. The papers present a range of methods, with neural networks being the most commonly implemented. Many features are also extracted from the audio before classification, the most common being mel-frequency cepstral coefficients. None of the papers found in this search have associated code repositories and as such are not reproducible. Therefore, we create a publicly available code repository of our own classifiers. We use transfer learning on a multi-class problem, classifying three pathologies and healthy controls. Using this technique we achieve an unweighted average recall of 53.54%, sensitivity of 83.14%, and specificity of 64.00%. We compare our classifiers with the results obtained on the same dataset and find similar results.
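The three reported metrics can all be read off a confusion matrix. The sketch below is illustrative only: the toy matrix is made up, the function names are hypothetical, and the way sensitivity and specificity are obtained (collapsing the three pathologies into a single "pathological" class against healthy controls) is an assumption, since the abstract does not state how the authors computed them.

```python
from typing import List, Tuple


def per_class_recall(confusion: List[List[int]]) -> List[float]:
    """Recall per class; confusion[i][j] counts true class i predicted as class j."""
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def unweighted_average_recall(confusion: List[List[int]]) -> float:
    """Plain mean of per-class recalls, so each class counts equally."""
    recalls = per_class_recall(confusion)
    return sum(recalls) / len(recalls)


def sensitivity_specificity(
    confusion: List[List[int]], healthy_index: int = 0
) -> Tuple[float, float]:
    """Collapse to pathological vs. healthy (an assumption, see lead-in).

    Sensitivity: share of pathological samples predicted as any pathology.
    Specificity: share of healthy samples predicted as healthy.
    """
    tp = fn = tn = fp = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i == healthy_index:
                if j == healthy_index:
                    tn += count
                else:
                    fp += count
            elif j == healthy_index:
                fn += count
            else:
                tp += count
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up counts: row 0 is healthy, rows 1-3 are three pathologies.
toy_confusion = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [1, 2, 5, 2],
    [0, 3, 3, 4],
]

uar = unweighted_average_recall(toy_confusion)
sens, spec = sensitivity_specificity(toy_confusion)
print(f"UAR={uar:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because the unweighted average recall averages over classes rather than samples, a classifier that ignores a small class is penalised as heavily as one that ignores a large class, which is why it is often preferred on imbalanced pathology datasets.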

Related content

September 8, 2023

DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

Junzhe Zhang,Yushi Lan,Shuai Yang,Fangzhou Hong,Quan Wang,Chai Kiat Yeo,Ziwei Liu,Chen Change Loy

from arxiv, ICCV 2023. Code: //github.com/junzhezhang/DeformToon3D Project page: //www.mmlab-ntu.com/project/deformtoon3d/

In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture. Although fine-tuning a pre-trained 3D GAN on the artistic domain can produce reasonable performance, this strategy has limitations in the 3D domain. In particular, fine-tuning can deteriorate the original GAN latent space, which affects subsequent semantic editing, and requires independent optimization and storage for each new style, limiting flexibility and efficient deployment. To overcome these challenges, we propose DeformToon3D, an effective toonification framework tailored for hierarchical 3D GAN. Our approach decomposes 3D toonification into subproblems of geometry and texture stylization to better preserve the original latent space. Specifically, we devise a novel StyleField that predicts conditional 3D deformation to align a real-space NeRF to the style space for geometry stylization. Thanks to the StyleField formulation, which already handles geometry stylization well, texture stylization can be achieved conveniently via adaptive style mixing that injects information of the artistic domain into the decoder of the pre-trained 3D GAN. Due to the unique design, our method enables flexible style degree control and shape-texture-specific style swap. Furthermore, we achieve efficient training without any real-world 2D-3D training pairs but proxy samples synthesized from off-the-shelf 2D toonification models.

September 8, 2023

Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments

Jiye Lee,Hanbyul Joo

from arxiv, Accepted to ICCV 2023

Synthesizing interaction-involved human motions has been challenging due to the high complexity of 3D environments and the diversity of possible human behaviors within them. We present LAMA, Locomotion-Action-MAnipulation, to synthesize natural and plausible long-term human movements in complex indoor environments. The key motivation of LAMA is to build a unified framework to encompass a series of everyday motions including locomotion, scene interaction, and object manipulation. Unlike existing methods that require motion data "paired" with scanned 3D scenes for supervision, we formulate the problem as a test-time optimization by using human motion capture data only for synthesis. LAMA leverages a reinforcement learning framework coupled with a motion matching algorithm for optimization, and further exploits a motion editing framework via manifold learning to cover possible variations in interaction and manipulation. Through extensive experiments, we demonstrate that LAMA outperforms previous approaches in synthesizing realistic motions in various challenging scenarios. Project page: //jiyewise.github.io/projects/LAMA/ .

September 8, 2023

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

Hongpeng Lin,Ludan Ruan,Wenke Xia,Peiyu Liu,Jingyuan Wen,Yixin Xu,Di Hu,Ruihua Song,Wayne Xin Zhao,Qin Jin,Zhiwu Lu

from arxiv, Accepted to ACM Multimedia 2023

To facilitate research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also increase the difficulty in capturing human interests from intricate multi-modal information to generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of TikTalk, propose a video-based multi-modal chitchat task, and evaluate several dialogue baselines. Experimental results indicate that the models incorporating large language models (LLMs) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall. Furthermore, no existing model can solve all the above challenges well. There is still considerable room for future improvement, even for LLMs with visual extensions. Our dataset is available at //ruc-aimind.github.io/projects/TikTalk/.

September 6, 2023

Robotic Table Tennis: A Case Study into a High Speed Learning System

David B. D'Ambrosio,Jonathan Abelian,Saminda Abeyruwan,Michael Ahn,Alex Bewley,Justin Boyd,Krzysztof Choromanski,Omar Cortes,Erwin Coumans,Tianli Ding,Wenbo Gao,Laura Graesser,Atil Iscen,Navdeep Jaitly,Deepali Jain,Juhana Kangaspunta,Satoshi Kataoka,Gus Kouretas,Yuheng Kuang,Nevena Lazic,Corey Lynch,Reza Mahjourian,Sherry Q. Moore,Thinh Nguyen,Ken Oslund,Barney J Reed,Krista Reymann,Pannag R. Sanketi,Anish Shankar,Pierre Sermanet,Vikas Sindhwani,Avi Singh,Vincent Vanhoucke,Grace Vesom,Peng Xu

from arxiv, Published and presented at Robotics: Science and Systems (RSS2023)

We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at //youtu.be/uFcnWjB42I0.

September 6, 2023

Delving into Ipsilateral Mammogram Assessment under Multi-View Network

Thai Ngoc Toan Truong,Thanh-Huy Nguyen,Ba Thinh Lam,Vu Minh Duy Nguyen,Hong Phuc Nguyen

In recent years, multi-view mammogram analysis has been widely studied for AI-based cancer assessment. In this work, we aim to explore diverse fusion strategies (average and concatenate) and examine the model's learning behavior with varying individuals and fusion pathways, involving Coarse Layer and Fine Layer. The Ipsilateral Multi-View Network, comprising five fusion types (Pre, Early, Middle, Last, and Post Fusion) in ResNet-18, is employed. Notably, the Middle Fusion emerges as the most balanced and effective approach, enhancing deep-learning models' generalization performance by +2.06% (concatenate) and +5.29% (average) on the VinDr-Mammo dataset and +2.03% (concatenate) and +3% (average) on the CMMD dataset in macro F1-Score. The paper emphasizes the crucial role of layer assignment in multi-view network extraction with various strategies.
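The two fusion strategies named in the abstract differ only in how the feature vectors of the two views are combined. The following is a minimal sketch on plain Python lists, not the authors' implementation: `view_a` and `view_b` are hypothetical stand-ins for the features that each view's branch produces at the chosen fusion point.

```python
from typing import List

Vector = List[float]


def average_fusion(view_a: Vector, view_b: Vector) -> Vector:
    """Element-wise mean of the two views; output keeps the input dimension."""
    if len(view_a) != len(view_b):
        raise ValueError("average fusion needs feature vectors of equal length")
    return [(a + b) / 2.0 for a, b in zip(view_a, view_b)]


def concatenate_fusion(view_a: Vector, view_b: Vector) -> Vector:
    """Stack the two views end to end; output dimension is the sum of both."""
    return list(view_a) + list(view_b)


# Hypothetical two-dimensional features from each view.
features_a = [1.0, 3.0]
features_b = [3.0, 5.0]

print(average_fusion(features_a, features_b))      # [2.0, 4.0]
print(concatenate_fusion(features_a, features_b))  # [1.0, 3.0, 3.0, 5.0]
```

Averaging leaves the width of the following layer unchanged, whereas concatenation doubles it, so whatever layer follows the fusion point has to be sized accordingly.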

September 6, 2023

Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach

Jikai Zhang,Carlos Santos,Christine Park,Maciej Mazurowski,Roy Colglazier

from arxiv, This is the preprint version

Large numbers of radiographic images are available in knee radiology practices which could be used for training of deep learning models for diagnosis of knee abnormalities. However, those images do not typically contain readily available labels due to limitations of human annotations. The purpose of our study was to develop an automated labeling approach that improves the image classification model to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving the image classification performance for knee radiographic diagnosis. We developed our approach using 7,382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, had a higher weighted average AUC (WAUC: 0.903) and higher AUC-ROC values for all classes (normal AUC-ROC: 0.894; abnormal AUC-ROC: 0.896; arthroplasty AUC-ROC: 0.990) compared to the baseline model (WAUC: 0.857; normal AUC-ROC: 0.842; abnormal AUC-ROC: 0.848; arthroplasty AUC-ROC: 0.987), trained using only manually labeled data. DeLong tests show that the improvement is significant on normal (p-value<0.002) and abnormal (p-value<0.001) images. Our findings demonstrated that the proposed automated labeling approach significantly improves the performance of image classification for radiographic knee diagnosis, which could facilitate patient care and the curation of large knee datasets.
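The pseudo-labeling step described above can be sketched generically: a labeler trained on the small labeled set scores each unlabeled sample, and its predicted class becomes that sample's label. Everything here is hypothetical, including the function name and the toy scores, and the confidence threshold in particular is an assumption, a common choice in pseudo-labeling that the abstract does not say the authors used.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def pseudo_label(
    unlabeled: Sequence[Hashable],
    predict_proba: Callable[[Hashable], Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[Hashable, int]]:
    """Return (sample, predicted_class) pairs for confident predictions only.

    predict_proba stands in for the trained labeler and must return one
    probability per class. Samples whose top probability falls below
    `threshold` are left unlabeled (threshold is an assumed design choice).
    """
    selected = []
    for sample in unlabeled:
        probs = predict_proba(sample)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((sample, best))
    return selected


# Toy labeler output; class indices 0/1/2 stand for normal/abnormal/arthroplasty.
toy_probs = {
    "img_a": [0.95, 0.03, 0.02],
    "img_b": [0.50, 0.40, 0.10],
    "img_c": [0.05, 0.05, 0.90],
}

pseudo_labeled = pseudo_label(list(toy_probs), toy_probs.__getitem__)
print(pseudo_labeled)  # [('img_a', 0), ('img_c', 2)]
```

The selected pairs would then be merged with the manually labeled data to train the final classifier; setting `threshold=0.0` keeps every sample and recovers plain pseudo-labeling without filtering.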

September 5, 2023

Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks

Blake VanBerlo,Brian Li,Jesse Hoey,Alexander Wong

from arxiv, 10 pages, 5 figures, submitted to IEEE Access

In this study, we investigated whether self-supervised pretraining could produce a neural network feature extractor applicable to multiple classification tasks in B-mode lung ultrasound analysis. When fine-tuning on three lung ultrasound tasks, pretrained models resulted in an improvement of the average across-task area under the receiver operating curve (AUC) by 0.032 and 0.061 on local and external test sets respectively. Compact nonlinear classifiers trained on features outputted by a single pretrained model did not improve performance across all tasks; however, they did reduce inference time by 49% compared to serial execution of separate fine-tuned models. When training using 1% of the available labels, pretrained models consistently outperformed fully supervised models, with a maximum observed test AUC increase of 0.396 for the task of view classification. Overall, the results indicate that self-supervised pretraining is useful for producing initial weights for lung ultrasound classifiers.

December 20, 2022

Towards Reasoning in Large Language Models: A Survey

Jie Huang,Kevin Chen-Chuan Chang

Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and it has been observed that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.

April 16, 2022

Visual Attention Methods in Deep Learning: An In-Depth Survey

Mohammed Hassanin,Saeed Anwar,Ibrahim Radwan,Fahad S Khan,Ajmal Mian

Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.

January 17, 2019

Taking Human out of Learning Applications: A Survey on Automated Machine Learning

Quanming Yao,Mengshuo Wang,Yuqiang Chen,Wenyuan Dai,Hu Yi-Qi,Li Yu-Feng,Tu Wei-Wei,Yang Qiang,Yu Yang

from arxiv, This is a preliminary version and will be kept updated

Machine learning techniques have become deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily involved in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic with both industrial and academic interest. In this paper, we provide an up-to-date survey on AutoML. First, we introduce and define the AutoML problem, with inspiration from both realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers most existing approaches to date but also can guide the design of new methods. Subsequently, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underlying their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.