在线亚洲91SE亚洲综合在线,国模私拍视频一区二区

The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult to segment out important structures. In this project, we developed ClickSAM, which fine-tunes the Segment Anything Model using click prompts for ultrasound images. ClickSAM has two stages of training: the first stage is trained on single-click prompts centered in the ground-truth contours, and the second stage focuses on improving the model performance through additional positive and negative click prompts. By comparing the first stage predictions to the ground-truth masks, true positive, false positive, and false negative segments are calculated. Positive clicks are generated using the true positive and false negative segments, and negative clicks are generated using the false positive segments. The Centroidal Voronoi Tessellation algorithm is then employed to collect positive and negative click prompts in each segment that are used to enhance the model performance during the second stage of training. With click-train methods, ClickSAM exhibits superior performance compared to other existing models for ultrasound image segmentation.

相關內容

Segment Anything

關注 0

3D · 三維重建 · Automator · 數據預處理 · 線性的 ·

2024 年 3 月 27 日

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Qiuhong Shen,Xuanyu Yi,Zike Wu,Pan Zhou,Hanwang Zhang,Shuicheng Yan,Xinchao Wang

We tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 second on a single NVIDIA A100 GPU.

Facebook AI Research · 模型評估 · MoDELS · Analysis · 預測準確率 ·

2024 年 3 月 27 日

Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities

Dana Moukheiber,Saurabh Mahindre,Lama Moukheiber,Mira Moukheiber,Mingchen Gao

from arxiv, ICCV CVAMD 2023

There has been significant progress in implementing deep learning models in disease diagnosis using chest X- rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest X- ray multi-label classification. Transcending traditional protected attributes, we consider complex interactions within social determinants, enabling a more granular benchmark and evaluation of fairness. We present a simple and robust method that involves retraining the last classification layer of pre-trained models using a balanced dataset across groups. Additionally, we account for fairness constraints and integrate class-balanced fine-tuning for multi-label settings. The evaluation of our method on the MIMIC-CXR dataset demonstrates that our framework achieves an optimal tradeoff between accuracy and fairness compared to baseline methods.

MoDELS · 大語言模型 · 語言模型化 · 多樣性 · INTERACT ·

2024 年 3 月 25 日

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He,Yifan Wang,Lijun Wang,Huchuan Lu,Jun-Yan He,Jin-Peng Lan,Bin Luo,Xuansong Xie

from arxiv, CVPR 2024

Multimodal Large Language Model (MLLMs) leverages Large Language Models as a cognitive framework for diverse visual-language tasks. Recent efforts have been made to equip MLLMs with visual perceiving and grounding capabilities. However, there still remains a gap in providing fine-grained pixel-level perceptions and extending interactions beyond text-specific inputs. In this work, we propose {\bf{AnyRef}}, a general MLLM model that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references, such as texts, boxes, images, or audio. This innovation empowers users with greater flexibility to engage with the model beyond textual and regional prompts, without modality-specific designs. Through our proposed refocusing mechanism, the generated grounding output is guided to better focus on the referenced object, implicitly incorporating additional pixel-level supervision. This simple modification utilizes attention scores generated during the inference of LLM, eliminating the need for extra computations while exhibiting performance enhancements in both grounding masks and referring expressions. With only publicly available training data, our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.

BART · MoDELS · 語言模型化 · Performer · Continuity ·

2024 年 3 月 25 日

Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT

Rohit Raju,Peeta Basa Pati,SA Gandheesh,Gayatri Sanjana Sannala,Suriya KS

Text continues to remain a relevant form of representation for information. Text documents are created either in digital native platforms or through the conversion of other media files such as images and speech. While the digital native text is invariably obtained through physical or virtual keyboards, technologies such as OCR and speech recognition are utilized to transform the images and speech signals into text content. All these variety of mechanisms of text generation also introduce errors into the captured text. This project aims at analyzing different kinds of error that occurs in text documents. The work employs two of the advanced deep neural network-based language models, namely, BART and MarianMT, to rectify the anomalies present in the text. Transfer learning of these models with available dataset is performed to finetune their capacity for error correction. A comparative study is conducted to investigate the effectiveness of these models in handling each of the defined error categories. It is observed that while both models can bring down the erroneous sentences by 20+%, BART can handle spelling errors far better (24.6%) than grammatical errors (8.8%).

state-of-the-art · 估計/估計量 · 自編碼器 · 潛在 · 圖像還原 ·

2024 年 3 月 25 日

Variational Bayes image restoration with compressive autoencoders

Maud Biquard,Marie Chabert,Thomas Oberlin

Regularization of inverse problems is of paramount importance in computational imaging. The ability of neural networks to learn efficient image representations has been recently exploited to design powerful data-driven regularizers. While state-of-the-art plug-and-play methods rely on an implicit regularization provided by neural denoisers, alternative Bayesian approaches consider Maximum A Posteriori (MAP) estimation in the latent space of a generative model, thus with an explicit regularization. However, state-of-the-art deep generative models require a huge amount of training data compared to denoisers. Besides, their complexity hampers the optimization involved in latent MAP derivation. In this work, we first propose to use compressive autoencoders instead. These networks, which can be seen as variational autoencoders with a flexible latent prior, are smaller and easier to train than state-of-the-art generative models. As a second contribution, we introduce the Variational Bayes Latent Estimation (VBLE) algorithm, which performs latent estimation within the framework of variational inference. Thanks to a simple yet efficient parameterization of the variational posterior, VBLE allows for fast and easy (approximate) posterior sampling. Experimental results on image datasets BSD and FFHQ demonstrate that VBLE reaches similar performance than state-of-the-art plug-and-play methods, while being able to quantify uncertainties faster than other existing posterior sampling techniques.

3D · 原始采樣 · 樣本 · 多樣性 · Performer ·

2024 年 3 月 24 日

MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion

Roy Kapon,Guy Tevet,Daniel Cohen-Or,Amit H. Bermano

We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation, using 2D diffusion models that were trained on motions obtained from in-the-wild videos. As such, MAS opens opportunities to exciting and diverse fields of motion previously under-explored as 3D data is scarce and hard to collect. MAS works by simultaneously denoising multiple 2D motion sequences representing different views of the same 3D motion. It ensures consistency across all views at each diffusion step by combining the individual generations into a unified 3D sequence, and projecting it back to the original views. We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers, rhythmic gymnastic performances featuring a ball apparatus, and horse races. In each of these domains, 3D motion capture is arduous, and yet, MAS generates diverse and realistic 3D sequences. Unlike the Score Distillation approach, which optimizes each sample by repeatedly applying small fixes, our method uses a sampling process that was constructed for the diffusion framework. As we demonstrate, MAS avoids common issues such as out-of-domain sampling and mode-collapse. //guytevet.github.io/mas-page/

分解的 · Extensibility · 離散數學 ·

2024 年 3 月 22 日

Exploring the Crochemore and Ziv-Lempel factorizations of some automatic sequences with the software Walnut

Marieh Jahannia,Manon Stipulanti

from arxiv, 15 pages, 2 figures

We explore the Ziv-Lempel and Crochemore factorizations of some classical automatic sequences making an extensive use of the theorem prover Walnut.

Performer · 設計 · Better · 泛函 · Networking ·

2024 年 3 月 22 日

Optimisation of photodetectors design: comparison between Montecarlo and Genetic Algorithms

Patricia M. E. Vázquez,Ligia Ciocci Brazzano,Francisco E. Veiras,Patricio A. Sorichetti

from arxiv, 10 pages, 14 figures

We present Montecarlo and Genetic Algorithm optimisations applied to the design of photodetectors based on a transimpedance amplifier and a photodiode. The circuit performance is evaluated with a merit function and the systematic search method is used as a reference. The design parameters are the feedback network components and the photodiode bias voltage. To evaluate the optimisations, we define the relative difference between its merit and the optimum merit obtained by the systematic search. In both algorithms, the relative difference decreases with the number of evaluations, following a power law. The power-law exponent for the Genetic Algorithm is larger than that of Montecarlo (0.74 vs. 0.50). We conclude that both algorithms are advantageous compared to the systematic search method, and that the Genetic Algorithm shows a better performance than Montecarlo.

圖片分類 · 前饋網絡 · INTERACT · Networking · 前饋 ·

2021 年 5 月 7 日

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.

Performer · Color · Networking · CRAFT · 均方誤差 ·

2018 年 1 月 25 日

C2MSNet: A Novel approach for single image haze removal

Akshay Dudhane,Subrahmanyam Murala

from arxiv, Accepted in Winter Conference on Applications of Computer Vision (WACV-2018)

Degradation of image quality due to the presence of haze is a very common phenomenon. Existing DehazeNet [3], MSCNN [11] tackled the drawbacks of hand crafted haze relevant features. However, these methods have the problem of color distortion in gloomy (poor illumination) environment. In this paper, a cardinal (red, green and blue) color fusion network for single image haze removal is proposed. In first stage, network fusses color information present in hazy images and generates multi-channel depth maps. The second stage estimates the scene transmission map from generated dark channels using multi channel multi scale convolutional neural network (McMs-CNN) to recover the original scene. To train the proposed network, we have used two standard datasets namely: ImageNet [5] and D-HAZY [1]. Performance evaluation of the proposed approach has been carried out using structural similarity index (SSIM), mean square error (MSE) and peak signal to noise ratio (PSNR). Performance analysis shows that the proposed approach outperforms the existing state-of-the-art methods for single image dehazing.