Analyzing and reconstructing visual stimuli from brain signals effectively advances understanding of the human visual system. However, EEG signals are complex and contain a large amount of noise. This leads to substantial limitations in existing work on visual stimulus reconstruction from EEG, such as difficulty in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large self-collected datasets for training. To address these challenges, we propose a novel approach called BrainVis. First, we divide the EEG signals into various units and apply a self-supervised approach to them to obtain EEG time-domain features, in an attempt to ease the training difficulty. We also propose to utilize frequency-domain features to enhance the EEG representations. Then, we simultaneously align EEG time-frequency embeddings with the interpolation of the coarse and fine-grained semantics in the CLIP space, to highlight the primary visual components and reduce the cross-modal alignment difficulty. Finally, we adopt cascaded diffusion models to reconstruct images. Our proposed BrainVis outperforms state-of-the-art methods in both semantic fidelity and generation quality. Notably, we reduce the training data scale to 10% of that used in previous work.
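The alignment step can be illustrated with a small sketch. The abstract does not specify how the coarse and fine-grained semantics are interpolated or which loss is used, so the linear blend with weight `alpha`, the cosine-based loss, and all function names below are assumptions chosen for illustration, and the three-dimensional vectors are toy stand-ins for CLIP embeddings.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def interpolate_targets(coarse, fine, alpha):
    """Blend a coarse and a fine-grained embedding (assumed linear interpolation)."""
    mixed = [alpha * c + (1.0 - alpha) * f for c, f in zip(coarse, fine)]
    return l2_normalize(mixed)

def cosine_alignment_loss(eeg_embedding, target):
    """1 - cosine similarity: near 0 when the vectors point the same way."""
    a = l2_normalize(eeg_embedding)
    b = l2_normalize(target)
    return 1.0 - sum(x * y for x, y in zip(a, b))

# Toy embeddings (hypothetical values, not taken from the paper).
coarse = [1.0, 0.0, 0.0]   # e.g. a class-label embedding
fine = [0.0, 1.0, 0.0]     # e.g. a caption embedding
target = interpolate_targets(coarse, fine, alpha=0.5)

loss_aligned = cosine_alignment_loss(target, target)
loss_orthogonal = cosine_alignment_loss([0.0, 0.0, 1.0], target)
```

An EEG embedding that already matches the blended target gives a loss close to zero, while one orthogonal to it gives a loss of one; a training loop would minimize this quantity over the EEG encoder's parameters.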

Related content

Latent · Performer · Comprehensibility · Vocabulary · Analysis

February 12, 2024

On the Semantics of LM Latent Space: A Vocabulary-defined Approach

Jian Gu,Chunyang Chen,Aldeida Aleti

from arxiv, under peer-review

Understanding the latent space of language models (LM) is crucial to refining their performance and interpretability. Existing analyses often fall short in providing disentangled (model-centric) insights into LM semantics, and neglect essential aspects of LM adaptation. In response, we introduce a pioneering method called vocabulary-defined semantics, which establishes a reference frame within the LM latent space, ensuring disentangled semantic analysis grounded in LM vocabulary. Our approach transcends prior entangled analysis, leveraging LM vocabulary for model-centric insights. Furthermore, we propose a novel technique to compute logits, emphasising differentiability and local isotropy, and introduce a neural clustering module for semantically calibrating data representations during LM adaptation. Through extensive experiments across diverse text understanding datasets, our approach outperforms state-of-the-art methods of retrieval-augmented generation and parameter-efficient finetuning, showcasing its efficacy and broad applicability. Our findings not only shed light on LM mechanics, but also offer practical solutions to enhance LM performance and interpretability.

Automator · Identifiable · Example · Transformation · Code

February 11, 2024

Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example

Malinda Dilhara,Abhiram Bellur,Timofey Bryksin,Danny Dig

from arxiv, This paper is accepted to Proceedings of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE - 2024), This is an author copy

Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the input examples' quality and quantity, missing variations with different syntax or flow yet semantically similar. Large Language Models (LLMs), trained on vast code datasets, can overcome these limitations by generating semantically equivalent, unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability. Implementing these in PyCraft, combining static and dynamic analysis with LLMs, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects like microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.

MoDELS · SimPLe · Networking · Channel · Model complexity

February 11, 2024

U-SEANNet: A Simple, Efficient and Applied U-Shaped Network for Diagnosis of Nasal Diseases on Nasal Endoscopic Images

Yubiao Yue,Jun Xue,Chao Wang,Haihua Liang,Zhenzhang Li

from arxiv, There are some descriptive errors in the manuscript

Numerous studies have affirmed that deep learning models can facilitate early diagnosis of lesions in endoscopic images. However, the lack of available datasets stymies advancements in research on nasal endoscopy, and existing models fail to strike a good trade-off between diagnostic performance, model complexity, and parameter size, rendering them unsuitable for real-world application. To bridge these gaps, we created the first large-scale nasal endoscopy dataset, named 7-NasalEID, comprising 11,352 images that contain six common nasal diseases and normal samples. Subsequently, we proposed U-SEANNet, an innovative U-shaped architecture underpinned by depth-wise separable convolution. Moreover, to enhance its capacity for detecting nuanced discrepancies in input images, U-SEANNet employs the Global-Local Channel Feature Fusion module, enabling it to utilize salient channel features from both global and local contexts. To demonstrate U-SEANNet's potential, we benchmarked it against seventeen modern architectures through five-fold cross-validation. The experimental results show that U-SEANNet achieves a commendable accuracy of 93.58%. Notably, U-SEANNet's parameter size and GFLOPs are only 0.78M and 0.21, respectively. Our findings suggest U-SEANNet is the state-of-the-art model for nasal disease diagnosis in endoscopic images.
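To see why depth-wise separable convolution helps keep the parameter count small, the following sketch compares the weight counts of a standard convolution and a depth-wise separable one. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical and not taken from U-SEANNet, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv (no bias)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)        # 9 * 64 * 128 = 73728
sep = depthwise_separable_params(k, c_in, c_out)  # 576 + 8192 = 8768
ratio = std / sep
```

For this layer the separable form needs roughly one eighth of the weights, which is the kind of saving that makes sub-million-parameter models plausible.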

INTERACT · NCRF · Learning · Extensibility · 3D

February 9, 2024

NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction

Zhongqun Zhang,Jifei Song,Eduardo Pérez-Pellitero,Yiren Zhou,Hyung Jin Chang,Aleš Leonardis

from arxiv, Accepted by 3DV 2024

Modeling hand-object interactions is a fundamentally challenging task in 3D computer vision. Despite remarkable progress that has been achieved in this field, existing methods still fail to synthesize the hand-object interaction photo-realistically, suffering from degraded rendering quality caused by the heavy mutual occlusions between the hand and the object, and inaccurate hand-object pose estimation. To tackle these challenges, we present a novel free-viewpoint rendering framework, Neural Contact Radiance Field (NCRF), to reconstruct hand-object interactions from a sparse set of videos. In particular, the proposed NCRF framework consists of two key components: (a) A contact optimization field that predicts an accurate contact field from 3D query points for achieving desirable contact between the hand and the object. (b) A hand-object neural radiance field to learn an implicit hand-object representation in a static canonical space, in concert with the specifically designed hand-object motion field to produce observation-to-canonical correspondences. We jointly learn these key components where they mutually help and regularize each other with visual and geometric constraints, producing a high-quality hand-object reconstruction that achieves photo-realistic novel view synthesis. Extensive experiments on HO3D and DexYCB datasets show that our approach outperforms the current state-of-the-art in terms of both rendering quality and pose estimation accuracy.

Harmony · Design · Normalized · Representation · Diversity

February 8, 2024

Visual Harmony: Text-Visual Interplay in Circular Infographics

Shuqi He,Yuqing Chen,Yuxin Xia,Yichun Li,Hai-Ning Liang,Lingyun Yu

Infographics are visual representations designed for efficient and effective communication of data and knowledge. One crucial aspect of infographic design is the interplay between text and visual elements, particularly in circular visualizations where the textual descriptions can either be embedded within the graphics or placed adjacent to the visual representation. While several studies have examined text layout design in visualizations in general, the text-visual interplay in infographics and its subsequent perceptual effects remain underexplored. To address this, our study investigates how varying text placement and descriptiveness impact pleasantness, comprehension and overall memorability in the infographics viewing experience. We recruited 30 participants and presented them with a collection of 15 infographics across a diverse set of topics, including media and public events, health and nutrition, science and research, and sustainability. The text placement (embed, side-to-side) and descriptiveness (simplistic, normal, descriptive) were systematically manipulated, resulting in a total of six experimental conditions. Our key findings indicate that text placement can significantly influence the memorability of infographics, whereas descriptiveness can significantly impact the pleasantness of the viewing experience. Embedding text placement and simplistic text can potentially contribute to more effective infographic designs. These results offer valuable insights for infographic designers, contributing to the creation of more effective and memorable visual representations.
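The six experimental conditions follow from fully crossing the two manipulated factors (2 placements x 3 descriptiveness levels). A minimal sketch of that enumeration, using the level names from the abstract and hypothetical variable names:

```python
from itertools import product

# Factor levels as named in the abstract.
placements = ["embed", "side-to-side"]
descriptiveness = ["simplistic", "normal", "descriptive"]

# Full factorial crossing: every placement paired with every descriptiveness level.
conditions = list(product(placements, descriptiveness))

for placement, level in conditions:
    print(f"{placement} / {level}")
```

How the 15 infographics were distributed over these conditions is not stated in the abstract, so the sketch stops at listing the conditions.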

INFORMS · MoDELS · Integration · Model evaluation · Boosting (a model training acceleration method)

February 8, 2024

Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce

Chang Liu,Peng Hou,Anxiang Zeng,Han Yu

from arxiv, Accepted by IAAI 2024

Over the past decade, significant advances have been made in the field of image search for e-commerce applications. Traditional image-to-image retrieval models, which focus solely on image details such as texture, tend to overlook useful semantic information contained within the images. As a result, the retrieved products might possess similar image details, but fail to fulfil the user's search goals. Moreover, the use of image-to-image retrieval models for products containing multiple images results in significant online product feature storage overhead and complex mapping implementations. In this paper, we report the design and deployment of the proposed Multi-modal Item Embedding Model (MIEM) to address these limitations. It is capable of utilizing both textual information and multiple images about a product to construct meaningful product features. By leveraging semantic information from images, MIEM effectively supplements the image search process, improving the overall accuracy of retrieval results. MIEM has become an integral part of the Shopee image search platform. Since its deployment in March 2023, it has achieved a remarkable 9.90% increase in terms of clicks per user and a 4.23% boost in terms of orders per user for the image search feature on the Shopee e-commerce platform.

Language modeling · Knowledge · MoDELS · HTTPS · Directed

October 11, 2023

How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances

Zihan Zhang,Meng Fang,Ling Chen,Mohammad-Reza Namazi-Rad,Jun Wang

from arxiv, EMNLP 2023 main conference, paper link at https://github.com/hyintell/awesome-refreshing-llms

Although large language models (LLMs) are impressive in solving various tasks, they can quickly become outdated after deployment. Keeping them up to date is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with ever-changing world knowledge without re-training from scratch. We categorize research works systematically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at https://github.com/hyintell/awesome-refreshing-llms

Hugging Face · MoDELS · AI · ChatGPT · Modality

May 25, 2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Yongliang Shen,Kaitao Song,Xu Tan,Dongsheng Li,Weiming Lu,Yueting Zhuang

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards artificial general intelligence.
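The four stages named in the abstract (task planning, model selection, task execution, response summarization) can be sketched as a small controller loop. This is not HuggingGPT's actual code: the registry, the function names, and the keyword rules that stand in for the LLM are all hypothetical placeholders, and no real Hugging Face or ChatGPT API is called.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: model name -> (function description, callable stub).
MODEL_REGISTRY: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "caption-model": ("image captioning", lambda x: f"caption of {x}"),
    "tts-model": ("text to speech", lambda x: f"audio for '{x}'"),
}

def plan_tasks(request: str) -> List[str]:
    """Stage 1, task planning. Keyword rules stand in for the LLM planner."""
    tasks = []
    if "describe" in request:
        tasks.append("image captioning")
    if "read aloud" in request:
        tasks.append("text to speech")
    return tasks

def select_model(task: str) -> str:
    """Stage 2, model selection by matching the function description."""
    for name, (description, _) in MODEL_REGISTRY.items():
        if description == task:
            return name
    raise LookupError(f"no model registered for task: {task}")

def execute(model_name: str, payload: str) -> str:
    """Stage 3, run the selected model on the current payload."""
    return MODEL_REGISTRY[model_name][1](payload)

def summarize(results: List[str]) -> str:
    """Stage 4, combine execution results into one response."""
    return "; ".join(results)

def run(request: str, payload: str) -> str:
    results = []
    for task in plan_tasks(request):
        model = select_model(task)
        payload = execute(model, payload)  # each output feeds the next subtask
        results.append(f"{model}: {payload}")
    return summarize(results)

answer = run("describe this image and read aloud the result", "cat.png")
```

In this toy run the caption produced by the first stub becomes the input of the second, mirroring how one subtask's output can serve as another's input; in the real system each stage would be driven by LLM prompts rather than fixed rules.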

Performer · Machine Learning · Model performance · MoDELS · Processing (programming language)

August 2, 2021

A Survey of Human-in-the-loop for Machine Learning

Xingjiao Wu,Luwei Xiao,Yixuan Sun,Junhang Zhang,Tianlong Ma,Liang He

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish some tasks in the pipeline that are hard for computers. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system-independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field along with their technical strengths and weaknesses, and we provide a simple classification and discussion for natural language processing, computer vision, and other areas. In addition, we present some open challenges and opportunities. This survey intends to provide a high-level summary of human-in-the-loop and to motivate interested readers to consider approaches for designing effective human-in-the-loop solutions.

Object recognition · MoDELS · Backbone · Extensibility · Learned

March 31, 2020

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Mohan Zhou,Yalong Bai,Wei Zhang,Tiejun Zhao,Tao Mei

from arxiv, 10 pages, 7 figures, accepted by CVPR 2020

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost in extra annotation or inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: https://github.com/JDAI-CV/LIO.