
Shape modeling research in Computer Graphics has been an active area for decades. The ability to create and edit complex 3D shapes has been of key importance in Computer-Aided Design, Animation, Architecture, and Entertainment. With the growing popularity of Virtual and Augmented Reality, new applications and tools have been developed for artistic content creation; real-time interactive shape modeling has become increasingly important for a continuum of virtual and augmented reality environments (eXtended Reality (XR)). Shape modeling in XR opens new possibilities for intuitive design and shape modeling in an accessible way. Artificial Intelligence (AI) approaches generating shape information from text prompts are set to change how artists create and edit 3D models. There has been a substantial body of research on interactive 3D shape modeling. However, there is no recent extensive review of the existing techniques and what AI shape generation means for shape modeling in interactive XR environments. In this state-of-the-art paper, we fill this research gap in the literature by surveying free-form shape modeling work in XR, with a focus on sculpting and 3D sketching, the most intuitive forms of free-form shape modeling. We classify and discuss these works across five dimensions: contribution of the articles, domain setting, interaction tool, auto-completion, and collaborative designing. The paper concludes by discussing the disconnect between interactive 3D sculpting and sketching and how this will likely evolve with the prevalence of AI shape-generation tools in the future.

Related content

Point cloud · 3D · Sensor · Ground truth · state-of-the-art

February 16, 2024

Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds

David Jin, Sushrut Karmalkar, Harry Zhang, Luca Carlone

From arXiv, 8 pages, accepted by ICRA 2024

We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.

Normalized · Discriminator · Unlabeled · Pseudo-label · Better

February 16, 2024

NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning

Zhongying Deng, Rihuan Ke, Carola-Bibiane Schonlieb, Angelica I Aviles-Rivero

From arXiv, accepted to Transactions on Machine Learning Research

Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data. To better exploit the unlabeled data, the latest SSL methods use pseudo-labels predicted from a single discriminative classifier. However, the generated pseudo-labels are inevitably linked to inherent confirmation bias and noise, which greatly affects the model performance. In this work we introduce a new framework for SSL named NorMatch. Firstly, we introduce a new uncertainty estimation scheme based on normalizing flows, as an auxiliary classifier, to enforce highly certain pseudo-labels, yielding a boost of the discriminative classifiers. Secondly, we introduce a threshold-free sample weighting strategy to better exploit both high- and low-confidence pseudo-labels. Furthermore, we utilize normalizing flows to model, in an unsupervised fashion, the distribution of unlabeled data. This modelling assumption can further improve the performance of generative classifiers via unlabeled data, thus implicitly contributing to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.

Continuity · Learning · Understandability · MoDELS · Performer

February 16, 2024

Evaluating and Improving Continual Learning in Spoken Language Understanding

Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

Continual learning has emerged as an increasingly important challenge across various tasks, including Spoken Language Understanding (SLU). In SLU, its objective is to effectively handle the emergence of new concepts and evolving environments. The evaluation of continual learning algorithms typically involves assessing the model's stability, plasticity, and generalizability as fundamental aspects of standards. However, existing continual learning metrics primarily focus on only one or two of the properties. They neglect the overall performance across all tasks, and do not adequately disentangle the plasticity versus stability/generalizability trade-offs within the model. In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning. By employing the proposed metric, we demonstrate how introducing various knowledge distillations can improve different aspects of these three properties of the SLU model. We further show that our proposed metric is more sensitive in capturing the impact of task ordering in continual learning, making it better suited for practical use-case scenarios.

contrastive · state-of-the-art · Boosting (a technique for accelerating model training) · MoDELS · Contrastive learning

February 15, 2024

MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations

Benedikt Alkin, Lukas Miklautz, Sepp Hochreiter, Johannes Brandstetter

We introduce MIM (Masked Image Modeling)-Refiner, a contrastive learning boost for pre-trained MIM models. The motivation behind MIM-Refiner is rooted in the insight that optimal representations within MIM models generally reside in intermediate layers. Accordingly, MIM-Refiner leverages multiple contrastive heads that are connected to diverse intermediate layers. In each head, a modified nearest neighbor objective helps to construct respective semantic clusters. The refinement process is short but effective. Within a few epochs, we refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features. Refining a ViT-H, pre-trained with data2vec 2.0 on ImageNet-1K, achieves new state-of-the-art results in linear probing (84.7%) and low-shot classification among models that are pre-trained on ImageNet-1K. In ImageNet-1K 1-shot classification, MIM-Refiner sets a new state-of-the-art of 64.2%, outperforming larger models that were trained on up to 2000x more data such as DINOv2-g, OpenCLIP-G and MAWS-6.5B. Project page: https://ml-jku.github.io/MIM-Refiner

Analysis · Deep learning framework · Learning · Deep learning · Reducible

February 15, 2024

DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis

Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold

Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be time-consuming. This is why we propose an open-source deep learning framework, DIVA-DAF, which is based on PyTorch Lightning and specifically designed for historical document analysis. Pre-implemented tasks such as segmentation and classification can be easily used or customized. It is also easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets, and different forms of ground truth. The applications conducted have demonstrated time savings for the programming of a document analysis task, as well as for different scenarios such as pre-training or changing the architecture. Thanks to its data module, the framework also makes it possible to reduce model training time significantly.

Statistic · Weight · Image segmentation · Integration · MoDELS

February 14, 2024

Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time Adaptation

Haoyu Dong, Nicholas Konz, Hanxue Gu, Maciej A. Mazurowski

Test-time adaptation (TTA) refers to adapting a trained model to a new domain during testing. Existing TTA techniques rely on having multiple test images from the same domain, yet this may be impractical in real-world applications such as medical imaging, where data acquisition is expensive and imaging conditions vary frequently. Here, we approach the task of adapting a medical image segmentation model with only a single unlabeled test image. Most TTA approaches, which directly minimize the entropy of predictions, fail to improve performance significantly in this setting, in which we also observe the choice of batch normalization (BN) layer statistics to be a highly important yet unstable factor due to only having a single test domain example. To overcome this, we propose to instead integrate over predictions made with various estimates of target domain statistics between the training and test statistics, weighted based on their entropy statistics.
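As a rough illustration of the idea in this abstract, the sketch below blends training and test statistics with a mixing factor and then averages several predictions so that lower-entropy predictions count more. It is not the authors' implementation: the function names, the linear interpolation, and the softmax-of-negative-entropy weighting are all assumptions chosen for clarity, since the abstract does not specify the exact scheme.

```python
import math


def mean_entropy(probs):
    """Average Shannon entropy (in nats) over a list of per-pixel class distributions."""
    total = 0.0
    for p in probs:
        total += -sum(q * math.log(q) for q in p if q > 0.0)
    return total / len(probs)


def interpolate_stats(train_stat, test_stat, alpha):
    """Blend statistics linearly: alpha=0 gives training stats, alpha=1 gives test stats.

    Assumption: a linear blend is one simple way to obtain estimates
    "between the training and test statistics".
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(train_stat, test_stat)]


def entropy_weighted_average(predictions, temperature=1.0):
    """Combine several predictions, giving lower-entropy ones more weight.

    Assumption: weights are softmax(-entropy / temperature); the paper's
    actual weighting may differ.
    """
    entropies = [mean_entropy(p) for p in predictions]
    scores = [math.exp(-h / temperature) for h in entropies]
    norm = sum(scores)
    weights = [s / norm for s in scores]

    n_pixels = len(predictions[0])
    n_classes = len(predictions[0][0])
    combined = [
        [sum(w * pred[i][c] for w, pred in zip(weights, predictions))
         for c in range(n_classes)]
        for i in range(n_pixels)
    ]
    return combined, weights


# Two hypothetical predictions for a 2-pixel, 2-class image, as might be
# produced by running the same model under two different blended statistics.
confident = [[0.9, 0.1], [0.8, 0.2]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
combined, weights = entropy_weighted_average([confident, uncertain])
```

In this toy case the confident prediction receives the larger weight, so the combined output leans toward it while remaining a valid probability distribution per pixel.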

Feature selection · Learning · INFORMS · Reducible · Evolutionary computation

February 14, 2024

MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Xubin Wang, Haojiong Shangguan, Fengyi Huang, Shangrui Wu, Weijia Jia

Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computational (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, often underutilizing and not sharing information effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Comparing against 24 EC approaches, our method exhibits strong competitiveness. Additionally, we have open-sourced our code on GitHub at https://github.com/wangxb96/MEL.

Prompt · Optimizer · Large language model · HTTPS · Integration

February 13, 2024

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan

From arXiv, 39 pages, 13 figures

Prompt optimization aims to find the best prompt to a large language model (LLM) for a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) Prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors, (2) the impact of an individual step is difficult to evaluate, and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework that incorporates human-designed feedback rules about potential errors to automatically offer direct suggestions for improvement. Our framework is stylized as a genetic algorithm in which an LLM generates new candidate prompts from a parent prompt and its associated feedback; we use a learned heuristic function that predicts prompt performance to efficiently sample from these candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across eight representative multi-step tasks (an average 27.7% and 28.2% improvement to current best methods on GPT-3.5 and GPT-4, respectively). We further show that the score function for tasks can be modified to better align with individual preferences. We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks. Datasets and Codes are available at https://github.com/yongchao98/PROMST. Project Page is available at https://yongchao98.github.io/MIT-REALM-PROMST.

Multimodal · Learned · Extensibility · Deep learning · Processing (programming language)

May 24, 2021

Recent Advances and Trends in Multimodal Deep Learning: A Review

Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Songyuan Li, Jabbar Abdul

Deep learning has been applied in a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning is to create models that can process and link information using various modalities. Despite the extensive development of unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning supports better understanding and analysis when various senses are engaged in the processing of information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. A detailed analysis of past and current baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications are provided. A fine-grained taxonomy of various multimodal deep learning applications is proposed, elaborating on different applications in more depth. Architectures and datasets used in these applications are also discussed, along with their evaluation metrics. Finally, the main issues are highlighted separately for each domain, along with possible future research directions.

Learned · Deep learning · Identifiable · MoDELS · Object tracking

July 31, 2019

Deep Learning in Video Multi-Object Tracking: A Survey

Gioele Ciaparrone, Francisco Luque Sánchez, Siham Tabik, Luigi Troiano, Roberto Tagliaferri, Francisco Herrera

From arXiv. New in v2: corrected typos and various minor mistakes. Submitted to Neurocomputing. Main text: 25 pages, 5 figures, 6 tables. Summary table in appendix at the end of the paper

The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.