蜜芽亚洲精品国产品国语在线试看_午夜日本大胆裸艺术_欧美激情精品视频一区二区三区_扒开末成年粉嫩的小缝图片_西西人体午夜大胆无码视频_韩国三级无码高在线观看_日韩色在线视频三区

In this work, we present GraspFlow, a refinement approach for generating context-specific grasps. We formulate the problem of grasp synthesis as a sampling problem: we seek to sample from a context-conditioned probability distribution of successful grasps. However, this target distribution is unknown. As a solution, we devise a discriminator gradient-flow method to evolve grasps obtained from a simpler distribution in a manner that mimics sampling from the desired target distribution. Unlike existing approaches, GraspFlow is modular, allowing grasps that satisfy multiple criteria to be obtained simply by incorporating the relevant discriminators. It is also simple to implement, requiring minimal code given existing auto-differentiation libraries and suitable discriminators. Experiments show that GraspFlow generates stable and executable grasps on a real-world Panda robot for a diverse range of objects. In particular, in 60 trials on 20 different household objects, the first attempted grasp was successful 94% of the time, and 100% grasp success was achieved by the second grasp. Moreover, incorporating a functional discriminator for robot-human handover improved the functional aspect of the grasp by up to 33%.

相關內容

判別器(qi)

關注 0

MoDELS · 估計/估計量 · 成比例 · Processing（編程語言） · 得分 ·

2023 年 10 月 1 日

Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models

Gen Li,Yuting Wei,Yuxin Chen,Yuejie Chi

from arxiv, Score estimation errors are included in the convergence theory in the new version

Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. In this work, we develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functions. For a popular deterministic sampler (based on the probability flow ODE), we establish a convergence rate proportional to $1/T$ (with $T$ the total number of steps), improving upon past results; for another mainstream stochastic sampler (i.e., a type of the denoising diffusion probabilistic model), we derive a convergence rate proportional to $1/\sqrt{T}$, matching the state-of-the-art theory. Imposing only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), our results characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes. In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach without resorting to toolboxes for SDEs and ODEs. Further, we design two accelerated variants, improving the convergence to $1/T^2$ for the ODE-based sampler and $1/T$ for the DDPM-type sampler, which might be of independent theoretical and empirical interest.

Prompt · 優化器 · Learning · MoDELS · 可辨認的 ·

2023 年 9 月 29 日

Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL

Hao Sun,Alihan Hüyük,Mihaela van der Schaar

In this study, we aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization. We identify a previously overlooked objective of query dependency in such optimization and elucidate two ensuing challenges that impede the successful and economical design of prompt optimization techniques. One primary issue is the absence of an effective method to evaluate prompts during inference when the golden answer is unavailable. Concurrently, learning via interactions with the LLMs to navigate the expansive natural language prompting space proves to be resource-intensive. To address this, we introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data. Such data exists as by-products when diverse prompts are benchmarked on open-accessible datasets. With Prompt-OIRL, the query-dependent prompt optimization objective is achieved by first learning an offline reward model. This model can evaluate any query-prompt pairs without accessing LLMs. Subsequently, a best-of-N strategy is deployed to recommend the optimal prompt. Our experimental evaluations across various LLM scales and arithmetic reasoning datasets underscore both the efficacy and economic viability of the proposed approach.

估計/估計量 · 線性的 · Continuity · 查準率/準確率 · 簇 ·

2023 年 9 月 29 日

A 5-Point Minimal Solver for Event Camera Relative Motion Estimation

Ling Gao,Hang Su,Daniel Gehrig,Marco Cannici,Davide Scaramuzza,Laurent Kneip

Event-based cameras are ideal for line-based motion estimation, since they predominantly respond to edges in the scene. However, accurately determining the camera displacement based on events continues to be an open problem. This is because line feature extraction and dynamics estimation are tightly coupled when using event cameras, and no precise model is currently available for describing the complex structures generated by lines in the space-time volume of events. We solve this problem by deriving the correct non-linear parametrization of such manifolds, which we term eventails, and demonstrate its application to event-based linear motion estimation, with known rotation from an Inertial Measurement Unit. Using this parametrization, we introduce a novel minimal 5-point solver that jointly estimates line parameters and linear camera velocity projections, which can be fused into a single, averaged linear velocity when considering multiple lines. We demonstrate on both synthetic and real data that our solver generates more stable relative motion estimates than other methods while capturing more inliers than clustering based on spatio-temporal planes. In particular, our method consistently achieves a 100% success rate in estimating linear velocity where existing closed-form solvers only achieve between 23% and 70%. The proposed eventails contribute to a better understanding of spatio-temporal event-generated geometries and we thus believe it will become a core building block of future event-based motion estimation algorithms.

語音合成 · 可約的 · Learning · SSL · Processing（編程語言） ·

2023 年 9 月 29 日

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

Po-chun Hsu,Ali Elkahky,Wei-Ning Hsu,Yossi Adi,Tu Anh Nguyen,Jade Copet,Emmanuel Dupoux,Hung-yi Lee,Abdelrahman Mohamed

from arxiv, Preprint

Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90\% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing.

Performer · 泛化理論 · Learning · 相似度 · 情景 ·

2023 年 9 月 28 日

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

Fida Mohammad Thoker,Hazel Doughty,Cees Snoek

from arxiv, Accepted in ICCV 2023

We propose a self-supervised method for learning motion-focused video representations. Existing approaches minimize distances between temporally augmented videos, which maintain high spatial similarity. We instead propose to learn similarities between videos with identical local motion dynamics but an otherwise different appearance. We do so by adding synthetic motion trajectories to videos which we refer to as tubelets. By simulating different tubelet motions and applying transformations, such as scaling and rotation, we introduce motion patterns beyond what is present in the pretraining data. This allows us to learn a video representation that is remarkably data efficient: our approach maintains performance when using only 25\% of the pretraining videos. Experiments on 10 diverse downstream settings demonstrate our competitive performance and generalizability to new domains and fine-grained actions.

穩健性 · 極大 · Continuity · 泛函 · 貪心 ·

2023 年 9 月 28 日

Non-Smooth, H?lder-Smooth, and Robust Submodular Maximization

Duksang Lee,Nam Ho-Nguyen,Dabeen Lee

We study the problem of maximizing a continuous DR-submodular function that is not necessarily smooth. We prove that the continuous greedy algorithm achieves an $[(1-1/e)\OPT-\epsilon]$ guarantee when the function is monotone and H\"older-smooth, meaning that it admits a H\"older-continuous gradient. For functions that are non-differentiable or non-smooth, we propose a variant of the mirror-prox algorithm that attains an $[(1/2)\OPT-\epsilon]$ guarantee. We apply our algorithmic frameworks to robust submodular maximization and distributionally robust submodular maximization under Wasserstein ambiguity. In particular, the mirror-prox method applies to robust submodular maximization to obtain a single feasible solution whose value is at least $(1/2)\OPT-\epsilon$. For distributionally robust maximization under Wasserstein ambiguity, we deduce and work over a submodular-convex maximin reformulation whose objective function is H\"older-smooth, for which we may apply both the continuous greedy and the mirror-prox algorithms.

entity · 3D · 圖 · Performer · 數據集 ·

2023 年 9 月 27 日

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Haonan Chang,Kowndinya Boyalakuntla,Shiyang Lu,Siwei Cai,Eric Jing,Shreesh Keskar,Shijie Geng,Adeeb Abbas,Lifeng Zhou,Kostas Bekris,Abdeslam Boularias

from arxiv, The code and dataset used for evaluation can be found at //github.com/changhaonan/OVSG}{//github.com/changhaonan/OVSG. This paper has been accepted by CoRL2023

We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as ``pick up a cup on a kitchen table" or ``navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.

多任務學習 · 學成 · 可理解性 · INFORMS · 泛化理論 ·

2022 年 3 月 28 日

Multi-Task Learning for Visual Scene Understanding

Simon Vandenhende

from arxiv, PhD Thesis

Despite the recent progress in deep learning, most approaches still go for a silo-like solution, focusing on learning each task in isolation: training a separate neural network for each individual task. Many real-world problems, however, call for a multi-modal approach and, therefore, for multi-tasking models. Multi-task learning (MTL) aims to leverage useful information across tasks to improve the generalization capability of a model. This thesis is concerned with multi-task learning in the context of computer vision. First, we review existing approaches for MTL. Next, we propose several methods that tackle important aspects of multi-task learning. The proposed methods are evaluated on various benchmarks. The results show several advances in the state-of-the-art of multi-task learning. Finally, we discuss several possibilities for future work.

Prompt · MoDELS · 學成 · Extensibility · 向量化 ·

2022 年 3 月 10 日

Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou,Jingkang Yang,Chen Change Loy,Ziwei Liu

from arxiv, CVPR 2022. TL;DR: We propose a conditional prompt learning approach to solve the generalizability issue of static prompts

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.

元學習 · 語音識別 · MAML · 學成 · 端到端 ·

2019 年 10 月 26 日

Meta Learning for End-to-End Low-Resource Speech Recognition

Jui-Yang Hsu,Yuan-Jui Chen,Hung-yi Lee

from arxiv, 5 pages, submitted to ICASSP 2020

In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.