
SLAM · Comprehensibility · Shaping · Inference · Transformation

July 10, 2023

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

from arxiv, 15 Pages and 12 figures. Accepted to RSS 2023. Project webpage: //neuse-slam.github.io/neuse/

We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes. NeuSE is a set of latent object embeddings created from partial object observations. It serves as a compact point cloud surrogate for complete object models, encoding full shape information while transforming SE(3)-equivariantly in tandem with the object in the physical world. With NeuSE, relative frame transforms can be directly derived from inferred latent codes. Our proposed SLAM paradigm, using NeuSE for object shape and pose characterization, can operate independently or in conjunction with typical SLAM systems. It directly infers SE(3) camera pose constraints that are compatible with general SLAM pose graph optimization, while also maintaining a lightweight object-centric map that adapts to real-world changes. Our approach is evaluated on synthetic and real-world sequences featuring changed objects and shows improved localization accuracy and change-aware mapping capability, when working either standalone or jointly with a common SLAM pipeline.
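The statement that relative frame transforms can be derived directly from latent codes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each latent code is an N×3 array of point-like features that rotates and translates together with the object. The function name `relative_transform` and the choice of Kabsch alignment are assumptions of this sketch and are not taken from the NeuSE implementation.

```python
import numpy as np


def relative_transform(latent_a: np.ndarray, latent_b: np.ndarray):
    """Estimate (R, t) such that latent_b ~= latent_a @ R.T + t.

    Both inputs are hypothetical N x 3 equivariant latent codes of the
    same object seen from two frames (an assumption of this sketch).
    """
    centroid_a = latent_a.mean(axis=0)
    centroid_b = latent_b.mean(axis=0)
    # Cross-covariance of the centred latent points.
    H = (latent_a - centroid_a).T @ (latent_b - centroid_b)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_b - R @ centroid_a
    return R, t


# Synthetic check: transform a random latent code by a known SE(3) motion.
rng = np.random.default_rng(0)
code_a = rng.normal(size=(64, 3))
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.5, -0.2, 1.0])
code_b = code_a @ R_true.T + t_true

R_est, t_est = relative_transform(code_a, code_b)
```

In a pose graph, an estimate of this kind would enter as a relative-pose constraint between the two camera frames that observed the object.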

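Related content

SLAM

Simultaneous localization and mapping (SLAM) is a technique that enables devices such as robots and autonomous vehicles to build a map of an unknown environment (with no prior knowledge) or to update the map of a known environment (where prior knowledge of the map is given), while simultaneously tracking their current position.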

INTERACT · 3D · Predictor/Decision function · MoDELS · SimPLe

August 31, 2023

InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

Sirui Xu, Zhengyuan Li, Yu-Xiong Wang, Liang-Yan Gui

from arxiv, ICCV 2023; Project Page: //sirui-xu.github.io/InterDiff/

This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject prior knowledge that the interactions under reference with respect to contact points follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.

MoDELS · Comprehensibility · INTERACT · Language modeling · Modality

August 31, 2023

Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception

Riley Tavassoli, Mani Amani, Reza Akhavian

from arxiv, Preprint submitted to Information Fusion

Vision-language models (VLMs) have shown powerful capabilities in visual question answering and reasoning tasks by combining visual representations with the abstract skill set large language models (LLMs) learn during pretraining. Vision, while the most popular modality to augment LLMs with, is only one representation of a scene. In human-robot interaction scenarios, robot perception requires accurate scene understanding by the robot. In this paper, we define and demonstrate a method of aligning the embedding spaces of different modalities (in this case, inertial measurement unit (IMU) data) to the vision embedding space through a combination of supervised and contrastive training, enabling the VLM to understand and reason about these additional modalities without retraining. We opt to give the model IMU embeddings directly over using a separate human activity recognition model that feeds directly into the prompt to allow for any nonlinear interactions between the query, image, and IMU signal that would be lost by mapping the IMU data to a discrete activity label. Further, we demonstrate our methodology's efficacy through experiments involving human activity recognition using IMU data and visual inputs. Our results show that using multiple modalities as input improves the VLM's scene understanding and enhances its overall performance in various tasks, thus paving the way for more versatile and capable language models in multi-modal contexts.
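The contrastive part of the alignment described above can be sketched as an InfoNCE-style loss between paired IMU and vision embeddings. This is a minimal illustration only: the function name `info_nce`, the temperature value, and the embedding sizes are assumptions, and the supervised term mentioned in the abstract is not shown.

```python
import numpy as np


def info_nce(imu_emb: np.ndarray, vis_emb: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE loss where row i of imu_emb is paired with row i of vis_emb."""
    # L2-normalise so that the dot product is a cosine similarity.
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    vis = vis_emb / np.linalg.norm(vis_emb, axis=1, keepdims=True)
    logits = imu @ vis.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
vision = rng.normal(size=(8, 16))                    # hypothetical vision embeddings
imu_aligned = vision + 0.01 * rng.normal(size=(8, 16))  # IMU embeddings close to their pairs
imu_misaligned = np.roll(imu_aligned, 1, axis=0)     # pairs deliberately mismatched

loss_aligned = info_nce(imu_aligned, vision)
loss_misaligned = info_nce(imu_misaligned, vision)
```

Minimizing such a loss pulls each IMU embedding toward its paired vision embedding, which is what would let a frozen VLM consume the new modality through the existing vision embedding space.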

Object detection · Attention · anchor · Lecture notes · Extensibility

August 31, 2023

CircleFormer: Circular Nuclei Detection in Whole Slide Images with Circle Queries and Attention

Hengxu Zhang, Pengpeng Liang, Zhiyong Sun, Bo Song, Erkang Cheng

from arxiv, Accepted at MICCAI 2023

Both CNN-based and Transformer-based object detection with bounding box representation have been extensively studied in computer vision and medical image analysis, but circular object detection in medical images is still underexplored. Inspired by the recent anchor free CNN-based circular object detection method (CircleNet) for ball-shape glomeruli detection in renal pathology, in this paper, we present CircleFormer, a Transformer-based circular medical object detection with dynamic anchor circles. Specifically, queries with circle representation in Transformer decoder iteratively refine the circular object detection results, and a circle cross attention module is introduced to compute the similarity between circular queries and image features. A generalized circle IoU (gCIoU) is proposed to serve as a new regression loss of circular object detection as well. Moreover, our approach is easy to generalize to the segmentation task by adding a simple segmentation branch to CircleFormer. We evaluate our method in circular nuclei detection and segmentation on the public MoNuSeg dataset, and the experimental results show that our method achieves promising performance compared with the state-of-the-art approaches. The effectiveness of each component is validated via ablation studies as well. Our code is released at //github.com/zhanghx-iim-ahu/CircleFormer.
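For readers unfamiliar with circle representations, the plain intersection-over-union of two circles can be computed in closed form, as in the sketch below. This is the standard circle IoU only; the generalized variant (gCIoU) proposed in the paper is not reproduced here, and the `(x, y, r)` tuple layout is an assumption of this example.

```python
import math


def circle_iou(c1, c2) -> float:
    """IoU of two circles given as (x, y, r) with positive radii."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    area1 = math.pi * r1 * r1
    area2 = math.pi * r2 * r2

    if d >= r1 + r2:
        # Circles are disjoint or touch at a single point.
        inter = 0.0
    elif d <= abs(r1 - r2):
        # One circle lies entirely inside the other.
        inter = min(area1, area2)
    else:
        # Lens-shaped overlap: sum of two circular segments.
        alpha = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
        beta = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
        inter = (r1 * r1 * (alpha - math.sin(2 * alpha) / 2)
                 + r2 * r2 * (beta - math.sin(2 * beta) / 2))

    union = area1 + area2 - inter
    return inter / union
```

Plain IoU is zero for every pair of disjoint circles regardless of how far apart they are, which is the usual motivation for generalized IoU-style regression losses.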

MoDELS · Comprehensibility · Learning · state-of-the-art · Performer

August 30, 2023

ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding

Omer Veysel Cagatan

We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models with varied hyperparameters. Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the BabyLM challenge, we find that smaller models can excel in specific tasks, while larger models perform well with substantial data. Despite training on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base. The model showcases robust language understanding, even with single-sentence pretraining, and competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices, and data utilization, contributing to the advancement of language models.

MoDELS · Language modeling · INFORMS · Performer · CRAFT

August 30, 2023

Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap

Michael Staniek, Raphael Schumann, Maike Züfle, Stefan Riezler

We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM). The Overpass Query Language (OverpassQL) allows users to formulate complex database queries and is widely adopted in the OSM ecosystem. Generating Overpass queries from natural language input serves multiple use-cases. It enables novice users to utilize OverpassQL without prior knowledge, assists experienced users with crafting advanced queries, and enables tool-augmented large language models to access information stored in the OSM database. In order to assess the performance of current sequence generation models on this task, we propose OverpassNL, a dataset of 8,352 queries with corresponding natural language inputs. We further introduce task specific evaluation metrics and ground the evaluation of the Text-to-OverpassQL task by executing the queries against the OSM database. We establish strong baselines by finetuning sequence-to-sequence models and adapting large language models with in-context examples. The detailed evaluation reveals strengths and weaknesses of the considered learning strategies, laying the foundations for further research into the Text-to-OverpassQL task.
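To give a sense of the target language, the sketch below pairs a natural language request with a hand-written Overpass QL query. The pair is illustrative only and is not drawn from the OverpassNL dataset; the helper `build_cafe_query` and the coordinates are hypothetical, and no request is sent to an Overpass server.

```python
def build_cafe_query(lat: float, lon: float, radius_m: int) -> str:
    """Return an Overpass QL query for cafe nodes within radius_m metres of a point.

    Hypothetical helper for illustration; it only builds the query text.
    """
    return (
        "[out:json][timeout:25];\n"
        f'node["amenity"="cafe"](around:{radius_m},{lat},{lon});\n'
        "out body;"
    )


# Illustrative input/output pair for the Text-to-OverpassQL task.
natural_language = "cafes within 500 metres of the given location"
overpass_query = build_cafe_query(48.137, 11.575, 500)
```

A model trained for this task would have to produce the query text from the natural language input alone, and the execution-based evaluation mentioned in the abstract would compare the results the query returns from the OSM database.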

Language modeling · Integration · HTTPS · Prompt · Directed

August 29, 2023

AskIt: Unified Programming Interface for Programming with Large Language Models

Katsumi Okuda, Saman Amarasinghe

In the evolving landscape of software development, Large Language Models (LLMs) exhibit a unique phenomenon known as emergent abilities, demonstrating adeptness across numerous tasks, from text summarization to code generation. While these abilities open up novel avenues in software design and crafting, their incorporation presents substantial challenges. Developers grapple with decisions surrounding the direct embedding of LLMs within applications versus employing them for code generation. Moreover, effective prompt design becomes a critical concern, given the necessity of data extraction from natural language outputs. To address these intricacies, this paper introduces AskIt, a domain-specific language (DSL) specifically designed for LLMs. AskIt simplifies LLM integration, offering type-guided output control, template-based function definitions, and a unified interface that diminishes the distinction between LLM-based code generation and application integration. Furthermore, through Programming by Example (PBE), AskIt harnesses the power of few-shot learning at the programming language level. Our evaluations underscore AskIt's potency. Across 50 tasks, AskIt generated concise prompts for the given tasks, achieving a 16.14% reduction in prompt length relative to benchmarks. Additionally, by enabling the transition from direct LLM application usage to function generation, AskIt achieved significant speedups, as observed in our GSM8K benchmark experiments. Through these advancements, AskIt streamlines the integration of LLMs in software development, offering a more efficient, versatile approach for leveraging emergent abilities. The implementations of AskIt in TypeScript and Python are available at //github.com/katsumiok/ts-askit and //github.com/katsumiok/pyaskit, respectively.

Brackets · Networking · Neural Networks · Transformation · Time step

August 29, 2023

Lie-Poisson Neural Networks (LPNets): Data-Based Computing of Hamiltonian Systems with Symmetries

Christopher Eldred, François Gay-Balmaz, Sofiia Huraka, Vakhtang Putkaradze

from arxiv, 57 pages, 13 figures

An accurate data-based prediction of the long-term evolution of Hamiltonian systems requires a network that preserves the appropriate structure under each time step. Every Hamiltonian system contains two essential ingredients: the Poisson bracket and the Hamiltonian. Hamiltonian systems with symmetries, whose paradigm examples are the Lie-Poisson systems, have been shown to describe a broad category of physical phenomena, from satellite motion to underwater vehicles, fluids, geophysical applications, complex fluids, and plasma physics. The Poisson bracket in these systems comes from the symmetries, while the Hamiltonian comes from the underlying physics. We view the symmetry of the system as primary, hence the Lie-Poisson bracket is known exactly, whereas the Hamiltonian is regarded as coming from physics and is considered not known, or known approximately. Using this approach, we develop a network based on transformations that exactly preserve the Poisson bracket and the special functions of the Lie-Poisson systems (Casimirs) to machine precision. We present two flavors of such systems: one, where the parameters of transformations are computed from data using a dense neural network (LPNets), and another, where the composition of transformations is used as building blocks (G-LPNets). We also show how to adapt these methods to a larger class of Poisson brackets. We apply the resulting methods to several examples, such as rigid body (satellite) motion, underwater vehicles, a particle in a magnetic field, and others. The methods developed in this paper are important for the construction of accurate data-based methods for simulating the long-term dynamics of physical systems.
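The idea of building updates from transformations that preserve Casimirs can be illustrated on the rigid body, whose Lie-Poisson bracket on so(3)* has the Casimir C(Π) = |Π|²/2. In the sketch below each update is a rotation of the angular momentum Π, so C is conserved up to floating-point round-off however the rotation parameters are chosen. The rotation axes and angles here are random placeholders standing in for values a network would predict; this is not the LPNets architecture itself.

```python
import numpy as np


def rotation(axis, angle: float) -> np.ndarray:
    """Rotation matrix about `axis` by `angle` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([
        [0.0, -axis[2], axis[1]],
        [axis[2], 0.0, -axis[0]],
        [-axis[1], axis[0], 0.0],
    ])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def casimir(pi: np.ndarray) -> float:
    """Casimir of the rigid-body Lie-Poisson bracket: |pi|^2 / 2."""
    return 0.5 * float(pi @ pi)


rng = np.random.default_rng(0)
pi_0 = np.array([1.0, 0.5, -0.3])  # initial angular momentum (arbitrary)
pi_t = pi_0.copy()
for _ in range(1000):
    # Placeholder parameters; a learned model would supply these per step.
    axis = rng.normal(size=3)
    angle = rng.uniform(-0.1, 0.1)
    pi_t = rotation(axis, angle) @ pi_t

casimir_drift = abs(casimir(pi_t) - casimir(pi_0))
```

Because the conservation comes from the form of the update rather than from the training loss, it holds even for an untrained or poorly trained parameter predictor.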

Shaping · Transformation · state-of-the-art · Curvature · Ground truth

August 29, 2023

PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng

from arxiv, 9 pages, 8 figures, accepted by ACM MM 2023

We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation Polynomial Band (PB). The representation has four polynomial curves to fit a text's top, bottom, left, and right sides, which can capture a text with a complex shape by varying polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need to utilize a different number of points. 2) It can distinguish adjacent or overlapping texts as they have apparent different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from predicted curves without interpolation. A parameter-free cross-scale pixel attention (CPA) module is employed to highlight the feature map of a suitable scale while suppressing the other feature maps. The simple operation can help detect small-scale texts and is compatible with the one-stage DETR framework, where no postprocessing exists for NMS. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces the piecewise alignment between the ground truth and the predicted curves but also makes curves' positions and shapes consistent with each other. Without bells and whistles about text pre-training, our method is superior to the previous state-of-the-art text detectors on the arbitrary-shaped text datasets.

Single-Shot · Branch · Object detection · Inference · MS

April 8, 2018

Single-Shot Object Detection with Enriched Semantics

Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo Wang, Alan L. Yuille

We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.

Visual question answering · Dataset · Performer · state-of-the-art · MoDELS

March 20, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.
