This letter presents an accurate and robust LiDAR-inertial odometry framework. We fuse LiDAR scans with IMU data using a tightly coupled iterative error-state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, we propose a new residual metric for filter-based LiDAR-inertial odometry that goes beyond merely quantifying distance by also incorporating variance disparity, making the residual more comprehensive and accurate. Thanks to this design of the residual metric, we propose a simple yet effective voxel-only mapping scheme, which requires maintaining only one centroid and one covariance matrix for each voxel. Experiments on different datasets demonstrate the robustness and accuracy of our framework for various data inputs and environments. To benefit the robotics community, we open-source the code at //github.com/Ji1Xingyu/lio_gvm.
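
The mapping idea in the abstract, keeping one centroid and one covariance matrix per voxel, can be illustrated with a running (Welford-style) update. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `VoxelGaussian`, `voxel_key`, and `build_voxel_map` are hypothetical, the voxel size is arbitrary, and the paper's residual metric and outlier test are not reproduced here.

```python
import math
from collections import defaultdict


class VoxelGaussian:
    """Running centroid and 3x3 sample covariance of the points in one voxel."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0, 0.0]
        # Accumulated outer products of deviations (scatter matrix).
        self.scatter = [[0.0] * 3 for _ in range(3)]

    def add(self, p):
        """Fold one 3D point into the statistics without storing the point."""
        self.n += 1
        d_old = [p[i] - self.mean[i] for i in range(3)]   # deviation from old mean
        self.mean = [self.mean[i] + d_old[i] / self.n for i in range(3)]
        d_new = [p[i] - self.mean[i] for i in range(3)]   # deviation from new mean
        for i in range(3):
            for j in range(3):
                self.scatter[i][j] += d_old[i] * d_new[j]

    def covariance(self):
        """Sample covariance, or None when fewer than two points were seen."""
        if self.n < 2:
            return None
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


def voxel_key(point, voxel_size):
    """Integer grid index of the voxel containing the point."""
    return tuple(math.floor(c / voxel_size) for c in point)


def build_voxel_map(points, voxel_size):
    """Group points by voxel, keeping only a centroid and covariance per voxel."""
    voxels = defaultdict(VoxelGaussian)
    for p in points:
        voxels[voxel_key(p, voxel_size)].add(p)
    return voxels
```

Because each update touches only the stored mean and scatter matrix, memory per voxel stays constant no matter how many points fall into it, which is the property the abstract highlights.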

Related content

Robustness

Learning · INTERACT · Bootstrapping · SimPLe · Object recognition

October 4, 2023

Bootstrapping Developmental AIs: From Simple Competences to Intelligent Human-Compatible AIs

Mark Stefik, Robert Price

from arxiv, 110 pages, 28 figures

The mainstream AI approaches are the generative and deep learning approaches with large language models (LLMs) and the manually constructed symbolic approach. Both approaches have led to valuable AI systems and impressive feats. However, manually constructed AIs are brittle even in circumscribed domains. Generative AIs make strange mistakes and do not notice them. In both approaches the AIs cannot be instructed easily, fail to use common sense, and lack curiosity. They have abstract knowledge but lack social alignment. Developmental AIs have more potential. They start with innate competences, interact with their environment, and learn from their interactions. They interact with and learn from people and establish perceptual, cognitive, and common grounding. Developmental AIs have demonstrated capabilities including multimodal perception, object recognition, and manipulation. Powerful computational models for hierarchical planning, abstraction discovery, curiosity, and language acquisition exist but need to be adapted to a developmental-learning-based approach. The promise is that developmental AIs will acquire self-developed and socially developed competences. They would address the shortcomings of current mainstream AI approaches, and ultimately lead to sophisticated forms of learning involving critical reading, provenance evaluation, and hypothesis testing. However, developmental AI projects have not yet fully reached the Speaking Gap corresponding to toddler development at about two years of age, before their speech is fluent. The AIs do not bridge the Reading Gap, to skillfully and skeptically learn from written and online information resources. This position paper lays out the prospects, gaps, and challenges for extending the practice of developmental AIs to create resilient, intelligent, and human-compatible AIs that learn what they need to know.

Language modeling · Multimodal · MoDELS · tuning · Output space

October 4, 2023

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

from arxiv, Code: //aka.ms/Kosmos-G Project Page: //xichenpan.github.io/kosmosg

Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the generation from generalized vision-language inputs, especially involving multiple images, remains under-explored. This paper presents Kosmos-G, a model that leverages the advanced perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. Our approach aligns the output space of MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. Kosmos-G demonstrates a unique capability of zero-shot multi-entity subject-driven generation. Notably, the score distillation instruction tuning requires no modifications to the image decoder. This allows for a seamless substitution of CLIP and effortless integration with a myriad of U-Net techniques ranging from fine-grained controls to personalized image decoder variants. We posit Kosmos-G as an initial attempt towards the goal of "image as a foreign language in image generation."

MoDELS · Performer · Understandability · Discretization · Networking

October 4, 2023

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Recent studies have demonstrated promising outcomes by employing large language models with multi-tasking capabilities. They utilize prompts to guide the model's behavior and surpass the performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? To address this, we utilize pre-trained automatic speech recognition (ASR) models and employ various task and dataset specifiers as discrete prompts. We demonstrate the efficacy of our single multi-task learning (MTL) model "UniverSLU" for 12 different speech classification and sequence generation tasks across 17 datasets and 9 languages. Results show that UniverSLU achieves competitive performance and even surpasses task-specific models. We also conduct preliminary investigations into enabling human-interpretable natural phrases instead of task specifiers as discrete prompts and test the model's generalization capabilities to new paraphrases.

Convolution · Kernelization · Backbone · 3D · Image segmentation

October 4, 2023

DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution

Ho Hin Lee, Quan Liu, Qi Yang, Xin Yu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

from arxiv, 14 pages, the source code with our pre-trained model is available at //github.com/MASILab/deform-uxnet

The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predictions. Despite this, existing core operators, ranging from global-local attention to large kernel convolution, exhibit inherent trade-offs and limitations (e.g., global-local range trade-off, aggregating attentional features). We hypothesize that deformable convolution can be an exploratory alternative to combine all advantages from the previous operators, providing long-range dependency, adaptive spatial aggregation and computational efficiency as a foundation backbone. In this work, we introduce 3D DeformUX-Net, a pioneering volumetric CNN model that adeptly navigates the shortcomings traditionally associated with ViTs and large kernel convolution. Specifically, we revisit volumetric deformable convolution in a depth-wise setting to adapt long-range dependency with computational efficiency. Inspired by the concepts of structural re-parameterization for convolution kernel weights, we further generate the deformable tri-planar offsets by adapting a parallel branch (starting from $1\times1\times1$ convolution), providing adaptive spatial aggregation across all channels. Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models across four challenging public datasets, spanning various scales from organs (KiTS: 0.680 to 0.720, MSD Pancreas: 0.676 to 0.717, AMOS: 0.871 to 0.902) to vessels (e.g., MSD hepatic vessels: 0.635 to 0.671) in mean Dice.
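
The results above are reported in mean Dice. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient on binary masks, averaged over cases. The function names are hypothetical, the masks are flattened to plain 0/1 sequences for simplicity, and treating two empty masks as a perfect match is an assumed convention, not something the abstract specifies.

```python
def dice_coefficient(pred, target):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs):
    """Average Dice over a list of (prediction, ground truth) mask pairs."""
    return sum(dice_coefficient(p, t) for p, t in pairs) / len(pairs)
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxels.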

Language modeling · MoDELS · Guidance · AIM · Commentator

October 3, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers. To establish a shared vocabulary around how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates: 1) which dimensions of model behaviour are considered important, then 2) how meanings and definitions are ascribed to these dimensions, and by whom. We situate existing empirical literature and provide guidance on deciding which paradigm to follow. Through this framework, we aim to foster a culture of transparency and critical evaluation, aiding the community in navigating the complexities of aligning LLMs with human populations.

Unsupervised · Data augmentation · state-of-the-art · Performer · HTTPS

October 3, 2023

Learnable Data Augmentation for One-Shot Unsupervised Domain Adaptation

Julio Ivan Davila Carrazco, Pietro Morerio, Alessio Del Bue, Vittorio Murino

from arxiv, Accepted to The 34th British Machine Vision Conference (BMVC 2023)

This paper presents a classification framework based on learnable data augmentation to tackle the One-Shot Unsupervised Domain Adaptation (OS-UDA) problem. OS-UDA is the most challenging setting in Domain Adaptation, as only a single unlabeled target sample is assumed to be available for model adaptation. Driven by this single sample, our method LearnAug-UDA learns how to augment source data, making it perceptually similar to the target. As a result, a classifier trained on such augmented data will generalize well to the target domain. To achieve this, we designed an encoder-decoder architecture that exploits a perceptual loss and style transfer strategies to augment the source data. Our method achieves state-of-the-art performance on two well-known Domain Adaptation benchmarks, DomainNet and VisDA. The project code is available at //github.com/IIT-PAVIS/LearnAug-UDA

TOOLS · ACM Multimedia · CASE · Online · CASES

October 3, 2023

Online Multimedia Verification with Computational Tools and OSINT: Russia-Ukraine Conflict Case Studies

Sohail Ahmed Khan, Jan Gunnar Furuly, Henrik Brattli Vold, Rano Tahseen, Duc-Tien Dang-Nguyen

from arxiv, 18 pages

This paper investigates the use of computational tools and Open-Source Intelligence (OSINT) techniques for verifying online multimedia content, with a specific focus on real-world cases from the Russia-Ukraine conflict. Over a nine-month period from April to December 2022, we examine verification workflows, tools, and case studies published by Faktisk Verifiserbar. Our study showcases the effectiveness of diverse resources, including AI tools, geolocation tools, internet archives, and social media monitoring platforms, in enabling journalists and fact-checkers to efficiently process and corroborate evidence, ensuring the dissemination of accurate information. This research underscores the vital role of computational tools and OSINT techniques in promoting evidence-based reporting and combatting misinformation. We also touch on the current limitations of available tools and prospects for future developments in multimedia verification.

HTML · Performer · Automator · WEB · Inductive bias

October 3, 2023

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.

Graph · Vision · Performer · Better · state-of-the-art

July 31, 2020

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph

Fei Yu, Jiji Tang, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

from arxiv, Add comparisons and fix some mistakes for the experiments

We propose a knowledge-enhanced approach, ERNIE-ViL, to learn joint representations of vision and language. ERNIE-ViL tries to construct the detailed semantic connections (objects, attributes of objects and relationships between objects in visual scenes) across vision and language, which are essential to vision-language cross-modal tasks. Incorporating knowledge from scene graphs, ERNIE-ViL constructs Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction and Relationship Prediction in the pre-training phase. More specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can model the joint representation characterizing the alignments of the detailed semantics across vision and language. Pre-trained on two large image-text alignment datasets (Conceptual Captions and SBU), ERNIE-ViL learns better and more robust joint representations. After fine-tuning, it achieves state-of-the-art performance on 5 vision-language downstream tasks. Furthermore, it ranked 1st on the VCR leaderboard with an absolute improvement of 3.7%.

Discriminator · Performer · Dimensionality reduction · Convolutional neural network · Multi-task learning

January 25, 2018

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Yuan Gao, Qi She, Jiayi Ma, Mingbo Zhao, Wei Liu, Alan L. Yuille

from arxiv, 11 pages, 5 figures, 7 tables

State-of-the-art Convolutional Neural Networks (CNNs) benefit a lot from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split on a specific layer (e.g., the last convolutional layer) to minimize different task-specific losses. However, this heuristic sharing/splitting strategy may be harmful to the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL, which enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks according to their channel dimension, and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform ablation analysis in detail for different configurations in training the network. The experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
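
The fusion step described above, concatenating task features along the channel dimension and reducing them with a 1x1 convolution, can be sketched in plain Python. This is an illustrative sketch only: the function names and the weight values are hypothetical, feature maps are nested lists in height x width x channel order, and the Batch Normalization and Weight Decay parts of NDDR are omitted.

```python
def concat_channels(feat_a, feat_b):
    """Concatenate two H x W x C feature maps along the channel axis."""
    return [
        [pixel_a + pixel_b for pixel_a, pixel_b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


def conv1x1(feat, weight):
    """Apply a 1x1 convolution: one C_out x C_in linear map at every pixel."""
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) for w_row in weight]
            for pixel in row
        ]
        for row in feat
    ]


# Two task-specific feature maps, each 1 x 2 pixels with 2 channels.
feat_a = [[[1.0, 2.0], [3.0, 4.0]]]
feat_b = [[[5.0, 6.0], [7.0, 8.0]]]
stacked = concat_channels(feat_a, feat_b)  # 1 x 2 pixels, 4 channels

# Hypothetical weights: reduce 4 channels back to 2 by averaging both tasks.
mix = [[0.5, 0.0, 0.5, 0.0],
       [0.0, 0.5, 0.0, 0.5]]
fused = conv1x1(stacked, mix)
```

In a trained network the 1x1 weights would be learned separately for each task at each layer, so every task can decide how much of the other task's features to use.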