3D · HTTPS · RGB-D · Interpretability · Spanned subspace

December 26, 2023

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

From arXiv: A multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. Project page: //tai-wang.github.io/embodiedscan

In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions. This necessitates the ability to fully understand 3D scenes given their first-person observations and to contextualize them into language for interaction. However, traditional research focuses more on scene-level input and output setups from a global view. To address the gap, we introduce EmbodiedScan, a multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. It encompasses over 5k scans encapsulating 1M ego-centric RGB-D views, 1M language prompts, 160k 3D-oriented boxes spanning over 760 categories, some of which partially align with LVIS, and dense semantic occupancy with 80 common categories. Building upon this database, we introduce a baseline framework named Embodied Perceptron. It is capable of processing an arbitrary number of multi-modal inputs and demonstrates remarkable 3D perception capabilities, both within the two series of benchmarks we set up, i.e., fundamental 3D perception tasks and language-grounded tasks, and in the wild. Code, datasets, and benchmarks will be available at //github.com/OpenRobotLab/EmbodiedScan.
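
The abstract pairs ego-centric RGB-D views with dense semantic occupancy. As a rough, hypothetical illustration of how a single depth view relates to occupancy, the sketch below back-projects a depth image into camera-frame 3D points with the standard pinhole model and marks the voxels those points fall into. The intrinsics (`fx`, `fy`, `cx`, `cy`), `voxel_size`, and both function names are assumptions, not the EmbodiedScan toolkit's API; a real pipeline would also transform each view into a shared world frame using its camera pose and attach semantic labels to the voxels.

```python
import math

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth image (rows of metres, 0 = missing) to camera-frame 3D points
    with the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth reading
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def occupied_voxels(points, voxel_size):
    """Return the set of integer voxel indices containing at least one point."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

# Hypothetical 2x2 depth image; the bottom-right pixel has no reading.
depth = [[1.0, 1.0],
         [2.0, 0.0]]
pts = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
voxels = occupied_voxels(pts, voxel_size=1.0)
```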

Related content

3D is short for "three dimensions": three axes or coordinates, that is, length, width, and height. In other words, it describes something solid, in contrast to a flat plane (2D), which has only length and width.

Estimation/Estimator · INTERACT · Agent · Variational autoencoder · Inference

February 14, 2024

Auto-Encoding Bayesian Inverse Games

Xinjie Liu, Lasse Peters, Javier Alonso-Mora, Ufuk Topcu, David Fridovich-Keil

When multiple agents interact in a common environment, each agent's actions impact others' future decisions, and noncooperative dynamic games naturally capture this coupling. In interactive motion planning, however, agents typically do not have access to a complete model of the game, e.g., due to unknown objectives of other players. Therefore, we consider the inverse game problem, in which some properties of the game are unknown a priori and must be inferred from observations. Existing maximum likelihood estimation (MLE) approaches to solve inverse games provide only point estimates of unknown parameters without quantifying uncertainty, and perform poorly when many parameter values explain the observed behavior. To address these limitations, we take a Bayesian perspective and construct posterior distributions of game parameters. To render inference tractable, we employ a variational autoencoder (VAE) with an embedded differentiable game solver. This structured VAE can be trained from an unlabeled dataset of observed interactions, naturally handles continuous, multi-modal distributions, and supports efficient sampling from the inferred posteriors without computing game solutions at runtime. Extensive evaluations in simulated driving scenarios demonstrate that the proposed approach successfully learns the prior and posterior objective distributions, provides more accurate objective estimates than MLE baselines, and facilitates safer and more efficient game-theoretic motion planning.
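
The abstract's case for a Bayesian treatment is that a point estimate can mislead when many parameter values explain the same behavior. The toy below is not the paper's structured VAE or its game solver; it only makes that failure mode concrete. An observed quantity is assumed to depend on a hypothetical objective parameter θ only through |θ|, so θ and -θ explain the data equally well. Grid-based Bayes' rule keeps both modes, while maximum likelihood returns just one. The Gaussian noise model, the grid, and the uniform prior are illustrative assumptions.

```python
import math

def gaussian_loglik(y, mean, sigma):
    """Log density of N(mean, sigma^2) at y."""
    return -0.5 * ((y - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def grid_posterior(grid, observations, sigma, prior=None):
    """Posterior over a discrete grid of theta under the toy model y ~ N(|theta|, sigma^2)."""
    if prior is None:
        prior = [1.0 / len(grid)] * len(grid)  # uniform prior (assumption)
    log_post = [
        math.log(p) + sum(gaussian_loglik(y, abs(theta), sigma) for y in observations)
        for theta, p in zip(grid, prior)
    ]
    m = max(log_post)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def mle(grid, observations, sigma):
    """Point estimate: the grid value with the highest likelihood (first one on ties)."""
    return max(grid, key=lambda th: sum(gaussian_loglik(y, abs(th), sigma) for y in observations))

grid = [0.5 * i for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
obs = [1.0, 1.1, 0.9]                   # hypothetical observations
post = grid_posterior(grid, obs, sigma=0.2)
```

The posterior puts equal mass on θ = -1 and θ = 1, whereas `mle` reports only one of them.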

Latent · MoDELS · Round · Robot · Learning

February 13, 2024

LDTrack: Dynamic People Tracking by Service Robots using Diffusion Models

Angus Fung, Beno Benhabib, Goldie Nejat

Tracking dynamic people in cluttered and crowded human-centered environments is a challenging robotics problem due to intraclass variations, including occlusions, pose deformations, and lighting variations. This paper introduces the Latent Diffusion Track (LDTrack), a novel deep learning architecture that uses conditional latent diffusion models to track multiple dynamic people under intraclass variations. By uniquely utilizing conditional latent diffusion models to capture temporal person embeddings, our architecture can adapt to appearance changes of people over time. We incorporate a latent feature encoder network that enables the diffusion process to operate within a high-dimensional latent space, allowing rich features such as person appearance, motion, location, identity, and contextual information to be extracted and refined spatially and temporally. Extensive experiments demonstrate the effectiveness of LDTrack over other state-of-the-art tracking methods in cluttered and crowded human-centered environments under intraclass variations. Namely, the results show that our method outperforms existing deep learning robotic people-tracking methods in both tracking accuracy and tracking precision with statistical significance.
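
LDTrack builds on conditional latent diffusion models. Its conditioning and latent feature encoder are not reproduced here; the sketch below shows only the generic forward (noising) step shared by DDPM-style diffusion models, where a clean latent x_0 is mixed with Gaussian noise in closed form as x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The linear β schedule from 1e-4 to 0.02 is the common DDPM default and an assumption here, not a setting taken from the paper; the latent vector is hypothetical.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars

def q_sample(x0, t, bars, noise):
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

rng = random.Random(0)
bars = alpha_bars(1000)
latent = [0.3, -1.2, 0.8]  # hypothetical person embedding
noise = [rng.gauss(0.0, 1.0) for _ in latent]
noisy_latent = q_sample(latent, t=500, bars=bars, noise=noise)
```

A denoising network trained to predict `noise` from `noisy_latent` (plus conditioning) is what runs the process in reverse.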

Loss · Precision/Accuracy · Analysis · Medical Image Analysis · Processing (programming language)

February 13, 2024

FESS Loss: Feature-Enhanced Spatial Segmentation Loss for Optimizing Medical Image Analysis

Charulkumar Chodvadiya, Navyansh Mahla, Kinshuk Gaurav Singh, Kshitij Sharad Jadhav

From arXiv: 5 pages, 3 figures

Medical image segmentation is a critical process in medical imaging, playing a pivotal role in diagnosis, treatment, and research. It involves partitioning an image into multiple regions that represent distinct anatomical or pathological structures. Because they rely on traditional loss functions, conventional methods often struggle to balance spatial precision with comprehensive feature representation. To overcome this, we propose the Feature-Enhanced Spatial Segmentation Loss (FESS Loss), which integrates the benefits of contrastive learning (which extracts intricate features, particularly in the nuanced domain of medical imaging) with the spatial accuracy inherent in the Dice loss. The objective is to improve both spatial precision and feature-based representation in the segmentation of medical images. FESS Loss offers a more accurate and refined segmentation process, ultimately contributing to higher precision in the analysis of medical images. Further, FESS Loss demonstrates superior performance in scenarios with limited annotated data, which are common in the medical domain.
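
FESS Loss is described as combining a contrastive-learning term with the Dice loss. The abstract does not give the exact contrastive formulation or weighting, so the sketch below assumes a soft Dice loss on per-pixel foreground probabilities plus an InfoNCE-style contrastive loss on feature vectors, mixed by a hypothetical `weight`. All function names are illustrative, not the authors' code.

```python
import math

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient on soft foreground probabilities (flattened pixels)."""
    intersection = sum(p * g for p, g in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: -log of the softmax weight on the anchor-positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

def combined_loss(pred, target, anchor, positive, negatives, weight=0.5):
    """Hypothetical FESS-style mix: weight * Dice + (1 - weight) * contrastive."""
    return (weight * soft_dice_loss(pred, target)
            + (1.0 - weight) * info_nce_loss(anchor, positive, negatives))
```

The Dice term rewards overlap between predicted and ground-truth masks, while the contrastive term pulls same-class features together and pushes other classes apart.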

Agent · Multimodal · FAST · Large language model · Principle

February 13, 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin

A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jailbreak. It entails the adversary simply jailbreaking a single agent, and without any further intervention from the adversary, (almost) all agents will become infected exponentially fast and exhibit harmful behaviors. To validate the feasibility of infectious jailbreak, we simulate multi-agent environments containing up to one million LLaVA-1.5 agents, and employ randomized pair-wise chat as a proof-of-concept instantiation for multi-agent interaction. Our results show that feeding an (infectious) adversarial image into the memory of any randomly chosen agent is sufficient to achieve infectious jailbreak. Finally, we derive a simple principle for determining whether a defense mechanism can provably restrain the spread of infectious jailbreak, but how to design a practical defense that meets this principle remains an open question to investigate. Our project page is available at //sail-sg.github.io/Agent-Smith/.
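
The "exponentially fast" claim under randomized pair-wise chat can be checked with an idealized model: agents are shuffled and paired each round, and an infected agent always infects its partner. Because every agent is in at most one pair per round, the infected count can at most double, so on the order of log2(N) rounds reach most of N agents. Transmission probability 1 and the function name are simplifying assumptions; the paper's agents pass an adversarial image through memory and retrieval, which this toy does not model.

```python
import random

def simulate_infectious_spread(num_agents, max_rounds, seed=0):
    """Idealized infectious jailbreak: random pairing each round, transmission prob. 1.

    Returns the number of infected agents after each round (index 0 = initial state).
    """
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True  # the adversary jailbreaks one agent (the index is arbitrary)
    history = [1]
    for _ in range(max_rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        # Disjoint pairs: each agent chats with at most one partner per round
        # (with an odd count, the last agent in the shuffled order sits out).
        for a, b in zip(order[0::2], order[1::2]):
            if infected[a] or infected[b]:
                infected[a] = infected[b] = True
        history.append(sum(infected))
        if history[-1] == num_agents:
            break
    return history

counts = simulate_infectious_spread(num_agents=1024, max_rounds=60)
```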

Layer · Robustness · Inference · Data augmentation · Learning

February 12, 2024

Improving Robustness via Tilted Exponential Layer: A Communication-Theoretic Perspective

Bhagyashree Puranik, Ahmad Beirami, Yao Qin, Upamanyu Madhow

From arXiv: This manuscript has been accepted for publication at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.

State-of-the-art techniques for enhancing the robustness of deep networks mostly rely on empirical risk minimization with suitable data augmentation. In this paper, we propose a complementary approach motivated by communication theory, aimed at enhancing the signal-to-noise ratio at the output of a neural network layer via neural competition during learning and inference. In addition to minimizing a standard end-to-end cost, neurons compete to sparsely represent layer inputs by maximizing a tilted exponential (TEXP) objective function for the layer. TEXP learning can be interpreted as maximum likelihood estimation of matched filters under a Gaussian model for data noise. Inference in a TEXP layer is accomplished by replacing batch norm with a tilted softmax, which can be interpreted as computing posterior probabilities for the competing signaling hypotheses represented by each neuron. After providing insights via simplified models, we show, through experiments on standard image datasets, that TEXP learning and inference enhance robustness against noise and other common corruptions without requiring data augmentation. Further cumulative gains in robustness against this array of distortions can be obtained by appropriately combining TEXP with data augmentation techniques.
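
To illustrate the two TEXP ingredients the abstract names, the sketch below computes matched-filter responses (cosine similarity between the input and each neuron's weight vector), a tilted softmax over those responses, and a tilted objective of the form (1/t) log(mean(exp(t z))). That log-sum-exp form is borrowed from tilted empirical risk minimization and, like the cosine normalization, is an assumption here; the paper's exact objective may differ. A larger tilt `t` pushes the softmax toward winner-take-all competition among neurons.

```python
import math

def matched_filter_responses(x, weights):
    """Cosine similarity between input x and each neuron's weight vector."""
    nx = math.sqrt(sum(v * v for v in x))
    out = []
    for w in weights:
        nw = math.sqrt(sum(v * v for v in w))
        out.append(sum(a * b for a, b in zip(w, x)) / (nw * nx))
    return out

def tilted_softmax(z, t):
    """exp(t * z_j) / sum_k exp(t * z_k), computed stably by subtracting max(z)."""
    m = max(z)
    exps = [math.exp(t * (v - m)) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def tilted_objective(z, t):
    """(1/t) * log(mean(exp(t * z))): tends to mean(z) as t -> 0 and max(z) as t -> inf."""
    m = max(z)
    return m + math.log(sum(math.exp(t * (v - m)) for v in z) / len(z)) / t
```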

Optimizer · Functional · Similarity · Latent · Loss

February 9, 2024

AdvART: Adversarial Art for Camouflaged Object Detection Attacks

Amira Guesmi, Ioan Marius Bilasco, Muhammad Shafique, Ihsen Alouani

Physical adversarial attacks pose a significant practical threat because they deceive deep learning systems operating in the real world by producing prominent, maliciously designed physical perturbations. Evaluating naturalness is crucial in such attacks, as humans can readily detect and eliminate unnatural manipulations. To overcome this limitation, recent work has proposed leveraging generative adversarial networks (GANs) to generate naturalistic patches that are less likely to catch human attention. However, these approaches suffer from a limited latent space, which leads to an inevitable trade-off between naturalness and attack efficiency. In this paper, we propose a novel approach to generate naturalistic and inconspicuous adversarial patches. Specifically, we redefine the optimization problem by introducing an additional loss term to the cost function. This term acts as a semantic constraint to ensure that the generated camouflage pattern holds semantic meaning rather than being an arbitrary pattern. The additional term leverages similarity metrics to construct a similarity loss that we optimize within the global objective function. Our technique directly manipulates the pixel values in the patch, which gives higher flexibility and a larger search space than GAN-based techniques, which optimize the patch indirectly by modifying the latent vector. Compared to the GAN-based technique, our attack achieves superior success rates of up to 91.19% in the digital world and 72% when deployed in smart cameras at the edge.
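
The core idea of adding a similarity term to the attack objective and optimizing pixel values directly, rather than a GAN latent vector, can be sketched with projected gradient descent. Here the "detector" is a hypothetical linear score the attacker wants to lower, and mean squared error to a reference image stands in for the paper's similarity metric; both are assumptions made to keep the example self-contained. The weight `lam` trades attack strength against staying close to the semantic reference, and clipping keeps pixels in [0, 1].

```python
def patch_loss(patch, det_weights, ref, lam):
    """Hypothetical linear detection score plus lam * mean squared distance to the reference."""
    n = len(patch)
    detection = sum(w * p for w, p in zip(det_weights, patch))
    similarity = sum((p - r) ** 2 for p, r in zip(patch, ref)) / n
    return detection + lam * similarity

def optimize_patch(patch, det_weights, ref, lam=1.0, lr=0.1, steps=100):
    """Projected gradient descent directly on pixel values, clipped to [0, 1]."""
    n = len(patch)
    p = list(patch)
    for _ in range(steps):
        # Gradient of the linear score plus the gradient of the MSE similarity term.
        grad = [w + lam * 2.0 * (pi - r) / n for w, pi, r in zip(det_weights, p, ref)]
        p = [min(1.0, max(0.0, pi - lr * g)) for pi, g in zip(p, grad)]
    return p

det_weights = [0.5, -0.2, 0.3, 0.1]  # hypothetical detector
ref = [0.2, 0.8, 0.5, 0.5]           # hypothetical semantic reference image
init = [0.5] * 4
close = optimize_patch(init, det_weights, ref, lam=10.0)
loose = optimize_patch(init, det_weights, ref, lam=0.1)
```

A large `lam` keeps the patch near the reference image; a small one lets the attack term push pixels to the box limits.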

Agent · Conformer · Estimation/Estimator · MoDELS · Learning

February 9, 2024

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Nikunj Gupta, Somjit Nath, Samira Ebrahimi Kahou

Before taking actions in an environment with more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and utilizing a notion of a guarantee or confidence about the behavior of the system. In this article, we propose a novel multi-agent reinforcement learning (MARL) algorithm, CAMMARL, which models the actions of other agents in different situations in the form of confidence sets, i.e., sets containing their true actions with high probability. We then use these estimates to inform an agent's decision-making. To estimate such sets, we use conformal prediction, which not only yields an estimate of the most probable outcome but also quantifies the operable uncertainty. For instance, we can predict a set that provably covers the true outcome with high probability (e.g., 95%). Through several experiments in two fully cooperative multi-agent tasks, we show that CAMMARL elevates the capabilities of an autonomous agent in MARL by modeling conformal prediction sets over the behavior of other agents in the environment and utilizing such estimates to enhance its policy learning.
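
CAMMARL's confidence sets come from conformal prediction. A minimal split-conformal sketch for a discrete action space: score each calibration example by one minus the predicted probability of the action the other agent actually took, take the ⌈(n+1)(1-α)⌉-th smallest score as a threshold, and include every action whose score falls within it. Under exchangeability, such sets contain the true action with probability at least 1-α. The calibration data and function names are hypothetical, and how CAMMARL feeds these sets into policy learning is not shown.

```python
import math

def conformal_threshold(cal_probs, cal_actions, alpha):
    """Split-conformal threshold from calibration data.

    cal_probs[i] is a predicted distribution over actions; cal_actions[i] is the
    action the other agent actually took. Nonconformity = 1 - prob(true action).
    """
    scores = sorted(1.0 - probs[a] for probs, a in zip(cal_probs, cal_actions))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too little calibration data: the set must contain every action
    return scores[k - 1]

def prediction_set(probs, threshold):
    """All actions whose nonconformity score is within the threshold."""
    return [a for a, p in enumerate(probs) if 1.0 - p <= threshold]

cal_probs = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.7, 0.1]]
cal_actions = [0, 0, 1, 1]
q = conformal_threshold(cal_probs, cal_actions, alpha=0.25)
print(prediction_set([0.5, 0.45, 0.05], q))  # [0, 1]
```

With only four calibration points the threshold is coarse; a smaller α or more uncertain predictions yield larger sets.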

Origin · Approximation · MoDELS · Inference · Decoding

February 8, 2024

NeRCC: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems

Parsa Moradi, Mohammad Ali Maddah-Ali

Resilience against stragglers is a critical element of prediction serving systems, which execute inference on input data for a pre-trained machine-learning model. In this paper, we propose NeRCC, a general straggler-resistant framework for approximate coded computing. NeRCC includes three layers: (1) encoding regression and sampling, which generates coded data points as combinations of the original data points; (2) computing, in which a cluster of workers runs inference on the coded data points; and (3) decoding regression and sampling, which approximately recovers the predictions for the original data points from the available predictions on the coded data points. We argue that the overall objective of the framework reveals an underlying interconnection between the two regression models in the encoding and decoding layers. We propose a solution to this nested regression problem by summarizing their dependence on two regularization terms that are jointly optimized. Our extensive experiments on different datasets and various machine learning models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate that NeRCC accurately approximates the original predictions across a wide range of straggler counts, outperforming the state of the art by up to 23%.
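
NeRCC's three layers can be mimicked in a few lines. This sketch swaps in plain polynomials for the paper's regularized regression models: the encoder interpolates the K data points at fixed nodes α_k and samples that curve at N worker nodes β_n; workers apply the model to the coded points; stragglers are dropped; and the decoder fits a least-squares polynomial through the surviving results and reads it off at the α_k. Node placements, the decoder degree, and every function name are assumptions for illustration.

```python
def lagrange_eval(xs, ys, z):
    """Evaluate the interpolating polynomial through (xs, ys) at z (encoding step)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (z - xj) / (xi - xj)
        total += term
    return total

def polyfit_least_squares(xs, ys, deg):
    """Fit coefficients c_0..c_deg via the normal equations (decoding regression)."""
    m = deg + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, m))) / a[i][i]
    return coeffs

def poly_eval(coeffs, z):
    return sum(c * z ** i for i, c in enumerate(coeffs))

def coded_predictions(data, model, num_workers, stragglers, deg):
    """Encode -> compute on surviving workers -> decode, for scalar data points."""
    k = len(data)
    alphas = [-1.0 + 2.0 * i / (k - 1) for i in range(k)]
    betas = [-1.0 + 2.0 * (n + 0.5) / num_workers for n in range(num_workers)]
    coded = [lagrange_eval(alphas, data, beta) for beta in betas]       # encode
    done = [n for n in range(num_workers) if n not in stragglers]      # survivors
    outputs = [model(coded[n]) for n in done]                          # compute
    coeffs = polyfit_least_squares([betas[n] for n in done], outputs, deg)  # decode
    return [poly_eval(coeffs, alpha) for alpha in alphas]

data = [0.1, 0.4, -0.2, 0.3]
approx = coded_predictions(data, lambda v: 2.0 * v + 1.0,
                           num_workers=12, stragglers={1, 5, 9}, deg=3)
```

With a linear model the decoded predictions match the uncoded ones; replacing it with a nonlinear model (for example a `tanh`) makes recovery only approximate, which is the regime NeRCC targets.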

Attention mechanism · Cognition · Performer · Deep learning · Boosting (an ensemble learning method)

April 16, 2022

Visual Attention Methods in Deep Learning: An In-Depth Survey

Mohammed Hassanin, Saeed Anwar, Ibrahim Radwan, Fahad S Khan, Ajmal Mian

Inspired by the human cognitive system, attention is a mechanism that imitates human cognitive awareness of specific information, amplifying critical details to focus on the essential aspects of data. Deep learning has employed attention to boost performance in many applications. Interestingly, the same attention design can be used to process different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated into one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers cover only a single category, self-attention, out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques, categorizing them by their most prominent features. We begin by introducing the fundamental concepts behind the success of the attention mechanism. Next, we furnish some essentials, such as the strengths and limitations of each attention category, and describe their fundamental building blocks, basic formulations with primary usage, and applications, specifically for computer vision. We also discuss the challenges and open questions related to attention mechanisms in general. Finally, we recommend possible future research directions for deep attention.
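
Among the attention categories the survey covers, self-attention is the one the abstract associates with transformers. Its basic formulation, scaled dot-product attention softmax(QK^T / sqrt(d)) V, fits in a few lines of plain Python. This is the generic textbook form, not any specific method from the survey.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for lists of vectors (one output row per query)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this query attends to each key
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs
```

Each output row is a convex combination of the value vectors, weighted by how well the query matches each key.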

Graph · INTERACT · Interpretability · Extensibility · Learned

December 16, 2021

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang

From arXiv: AAAI 2022

Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.
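
SGEITL's multihop graph transformer regularizes attention interaction among hops of the scene graph. One plausible reading, assumed here rather than taken from the paper, is a mask that lets each node attend only to nodes within a given number of hops. The sketch computes hop distances with breadth-first search and builds such a boolean mask, which could then gate attention scores. The node indices and edges in the test are hypothetical.

```python
from collections import deque

def hop_distances(num_nodes, edges):
    """All-pairs hop counts in an undirected graph via BFS (None = unreachable)."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for src in range(num_nodes):
        d = [None] * num_nodes
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist

def hop_mask(num_nodes, edges, max_hops):
    """mask[i][j] is True when node j is within max_hops of node i."""
    return [[d is not None and d <= max_hops for d in row]
            for row in hop_distances(num_nodes, edges)]
```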


北京阿比特科技有限公司

Registered address: Room 301-191, Floor 3, Building 2, No. 18 Yangfangdian Road, Haidian District, Beijing
