黄片一级在线视频播放,免费黄色视频地址,黄色无码高清视频网站

Recent advances in artificial intelligence promote a wide range of computer vision applications in many different domains. Digital cameras, acting as human eyes, can perceive fundamental object properties, such as shapes and colors, and can be further used for conducting high-level tasks, such as image classification, and object detections. Human perceptions have been widely recognized as the ground truth for training and evaluating computer vision models. However, in some cases, humans can be deceived by what they have seen. Well-functioned human vision relies on stable external lighting while unnatural illumination would influence human perception of essential characteristics of goods. To evaluate the illumination effects on human and computer perceptions, the group presents a novel dataset, the Food Vision Dataset (FVD), to create an evaluation benchmark to quantify illumination effects, and to push forward developments of illumination estimation methods for fair and reliable consumer acceptability prediction from food appearances. FVD consists of 675 images captured under 3 different power and 5 different temperature settings every alternate day for five such days.

相關內容

Vision

關注 0

Learning · 估計/估計量 · 強化學習 · 策略評估 · Marketplace ·

2022 年 10 月 24 日

A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets

Chengchun Shi,Runzhe Wan,Ge Song,Shikai Luo,Rui Song,Hongtu Zhu

The two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smart phones and internet of things, they have substantially transformed the transportation landscape of human beings. In this paper we consider large-scale fleet management in ride-sharing companies that involve multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in those studies because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high-dimensionality of state-action space. The proposed estimator works favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at //github.com/RunzheStat/CausalMARL.

INFORMS · 機器人 · 相同 · INTERACT · Performer ·

2022 年 10 月 24 日

If You Are Careful, So Am I! How Robot Communicative Motions Can Influence Human Approach in a Joint Task

Linda Lastrico,Nuno Ferreira Duarte,Alessandro Carfì,Francesco Rea,Fulvio Mastrogiovanni,Alessandra Sciutti,José Santos-Victor

from arxiv, Accepted to ICSR 2022, 14th International Conference on Social Robotics, preprint version

As humans, we have a remarkable capacity for reading the characteristics of objects only by observing how another person carries them. Indeed, how we perform our actions naturally embeds information on the item features. Collaborative robots can achieve the same ability by modulating the strategy used to transport objects with their end-effector. A contribution in this sense would promote spontaneous interactions by making an implicit yet effective communication channel available. This work investigates if humans correctly perceive the implicit information shared by a robotic manipulator through its movements during a dyadic collaboration task. Exploiting a generative approach, we designed robot actions to convey virtual properties of the transported objects, particularly to inform the partner if any caution is required to handle the carried item. We found that carefulness is correctly interpreted when observed through the robot movements. In the experiment, we used identical empty plastic cups; nevertheless, participants approached them differently depending on the attitude shown by the robot: humans change how they reach for the object, being more careful whenever the robot does the same. This emerging form of motor contagion is entirely spontaneous and happens even if the task does not require it.

Learning · Performer · 解析梯度 · Liquid · 可理解性 ·

2022 年 10 月 24 日

Benchmarking Deformable Object Manipulation with Differentiable Physics

Siwei Chen,Cunjun Yu,Yiqing Xu,Linfeng Li,Xiao Ma,Zhongwen Xu,David Hsu

Deformable Object Manipulation (DOM) is of significant importance to both daily and industrial applications. Recent successes in differentiable physics simulators allow learning algorithms to train a policy with analytic gradients through environment dynamics, which significantly facilitates the development of DOM algorithms. However, existing DOM benchmarks are either single-object-based or non-differentiable. This leaves the questions of 1) how a task-specific algorithm performs on other tasks and 2) how a differentiable-physics-based algorithm compares with the non-differentiable ones in general. In this work, we present DaXBench, a differentiable DOM benchmark with a wide object and task coverage. DaXBench includes 9 challenging high-fidelity simulated tasks, covering rope, cloth, and liquid manipulation with various difficulty levels. To better understand the performance of general algorithms on different DOM tasks, we conduct comprehensive experiments over representative DOM methods, ranging from planning to imitation learning and reinforcement learning. In addition, we provide careful empirical studies of existing decision-making algorithms based on differentiable physics, and discuss their limitations, as well as potential future directions.

Guidance · 無監督 · 變換 · state-of-the-art · HTTPS ·

2022 年 10 月 24 日

Foreground Guidance and Multi-Layer Feature Fusion for Unsupervised Object Discovery with Transformers

Zhiwei Lin,Zengyu Yang,Yongtao Wang

from arxiv, Accepted in WACV 2023

Unsupervised object discovery (UOD) has recently shown encouraging progress with the adoption of pre-trained Transformer features. However, current methods based on Transformers mainly focus on designing the localization head (e.g., seed selection-expansion and normalized cut) and overlook the importance of improving Transformer features. In this work, we handle UOD task from the perspective of feature enhancement and propose FOReground guidance and MUlti-LAyer feature fusion for unsupervised object discovery, dubbed FORMULA. Firstly, we present a foreground guidance strategy with an off-the-shelf UOD detector to highlight the foreground regions on the feature maps and then refine object locations in an iterative fashion. Moreover, to solve the scale variation issues in object detection, we design a multi-layer feature fusion module that aggregates features responding to objects at different scales. The experiments on VOC07, VOC12, and COCO 20k show that the proposed FORMULA achieves new state-of-the-art results on unsupervised object discovery. The code will be released at //github.com/VDIGPKU/FORMULA.

Principle · SimPLe · 泛函 · 粵港澳大灣區數字經濟研究院 · 講稿 ·

2022 年 10 月 24 日

Logical Relations for Partial Features and Automatic Differentiation Correctness

Fernando Lucatelli Nunes,Matthijs Vákár

from arxiv, 25 pages (18 pages + references and appendices), conference paper (the corresponding extended work can be found at arXiv:2210.07724), submitted to FoSSaCS. arXiv admin note: substantial text overlap with arXiv:2210.07724

We present a simple technique for semantic, open logical relations arguments about languages with recursive types, which, as we show, follows from a principled foundation in categorical semantics. We demonstrate how it can be used to give a very straightforward proof of correctness of practical forward- and reverse-mode dual numbers style automatic differentiation (AD) on ML-family languages. The key idea is to combine it with a suitable open logical relations technique for reasoning about differentiable partial functions (a suitable lifting of the partiality monad to logical relations), which we introduce.

泛函 · 貝葉斯網/貝葉斯網絡 · Networking · MoDELS · 有向非循環圖 ·

2022 年 10 月 23 日

Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data

Fangting Zhou,Kejun He,Kunbo Wang,Yanxun Xu,Yang Ni

Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest, which has not yet been fully explored. In this article, we develop a novel Bayesian network model for multivariate functional data where the conditional independence and causal structure are both encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian process, which is adopted by most existing functional data analysis models. The more reasonable non-Gaussian assumption is the key for unique causal structure identification even when the functions are measured with noises. A fully Bayesian framework is designed to infer the functional Bayesian network model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples are used to demonstrate the practical utility of the proposed model.

INTERACT · 傳感器 · 設計 · Extensibility · 模態 ·

2022 年 10 月 22 日

A Design Space for Human Sensor and Actuator Focused In-Vehicle Interaction Based on a Systematic Literature Review

Pascal Jansen,Mark Colley,Enrico Rukzio

from arxiv, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Automotive user interfaces constantly change due to increasing automation, novel features, additional applications, and user demands. While in-vehicle interaction can utilize numerous promising modalities, no existing overview includes an extensive set of human sensors and actuators and interaction locations throughout the vehicle interior. We conducted a systematic literature review of 327 publications leading to a design space for in-vehicle interaction that outlines existing and lack of work regarding input and output modalities, locations, and multimodal interaction. To investigate user acceptance of possible modalities and locations inferred from existing work and gaps unveiled in our design space, we conducted an online study (N=48). The study revealed users' general acceptance of novel modalities (e.g., brain or thermal activity) and interaction with locations other than the front (e.g., seat or table). Our work helps practitioners evaluate key design decisions, exploit trends, and explore new areas in the domain of in-vehicle interaction.

Guidance · 3D · 穩健性 · state-of-the-art · 損失函數（機器學習） ·

2022 年 10 月 21 日

Self-Supervised Robustifying Guidance for Monocular 3D Face Reconstruction

Hitika Tiwari,Min-Hung Chen,Yi-Min Tsai,Hsien-Kai Kuo,Hung-Jen Chen,Kevin Jou,K. S. Venkatesh,Yong-Sheng Chen

from arxiv, Accepted by The 33rd British Machine Vision Conference (BMVC) 2022. Evaluation code and datasets: //github.com/ArcTrinity9/Datasets-ReaChOcc-and-SynChOcc

Despite the recent developments in 3D Face Reconstruction from occluded and noisy face images, the performance is still unsatisfactory. Moreover, most existing methods rely on additional dependencies, posing numerous constraints over the training procedure. Therefore, we propose a Self-Supervised RObustifying GUidancE (ROGUE) framework to obtain robustness against occlusions and noise in the face images. The proposed network contains 1) the Guidance Pipeline to obtain the 3D face coefficients for the clean faces and 2) the Robustification Pipeline to acquire the consistency between the estimated coefficients for occluded or noisy images and the clean counterpart. The proposed image- and feature-level loss functions aid the ROGUE learning process without posing additional dependencies. To facilitate model evaluation, we propose two challenging occlusion face datasets, ReaChOcc and SynChOcc, containing real-world and synthetic occlusion-based face images for robustness evaluation. Also, a noisy variant of the test dataset of CelebA is produced for evaluation. Our method outperforms the current state-of-the-art method by large margins (e.g., for the perceptual errors, a reduction of 23.8% for real-world occlusions, 26.4% for synthetic occlusions, and 22.7% for noisy images), demonstrating the effectiveness of the proposed approach. The occlusion datasets and the corresponding evaluation code are released publicly at //github.com/ArcTrinity9/Datasets-ReaChOcc-and-SynChOcc.

圖像字幕 · state-of-the-art · Vision · 可辨認的 · 語言模型化 ·

2021 年 7 月 14 日

From Show to Tell: A Survey on Image Captioning

Matteo Stefanini,Marcella Cornia,Lorenzo Baraldi,Silvia Cascianelli,Giuseppe Fiameni,Rita Cucchiara

Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, in the last few years, a large research effort has been devoted to image captioning, i.e. the task of describing images with syntactically and semantically meaningful sentences. Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoding step and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, and relationships and the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, regardless of the impressive results obtained, research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview and categorization of image captioning approaches, from visual encoding and text generation to training strategies, used datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in image captioning architectures and training strategies. Moreover, many variants of the problem and its open challenges are analyzed and discussed. The final goal of this work is to serve as a tool for understanding the existing state-of-the-art and highlighting the future directions for an area of research where Computer Vision and Natural Language Processing can find an optimal synergy.

可理解性 · 邊界框 · RGB-D · 相互獨立的 · 3D ·

2020 年 2 月 27 日

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Yinyu Nie,Xiaoguang Han,Shihui Guo,Yujian Zheng,Jian Chang,Jian Jun Zhang

from arxiv, Accepted by CVPR 2020

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.