
MoDELS · Masking · CLIP · 3D · Sparse

January 19, 2024

Hierarchical Masked 3D Diffusion Model for Video Outpainting

Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan

from arxiv, Accepted to ACM MM 2023

Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling strategy, which brings degradation because the time interval of the sparse frames is too large. Our pipeline benefits from bidirectional learning of the mask modeling and thus can employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results in video outpainting tasks. More results and code are provided at //fanfanda.github.io/M3DDM/.

Related content

MoDELS

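The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Attendees come from diverse backgrounds, including researchers, academics, engineers, and industrial professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition gives the modeling community an opportunity to further advance the foundations of modeling and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.

Decoding · Convolution · Notability · INFORMS

February 29, 2024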

Effective Message Hiding with Order-Preserving Mechanisms

Gao Yu, Qiu Xuchong, Ye Zihan

from arxiv, 7 Pages

Message hiding, a technique that conceals secret message bits within a cover image, aims to achieve an optimal balance among message capacity, recovery accuracy, and imperceptibility. While convolutional neural networks have notably improved message capacity and imperceptibility, achieving high recovery accuracy remains challenging. This challenge arises because convolutional operations struggle to preserve the sequential order of message bits and effectively address the discrepancy between these two modalities. To address this, we propose StegaFormer, an innovative MLP-based framework designed to preserve bit order and enable global fusion between modalities. Specifically, StegaFormer incorporates three crucial components: Order-Preserving Message Encoder (OPME), Decoder (OPMD) and Global Message-Image Fusion (GMIF). OPME and OPMD aim to preserve the order of message bits by segmenting the entire sequence into equal-length segments and incorporating sequential information during encoding and decoding. Meanwhile, GMIF employs a cross-modality fusion mechanism to effectively fuse the features from the two uncorrelated modalities. Experimental results on the COCO and DIV2K datasets demonstrate that StegaFormer surpasses existing state-of-the-art methods in terms of recovery accuracy, message capacity, and imperceptibility. We will make our code publicly available.
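The order-preserving idea in the abstract above (splitting the message into equal-length segments that carry sequential information) can be illustrated with a minimal sketch. This is not StegaFormer's encoder: the function names `segment_bits` and `reassemble` are hypothetical, and the "sequential information" is assumed here to be a plain integer segment index rather than a learned positional embedding.

```python
from typing import List, Tuple


def segment_bits(bits: List[int], segment_len: int) -> List[Tuple[int, List[int]]]:
    """Split a bit sequence into equal-length segments, each tagged with its position."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if len(bits) % segment_len != 0:
        raise ValueError("bit sequence length must be a multiple of segment_len")
    return [
        (start // segment_len, bits[start:start + segment_len])
        for start in range(0, len(bits), segment_len)
    ]


def reassemble(segments: List[Tuple[int, List[int]]]) -> List[int]:
    """Recover the original bit order from segments, using the position tags."""
    ordered = sorted(segments, key=lambda item: item[0])
    return [bit for _, segment in ordered for bit in segment]


message = [1, 0, 1, 1, 0, 0, 1, 0]
segments = segment_bits(message, segment_len=4)
# Even if segments arrive shuffled, the position tags restore the bit order.
recovered = reassemble(list(reversed(segments)))
```

Because each segment carries its own index, the order of the bits survives any permutation of the segments, which is the property the abstract says convolutional operations struggle to preserve.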

Networking · Dataset · Pairwise · Lecture notes · state-of-the-art

February 29, 2024

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Alexander Black, Jing Shi, Yifei Fai, Tu Bui, John Collomosse

from arxiv, AAAI 2024

We present VIXEN - a technique that succinctly summarizes in text the visual differences between a pair of images in order to highlight any content manipulation present. Our proposed network linearly maps image features in a pairwise manner, constructing a soft prompt for a pretrained large language model. We address the challenge of low volume of training data and lack of manipulation variety in existing image difference captioning (IDC) datasets by training on synthetically manipulated images from the recent InstructPix2Pix dataset generated via prompt-to-prompt editing framework. We augment this dataset with change summaries produced via GPT-3. We show that VIXEN produces state-of-the-art, comprehensible difference captions for diverse image contents and edit types, offering a potential mitigation against misinformation disseminated via manipulated image content. Code and data are available at //github.com/alexblck/vixen

Learning · Ground truth · INFORMS · Dataset · Sliding window

February 29, 2024

Atmospheric Turbulence Removal with Video Sequence Deep Visual Priors

P. Hill, N. Anantrasirichai, A. Achim, D. R. Bull

Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific content. In this paper, we address these problems with a self-supervised learning method that does not require ground truth. The proposed method is not dependent on any dataset outside of the single data sequence being processed but is also able to improve the quality of any input raw sequences or pre-processed sequences. Specifically, our method is based on an accelerated Deep Image Prior (DIP), but integrates temporal information using pixel shuffling and a temporal sliding window. This efficiently learns spatio-temporal priors leading to a system that effectively mitigates atmospheric turbulence distortions. The experiments show that our method improves visual quality results qualitatively and quantitatively.

Color · MoDELS · Data visualization · Constraints · EASE

February 29, 2024

Accessible Color Sequences for Data Visualization

Matthew A. Petroff

from arxiv, 26 pages, 4 figures, 4 tables; comments welcome

Color sequences, ordered sets of colors for data visualization, that balance aesthetics with accessibility considerations are presented. In order to model aesthetic preference, data were collected with an online survey, and the results were used to train a machine-learning model. To ensure accessibility, this model was combined with minimum-perceptual-distance constraints, including for simulated color-vision deficiencies, as well as with minimum-lightness-distance constraints for grayscale printing, maximum-lightness constraints for maintaining contrast with a white background, and scores from a color-saliency model for ease of use of the colors in verbal and written descriptions. Optimal color sequences containing six, eight, and ten colors were generated using the data-driven aesthetic-preference model and accessibility constraints. Due to the balance of aesthetics and accessibility considerations, the resulting color sequences can serve as reasonable defaults in data-plotting codes, e.g., for use in scatter plots and line plots.
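As a rough illustration of the minimum-lightness-distance constraint for grayscale printing mentioned above, the sketch below checks that every pair of colors in a candidate sequence differs in lightness by at least a threshold. It is an assumption-laden stand-in: it uses sRGB relative luminance rather than whatever perceptual color space the paper uses, and the function names and the threshold value are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def relative_luminance(rgb: RGB) -> float:
    """sRGB relative luminance in [0, 1] for an 8-bit RGB triple."""
    def linearize(channel: int) -> float:
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def satisfies_min_lightness_gap(colors: Sequence[RGB], min_gap: float) -> bool:
    """True if every pair of colors differs in luminance by at least min_gap."""
    luminances = [relative_luminance(color) for color in colors]
    return all(abs(a - b) >= min_gap for a, b in combinations(luminances, 2))


# Hypothetical candidate sequence and threshold, for illustration only.
candidate = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
ok = satisfies_min_lightness_gap(candidate, min_gap=0.1)
```

A check of this kind would act as a hard filter on candidate sequences, with the aesthetic-preference model then ranking whatever survives.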

Microsoft Surface · Estimation/Estimator · Normalized · Vision · Unstructured

February 28, 2024

Leveraging Compliant Tactile Perception for Haptic Blind Surface Reconstruction

Laurent Yves Emile Ramos Cheret, Vinicius Prado da Fonseca, Thiago Eustaquio Alves de Oliveira

from arxiv, 7 pages, 9 figures, 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

Non-flat surfaces pose difficulties for robots operating in unstructured environments. Reconstructions of uneven surfaces may only be partially possible due to non-compliant end-effectors and limitations on vision systems such as transparency, reflections, and occlusions. This study achieves blind surface reconstruction by harnessing the robotic manipulator's kinematic data and a compliant tactile sensing module, which incorporates inertial, magnetic, and pressure sensors. The module's flexibility enables us to estimate contact positions and surface normals by analyzing its deformation during interactions with unknown objects. While previous works collect only positional information, we include the local normals in a geometrical approach to estimate curvatures between adjacent contact points. These parameters then guide a spline-based patch generation, which allows us to recreate larger surfaces without an increase in complexity while reducing the time-consuming step of probing the surface. Experimental validation demonstrates that this approach outperforms an off-the-shelf vision system in estimation accuracy. Moreover, this compliant haptic method works effectively even when the manipulator's approach angle is not aligned with the surface normals, which is ideal for unknown non-flat surfaces.

Trace · Online · on the fly · Vectorization · Compiler

February 28, 2024

Sound Concurrent Traces for Online Monitoring Technical Report

Chukri Soueidi, Ylies Falcone

Monitoring concurrent programs typically relies on collecting traces to abstract program executions. However, existing approaches targeting general behavioral properties are either not tailored for online monitoring, are no longer maintained, or implement naive instrumentation that often leads to unsound verdicts. We first define the notion of when a trace is representative of a concurrent execution. We then present a non-blocking vector clock algorithm to collect sound concurrent traces on the fly reflecting the partial order between events. Moreover, concurrent events in the representative trace pose a soundness problem for monitors synthesized from total order formalisms. For this, we extract a causal dependence relation from the monitor to check if the trace has the needed orderings and define the conditions to decide at runtime when a collected trace is monitorable. We implement our contributions in a tool, FACTS, which instruments programs compiling to Java bytecode, constructs sound representative traces, and warns the monitor about non-monitorable traces. We evaluate our work and compare it with existing approaches.
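For readers unfamiliar with vector clocks, the sketch below shows the textbook mechanism for recovering a partial order between events. It is not the non-blocking algorithm from the report and not FACTS's instrumentation: the class and function names are hypothetical, and threads are modeled as plain objects exchanging timestamps.

```python
from typing import List


class VectorClock:
    """Textbook vector clock for one of n threads (illustrative only)."""

    def __init__(self, n_threads: int, thread_id: int) -> None:
        self.thread_id = thread_id
        self.clock: List[int] = [0] * n_threads

    def tick(self) -> List[int]:
        """Record a local event and return its timestamp."""
        self.clock[self.thread_id] += 1
        return list(self.clock)

    def receive(self, other: List[int]) -> List[int]:
        """Merge a timestamp received from another thread, then record the event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    """True if the event stamped a causally precedes the event stamped b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


t0, t1 = VectorClock(2, 0), VectorClock(2, 1)
e1 = t0.tick()          # local event on thread 0
e2 = t1.tick()          # local event on thread 1, unordered with e1
msg = t0.tick()         # thread 0 publishes its timestamp (e.g. on a lock release)
e3 = t1.receive(msg)    # thread 1 synchronizes, so e3 is ordered after msg
```

Events such as `e1` and `e2` are the kind of concurrent pairs that, per the abstract, cause trouble for monitors synthesized from total-order formalisms.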

Directed · Extensibility · Processing (programming language) · MoDELS · Compiler

December 16, 2020

Communicative Message Passing for Inductive Relation Reasoning

Sijie Mai, Shuangjia Zheng, Yuedong Yang, Haifeng Hu

from arxiv, Accepted by AAAI-2021

Relation prediction for knowledge graphs aims at predicting missing relationships between entities. Despite the importance of inductive relation prediction, most previous works are limited to a transductive setting and cannot process previously unseen entities. The recently proposed subgraph-based relation reasoning models provided alternatives to predict links from the subgraph structure surrounding a candidate triplet inductively. However, we observe that these methods often neglect the directed nature of the extracted subgraph and weaken the role of relation information in the subgraph modeling. As a result, they fail to effectively handle the asymmetric/anti-symmetric triplets and produce insufficient embeddings for the target triplets. To this end, we introduce a Communicative Message Passing neural network for Inductive reLation rEasoning, CoMPILE, that reasons over local directed subgraph structures and has a vigorous inductive bias to process entity-independent semantic relations. In contrast to existing models, CoMPILE strengthens the message interactions between edges and entities through a communicative kernel and enables a sufficient flow of relation information. Moreover, we demonstrate that CoMPILE can naturally handle asymmetric/anti-symmetric relations without the need for explosively increasing the number of model parameters by extracting the directed enclosing subgraphs. Extensive experiments show substantial performance gains in comparison to state-of-the-art methods on commonly used benchmark datasets with variant inductive settings.

Unsupervised · Representation learning · Loss function (machine learning) · Learned · Unlabeled

February 26, 2020

Evolving Losses for Unsupervised Video Representation Learning

AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo

from arxiv, arXiv admin note: text overlap with arXiv:1906.03248

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss function evolution by using an evolutionary search algorithm to automatically find an optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Thirdly, we propose an unsupervised representation evaluation metric using distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces similar results to weakly-supervised, task-specific ones. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., ImageNet), with the exception of large, fully labeled video datasets.
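The Zipf-based evaluation idea can be sketched as a distribution-matching score: group the learned representations into clusters, then measure how far the cluster-size distribution is from a Zipf prior. The sketch below assumes cluster sizes are already available and uses KL divergence as the mismatch measure; the function names, the exponent `s`, and the direction of the divergence are assumptions made for illustration, not details taken from the abstract.

```python
import math
from typing import List, Sequence


def zipf_prior(k: int, s: float = 1.0) -> List[float]:
    """Zipf distribution over ranks 1..k with exponent s."""
    weights = [1.0 / (rank ** s) for rank in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def zipf_mismatch(cluster_sizes: Sequence[int], s: float = 1.0) -> float:
    """Lower is better: how far ranked cluster sizes are from a Zipf prior."""
    ranked = sorted(cluster_sizes, reverse=True)
    total = sum(ranked)
    empirical = [size / total for size in ranked]
    return kl_divergence(empirical, zipf_prior(len(ranked), s))


# Hypothetical cluster sizes: the first follows 1 : 1/2 : 1/3, the second is uniform.
zipf_like = zipf_mismatch([6, 3, 2])
uniform = zipf_mismatch([4, 4, 4])
```

A score of this kind needs no labels, so it could serve as the fitness signal for an evolutionary search over loss combinations, which is the role the abstract describes for its unsupervised metric.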

Video captioning (Video Caption) · Hierarchical reinforcement learning · Learned · Reinforcement learning · state-of-the-art

March 29, 2018

Video Captioning via Hierarchical Reinforcement Learning

Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang

from arxiv, CVPR 2018, with supplementary material

Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g. sequence-to-sequence model) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved the state-of-the-art results on the widely-used MSR-VTT dataset.

Extensibility · Image captioning · Scene · Better · MoDELS

December 21, 2017

Exploring Models and Data for Remote Sensing Image Caption Generation

Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, Xuelong Li

from arxiv, 14 pages, 8 figures

Inspired by the recent development of artificial satellites, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at //github.com/2051/RSICD_optimal
