Reducible · Channel · state-of-the-art · INFORMS · Learning

January 31, 2024

Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators

Lifan Zhao,Yanyan Shen

from arxiv, Accepted to ICLR 2024. Preprint version

Recently, channel-independent methods have achieved state-of-the-art performance in multivariate time series (MTS) forecasting. Despite reducing overfitting risks, these methods miss potential opportunities to exploit channel dependence for accurate predictions. We argue that there exist locally stationary lead-lag relationships between variates, i.e., some lagged variates may follow the leading indicators within a short time period. Exploiting such channel dependence is beneficial since leading indicators offer advance information that can be used to reduce the forecasting difficulty of the lagged variates. In this paper, we propose a new method named LIFT that first efficiently estimates leading indicators and their leading steps at each time step and then judiciously allows the lagged variates to utilize the advance information from leading indicators. LIFT acts as a plugin that can be seamlessly combined with arbitrary time series forecasting methods. Extensive experiments on six real-world datasets demonstrate that LIFT improves the state-of-the-art methods by 5.5% in average forecasting performance.
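The lead-lag idea in this abstract can be illustrated with a plain cross-correlation search. This is only a minimal sketch: the abstract does not say how LIFT estimates leading steps, so the estimator below is an assumed stand-in, and the names `pearson` and `estimate_leading_step` as well as the toy series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def estimate_leading_step(leader, lagged, max_step):
    """Return (step, corr) for the shift in 1..max_step that maximises
    the correlation between leader[t - step] and lagged[t]."""
    best_step, best_corr = 0, float("-inf")
    for step in range(1, max_step + 1):
        # Align leader values with the lagged values observed `step` later.
        corr = pearson(leader[:-step], lagged[step:])
        if corr > best_corr:
            best_step, best_corr = step, corr
    return best_step, best_corr

# Toy data (hypothetical): `lagged` repeats `leader` three steps later.
leader = [math.sin(0.3 * t) for t in range(200)]
lagged = [0.0] * 3 + leader[:-3]
step, corr = estimate_leading_step(leader, lagged, max_step=10)
```

On this toy pair the search recovers a leading step of 3. A locally stationary version would rerun the search on a recent window at each time step rather than on the whole series.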

Related content

Reducible

MoDELS · State space · Attention · Layer · Performer

March 12, 2024

SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces

Yuta Oshima,Shohei Taniguchi,Masahiro Suzuki,Yutaka Matsuo

from arxiv, Accepted as workshop paper at ICLR 2024

Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which increases quadratically with the length of the sequence. This limitation presents significant challenges when attempting to generate longer video sequences using diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs). SSMs have recently gained attention as viable alternatives due to their linear memory consumption relative to sequence length. In the experiments, we first evaluate our SSM-based model with UCF101, a standard benchmark of video generation. In addition, to investigate the potential of SSMs for longer video generation, we perform an experiment using the MineRL Navigate dataset, varying the number of frames to 64 and 150. In these settings, our SSM-based model considerably reduces memory consumption for longer sequences, while maintaining FVD scores competitive with those of the attention-based models. Our code is available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.
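The memory argument can be made concrete with a toy comparison. The sketch below assumes a scalar linear state-space recurrence, which is far simpler than the structured SSM layers used in practice; `ssm_scan` and `attention_score_count` are hypothetical helper names, not part of the paper's code.

```python
def ssm_scan(xs, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    Only the current state h is carried between steps, so the working
    memory for the recurrence does not grow with the sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def attention_score_count(length):
    """Number of pairwise scores one full self-attention head forms."""
    return length * length

# An impulse decays geometrically through the state.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)  # [2.0, 1.0, 0.5, 0.25]

# Frame counts taken from the abstract's MineRL experiment.
scores_64 = attention_score_count(64)    # 4096
scores_150 = attention_score_count(150)  # 22500
```

Going from 64 to 150 frames multiplies the attention score matrix by roughly 5.5, while the recurrence still carries a single state.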

Regularization term · Performer · 3D · Ground truth · state-of-the-art

March 11, 2024

FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization

Jiahui Zhang,Fangneng Zhan,Muyu Xu,Shijian Lu,Eric Xing

3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.
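A one-dimensional toy version of a frequency-space discrepancy with a growing low-pass cutoff is sketched below. It is an illustration under assumptions: FreGS operates on 2D rendered images and the abstract does not give its exact loss, so `dft`, `low_pass_mask`, and `frequency_discrepancy` are hypothetical names and the signals are made up.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def low_pass_mask(n, cutoff):
    """1.0 for bins whose wrapped frequency index is <= cutoff, else 0.0."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]

def frequency_discrepancy(rendered, target, cutoff):
    """Sum of absolute spectrum differences over the kept bins, divided by length."""
    n = len(rendered)
    spec_r, spec_t = dft(rendered), dft(target)
    mask = low_pass_mask(n, cutoff)
    return sum(m * abs(a - b) for m, a, b in zip(mask, spec_r, spec_t)) / n

n = 16
# Target has a low-frequency and a high-frequency component.
target = [math.cos(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 6 * t / n)
          for t in range(n)]
# A "blurry" rendering that only captures the low-frequency part.
rendered = [math.cos(2 * math.pi * t / n) for t in range(n)]

coarse = frequency_discrepancy(rendered, target, cutoff=2)  # low frequencies only
fine = frequency_discrepancy(rendered, target, cutoff=8)    # all frequencies
```

With a small cutoff the blurry signal already matches the target, so the discrepancy is near zero. Raising the cutoff exposes the missing high-frequency detail, which is the coarse-to-fine behaviour the abstract describes.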

state-of-the-art · DeepFakes · MoDELS · Representation · Extensibility

March 11, 2024

Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection

Chuangchuang Tan,Ping Liu,RenShuai Tao,Huan Liu,Yao Zhao,Baoyuan Wu,Yunchao Wei

from arxiv, 12 pages, 3 figures

Recently, the proliferation of increasingly realistic synthetic images generated by various generative adversarial networks has increased the risk of misuse. Consequently, there is a pressing need to develop a generalizable detector for accurately recognizing fake images. The conventional methods rely on generating diverse training sources or large pretrained models. In this work, we show that, on the contrary, a small, training-free filter is sufficient to capture more general artifact representations. Because it is unbiased toward both the training and test sources, we define it as the Data-Independent Operator (DIO), which achieves appealing improvements on unseen sources. In our framework, handcrafted filters and a randomly initialized convolutional layer can be used as training-free artifact representation extractors with excellent results. With the data-independent operator of a popular classifier, such as ResNet50, one could already reach a new state-of-the-art without bells and whistles. We evaluate the effectiveness of the DIO on 33 generation models, including DALLE and Midjourney. Our detector achieves a remarkable improvement of 13.3%, establishing a new state-of-the-art performance. The DIO and its extension can serve as strong baselines for future methods. The code is available at https://github.com/chuangchuangtan/Data-Independent-Operator.
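What a training-free filter might look like is sketched below. The abstract does not name the filters it uses, so the Laplacian kernel is an assumed example of a handcrafted high-pass filter, and `conv2d_valid` and `random_kernel` are hypothetical helpers written for this illustration.

```python
import random

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Handcrafted high-pass kernel (assumption): fixed weights, never trained.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

def random_kernel(size, seed):
    """A randomly initialised kernel that stays frozen after creation."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]

# A flat region carries no high-frequency content, so the response is zero.
flat = [[5] * 5 for _ in range(5)]
flat_response = conv2d_valid(flat, LAPLACIAN)

# A single bright pixel produces a strong response.
spike = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
spike_response = conv2d_valid(spike, LAPLACIAN)  # [[-4]]
```

Because neither kernel sees any training data, its output cannot be biased toward a particular generator. In a detector along these lines, the filtered map rather than the raw image would be passed to the classifier.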

CLUES · Diversity · Inference · Correlation coefficient · Distillation

March 11, 2024

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues

Qilang Ye,Zitong Yu,Xin Liu

Audio-visual question answering (AVQA) requires reference to video content and auditory information, followed by correlating the question to predict the most precise answer. Although mining deeper layers of audio-visual information to interact with questions facilitates the multimodal fusion process, the redundancy of audio-visual parameters tends to reduce the generalization of the inference engine to multiple question-answer pairs in a single video. Indeed, the naturally heterogeneous relationship between audio-visual content and text makes perfect fusion challenging. To prevent high-level audio-visual semantics from weakening the network's adaptability to diverse question types, we propose a framework for performing mutual correlation distillation (MCD) to aid question inference. MCD is divided into three main steps: 1) First, the residual structure is utilized to enhance the audio-visual soft associations based on self-attention; then key local audio-visual features relevant to the question context are captured hierarchically by shared aggregators and coupled in the form of clues with specific question vectors. 2) Second, knowledge distillation is enforced to align audio-visual-text pairs in a shared latent space to narrow the cross-modal semantic gap. 3) Finally, the audio-visual dependencies are decoupled by discarding the decision-level integrations. We evaluate the proposed method on two publicly available datasets containing multiple question-and-answer pairs, i.e., Music-AVQA and AVQA. Experiments show that our method outperforms other state-of-the-art methods, and one interesting finding is that removing deep audio-visual features during inference can effectively mitigate overfitting. The source code is released at https://github.com/rikeilong/MCD-forAVQA.

Attention · DCN · Learning · Softmax · Estimation/Estimator

March 11, 2024

LVC-LGMC: Joint Local and Global Motion Compensation for Learned Video Compression

Wei Jiang,Junru Li,Kai Zhang,Li Zhang

from arxiv, Accepted to ICASSP 2024 (lecture presentation). The first attempt to use cross attention for bits-free motion estimation and motion compensation

Existing learned video compression models employ flow net or deformable convolutional networks (DCN) to estimate motion information. However, the limited receptive fields of flow net and DCN inherently direct their attentiveness towards the local contexts. Global contexts, such as large-scale motions and global correlations among frames, are ignored, presenting a significant bottleneck for capturing accurate motions. To address this issue, we propose a joint local and global motion compensation module (LGMC) for learned video coding. More specifically, we adopt flow net for local motion compensation. To capture global context, we employ cross attention in the feature domain for motion compensation. In addition, to avoid the quadratic complexity of vanilla cross attention, we divide the softmax operations in attention into two independent softmax operations, leading to linear complexity. To validate the effectiveness of our proposed LGMC, we integrate it with DCVC-TCM and obtain learned video compression with joint local and global motion compensation (LVC-LGMC). Extensive experiments demonstrate that our LVC-LGMC has significant rate-distortion performance improvements over baseline DCVC-TCM.
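One common way to split the attention softmax into two independent softmaxes is sketched below. The abstract does not state which axes LGMC normalizes over, so this follows a widely used "efficient attention" formulation as an assumption: softmax over features for each query, softmax over positions for each key feature, and the key-value product computed first. All function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax of a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def linear_cross_attention(q, k, v):
    """q: (n, d), k: (m, d), v: (m, e) -> output (n, e).

    No (n, m) score matrix is ever formed; the cost grows linearly in n and m.
    """
    q_norm = [softmax(row) for row in q]                        # over features, per query
    k_norm = transpose([softmax(col) for col in transpose(k)])  # over positions, per feature
    context = matmul(transpose(k_norm), v)                      # (d, e) global summary
    return matmul(q_norm, context)                              # (n, e)

q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [1.0, 2.0]]
out = linear_cross_attention(q, k, v)
```

The implied attention weights still sum to one per query, so each output row is a weighted average of the value rows. In the toy call every value row is `[1.0, 2.0]`, and each output row comes out as approximately `[1.0, 2.0]`.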

Agent · Large language model · MoDELS · Processing (programming language) · Object detection

March 10, 2024

Reframe Anything: LLM Agent for Open World Video Reframing

Jiawang Cao,Yongliang Wu,Weiheng Chi,Wenbo Zhu,Ziyue Su,Jay Wu

from arxiv, 14 pages, 6 figures

The proliferation of mobile devices and social media has revolutionized content dissemination, with short-form video becoming increasingly prevalent. This shift has introduced the challenge of video reframing to fit various screen aspect ratios, a process that highlights the most compelling parts of a video. Traditionally, video reframing is a manual, time-consuming task requiring professional expertise, which incurs high production costs. A potential solution is to adopt machine learning models, such as video salient object detection, to automate the process. However, these methods often lack generalizability due to their reliance on specific training data. The advent of powerful large language models (LLMs) opens new avenues for AI capabilities. Building on this, we introduce Reframe Any Video Agent (RAVA), an LLM-based agent that leverages visual foundation models and human instructions to restructure visual content for video reframing. RAVA operates in three stages: perception, where it interprets user instructions and video content; planning, where it determines aspect ratios and reframing strategies; and execution, where it invokes the editing tools to produce the final video. Our experiments validate the effectiveness of RAVA in video salient object detection and real-world reframing tasks, demonstrating its potential as a tool for AI-powered video editing.

Boosting (a method for accelerating model training) · Decoding · Representation · Benchmark · Performer

March 8, 2024

Boosting Neural Representations for Videos with a Conditional Decoder

Xinjie Zhang,Ren Yang,Dailan He,Xingtong Ge,Tongda Xu,Yan Wang,Hongwei Qin,Jun Zhang

from arxiv, Accepted by CVPR 2024

Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. Besides, we introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving reconstruction loss, our approach successfully boosts multiple baseline INRs in the reconstruction quality and convergence speed for video regression, and exhibits superior inpainting and interpolation results. Further, we integrate a consistent entropy minimization technique and develop video codecs based on these boosted INRs. Experiments on the UVG dataset confirm that our enhanced codecs significantly outperform baseline INRs and offer competitive rate-distortion performance compared to traditional and learning-based codecs.

Extensibility · GM · MoDELS · Category · Multi-agent model

February 9, 2021

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Lewis Hammond,James Fox,Tom Everitt,Alessandro Abate,Michael Wooldridge

from arxiv, Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.

Graph · Link prediction · Orthogonal · Knowledge graph · Better

April 15, 2020

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Yun Tang,Jing Huang,Guangtao Wang,Xiaodong He,Bowen Zhou

from arxiv, Accepted by ACL 2020

Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method is twofold. First, we extend RotatE from the 2D complex domain to a high-dimensional space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for the knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on the data set (FB15k-237) with many high in-degree connection nodes.
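Scoring a triple with an orthogonal relation transform can be sketched as follows. The abstract does not say how the paper parameterizes its orthogonal matrices, so Gram-Schmidt orthonormalization is used here as an assumed construction. The graph-context terms of the scoring function are left out, and the names `gram_schmidt`, `apply`, and `distance_score` are hypothetical.

```python
import math

def gram_schmidt(rows):
    """Orthonormalise linearly independent row vectors (classical Gram-Schmidt)."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            proj = sum(x * y for x, y in zip(row, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis

def apply(matrix, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def distance_score(head, relation_matrix, tail):
    """Euclidean distance between the transformed head and the tail.

    A smaller distance means the triple (head, relation, tail) is more plausible.
    """
    moved = apply(relation_matrix, head)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(moved, tail)))

# A 3D orthogonal relation matrix built from arbitrary independent rows.
relation = gram_schmidt([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
head = [1.0, 2.0, 3.0]
true_tail = apply(relation, head)

score_true = distance_score(head, relation, true_tail)         # 0.0
score_false = distance_score(head, relation, [0.0, 0.0, 0.0])  # equals the norm of head
```

An orthogonal matrix preserves vector norms, just as a unit-modulus complex rotation does in RotatE, so the relation moves the head embedding without rescaling it.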

ConvNets · DAM · Feature space · Unsupervised · Image segmentation

April 29, 2018

Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss

Qi Dou,Cheng Ouyang,Cheng Chen,Hao Chen,Pheng-Ann Heng

from arxiv, Accepted to IJCAI 2018

Convolutional networks (ConvNets) have achieved great successes in various challenging vision tasks. However, the performance of ConvNets degrades when they encounter domain shift. Domain adaptation is both more important and more challenging in the field of biomedical image analysis, where cross-modality data have largely different distributions. Given that annotating medical data is especially expensive, supervised transfer learning approaches are not optimal. In this paper, we propose an unsupervised domain adaptation framework with adversarial learning for cross-modality biomedical image segmentation. Specifically, our model is based on a dilated fully convolutional network for pixel-wise prediction. Moreover, we build a plug-and-play domain adaptation module (DAM) to map the target input to features that are aligned with the source domain feature space. A domain critic module (DCM) is set up to discriminate between the feature spaces of the two domains. We optimize the DAM and DCM via an adversarial loss without using any target domain label. Our proposed method is validated by adapting a ConvNet trained with MRI images to unpaired CT data for cardiac structure segmentation, and achieves very promising results.


