
Video prediction, a fundamental task in computer vision, aims to enable models to generate sequences of future frames based on existing video content. The task has found wide application across many domains. In this paper, we comprehensively survey both historical and contemporary works in this field, covering the most widely used datasets and algorithms. Our survey scrutinizes the challenges and the evolving landscape of video prediction within computer vision. We propose a novel taxonomy centered on the stochastic nature of video prediction algorithms. This taxonomy accentuates the gradual transition from deterministic to generative prediction methodologies, underlining significant advancements and shifts in approach.
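To make the task concrete, below is a minimal sketch of the autoregressive interface many deterministic predictors share (the architecture, names, and shapes are illustrative assumptions, not any particular surveyed method); stochastic and generative methods replace the point estimate with samples from a learned distribution over futures.

    import torch
    import torch.nn as nn

    class AutoregressivePredictor(nn.Module):
        """Illustrative deterministic video predictor: a recurrent state is
        warmed up on observed frames, then rolled forward to emit futures."""
        def __init__(self, channels=3, hidden=64):
            super().__init__()
            self.hidden = hidden
            self.encode = nn.Conv2d(channels + hidden, hidden, 3, padding=1)
            self.decode = nn.Conv2d(hidden, channels, 3, padding=1)

        def forward(self, context, n_future):
            # context: (batch, time, channels, height, width)
            b, t, c, h, w = context.shape
            state = context.new_zeros(b, self.hidden, h, w)
            for i in range(t):  # condition on the observed frames
                state = torch.tanh(self.encode(torch.cat([context[:, i], state], dim=1)))
            frame, frames = context[:, -1], []
            for _ in range(n_future):  # roll out one predicted frame at a time
                state = torch.tanh(self.encode(torch.cat([frame, state], dim=1)))
                frame = torch.sigmoid(self.decode(state))
                frames.append(frame)
            return torch.stack(frames, dim=1)  # (batch, n_future, c, h, w)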

Related Content

Taxonomy is the practice and science of classification. Wikipedia categories illustrate one taxonomy, and a complete taxonomy of Wikipedia categories can be extracted by automatic means. As of 2009, it had been shown that a manually constructed taxonomy, such as that of a computational lexicon like WordNet, can be used to improve and restructure the Wikipedia category taxonomy. In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. A taxonomy may then include a single child with multiple parents: for example, "car" might appear with both parents "vehicle" and "steel construction"; to some, however, this only means that "car" is part of several different taxonomies. A taxonomy may also simply organize things into groups, or be an alphabetical list; here, however, the term "vocabulary" is more appropriate. In current usage within knowledge management, taxonomies are considered narrower than ontologies, since ontologies apply a larger variety of relation types. Mathematically, a hierarchical taxonomy is a tree structure of classifications for a given set of objects. At the top of this structure is a single classification, the root node, which applies to all objects. Nodes below the root are more specific classifications that apply to subsets of the total set of classified objects. Reasoning proceeds from the general to the more specific.
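The multi-parent case above makes the structure a directed acyclic graph rather than a strict tree; a minimal Python sketch of the "car" example (names chosen purely for illustration):

    # Minimal taxonomy as a child -> parents mapping. A strict tree allows
    # one parent per node; a network-structured taxonomy allows several,
    # as in the "car" example above.
    taxonomy = {
        "vehicle": [],
        "steel construction": [],
        "car": ["vehicle", "steel construction"],  # one child, two parents
    }

    def ancestors(node, taxonomy):
        """All classifications that apply to `node`, from specific to general."""
        result = []
        for parent in taxonomy[node]:
            result.append(parent)
            result.extend(ancestors(parent, taxonomy))
        return result

    print(ancestors("car", taxonomy))  # ['vehicle', 'steel construction']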


We introduce a new approach that uses computer vision to predict land surface displacement from subsurface geometry images for Carbon Capture and Sequestration (CCS). CCS has been shown to be a key component of a carbon-neutral society. However, challenges remain, including the high computational cost due to the large model scale and the difficulty of generalizing a pre-trained model across complex physics. We tackle these challenges by training models directly on the subsurface geometry images. The goal is to understand the response of land surface displacement to carbon injection and to use our trained models to inform decision making in CCS projects. We implement multiple models (CNN, ResNet, and ResNetUNet) for the static mechanics problem, which is an image prediction problem. Next, we use an LSTM and a transformer for the transient mechanics scenario, which is a video prediction problem. The results show that ResNetUNet outperforms the other models on the static mechanics problem thanks to its architecture, and that the LSTM performs comparably to the transformer on the transient problem. This report proceeds by outlining our dataset in detail, followed by model descriptions in the method section. The results and discussion state the key learnings and observations, and a conclusion with future work rounds out the paper.
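As a hedged sketch of the static case (the tensor shapes, dummy data, and loss here are illustrative assumptions, not the report's actual setup), the problem reduces to supervised image-to-image regression:

    import torch
    import torch.nn as nn

    # Hypothetical shapes: subsurface geometry as a 1-channel image,
    # land surface displacement as a 1-channel target map.
    model = nn.Sequential(                 # stand-in for CNN/ResNet/ResNetUNet
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    geometry = torch.randn(8, 1, 64, 64)       # dummy batch for illustration
    displacement = torch.randn(8, 1, 64, 64)   # dummy regression targets

    optimizer.zero_grad()
    pred = model(geometry)
    loss = nn.functional.mse_loss(pred, displacement)  # regression objective
    loss.backward()
    optimizer.step()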

Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly utilized attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which increases quadratically with the length of the sequence. This limitation presents significant challenges when attempting to generate longer video sequences with diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs), which have recently gained attention as viable alternatives thanks to their linear memory consumption relative to sequence length. In our experiments, we first evaluate our SSM-based model on UCF101, a standard benchmark for video generation. In addition, to investigate the potential of SSMs for longer video generation, we perform an experiment on the MineRL Navigate dataset with the number of frames set to 64 and 150. In these settings, our SSM-based model considerably reduces memory consumption for longer sequences while maintaining FVD scores competitive with those of attention-based models. Our code is available at //github.com/shim0114/SSM-Meets-Video-Diffusion-Models.
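The memory argument can be made concrete with a toy linear state-space recurrence: the sequence is consumed through a fixed-size hidden state, so memory stays constant in the sequence length, unlike the quadratic attention matrix. This is a generic SSM sketch, not the paper's model; all dimensions and parameters are illustrative.

    import numpy as np

    def ssm_scan(x, A, B, C):
        """Linear state-space model: h_t = A h_{t-1} + B x_t, y_t = C h_t.
        Memory is O(state_dim) regardless of sequence length, versus the
        O(length^2) score matrix of a self-attention layer."""
        h = np.zeros(A.shape[0])
        ys = []
        for x_t in x:              # one step per frame token
            h = A @ h + B @ x_t
            ys.append(C @ h)
        return np.stack(ys)

    d_state, d_in = 16, 8
    rng = np.random.default_rng(0)
    A = rng.normal(size=(d_state, d_state)) * 0.1  # small for stability
    B = rng.normal(size=(d_state, d_in))
    C = rng.normal(size=(d_in, d_state))
    y = ssm_scan(rng.normal(size=(150, d_in)), A, B, C)  # 150-step sequence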

This survey explores the adaptation of visual transformer models in Autonomous Driving, a transition inspired by their success in Natural Language Processing. Transformers are gaining traction in computer vision, surpassing traditional Recurrent Neural Networks in tasks such as sequential image processing and outperforming Convolutional Neural Networks at capturing global context, as evidenced in complex scene recognition. These capabilities are crucial in Autonomous Driving for real-time, dynamic visual scene processing. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and the encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.
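For reference, the self-attention operation these architectures build on, in minimal single-head form (a sketch only; production Vision Transformers add multi-head projections, positional encodings, and normalization):

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention over a token sequence x
        of shape (n_tokens, d_model)."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n) pairwise scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
        return weights @ v   # each token: weighted mix of all value vectors

    rng = np.random.default_rng(0)
    d = 32
    x = rng.normal(size=(10, d))  # 10 image-patch tokens
    out = self_attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))

The dense (n, n) score matrix is exactly the global-context capture the survey credits Transformers with, and also the quadratic cost noted in the SSM abstract above.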

In real-world applications, dynamic scenarios require models to learn new tasks continuously without forgetting old knowledge. Experience-replay methods store a subset of the old images for joint training. Under stricter privacy protection, storing the old images becomes infeasible, which leads to a more severe plasticity-stability dilemma and classifier bias. To meet these challenges, we propose a new architecture, named the continual expansion and absorption transformer (CEAT). The model learns novel knowledge by extending expanded-fusion layers in parallel with the frozen previous parameters. After the task ends, we losslessly absorb the extended parameters into the backbone to ensure that the number of parameters remains constant. To improve the learning ability of the model, we design a novel prototype contrastive loss that reduces the overlap between old and new classes in the feature space. Besides, to address the classifier's bias towards new classes, we propose a novel approach that generates pseudo-features to correct the classifier. We evaluate our method on three standard Non-Exemplar Class-Incremental Learning (NECIL) benchmarks. Extensive experiments demonstrate that our model significantly improves on previous works, achieving gains of 5.38%, 5.20%, and 4.92% on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively.
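One way to read the lossless-absorption step (our interpretation; the paper's exact parameterization may differ) is as structural re-parameterization: if the expanded branch is linear and parallel to a frozen linear layer, both can be summed into a single weight matrix once the task ends, keeping the parameter count constant.

    import torch
    import torch.nn as nn

    # Hypothetical parallel expansion of a frozen linear layer.
    frozen = nn.Linear(64, 64)
    frozen.requires_grad_(False)   # previous parameters stay fixed
    expanded = nn.Linear(64, 64)   # trainable branch for the new task

    x = torch.randn(4, 64)
    y_parallel = frozen(x) + expanded(x)   # output while learning the task

    # Absorption: fold the branch into the backbone weights.
    with torch.no_grad():
        frozen.weight += expanded.weight
        frozen.bias += expanded.bias
    y_absorbed = frozen(x)

    # Linearity makes the fold exact: xW_f + b_f + xW_e + b_e = x(W_f + W_e) + (b_f + b_e)
    assert torch.allclose(y_parallel, y_absorbed, atol=1e-6)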

In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.

Existing recommender systems extract user preferences by learning correlations in data, such as behavioral correlations in collaborative filtering, or feature-feature and feature-behavior correlations in click-through rate prediction. Regrettably, however, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, a recommender system can recommend a battery charger to a user after they buy a phone, where the purchase serves as the cause of the recommendation; such a causal relation cannot be reversed. Recently, to address this, researchers in recommender systems have begun to utilize causal inference to extract causality and enhance recommendation. In this survey, we comprehensively review the literature on causal inference-based recommendation. First, we present the fundamental concepts of both recommendation and causal inference as the basis for the later content. We then raise the typical issues that non-causal recommendation faces. Afterward, we comprehensively review existing work on causal inference-based recommendation, based on a taxonomy of the kinds of problems causal inference addresses. Last, we discuss open problems in this important research area, along with interesting directions for future work.

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and the general field of artificial intelligence. It has great fundamental importance and strong industrial demand. Deep neural networks (DNNs) have greatly boosted performance on many concrete tasks, with the help of large amounts of training data and new, powerful computational resources. Though recognition accuracy is usually the first concern for new progress, efficiency is actually rather important and sometimes critical for both academic research and industrial applications. Moreover, insightful views on the opportunities and challenges of efficiency are also highly needed by the entire community. While general surveys on the efficiency of DNNs have been conducted from various perspectives, as far as we are aware, scarcely any of them focused systematically on visual recognition, and thus it is unclear which advances are applicable to it and what else should be considered. In this paper, we present a review of recent advances, together with our suggestions on new possible directions for improving the efficiency of DNN-related visual recognition approaches. We investigate not only the model but also the data point of view (which is not the case in existing surveys), and focus on the three most studied data types (images, videos, and points). This paper attempts to provide a systematic summary via a comprehensive survey that can serve as a valuable reference and inspire both researchers and practitioners who work on visual recognition problems.

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy of existing works (C2), this survey is concerned with data augmentation methods for text classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references indicating which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
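As a concrete instance of such transformations, here is a sketch of two common token-level operations in the spirit of easy data augmentation; these are generic examples, not drawn from any specific grouping in the survey:

    import random

    def random_deletion(tokens, p=0.1):
        """Drop each token with probability p; a simple regularizing transform."""
        kept = [t for t in tokens if random.random() > p]
        return kept or [random.choice(tokens)]  # never return an empty text

    def random_swap(tokens, n_swaps=1):
        """Swap two random positions n_swaps times to perturb word order."""
        tokens = tokens[:]
        for _ in range(n_swaps):
            i, j = random.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return tokens

    text = "data augmentation creates new training examples".split()
    print(random_deletion(text), random_swap(text))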

Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review of neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize methods according to a taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which may be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content-enriched models, which additionally utilize side information associated with users and items, such as user profiles and item knowledge graphs; and 3) context-enriched models, which account for contextual information associated with an interaction, such as time, location, and past interactions. After reviewing representative works of each type, we finally discuss some promising directions in this field, including benchmarking recommender systems, graph reasoning based recommendation models, and explainable and fair recommendation for social good.
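As a minimal sketch of the first type, a collaborative-filtering model can be implemented neurally by scoring a user-item pair from learned embeddings of interaction data alone (sizes and names below are illustrative); content- and context-enriched models would extend the inputs with side features.

    import torch
    import torch.nn as nn

    class MatrixFactorization(nn.Module):
        """Collaborative filtering from user-item interactions only:
        the score is the dot product of learned embeddings."""
        def __init__(self, n_users, n_items, dim=32):
            super().__init__()
            self.users = nn.Embedding(n_users, dim)
            self.items = nn.Embedding(n_items, dim)

        def forward(self, user_ids, item_ids):
            return (self.users(user_ids) * self.items(item_ids)).sum(dim=-1)

    model = MatrixFactorization(n_users=1000, n_items=500)
    scores = model(torch.tensor([0, 1]), torch.tensor([42, 7]))  # two pairs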

We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
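A simplified sketch of an additive attention gate consistent with the description above (dimensions and resampling details are our assumptions; the authors' public code should be treated as the reference):

    import torch
    import torch.nn as nn

    class AttentionGate(nn.Module):
        """Additive attention gate: a coarse gating signal g decides which
        spatial locations of the skip-connection features x to keep."""
        def __init__(self, g_ch, x_ch, inter_ch):
            super().__init__()
            self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
            self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
            self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

        def forward(self, g, x):
            # g and x are assumed to share spatial size here for simplicity;
            # in the full model the gating signal is resampled first.
            alpha = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
            return x * alpha  # suppress irrelevant regions of x

    gate = AttentionGate(g_ch=64, x_ch=32, inter_ch=16)
    out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 32, 32, 32))

Because the gate is built from 1x1 convolutions, it can be dropped onto any U-Net skip connection with minimal computational overhead, which is the integration property the abstract highlights.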
