亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the problem of generating intermediate images from image pairs with large motion while maintaining semantic consistency. Due to the large motion, the intermediate semantic information may be absent in input images. Existing methods either limit to small motion or focus on topologically similar objects, leading to artifacts and inconsistency in the interpolation results. To overcome this challenge, we delve into pre-trained image diffusion models for their capabilities in semantic cognition and representations, ensuring consistent expression of the absent intermediate semantic representations with the input. To this end, we propose DreamMover, a novel image interpolation framework with three main components: 1) A natural flow estimator based on the diffusion model that can implicitly reason about the semantic correspondence between two images. 2) To avoid the loss of detailed information during fusion, our key insight is to fuse information in two parts, high-level space and low-level space. 3) To enhance the consistency between the generated images and input, we propose the self-attention concatenation and replacement approach. Lastly, we present a challenging benchmark dataset InterpBench to evaluate the semantic consistency of generated results. Extensive experiments demonstrate the effectiveness of our method. Our project is available at //dreamm0ver.github.io .

相關內容

《計算機信息》雜志發表高質量的論文,擴大了運籌學和計算的范圍,尋求有關理論、方法、實驗、系統和應用方面的原創研究論文、新穎的調查和教程論文,以及描述新的和有用的軟件工具的論文。官網鏈接: · Attention · Integration · · 優化器 ·
2024 年 10 月 30 日

We propose an effective method for inserting adapters into text-to-image foundation models, which enables the execution of complex downstream tasks while preserving the generalization ability of the base model. The core idea of this method is to optimize the attention mechanism related to 2D feature maps, which enhances the performance of the adapter. This approach was validated on the task of meme video generation and achieved significant results. We hope this work can provide insights for post-training tasks of large text-to-image models. Additionally, as this method demonstrates good compatibility with SD1.5 derivative models, it holds certain value for the open-source community. Therefore, we will release the related code (\url{//songkey.github.io/hellomeme}).

Image retargeting is the task of adjusting the aspect ratio of images to suit different display devices or presentation environments. However, existing retargeting methods often struggle to balance the preservation of key semantics and image quality, resulting in either deformation or loss of important objects, or the introduction of local artifacts such as discontinuous pixels and inconsistent regenerated content. To address these issues, we propose a content-aware retargeting method called PruneRepaint. It incorporates semantic importance for each pixel to guide the identification of regions that need to be pruned or preserved in order to maintain key semantics. Additionally, we introduce an adaptive repainting module that selects image regions for repainting based on the distribution of pruned pixels and the proportion between foreground size and target aspect ratio, thus achieving local smoothness after pruning. By focusing on the content and structure of the foreground, our PruneRepaint approach adaptively avoids key content loss and deformation, while effectively mitigating artifacts with local repainting. We conduct experiments on the public RetargetMe benchmark and demonstrate through objective experimental results and subjective user studies that our method outperforms previous approaches in terms of preserving semantics and aesthetics, as well as better generalization across diverse aspect ratios. Codes will be available at //github.com/fhshen2022/PruneRepaint.

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads. These theoretical insights are validated experimentally and offer natural suggestions for alternative architectures.

We present a type of epistemic logics that encapsulates both the dynamics of acquiring knowledge (knowing) and losing information (forgetting), alongside the integration of group knowledge concepts. Our approach is underpinned by a system of weighted models, which introduces an "epistemic skills" metric to effectively represent the epistemic abilities associated with knowledge update. In this framework, the acquisition of knowledge is modeled as a result of upskilling, whereas forgetting is by downskilling. Additionally, our framework allows us to explore the concept of "knowability," which can be defined as the potential to acquire knowledge through upskilling, and facilitates a nuanced understanding of the distinctions between epistemic de re and de dicto expressions. We study the computational complexity of model checking problems for these logics, providing insights into both the theoretical underpinnings and practical implications of our approach.

Despite the success achieved by existing image generation and editing methods, current models still struggle with complex problems including intricate text prompts, and the absence of verification and self-correction mechanisms makes the generated images unreliable. Meanwhile, a single model tends to specialize in particular tasks and possess the corresponding capabilities, making it inadequate for fulfilling all user requirements. We propose GenArtist, a unified image generation and editing system, coordinated by a multimodal large language model (MLLM) agent. We integrate a comprehensive range of existing models into the tool library and utilize the agent for tool selection and execution. For a complex problem, the MLLM agent decomposes it into simpler sub-problems and constructs a tree structure to systematically plan the procedure of generation, editing, and self-correction with step-by-step verification. By automatically generating missing position-related inputs and incorporating position information, the appropriate tool can be effectively employed to address each sub-problem. Experiments demonstrate that GenArtist can perform various generation and editing tasks, achieving state-of-the-art performance and surpassing existing models such as SDXL and DALL-E 3, as can be seen in Fig. 1. Project page is //zhenyuw16.github.io/GenArtist_page.

Text-to-image models are known to propagate social biases. For example, when prompted to generate images of people in certain professions, these models tend to systematically generate specific genders or ethnicities. In this paper, we show that this bias is already present in the text encoder of the model and introduce a Mixture-of-Experts approach by identifying text-encoded bias in the latent space and then creating a Bias-Identification Gate mechanism. More specifically, we propose MoESD (Mixture of Experts Stable Diffusion) with BiAs (Bias Adapters) to mitigate gender bias in text-to-image models. We also demonstrate that introducing an arbitrary special token to the prompt is essential during the mitigation process. With experiments focusing on gender bias, we show that our approach successfully mitigates gender bias while maintaining image quality.

Fundus image classification is crucial in the computer aided diagnosis tasks, but label noise significantly impairs the performance of deep neural networks. To address this challenge, we propose a robust framework, Self-Supervised Pre-training with Robust Adaptive Credal Loss (SSP-RACL), for handling label noise in fundus image datasets. First, we use Masked Autoencoders (MAE) for pre-training to extract features, unaffected by label noise. Subsequently, RACL employ a superset learning framework, setting confidence thresholds and adaptive label relaxation parameter to construct possibility distributions and provide more reliable ground-truth estimates, thus effectively suppressing the memorization effect. Additionally, we introduce clinical knowledge-based asymmetric noise generation to simulate real-world noisy fundus image datasets. Experimental results demonstrate that our proposed method outperforms existing approaches in handling label noise, showing superior performance.

This study explores the integration of auditory and tactile experiences in musical haptics, focusing on enhancing sensory dimensions of music through touch. Addressing the gap in translating auditory signals to meaningful tactile feedback, our research introduces a novel method involving a touch-sensitive recorder and a wearable haptic display that captures musical interactions via force sensors and converts these into tactile sensations. Previous studies have shown the potential of haptic feedback to enhance musical expressivity, yet challenges remain in conveying complex musical nuances. Our method aims to expand music accessibility for individuals with hearing impairments and deepen digital musical interactions. Experimental results reveal high accuracy ($98\%$ without noise, 93% with white noise) in melody recognition through tactile feedback, demonstrating effective transmission and perception of musical information. The findings highlight the potential of haptic technology to bridge sensory gaps, offering significant implications for music therapy, education, and remote musical collaboration, advancing the field of musical haptics and multi-sensory technology applications.

Neural compression has the potential to revolutionize lossy image compression. Based on generative models, recent schemes achieve unprecedented compression rates at high perceptual quality but compromise semantic fidelity. Details of decompressed images may appear optically flawless but semantically different from the originals, making compression errors difficult or impossible to detect. We explore the problem space and propose a provisional taxonomy of miscompressions. It defines three types of 'what happens' and has a binary 'high impact' flag indicating miscompressions that alter symbols. We discuss how the taxonomy can facilitate risk communication and research into mitigations.

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.

北京阿比特科技有限公司