
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative priors and the power of model scaling. Leveraging multi-modal techniques and advanced generative priors, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR can restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to mitigate the fidelity issues encountered in generative restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.
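As a rough illustration of the restoration-guided sampling idea, the following Python sketch blends the diffusion model's prediction toward the low-quality observation during sampling. The function name and weighting schedule are our own simplification, not SUPIR's published algorithm.

```python
import torch

def restoration_guided_step(x_pred, lq_image, t, tau=0.5):
    """One fidelity-correction step (simplified sketch, not the paper's exact rule).

    Blends the denoiser's predicted clean image toward the low-quality
    input early in sampling, when the generative prior is most likely
    to hallucinate content that departs from the observation.
    """
    # Stronger guidance at high noise levels (t close to 1), none near t = 0.
    weight = tau * t
    return (1 - weight) * x_pred + weight * lq_image
```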

Related Content

Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision methods, i.e., methods that generate pseudo-temporal boundaries for training, have achieved great success. However, the data augmentations in these methods can disrupt critical temporal information, yielding poor pseudo boundaries. In this paper, we propose a new perspective that maintains the integrity of the original temporal content while introducing more valuable information for expanding the incomplete boundaries. To this end, we propose EtC (Expand then Clarify), which first uses additional information to expand the initial incomplete pseudo boundaries and subsequently refines these expanded ones to achieve precise boundaries. Motivated by video continuity, i.e., the visual similarity across adjacent frames, we use powerful multimodal large language models (MLLMs) to annotate each frame within the initial pseudo boundaries, yielding more comprehensive descriptions for the expanded boundaries. To further clarify the noise of the expanded boundaries, we combine mutual learning with a tailored proposal-level contrastive objective, a learnable approach that balances incomplete yet clean (initial) boundaries against comprehensive yet noisy (expanded) ones to produce more precise boundaries. Experiments demonstrate the superiority of our method on two challenging WSVG datasets.
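A proposal-level contrastive objective of this kind can be pictured as an InfoNCE-style loss that pulls proposals inside a pseudo boundary toward the query embedding while pushing outside proposals away. The sketch below is a generic reading of that idea; the tensor shapes and the `proposal_contrastive_loss` name are our assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def proposal_contrastive_loss(pos_feats, neg_feats, query_feat, temperature=0.1):
    """Illustrative proposal-level InfoNCE loss (our reading, not the paper's exact form).

    pos_feats:  (P, D) features of proposals inside the pseudo boundary
    neg_feats:  (N, D) features of proposals outside it
    query_feat: (D,)   sentence-query embedding
    """
    feats = torch.cat([pos_feats, neg_feats], dim=0)                 # (P+N, D)
    sims = F.cosine_similarity(feats, query_feat[None, :], dim=-1)   # (P+N,)
    logits = sims / temperature
    log_prob = logits - torch.logsumexp(logits, dim=0)
    return -log_prob[: pos_feats.shape[0]].mean()  # pull positives toward the query
```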

Video diffusion models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content. Despite this progress, ensuring consistency across frames remains a challenge, particularly when text prompts are used as the control condition. To address this problem, we introduce UniCtrl, a novel plug-and-play method, universally applicable without additional training, that improves the spatiotemporal consistency and motion diversity of videos generated by text-to-video models. UniCtrl ensures semantic consistency across frames through cross-frame self-attention control, and meanwhile enhances motion quality and spatiotemporal consistency through motion injection and spatiotemporal synchronization. Our experimental results demonstrate UniCtrl's efficacy in enhancing various text-to-video models, confirming its effectiveness and universality.
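Cross-frame self-attention control is commonly implemented by letting every frame's queries attend to the keys and values of a shared anchor frame. The PyTorch sketch below shows that generic mechanism under assumed tensor shapes; it is not UniCtrl's exact attention rewrite.

```python
import torch

def cross_frame_attention(q, k, v, anchor=0):
    """Cross-frame self-attention control, in the spirit of UniCtrl (simplified sketch).

    q, k, v: (frames, tokens, dim). Every frame attends with its own queries
    but uses the keys/values of a shared anchor frame, which ties the
    semantic content of all frames together.
    """
    f, n, d = q.shape
    k_a = k[anchor].expand(f, n, d)  # broadcast anchor keys to all frames
    v_a = v[anchor].expand(f, n, d)
    attn = torch.softmax(q @ k_a.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v_a
```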

In the continuously advancing AI landscape, crafting context-rich and meaningful responses via Large Language Models (LLMs) is essential. Researchers are becoming more aware of the challenges that LLMs with fewer parameters encounter when trying to provide suitable answers to open-ended questions. To address these hurdles, cutting-edge strategies that augment LLMs with rich external domain knowledge offer significant improvements. This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement, honing the proficiency of LLMs, especially on domain-specific community question answering platforms like AskUbuntu, Unix, and ServerFault. We conduct experiments on LLMs of various parameter sizes to evaluate their ability to ground knowledge and determine the factual accuracy of their answers to open-ended questions. Our methodology, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a larger number of use cases. This advancement highlights the importance of pairing context-rich data retrieval with LLMs, offering a renewed approach to knowledge sourcing and generation in AI systems. We also show that, owing to the rich contextual data retrieval, the crucial entities, along with the generated answer, remain factually coherent with the gold answer.
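To make the graph-driven retrieval step concrete, here is a toy sketch of entity expansion over a knowledge graph followed by snippet collection. All names here (`build_context`, the `corpus` mapping) are invented for illustration and are not the GraphContextGen API.

```python
import networkx as nx

def build_context(question_entities, kg: nx.Graph, corpus: dict, hops=1):
    """Toy graph-driven context retrieval (our illustration, not the paper's code).

    Expands the entities mentioned in the question over a knowledge graph,
    then pulls the text snippets linked to the expanded entity set.
    """
    expanded = set(question_entities)
    for ent in question_entities:
        if ent in kg:
            expanded.update(nx.single_source_shortest_path_length(kg, ent, cutoff=hops))
    # corpus maps entity -> snippets of documentation / accepted answers
    snippets = [s for ent in expanded for s in corpus.get(ent, [])]
    return "\n".join(snippets[:10])  # truncate to fit the LLM context window

# The assembled context is then prepended to the question in the LLM prompt.
```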

In patent prosecution, timely and effective responses to Office Actions (OAs) are crucial for securing patents. However, past automation and artificial intelligence research has largely overlooked this aspect. To bridge this gap, our study introduces the Patent Office Action Response Intelligence System (PARIS) and its advanced version, the Large Language Model (LLM) Enhanced PARIS (LE-PARIS). These systems are designed to enhance the efficiency of patent attorneys in handling OA responses through collaboration with AI. Their key features include the construction of an OA Topics Database, the development of Response Templates, and the implementation of Recommender Systems and LLM-based Response Generation. To validate the effectiveness of the systems, we employ a multi-paradigm analysis using the USPTO Office Action database and longitudinal data based on attorney interactions with our systems over six years. Through five studies, we examine the construction of OA topics (studies 1 and 2) using topic modeling and our proposed Delphi process, the efficacy of our proposed hybrid LLM-based recommender system tailored for OA responses (study 3), the quality of generated responses (study 4), and the systems' practical value in real-world scenarios through user studies (study 5). The results indicate that both PARIS and LE-PARIS perform well on key metrics and have a positive impact on attorney performance.
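The abstract does not specify the recommender's internals, but hybrid recommenders of this kind commonly mix content similarity with a usage prior. The sketch below illustrates that generic pattern under assumed inputs; it is not LE-PARIS's actual scoring function.

```python
import numpy as np

def hybrid_recommend(oa_embedding, template_embeddings, usage_counts, w_content=0.7):
    """Minimal hybrid recommender sketch (assumed design, for illustration only).

    Combines content similarity between an Office Action and each response
    template with a popularity prior from past attorney usage.
    """
    sims = template_embeddings @ oa_embedding / (
        np.linalg.norm(template_embeddings, axis=1) * np.linalg.norm(oa_embedding) + 1e-8
    )
    popularity = usage_counts / (usage_counts.max() + 1e-8)
    scores = w_content * sims + (1 - w_content) * popularity
    return np.argsort(-scores)  # template indices, best first
```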

Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacity, DSP systems can dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scale-out configurations that maximize resource utilization remains a challenge. This is especially true in environments where workloads change over time and node failures are all but inevitable. Furthermore, configuration parameters such as memory allocation and checkpointing intervals impact performance and resource usage as well. Sub-optimal configurations easily lead to high operational costs, poor performance, or unacceptable loss of service. In this paper, we present Demeter, a method for dynamically optimizing key DSP system configuration parameters for resource efficiency. Demeter uses Time Series Forecasting to predict future workloads and Multi-Objective Bayesian Optimization to model runtime behaviors in relation to parameter settings and workload rates. Together, these techniques allow us to determine whether enough is known about the predicted workload rate or whether short-lived parallel profiling runs should be proactively initiated for data gathering. Once trained, the models guide the adjustment of multiple, potentially dependent system configuration parameters, ensuring optimized performance and resource usage in response to changing workload rates. Our experiments on a commodity cluster using Apache Flink demonstrate that Demeter significantly improves the operational efficiency of long-running benchmark jobs.
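The profiling decision can be pictured as a simple novelty test on the forecast workload rate: if the predicted rate is far from any rate the models have already been fitted on, profiling runs are triggered. The following sketch is our simplification of that logic, with invented names and thresholds.

```python
import numpy as np

def should_profile(predicted_rate, observed_rates, tolerance=0.1):
    """Sketch of a Demeter-style profiling trigger (our simplification).

    If the forecast workload rate is close to rates for which runtime models
    already exist, reuse them; otherwise trigger short-lived parallel
    profiling runs to gather data for the Bayesian optimization models.
    """
    if len(observed_rates) == 0:
        return True
    gap = np.min(np.abs(np.asarray(observed_rates) - predicted_rate))
    return gap > tolerance * predicted_rate
```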

Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks. Since sampling from DPMs is equivalent to solving a diffusion SDE or ODE, which is time-consuming, numerous fast sampling methods built upon improved differential equation solvers have been proposed. The majority of these techniques consider solving the diffusion ODE due to its superior efficiency. However, stochastic sampling can offer additional advantages in generating diverse and high-quality data. In this work, we engage in a comprehensive analysis of stochastic sampling from two aspects: variance-controlled diffusion SDEs and linear multi-step SDE solvers. Based on our analysis, we propose SA-Solver, an improved and efficient stochastic Adams method for solving diffusion SDEs to generate data of high quality. Our experiments show that SA-Solver achieves: 1) improved or comparable performance compared with existing state-of-the-art sampling methods in few-step sampling; 2) SOTA FID scores on substantial benchmark datasets under a suitable number of function evaluations (NFEs).
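For intuition, the baseline single-step scheme that stochastic diffusion samplers improve upon is Euler-Maruyama on the reverse-time SDE; SA-Solver replaces this with a predictor-corrector stochastic Adams scheme that reuses several past score evaluations. The sketch below shows only the baseline step, with assumed `score` and `beta` callables.

```python
import torch

def euler_maruyama_step(x, score, t, dt, beta):
    """Baseline Euler-Maruyama step for a VP-type reverse diffusion SDE
    (for intuition only; SA-Solver improves on this single-step rule
    with a multi-step stochastic Adams scheme).

    score(x, t) approximates the score function grad log p_t(x);
    beta(t) is the noise schedule; dt < 0 when integrating backward in time.
    """
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)  # reverse-time drift
    diffusion = beta(t) ** 0.5
    noise = torch.randn_like(x)
    return x + drift * dt + diffusion * abs(dt) ** 0.5 * noise
```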

AI Large Language Models (LLMs) like ChatGPT are set to reshape some aspects of policymaking processes. Policy practitioners are already using ChatGPT for help with a variety of tasks: from drafting statements, submissions, and presentations, to conducting background research. We are cautiously hopeful that LLMs could be used to promote a marginally more balanced footing among decision makers in policy negotiations by assisting with certain tedious work, particularly benefiting developing countries who face capacity constraints that put them at a disadvantage in negotiations. However, the risks are particularly concerning for environmental and marine policy uses, due to the urgency of crises like climate change, high uncertainty, and trans-boundary impact. To explore the realistic potential, limitations, and equity risks of LLMs in marine policymaking, we present a case study of an AI chatbot for the recently adopted Biodiversity Beyond National Jurisdiction Agreement (BBNJ) and critique its answers to key policy questions. Our case study demonstrates the dangers of LLMs in marine policymaking via their potential bias towards generating text that favors the perspectives of mainly Western economic centers of power, while neglecting developing countries' viewpoints. We describe several ways these biases can enter the system, including: (1) biases in the underlying foundational language models; (2) biases arising from the chatbot's connection to UN negotiation documents; and (3) biases arising from the application design. We urge caution in the use of generative AI in ocean policy processes and call for more research on its equity and fairness implications. Our work also underscores the need for developing countries' policymakers to develop the technical capacity to engage with AI on their own terms.

Nonlinear Model Predictive Control (NMPC) is a state-of-the-art approach for locomotion and manipulation which leverages trajectory optimization at each control step. While the performance of this approach is computationally bounded, implementations of direct trajectory optimization that use iterative methods to solve the underlying moderately large and sparse linear systems are a natural fit for parallel hardware acceleration. In this work, we introduce MPCGPU, a GPU-accelerated, real-time NMPC solver built around an accelerated preconditioned conjugate gradient (PCG) linear system solver. We show that MPCGPU increases the scalability and real-time performance of NMPC, solving larger problems at faster rates. In particular, for tracking tasks on the Kuka IIWA manipulator, MPCGPU scales to kilohertz control rates with trajectories as long as 512 knot points. This is driven by a custom PCG solver which outperforms state-of-the-art, CPU-based, linear system solvers by at least 10x for a majority of solves and by 3.6x on average.
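As a reference point for what the GPU kernel accelerates, here is the textbook preconditioned conjugate gradient iteration in NumPy. MPCGPU's contribution lies in parallelizing this loop and its MPC-specific preconditioner on the GPU, not in the serial algorithm itself.

```python
import numpy as np

def pcg(A, b, M_inv, x0=None, tol=1e-8, max_iter=100):
    """Textbook preconditioned conjugate gradient (serial reference version).

    Solves A x = b for symmetric positive-definite A, with preconditioner
    inverse M_inv approximating A^{-1}.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x          # residual
    z = M_inv @ r          # preconditioned residual
    p = z.copy()           # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```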

This study explores the integration of Ultra-Wideband (UWB) technology into Mixed Reality (MR) systems for immersive storytelling. Addressing the limitations of existing technologies like Microsoft Kinect and HTC Vive, the research focuses on overcoming challenges in robustness to occlusion, tracking volume, and cost efficiency in prop tracking. Utilizing UWB technology, the interactive MR system broadens the scope of performance art by enabling larger tracking areas and more reliable, cheaper multi-prop tracking while reducing occlusion issues. Preliminary user tests suggest meaningful improvements in the immersive experience, promising new possibilities in Extended Reality (XR) theater, performance art, and immersive games.
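The abstract does not detail the tracking pipeline, but position fixes from UWB ranges are conventionally obtained by trilateration against fixed anchors. The following linearized least-squares sketch shows that standard technique, not the paper's implementation.

```python
import numpy as np

def trilaterate(anchors, ranges):
    """Linearized least-squares position fix from UWB range measurements
    (generic trilateration, shown for background).

    anchors: (n, 3) fixed anchor positions, n >= 4
    ranges:  (n,)   measured distances from the tag to each anchor
    """
    # Subtracting the first equation cancels the quadratic term ||p||^2,
    # leaving a linear system A p = b in the tag position p.
    A = 2 * (anchors[1:] - anchors[0])
    b = (ranges[0] ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(anchors[0] ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```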

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data, providing a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Different from previous methods that apply convolutional and recurrent modules for feature extraction, the generative pre-training (GPT) method applies the Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers trained on large datasets as a contextual language model. Recently, ChatGPT has shown promising success as a large language model, applying an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for an updated survey. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities and unified PFMs that account for data quality and quantity. Besides, we discuss relevant research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.
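The autoregressive paradigm mentioned above reduces, in its standard form, to a shifted next-token cross-entropy objective; the sketch below shows that textbook loss for illustration.

```python
import torch
import torch.nn.functional as F

def autoregressive_lm_loss(logits, tokens):
    """Next-token objective behind GPT-style pretraining (standard form).

    logits: (batch, seq, vocab) model outputs
    tokens: (batch, seq)        input token ids
    """
    # Predict token t+1 from positions up to t: shift targets left by one.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
```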
