
Taxonomy · Fidelity · binary · MoDELS · Pattern Recognition ·

October 15, 2024

A Taxonomy of Miscompressions: Preparing Image Forensics for Neural Compression

Nora Hofer,Rainer Böhme

from arxiv, 6 pages, 6 figures

Neural compression has the potential to revolutionize lossy image compression. Based on generative models, recent schemes achieve unprecedented compression rates at high perceptual quality but compromise semantic fidelity. Details of decompressed images may appear optically flawless but semantically different from the originals, making compression errors difficult or impossible to detect. We explore the problem space and propose a provisional taxonomy of miscompressions. It defines three types of 'what happens' and has a binary 'high impact' flag indicating miscompressions that alter symbols. We discuss how the taxonomy can facilitate risk communication and research into mitigations.

Related Content

Taxonomy

Taxonomy is the practice and science of classification. Wikipedia's categories illustrate a taxonomy, and the complete taxonomy of Wikipedia categories can be extracted by automatic means. As of 2009, it had been shown that a manually constructed taxonomy, such as that of a computational lexicon like WordNet, can be used to improve and restructure the Wikipedia category taxonomy. In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. A taxonomy may then include a single child with multiple parents; for example, "car" might appear under both parents "vehicle" and "steel structure". To some, however, this merely means that "car" is part of several different taxonomies. A taxonomy might also simply organize things into groups, or be an alphabetical list; here, however, the term vocabulary is more appropriate. In current usage within knowledge management, taxonomies are considered narrower than ontologies, because ontologies apply a wider variety of relation types. Mathematically, a hierarchical taxonomy is a tree structure of classifications for a given set of objects. At the top of this structure is a single classification that applies to all objects: the root node. Nodes below this root are more specific classifications that apply to subsets of the total set of classified objects. Reasoning progresses from the general to the more specific.
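The distinction above between a strict tree and a network can be made concrete with a small sketch. The category names are hypothetical, borrowed from the "car" example; each node lists its parents, so a node with two parents turns the tree into a network (a directed acyclic graph). The helper names `ancestors` and `is_tree` are illustrative only.

```python
from collections import deque

# Hypothetical toy taxonomy: each category maps to the list of its parents.
# "car" has two parents, mirroring the multi-parent example in the text.
PARENTS: dict[str, list[str]] = {
    "thing": [],                          # root: the one category that applies to all objects
    "vehicle": ["thing"],
    "steel structure": ["thing"],
    "car": ["vehicle", "steel structure"],
    "bicycle": ["vehicle"],
}


def ancestors(node: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every category above `node`, walking parent links breadth-first."""
    seen: set[str] = set()
    queue = deque(parents[node])
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents[p])
    return seen


def is_tree(parents: dict[str, list[str]]) -> bool:
    """True for a strict hierarchy: exactly one root and one parent per other node.

    Assumes the parent links contain no cycles.
    """
    roots = [n for n, ps in parents.items() if not ps]
    return len(roots) == 1 and all(len(ps) == 1 for ps in parents.values() if ps)
```

Here `ancestors("car", PARENTS)` yields `{"vehicle", "steel structure", "thing"}`, and `is_tree(PARENTS)` is `False` because "car" has two parents; giving "car" only the parent "vehicle" restores a tree.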


Noise · Guidance · MoDELS · Prompt · Directed ·

December 6, 2024

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation

Ruoyu Wang,Huayang Huang,Ye Zhu,Olga Russakovsky,Yu Wu

from arxiv, 18 pages, 18 figures, 6 tables

Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models. In the conventional setup, the text prompt provides explicit, user-defined guidance, directing the generation process by denoising a randomly sampled Gaussian noise. In this work, we reveal that the often-overlooked noise itself encodes inherent generative tendencies, acting as a "silent prompt" that implicitly guides the output. This implicit guidance, embedded in the noise scheduler design of diffusion model formulations and their training stages, generalizes across a wide range of T2I models and backbones. Building on this insight, we introduce NoiseQuery, a novel strategy that selects optimal initial noise from a pre-built noise library to meet diverse user needs. Our approach not only enhances high-level semantic alignment with text prompts, but also allows for nuanced adjustments of low-level visual attributes, such as texture, sharpness, shape, and color, which are typically challenging to control through text alone. Extensive experiments across various models and target attributes demonstrate the strong performance and zero-shot transferability of our approach, requiring no additional optimization.
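The abstract describes selecting an initial noise from a pre-built library rather than sampling it fresh. A minimal sketch of that retrieval step follows, under assumptions that are ours and not the paper's: each library entry is a random seed paired with a precomputed feature vector (produced by a hypothetical `feature_fn`), and the query picks the seed whose feature is most cosine-similar to a target feature. The noise itself is regenerated deterministically from the seed, so only seeds and features need to be stored.

```python
import numpy as np


def build_library(seeds, feature_fn):
    """Precompute one feature vector per noise seed.

    `feature_fn` is a hypothetical stand-in for "generate an image from this noise
    and embed it"; the abstract does not specify how the library is indexed.
    """
    return {s: np.asarray(feature_fn(s), dtype=float) for s in seeds}


def query_noise(library, target, shape=(4, 64, 64)):
    """Return (seed, noise) whose stored feature is most cosine-similar to `target`."""
    target = np.asarray(target, dtype=float)
    target = target / np.linalg.norm(target)
    best_seed = max(
        library,
        key=lambda s: float(library[s] @ target / np.linalg.norm(library[s])),
    )
    # Re-materialize the initial Gaussian noise deterministically from its seed.
    noise = np.random.default_rng(best_seed).standard_normal(shape)
    return best_seed, noise
```

The latent shape `(4, 64, 64)` is an illustrative default, not a value from the paper.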

Continuity · Processing (programming language) · MoDELS · Samples · Stride ·

December 6, 2024

Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction

Gaurav Shrivastava,Abhinav Shrivastava

from arxiv, Navigate to the project page //www.cs.umd.edu/~gauravsh/cvp/supp/website.html for video results. Extended version of published CVPR paper

Diffusion models have made significant strides in image generation, mastering tasks such as unconditional image synthesis, text-image translation, and image-to-image conversions. However, their capability falls short in the realm of video prediction, mainly because they treat videos as a collection of independent images, relying on external constraints such as temporal attention mechanisms to enforce temporal coherence. In our paper, we introduce a novel model class that treats video as a continuous multi-dimensional process rather than a series of discrete frames. We also report a 75% reduction in the number of sampling steps required to sample a new frame, making our framework more efficient at inference time. Through extensive experimentation, we establish state-of-the-art performance in video prediction, validated on benchmark datasets including KTH, BAIR, Human3.6M, and UCF101. Navigate to the project page //www.cs.umd.edu/~gauravsh/cvp/supp/website.html for video results.

MoDELS · Prompt · Large Language Models · Language Modeling · Commentator ·

December 5, 2024

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment

Jason Vega,Junsheng Huang,Gaokai Zhang,Hangoo Kang,Minjia Zhang,Gagandeep Singh

from arxiv, v2: Updated with changes from peer review rebuttal. v1: Version under peer review

Safety alignment of Large Language Models (LLMs) has recently become a critical objective of model developers. In response, a growing body of work has been investigating how safety alignment can be bypassed through various jailbreaking methods, such as adversarial attacks. However, these jailbreak methods can be rather costly or involve a non-trivial amount of creativity and effort, introducing the assumption that malicious users are high-resource or sophisticated. In this paper, we study how simple random augmentations to the input prompt affect safety alignment effectiveness in state-of-the-art LLMs, such as Llama 3 and Qwen 2. We perform an in-depth evaluation of 17 different models and investigate the intersection of safety under random augmentations with multiple dimensions: augmentation type, model size, quantization, fine-tuning-based defenses, and decoding strategies (e.g., sampling temperature). We show that low-resource and unsophisticated attackers, i.e. "stochastic monkeys", can significantly improve their chances of bypassing alignment with just 25 random augmentations per prompt. Source code and data: //github.com/uiuc-focal-lab/stochastic-monkeys/
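To make "simple random augmentations" concrete, here is a minimal sketch of one plausible family: random character-level edits (insert, delete, or substitute a letter), drawn independently 25 times per prompt. The specific edit operations and the function names are our assumptions for illustration; the paper evaluates several augmentation types that this sketch does not reproduce.

```python
import random
import string


def augment(prompt: str, rng: random.Random, n_edits: int = 1) -> str:
    """Apply `n_edits` random character-level edits (insert, delete or substitute).

    One hypothetical kind of "simple random augmentation"; not the paper's exact set.
    """
    chars = list(prompt)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not chars:
            # Insert a random ASCII letter at a random position (also used for empty input).
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(string.ascii_letters))
        elif op == "delete":
            del chars[rng.randrange(len(chars))]
        else:
            chars[rng.randrange(len(chars))] = rng.choice(string.ascii_letters)
    return "".join(chars)


def augmented_variants(prompt: str, k: int = 25, seed: int = 0) -> list[str]:
    """Draw k augmented copies of `prompt` (k = 25 matches the abstract's budget)."""
    rng = random.Random(seed)
    return [augment(prompt, rng) for _ in range(k)]
```

Seeding the generator keeps an evaluation reproducible: `augmented_variants("hello world")` returns the same 25 strings on every call, each within one character of the original length.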

Weight · Guidance · Analysis · MoDELS · Processing (programming language) ·

December 4, 2024

Analysis of Classifier-Free Guidance Weight Schedulers

Xi Wang,Nicolas Dufour,Nefeli Andreou,Marie-Paule Cani,Victoria Fernandez Abrevaya,David Picard,Vicky Kalogeiton

Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models. It operates by combining the conditional and unconditional predictions using a fixed weight. However, recent works vary the weights throughout the diffusion process, reporting superior results but without providing any rationale or analysis. By conducting comprehensive experiments, this paper provides insights into CFG weight schedulers. Our findings suggest that simple, monotonically increasing weight schedulers consistently lead to improved performance, requiring merely a single line of code. In addition, more complex parametrized schedulers can be optimized for further improvement, but do not generalize across different models and tasks.
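The combination step and a monotonically increasing schedule can be sketched as follows. The CFG update `eps_uncond + w * (eps_cond - eps_uncond)` is the standard formulation; the linear ramp from `w_min` to `w_max`, its endpoint defaults, and the convention that step 0 is the first (noisiest) sampling step are our assumptions, and the paper's exact parameterization may differ.

```python
import numpy as np


def cfg_combine(eps_cond, eps_uncond, w):
    """Classifier-free guidance: w = 0 gives the unconditional prediction,
    w = 1 the conditional one, and w > 1 extrapolates past it."""
    return eps_uncond + w * (eps_cond - eps_uncond)


def linear_increasing_weight(step, num_steps, w_min=1.0, w_max=10.0):
    """Guidance weight that grows linearly over the sampling trajectory.

    step 0 is the first (noisiest) sampling step, num_steps - 1 the last.
    The endpoints are illustrative defaults, not values taken from the paper.
    """
    if num_steps == 1:
        return w_max
    return w_min + (w_max - w_min) * step / (num_steps - 1)
```

In a sampler, moving from a static weight to this schedule amounts to computing `w = linear_increasing_weight(i, num_steps)` inside the loop before calling `cfg_combine`, which is consistent with the abstract's "single line of code" remark.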

MoDELS · Feature Extraction · Processing (programming language) · Extensibility · Diversity ·

December 4, 2024

DIVE: Taming DINO for Subject-Driven Video Editing

Yi Huang,Wei Xiong,He Zhang,Chaoqi Chen,Jianzhuang Liu,Mingfu Yan,Shifeng Chen

Building on the success of diffusion models in image generation and editing, video editing has recently gained substantial attention. However, maintaining temporal consistency and motion alignment remains challenging. To address these issues, this paper proposes DINO-guided Video Editing (DIVE), a framework designed to facilitate subject-driven editing in source videos conditioned on either target text prompts or reference images with specific identities. The core of DIVE lies in leveraging the powerful semantic features extracted from a pretrained DINOv2 model as implicit correspondences to guide the editing process. Specifically, to ensure temporal motion consistency, DIVE employs DINO features to align with the motion trajectory of the source video. For precise subject editing, DIVE incorporates the DINO features of reference images into a pretrained text-to-image model to learn Low-Rank Adaptations (LoRAs), effectively registering the target subject's identity. Extensive experiments on diverse real-world videos demonstrate that our framework can achieve high-quality editing results with robust motion consistency, highlighting the potential of DINO to contribute to video editing. Project page: //dino-video-editing.github.io

Controller · Correlation Coefficient · INFORMS · Optimizer · Commentator ·

December 4, 2024

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Anqi Li,Feng Li,Yuxi Liu,Runmin Cong,Yao Zhao,Huihui Bai

Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexibly adapting the bitrate to diverse compression needs and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, termed Control-GIC, the first capable of fine-grained bitrate adaption across a broad spectrum while ensuring high fidelity and generality. Control-GIC is grounded in a VQGAN framework that encodes an image as a sequence of variable-length codes (i.e., VQ-indices), which can be losslessly compressed and exhibit a direct positive correlation with the bitrate. Drawing inspiration from classical coding principles, we correlate the information density of local image patches with their granular representations. Hence, we can flexibly determine a proper allocation of granularity for the patches to dynamically adjust the VQ-indices, resulting in the desired compression rates. We further develop a probabilistic conditional decoder capable of retrieving historic encoded multi-granularity representations according to transmitted codes, and then reconstructing hierarchical granular features in the form of conditional probabilities, enabling more informative aggregation to improve reconstruction realism. Our experiments show that Control-GIC allows highly flexible and controllable bitrate adaption and outperforms recent state-of-the-art methods.
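The idea of tying each patch's granularity to its local information density can be sketched as below. Two choices here are our assumptions rather than details given in the abstract: information density is measured as the Shannon entropy of each patch's intensity histogram, and patches are ranked by entropy and split into three levels (coarse, medium, fine) by fixed fractions that stand in for a rate-control knob.

```python
import numpy as np


def patch_entropy(img: np.ndarray, patch: int = 16, bins: int = 32) -> np.ndarray:
    """Shannon entropy (bits) of the intensity histogram of each non-overlapping patch.

    `img` is a 2-D array with values in [0, 1] whose sides are multiples of `patch`.
    """
    h, w = img.shape
    out = np.empty((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            block = img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            counts, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = counts[counts > 0] / counts.sum()
            out[i, j] = -(p * np.log2(p)).sum()
    return out


def assign_granularity(entropy: np.ndarray, fractions=(0.5, 0.3, 0.2)) -> np.ndarray:
    """Map each patch to 0 (coarse), 1 (medium) or 2 (fine) by entropy rank.

    The lowest-entropy fractions[0] of patches get the coarsest level, and so on.
    The split is a hypothetical knob; shifting mass toward "fine" raises the rate.
    """
    order = np.argsort(entropy, axis=None, kind="stable")
    bounds = np.cumsum(np.asarray(fractions) * entropy.size).round().astype(int)
    bounds[-1] = entropy.size  # make sure every patch receives a level
    levels = np.empty(entropy.size, dtype=int)
    start = 0
    for level, end in enumerate(bounds):
        levels[order[start:end]] = level
        start = end
    return levels.reshape(entropy.shape)
```

On an image whose left half is flat and right half is noisy, the flat patches have zero entropy and receive the coarsest level, while the noisy patches receive the finer levels.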

MoDELS · Guidance · Seven · Continuity · Performer ·

August 10, 2023

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Yang Liu,Yuanshun Yao,Jean-Francois Ton,Xiaoying Zhang,Ruocheng Guo,Hao Cheng,Yegor Klochkov,Muhammad Faaiz Taufiq,Hang Li

Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.

Pyramid · MoDELS · Extensibility · state-of-the-art · Performer ·

December 1, 2022

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Wan-Cyuan Fan,Yen-Chun Chen,Dongdong Chen,Yu Cheng,Lu Yuan,Yu-Chiang Frank Wang

from arxiv, AAAI 2023

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can also be applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.

Visual Question Answering · Automatic Question Answering · MoDELS · Identifiable · Attention Mechanism ·

February 15, 2018

Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang,Jonathon Hare,Adam Prügel-Bennett

from arxiv, Published in ICLR 2018

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
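One way to see why softmax-normalized soft attention makes counting hard: the attention weights sum to one, so the pooled feature is a convex combination of the attended features. Under the simplifying assumption (ours, for illustration) that identical objects yield identical proposal features and identical attention scores, the pooled vector is the same whether the image contains one object or four, so the count is not recoverable from it. The feature vector below is hypothetical.

```python
import numpy as np


def soft_attention_pool(features: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Softmax-normalized attention: returns a convex combination of proposal features."""
    w = np.exp(scores - scores.max())  # subtract the max for numerical stability
    w /= w.sum()                       # weights now sum to 1
    return w @ features


cat = np.array([1.0, 0.0])  # hypothetical feature vector of a single "cat" proposal

one_cat = soft_attention_pool(cat[None, :], np.array([5.0]))
four_cats = soft_attention_pool(np.tile(cat, (4, 1)), np.full(4, 5.0))
# Both pooled vectors equal `cat`: the number of cats has been normalized away.
```

The counting component proposed in the paper is not reproduced here; this only illustrates the normalization issue the abstract identifies.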

Image Segmentation · Cost · Performer · SCAN · Better ·

January 31, 2018

Improved Image Segmentation via Cost Minimization of Multiple Hypotheses

Marc Bosch,Christopher M. Gifford,Austin G. Dress,Clare W. Lau,Jeffrey G. Skibo,Gordon A. Christie

from arxiv, Accepted BMVC 17

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It still remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
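Criterion (2) above, selecting segments that are stable and persistent across the hypotheses, can be approximated by a simple persistence score. This sketch is our assumption, not the paper's cost function: a candidate segment (a boolean mask) counts as present in a hypothesis (an integer label map) if some segment there overlaps it with intersection-over-union at or above a threshold, and the score is the fraction of hypotheses in which it is present.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def persistence(mask: np.ndarray, hypotheses: list[np.ndarray], thresh: float = 0.5) -> float:
    """Fraction of segmentation hypotheses that contain a segment matching `mask`.

    Each hypothesis is an integer label map; a match means IoU >= `thresh`
    (the threshold is an illustrative choice).
    """
    hits = 0
    for labels in hypotheses:
        if any(iou(mask, labels == k) >= thresh for k in np.unique(labels)):
            hits += 1
    return hits / len(hypotheses)
```

For a 4×4 image whose candidate segment is the left half, a hypothesis that splits the image into halves matches exactly, a fully under-segmented hypothesis matches at IoU 0.5, and a one-pixel-per-segment over-segmentation never matches, giving a persistence of 2/3.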


