亚洲AV永久无码精品九之_久久久久久久久国产精品网站_日韩欧美精品人妻一区二区不卡_亚洲一区国产一区欧美一区_成人午夜精品无码区久久含羞草_亚洲精品PPP在线观看_国产一级婬AAA毛片久久

The Variational Autoencoder (VAE) is known to suffer from the phenomenon of \textit{posterior collapse}, where the latent representations generated by the model become independent of the inputs. This leads to degenerated representations of the input, which is attributed to the limitations of the VAE's objective function. In this work, we propose a novel solution to this issue, the Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs. This strategy ensures that the information flow between the input and its latent representation is maximized, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate, that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.

相關內容

變分自編碼

關注 51

可約的 · 哈希學習 · 泛函 · 樣本 · 值域 ·

2023 年 10 月 23 日

Bipartite ShockHash: Pruning ShockHash Search for Efficient Perfect Hashing

Hans-Peter Lehmann,Peter Sanders,Stefan Walzer

A minimal perfect hash function (MPHF) maps a set of n keys to the first n integers without collisions. Representing this bijection needs at least $\log_2(e) \approx 1.443$ bits per key, and there is a wide range of practical implementations achieving about 2 bits per key. Minimal perfect hashing is a key ingredient in many compact data structures such as updatable retrieval data structures and approximate membership data structures. A simple implementation reaching the space lower bound is to sample random hash functions using brute-force, which needs about $e^n \approx 2.718^n$ tries in expectation. ShockHash recently reduced that to about $(e/2)^n \approx 1.359^n$ tries in expectation by sampling random graphs. With bipartite ShockHash, we now sample random bipartite graphs. In this paper, we describe the general algorithmic ideas of bipartite ShockHash and give an experimental evaluation. The key insight is that we can try all combinations of two hash functions, each mapping into one half of the output range. This reduces the number of sampled hash functions to only about $(\sqrt{e/2})^n \approx 1.166^n$ in expectation. In itself, this does not reduce the asymptotic running time much because all combinations still need to be tested. However, by filtering the candidates before combining them, we can reduce this to less than $1.175^n$ combinations in expectation. Our implementation of bipartite ShockHash is up to 3 orders of magnitude faster than original ShockHash. Inside the RecSplit framework, bipartite ShockHash-RS enables significantly larger base cases, leading to a construction that is, depending on the allotted space budget, up to 20 times faster. In our most extreme configuration, ShockHash-RS can build an MPHF for 10 million keys with 1.489 bits per key (within 3.3% of the lower bound) in about half an hour, pushing the limits of what is possible.

置換 · 邊緣計算 · 邊 · 增強現實（AR） · Attention ·

2023 年 10 月 23 日

Poster: Real-Time Object Substitution for Mobile Diminished Reality with Edge Computing

Hongyu Ke,Haoxin Wang

Diminished Reality (DR) is considered as the conceptual counterpart to Augmented Reality (AR), and has recently gained increasing attention from both industry and academia. Unlike AR which adds virtual objects to the real world, DR allows users to remove physical content from the real world. When combined with object replacement technology, it presents an further exciting avenue for exploration within the metaverse. Although a few researches have been conducted on the intersection of object substitution and DR, there is no real-time object substitution for mobile diminished reality architecture with high quality. In this paper, we propose an end-to-end architecture to facilitate immersive and real-time scene construction for mobile devices with edge computing.

MoDELS · 情景 · Performer · 成對型 · 數據集 ·

2023 年 10 月 20 日

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Michael Günther,Louis Milliken,Jonathan Geuter,Georgios Mastrapas,Bo Wang,Han Xiao

from arxiv, 9 pages, 2 page appendix

Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating textual inputs into numerical representations, capturing the semantics of the text. These models excel in applications like dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, offers in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Text Embedding Benchmark (MTEB). Furthermore, to increase the model's awareness of grammatical negation, we construct a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.

Networking · 原點 · 置換 · 圖像修復 · 講稿 ·

2023 年 10 月 20 日

PSGText: Stroke-Guided Scene Text Editing with PSP Module

Felix Liawi,Yun-Da Tsai,Guan-Lun Lu,Shou-De Lin

Scene Text Editing (STE) aims to substitute text in an image with new desired text while preserving the background and styles of the original text. However, present techniques present a notable challenge in the generation of edited text images that exhibit a high degree of clarity and legibility. This challenge primarily stems from the inherent diversity found within various text types and the intricate textures of complex backgrounds. To address this challenge, this paper introduces a three-stage framework for transferring texts across text images. Initially, we introduce a text-swapping network that seamlessly substitutes the original text with the desired replacement. Subsequently, we incorporate a background inpainting network into our framework. This specialized network is designed to skillfully reconstruct background images, effectively addressing the voids left after the removal of the original text. This process meticulously preserves visual harmony and coherence in the background. Ultimately, the synthesis of outcomes from the text-swapping network and the background inpainting network is achieved through a fusion network, culminating in the creation of the meticulously edited final image. A demo video is included in the supplementary material.

列 · 代碼 · 不可約的 · 線性無關 · 相互獨立的 ·

2023 年 10 月 19 日

Generalized GM-MDS: Polynomial Codes are Higher Order MDS

Joshua Brakensiek,Manik Dhar,Sivakanth Gopi

from arxiv, 34 pages

The GM-MDS theorem, conjectured by Dau-Song-Dong-Yuen and proved by Lovett and Yildiz-Hassibi, shows that the generator matrices of Reed-Solomon codes can attain every possible configuration of zeros for an MDS code. The recently emerging theory of higher order MDS codes has connected the GM-MDS theorem to other important properties of Reed-Solomon codes, including showing that Reed-Solomon codes can achieve list decoding capacity, even over fields of size linear in the message length. A few works have extended the GM-MDS theorem to other families of codes, including Gabidulin and skew polynomial codes. In this paper, we generalize all these previous results by showing that the GM-MDS theorem applies to any \emph{polynomial code}, i.e., a code where the columns of the generator matrix are obtained by evaluating linearly independent polynomials at different points. We also show that the GM-MDS theorem applies to dual codes of such polynomial codes, which is non-trivial since the dual of a polynomial code may not be a polynomial code. More generally, we show that GM-MDS theorem also holds for algebraic codes (and their duals) where columns of the generator matrix are chosen to be points on some irreducible variety which is not contained in a hyperplane through the origin. Our generalization has applications to constructing capacity-achieving list-decodable codes as shown in a follow-up work by Brakensiek-Dhar-Gopi-Zhang, where it is proved that randomly punctured algebraic-geometric (AG) codes achieve list-decoding capacity over constant-sized fields.

流 · Extensibility · Subspace · motivation · 線性的 ·

2023 年 10 月 19 日

Liveness Properties in Geometric Logic for Domain-Theoretic Streams

Colin Riba,Solal Stern

We devise a version of Linear Temporal Logic (LTL) on a denotational domain of streams. We investigate this logic in terms of domain theory, (point-free) topology and geometric logic. This yields the first steps toward an extension of the "Domain Theory in Logical Form" paradigm to temporal liveness properties. We show that the negation-free formulae of LTL induce sober subspaces of streams, but that this is in general not the case in presence of negation. We propose a direct, inductive, translation of negation-free LTL to geometric logic. This translation reflects the approximations used to compute the usual fixpoint representations of LTL modalities. As a motivating example, we handle a natural input-output specification for the usual filter function on streams.

LIDAR · 值域 · SCAN · 高斯混合（模型） · Extensibility ·

2023 年 10 月 19 日

ECTLO: Effective Continuous-time Odometry Using Range Image for LiDAR with Small FoV

Xin Zheng,Jianke Zhu

from arxiv, 8 pages, 5 figures. Accepted for publication in the Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

Prism-based LiDARs are more compact and cheaper than the conventional mechanical multi-line spinning LiDARs, which have become increasingly popular in robotics, recently. However, there are several challenges for these new LiDAR sensors, including small field of view, severe motion distortions, and irregular patterns, which hinder them from being widely used in LiDAR odometry, practically. To tackle these problems, we present an effective continuous-time LiDAR odometry (ECTLO) method for the Risley-prism-based LiDARs with non-repetitive scanning patterns. A single range image covering historical points in LiDAR's small FoV is adopted for efficient map representation. To account for the noisy data from occlusions after map updating, a filter-based point-to-plane Gaussian Mixture Model is used for robust registration. Moreover, a LiDAR-only continuous-time motion model is employed to relieve the inevitable distortions. Extensive experiments have been conducted on various testbeds using the prism-based LiDARs with different scanning patterns, whose promising results demonstrate the efficacy of our proposed approach.

INFORMS · 機器閱讀理解 · Better · Extensibility · 數據集 ·

2020 年 12 月 14 日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Xiuying Chen,Zhi Cui,Jiayi Zhang,Chen Wei,Jianwei Cui,Bin Wang,Dongyan Zhao,Rui Yan

from arxiv, 9 pages, 1 figure

In multi-turn dialog, utterances do not always take the full form of sentences \cite{Carbonell1983DiscoursePA}, which naturally makes understanding the dialog context more difficult. However, it is essential to fully grasp the dialog context to generate a reasonable response. Hence, in this paper, we propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question, where the question is focused on the omitted information in the dialog. Enlightened by the multi-task learning scheme, we propose a joint framework that unifies these two tasks, sharing the same encoder to extract the common and task-invariant features with different decoders to learn task-specific features. To better fusing information from the question and the dialog history in the encoding part, we propose to augment the Transformer architecture with a memory updater, which is designed to selectively store and update the history dialog information so as to support downstream tasks. For the experiment, we employ human annotators to write and examine a large-scale dialog reading comprehension dataset. Extensive experiments are conducted on this dataset, and the results show that the proposed model brings substantial improvements over several strong baselines on both tasks. In this way, we demonstrate that reasoning can indeed help better response generation and vice versa. We release our large-scale dataset for further research.

異常檢測 · Taxonomy · Better · 示例 · 深度學習 ·

2020 年 3 月 16 日

Anomalous Instance Detection in Deep Learning: A Survey

Saikiran Bulusu,Bhavya Kailkhura,Bo Li,Pramod K. Varshney,Dawn Song

Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples resulting in incorrect outputs. To make DL more robust, several posthoc anomaly detection techniques to detect (and discard) these anomalous samples have been proposed in the recent past. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications. We provide a taxonomy for existing techniques based on their underlying assumptions and adopted approaches. We discuss various techniques in each of the categories and provide the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic. Finally, we highlight the unsolved research challenges while applying anomaly detection techniques in DL systems and present some high-impact future research directions.

學成 · 深度學習 · 可辨認的 · MoDELS · 目標跟蹤 ·

2019 年 7 月 31 日

Deep Learning in Video Multi-Object Tracking: A Survey

Gioele Ciaparrone,Francisco Luque Sánchez,Siham Tabik,Luigi Troiano,Roberto Tagliaferri,Francisco Herrera

from arxiv, New in v2: corrected typos and various minor mistakes. Submitted to Neurocomputing. Main text: 25 pages, 5 figures, 6 tables. Summary table in appendix at the end of the paper

The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.