A minimal perfect hash function (MPHF) maps a set of n keys to the first n integers without collisions. Representing this bijection needs at least $\log_2(e) \approx 1.443$ bits per key, and there is a wide range of practical implementations achieving about 2 bits per key. Minimal perfect hashing is a key ingredient in many compact data structures such as updatable retrieval data structures and approximate membership data structures. A simple implementation reaching the space lower bound is to sample random hash functions using brute-force, which needs about $e^n \approx 2.718^n$ tries in expectation. ShockHash recently reduced that to about $(e/2)^n \approx 1.359^n$ tries in expectation by sampling random graphs. With bipartite ShockHash, we now sample random bipartite graphs. In this paper, we describe the general algorithmic ideas of bipartite ShockHash and give an experimental evaluation. The key insight is that we can try all combinations of two hash functions, each mapping into one half of the output range. This reduces the number of sampled hash functions to only about $(\sqrt{e/2})^n \approx 1.166^n$ in expectation. In itself, this does not reduce the asymptotic running time much because all combinations still need to be tested. However, by filtering the candidates before combining them, we can reduce this to less than $1.175^n$ combinations in expectation. Our implementation of bipartite ShockHash is up to 3 orders of magnitude faster than original ShockHash. Inside the RecSplit framework, bipartite ShockHash-RS enables significantly larger base cases, leading to a construction that is, depending on the allotted space budget, up to 20 times faster. In our most extreme configuration, ShockHash-RS can build an MPHF for 10 million keys with 1.489 bits per key (within 3.3% of the lower bound) in about half an hour, pushing the limits of what is possible.
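The brute-force baseline mentioned above can be sketched as follows. This is a minimal illustration, not the ShockHash or bipartite ShockHash algorithm: it tries seeds one by one until the seeded hash function is a bijection onto the first n integers. The function names and the use of BLAKE2b with the seed as salt are assumptions made for the example. The expected number of tries grows roughly like $e^n$, so this is only practical for very small key sets.

```python
import hashlib
from itertools import count


def hash_to_range(key: str, seed: int, n: int) -> int:
    """Hash `key` into {0, ..., n-1} using a seeded hash (illustrative choice: BLAKE2b)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),  # the seed selects the hash function
    ).digest()
    return int.from_bytes(digest, "little") % n


def brute_force_mphf(keys):
    """Return the first seed whose hash function maps `keys` bijectively onto range(n)."""
    n = len(keys)
    if len(set(keys)) != n:
        raise ValueError("keys must be distinct")
    for seed in count():
        # A seed is accepted only if all n keys land in n different slots.
        slots = {hash_to_range(k, seed, n) for k in keys}
        if len(slots) == n:
            return seed


keys = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
seed = brute_force_mphf(keys)
print(seed, [hash_to_range(k, seed, len(keys)) for k in keys])
```

Storing only the accepted seed is what makes this approach space-efficient; the cost is the exponential search time that ShockHash and its bipartite variant reduce.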
We develop a generative attention-based approach to modeling structured entities comprising different property types, such as numerical, categorical, string, and composite. This approach handles such heterogeneous data through a mixed continuous-discrete diffusion process over the properties. Our flexible framework can model entities with arbitrary hierarchical properties, enabling applications to structured Knowledge Base (KB) entities and tabular data. Our approach obtains state-of-the-art performance on a majority of cases across 15 datasets. In addition, experiments with a device KB and a nuclear physics dataset demonstrate the model's ability to learn representations useful for entity completion in diverse settings. This has many downstream use cases, including modeling numerical properties with high accuracy - critical for science applications, which also benefit from the model's inherent probabilistic nature.
This study delves into the application of Generative Adversarial Networks (GANs) within the context of imbalanced datasets. Our primary aim is to enhance the performance and stability of GANs in such datasets. In pursuit of this objective, we introduce a novel network architecture known as Damage GAN, building upon the ContraD GAN framework which seamlessly integrates GANs and contrastive learning. Through the utilization of contrastive learning, the discriminator is trained to develop an unsupervised representation capable of distinguishing all provided samples. Our approach draws inspiration from the straightforward framework for contrastive learning of visual representations (SimCLR), leading to the formulation of a distinctive loss function. We also explore the implementation of self-damaging contrastive learning (SDCLR) to further enhance the optimization of the ContraD GAN model. Comparative evaluations against baseline models including the deep convolutional GAN (DCGAN) and ContraD GAN demonstrate the evident superiority of our proposed model, Damage GAN, in terms of generated image distribution, model stability, and image quality when applied to imbalanced datasets.
Solving coupled partial differential equations (PDEs) is a key task in modeling the complex dynamics of many physical processes. Recently, neural operators have shown the ability to solve PDEs by learning the integral kernel directly in Fourier/Wavelet space, so the difficulty of solving coupled PDEs lies in dealing with the coupled mappings between the functions. Towards this end, we propose a \textit{coupled multiwavelets neural operator} (CMWNO) learning scheme that decouples the coupled integral kernels during the multiwavelet decomposition and reconstruction procedures in the Wavelet space. The proposed model achieves significantly higher accuracy compared to previous learning-based solvers in solving coupled PDEs, including the Gray-Scott (GS) equations and the non-local mean field game (MFG) problem. According to our experimental results, the proposed model exhibits a $2\times \sim 4\times$ improvement in relative $L_2$ error compared to the best results from the state-of-the-art models.
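For reference, the relative $L_2$ error is commonly computed as the norm of the prediction error divided by the norm of the reference solution. The sketch below assumes this standard definition on flattened sample values; the abstract does not state the exact normalization used, and the function name is hypothetical.

```python
import math


def relative_l2_error(pred, target):
    """Return ||pred - target||_2 / ||target||_2 for two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    err_norm = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    ref_norm = math.sqrt(sum(t ** 2 for t in target))
    return err_norm / ref_norm


# A perfect prediction gives zero error.
print(relative_l2_error([1.0, 2.0], [1.0, 2.0]))
```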
We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
We present TIGERScore, a \textbf{T}rained metric that follows \textbf{I}nstruction \textbf{G}uidance to perform \textbf{E}xplainable and \textbf{R}eference-free evaluation over a wide spectrum of text generation tasks. Different from other automatic evaluation methods that only provide arcane scores, TIGERScore is guided by natural language instruction to provide error analysis to pinpoint the mistakes in the generated text. Our metric is based on LLaMA-2, trained on our meticulously curated instruction-tuning dataset MetricInstruct, which covers 6 text generation tasks and 23 text generation datasets. The dataset consists of 42K quadruples in the form of (instruction, input, system output $\rightarrow$ error analysis). We collected the `system outputs' from a large variety of models to cover different types of errors. To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets and 2 held-out datasets, and show that TIGERScore can achieve the open-source SoTA correlation with human ratings across these datasets and almost approaches the GPT-4 evaluator. As a reference-free metric, its correlation can even surpass the best existing reference-based metrics. To further qualitatively assess the rationale generated by our metric, we conduct human evaluation on the generated explanations and find that the explanations are 70.8\% accurate. Through these experimental results, we believe TIGERScore demonstrates the possibility of building universal explainable metrics to evaluate any text generation task.
Popular industrial robotic problems such as spray painting and welding require (i) conditioning on free-shape 3D objects and (ii) planning of multiple trajectories to solve the task. Yet, existing solutions make strong assumptions on the form of input surfaces and the nature of output paths, resulting in limited approaches unable to cope with real-data variability. By leveraging recent advances in 3D deep learning, we introduce a novel framework capable of dealing with arbitrary 3D surfaces and handling a variable number of unordered (i.e., unstructured) output paths. Our approach predicts local path segments, which can later be concatenated to reconstruct long-horizon paths. We extensively validate the proposed method in the context of robotic spray painting by releasing PaintNet, the first public dataset of expert demonstrations on free-shape 3D objects collected in a real industrial scenario. A thorough experimental analysis demonstrates the capabilities of our model to promptly predict smooth output paths that cover up to 95% of previously unseen object surfaces, even without explicitly optimizing for paint coverage.
This paper addresses uncertainty propagation on unimodular matrix Lie groups that have a surjective exponential map. We derive the exact formula for the propagation of mean and covariance in a continuous-time setting from the governing Fokker-Planck equation. Two approximate propagation methods are discussed based on the exact formula. One uses numerical quadrature and the other utilizes the expansion of moments. A closed-form second-order propagation formula is derived. We apply the general theory to the joint attitude and angular momentum uncertainty propagation problem, and numerical experiments demonstrate the two approximation methods. These results show that our new methods have high accuracy while being computationally efficient.
Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples resulting in incorrect outputs. To make DL more robust, several posthoc anomaly detection techniques to detect (and discard) these anomalous samples have been proposed in the recent past. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications. We provide a taxonomy for existing techniques based on their underlying assumptions and adopted approaches. We discuss various techniques in each of the categories and provide the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic. Finally, we highlight the unsolved research challenges while applying anomaly detection techniques in DL systems and present some high-impact future research directions.
Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodal inputs are jointly processed for visual and textual understanding. In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. We design three pre-training tasks: Masked Language Modeling (MLM), Image-Text Matching (ITM), and Masked Region Modeling (MRM, with three variants). Different from concurrent work on multimodal pre-training that applies joint random masking to both modalities, we use conditioned masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text). Comprehensive analysis shows that conditioned masking yields better performance than unconditioned masking. We also conduct a thorough ablation study to find an optimal setting for the combination of pre-training tasks. Extensive experiments show that UNITER achieves new state of the art across six V+L tasks (over nine datasets), including Visual Question Answering, Image-Text Retrieval, Referring Expression Comprehension, Visual Commonsense Reasoning, Visual Entailment, and NLVR2.
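The idea of conditioned masking can be illustrated with a small sketch: in each training example only one modality is masked, while the other is left fully observed. This is an illustration of the principle only, not the UNITER implementation. The function name, the `[MASK]` placeholder, the 15% masking probability, and the 50/50 choice between the two tasks are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for masked words


def conditioned_mask(text_tokens, regions, mask_prob=0.15, rng=None):
    """Mask either text tokens or image regions, never both in the same example."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        # Masked language modeling: mask words, keep every image region visible.
        masked_text = [MASK if rng.random() < mask_prob else t for t in text_tokens]
        return masked_text, list(regions), "text"
    # Masked region modeling: mask regions (None marks a masked region),
    # keep the full sentence visible.
    masked_regions = [None if rng.random() < mask_prob else r for r in regions]
    return list(text_tokens), masked_regions, "region"


tokens = ["a", "dog", "on", "the", "grass"]
regions = ["region_0", "region_1", "region_2"]
print(conditioned_mask(tokens, regions, rng=random.Random(0)))
```

Joint random masking, by contrast, would apply the masking step to both lists in the same example, so a masked word could coincide with a masked region that depicts it.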
Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle soaring amounts of documents from emerging applications, such as finance, legislation, health, etc., where event arguments always scatter across different sentences, and multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. Moreover, we reformalize the DEE task with a no-trigger-words design to ease document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and code can be found at https://github.com/dolphin-zs/Doc2EDAG.