In this paper, we propose Docprompt for document question answering tasks with powerful zero-shot and few-shot performance. We propose a novel weakly supervised data generation method, a novel multi-stage training method, and a novel ensemble method that combines an understanding model with a generation model. We achieve state-of-the-art performance on 4 document question answering tasks. This method greatly improves delivery efficiency and model performance in customer document question answering projects, reducing annotation and labor costs. Our demo can be found at //huggingface.co/spaces/PaddlePaddle/ERNIE-Layout.
In this paper, I present three closed-form approximations of the two-sample Pearson Bayes factor. The techniques rely on some classical asymptotic results about gamma functions. These approximations permit simple closed-form calculation of the Pearson Bayes factor in cases where only the summary statistics are available (i.e., the t-score and degrees of freedom).
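The classical asymptotics in question are presumably of the Stirling type; as a reminder (our notation, not necessarily the paper's), Stirling's formula and the standard gamma-ratio asymptotic read
\[
\Gamma(x) \sim \sqrt{2\pi}\, x^{x-\frac{1}{2}} e^{-x}, \qquad \frac{\Gamma(x+a)}{\Gamma(x+b)} \sim x^{a-b} \quad (x \to \infty),
\]
and the ratio form is the kind of result that turns the gamma-function ratios appearing in Bayes factors, with the degrees of freedom as arguments, into elementary closed-form expressions.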
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) a reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm, called BiasNorm, that allows us to retain some length information; 4) new activation functions, SwooshR and SwooshL, that work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explicitly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on the LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at //github.com/k2-fsa/icefall.
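To make the ScaledAdam idea concrete, here is a minimal sketch of its core scaling rule, assuming a plain Adam-style moment update; the function name and hyperparameters are illustrative, and the explicitly learned parameter scale is omitted, so this is not the paper's actual implementation:

```python
import torch

def scaled_adam_step(p, grad, state, lr=0.05, betas=(0.9, 0.98), eps=1e-8):
    # Standard Adam-style first/second moment estimates.
    m, v = state["m"], state["v"]
    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1 - betas[1])
    step = m / (v.sqrt() + eps)
    # Key idea: scale the update by the tensor's current RMS so that the
    # *relative* parameter change per step stays roughly constant.
    param_rms = p.norm().item() / (p.numel() ** 0.5)
    p.add_(step, alpha=-lr * max(param_rms, 1e-5))

# Usage on a single parameter tensor:
p = torch.randn(4, 4)
state = {"m": torch.zeros_like(p), "v": torch.zeros_like(p)}
scaled_adam_step(p, torch.randn(4, 4), state)
```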
In the Maximum Independent Set problem we are asked to find a set of pairwise nonadjacent vertices of maximum possible cardinality in a given graph. In general graphs, this classical problem is known to be NP-hard and hard to approximate within a factor of $n^{1-\varepsilon}$ for any $\varepsilon > 0$. Due to this, investigating the complexity of Maximum Independent Set in various graph classes in the hope of finding better tractability results is an active research direction. In $H$-free graphs, that is, graphs not containing a fixed graph $H$ as an induced subgraph, the problem is known to remain NP-hard and APX-hard whenever $H$ contains a cycle, a vertex of degree at least four, or two vertices of degree at least three in one connected component. In the remaining cases, where every component of $H$ is a path or a subdivided claw, the complexity of Maximum Independent Set remains widely open, with only a handful of polynomial-time solvability results for small graphs $H$ such as $P_5$, $P_6$, the claw, or the fork. We show that for every graph $H$ for which Maximum Independent Set is not known to be APX-hard and SUBEXP-hard in $H$-free graphs, the problem admits a quasi-polynomial-time approximation scheme and a subexponential-time exact algorithm in this graph class. Our algorithms also work in the more general weighted setting, where the input graph is supplied with a weight function on vertices and we maximize the total weight of an independent set.
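For concreteness, the weighted problem can be stated as a tiny exact solver; the sketch below is a textbook exponential-time branching baseline, not the subexponential-time algorithm of this paper:

```python
def max_weight_independent_set(adj, weight):
    """Textbook branching for Maximum Weight Independent Set.
    adj: vertex -> set of neighbours; weight: vertex -> nonnegative weight.
    Runs in exponential time in general."""
    def solve(vertices):
        if not vertices:
            return 0, frozenset()
        # Branch on a maximum-degree vertex v: either exclude v,
        # or include v and discard its whole closed neighbourhood N[v].
        v = max(vertices, key=lambda u: len(adj[u] & vertices))
        w_out, s_out = solve(vertices - {v})
        w_in, s_in = solve(vertices - {v} - adj[v])
        w_in += weight[v]
        return (w_in, s_in | {v}) if w_in >= w_out else (w_out, s_out)
    return solve(frozenset(adj))

# A P4 (path on 4 vertices): any optimum has weight 2, e.g. the two endpoints.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(max_weight_independent_set(adj, {v: 1 for v in adj}))
```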
In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image. While Zero-1-to-3 has demonstrated impressive zero-shot, open-set NVS capability, it treats NVS as a pure image-to-image translation problem. This approach suffers from the severely under-constrained nature of single-view NVS: the process lacks means of explicit user control and often produces implausible NVS generations. To address this limitation, TOSS uses text as high-level semantic information to constrain the NVS solution space. TOSS fine-tunes a text-to-image Stable Diffusion model pre-trained on large-scale text-image pairs and introduces modules specifically tailored to image and camera pose conditioning, as well as dedicated training for pose correctness and preservation of fine details. Comprehensive experiments show that TOSS outperforms Zero-1-to-3, producing more plausible, controllable, and multiview-consistent NVS results. We further support these findings with comprehensive ablations that underscore the effectiveness and potential of the introduced semantic guidance and architecture design.
Query answering is an important problem in AI, databases, and knowledge representation. In this paper, we develop saturation-based Boolean conjunctive query answering and rewriting procedures for the guarded, the loosely guarded, and the clique-guarded fragments. Our query answering procedure improves on existing resolution-based decision procedures for the guarded and the loosely guarded fragments, and it solves Boolean conjunctive query answering for the guarded, the loosely guarded, and the clique-guarded fragments. Based on this query answering procedure, we also introduce a novel saturation-based query rewriting procedure for these guarded fragments. Unlike mainstream query answering and rewriting methods, our procedures derive a compact and reusable saturation, namely a closure of formulas, to handle the challenge of querying distributed datasets. This paper lays the theoretical foundations for the first automated-deduction decision procedures for Boolean conjunctive query answering, and the first saturation-based Boolean conjunctive query rewriting, in the guarded, the loosely guarded, and the clique-guarded fragments.
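As a reminder of the objects involved (our illustrative notation, not an example from the paper): in the guarded fragment, every quantifier is relativized by an atom, the guard, that contains all free variables of the quantified subformula,
\[
\forall \bar{y}\,\big(\alpha(\bar{x},\bar{y}) \rightarrow \varphi(\bar{x},\bar{y})\big),
\qquad
\exists \bar{y}\,\big(\alpha(\bar{x},\bar{y}) \wedge \varphi(\bar{x},\bar{y})\big),
\]
while a Boolean conjunctive query is an existentially quantified conjunction of atoms such as $q = \exists x \exists y \exists z\, (R(x,y) \wedge S(y,z))$; query answering asks whether the theory entails $q$.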
In this paper we consider the numerical approximation of infinite horizon problems via the dynamic programming approach. The value function of the problem solves a Hamilton-Jacobi-Bellman (HJB) equation that is approximated by a fully discrete method. It is well known that this numerical problem is difficult to handle because of the so-called curse of dimensionality. To mitigate this issue, we apply model-order reduction by means of a new proper orthogonal decomposition (POD) method based on time derivatives. We carry out the error analysis of the method using recently proved optimal bounds for the fully discrete approximations. Moreover, the use of snapshots based on time derivatives allows us to bound some error terms that could not be bounded in a standard POD approach. Numerical experiments show the good performance of the method in practice.
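To illustrate the general idea of a derivative-augmented POD basis (a generic sketch under our own conventions; the paper's exact construction, inner products, and weighting may differ):

```python
import numpy as np

def pod_basis(snapshots, dt, r):
    """Rank-r POD basis from solution snapshots augmented with
    finite-difference time-derivative snapshots.
    snapshots: (n, m) array whose columns are states at successive times."""
    ddt = np.diff(snapshots, axis=1) / dt      # first-order time derivatives
    augmented = np.hstack([snapshots, ddt])    # include derivative information
    U, s, _ = np.linalg.svd(augmented, full_matrices=False)
    return U[:, :r]                            # dominant POD modes
```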
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns an individual adapter for each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning, and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions like activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods on natural, specialized, and structured vision benchmarks, achieving superior accuracy with fewer parameters and computations. Applied to LLaMA-1 and LLaMA-2, the proposed method also shows considerable improvements over the original LoRA in the language domain. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code and models are available at: //github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
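As background for what GLoRA generalizes, here is a minimal sketch of a plain LoRA layer (illustrative only; GLoRA's prompt module adds further per-layer components, e.g. terms acting on activations):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Plain LoRA: frozen weight W plus trainable low-rank update B @ A."""
    def __init__(self, in_features, out_features, rank=4, alpha=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)   # frozen pre-trained W
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero-init:
        # training starts exactly from the frozen weights
        self.scale = alpha / rank

    def forward(self, x):
        return x @ (self.weight + self.scale * self.B @ self.A).T
```

Because the low-rank update can be merged into the frozen weight after training, this family of methods, GLoRA included via its structural re-parameterization, adds no inference cost.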
In this paper, a set of previous general results for the development of B-series for a broad class of stochastic differential equations (SDEs) is collected. The applicability of these results is demonstrated by the derivation of B-series for non-autonomous semi-linear SDEs and for exponential Runge-Kutta methods applied to this class of SDEs, which is a significant generalization of the existing theory on such methods.
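For orientation (standard deterministic notation, not the stochastic generalization developed here), a B-series is a formal expansion of a one-step method indexed by rooted trees:
\[
y_1 = y_0 + \sum_{\tau \in T} \frac{h^{|\tau|}}{\sigma(\tau)}\, a(\tau)\, F(\tau)(y_0),
\]
where $T$ is the set of rooted trees, $|\tau|$ the number of nodes of $\tau$, $\sigma(\tau)$ its symmetry coefficient, $F(\tau)$ the associated elementary differential of the vector field, and $a(\tau)$ the method-dependent coefficients; the stochastic versions replace $T$ by suitable families of coloured trees carrying the driving noise terms.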
Mesh offsetting plays an important role in discrete geometry processing. In this paper, we propose a parallel feature-preserving mesh offsetting framework with variable distance. Unlike traditional methods based on distance and normal vectors, we compute offset positions using dynamic programming and quadratic programming, so sharp features are preserved after offsetting. Instead of an implicit distance field, we propose a spatial coverage region, represented by polyhedra, for computing offsets. Our method generates offset models with smaller mesh size and achieves high quality without gaps, holes, or self-intersections. Moreover, we propose several acceleration techniques for efficient mesh offsetting, such as grid-based parallel computing, AABB trees, and ray computations. To show the efficiency and robustness of the proposed framework, we have tested our method on the quadmesh dataset, which is available at [//www.quadmesh.cloud]. The source code of the proposed algorithm is available on GitHub at [//github.com/iGame-Lab/PFPOffset].
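For contrast, the traditional baseline can be sketched in a few lines: offset every vertex along its area-weighted vertex normal by a per-vertex distance. This naive approach is exactly what breaks sharp features and causes self-intersections (illustrative code, not part of the proposed method):

```python
import numpy as np

def naive_offset(vertices, faces, distances):
    """Baseline vertex-normal offsetting with variable distance.
    vertices: (n, 3) floats, faces: (m, 3) vertex indices,
    distances: (n,) per-vertex offset distance."""
    normals = np.zeros_like(vertices)
    tri = vertices[faces]
    # Cross product of edge vectors gives area-weighted face normals.
    face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    for k in range(3):
        np.add.at(normals, faces[:, k], face_n)  # accumulate onto incident vertices
    norms = np.linalg.norm(normals, axis=1, keepdims=True)
    normals /= np.maximum(norms, 1e-12)
    return vertices + distances[:, None] * normals
```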
Suppose a gambler pays one coin per coup to play a two-armed Futurity slot machine, an antique casino machine, and two coins are refunded after every two consecutive gambler losses. This payoff is called the Futurity award. The casino owner honestly advertises that each arm of the two-armed machine is fair, in the sense that the asymptotic expected profit of both gambler and dealer is 0 if the gambler only plays that arm. The gambler is allowed to play either arm on each coup, alternating in some deterministic order or choosing at random. For almost 90 years, since the Futurity slot machine was designed in 1936, it has remained open whether the machine obeys the "long bet will lose" phenomenon so common to casino games. Ethier and Lee [Ann. Appl. Probab. 20 (2010), pp. 1098-1125] conjectured that a player will also definitely lose in the long run by applying any non-random-mixture strategy. In this paper, we prove Ethier and Lee's conjecture. Combined with Ethier and Lee's earlier conclusion, our result demonstrates that when players fix a two-arm strategy, random or non-random, before playing and then repeat it without interruption, the casino owner is always profitable, even when the Futurity award is taken into account. This work helps complete the demystification of casino profitability. Moreover, it paves the way for casino owners to improve casino game design and for players to participate in gambling more effectively.
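The setup is easy to probe empirically. The sketch below simulates a two-armed Futurity machine under an illustrative parameterization of our own (per-arm win probabilities with payouts chosen so that each arm alone is asymptotically fair, award included; this is not Ethier and Lee's exact model), and lets one compare single-arm play against a deterministic alternating strategy:

```python
import random

def futurity_profit(p, payout, order, coups=10**6, seed=0):
    """Average gambler profit per coup. The gambler pays 1 coin per coup;
    arm i wins with probability p[i] and pays payout[i]; after two
    consecutive losses the machine refunds 2 coins (the Futurity award)."""
    rng = random.Random(seed)
    profit, losses = 0.0, 0
    for t in range(coups):
        arm = order[t % len(order)]
        profit -= 1
        if rng.random() < p[arm]:
            profit += payout[arm]
            losses = 0
        else:
            losses += 1
            if losses == 2:
                profit += 2   # Futurity award
                losses = 0
    return profit / coups

def fair_payout(p):
    # Single-arm fairness: p*payout - 1 + 2*q^2/(1+q) = 0, where q = 1 - p
    # and q^2/(1+q) is the long-run refund frequency of that arm alone.
    q = 1 - p
    return (1 - 2 * q * q / (1 + q)) / p

p = [0.1, 0.7]
payout = [fair_payout(p[0]), fair_payout(p[1])]
print(futurity_profit(p, payout, order=[0]))     # single arm: ~0 (fair)
print(futurity_profit(p, payout, order=[0, 1]))  # alternating: negative
```

Under this parameterization the alternating player loses on average, consistent with the "long bet will lose" behaviour that the conjecture asserts for all non-random mixtures.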