In the logic programming paradigm, a program is defined by a set of methods, each of which can be executed when specific conditions are met in the current state of an execution. The semantics of these programs can be elegantly represented using sequent calculi, in which each method is linked to an inference rule; in this context, proof search mirrors the program's execution. Previous works introduced a framework in which executions are modeled by the construction of proof nets, as opposed to the traditional approach of proof search in sequent calculus. This paper extends this investigation by focusing on the purely multiplicative fragment of the framework. We demonstrate, through practical examples, that logic programming methods with context-sensitive behaviors can be defined solely through specific resource-preserving and context-free operations, corresponding to certain generalized multiplicative connectives explored in the existing literature. We also show that some of these methods, although still multiplicative, escape the purely multiplicative fragment of Linear Logic (MLL).
In the circuit model of quantum computing, amplitude amplification techniques can be used to find solutions to NP-hard problems defined on $n$ bits in time $\text{poly}(n) 2^{n/2}$. In this work, we investigate whether such general statements can be made for adiabatic quantum optimization, for which provable performance guarantees are largely lacking. Although a lower bound of $\Omega(2^{n/2})$ has existed in this setting for over a decade, a purely adiabatic algorithm matching this running time has been absent. We show that adiabatic quantum optimization using an unstructured search approach results in a running time that matches this lower bound (up to a polylogarithmic factor) for a broad class of classical local spin Hamiltonians. For this, it is necessary to bound the spectral gap throughout the adiabatic evolution and to compute beforehand the position of the avoided crossing with sufficient precision to adapt the adiabatic schedule accordingly. However, we show that the position of the avoided crossing is approximately given by a quantity that depends on the degeneracies and inverse gaps of the problem Hamiltonian and is NP-hard to compute even to low additive precision. Furthermore, computing it exactly (or nearly exactly) is \#P-hard. Our work indicates a possible limitation of adiabatic quantum optimization algorithms, leaving open the question of whether provable Grover-like speed-ups can be obtained for any optimization problem using this approach.
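To make the role of the avoided crossing concrete, the following sketch (our own naming, plain NumPy, exponential-size matrices, so only feasible for toy instances) scans the interpolation $H(s) = (1-s)H_0 + sH_P$ and reports the minimum spectral gap together with the value of $s$ at which it occurs; an adapted adiabatic schedule would slow down around this point.

import numpy as np

def min_gap(h0, hp, num_s=1000):
    # Brute-force scan of the spectral gap of H(s) = (1 - s) * H0 + s * HP.
    # Illustration only: diagonalizing 2^n x 2^n matrices is exponential in n.
    gaps = []
    for s in np.linspace(0.0, 1.0, num_s):
        evals = np.linalg.eigvalsh((1 - s) * h0 + s * hp)
        gaps.append(evals[1] - evals[0])
    i = int(np.argmin(gaps))
    return gaps[i], i / (num_s - 1)  # minimum gap and its location s*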
We introduce an approach for analyzing the responses of dynamical systems to external perturbations that combines score-based generative modeling with the Generalized Fluctuation-Dissipation Theorem (GFDT). The methodology enables accurate estimation of system responses, including for systems with non-Gaussian statistics. We numerically validate our approach using time-series data from three stochastic partial differential equations of increasing complexity: an Ornstein-Uhlenbeck process with spatially correlated noise, a modified stochastic Allen-Cahn equation, and the 2D Navier-Stokes equations. We demonstrate the improved accuracy of the methodology over conventional methods and discuss its potential as a versatile tool for predicting the statistical behavior of complex dynamical systems.
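As a minimal sketch of the estimator structure (names such as score_fn and observable are hypothetical, and sign/normalization conventions vary across statements of the GFDT), the response function can be estimated from a stationary trajectory as a time-lagged cross-correlation between the observable and the learned score $s(x) = \nabla \log p(x)$:

import numpy as np

def gfdt_response(traj, score_fn, observable, max_lag):
    # traj: (T, d) snapshots of the state; score_fn returns the learned score
    # (T, d); observable returns (T, k) values of the quantity of interest.
    scores = score_fn(traj)
    A = observable(traj)
    T = traj.shape[0]
    R = np.zeros((max_lag, A.shape[1], scores.shape[1]))
    for lag in range(max_lag):
        # lagged cross-correlation <A(x_t) s(x_0)^T> over the trajectory
        R[lag] = A[lag:T].T @ scores[:T - lag] / (T - lag)
    return -R  # overall sign depends on the chosen GFDT convention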
Pipeline parallelism is widely used to scale the training of transformer-based large language models, and various works have been proposed to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To mitigate the activation memory overhead, we propose several algorithms that reduce communication barriers within the vocabulary layers. Additionally, we utilize a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation memory-balanced schedules such as V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of vocabulary size, resulting in a 5% to 51% improvement in throughput compared to naive approaches, while significantly reducing peak memory usage, especially in large-vocabulary scenarios. Our implementation is open-sourced at //github.com/sail-sg/VocabularyParallelism.
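The sketch below is not the paper's algorithm; it only illustrates the standard building block that makes a per-device vocabulary shard workable inside a pipeline stage: a cross-entropy over sharded logits that requires just a few scalar reductions per token (PyTorch; all names are ours).

import torch
import torch.distributed as dist

def vocab_parallel_logits_loss(hidden, weight_shard, targets, vocab_start, group):
    # Each device holds a contiguous shard of the output-embedding rows.
    logits = hidden @ weight_shard.t()                  # (tokens, local_vocab)
    local_max = logits.max(dim=-1).values
    dist.all_reduce(local_max, op=dist.ReduceOp.MAX, group=group)
    denom = torch.exp(logits - local_max.unsqueeze(-1)).sum(dim=-1)
    dist.all_reduce(denom, op=dist.ReduceOp.SUM, group=group)
    # Gather the logit of the target token if it lives in this shard.
    in_shard = (targets >= vocab_start) & (targets < vocab_start + weight_shard.size(0))
    idx = (targets - vocab_start).clamp(0, weight_shard.size(0) - 1)
    local_target_logit = torch.zeros_like(local_max)
    local_target_logit[in_shard] = logits[in_shard, idx[in_shard]]
    dist.all_reduce(local_target_logit, op=dist.ReduceOp.SUM, group=group)
    # Cross-entropy = global logsumexp minus the target logit.
    return (torch.log(denom) + local_max - local_target_logit).mean()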
Symmetry detection can improve various machine learning tasks. In the context of continuous symmetry detection, current state-of-the-art experiments are limited to detecting affine transformations. Under the manifold assumption, we outline a framework for discovering continuous symmetries in data beyond the affine transformation group, and we provide a similar framework for discovering discrete symmetries. We experimentally compare our method to an existing method known as LieGAN and show that ours is competitive at detecting affine symmetries for large sample sizes and superior to LieGAN for small sample sizes. We also show that our method can detect continuous symmetries beyond the affine group and is generally more computationally efficient than LieGAN.
This work proposes a class of differentially private mechanisms for linear queries, and in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements, with accuracy targets expressed either on selected query margins or as implied by a hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our theoretical and empirical analysis demonstrates that we achieve near-optimal utility, effectively compete with other methods, and retain all of the favorable statistical properties above.
Recursive techniques have recently been introduced into quantum programming so that a variety of large quantum circuits and algorithms can be elegantly and economically programmed. In this paper, we present a proof system for formal verification of the correctness of recursively defined quantum circuits. The soundness and (relative) completeness of the proof system are established. To demonstrate its effectiveness, a series of application examples of the proof system is given, including (multi-qubit) controlled gates, a quantum circuit generating (multi-qubit) GHZ (Greenberger-Horne-Zeilinger) states, a recursive definition of the quantum Fourier transform, quantum state preparation, and quantum random-access memories (QRAM).
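To give a flavour of what a recursively defined circuit looks like (plain Python, gates represented as tuples, no particular quantum library or the paper's syntax assumed), the GHZ-state circuit admits the recursion: GHZ(1) is a Hadamard on qubit 0, and GHZ(n) is GHZ(n-1) followed by a CNOT from qubit n-2 to qubit n-1.

def ghz_circuit(n):
    # Recursive construction of a circuit preparing the n-qubit GHZ state
    # from |0...0>: a Hadamard on qubit 0 followed by a chain of CNOTs.
    if n == 1:
        return [("H", 0)]
    return ghz_circuit(n - 1) + [("CNOT", n - 2, n - 1)]

# ghz_circuit(3) == [("H", 0), ("CNOT", 0, 1), ("CNOT", 1, 2)]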
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks; hence, late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that such a strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies and achieve state-of-the-art results on multiple audio-visual classification benchmarks, including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
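A schematic PyTorch sketch of one fusion layer, using standard encoder layers in place of the paper's exact blocks and with names of our own choosing: each modality attends only to its own tokens plus the shared bottleneck latents, and the updated bottlenecks from the two streams are averaged.

import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    # Each modality processes its own tokens concatenated with the shared
    # bottleneck latents z; cross-modal information flows only through z.
    def __init__(self, dim, heads):
        super().__init__()
        self.video_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.audio_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, video_tokens, audio_tokens, z):
        # z: (batch, n_bottleneck, dim), initialised elsewhere as learned latents
        n = z.size(1)
        v = self.video_layer(torch.cat([video_tokens, z], dim=1))
        a = self.audio_layer(torch.cat([audio_tokens, z], dim=1))
        z = 0.5 * (v[:, -n:] + a[:, -n:])  # fuse the two bottleneck updates
        return v[:, :-n], a[:, :-n], z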
Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit end-to-end triple extraction as a sequence generation task. Since generative triple extraction may struggle to capture long-term dependencies and can generate unfaithful triples, we introduce a novel model: contrastive triple extraction with a generative transformer. Specifically, we use a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training objective. Moreover, we introduce two mechanisms to further improve model performance, namely batch-wise dynamic attention masking and triple-wise calibration. Experimental results on three datasets (NYT, WebNLG, and MIE) show that our approach achieves better performance than the baselines.
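A rough sketch of the kind of triplet contrastive objective described (all names are hypothetical placeholders; the paper's exact objective and its calibration mechanism may differ): the model is asked to score ground-truth triples above corrupted ones by a margin.

import torch.nn.functional as F

def triple_contrastive_loss(score_fn, encoding, gold_triples, corrupted_triples, margin=1.0):
    # score_fn rates how faithful a candidate triple is to the encoded source;
    # gold triples should outscore their corrupted counterparts by `margin`.
    pos = score_fn(encoding, gold_triples)
    neg = score_fn(encoding, corrupted_triples)
    return F.relu(margin - pos + neg).mean()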
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
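The core objective is simple to state; a minimal sketch (PyTorch, with lam a tunable weight) adds to the usual cross-entropy a term that pushes the classifier toward a uniform posterior on the auxiliary outlier batch.

import torch.nn.functional as F

def outlier_exposure_loss(logits_in, labels_in, logits_out, lam=0.5):
    # Standard classification loss on in-distribution data ...
    ce = F.cross_entropy(logits_in, labels_in)
    # ... plus cross-entropy to the uniform distribution on outlier data,
    # i.e. the mean negative log-softmax over classes and the outlier batch.
    uniform_ce = -F.log_softmax(logits_out, dim=1).mean()
    return ce + lam * uniform_ce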
We propose a new method for the event extraction (EE) task based on an imitation learning framework, specifically inverse reinforcement learning (IRL) via a generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions taken by the expert (i.e., the ground truth) and by the agent across the complicated states of the environment. The EE task benefits from these dynamic rewards because instances and labels vary in difficulty and the expected gains differ accordingly -- e.g., an ambiguous but correctly detected trigger or argument should receive a high gain -- whereas traditional RL models usually neglect such differences and pay equal attention to all instances. Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods without explicit feature engineering.
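As an illustration of how a discriminator can supply such dynamic rewards (names are hypothetical and the paper's exact reward shaping may differ), a GAIL-style reward grows with how expert-like the discriminator judges the agent's labeling action to be.

import torch

def dynamic_reward(discriminator, state, action):
    # discriminator(state, action) is a raw score; d is the estimated
    # probability that the (state, action) pair came from the expert.
    d = torch.sigmoid(discriminator(state, action))
    return torch.log(d + 1e-8) - torch.log(1 - d + 1e-8)  # use the logit as reward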