Operations research deals with modeling and solving real-world problems as mathematical optimization problems. While solving mathematical systems is accomplished by analytical software, formulating a problem as a set of mathematical operations has been typically done manually by domain experts. Recent machine learning methods have shown promise in converting textual problem descriptions to corresponding mathematical formulations. This paper presents an approach that converts linear programming word problems into mathematical formulations. We leverage the named entities in the input and augment the input to highlight these entities. Our approach achieves the highest accuracy among all submissions to the NL4Opt Competition, securing first place in the generation track.
Number theoretic transform (NTT) has been a very useful tool in computations for number theory, algebra and cryptography. Its performance affects some post-quantum cryptosystems. In this paper, we discuss the butterfly operation of NTT. This basic module of NTT requires heavy modular arithmetics. Montgomery reduction is commonly used in this setting. Recently several variants of Montgomery algorithm have been proposed for the purpose of speeding up NTT. We observe that the Chinese remainder theorem (CRT) can be involved in this type of algorithms in nature and transparent ways. In this paper, a framework of using CRT to model Montgomery type algorithms is described. The derivation of these algorithms as well as their correctness are all treated in the CRT framework. Under our approach, some problems of a modular reduction algorithm (published in IACR Transactions on Cryptographic Hardware and Embedded Systems, doi:10.46586/tches.v2022.i4.614-636 ) are identified, and a counterexample is generated to show that the algorithm is incorrect.
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.
Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work struggles with producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we in this paper propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of early simple topic dependencies, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. This improves affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding. This facilitates the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling with better performance on downstream tasks.
Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of quantum mechanics. The scalability of quantum architectures remains a significant challenge. Multi-core quantum architectures are proposed to solve the scalability problem, arising a new set of challenges in hardware, communications and compilation, among others. One of these challenges is to adapt a quantum algorithm to fit within the different cores of the quantum computer. This paper presents a novel approach for circuit partitioning using Deep Reinforcement Learning, contributing to the advancement of both quantum computing and graph partitioning. This work is the first step in integrating Deep Reinforcement Learning techniques into Quantum Circuit Mapping, opening the door to a new paradigm of solutions to such problems.
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understandings about convergence properties of the probability flow ODE are still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers.
Bayesian model updating facilitates the calibration of analytical models based on observations and the quantification of uncertainties in model parameters such as stiffness and mass. This process significantly enhances damage assessment and response predictions in existing civil structures. Predominantly, current methods employ modal properties identified from acceleration measurements to evaluate the likelihood of the model parameters. This modal analysis-based likelihood generally involves a prior assumption regarding the mass parameters. In civil structures, accurately determining mass parameters proves challenging owing to the time-varying nature of imposed loads. The resulting inaccuracy potentially introduces biases while estimating the stiffness parameters, which affects the assessment of structural response and associated damage. Addressing this issue, the present study introduces a stress-resultant-based approach for Bayesian model updating independent of mass assumptions. This approach utilizes system identification on strain and acceleration measurements to establish the relationship between nodal displacements and elemental stress resultants. Employing static analysis to depict this relationship aids in assessing the likelihood of stiffness parameters. Integrating this static-analysis-based likelihood with a modal-analysis-based likelihood facilitates the simultaneous estimation of mass and stiffness parameters. The proposed approach was validated using numerical examples on a planar frame and experimental studies on a full-scale moment-resisting steel frame structure.
We present a theoretical analysis of the performance of transformer with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention. When the number of in-context examples D increases, the prediction loss using single-/multi-head attention is in O(1/D), and the one for multi-head attention has a smaller multiplicative constant. In addition to the simplest data distribution setting, we consider more scenarios, e.g., noisy labels, local examples, correlated features, and prior knowledge. We observe that, in general, multi-head attention is preferred over single-head attention. Our results verify the effectiveness of the design of multi-head attention in the transformer architecture.
This paper belongs to a group of work in the intersection of symbolic computation and group analysis aiming for the symbolic analysis of differential equations. The goal is to extract important properties without finding the explicit general solution. In this contribution, we introduce the algorithmic verification of nonlinear superposition properties and its implementation. More exactly, for a system of nonlinear ordinary differential equations of first order with a polynomial right-hand side, we check if the differential system admits a general solution by means of a superposition rule and a certain number of particular solutions. It is based on the theory of Newton polytopes and associated symbolic computation. The developed method provides the basis for the identification of nonlinear superpositions within a given system and for the construction of numerical methods which preserve important algebraic properties at the numerical level.
The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.