State vector-based quantum circuit simulation provides accurate results for the development and validation of quantum computing algorithms, free from the effects of noise. However, existing quantum circuit simulators have consistently underperformed because they integrate poorly with both quantum circuit structure and high-performance computing architectures. To tackle these challenges, we propose QueenV2, which builds on the design principles of Queen and substantially raises its performance. Experimental results on the NVIDIA RTX-4090 demonstrate that QueenV2 achieves up to a 40x improvement in gate performance and a 5x improvement in circuit performance compared to HyQuas. Furthermore, QueenV2 realizes a 137x speedup in gate benchmarks and a 14x speedup in circuit performance relative to NVIDIA cuQuantum, which relies on gate fusion via the IBM Qiskit toolkit. By eliminating reliance on third-party libraries, QueenV2 is positioned to significantly accelerate quantum circuit simulation, thereby promoting the development of innovative accelerators and quantum algorithms.
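For intuition about what such a simulator does at its core, the sketch below applies a single-qubit gate to a full state vector of 2^n complex amplitudes by updating index pairs that differ only in the target qubit's bit. This is a generic, illustrative kernel (real-valued gate entries for brevity), not QueenV2's implementation.

```java
// Minimal state vector simulator kernel (illustrative; not QueenV2's code).
final class StateVector {
    final double[] re, im;   // 2^n complex amplitudes, split into real/imag parts
    final int nQubits;

    StateVector(int nQubits) {
        this.nQubits = nQubits;
        re = new double[1 << nQubits];
        im = new double[1 << nQubits];
        re[0] = 1.0;         // initial state |0...0>
    }

    /** Apply the 2x2 gate {{a,b},{c,d}} (real entries for brevity) to qubit t. */
    void applyGate(int t, double a, double b, double c, double d) {
        int stride = 1 << t;
        for (int i = 0; i < re.length; i += 2 * stride) {
            for (int j = i; j < i + stride; j++) {
                int k = j + stride;                    // partner index: bit t flipped
                double r0 = re[j], i0 = im[j], r1 = re[k], i1 = im[k];
                re[j] = a * r0 + b * r1;  im[j] = a * i0 + b * i1;
                re[k] = c * r0 + d * r1;  im[k] = c * i0 + d * i1;
            }
        }
    }
}
```

For example, applying a Hadamard gate to qubit 0 uses a = b = c = 1/sqrt(2) and d = -1/sqrt(2); gate fusion, in general, amounts to multiplying several such small gate matrices together so that the state vector is swept fewer times.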
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. SSMs, comprising latent Markovian states, can be subjected to variational inference (VI), but standard VI methods like the importance-weighted autoencoder (IWAE) lack functionality for streaming data. To enable online VI in SSMs when the observations are received in real time, we propose maximising an IWAE-type variational lower bound on the asymptotic contrast function, rather than the standard IWAE ELBO, using stochastic approximation. Unlike the recursive maximum likelihood method, which directly maximises the asymptotic contrast, our approach, called online sequential IWAE (OSIWAE), allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. By approximating filter state posteriors and their derivatives using sequential Monte Carlo (SMC) methods, we create a particle-based framework for online VI in SSMs. This approach rests on firmer theoretical grounds than recently proposed online variational SMC methods. We provide rigorous theoretical results on the learning objective and a numerical study demonstrating the method's efficiency in learning model parameters and particle proposal kernels.
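For reference, the batch IWAE bound that the proposed objective generalizes can be written as follows (standard form, stated here for illustration, with $N$ importance samples drawn from the variational distribution $q_\phi$):
$$ \mathcal{L}_N(\theta,\phi) \;=\; \mathbb{E}_{z_1,\dots,z_N \sim q_\phi(\cdot \mid x)}\!\left[\log \frac{1}{N}\sum_{i=1}^{N}\frac{p_\theta(x, z_i)}{q_\phi(z_i \mid x)}\right] \;\le\; \log p_\theta(x). $$
OSIWAE replaces this batch ELBO with an IWAE-type lower bound on the asymptotic contrast function, which is then maximised by stochastic approximation as the observations arrive.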
Choices in the semantics and the signature of a theory are integral to determining how the theory is used and how challenging it is to reason over it. Our interest in this paper lies in the SMT theory of sequences. Various versions of it exist in the literature and in state-of-the-art SMT solvers, but it has not yet been standardized in the SMT-LIB standard. We reflect on its existing variants, and we define a set of theory design criteria to help determine what makes one variant of a theory better than another. The criteria we define can be used to appraise theory proposals for other theories as well. Based on these criteria, we propose a series of changes to the SMT theory of sequences as a contribution to the discussion regarding its standardization.
It is becoming increasingly difficult to improve the performance of a single process (thread) on a computer due to physical limitations. Modern systems use multi-core processors in which multiple processes (threads) may run concurrently. A lock-free data structure can allow these processes to communicate with each other without requiring mutual exclusion, and can increase the amount of work they perform in parallel rather than sequentially, thus improving the performance of the system as a whole. This paper contains an implementation of Ko's Lock-Free Binary Trie, which stores a dynamic set of keys from an ordered universe. It supports insert, remove, search and predecessor operations. One novel component of this implementation is a lock-free linked list which allows multiple processes to attempt to insert the same node, but which prevents a node from being reinserted once it has been removed from the list. The final section of this paper contains an experimental comparison of this implementation against other data structures which implement the same abstract data type (ADT) as the lock-free trie. Analysis of these experiments reveals that the implementation of Ko's Trie performs better than existing theoretical implementations of this ADT when the universe of keys is large, when removes are rare and when the number of processes performing operations concurrently is low.
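The no-reinsertion behaviour described above can be illustrated with a per-node state machine; the following Java sketch is a deliberate simplification (nodes are linked at the head, removal is only logical, and helping and physical unlinking are omitted), not Ko's actual list.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of idempotent insertion with a no-reinsertion guarantee.
final class Node {
    static final int FRESH = 0, IN_LIST = 1, REMOVED = 2;
    final long key;
    final AtomicInteger state = new AtomicInteger(FRESH);
    volatile Node next;
    Node(long key) { this.key = key; }
}

final class SimpleLockFreeList {
    private final AtomicReference<Node> head = new AtomicReference<>(null);

    /** Multiple processes may call insert(n) with the same node; exactly one
     *  outcome is agreed on, and a node that was removed is never reinserted. */
    boolean insert(Node n) {
        if (!n.state.compareAndSet(Node.FRESH, Node.IN_LIST)) {
            return n.state.get() == Node.IN_LIST;   // REMOVED: refuse reinsertion
        }
        while (true) {                              // physically link at the head
            Node h = head.get();
            n.next = h;
            if (head.compareAndSet(h, n)) return true;
        }
    }

    /** Logical removal: a later insert(n) of the same node will fail. */
    boolean remove(Node n) {
        return n.state.compareAndSet(Node.IN_LIST, Node.REMOVED);
    }

    /** Searches the list, treating logically removed nodes as absent. */
    boolean contains(long key) {
        for (Node cur = head.get(); cur != null; cur = cur.next) {
            if (cur.key == key && cur.state.get() == Node.IN_LIST) return true;
        }
        return false;
    }
}
```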
Code metamorphism refers to a programming technique in which a program consistently and automatically modifies part or all of its own code while retaining its core functionality. This technique is often used for online performance optimization and automated crash recovery in certain mission-critical applications. However, it has also been misappropriated by malware creators to bypass signature-based detection measures instituted by anti-malware engines. At present, the code mutation engines used by threat actors offer only a limited degree of mutation, which is frequently detectable via static code analysis. The advent of large language models (LLMs), such as ChatGPT 4.0 and Google Bard, may lead to a significant evolution in this landscape. These models have demonstrated a level of algorithm comprehension and code synthesis capability that closely resembles human abilities. This advancement has sparked concerns among experts that such models could be exploited by threat actors to generate sophisticated metamorphic malware. This paper explores the potential of several prominent LLMs for software code mutation that could be used to reconstruct (with mutation) existing malware code bases or to create new forms of embedded mutation engines for next-generation metamorphic malware. In this work, we introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The proposed framework serves as an essential tool for testing next-generation metamorphic malware detection engines.
This paper proposes a new multi-linear projection method for denoising and estimation of high-dimensional matrix-variate factor time series. It assumes that a $p_1\times p_2$ matrix-variate time series consists of a dynamically dependent, lower-dimensional matrix-variate factor process and a $p_1\times p_2$ matrix idiosyncratic series. In addition, the idiosyncratic series is itself assumed to follow a matrix-variate factor structure such that its row and column covariances may have diverging/spiked eigenvalues, to accommodate the case of low signal-to-noise ratio often encountered in applications. We use an iterative projection procedure to reduce the dimensions and noise effects in estimating the front and back loading matrices and to obtain faster convergence rates than those of the traditional methods available in the literature. We further introduce a two-way projected Principal Component Analysis to mitigate the diverging noise effects, and implement a high-dimensional white-noise testing procedure to estimate the dimension of the matrix factor process. Asymptotic properties of the proposed method are established as the dimensions and sample size go to infinity. We also use simulations and real examples to assess the performance of the proposed method in finite samples and to compare its forecasting ability with that of some existing methods in the literature. The proposed method fares well in out-of-sample forecasting. In a supplement, we demonstrate the efficacy of the proposed approach even when the idiosyncratic terms exhibit serial correlations with or without a diverging white noise effect.
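A common formulation consistent with this description (the paper's exact notation may differ) writes the observed matrix series as
$$ X_t \;=\; R\,F_t\,C' + E_t, \qquad X_t \in \mathbb{R}^{p_1\times p_2},\; R \in \mathbb{R}^{p_1\times k_1},\; F_t \in \mathbb{R}^{k_1\times k_2},\; C \in \mathbb{R}^{p_2\times k_2}, $$
where $R$ and $C$ are the front and back loading matrices, $F_t$ is the dynamically dependent matrix factor process, and the idiosyncratic term $E_t$ carries its own (possibly spiked) row and column covariance structure.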
Humanoids vary widely in joint configuration, actuators, and degrees of freedom, so the movements and tasks each type can achieve differ. In particular, musculoskeletal humanoids are developed to closely emulate human body structure and movement functions, consisting of a skeletal framework driven by numerous muscle actuators. The redundant arrangement of muscles relative to the skeletal degrees of freedom has been used to represent the flexible and complex body movements observed in humans. However, this flexible body and high number of degrees of freedom make modeling, simulation, and control extremely challenging, limiting the feasible movements and tasks. In this study, we integrate the musculoskeletal humanoid Musashi with the wire-driven robot CubiX, capable of connecting to the environment, to form CubiXMusashi. This combination addresses the shortcomings of traditional musculoskeletal humanoids and enables movements beyond the capabilities of other humanoids. CubiXMusashi connects to the environment with wires and drives itself by winding them, successfully achieving movements such as pull-ups, rising from a lying pose, and mid-air kicking, which are difficult for Musashi alone. This concept demonstrates that various humanoids, not limited to musculoskeletal humanoids, can mitigate their physical constraints and acquire new abilities by connecting to the environment and driving themselves through wires.
This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type-A and Type-B) deeply fuse multimodal inputs within the internal layers of the model, whereas the latter two types (Type-C and Type-D) perform early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. Type-C, in contrast, utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid in monitoring the development of any-to-any multimodal models. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, ease of adding modalities, training objectives, and any-to-any multimodal generation capability.
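To make the Type-A fusion mechanism concrete, the sketch below shows a single cross-attention step in which one modality's token features act as queries over another modality's features. It is a toy, single-head version without learned projections, masking, or residual connections, which real architectures add.

```java
// Toy single-head cross-attention (illustrative only): text tokens attend to image tokens.
final class CrossAttentionSketch {
    /** text: Tq x d query features; image: Tk x d key/value features; returns Tq x d. */
    static double[][] crossAttend(double[][] text, double[][] image) {
        int tq = text.length, tk = image.length, d = text[0].length;
        double[][] out = new double[tq][d];
        for (int i = 0; i < tq; i++) {
            double[] scores = new double[tk];
            double max = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < tk; j++) {           // scaled dot-product scores
                double s = 0;
                for (int k = 0; k < d; k++) s += text[i][k] * image[j][k];
                scores[j] = s / Math.sqrt(d);
                if (scores[j] > max) max = scores[j];
            }
            double z = 0;                            // softmax over image tokens
            for (int j = 0; j < tk; j++) { scores[j] = Math.exp(scores[j] - max); z += scores[j]; }
            for (int j = 0; j < tk; j++)             // attention-weighted sum of image features
                for (int k = 0; k < d; k++) out[i][k] += (scores[j] / z) * image[j][k];
        }
        return out;
    }
}
```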
The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, remains a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
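The neural ODE correspondence at issue identifies the layer index with a time variable (standard formulation, stated here for illustration): for residual updates
$$ h_{k+1} \;=\; h_k + \delta_L\, f(h_k, \theta_k), \qquad k = 0,\dots,L-1, $$
the hidden states approximate the solution of $\mathrm{d}h/\mathrm{d}t = f\big(h(t),\theta(t)\big)$ when $\delta_L \sim 1/L$ and the trained weights $\theta_k$ converge to a smooth function $\theta(k/L)$ as the depth $L \to \infty$; the scaling regimes observed in the experiments determine whether this ODE limit, a stochastic differential equation, or neither is obtained.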
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
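Concretely, the permutation-based objective maximizes the expected autoregressive log-likelihood over factorization orders,
$$ \max_{\theta}\; \mathbb{E}_{z \sim \mathcal{Z}_T}\!\left[\sum_{t=1}^{T} \log p_\theta\!\left(x_{z_t} \mid x_{z_{<t}}\right)\right], $$
where $\mathcal{Z}_T$ is the set of all permutations of $[1,\dots,T]$, $z_t$ denotes the $t$-th element of a permutation $z$, and $z_{<t}$ its first $t-1$ elements; in expectation each position is predicted from contexts on both sides while the model remains autoregressive.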