This work tackles the problem of finding a good ansatz initialization for Variational Quantum Algorithms (VQAs) by proposing CAFQA, a Clifford Ansatz For Quantum Accuracy. The CAFQA ansatz is a hardware-efficient circuit built with only Clifford gates. In this ansatz, the parameters of the tunable gates are chosen by searching efficiently through the Clifford parameter space via classical simulation. The resulting initial states always equal or outperform traditional classical initialization (e.g., Hartree-Fock) and enable high-accuracy VQA estimations. CAFQA is well-suited to classical computation because: a) Clifford-only quantum circuits can be exactly simulated classically in polynomial time, and b) the discrete Clifford space can be searched efficiently via Bayesian Optimization. For the Variational Quantum Eigensolver (VQE) task of molecular ground state energy estimation (up to 18 qubits), CAFQA's Clifford Ansatz achieves a mean accuracy of nearly 99% and recovers as much as 99.99% of the molecular correlation energy that is lost in Hartree-Fock initialization. CAFQA achieves mean accuracy improvements of 6.4x and 56.8x over the state of the art on different metrics. The scalability of the approach allows for preliminary ground state energy estimation of the challenging chromium dimer (Cr$_2$) molecule. With CAFQA's high-accuracy initialization, the convergence of VQAs is shown to accelerate by 2.5x, even for small molecules. Furthermore, a preliminary exploration of allowing a limited number of non-Clifford (T) gates in the CAFQA framework shows that as much as 99.9% of the correlation energy can be recovered at bond lengths for which Clifford-only CAFQA accuracy is relatively limited, while remaining classically simulable.
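To make the core idea concrete, here is a minimal sketch (not the CAFQA implementation): a tiny two-qubit, Clifford-only hardware-efficient ansatz whose discrete parameters are searched exhaustively, with each candidate evaluated via Qiskit's polynomial-time stabilizer simulation. The Hamiltonian coefficients are made up for illustration, and brute-force enumeration stands in for the Bayesian Optimization used by CAFQA.
\begin{verbatim}
# Minimal sketch (not the CAFQA code): exhaustive search over a tiny
# discrete Clifford ansatz, evaluated with Qiskit's stabilizer simulator.
# The 2-qubit Hamiltonian below is a made-up illustration.
import itertools
from qiskit import QuantumCircuit
from qiskit.quantum_info import Pauli, StabilizerState

# Hypothetical Hamiltonian: list of (coefficient, Pauli string) terms.
HAMILTONIAN = [(0.5, "ZZ"), (-0.3, "XI"), (-0.3, "IX")]

def clifford_ansatz(s_counts, h_flags):
    """One hardware-efficient layer whose 'angles' are restricted to
    Clifford values: k applications of S (k*pi/2 Z-rotations) and an
    optional H, followed by a CX entangler."""
    qc = QuantumCircuit(2)
    for q in range(2):
        for _ in range(s_counts[q]):
            qc.s(q)
        if h_flags[q]:
            qc.h(q)
    qc.cx(0, 1)
    return qc

def energy(qc):
    state = StabilizerState(qc)  # exact polynomial-time classical simulation
    return sum(c * state.expectation_value(Pauli(p)).real
               for c, p in HAMILTONIAN)

# Brute-force search stands in for the Bayesian optimization used by CAFQA;
# the point is that every candidate is evaluated purely classically.
best = min(
    (energy(clifford_ansatz(s, h)), s, h)
    for s in itertools.product(range(4), repeat=2)
    for h in itertools.product((0, 1), repeat=2)
)
print("lowest Clifford-ansatz energy:", best[0], "params:", best[1:])
\end{verbatim}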
Dialogue-based Role-Playing Games (RPGs) require powerful storytelling. Their narratives may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. \textbf{GRIM}, a prototype \textbf{GR}aph-based \textbf{I}nteractive narrative visualization system for ga\textbf{M}es, generates a rich narrative graph with branching storylines that match a high-level narrative description and constraints provided by the designer. Game designers can interactively edit the graph by automatically generating new sub-graphs that fit the edits within the original narrative and constraints. We illustrate the use of \textbf{GRIM} in conjunction with GPT-4, generating branching narratives for four well-known stories under different contextual constraints.
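As a hedged illustration of the kind of interaction such a system could build on (not GRIM's actual prompts, schema, or graph format), the sketch below keeps a branching narrative as a node/edge graph and asks GPT-4, via the openai client, to propose sub-graphs that respect the designer's premise and constraints; the prompt wording and JSON structure are invented here.
\begin{verbatim}
# Toy sketch, not GRIM itself: a minimal narrative graph plus an LLM call that
# proposes a branching sub-graph under designer constraints. Prompt wording,
# JSON schema, and node fields are invented for illustration.
import json
from dataclasses import dataclass, field

from openai import OpenAI  # assumes the openai-python client and an API key

@dataclass
class NarrativeGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> scene description
    edges: list = field(default_factory=list)   # (from_id, to_id, choice text)

def expand_node(graph, node_id, premise, constraints, n_branches=3):
    prompt = (
        f"Story premise: {premise}\nConstraints: {constraints}\n"
        f"Current scene: {graph.nodes[node_id]}\n"
        f"Propose {n_branches} branching continuations as JSON: "
        '[{"scene": ..., "choice": ...}, ...]'
    )
    reply = OpenAI().chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
    for i, branch in enumerate(json.loads(reply)):
        child = f"{node_id}.{i}"
        graph.nodes[child] = branch["scene"]
        graph.edges.append((node_id, child, branch["choice"]))

graph = NarrativeGraph(nodes={"root": "A knight arrives at a silent village."})
# expand_node(graph, "root", "Arthurian retelling", "no violence on screen")
\end{verbatim}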
By establishing an interesting connection between ordinary Bell polynomials and rational convolution powers, some composition and inverse relations of Bell polynomials, as well as explicit expressions for convolution roots of sequences, are obtained. Based on these results, a new method based on prime factorization is proposed for computing partial Bell polynomials. It is shown that this method is more efficient than the conventional recurrence procedure for computing Bell polynomials in most cases, requiring far fewer arithmetic operations. A detailed analysis of the computational complexity is provided, followed by some numerical evaluations.
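For reference, the conventional recurrence that the proposed prime-factorization method is compared against can be sketched as follows; this is the standard recurrence for partial Bell polynomials, not the new method.
\begin{verbatim}
# Sketch of the conventional recurrence for partial Bell polynomials
# B_{n,k}(x_1, ..., x_{n-k+1}) -- the baseline the abstract compares
# against, not the prime-factorization method proposed in the paper.
from functools import lru_cache
from math import comb

def partial_bell(n, k, x):
    """Evaluate B_{n,k} at the sequence x = (x_1, x_2, ...)."""
    x = tuple(x)

    @lru_cache(maxsize=None)
    def B(n, k):
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0:
            return 0
        # B_{n,k} = sum_{i=1}^{n-k+1} C(n-1, i-1) * x_i * B_{n-i, k-1}
        return sum(comb(n - 1, i - 1) * x[i - 1] * B(n - i, k - 1)
                   for i in range(1, n - k + 2))

    return B(n, k)

# Example: B_{n,k}(1, 1, 1, ...) are the Stirling numbers of the second kind,
# e.g. B_{4,2}(1,1,1,1) = S(4,2) = 7.
print(partial_bell(4, 2, [1] * 4))
\end{verbatim}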
Analogical reasoning derives information from known relations and generalizes this information to similar yet unfamiliar situations. One of the first generalized ways in which deep learning models were able to solve verbal analogies was through vector arithmetic of word embeddings, essentially relating words that were mapped to a vector space (e.g., king - man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example to investigate visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with fewer dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high reconstruction accuracy on the VAE. Predictions on more complex items showed stronger deviations from the expected outputs, although the predictions still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (for comparison, the current world record is 21%), and on ConceptARC it scored 8.8%. Although the proposed methodology involves basic dimensionality reduction techniques and standard vector arithmetic, this approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
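A minimal sketch of the latent vector arithmetic, assuming a trained VAE is available; the tiny linear encoder and decoder below are untrained placeholders for that model, and the 10x10 grids of color indices are a stand-in for ARC items.
\begin{verbatim}
# Minimal sketch of the latent-vector-arithmetic idea, assuming a trained
# VAE is available. The linear encoder/decoder below are untrained
# placeholders for that model; grids are 10x10 arrays of color indices.
import torch
import torch.nn as nn

LATENT_DIM, GRID_CELLS = 16, 100

encoder = nn.Linear(GRID_CELLS, LATENT_DIM)   # stand-in for the VAE encoder mean
decoder = nn.Linear(LATENT_DIM, GRID_CELLS)   # stand-in for the VAE decoder

def encode(grid):                      # grid: (10, 10) tensor of color ids
    return encoder(grid.float().flatten())

def solve_item(example_in, example_out, test_in):
    """Analogy by vector arithmetic: z_test_out = z_test_in + (z_ex_out - z_ex_in),
    mirroring king - man + woman in word-embedding analogies."""
    rule = encode(example_out) - encode(example_in)   # latent 'rule' vector
    z_pred = encode(test_in) + rule
    logits = decoder(z_pred).view(10, 10)
    return logits.round().clamp(0, 9)                 # crude projection back to colors

pred = solve_item(torch.zeros(10, 10), torch.ones(10, 10), torch.zeros(10, 10))
print(pred.shape)
\end{verbatim}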
We propose Riemannian preconditioned algorithms for the tensor completion problem via tensor ring decomposition. A new Riemannian metric is developed on the product space of the mode-2 unfolding matrices of the core tensors in tensor ring decomposition. The construction of this metric aims to approximate the Hessian of the cost function by its diagonal blocks, paving the way for various Riemannian optimization methods. Specifically, we propose the Riemannian gradient descent and Riemannian conjugate gradient algorithms. We prove that both algorithms globally converge to a stationary point. In the implementation, we exploit the tensor structure and adopt an economical procedure that avoids forming and computing with large matrices in the gradients, which significantly reduces the computational cost. Numerical experiments on various synthetic and real-world datasets -- movie ratings, hyperspectral images, and high-dimensional functions -- suggest that the proposed algorithms are more efficient and have better reconstruction ability than other candidates.
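For orientation, the sketch below spells out the tensor ring format and the masked completion objective that such algorithms optimize; the Riemannian metric and the preconditioned gradient and conjugate gradient updates themselves are not reproduced, and all sizes and ranks are illustrative.
\begin{verbatim}
# Sketch of the tensor ring (TR) format and the completion objective the
# algorithms optimize; the Riemannian metric / preconditioned updates from
# the paper are not reproduced here. Sizes and ranks are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dims, rank = (8, 9, 10), 3   # third-order tensor, uniform TR-rank

# Core tensors G_k of shape (r, n_k, r); the ring closes the trace.
cores = [rng.standard_normal((rank, n, rank)) for n in dims]

def tr_full(cores):
    """T(i,j,k) = trace( G1[:,i,:] G2[:,j,:] G3[:,k,:] )."""
    G1, G2, G3 = cores
    return np.einsum('aib,bjc,cka->ijk', G1, G2, G3)

# Tensor completion loss on the observed entries only.
T_true = tr_full([rng.standard_normal((rank, n, rank)) for n in dims])
mask = rng.random(dims) < 0.3          # ~30% of entries observed
loss = 0.5 * np.sum(mask * (tr_full(cores) - T_true) ** 2)
print("masked squared error:", loss)
\end{verbatim}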
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance on the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
One of the main challenges of multimodal learning is the need to combine heterogeneous modalities (e.g., video, audio, text). For example, video and audio are obtained at much higher rates than text and are roughly aligned in time. They are often not synchronized with text, which comes as global context, e.g., a title or a description. Furthermore, video and audio inputs are of much larger volume and grow as the video length increases, which naturally requires more compute dedicated to these modalities and makes modeling of long-range dependencies harder. Here we decouple the multimodal modeling, dividing it into separate, focused autoregressive models that process the inputs according to the characteristics of the modalities. We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video) and an autoregressive component for the context modalities, which are not necessarily aligned in time but are still sequential. To address the long sequences of the video-audio inputs, we propose to further partition the video and audio sequences into consecutive snippets and autoregressively process their representations. To that end, we propose a Combiner mechanism, which models the audio-video information jointly within a timeframe. The Combiner learns to extract audio and video features from raw spatio-temporal signals and then learns to fuse these features, producing compact but expressive representations per snippet. Our approach achieves the state of the art on well-established multimodal benchmarks, outperforming much larger models. It effectively addresses the high computational demand of media inputs by learning compact representations, controlling the sequence length of the audio-video feature representations, and modeling their dependencies in time.
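As a rough, hedged illustration of the snippet-level fusion idea (a toy stand-in, not the Mirasol3B Combiner), the sketch below lets a fixed number of learned query tokens cross-attend to the concatenated audio and video tokens of each snippet, yielding a short fixed-length representation per snippet that a downstream autoregressive model could consume; all dimensions are invented.
\begin{verbatim}
# A toy stand-in for the snippet-level Combiner idea (not the Mirasol3B
# implementation): within each time snippet, a fixed number of learned query
# tokens cross-attend to the concatenated audio and video tokens, producing
# a short, fixed-length representation per snippet.
import torch
import torch.nn as nn

class ToyCombiner(nn.Module):
    def __init__(self, dim=256, out_tokens=8, heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(out_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens, audio_tokens):
        # video_tokens: (B, Tv, dim), audio_tokens: (B, Ta, dim) for ONE snippet
        kv = torch.cat([video_tokens, audio_tokens], dim=1)
        q = self.queries.unsqueeze(0).expand(kv.size(0), -1, -1)
        fused, _ = self.attn(q, kv, kv)          # (B, out_tokens, dim)
        return fused

# Partition a long audio-video stream into snippets, combine each one; a
# downstream autoregressive model would consume the concatenated outputs.
combiner = ToyCombiner()
video = torch.randn(2, 128, 256)                 # 2 clips, 128 video tokens
audio = torch.randn(2, 64, 256)                  # 64 audio tokens
snippets = zip(video.chunk(4, dim=1), audio.chunk(4, dim=1))
compact = torch.cat([combiner(v, a) for v, a in snippets], dim=1)
print(compact.shape)                             # (2, 4 * 8, 256)
\end{verbatim}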
We consider a Canonical Polyadic (CP) decomposition approach to low-rank tensor completion (LRTC) by incorporating external pairwise similarity relations through graph Laplacian regularization on the CP factor matrices. The use of graph regularization improves the learning accuracy of LRTC but, at the same time, induces coupled graph Laplacian terms that hinder the optimization of the tensor completion model. To solve graph-regularized LRTC, we propose efficient alternating minimization algorithms by leveraging the block structure of the underlying CP decomposition-based model. For the subproblems of alternating minimization, a linear conjugate gradient subroutine is specifically adapted to graph-regularized LRTC. Alternatively, we circumvent the complicating coupling effects of the graph Laplacian terms by using the alternating direction method of multipliers (ADMM). Based on the Kurdyka-{\L}ojasiewicz property, we show that the sequence generated by the proposed algorithms globally converges to a critical point of the objective function. Moreover, the complexity and convergence rate are also derived. In addition, numerical experiments on synthetic and real data show that the graph-regularized tensor completion model achieves improved recovery results compared to models without graph regularization, and that the proposed algorithms achieve gains in time efficiency over existing algorithms.
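The sketch below writes out the graph-regularized CP completion objective that such alternating-minimization and ADMM algorithms minimize; the similarity graphs, factor sizes, and regularization weight are illustrative, and the solvers themselves are not reproduced.
\begin{verbatim}
# Sketch of the graph-regularized CP completion objective (the quantity the
# alternating-minimization / ADMM algorithms minimize); factor sizes, graphs,
# and weights below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
dims, R, lam = (6, 7, 8), 4, 0.1
factors = [rng.standard_normal((n, R)) for n in dims]       # CP factors A, B, C

def cp_full(A, B, C):
    """[[A, B, C]](i,j,k) = sum_r A(i,r) B(j,r) C(k,r)."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Toy pairwise-similarity graphs per mode -> unnormalized Laplacians L = D - W.
laplacians = []
for n in dims:
    W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
    laplacians.append(np.diag(W.sum(1)) - W)

T_obs = cp_full(*[rng.standard_normal((n, R)) for n in dims])
mask = rng.random(dims) < 0.4

def objective(factors):
    fit = 0.5 * np.sum(mask * (cp_full(*factors) - T_obs) ** 2)
    reg = sum(0.5 * lam * np.trace(F.T @ L @ F)   # tr(A^T L A) couples similar rows
              for F, L in zip(factors, laplacians))
    return fit + reg

print(objective(factors))
\end{verbatim}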
In this paper we study the convergence of a fully discrete Crank-Nicolson Galerkin scheme for the initial value problem associated with the fractional Korteweg-de Vries (KdV) equation, which involves the fractional Laplacian and a nonlinear convection term. Our proof relies on a Kato-type local smoothing effect to estimate the localized $H^{\alpha/2}$-norm of the approximated solution, where $\alpha \in [1,2)$. We demonstrate that the scheme converges strongly in $L^2(0,T;L^2_{loc}(\mathbb{R}))$ to a weak solution of the fractional KdV equation provided the initial data is in $L^2(\mathbb{R})$. Assuming the initial data is sufficiently regular, we obtain the rate of convergence for the numerical scheme. Finally, the theoretical convergence rates are verified through several numerical illustrations.
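For orientation, a hedged sketch of the time discretization: writing the fractional KdV equation in one common convention (the paper's exact form, and its Galerkin spatial projection, may differ), a Crank-Nicolson step evaluates the dispersive and convective terms at the time-level midpoint:
$$ \partial_t u + u\,\partial_x u - |D|^{\alpha}\partial_x u = 0, \qquad |D|^{\alpha} := (-\partial_x^2)^{\alpha/2}, \quad \alpha \in [1,2), $$
$$ \frac{u^{n+1}-u^{n}}{\Delta t} + u^{n+\frac12}\,\partial_x u^{n+\frac12} - |D|^{\alpha}\partial_x u^{n+\frac12} = 0, \qquad u^{n+\frac12} := \tfrac12\big(u^{n}+u^{n+1}\big). $$
For $\alpha = 2$ this reduces to the classical KdV equation $u_t + u u_x + u_{xxx} = 0$, which is one way to check the sign convention.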
We present NCCSG, a nonsmooth optimization method. In each iteration, NCCSG finds the best length-constrained descent direction by considering the worst bound over all local subgradients. NCCSG can take advantage of local smoothness or local strong convexity of the objective function. We prove several global convergence rates for NCCSG. For well-behaved nonsmooth functions (characterized by the weak smoothness property), NCCSG converges in $O(\frac{1}{\epsilon} \log \frac{1}{\epsilon})$ iterations, where $\epsilon$ is the desired optimality gap. For smooth functions and strongly convex smooth functions, NCCSG matches the lower bound on the convergence rates of black-box first-order methods, i.e., $O(\frac{1}{\epsilon})$ for smooth functions and $O(\log \frac{1}{\epsilon})$ for strongly convex smooth functions. The efficiency of NCCSG depends on the efficiency of solving a minimax optimization problem involving the subdifferential of the objective function in each iteration.
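One natural reading of the per-iteration direction-finding subproblem is the minimax problem
$$ d_k \in \arg\min_{\|d\|\le 1}\ \max_{g \in G_k}\ \langle g, d\rangle, \qquad x_{k+1} = x_k + t_k\, d_k, $$
where $G_k$ collects the local subgradients around $x_k$ (e.g., a Goldstein-style neighborhood subdifferential) and $t_k$ is a step size; the precise choice of $G_k$ and of the step-size rule is specific to NCCSG and not pinned down here.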
We investigate the use of multilevel Monte Carlo (MLMC) methods for estimating the expectation of discretized random fields. Specifically, we consider a setting in which the input and output vectors of the numerical simulators have inconsistent dimensions across the multilevel hierarchy. This requires the introduction of grid transfer operators borrowed from multigrid methods. Starting from a simple 1D illustration, we demonstrate numerically that the resulting MLMC estimator degrades the estimation of the high-frequency components of the discretized expectation field compared to a Monte Carlo (MC) estimator. By adapting mathematical tools initially developed for multigrid methods, we perform a theoretical spectral analysis of the MLMC estimator of the expectation of discretized random fields, in the specific case of linear, symmetric and circulant simulators. This analysis provides a spectral decomposition of the variance into contributions associated with each scale component of the discretized field. We then propose improved MLMC estimators using a filtering mechanism similar to the smoothing process of multigrid methods. The filtering operators improve the estimation of both the small- and large-scale components of the variance, resulting in a reduction of the total variance of the estimator. These improvements are quantified for the specific class of simulators considered in our spectral analysis. The resulting filtered MLMC (F-MLMC) estimator is applied to the problem of estimating the discretized variance field of a diffusion-based covariance operator, which amounts to estimating the expectation of a discretized random field. The numerical experiments support the conclusions of the theoretical analysis even with non-linear simulators, and demonstrate the improvements brought by the proposed F-MLMC estimator compared to both a crude MC estimator and an unfiltered MLMC estimator.
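A minimal two-level sketch of the setting in 1D, assuming linear interpolation as a simple stand-in for the multigrid-style grid-transfer operators discussed above; the toy simulator, grid sizes, and sample counts are illustrative, and the filtering of F-MLMC is not reproduced.
\begin{verbatim}
# Sketch of a two-level MLMC estimator of a discretized expectation field in
# 1D, with linear interpolation as a simple stand-in for the grid-transfer
# (prolongation) operators discussed above. The toy 'simulator' and the
# sample sizes are illustrative; the filtering of F-MLMC is not reproduced.
import numpy as np

rng = np.random.default_rng(2)
n_fine, n_coarse = 64, 32
x_fine = np.linspace(0.0, 1.0, n_fine)
x_coarse = np.linspace(0.0, 1.0, n_coarse)

def simulate(x, xi):
    """Toy random field on grid x, driven by shared random inputs xi."""
    return np.sin(2 * np.pi * x) + xi[0] * np.cos(2 * np.pi * x) + 0.1 * xi[1]

def prolong(u_coarse):
    """Grid transfer: coarse-grid field interpolated onto the fine grid."""
    return np.interp(x_fine, x_coarse, u_coarse)

N0, N1 = 200, 20                       # many cheap coarse samples, few fine ones
coarse_term = np.mean([prolong(simulate(x_coarse, rng.standard_normal(2)))
                       for _ in range(N0)], axis=0)
corr_term = np.zeros(n_fine)
for _ in range(N1):
    xi = rng.standard_normal(2)        # coupled: same randomness on both grids
    corr_term += simulate(x_fine, xi) - prolong(simulate(x_coarse, xi))
corr_term /= N1

mlmc_estimate = coarse_term + corr_term   # telescoping two-level estimator
print(mlmc_estimate.shape)
\end{verbatim}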