We construct a quantum oracle relative to which $\mathsf{BQP} = \mathsf{QMA}$ but cryptographic pseudorandom quantum states and pseudorandom unitary transformations exist, a counterintuitive result in light of the fact that pseudorandom states can be "broken" by quantum Merlin-Arthur adversaries. We explain how this nuance arises as the result of a distinction between algorithms that operate on quantum inputs and those that operate on classical inputs. On the other hand, we show that some computational complexity assumption is needed to construct pseudorandom states, by proving that pseudorandom states do not exist if $\mathsf{BQP} = \mathsf{PP}$. We discuss implications of these results for cryptography, complexity theory, and quantum tomography.
The analysis of mapping relationships and distortions in multidimensional data poses a significant challenge in contemporary research. While Beltrami coefficients offer a precise description of distortions in two-dimensional mappings, current tools lack this capability in the context of three-dimensional space. This paper presents a novel approach: a 3D quasiconformal representation that captures the local dilation of 3D mappings, along with an algorithm that establishes a connection between this representation and the corresponding mapping. Experimental results showcase the algorithm's effectiveness in eliminating foldings in 3D mappings, as well as in mapping reconstruction and generation. These features bear a resemblance to the 2D Linear Beltrami Solver technique. The work presented in this paper offers a promising solution for the precise analysis and adjustment of distortions in 3D data and mappings.
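For background, the 2D object this work generalizes is the Beltrami coefficient of a planar mapping $f$; the following is the standard textbook definition, included for context rather than taken from the paper itself:
\[
\mu_f(z) = \frac{\partial f / \partial \bar{z}}{\partial f / \partial z},
\qquad
\frac{\partial f}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right),
\quad
\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right).
\]
Here $\mu_f = 0$ exactly when $f$ is conformal, and $|\mu_f| < 1$ characterizes orientation-preserving quasiconformal maps, with larger $|\mu_f|$ indicating stronger local dilation; the paper's 3D representation is designed to play the analogous role one dimension up.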
A closed quasigeodesic is a closed curve on the surface of a polyhedron with at most $180^\circ$ of surface on both sides at all points; such curves can be locally unfolded straight. In 1949, Pogorelov proved that every convex polyhedron has at least three (non-self-intersecting) closed quasigeodesics, but the proof relies on a nonconstructive topological argument. We present the first finite algorithm to find a closed quasigeodesic on a given convex polyhedron, which is the first positive progress on a 1990 open problem by O'Rourke and Wyman. The algorithm establishes for the first time a quasipolynomial upper bound on the total number of visits to faces (number of line segments), namely $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3}\right)$, where $n$ is the number of vertices of the polyhedron, $\epsilon$ is the minimum curvature of a vertex, $L$ is the length of the longest edge, and $\ell$ is the smallest distance within a face between a vertex and a nonincident edge (the minimum feature size of any face). On the real RAM, the algorithm's running time is also pseudopolynomial, namely $O\left(\frac{n \, L^3}{\epsilon^2 \, \ell^3} \log n\right)$. On a word RAM, the running time grows to $O\left(b^2 \cdot \frac{n^8 \log n}{\epsilon^8} \cdot \frac{L^{21}}{\ell^{21}}\cdot 2^{O(|\Lambda|)}\right)$, where $|\Lambda|$ is the number of distinct edge lengths in the polyhedron, assuming its intrinsic or extrinsic geometry is given by rational coordinates each with at most $b$ bits. This time bound remains pseudopolynomial for polyhedra with $O(\log n)$ distinct edge lengths, but is exponential in the worst case. Along the way, we introduce the expression RAM model of computation, formalizing a connection between the real RAM and word RAM hinted at by past work on exact geometric computation.
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta\log\frac{n}{\delta})$, the time decreases to $O(m\log m(\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$ and $\Omega(\delta\log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$. We extend our results to the problem of finding $q$-MEMs, which must appear at least $q$ times in $T$.
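As a concrete reference for the MEM notion (including the $q$-MEM extension), here is a naive, uncompressed Python sketch; the function names and the brute-force search are ours for illustration, and it has none of the grammar-compressed machinery the paper builds:

```python
def count_occ(T, s):
    """Number of (possibly overlapping) occurrences of s in T."""
    return sum(1 for k in range(len(T) - len(s) + 1) if T[k:k + len(s)] == s)

def q_mems(P, T, q=1):
    """Naive q-MEMs: substrings of P occurring >= q times in T that cannot
    be extended on either side while still occurring >= q times.
    Brute force for illustration only; the paper achieves this on a
    run-length grammar of the text instead of the plain text."""
    m = len(P)
    # longest[i] = largest l such that P[i:i+l] occurs >= q times in T
    longest = []
    for i in range(m):
        l = 0
        while i + l < m and count_occ(T, P[i:i + l + 1]) >= q:
            l += 1
        longest.append(l)
    # P[i:i+longest[i]] is left-maximal iff it is not contained in the
    # maximal match starting one position earlier.
    return [(i, P[i:i + longest[i]])
            for i in range(m)
            if longest[i] > 0 and (i == 0 or longest[i - 1] <= longest[i])]

print(q_mems("abcab", "xabcyabz"))  # [(0, 'abc'), (3, 'ab')]
```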
Existing approaches to distributed matrix computations involve allocating coded combinations of submatrices to worker nodes to build resilience to stragglers and/or enhance privacy. In this study, we consider the challenge of preserving input sparsity in such approaches, so as to retain the associated gains in computational efficiency. First, we derive a lower bound on the weight of the coding, i.e., the number of submatrices that must be combined to obtain each coded submatrix, required to provide resilience to the maximum possible number of stragglers (for a given number of nodes and their storage constraints). Next, we propose a distributed matrix computation scheme that meets this lower bound exactly. Further, we develop a controllable trade-off between worker computation time and the privacy constraint for sparse input matrices in settings where the worker nodes are honest but curious. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.
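To make the coding idea concrete, here is a small NumPy sketch of the classical dense (MDS/Vandermonde) approach that such low-weight schemes improve upon: every coded block combines all $k$ submatrices, i.e., has maximal weight, which destroys input sparsity. All names and parameters here are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def vandermonde(n, k):
    """n x k generator matrix; any k of its rows form an invertible matrix."""
    return np.vander(np.arange(1.0, n + 1), k, increasing=True)

def worker_tasks(A, n, k):
    """Give each of n workers one dense combination of k row blocks of A.
    Every combination has weight k (all blocks touched), which is exactly
    what ruins sparsity and what a minimum-weight scheme avoids."""
    blocks = np.split(A, k, axis=0)        # requires k | number of rows
    G = vandermonde(n, k)
    return [sum(G[w, j] * blocks[j] for j in range(k)) for w in range(n)]

def recover(results, ids, n, k):
    """Recover A @ x from ANY k worker results (ids = worker indices)."""
    G = vandermonde(n, k)
    Z = np.linalg.solve(G[ids, :], np.stack(results))   # undo the coding
    return np.concatenate(Z)

rng = np.random.default_rng(0)
A, x = rng.standard_normal((6, 4)), rng.standard_normal(4)
n, k = 5, 3                                # tolerates n - k = 2 stragglers
tasks = worker_tasks(A, n, k)
ids = [0, 2, 4]                            # any k responsive workers suffice
assert np.allclose(recover([tasks[w] @ x for w in ids], ids, n, k), A @ x)
```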
Contrastive learning models have achieved great success in unsupervised visual representation learning: they maximize the similarities between feature representations of different views of the same image, while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document, and the two share similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary, and its model-generated summaries as different views of the same meaning representation, and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
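A PyTorch schematic of the kind of contrastive objective described; the encoder representations, shapes, and the specific InfoNCE-style formulation are our illustrative assumptions rather than the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(doc_repr, pos_reprs, neg_reprs, tau=0.1):
    """Pull a document's representation toward its positives (gold and
    model-generated summaries) and away from other documents' summaries.
    Shapes: doc_repr (d,), pos_reprs (P, d), neg_reprs (N, d)."""
    z = F.normalize(doc_repr, dim=-1)
    pos = F.normalize(pos_reprs, dim=-1)
    neg = F.normalize(neg_reprs, dim=-1)
    pos_sim = pos @ z / tau                    # (P,) scaled cosine similarities
    neg_sim = neg @ z / tau                    # (N,)
    all_sim = torch.cat([pos_sim, neg_sim])
    # -log( sum_pos exp(sim) / sum_all exp(sim) ), a multi-positive InfoNCE
    return torch.logsumexp(all_sim, 0) - torch.logsumexp(pos_sim, 0)

# Toy usage with random stand-in "encoder" outputs of dimension 8:
loss = contrastive_loss(torch.randn(8), torch.randn(2, 8), torch.randn(6, 8))
```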
Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. This review provides an overview of current challenges and opportunities, and presents a forecast of the future of GDL for molecular sciences.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks; hence, late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance while at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
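The following PyTorch sketch illustrates the bottleneck idea for one layer with two modalities; the dimensions, the learned bottleneck initialization, and the averaging of bottleneck updates are our simplifying assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    """One attention-bottleneck fusion layer (illustrative sketch).
    Each modality self-attends over [its own tokens ; shared bottleneck
    latents], so cross-modal information can only flow through the small
    bottleneck. Residual connections, MLPs, and norms are omitted."""
    def __init__(self, dim=256, heads=4, n_bottleneck=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.z_init = nn.Parameter(0.02 * torch.randn(1, n_bottleneck, dim))

    def forward(self, audio, video, z=None):
        if z is None:                          # first layer: learned latents
            z = self.z_init.expand(audio.size(0), -1, -1)
        xa = torch.cat([audio, z], dim=1)
        xv = torch.cat([video, z], dim=1)
        ya, _ = self.attn_a(xa, xa, xa)
        yv, _ = self.attn_v(xv, xv, xv)
        na, nv = audio.size(1), video.size(1)
        # Average the two modalities' bottleneck updates (one possible variant).
        z_new = 0.5 * (ya[:, na:] + yv[:, nv:])
        return ya[:, :na], yv[:, :nv], z_new

# Usage: audio (B, Ta, D) and video (B, Tv, D) token sequences.
layer = BottleneckFusionLayer()
a, v, z = layer(torch.randn(2, 10, 256), torch.randn(2, 16, 256))
```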
The behaviors of synthetic characters in current military simulations are limited, since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence even in the most effective behavior models, which are devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities, in combination with appropriate prior knowledge, can address the issues in existing computational behavior models and create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment, the Rapid Integration and Development Environment (RIDE), which supports One World Terrain (OWT) models and is capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments and can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture, which augments the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.
Benefiting from the rapid development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, two major challenges still hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy. Second, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from side-output features, the network can eventually explore the missing object parts and details, which results in high-resolution and accurate predictions. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS), and model size (81 MB).
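A PyTorch sketch of the reverse-attention refinement step as we read the abstract: the deeper stage's coarse prediction is upsampled, inverted through a sigmoid to erase already-detected salient regions, and a small residual branch recovers what is missing. Channel counts and the conv stack are our illustrative choices, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionRefine(nn.Module):
    """One reverse-attention refinement step (illustrative sketch)."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, side_feat, coarse_pred):
        # coarse_pred: (B, 1, h, w) saliency logits from the deeper stage
        up = F.interpolate(coarse_pred, size=side_feat.shape[-2:],
                           mode='bilinear', align_corners=False)
        rev = 1.0 - torch.sigmoid(up)           # reverse attention weights
        res = self.residual(side_feat * rev)    # residual from erased features
        return up + res                         # refined, higher-res logits

refine = ReverseAttentionRefine(in_ch=64)
out = refine(torch.randn(1, 64, 56, 56), torch.randn(1, 1, 28, 28))
```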
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
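As a minimal illustration of the core idea, here is a PyTorch sketch in the smoothed-min flavor of the soft-DTW instantiation (min is just max with flipped signs): the hard min of the DTW recursion is replaced by a negative log-sum-exp, which corresponds to entropic regularization and makes the whole recursion differentiable. Variable names and the simple quadratic cost are our choices:

```python
import torch

def softmin(vals, gamma):
    """Smoothed min: -gamma * log sum exp(-v / gamma). Its gradient is a
    soft argmin, and it recovers the hard min as gamma -> 0."""
    return -gamma * torch.logsumexp(-torch.stack(vals) / gamma, dim=0)

def soft_dtw(D, gamma=1.0):
    """Smoothed-DTW value for a pairwise cost matrix D (n x m): the classic
    DTW recursion with its min replaced by the smoothed operator, so the
    result is differentiable with respect to D."""
    n, m = D.shape
    inf = torch.tensor(float('inf'), dtype=D.dtype)
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = torch.tensor(0.0, dtype=D.dtype)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = D[i - 1, j - 1] + softmin(
                [R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma)
    return R[n][m]

x = torch.randn(5, 1, requires_grad=True)
y = torch.randn(7, 1)
loss = soft_dtw(torch.cdist(x, y) ** 2, gamma=0.1)
loss.backward()    # gradients flow through the smoothed recursion
```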