Assuming the polynomial hierarchy is infinite, we prove a sufficient condition for determining if uniform and polynomial size quantum circuits over a non-universal gate set are not efficiently classically simulable in the weak multiplicative sense. Our criterion exploits the fact that subgroups of $\mathrm{SL}(2;\mathbb{C})$ are essentially either discrete or dense in $\mathrm{SL}(2;\mathbb{C})$. Using our criterion, we give a new proof that both instantaneous quantum polynomial (IQP) circuits and conjugated Clifford circuits (CCCs) afford a quantum advantage. We also prove that both commuting CCCs and CCCs over various fragments of the Clifford group afford a quantum advantage, which settles two questions of Bouland, Fitzsimons, and Koh. Our results imply that circuits over just $(U^\dagger \otimes U^\dagger) \mathrm{CZ} (U \otimes U)$ afford a quantum advantage for almost all $U \in \mathrm{U}(2)$.
In the context of communication complexity, we explore randomized protocols for graph coloring, focusing on the vertex and edge coloring problems in $n$-vertex graphs $G$ with maximum degree $\Delta$. We consider the setting in which the edges of $G$ are partitioned between two players. Our first contribution is a randomized protocol that finds a $(\Delta + 1)$-vertex coloring of $G$ using $O(n)$ bits of communication in expectation and $O(\log \log n \cdot \log \Delta)$ rounds in the worst case. This improves on the work of Flin and Mittal [PODC 2024], who achieved the same communication cost but required $O(n)$ rounds in expectation. We also present a randomized protocol for $(2\Delta - 1)$-edge coloring of $G$ that likewise uses $O(n)$ bits of communication in expectation and $O(\log^\ast \Delta)$ rounds in the worst case. We complement this result with a tight $\Omega(n)$-bit lower bound on the communication complexity of $(2\Delta-1)$-edge coloring; a similar $\Omega(n)$ lower bound for $(\Delta+1)$-vertex coloring was established by Flin and Mittal [PODC 2024].
A new online multiple testing procedure that controls the False Discovery Rate (FDR) is described in the context of anomaly detection. An accurate anomaly detector must control the false positive rate at a prescribed level while keeping the false negative rate as low as possible. In the online context, however, this constraint remains highly challenging because FDR control is usually lacking: the online framework makes it impossible to use classical multiple testing approaches such as the Benjamini-Hochberg (BH) procedure, which would require knowing the entire time series. The developed strategy exploits local control of the ``modified FDR'' (mFDR) criterion. It turns out that local control of the mFDR enables global control of the FDR over the full series, up to additional modifications of the multiple testing procedures. An important ingredient in this control is the cardinality of the calibration dataset used to compute the empirical p-values. A dedicated strategy for tuning this parameter is designed to achieve the prescribed FDR control over the entire time series. The statistical performance of the full strategy is supported by theoretical guarantees, and its practical behavior is assessed in several simulation experiments, which support our conclusions.
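To make the two classical ingredients named above concrete, the following is a minimal Python sketch of an empirical p-value computed from a calibration set and of the offline BH step-up procedure. It is not the online procedure of the abstract. The function names, the convention that higher scores are more anomalous, and the `(1 + count) / (n + 1)` form of the p-value are assumptions made for illustration.

```python
def empirical_p_value(calibration_scores, score):
    """Empirical p-value of `score` against anomaly-free calibration scores.

    Assumption: higher scores are more anomalous, and the "+1" correction is used.
    """
    n = len(calibration_scores)
    exceed = sum(1 for c in calibration_scores if c >= score)
    return (1 + exceed) / (n + 1)


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the BH step-up procedure at level alpha.

    BH needs all m p-values at once, which is why it cannot be run online.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under the BH line alpha * rank / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

With this form of the p-value, the smallest attainable value is `1 / (n + 1)`, which gives one way to see why the size of the calibration set matters for how many discoveries are possible.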
Motivated by the change-making problem, we extend the notion of greediness to sets of positive integers not containing the element $1$, and from there to numerical semigroups. We provide an algorithm to determine if a given set (not necessarily containing the number $1$) is greedy. We also give specific conditions for sets of cardinality three, and we prove that numerical semigroups generated by three consecutive integers are greedy.
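As background for the notion of greediness, here is a minimal Python sketch for the classical change-making setting only, where the set contains $1$. It compares the greedy coin count with the optimal count from dynamic programming, for every amount up to a bound chosen by the caller. It does not implement the extended definition or the algorithm of the abstract. The function names and the brute-force bound `limit` are hypothetical.

```python
def greedy_count(coins, amount):
    """Number of coins used by the greedy strategy, or None if it gets stuck."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None


def optimal_counts(coins, limit):
    """best[a] = minimum number of coins summing to a, for 0 <= a <= limit."""
    best = [0] + [float("inf")] * limit
    for a in range(1, limit + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best


def first_greedy_failure(coins, limit):
    """Smallest amount <= limit where greedy is not optimal, or None if there is none."""
    best = optimal_counts(coins, limit)
    for a in range(1, limit + 1):
        if greedy_count(coins, a) != best[a]:
            return a
    return None
```

For example, `first_greedy_failure([1, 3, 4], 20)` returns `6`, since greedy pays $4 + 1 + 1$ while $3 + 3$ uses fewer coins. A `None` result only certifies optimality up to `limit`.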
A challenge in high-dimensional inverse problems is developing iterative solvers that find accurate solutions of regularized optimization problems at low computational cost. An important example is computed tomography (CT), where both image and data sizes are large and the forward model is therefore costly to evaluate. For several years, algorithms from stochastic optimization have been used with great success for tomographic image reconstruction by subsampling the data. Here we propose a novel way of using stochastic optimization to speed up image reconstruction by means of image domain sketching, such that at each iteration an image of a different resolution is used. Hence, we coin this algorithm ImaSk. By considering an associated saddle-point problem, we can formulate ImaSk as a gradient-based algorithm in which the gradient is approximated in the same spirit as the stochastic average gradient am\'elior\'e (SAGA), using one of the multiresolution operators chosen at random at each iteration. We prove that ImaSk converges linearly for linear forward models with strongly convex regularization functions. Numerical simulations on CT show that ImaSk is effective and that increasing the number of multiresolution operators reduces the computational time needed to reach the modeled solution.
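For readers unfamiliar with the SAGA gradient estimator mentioned above, here is a minimal Python sketch of plain SAGA on a small least-squares problem. It is not ImaSk: there is no image domain sketching, no saddle-point formulation, and no regularizer. The function name, the objective, and the step size $1/(3L)$ are assumptions made for illustration.

```python
import random


def saga_least_squares(A, b, steps, seed=0):
    """Minimise (1/n) * sum_i 0.5 * (a_i . x - b_i)^2 with plain SAGA."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Each term has gradient Lipschitz constant ||a_i||^2; 1/(3L) is a common SAGA step.
    L = max(sum(v * v for v in row) for row in A)
    step = 1.0 / (3.0 * L)
    x = [0.0] * d
    # Table holding the most recently evaluated gradient of every term (here at x = 0).
    table = [[-b[i] * v for v in A[i]] for i in range(n)]
    avg = [sum(table[i][k] for i in range(n)) / n for k in range(d)]
    for _ in range(steps):
        j = rng.randrange(n)
        residual = sum(a * xi for a, xi in zip(A[j], x)) - b[j]
        fresh = [residual * a for a in A[j]]
        # SAGA estimate: fresh gradient - stored gradient + average of stored gradients.
        x = [xi - step * (fresh[k] - table[j][k] + avg[k]) for k, xi in enumerate(x)]
        # Keep the running average consistent with the updated table entry.
        avg = [avg[k] + (fresh[k] - table[j][k]) / n for k in range(d)]
        table[j] = fresh
    return x
```

Only one term's gradient is evaluated per iteration, while the stored table supplies a variance-reducing correction. This is the mechanism the abstract refers to when it says the gradient is approximated "in the same spirit" as SAGA.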
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer-based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that such a strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Behaviors of the synthetic characters in current military simulations are limited since they are generally generated by rule-based and reactive computational models with minimal intelligence. Such computational models cannot adapt to reflect the experience of the characters, resulting in brittle intelligence for even the most effective behavior models devised via costly and labor-intensive processes. Observation-based behavior model adaptation that leverages machine learning and the experience of synthetic entities in combination with appropriate prior knowledge can address the issues in the existing computational behavior models to create a better training experience in military training simulations. In this paper, we introduce a framework that aims to create autonomous synthetic characters that can perform coherent sequences of believable behavior while being aware of human trainees and their needs within a training simulation. This framework brings together three mutually complementary components. The first component is a Unity-based simulation environment - Rapid Integration and Development Environment (RIDE) - supporting One World Terrain (OWT) models and capable of running and supporting machine learning experiments. The second is Shiva, a novel multi-agent reinforcement and imitation learning framework that can interface with a variety of simulation environments, and that can additionally utilize a variety of learning algorithms. The final component is the Sigma Cognitive Architecture that will augment the behavior models with symbolic and probabilistic reasoning capabilities. We have successfully created proof-of-concept behavior models leveraging this framework on realistic terrain as an essential step towards bringing machine learning into military simulations.
The transformer is a type of deep neural network based mainly on the self-attention mechanism, and it was originally applied in the field of natural language processing. Inspired by the strong representation ability of the transformer, researchers have proposed extending it to computer vision tasks. Transformer-based models show competitive and even better performance on various visual benchmarks compared to other network types such as convolutional networks and recurrent networks. In this paper, we provide a literature review of these visual transformer models by categorizing them according to task and analyzing the advantages and disadvantages of these methods. In particular, the main categories include basic image classification, high-level vision, low-level vision, and video processing. Self-attention in computer vision is also briefly revisited, as self-attention is the base component of the transformer. Efficient transformer methods are included, as they help push transformers into real applications. Finally, we discuss further research directions for visual transformers.
Data augmentation has been widely used to improve the generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node classification. We discuss practical and theoretical motivations, considerations, and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure. Our main contribution is the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
Embedding entities and relations into a continuous multi-dimensional vector space has become the dominant method for knowledge graph embedding in representation learning. However, most existing models neglect to represent hierarchical knowledge, such as the similarities and dissimilarities of entities in one domain. We propose to learn domain representations on top of existing knowledge graph embedding models, such that entities with similar attributes are organized into the same domain. Such hierarchical knowledge of domains can provide further evidence for link prediction. Experimental results show that domain embeddings give a significant improvement over the most recent state-of-the-art baseline knowledge graph embedding models.
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
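As a concrete reading of the additive margin, here is a minimal Python sketch of the loss for a single sample. It assumes the commonly cited form in which features and class weights are L2-normalized, a margin $m$ is subtracted from the target-class cosine, and all cosines are scaled by $s$ before the softmax cross-entropy. The function names and the default values of `s` and `m` are illustrative assumptions, not taken from the text above.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(feature, class_weights, label, s=30.0, m=0.35):
    """Additive-margin softmax loss for one sample (assumed form, see lead-in)."""
    f = _normalize(feature)
    # Cosine similarity between the normalized feature and each normalized class weight.
    cosines = [sum(a * b for a, b in zip(f, _normalize(w))) for w in class_weights]
    # Subtract the margin from the target class only, then scale every logit by s.
    logits = [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the cross-entropy.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]
```

Setting `m=0` recovers a normalized softmax loss. Any `m > 0` lowers the target logit and so raises the loss for the same input, which is what pushes the learned features toward larger inter-class margins.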