This paper studies nested sequents for quantified modal logics. In particular, it considers extensions of the propositional modal logics definable by the axioms D, T, B, 4, and 5 with varying, increasing, decreasing, and constant domains. Each calculus is proved to have good structural properties: weakening and contraction are height-preserving admissible and cut is (syntactically) admissible. Each calculus is shown to be equivalent to the corresponding axiomatic system and, thus, to be sound and complete. Finally, it is argued that the calculi are internal -- i.e., each sequent has a formula interpretation -- whenever the existence predicate is expressible in the language.
We introduce an quantum entropy for bimodule quantum channels on finite von Neumann algebras, generalizing the remarkable Pimsner-Popa entropy. The relative entropy for Fourier multipliers of bimodule quantum channels establishes an upper bound of the quantum entropy. Additionally, we present the Araki relative entropy for bimodule quantum channels, revealing its equivalence to the relative entropy for Fourier multipliers and demonstrating its left/right monotonicities and convexity. Notably, the quantum entropy attains its maximum if there is a downward Jones basic construction. By considering R\'{e}nyi entropy for Fourier multipliers, we find a continuous bridge between the logarithm of the Pimsner-Popa index and the Pimsner-Popa entropy. As a consequence, the R\'{e}nyi entropy at $1/2$ serves a criterion for the existence of a downward Jones basic construction.
This study proposes the "adaptive flip graph algorithm", which combines adaptive searches with the flip graph algorithm for finding fast and efficient methods for matrix multiplication. The adaptive flip graph algorithm addresses the inherent limitations of exploration and inefficient search encountered in the original flip graph algorithm, particularly when dealing with large matrix multiplication. For the limitation of exploration, the proposed algorithm adaptively transitions over the flip graph, introducing a flexibility that does not strictly reduce the number of multiplications. Concerning the issue of inefficient search in large instances, the proposed algorithm adaptively constraints the search range instead of relying on a completely random search, facilitating more effective exploration. Numerical experimental results demonstrate the effectiveness of the adaptive flip graph algorithm, showing a reduction in the number of multiplications for a $4\times 5$ matrix multiplied by a $5\times 5$ matrix from $76$ to $73$, and that from $95$ to $94$ for a $5 \times 5$ matrix multiplied by another $5\times 5$ matrix. These results are obtained in characteristic two.
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematical approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves the "best-of-all-worlds" empirical performance in the stochastic, adversarial, and non-stationary environments. And we illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function in interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method on Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find the postprocessing method significantly reduces the number of sequential steps to find the global optimum, especially when the acquisition function is of maximum a posterior estimation. Our results provide a simple but general strategy to solve the slow convergence of Bayesian optimization for high-dimensional problems.
The online bisection problem is a natural dynamic variant of the classic optimization problem, where one has to dynamically maintain a partition of $n$ elements into two clusters of cardinality $n/2$. During runtime, an online algorithm is given a sequence of requests, each being a pair of elements: an inter-cluster request costs one unit while an intra-cluster one is free. The algorithm may change the partition, paying a unit cost for each element that changes its cluster. This natural problem admits a simple deterministic $O(n^2)$-competitive algorithm [Avin et al., DISC 2016]. While several significant improvements over this result have been obtained since the original work, all of them either limit the generality of the input or assume some form of resource augmentation (e.g., larger clusters). Moreover, the algorithm of Avin et al. achieves the best known competitive ratio even if randomization is allowed. In this paper, we present the first randomized online algorithm that breaks this natural quadratic barrier and achieves a competitive ratio of $\tilde{O}(n^{23/12})$ without resource augmentation and for an arbitrary sequence of requests.
Transformers have achieved promising results on a variety of tasks. However, the quadratic complexity in self-attention computation has limited the applications, especially in low-resource settings and mobile or edge devices. Existing works have proposed to exploit hand-crafted attention patterns to reduce computation complexity. However, such hand-crafted patterns are data-agnostic and may not be optimal. Hence, it is likely that relevant keys or values are being reduced, while less important ones are still preserved. Based on this key insight, we propose a novel deformable audio Transformer for audio recognition, named DATAR, where a deformable attention equipping with a pyramid transformer backbone is constructed and learnable. Such an architecture has been proven effective in prediction tasks,~\textit{e.g.}, event classification. Moreover, we identify that the deformable attention map computation may over-simplify the input feature, which can be further enhanced. Hence, we introduce a learnable input adaptor to alleviate this issue, and DATAR achieves state-of-the-art performance.
Contrastive learning models have achieved great success in unsupervised visual representation learning, which maximize the similarities between feature representations of different views of the same image, while minimize the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.