Ehrenfeucht-Fraïssé games provide a fundamental method for proving elementary equivalence (and equivalence up to a certain quantifier rank) of relational structures. We investigate the soundness and completeness of this method in the more general context of semiring semantics. Motivated originally by provenance analysis of database queries, semiring semantics evaluates logical statements not just as true or false, but by values in some commutative semiring; this can provide much more detailed information, for instance about the combinations of atomic facts that imply the truth of a statement, or practical information about evaluation costs, confidence scores, access levels, or the number of successful evaluation strategies. There is a wide variety of semirings relevant for provenance analysis, and the applicability of classical logical methods in semiring semantics may depend strongly on the algebraic properties of the underlying semiring. While Ehrenfeucht-Fraïssé games are sound and complete for logical equivalences in classical semantics, and thus over the Boolean semiring, this is in general not the case for other semirings. We provide a detailed analysis of the soundness and completeness of model comparison games on specific semirings, not just for classical Ehrenfeucht-Fraïssé games but also for variants based on bijections or counting. Finally, we propose a new kind of game, called homomorphism games, based on the fact that certain rather simple semiring interpretations are locally very different and can be separated even in a one-move game, yet can be proved elementarily equivalent via separating sets of homomorphisms into the Boolean semiring. We prove that these homomorphism games provide a sound and complete method for logical equivalences on finite and infinite lattice semirings.
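For readers unfamiliar with semiring semantics, a minimal illustration of the valuation mechanism (our own example, using the standard interpretation of disjunction as semiring addition and conjunction as multiplication): given a semiring interpretation $\pi$ of the atomic facts,
\[
\pi\big(R(a) \vee (S(a) \wedge S(b))\big) \;=\; \pi(R(a)) + \pi(S(a)) \cdot \pi(S(b)).
\]
Over the Boolean semiring $(\{0,1\}, \vee, \wedge, 0, 1)$ this reduces to classical truth evaluation, while over the natural semiring $(\mathbb{N}, +, \cdot, 0, 1)$ it counts the number of evaluation strategies that verify the formula.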
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.
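A minimal sketch of the truncated-backpropagation idea behind DRaFT-K (the sampler interface and function names below are placeholders, not the paper's code): the reward gradient is propagated only through the last K denoising steps, with all earlier steps run under stop-gradient.

import torch

def draft_k_loss(unet, sampler, reward_fn, prompt_emb, K, num_steps=50):
    """Sample with gradients enabled only for the last K denoising steps,
    then return the (negative) reward as a fine-tuning loss. Illustrative sketch."""
    x = torch.randn(1, 4, 64, 64)            # initial latent noise (shape is illustrative)
    for t in reversed(range(num_steps)):
        grads_on = t < K                      # only the final K steps keep the graph
        with torch.set_grad_enabled(grads_on):
            eps = unet(x, t, prompt_emb)      # predicted noise
            x = sampler.step(eps, t, x)       # one denoising update (e.g., DDIM-style)
        if not grads_on:
            x = x.detach()                    # truncate backpropagation here
    image = sampler.decode(x)                 # decode latents to an image
    return -reward_fn(image, prompt_emb)      # maximizing reward = minimizing this loss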
Contrastive learning often relies on comparing positive anchor samples with multiple negative samples to perform Self-Supervised Learning (SSL). However, non-contrastive approaches like BYOL, SimSiam, and Barlow Twins achieve SSL without explicit negative samples. In this paper, we introduce a unified matrix information-theoretic framework that explains many contrastive and non-contrastive learning methods. We then propose a novel method, Matrix-SSL, based on matrix information theory. Experimental results reveal that Matrix-SSL significantly outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, with 100 epochs of pre-training, our method outperforms SimCLR by 4.6%, and on MS-COCO transfer learning tasks it outperforms previous SOTA methods such as MoCo v2 and BYOL by up to 3.3% with only 400 epochs of pre-training, compared to their 800 epochs. Code available at //github.com/yifanzhang-pro/Matrix-SSL.
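To convey the flavour of a matrix information-theoretic objective, here is one possible instantiation (a sketch under our own assumptions, not the exact Matrix-SSL loss): compare the cross-covariance of the two augmented views against the maximally spread spectrum via a von Neumann relative entropy.

import math
import torch
import torch.nn.functional as F

def matrix_divergence_loss(z1, z2, eps=1e-4):
    """Illustrative matrix-divergence loss between the cross-covariance of two
    normalized views and the identity target (sketch, not the paper's exact loss)."""
    z1 = F.normalize(z1, dim=1)                      # (batch, dim) embeddings of view 1
    z2 = F.normalize(z2, dim=1)                      # (batch, dim) embeddings of view 2
    d = z1.shape[1]
    c = (z1.T @ z2) / z1.shape[0]                    # cross-covariance matrix (dim x dim)
    p = c @ c.T + eps * torch.eye(d)                 # symmetrize and regularize
    p = p / torch.trace(p)                           # normalize to a density matrix
    # von Neumann relative entropy D(p || I/d) = sum_i lambda_i log lambda_i + log d
    eigvals = torch.linalg.eigvalsh(p).clamp_min(eps)
    return (eigvals * eigvals.log()).sum() + math.log(d)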
Higher-order abstract GSOS is a recent extension of Turi and Plotkin's framework of Mathematical Operational Semantics to higher-order languages. The fundamental well-behavedness property of all specifications within the framework is that coalgebraic strong (bi)similarity on their operational model is a congruence. In the present work, we establish a corresponding congruence theorem for weak similarity, which is shown to instantiate to well-known concepts such as Abramsky's applicative similarity for the lambda-calculus. Along the way, we develop several techniques of independent interest at the level of abstract categories, including relation liftings of mixed-variance bifunctors and higher-order GSOS laws, as well as Howe's method.
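For concreteness, the notion that the weak-similarity theorem instantiates to can be recalled as follows (Abramsky's standard definition, included here only for orientation): a relation $R$ on closed $\lambda$-terms is an applicative simulation if $M \mathrel{R} N$ and $M \Downarrow \lambda x.\, M'$ imply that $N \Downarrow \lambda x.\, N'$ for some $N'$ with $M'[P/x] \mathrel{R} N'[P/x]$ for every closed term $P$; applicative similarity is the greatest such relation.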
We propose a novel zero-shot approach to computing correspondences between 3D shapes. Existing approaches mainly focus on isometric and near-isometric shape pairs (e.g., human vs. human), but less attention has been given to strongly non-isometric and inter-class shape matching (e.g., human vs. cow). To this end, we introduce a fully automatic method that exploits the exceptional reasoning capabilities of recent foundation models in language and vision to tackle difficult shape correspondence problems. Our approach comprises multiple stages. First, we classify the 3D shapes in a zero-shot manner by feeding rendered shape views to a language-vision model (e.g., BLIP2) to generate a list of class proposals per shape. These proposals are unified into a single class per shape by employing the reasoning capabilities of ChatGPT. Second, we attempt to segment the two shapes in a zero-shot manner, but in contrast to the co-segmentation problem, we do not require a mutual set of semantic regions. Instead, we propose to exploit the in-context learning capabilities of ChatGPT to generate two different sets of semantic regions for each shape and a semantic mapping between them. This enables our approach to match strongly non-isometric shapes with significant differences in geometric structure. Finally, we employ the generated semantic mapping to produce coarse correspondences that can further be refined by the functional maps framework to produce dense point-to-point maps. Our approach, despite its simplicity, produces highly plausible results in a zero-shot manner, especially between strongly non-isometric shapes. Project webpage: //samir55.github.io/3dshapematch/.
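As background for the last stage, a minimal NumPy sketch of standard functional-maps machinery (our own illustration, not the paper's code): given region indicator functions on both shapes that are paired by the semantic mapping, a functional map is fitted by least squares in the shapes' spectral bases and then converted to a coarse point-to-point map.

import numpy as np

def functional_map_from_indicators(phi_a, phi_b, F_a, F_b):
    """Least-squares functional map C aligning matched region indicator functions
    expressed in each shape's Laplace-Beltrami basis (illustrative sketch)."""
    A = np.linalg.pinv(phi_a) @ F_a          # spectral coefficients of source indicators
    B = np.linalg.pinv(phi_b) @ F_b          # spectral coefficients of target indicators
    # Solve C A ~= B, i.e. A^T C^T ~= B^T, column by column in the least-squares sense.
    C = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
    return C

def pointwise_map(phi_a, phi_b, C):
    """Recover a coarse point-to-point map by nearest neighbours in the aligned spectral embedding."""
    emb_a = phi_a @ C.T                      # transfer source basis into the target spectral space
    dists = ((phi_b[:, None, :] - emb_a[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)              # for each target vertex, its matched source vertex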
We introduce and initiate the study of a natural class of repeated two-player matrix games, called Repeated-Until-Collision (RUC) games. In each round, both players simultaneously pick an action from a common action set $\{1, 2, \dots, n\}$. Depending on their chosen actions, they derive payoffs given by $n \times n$ matrices $A$ and $B$, respectively. If their actions collide (i.e., they pick the same action), the game ends; otherwise, it proceeds to the next round. Both players want to maximize their total payoff until the game ends. RUC games can be interpreted as pursuit-evasion games or repeated hide-and-seek games. They also generalize hand cricket, a popular game among children in India. We show that under mild assumptions on the payoff matrices, every RUC game admits a Nash equilibrium (NE). Moreover, we show the existence of a stationary NE, where each player chooses their action according to a probability distribution over the action set that does not change across rounds. Remarkably, we show that all NE are effectively the same as the stationary NE, thus showing that RUC games admit an almost unique NE. Lastly, we show how to compute (approximate) NE for RUC games.
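Under stationary strategies the expected total payoffs have a simple closed form, which is a natural starting point for equilibrium search. A small sketch, under the modelling assumption (ours, not the abstract's) that the colliding round's payoff is also collected: if each round is played with mixed strategies p and q, the per-round expected payoff is p^T A q, the game ends after any given round with probability p.q, and Wald's identity gives total expected payoff (p^T A q) / (p.q).

import numpy as np

def expected_total_payoffs(A, B, p, q):
    """Expected total payoffs of an RUC game under stationary mixed strategies p and q
    (assumes the colliding round's payoff is also collected; illustrative sketch)."""
    collision_prob = float(p @ q)            # chance the game ends after any given round
    expected_rounds = 1.0 / collision_prob   # mean of the geometric number of rounds played
    return expected_rounds * float(p @ A @ q), expected_rounds * float(p @ B @ q)

# Example: 3 actions, uniform stationary strategies for both players.
A = np.array([[0, 2, 1], [1, 0, 2], [2, 1, 0]], dtype=float)
p = q = np.ones(3) / 3
print(expected_total_payoffs(A, A.T, p, q))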
Despite the recent progress in deep learning, most approaches still adopt a silo-like solution, learning each task in isolation by training a separate neural network per task. Many real-world problems, however, call for a multi-modal approach and, therefore, for multi-tasking models. Multi-task learning (MTL) aims to leverage useful information across tasks to improve the generalization capability of a model. This thesis is concerned with multi-task learning in the context of computer vision. First, we review existing approaches for MTL. Next, we propose several methods that tackle important aspects of multi-task learning. The proposed methods are evaluated on various benchmarks. The results show several advances in the state of the art of multi-task learning. Finally, we discuss several possibilities for future work.
Promoting behavioural diversity is critical for solving games with non-transitive dynamics, in which strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that exhibit strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
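One simple way a DPP-style diversity measure over a population can be instantiated (a sketch of the general idea, not necessarily the paper's exact kernel): represent each strategy by its payoff vector against a fixed set of opponents and take the (log-)determinant of the resulting Gram matrix; the determinant grows with the volume spanned by the payoff vectors, so a strategy that is close to a mixture of existing ones adds almost nothing.

import numpy as np

def dpp_diversity(payoff_rows, reg=1e-6):
    """Log-determinant of the Gram (DPP kernel) matrix built from each strategy's
    payoff vector against a fixed opponent set; larger means a more diverse population."""
    M = np.asarray(payoff_rows, dtype=float)        # shape: (num_strategies, num_opponents)
    L = M @ M.T + reg * np.eye(M.shape[0])          # positive semi-definite DPP kernel
    sign, logdet = np.linalg.slogdet(L)
    return logdet

# Rock-Paper-Scissors payoff rows: each pure strategy evaluated against (R, P, S).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
print(dpp_diversity(rps))                           # duplicating a row would collapse the spanned volume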
Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references capture their fine-grained semantic meanings, yielding more expressive representations that are more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.
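A schematic sketch of query-aware aggregation of the few-shot references (dimensions and names are illustrative placeholders, not the paper's implementation): each reference-pair embedding is weighted by its attention score with respect to the current query before being aggregated, so different queries attend to different references.

import torch
import torch.nn.functional as F

def query_aware_aggregate(reference_embs, query_emb):
    """Aggregate few-shot reference embeddings with query-dependent attention weights
    (illustrative sketch of the adaptive aggregation idea)."""
    # reference_embs: (num_refs, dim), query_emb: (dim,)
    scores = reference_embs @ query_emb / reference_embs.shape[1] ** 0.5   # scaled dot-product
    weights = F.softmax(scores, dim=0)                                     # one weight per reference
    return weights @ reference_embs                                        # (dim,) task representation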
Learning with limited data is a key challenge for visual recognition. Few-shot learning methods address this challenge by learning an instance embedding function from seen classes and applying the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned to be optimally discriminative with respect to the unseen classes, even though discriminating among them is the target task. In this paper, we propose a novel approach to adapt the embedding model to the target classification task, yielding task-specific, discriminative embeddings. To this end, we employ a self-attention mechanism, the Transformer, to transform the embeddings from task-agnostic to task-specific by relating the test instances to the training instances in both seen and unseen classes. Our approach also extends to both transductive and generalized few-shot classification, two important settings that have essential use cases. We verify the effectiveness of our model on two standard few-shot classification benchmarks, MiniImageNet and CUB, where our approach demonstrates state-of-the-art empirical performance.
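A minimal sketch of the embedding-adaptation idea (module names and dimensions are our own placeholders, not the paper's code): class prototypes computed from the support set are passed through a self-attention (Transformer) layer so that each prototype is adapted in the context of the other classes in the task, and queries are classified by their nearest adapted prototype.

import torch
import torch.nn as nn

class EmbeddingAdaptation(nn.Module):
    """Adapt task-agnostic class prototypes into task-specific ones with one
    self-attention layer over the set of prototypes (illustrative sketch)."""
    def __init__(self, dim=64, heads=1):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, support_embs, support_labels, query_embs):
        classes = support_labels.unique()
        protos = torch.stack([support_embs[support_labels == c].mean(0) for c in classes])
        adapted = self.attn(protos.unsqueeze(0)).squeeze(0)      # set-to-set adaptation
        logits = -torch.cdist(query_embs, adapted)               # nearest adapted prototype
        return logits, classes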
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
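For orientation, the single-agent maximum-entropy machinery that soft Q-learning rests on can be summarized as follows (standard background, stated here as context rather than as the paper's multiagent construction):
\[
V_{\mathrm{soft}}(s) \;=\; \alpha \log \int_{\mathcal{A}} \exp\!\Big(\tfrac{1}{\alpha}\, Q_{\mathrm{soft}}(s,a)\Big)\, da,
\qquad
\pi(a \mid s) \;\propto\; \exp\!\Big(\tfrac{1}{\alpha}\, Q_{\mathrm{soft}}(s,a)\Big),
\]
with $Q_{\mathrm{soft}}(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}\big[V_{\mathrm{soft}}(s')\big]$. The resulting energy-based policy keeps probability mass on multiple modes of the joint-action landscape, which helps avoid the premature convergence to risk-averse joint actions that characterizes relative overgeneralization.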