We describe the categorical semantics for a simply typed variant and a simplified dependently typed variant of Cocon, a contextual modal type theory where the box modality mediates between the weak function space that is used to represent higher-order abstract syntax (HOAS) trees and the strong function space that describes (recursive) computations about them. What makes Cocon different from standard type theories is the presence of first-class contexts and contextual objects to describe syntax trees that are closed with respect to a given context of assumptions. Following M. Hofmann's work, we use a presheaf model to characterise HOAS trees. Surprisingly, this model already provides the necessary structure to also model Cocon. In particular, we can capture the contextual objects of Cocon using a comonad $\flat$ that restricts presheaves to their closed elements. This gives a simple semantic characterisation of the invariants of contextual types (e.g. substitution invariance) and identifies Cocon as a type-theoretic syntax of presheaf models. We further extend this characterisation to dependent types using categories with families and show that we can model a fragment of Cocon without recursor in the Fitch-style dependent modal type theory presented by Birkedal et. al..
It is a well-known fact that there is no complete and discrete invariant on the collection of all multiparameter persistence modules. Nonetheless, many invariants have been proposed in the literature to study multiparameter persistence modules, though each invariant will lose some amount of information. One such invariant is the generalized rank invariant. This invariant is known to be complete on the class of interval decomposable persistence modules in general, under mild assumptions on the indexing poset $P$. There is often a trade-off, where the stronger an invariant is, the more expensive it is to compute in practice. The generalized rank invariant on its own is difficult to compute, whereas the standard rank invariant is readily computable through software implementations such as RIVET. We can interpolate between these two to induce new invariants via restricting the domain of the generalized rank invariant, and this family exhibits the aforementioned trade-off. This work studies the tension which exists between computational efficiency and retaining strength when restricting the domain of the generalized rank invariant. We provide a characterization result on where such restrictions are complete invariants in the setting where $P$ is finite, and furthermore show that such restricted generalized rank invariants are stable.
In dynamical systems, it is advantageous to identify regions of flow which can exhibit maximal influence on nearby behaviour. Hyperbolic Lagrangian Coherent Structures have been introduced to obtain two-dimensional surfaces which maximise repulsion or attraction in three-dimensional dynamical systems with arbitrary time-dependence. However, the numerical method to compute them requires obtaining derivatives associated with the system, often performed through the approximation of divided differences, which can lead to significant numerical error and numerical noise. In this paper, we introduce a novel method for the numerical calculation of hyperbolic Lagrangian Coherent Structures using Differential Algebra called DA-LCS. As a form of automatic forward differentiation, it allows direct computation of the Taylor expansion of the flow, its derivatives, and the eigenvectors of the associated strain tensor, with all derivatives obtained algebraically and to machine precision. It does so without a priori information about the system, such as variational equations or explicit derivatives. We demonstrate that this can provide significant improvements in the accuracy of the Lagrangian Coherent Structures identified compared to finite-differencing methods in a series of test cases drawn from the literature. We also show how DA-LCS uncovers additional dynamical behaviour in a real-world example drawn from astrodynamics.
We identify a new class of vulnerabilities in implementations of differential privacy. Specifically, they arise when computing basic statistics such as sums, thanks to discrepancies between the implemented arithmetic using finite data types (namely, ints or floats) and idealized arithmetic over the reals or integers. These discrepancies cause the sensitivity of the implemented statistics (i.e., how much one individual's data can affect the result) to be much higher than the sensitivity we expect. Consequently, essentially all differential privacy libraries fail to introduce enough noise to hide individual-level information as required by differential privacy, and we show that this may be exploited in realistic attacks on differentially private query systems. In addition to presenting these vulnerabilities, we also provide a number of solutions, which modify or constrain the way in which the sum is implemented in order to recover the idealized or near-idealized bounds on sensitivity.
The ergodic decomposition theorem is a cornerstone result of dynamical systems and ergodic theory. It states that every invariant measure on a dynamical system is a mixture of ergodic ones. Here we formulate and prove the theorem in terms of string diagrams, using the formalism of Markov categories. We recover the usual measure-theoretic statement by instantiating our result in the category of stochastic kernels. Along the way we give a conceptual treatment of several concepts in the theory of deterministic and stochastic dynamical systems. In particular, - ergodic measures appear very naturally as particular cones of deterministic morphisms (in the sense of Markov categories); - the invariant $\sigma$-algebra of a dynamical system can be seen as a colimit in the category of Markov kernels. In line with other uses of category theory, once the necessary structures are in place, our proof of the main theorem is much simpler than traditional approaches. In particular, it does not use any quantitative limiting arguments, and it does not rely on the cardinality of the group or monoid indexing the dynamics. We hope that this result paves the way for further applications of category theory to dynamical systems, ergodic theory, and information theory.
Predictive coding (PC) is an influential theory in computational neuroscience, which argues that the cortex forms unsupervised world models by implementing a hierarchical process of prediction error minimization. PC networks (PCNs) are trained in two phases. First, neural activities are updated to optimize the network's response to external stimuli. Second, synaptic weights are updated to consolidate this change in activity -- an algorithm called \emph{prospective configuration}. While previous work has shown how in various limits, PCNs can be found to approximate backpropagation (BP), recent work has demonstrated that PCNs operating in this standard regime, which does not approximate BP, nevertheless obtain competitive training and generalization performance to BP-trained networks while outperforming them on tasks such as online, few-shot, and continual learning, where brains are known to excel. Despite this promising empirical performance, little is understood theoretically about the properties and dynamics of PCNs in this regime. In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. We first derive analytical results concerning the inference equilibrium for PCNs and a previously unknown close connection relationship to target propagation (TP). Secondly, we provide a theoretical analysis of learning in PCNs as a variant of generalized expectation-maximization and use that to prove the convergence of PCNs to critical points of the BP loss function, thus showing that deep PCNs can, in theory, achieve the same generalization performance as BP, while maintaining their unique advantages.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Catastrophic forgetting refers to the tendency that a neural network "forgets" the previous learned knowledge upon learning new tasks. Prior methods have been focused on overcoming this problem on convolutional neural networks (CNNs), where the input samples like images lie in a grid domain, but have largely overlooked graph neural networks (GNNs) that handle non-grid data. In this paper, we propose a novel scheme dedicated to overcoming catastrophic forgetting problem and hence strengthen continual learning in GNNs. At the heart of our approach is a generic module, termed as topology-aware weight preserving~(TWP), applicable to arbitrary form of GNNs in a plug-and-play fashion. Unlike the main stream of CNN-based continual learning methods that rely on solely slowing down the updates of parameters important to the downstream task, TWP explicitly explores the local structures of the input graph, and attempts to stabilize the parameters playing pivotal roles in the topological aggregation. We evaluate TWP on different GNN backbones over several datasets, and demonstrate that it yields performances superior to the state of the art. Code is publicly available at \url{//github.com/hhliu79/TWP}.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.
Graph Neural Networks (GNNs) for representation learning of graphs broadly follow a neighborhood aggregation framework, where the representation vector of a node is computed by recursively aggregating and transforming feature vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties and limitations. Here, we present a theoretical framework for analyzing the expressive power of GNNs in capturing different graph structures. Our results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. We then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theoretical findings on a number of graph classification benchmarks, and demonstrate that our model achieves state-of-the-art performance.