We study the complexity of the following related computational tasks concerning a fixed countable graph G: 1. Does a countable graph H provided as input have a(n induced) subgraph isomorphic to G? 2. Given a countable graph H that has a(n induced) subgraph isomorphic to G, find such a subgraph. The framework for our investigations is given by effective Wadge reducibility and by Weihrauch reducibility. Our work follows up on "Reverse mathematics and Weihrauch analysis motivated by finite complexity theory" (Computability, 2021) by BeMent, Hirst and Wallace, and we answer several of their open questions.
We present some basic elements of the theory of generalised Br\`{e}gman relative entropies over nonreflexive Banach spaces. Using nonlinear embeddings of Banach spaces together with the Euler--Legendre functions, this approach unifies two earlier approaches to Br\`{e}gman relative entropy: one based on reflexive Banach spaces, the other based on differential geometry. This construction allows us to extend Br\`{e}gman relative entropies, and related geometric and operator structures, to arbitrary-dimensional state spaces of probability, quantum, and postquantum theory. We give several examples not previously considered in the literature.
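For orientation, the object being generalised is the classical Br\`{e}gman divergence; the display below is only the standard reflexive/differentiable-case definition (with our notation), not the paper's nonreflexive construction.

```latex
% Standard Br\`{e}gman divergence generated by a convex, G\^{a}teaux-differentiable f;
% the paper extends this beyond the reflexive setting via nonlinear embeddings.
D_f(x, y) \;=\; f(x) - f(y) - \langle \mathrm{D}f(y),\, x - y \rangle .
```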
By abstracting over well-known properties of De Bruijn's representation with nameless dummies, we design a new theory of syntax with variable binding and capture-avoiding substitution. We propose it as a simpler alternative to Fiore, Plotkin, and Turi's approach, with which we establish a strong formal link. We also show that our theory easily incorporates simple types and equations between terms.
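As a concrete reminder of the nameless representation the abstract builds on, here is a minimal sketch of De Bruijn-indexed lambda terms with the usual shift and capture-avoiding substitution operations; the datatype and function names are illustrative choices, not the paper's formalisation.

```python
# Minimal De Bruijn-index lambda terms with shift and capture-avoiding
# substitution. Datatype and function names are illustrative, not the paper's.
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:
    k: int            # index of the k-th enclosing binder (0-based)

@dataclass
class Lam:
    body: "Term"

@dataclass
class App:
    fun: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def shift(t: Term, d: int, cutoff: int = 0) -> Term:
    """Add d to every free index (those >= cutoff)."""
    if isinstance(t, Var):
        return Var(t.k + d) if t.k >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))

def subst(t: Term, j: int, s: Term) -> Term:
    """Substitute s for index j in t, adjusting indices under binders."""
    if isinstance(t, Var):
        return s if t.k == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fun, j, s), subst(t.arg, j, s))

def beta(redex: App) -> Term:
    """One beta-reduction step: (lambda. body) arg --> body[0 := arg]."""
    assert isinstance(redex.fun, Lam)
    return shift(subst(redex.fun.body, 0, shift(redex.arg, 1)), -1)
```

Because variables are plain indices, substitution never needs renaming; this is the well-known property the abstract abstracts over.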
Fully decentralized learning is gaining momentum for training AI models at the Internet's edge, addressing infrastructure challenges and privacy concerns. In a decentralized machine learning system, data is distributed across multiple nodes, with each node training a local model based on its respective dataset. The local models are then shared and combined to form a global model capable of making accurate predictions on new data. Our exploration focuses on how different types of network structures influence the spreading of knowledge - the process by which nodes incorporate insights gained from learning patterns in data available on other nodes across the network. Specifically, this study investigates the intricate interplay between network structure and learning performance using three network topologies and six data distribution methods. These methods consider different vertex properties, including degree centrality, betweenness centrality, and clustering coefficient, along with whether nodes exhibit high or low values of these metrics. Our findings show that global centrality metrics (degree, betweenness) correlate with learning performance, while the local clustering coefficient proves less predictive. We highlight the challenges in transferring knowledge from peripheral to central nodes, attributed to a dilution effect during model aggregation. Additionally, we observe that central nodes exert a pull effect, facilitating the spread of knowledge. In examining degree distribution, hubs in Barabasi-Albert networks positively impact learning for central nodes but exacerbate dilution when knowledge originates from peripheral nodes. Finally, we demonstrate the difficulty of circulating knowledge outside of segregated communities.
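To make the setting concrete, the sketch below shows a schematic gossip-style scheme in which each node performs a local update and then averages parameters with its neighbours in the communication graph; the topology, update rule, and all names are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of decentralized learning over a network topology (illustrative only;
# the paper's actual training and aggregation protocol may differ).
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(n=30, m=2, seed=0)   # hub-dominated topology
dim = 10                                          # toy "model" = parameter vector
params = {v: rng.normal(size=dim) for v in G.nodes}

def local_update(theta):
    """Placeholder for local training on a node's private data."""
    return theta - 0.1 * rng.normal(size=theta.shape)  # fake gradient step

for _ in range(50):
    params = {v: local_update(params[v]) for v in G.nodes}
    # Gossip-style aggregation: average each node's model with its neighbours'.
    params = {
        v: np.mean([params[u] for u in list(G.neighbors(v)) + [v]], axis=0)
        for v in G.nodes
    }
```

In schemes of this kind, high-degree or high-betweenness nodes mix information from many neighbours per round, which is one intuition behind the dilution and pull effects discussed above.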
Bayesian statistical graphical models are typically either continuous and parametric (Gaussian, parameterized by the graph-dependent precision matrix with Wishart-type priors) or discrete and non-parametric (with graph-dependent structure of probabilities of cells and Dirichlet-type priors). We propose to break this dichotomy by introducing two discrete parametric graphical models on finite decomposable graphs: the graph negative multinomial and the graph multinomial distributions. These models interpolate between the product of univariate negative binomial laws and the negative multinomial distribution, and between the product of binomial laws and the multinomial distribution, respectively. We derive their Markov decomposition and present representations in terms of related probabilistic models. We also introduce graphical versions of the Dirichlet distribution and the inverted Dirichlet distribution, which serve as conjugate priors for the two discrete graphical Markov models. We derive explicit normalizing constants for both graphical Dirichlet laws and demonstrate that their independence structure (a graphical version of neutrality) yields a strong hyper Markov property for both Bayesian models. We also provide characterization theorems for the graphical Dirichlet laws via the strong hyper Markov property. Finally, we develop a model selection procedure for the Bayesian graphical negative multinomial model with the corresponding Dirichlet-type priors.
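For reference, one endpoint of the first interpolation is the classical negative multinomial law; its standard textbook probability mass function is recalled below (this is not the paper's graphical generalisation).

```latex
% Standard negative multinomial pmf (one endpoint of the interpolation);
% the graphical version in the paper modifies this via the clique structure
% of the decomposable graph.
\Pr(X_1=x_1,\dots,X_d=x_d)
  = \frac{\Gamma\!\bigl(r+\sum_{i=1}^{d}x_i\bigr)}{\Gamma(r)\,\prod_{i=1}^{d}x_i!}\;
    p_0^{\,r}\prod_{i=1}^{d} p_i^{x_i},
\qquad p_0 = 1-\sum_{i=1}^{d}p_i .
```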
This paper introduces a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral, focusing on the adaptation of Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals significant insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges faced by current LLMs in generating diverse and high-quality text.
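In the image-generation literature, Precision and Recall are commonly estimated with k-nearest-neighbour manifolds over embedded samples; the sketch below shows that standard estimator, assuming the text samples have already been embedded, with names of our choosing. The paper's exact adaptation to text may differ.

```python
# k-NN manifold estimate of Precision/Recall between two embedded sample sets.
# One standard estimator from the image-generation literature; shown here only
# as an illustration of the kind of metric being adapted to text.
import numpy as np

def knn_radii(X, k=3):
    """Distance from each point to its k-th nearest neighbour within X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distances
    return np.sort(d, axis=1)[:, k - 1]

def coverage(A, B, radii_B):
    """Fraction of points in A that fall inside some k-NN ball around B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(np.mean((d <= radii_B[None, :]).any(axis=1)))

def precision_recall(real_emb, gen_emb, k=3):
    r_real, r_gen = knn_radii(real_emb, k), knn_radii(gen_emb, k)
    precision = coverage(gen_emb, real_emb, r_real)  # quality of generations
    recall = coverage(real_emb, gen_emb, r_gen)      # diversity / coverage
    return precision, recall
```

The precision term measures how much generated text lies on the "real" manifold (quality), while the recall term measures how much of the real distribution the model covers (diversity), which is exactly the trade-off the abstract reports.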
Evaluating the Expected Information Gain (EIG) is a critical task in many areas of computational science and statistics, necessitating the approximation of nested integrals. Available techniques for this problem based on Quasi-Monte Carlo (QMC) methods have primarily focused on enhancing the efficiency of the inner integral approximation. In this work, we introduce a novel approach that extends the scope of these efforts to address inner and outer expectations simultaneously. Leveraging the principles of Owen's scrambling, we develop a randomized quasi-Monte Carlo (RQMC) method that improves the approximation of nested integrals. We also indicate how to combine this methodology with Importance Sampling to address the measure concentration arising in the inner integral. Our RQMC method capitalizes on the unique structure of nested expectations to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we handle integrands exhibiting infinite variation in the Hardy-Krause (HK) sense, paving the way for theoretically sound error estimates. We derive asymptotic error bounds for the bias and variance of our estimator. In addition, we provide nearly optimal sample sizes for the inner and outer RQMC approximations, which are helpful for practical numerical implementation. We verify the quality of our estimator through numerical experiments in the context of Bayesian optimal experimental design. Specifically, we compare the computational efficiency of our RQMC method against standard nested Monte Carlo integration across two case studies: one in thermo-mechanics and the other in pharmacokinetics. These examples highlight the computational savings and broader applicability of our approach to estimating the Expected Information Gain.
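To fix ideas, a nested EIG estimator with scrambled Sobol' points can be sketched as below for a toy linear-Gaussian model; the model, sample sizes, and variable names are illustrative assumptions, and the paper's designs and analysis are considerably more elaborate.

```python
# Nested RQMC estimator of the Expected Information Gain for a toy model
#   y = theta + eps,  theta ~ N(0, 1),  eps ~ N(0, sigma^2).
# Illustrative sketch only; not the paper's case studies or tuning.
import numpy as np
from scipy.stats import norm, qmc

sigma = 0.5
log_lik = lambda y, th: norm.logpdf(y, loc=th, scale=sigma)

N, M = 2**10, 2**8                                    # outer / inner sample sizes
outer = qmc.Sobol(d=2, scramble=True).random(N)       # dims: (theta, eps)
inner = qmc.Sobol(d=1, scramble=True).random(M)[:, 0]

theta = norm.ppf(outer[:, 0])                         # outer prior draws
y = theta + sigma * norm.ppf(outer[:, 1])             # simulated observations
theta_in = norm.ppf(inner)                            # inner prior draws

# EIG ~= mean_n [ log p(y_n | theta_n) - log( (1/M) sum_m p(y_n | theta_m) ) ]
inner_ll = log_lik(y[:, None], theta_in[None, :])     # shape (N, M)
log_evidence = np.log(np.mean(np.exp(inner_ll), axis=1))
eig = np.mean(log_lik(y, theta) - log_evidence)
print(f"RQMC EIG estimate: {eig:.3f}")
```

The outer average runs over prior/data draws and the inner average approximates the evidence p(y); the paper's contribution concerns randomizing and balancing both levels, whereas this sketch only shows the estimator's structure.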
Recently, graph neural networks have been gaining considerable attention for simulating dynamical systems, owing to their inductive nature, which leads to zero-shot generalizability. Similarly, physics-informed inductive biases in deep-learning frameworks have been shown to give superior performance in learning the dynamics of physical systems. There is a growing volume of literature that attempts to combine these two approaches. Here, we evaluate the performance of thirteen different graph neural networks, namely, Hamiltonian and Lagrangian graph neural networks, graph neural ODE, and their variants with explicit constraints and different architectures. We briefly explain the theoretical formulation, highlighting the similarities and differences in the inductive biases and graph architectures of these models. We evaluate these models on spring, pendulum, gravitational, and 3D deformable solid systems to compare their performance in terms of rollout error, conserved quantities such as energy and momentum, and generalizability to unseen system sizes. Our study demonstrates that GNNs with additional inductive biases, such as explicit constraints and the decoupling of kinetic and potential energies, exhibit significantly enhanced performance. Further, all the physics-informed GNNs exhibit zero-shot generalizability to system sizes an order of magnitude larger than the training system, thus providing a promising route to simulate large-scale realistic systems.
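As a reminder of the Hamiltonian inductive bias referred to above, one learns a scalar H(q, p) and reads the dynamics off its gradients; a minimal, non-graph PyTorch sketch follows, with all layer sizes and names chosen for illustration (the graph variants in the literature assemble H from per-node and per-edge terms).

```python
# Minimal Hamiltonian inductive bias (non-graph, illustrative): learn a scalar
# H(q, p) and obtain dynamics  dq/dt = dH/dp,  dp/dt = -dH/dq.
import torch
import torch.nn as nn

class HNet(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, q, p):
        # Sum over the batch so autograd yields per-sample gradients.
        return self.h(torch.cat([q, p], dim=-1)).sum()

def hamiltonian_vector_field(model, q, p):
    q = q.requires_grad_(True)
    p = p.requires_grad_(True)
    H = model(q, p)
    dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
    return dHdp, -dHdq                      # (dq/dt, dp/dt)

# One explicit-Euler rollout step (symplectic integrators are preferable in practice).
model, dt = HNet(dim=3), 1e-2
q, p = torch.randn(8, 3), torch.randn(8, 3)
dq, dp = hamiltonian_vector_field(model, q, p)
q_next, p_next = q + dt * dq, p + dt * dp
```

Because the vector field is derived from a single learned scalar, energy-like quantities are conserved much better than with an unconstrained learned dynamics, which is the effect the rollout and conservation comparisons probe.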
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
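The flavour of such a joint estimator can be conveyed by a generic group-penalized least-squares form over the $K$ linear SEMs; the display below is our schematic rendering of an $l_1/l_2$ (group-lasso across tasks) objective, not the paper's exact likelihood or order constraints.

```latex
% Schematic l1/l2-penalized joint objective over K linear SEMs with coefficient
% matrices B^(k) and data matrices X^(k); the paper's exact formulation,
% including the shared causal-order constraint, may differ.
\min_{\substack{B^{(1)},\dots,B^{(K)}\ \text{acyclic,}\\ \text{common causal order}}}
  \;\sum_{k=1}^{K} \frac{1}{2 n_k}\,\bigl\| X^{(k)} - X^{(k)} B^{(k)} \bigr\|_F^2
  \;+\; \lambda \sum_{i \neq j} \Bigl(\sum_{k=1}^{K} \bigl(B^{(k)}_{ij}\bigr)^{2}\Bigr)^{1/2}
```

The mixed $l_1/l_2$ penalty encourages edge $(i,j)$ to be zero in all tasks simultaneously, which is what drives recovery of the sparse union of supports.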
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
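As a toy illustration of category (1), magnitude pruning and uniform int8 quantization of a single weight matrix can be sketched as follows; real frameworks apply these per layer, often with fine-tuning, and the thresholds here are arbitrary.

```python
# Toy illustration of category (1): magnitude pruning + uniform int8 quantization
# of one weight matrix. Real deployments apply these per layer with fine-tuning.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)

# Magnitude pruning: zero out the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Symmetric uniform quantization to int8 with a single per-tensor scale.
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.clip(np.round(W_pruned / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale        # used at inference time

sparsity = float((W_pruned == 0).mean())
err = float(np.abs(W_dequant - W_pruned).max())
print(f"sparsity={sparsity:.2f}, max dequantization error={err:.4f}")
```

The memory saving comes from storing sparse int8 weights instead of dense float32 ones; the survey's other categories trade accuracy for cost through architectural rather than numerical changes.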
Recently, ensemble methods have been applied to deep metric learning to yield state-of-the-art results. Deep metric learning aims to learn deep neural networks for feature embeddings whose distances satisfy a given constraint. In deep metric learning, an ensemble takes the average of the distances learned by multiple learners. As one important aspect of an ensemble, the learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning, and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.
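A divergence regularizer of the kind described above can be sketched as a term that pushes the learners' embeddings of the same image apart; the hinge form, margin, and names below are illustrative assumptions, not necessarily the paper's exact loss.

```python
# Sketch of a divergence regularizer for an ensemble of M embedding learners:
# embeddings of the same image produced by different learners are pushed apart.
# The hinge form and margin are illustrative; the paper's loss may differ.
import torch
import torch.nn.functional as F

def divergence_loss(embeddings, margin=0.5):
    """embeddings: list of M tensors, each (batch, dim), one per learner."""
    z = torch.stack([F.normalize(e, dim=-1) for e in embeddings])   # (M, B, D)
    M = z.shape[0]
    loss, pairs = z.new_zeros(()), 0
    for i in range(M):
        for j in range(i + 1, M):
            dist = (z[i] - z[j]).pow(2).sum(dim=-1)       # squared L2 per sample
            loss = loss + F.relu(margin - dist).mean()    # penalize too-similar pairs
            pairs += 1
    return loss / max(pairs, 1)

# Schematic total objective: metric_loss + lambda_div * divergence_loss(embeddings)
```

Without such a term, independently trained learners tend to collapse onto similar embeddings, and averaging their distances then adds little; the regularizer keeps the ensemble members complementary.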