Many networks in political and social research are bipartite, with edges connecting exclusively across two distinct types of nodes. A common example includes cosponsorship networks, in which legislators are connected indirectly through the bills they support. Yet most existing network models are designed for unipartite networks, where edges can arise between any pair of nodes. We show that using a unipartite network model to analyze bipartite networks, as often done in practice, can result in aggregation bias. To address this methodological problem, we develop a statistical model of bipartite networks by extending the popular mixed-membership stochastic blockmodel. Our model allows researchers to identify the groups of nodes, within each node type, that share common patterns of edge formation. The model also incorporates both node and dyad-level covariates as the predictors of the edge formation patterns. We develop an efficient computational algorithm for fitting the model, and apply it to cosponsorship data from the United States Senate. We show that senators tapped into communities defined by party lines and seniority when forming cosponsorships on bills, while the pattern of cosponsorships depends on the timing and substance of legislations. We also find evidence for norms of reciprocity, and uncover the substantial role played by policy expertise in the formation of cosponsorships between senators and legislation. An open-source software package is available for implementing the proposed methodology.
The concepts of Bayesian prediction, model comparison, and model selection have developed significantly over the last decade. As a result, the Bayesian community has witnessed a rapid growth in theoretical and applied contributions to building and selecting predictive models. Projection predictive inference in particular has shown promise to this end, finding application across a broad range of fields. It is less prone to over-fitting than na\"ive selection based purely on cross-validation or information criteria performance metrics, and has been known to out-perform other methods in terms of predictive performance. We survey the core concept and contemporary contributions to projection predictive inference, and present a safe, efficient, and modular workflow for prediction-oriented model selection therein. We also provide an interpretation of the projected posteriors achieved by projection predictive inference in terms of their limitations in causal settings.
Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.
Most link prediction methods return estimates of the connection probability of missing edges in a graph. Such output can be used to rank the missing edges, from most to least likely to be a true edge, but it does not directly provide a classification into true and non-existent. In this work, we consider the problem of identifying a set of true edges with a control of the false discovery rate (FDR). We propose a novel method based on high-level ideas from the literature on conformal inference. The graph structure induces intricate dependence in the data, which we carefully take into account, as this makes the setup different from the usual setup in conformal inference, where exchangeability is assumed. The FDR control is empirically demonstrated for both simulated and real data.
The capacity to generate meaningful symbols and effectively employ them for advanced cognitive processes, such as communication, reasoning, and planning, constitutes a fundamental and distinctive aspect of human intelligence. Existing deep neural networks still notably lag human capabilities in terms of generating symbols for higher cognitive functions. Here, we propose a solution (symbol emergence artificial network (SEA-net)) to endow neural networks with the ability to create symbols, understand semantics, and achieve communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that allows the system to acquire new functions purely by symbolic manipulation or communication. In addition, these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We believe that the proposed framework will be instrumental in producing more capable systems that can synergize the strengths of connectionist and symbolic approaches for artificial intelligence (AI).
Statistical analysis of social networks is a predominant methodology in political science research. In this article we implement network methods to characterize the presidential inauguration speech and identify political communities in Colombia. We propose an empirical approach to analize the discursive structure of the heads of state and the configuration of alliance and work relationships between prominent figures of Colombian politics. Thus, we implement network methods from two perspectives, words and political actors. We conclude on the relevance of social network statistics to identify frequent and important terms in the communicative action of president figures, and to examine cohesion such discourses. Finally, we distinguish notable actors in the consolidation of working relationships and alliances as well as political communities.
We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
The estimation of unknown parameters in simulations, also known as calibration, is crucial for practical management of epidemics and prediction of pandemic risk. A simple yet widely used approach is to estimate the parameters by minimizing the sum of the squared distances between actual observations and simulation outputs. It is shown in this paper that this method is inefficient, particularly when the epidemic models are developed based on certain simplifications of reality, also known as imperfect models which are commonly used in practice. To address this issue, a new estimator is introduced that is asymptotically consistent, has a smaller estimation variance than the least squares estimator, and achieves the semiparametric efficiency. Numerical studies are performed to examine the finite sample performance. The proposed method is applied to the analysis of the COVID-19 pandemic for 20 countries based on the SEIR (Susceptible-Exposed-Infectious-Recovered) model with both deterministic and stochastic simulations. The estimation of the parameters, including the basic reproduction number and the average incubation period, reveal the risk of disease outbreaks in each country and provide insights to the design of public health interventions.
Graph neural networks generalize conventional neural networks to graph-structured data and have received widespread attention due to their impressive representation ability. In spite of the remarkable achievements, the performance of Euclidean models in graph-related learning is still bounded and limited by the representation ability of Euclidean geometry, especially for datasets with highly non-Euclidean latent anatomy. Recently, hyperbolic space has gained increasing popularity in processing graph data with tree-like structure and power-law distribution, owing to its exponential growth property. In this survey, we comprehensively revisit the technical details of the current hyperbolic graph neural networks, unifying them into a general framework and summarizing the variants of each component. More importantly, we present various HGNN-related applications. Last, we also identify several challenges, which potentially serve as guidelines for further flourishing the achievements of graph learning in hyperbolic spaces.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.