Permutation synchronization is an important problem in computer science that constitutes the key step of many computer vision tasks. The goal is to recover $n$ latent permutations from their noisy and incomplete pairwise measurements. In recent years, spectral methods have gained increasing popularity thanks to their simplicity and computational efficiency. Spectral methods utilize the leading eigenspace $U$ of the data matrix and its block submatrices $U_1,U_2,\ldots, U_n$ to recover the permutations. In this paper, we propose a novel and statistically optimal spectral algorithm. Unlike the existing methods which use $\{U_jU_1^\top\}_{j\geq 2}$, ours constructs an anchor matrix $M$ by aggregating useful information from all the block submatrices and estimates the latent permutations through $\{U_jM^\top\}_{j\geq 1}$. This modification overcomes a crucial limitation of the existing methods caused by the repetitive use of $U_1$ and leads to an improved numerical performance. To establish the optimality of the proposed method, we carry out a fine-grained spectral analysis and obtain a sharp exponential error bound that matches the minimax rate.
Spectral hypergraph sparsification, an attempt to extend well-known spectral graph sparsification to hypergraphs, has been extensively studied over the past few years. For undirected hypergraphs, Kapralov, Krauthgamer, Tardos, and Yoshida~(2022) have proved an $\varepsilon$-spectral sparsifier of the optimal $O^*(n)$ size, where $n$ is the number of vertices and $O^*$ suppresses the $\varepsilon^{-1}$ and $\log n$ factors. For directed hypergraphs, however, the optimal sparsifier size has not been known. Our main contribution is the first algorithm that constructs an $O^*(n^2)$-size $\varepsilon$-spectral sparsifier for a weighted directed hypergraph. Our result is optimal up to the $\varepsilon^{-1}$ and $\log n$ factors since there is a lower bound of $\Omega(n^2)$ even for directed graphs. We also show the first non-trivial lower bound of $\Omega(n^2/\varepsilon)$ for general directed hypergraphs. The basic idea of our algorithm is borrowed from the spanner-based sparsification for ordinary graphs by Koutis and Xu~(2016). Their iterative sampling approach is indeed useful for designing sparsification algorithms in various circumstances. To demonstrate this, we also present a similar iterative sampling algorithm for undirected hypergraphs that attains one of the best size bounds, enjoys parallel implementation, and can be transformed to be fault-tolerant.
We consider a generalized collision channel model for general multi-user communication systems, an extension of Massey and Mathys' collision channel without feedback for multiple access communications. In our model, there are multiple transmitters and receivers sharing the same communication channel. The transmitters are not synchronized and arbitrary time offsets between transmitters and receivers are assumed. A ``collision" occurs if two or more packets from different transmitters partially or completely overlap at a receiver. Our model includes the original collision channel as a special case. This paper focuses on reliable throughputs that are approachable for arbitrary time offsets. We consider both slot-synchronized and non-synchronized cases and characterize their reliable throughput regions for the generalized collision channel model. These two regions are proven to coincide. Moreover, it is shown that the protocol sequences constructed for multiple access communication remain ``throughput optimal" in the generalized collision channel model. We also identify the protocol sequences that can approach the outer boundary of the reliable throughput region.
We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $\phi$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $\alpha$-divergences. Our construction proceeds in two stages. First, for any feature map and any $\alpha$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $\phi$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $\alpha$-divergences $\alpha \in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.
Bayesian statistics is concerned with conducting posterior inference for the unknown quantities in a given statistical model. Conventional Bayesian inference requires the specification of a probabilistic model for the observed data, and the construction of the resulting likelihood function. However, sometimes the model is so complicated that evaluation of the likelihood is infeasible, which renders exact Bayesian inference impossible. Bayesian synthetic likelihood (BSL) is a posterior approximation procedure that can be used to conduct inference in situations where the likelihood is intractable, but where simulation from the model is straightforward. In this entry, we give a high-level presentation of BSL, and its extensions aimed at delivering scalable and robust posterior inferences.
View synchronisation is an important component of many modern Byzantine Fault Tolerant State Machine Replication (SMR) systems in the partial synchrony model. Roughly, the efficiency of view synchronisation is measured as the word complexity and latency required for moving from being synchronised in a view of one correct leader to being synchronised in the view of the next correct leader. The efficiency of view synchronisation has emerged as a major bottleneck in the efficiency of SMR systems as a whole. A key question remained open: Do there exist view synchronisation protocols with asymptotically optimal quadratic worst-case word complexity that also obtain linear message complexity and responsiveness when moving between consecutive correct leaders? We answer this question affirmatively with a new view synchronisation protocol for partial synchrony assuming minimal clock synchronisation, called \emph{Fever}. If $n$ is the number of processors and $t$ is the largest integer $<n/3$, then Fever has resilience $t$, and in all executions with at most $0\leq f\leq t$ Byzantine parties and network delays of at most $\delta \leq \Delta$ after $GST$ (where $f$ and $\delta$ are unknown), Fever has worst-case word complexity $O(fn+n)$ and worst-case latency $O(\Delta f + \delta)$.
Incrementality experiments compare customers exposed to a marketing action designed to increase sales to those randomly assigned to a control group. These experiments suffer from noisy responses which make precise estimation of the average treatment effect (ATE) and marketing ROI difficult. We develop a model that improves the precision by estimating separate treatment effects for three latent strata defined by potential outcomes in the experiment -- customers who would buy regardless of ad exposure, those who would buy only if exposed to ads and those who would not buy regardless. The overall ATE is estimated by averaging the strata-level effects, and this produces a more precise estimator of the ATE over a wide range of conditions typical of marketing experiments. Analytical results and simulations show that the method decreases the sampling variance of the ATE most when (1) there are large differences in the treatment effect between latent strata and (2) the model used to estimate the strata-level effects is well-identified. Applying the procedure to 5 catalog experiments shows a reduction of 30-60% in the variance of the overall ATE. This leads to a substantial decrease in decision errors when the estimator is used to determine whether ads should be continued or discontinued.
In this paper we address the rotation synchronization problem, where the objective is to recover absolute rotations starting from pairwise ones, where the unknowns and the measures are represented as nodes and edges of a graph, respectively. This problem is an essential task for structure from motion and simultaneous localization and mapping. We focus on the formulation of synchronization via neural networks, which has only recently begun to be explored in the literature. Inspired by deep matrix completion, we express rotation synchronization in terms of matrix factorization with a deep neural network. Our formulation exhibits implicit regularization properties and, more importantly, is unsupervised, whereas previous deep approaches are supervised. Our experiments show that we achieve comparable accuracy to the closest competitors in most scenes, while working under weaker assumptions.
Link prediction is a very fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation as the generalized product of the edge representations in the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by the generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. The NBFNet parameterizes the generalized Bellman-Ford algorithm with 3 neural components, namely INDICATOR, MESSAGE and AGGREGATE functions, which corresponds to the boundary condition, multiplication operator, and summation operator respectively. The NBFNet is very general, covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that the proposed NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities leaned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at //github.com/kstant0725/SpectralNet .