We present algorithms that compute the terminal configurations for sandpile instances in $O(n \log n)$ time on trees and $O(n)$ time on paths, where $n$ is the number of vertices. The Abelian Sandpile model is a well-known model used in exploring self-organized criticality. Despite a large amount of work on other aspects of sandpiles, there have been limited results in efficiently computing the terminal state, known as the sandpile prediction problem. Our algorithm improves the previous best runtime of $O(n \log^5 n)$ on trees [Ramachandran-Schild SODA '17] and $O(n \log n)$ on paths [Moore-Nilsson '99]. To do so, we move beyond the simulation of individual events by directly computing the number of firings for each vertex. The computation is accelerated using splittable binary search trees. We also generalize our algorithm to adapt at most three sink vertices, which is the first prediction algorithm faster than mere simulation on a sandpile model with sinks. We provide a general reduction that transforms the prediction problem on an arbitrary graph into problems on its subgraphs separated by any vertex set $P$. The reduction gives a time complexity of $O(\log^{|P|} n \cdot T)$ where $T$ denotes the total time for solving on each subgraph. In addition, we give algorithms in $O(n)$ time on cliques and $O(n \log^2 n)$ time on pseudotrees.
We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models estimated within $\nabla$-RANSAC to guide the network learning accurate and useful inlier probabilities or to train feature detection and matching networks. Our method directly maximizes the probability of drawing a good hypothesis, allowing us to learn better sampling distributions. We test $\nabla$-RANSAC on various real-world scenarios on fundamental and essential matrix estimation, and 3D point cloud registration, outdoors and indoors, with handcrafted and learning-based features. It is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives. The code and trained models are available at //github.com/weitong8591/differentiable_ransac.
Event-based sensors, distinguished by their high temporal resolution of 1$\mathrm{\mu s}$ and a dynamic range of 120$\mathrm{dB}$, stand out as ideal tools for deployment in fast-paced settings like vehicles and drones. Traditional object detection techniques that utilize Artificial Neural Networks (ANNs) face challenges due to the sparse and asynchronous nature of the events these sensors capture. In contrast, Spiking Neural Networks (SNNs) offer a promising alternative, providing a temporal representation that is inherently aligned with event-based data. This paper explores the unique membrane potential dynamics of SNNs and their ability to modulate sparse events. We introduce an innovative spike-triggered adaptive threshold mechanism designed for stable training. Building on these insights, we present a specialized spiking feature pyramid network (SpikeFPN) optimized for automotive event-based object detection. Comprehensive evaluations demonstrate that SpikeFPN surpasses both traditional SNNs and advanced ANNs enhanced with attention mechanisms. Evidently, SpikeFPN achieves a mean Average Precision (mAP) of 0.477 on the {GEN1 Automotive Detection (GAD)} benchmark dataset, marking a significant increase of 9.7\% over the previous best SNN. Moreover, the efficient design of SpikeFPN ensures robust performance while optimizing computational resources, attributed to its innate sparse computation capabilities.
We introduce two new classes of measures of information for statistical experiments which generalise and subsume $\phi$-divergences, integral probability metrics, $\mathfrak{N}$-distances (MMD), and $(f,\Gamma)$ divergences between two or more distributions. This enables us to derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem, thus extending the variational $\phi$-divergence representation to multiple distributions in an entirely symmetric manner. The new families of divergence are closed under the action of Markov operators which yields an information processing equality which is a refinement and generalisation of the classical data processing inequality. This equality gives insight into the significance of the choice of the hypothesis class in classical risk minimization.
The combined universal probability $\mathbf{m}(D)$ of strings $x$ in sets $D$ is close to max $\m(x)$ over $x$ in $D$: their logs differ by at most $D$'s information $\mathbf{I}(D:\mathcal{H})$ about the halting sequence $\mathcal{H}$.
An adjacency sketching or implicit labeling scheme for a family $\cal F$ of graphs is a method that defines for any $n$ vertex $G \in \cal F$ an assignment of labels to each vertex in $G$, so that the labels of two vertices tell you whether or not they are adjacent. The goal is to come up with labeling schemes that use as few bits as possible to represent the labels. By using randomness when assigning labels, it is sometimes possible to produce adjacency sketches with much smaller label sizes, but this comes at the cost of introducing some probability of error. Both deterministic and randomized labeling schemes have been extensively studied, as they have applications for distributed data structures and deeper connections to universal graphs and communication complexity. The main question of interest is which graph families have schemes using short labels, usually $O(\log n)$ in the deterministic case or constant for randomized sketches. In this work we consider the resilience of probabilistic adjacency sketches against an adversary making adaptive queries to the labels. This differs from the previously analyzed probabilistic setting which is ``one shot". We show that in the adaptive adversarial case the size of the labels is tightly related to the maximal degree of the graphs in $\cal F$. This results in a stronger characterization compared to what is known in the non-adversarial setting. In more detail, we construct sketches that fail with probability $\varepsilon$ for graphs with maximal degree $d$ using $2d\log (1/\varepsilon)$ bit labels and show that this is roughly the best that can be done for any specific graph of maximal degree $d$, e.g.\ a $d$-ary tree.
Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.
Given a continuous definable function $f: S \to \mathbb{R}$ on a definable set $S$, we study sublevel sets of the form $S^f_t = \{x \in S: f(x) \leq t\}$ for all $t \in \mathbb{R}$. Using o-minimal structures, we prove that the Euler characteristic of $S^f_t$ is right continuous with respect to $t$. Furthermore, when $S$ is compact, we show that $S^f_{t+\delta}$ deformation retracts to $S^f_t$ for all sufficiently small $\delta > 0$. Applying these results, we also characterize the relationship between the concepts of Euler characteristic transform and smooth Euler characteristic transform in topological data analysis.
We propose and analyze exact and inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information is much more involved. In this paper, we examine how second-order information can be used to speed up extra-gradient methods, even under inexactness. Specifically, we show that the proposed algorithms generate iterates that remain within a bounded set and the averaged iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function. Our algorithms match the theoretically established lower bound in this context and our analysis provides a simple and intuitive convergence analysis for second-order methods without any boundedness requirements. Finally, we present a series of numerical experiments on synthetic and real data that demonstrate the efficiency of the proposed algorithms.
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality (`late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses `fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released.
Traditional methods for link prediction can be categorized into three main types: graph structure feature-based, latent feature-based, and explicit feature-based. Graph structure feature methods leverage some handcrafted node proximity scores, e.g., common neighbors, to estimate the likelihood of links. Latent feature methods rely on factorizing networks' matrix representations to learn an embedding for each node. Explicit feature methods train a machine learning model on two nodes' explicit attributes. Each of the three types of methods has its unique merits. In this paper, we propose SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), a new framework for link prediction which combines the power of all the three types into a single graph neural network (GNN). GNN is a new type of neural network which directly accepts graphs as input and outputs their labels. In SEAL, the input to the GNN is a local subgraph around each target link. We prove theoretically that our local subgraphs also reserve a great deal of high-order graph structure features related to link existence. Another key feature is that our GNN can naturally incorporate latent features and explicit features. It is achieved by concatenating node embeddings (latent features) and node attributes (explicit features) in the node information matrix for each subgraph, thus combining the three types of features to enhance GNN learning. Through extensive experiments, SEAL shows unprecedentedly strong performance against a wide range of baseline methods, including various link prediction heuristics and network embedding methods.