A linear-time algorithm for generating auxiliary subgraphs for the 3-edge-connected components of a connected multigraph is presented. The algorithm uses a novel graph contraction operation and makes only one pass over the graph. By contrast, the best previously known algorithms make multiple passes over the graph: they first decompose it into its 2-edge-connected or 2-vertex-connected components, then into its 3-edge-connected or 3-vertex-connected components, and finally construct a cactus representation of the 2-cuts to generate the auxiliary subgraphs for the 3-edge-connected components.
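As background for the abstract above, the following is a minimal sketch of a generic edge-contraction step on a multigraph; it is not the specific contraction operation introduced in the paper, and the adjacency representation is an assumption made for illustration.

```python
# Generic multigraph edge contraction (illustrative only, not the paper's operation).
# adj[x] is a Counter mapping each neighbour of x to the number of parallel edges.
from collections import Counter

def contract_edge(adj, u, v):
    """Merge vertex v into vertex u; self-loops created by the contraction are dropped."""
    for w, m in list(adj[v].items()):
        if w == u or w == v:
            continue                 # u-v edges and loops at v would become self-loops
        adj[u][w] += m               # re-attach v's edges to u
        adj[w][u] += m
        adj[w].pop(v, None)          # w no longer points to v
    adj[u].pop(v, None)              # remove the contracted edge itself
    del adj[v]

# Example: triangle with a doubled edge between vertices 2 and 3.
adj = {1: Counter({2: 1, 3: 1}), 2: Counter({1: 1, 3: 2}), 3: Counter({1: 1, 2: 2})}
contract_edge(adj, 2, 3)
print(adj)   # vertex 2 absorbs vertex 3; edge 1-2 now has multiplicity 2
```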
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; because achieving such interpretable representations is challenging, many causal learning algorithms rely on prior information such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with the latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each mixture component with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested on a synthetic dataset and a scientific dataset, our algorithm demonstrates the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
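The specific differentiable DAG parametrization used by causalPIMA is not spelled out in the abstract; as a hedged illustration of the general idea, the sketch below shows one widely used alternative, the NOTEARS-style acyclicity penalty, where the matrix $W$, its size, and the optimizer setup are assumptions for the example.

```python
# Illustrative sketch of a differentiable DAG constraint (NOTEARS-style),
# NOT the parametrization proposed in causalPIMA.
import torch

d = 5                                        # assumed number of latent DAG nodes
W = torch.randn(d, d, requires_grad=True)    # learnable weighted adjacency matrix

def acyclicity_penalty(W):
    """h(W) = tr(exp(W*W)) - d equals zero exactly when W encodes a DAG."""
    return torch.trace(torch.matrix_exp(W * W)) - W.shape[0]

# In practice this term would be added to an ELBO-style loss and minimized end to end.
loss = acyclicity_penalty(W)                 # + reconstruction and KL terms in a full model
loss.backward()
print(W.grad.shape)                          # gradients flow through the DAG constraint
```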
Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and the complexity of the interactions. Generative models can produce samples much more rapidly, but currently require significant effort to optimize for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, and they cannot generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline geometry-unaware model by over $50\%$ on several metrics, such as the Wasserstein distance between the generated and true distributions of key quantities that summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. This proof-of-concept study motivates the design of a foundational model that will be a crucial tool for the study of future detectors, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.
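To make the evaluation metric mentioned above concrete, here is a small sketch of a 1-D Wasserstein distance between generated and reference distributions of a summary quantity; the arrays and the choice of "total deposited energy" are placeholders, not the paper's data or exact procedure.

```python
# Illustrative comparison of generated vs. reference distributions of a summary
# quantity (assumed here to be total deposited energy) via the 1-D Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
true_energy = rng.gamma(shape=3.0, scale=10.0, size=10_000)       # stand-in reference sample
generated_energy = rng.gamma(shape=3.1, scale=9.8, size=10_000)   # stand-in model sample

print(wasserstein_distance(true_energy, generated_energy))        # smaller = closer distributions
```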
We study versions of Hilbert's projective metric for spaces of integrable functions of bounded growth. These metrics originate from cones which are relaxations of the cone of all non-negative functions, in the sense that they include all functions having non-negative integral values when multiplied with certain test functions. We show that kernel integral operators are contractions with respect to suitable specifications of such metrics even for kernels which are not bounded away from zero, provided that the decay to zero of the kernel is controlled. As an application to entropic optimal transport, we show exponential convergence of Sinkhorn's algorithm in settings where the marginal distributions have sufficiently light tails compared to the growth of the cost function.
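For readers unfamiliar with the algorithm whose convergence is analyzed above, the following is a minimal numpy sketch of Sinkhorn's algorithm for entropic optimal transport on discrete marginals; the marginals, cost matrix, and regularization value are illustrative assumptions, and the sketch does not reflect the unbounded-cost setting studied in the paper.

```python
# Minimal Sinkhorn iteration for entropic optimal transport (illustrative sketch).
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iter=200):
    """Alternately rescale K = exp(-C/eps) so the plan matches the marginals mu and nu."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)                   # match column marginal nu
        u = mu / (K @ v)                     # match row marginal mu
    return u[:, None] * K * v[None, :]       # entropic transport plan

mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(mu, nu, C)
print(P.sum(axis=1), P.sum(axis=0))          # approximately mu and nu after convergence
```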
Many networks can be characterised by the presence of communities, which are groups of units that are closely linked and can be relevant in understanding the system's overall function. Recently, hypergraphs have emerged as a fundamental tool for modelling systems where interactions are not limited to pairs but may involve an arbitrary number of nodes. In this study, using a dual approach to community detection, we extend the concept of link communities to hypergraphs, allowing us to extract informative clusters of highly related hyperedges. We analyze the dendrograms obtained by applying hierarchical clustering to distance matrices among hyperedges on a variety of real-world data, showing that hyperlink communities naturally highlight the hierarchical and multiscale structure of higher-order networks. Moreover, by using hyperlink communities, we are able to extract overlapping community memberships for nodes, overcoming the limitations of traditional hard clustering methods. Finally, we introduce higher-order network cartography as a practical tool for categorizing nodes into different structural roles based on their interaction patterns and community participation. This approach helps identify different types of individuals in a variety of real-world social systems. Our work contributes to a better understanding of the structural organization of real-world higher-order systems.
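As a hedged illustration of the pipeline described above (hierarchical clustering of a distance matrix among hyperedges), the sketch below uses a simple Jaccard distance between hyperedge node sets; the toy hypergraph, the Jaccard choice, and the clustering threshold are assumptions, not the paper's exact similarity measure.

```python
# Toy hyperlink-community extraction: hierarchical clustering of hyperedges
# by pairwise Jaccard distance between their node sets (illustrative only).
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

hyperedges = [{1, 2, 3}, {2, 3, 4}, {5, 6}, {5, 6, 7}]       # toy hypergraph

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

n = len(hyperedges)
D = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    D[i, j] = D[j, i] = jaccard_distance(hyperedges[i], hyperedges[j])

Z = linkage(squareform(D), method="average")                 # dendrogram over hyperedges
labels = fcluster(Z, t=0.6, criterion="distance")
print(labels)                                                # e.g. two hyperlink communities
```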
Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex constrained optimization that sequentially minimizes majorizing surrogates of the objective function in each block coordinate while the other coordinates are held fixed. BMM encompasses a large class of optimization algorithms, such as block coordinate descent and its proximal-point variant, expectation-maximization, and block projected gradient descent. We establish that for general constrained nonconvex optimization, BMM with strongly convex surrogates can produce an $\epsilon$-stationary point within $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$ iterations and asymptotically converges to the set of stationary points. Furthermore, we propose a trust-region variant of BMM that can handle surrogates that are only convex while retaining the same iteration complexity and asymptotic stationarity. These results hold robustly even when the convex subproblems are solved inexactly, as long as the optimality gaps are summable. As an application, we show that a regularized version of the celebrated multiplicative update algorithm of Lee and Seung for nonnegative matrix factorization has an iteration complexity of $O(\epsilon^{-2}(\log \epsilon^{-1})^{2})$. The same result holds for a wide class of regularized nonnegative tensor decomposition algorithms, as well as for the classical block projected gradient descent algorithm. These theoretical results are validated through various numerical experiments.
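For concreteness, the sketch below shows the classical (unregularized) Lee-Seung multiplicative updates for NMF under the Frobenius-norm objective, which is the algorithm whose regularized variant is analysed above; the data, rank, and iteration count are illustrative assumptions.

```python
# Classical Lee-Seung multiplicative updates for NMF (unregularized, illustrative).
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-10):
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # update H with W held fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)    # update W with H held fixed
    return W, H

X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = nmf_multiplicative(X, r=5)
print(np.linalg.norm(X - W @ H))                # reconstruction error decreases over iterations
```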
Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging in optimization or learning scenarios where the number of variables is high or where there are constraints. In this paper, we introduce a bilevel stochastic gradient method for bilevel problems with nonlinear and possibly nonconvex lower-level constraints. We also present a comprehensive convergence theory that addresses both the lower-level unconstrained and constrained cases and covers all inexact calculations of the adjoint gradient (also called hypergradient), including the inexact solution of the lower-level problem, the inexact computation of the adjoint formula (due to the inexact solution of the adjoint equation or the use of a truncated Neumann series), and noisy estimates of the gradients, Hessians, and Jacobians involved. To promote the use of bilevel optimization in large-scale learning, we have developed new practical low-rank bilevel stochastic gradient methods (BSG-N-FD and BSG-1) that do not require second-order derivatives and, in the lower-level unconstrained case, dispense with any matrix-vector products.
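To illustrate one of the inexact calculations mentioned above, here is a hedged sketch of approximating the adjoint gradient (hypergradient) with a truncated Neumann series on a toy quadratic lower-level problem; the matrices, step size, and truncation length are assumptions, and this is not an implementation of BSG-N-FD or BSG-1.

```python
# Toy hypergradient via a truncated Neumann series (illustrative sketch).
# Lower level: y(x) = argmin_y 0.5 y^T A y - y^T (B x); upper level: f(x, y) = 0.5 ||y - c||^2.
# For this problem the exact hypergradient is B^T A^{-1} (y - c).
import numpy as np

A = np.array([[2.0, 0.3], [0.3, 1.5]])        # lower-level Hessian (positive definite)
B = np.eye(2)
c = np.array([1.0, -1.0])
x = np.array([0.5, 0.2])

y = np.linalg.solve(A, B @ x)                  # exact lower-level solution
grad_y_f = y - c                               # nabla_y f(x, y)

# Adjoint vector lam ~= A^{-1} grad_y_f, via the Neumann series eta * sum_k (I - eta A)^k.
eta, K = 0.4, 50                               # valid since the eigenvalues of eta*A lie in (0, 1)
lam = np.zeros_like(grad_y_f)
term = eta * grad_y_f
for _ in range(K):
    lam += term
    term = term - eta * (A @ term)             # term <- (I - eta A) term

hypergrad = B.T @ lam                          # truncated-Neumann hypergradient estimate
print(hypergrad, B.T @ np.linalg.solve(A, grad_y_f))   # approximation vs. exact adjoint solve
```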
Scale-free dynamics, formalized by selfsimilarity, provides a versatile paradigm that is widely used to model temporal dynamics in real-world data. However, its practical use has so far remained mostly univariate. By contrast, modern applications often demand multivariate data analysis. Accordingly, models for multivariate selfsimilarity have recently been proposed. Nevertheless, they have rarely been used in practice because of the lack of robust estimation procedures for the vector of selfsimilarity parameters. Building upon recent mathematical developments, the present work puts forth an efficient estimation procedure based on the theoretical study of the multiscale eigenstructure of the wavelet spectrum of multivariate selfsimilar processes. The estimation performance is studied theoretically in the asymptotic limits of large scale and sample sizes, and computationally for finite-size samples. As a practical outcome, a fully operational and documented multivariate signal processing estimation toolbox is made freely available and is ready for use on real-world data. Its potential benefits are illustrated in epileptic seizure prediction from multi-channel EEG data.
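As background for the wavelet-spectrum approach above, here is a hedged, univariate sketch of the standard wavelet log-variance regression used to estimate a selfsimilarity (Hurst) parameter; the signal, wavelet, and number of scales are assumptions, and the multivariate eigenstructure-based estimator of the paper is considerably more involved.

```python
# Univariate wavelet-based Hurst estimation by log2-variance regression (illustrative).
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(2**14))          # toy random-walk signal (H close to 0.5)

coeffs = pywt.wavedec(x, "db3", level=8)           # [cA_8, cD_8, ..., cD_1]
details = coeffs[1:][::-1]                         # reorder so index 0 is the finest scale j = 1
scales = np.arange(1, len(details) + 1)
log_var = [np.log2(np.mean(d**2)) for d in details]

slope = np.polyfit(scales, log_var, 1)[0]          # detail variance ~ 2^{j(2H+1)} for fBm-like signals
H_est = (slope - 1) / 2
print(H_est)                                       # should be close to 0.5 for this toy signal
```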
Functionality is a graph complexity measure that extends a variety of parameters, such as vertex degree, degeneracy, clique-width, or twin-width. In the present paper, we show that functionality is bounded for box intersection graphs in $\mathbb{R}^1$, i.e. for interval graphs, and unbounded for box intersection graphs in $\mathbb{R}^3$. We also study a parameter known as symmetric difference, which is intermediate between twin-width and functionality, and show that this parameter is unbounded both for interval graphs and for unit box intersection graphs in $\mathbb{R}^2$.
We establish conditions under which latent causal graphs are nonparametrically identifiable and can be reconstructed from unknown interventions in the latent space. Our primary focus is the identification of the latent structure in measurement models without parametric assumptions such as linearity or Gaussianity. Moreover, we do not assume the number of hidden variables is known, and we show that at most one unknown intervention per hidden variable is needed. This extends a recent line of work on learning causal representations from observations and interventions. The proofs are constructive and introduce two new graphical concepts -- imaginary subsets and isolated edges -- that may be useful in their own right. As a matter of independent interest, the proofs also involve a novel characterization of the limits of edge orientations within the equivalence class of DAGs induced by unknown interventions. These are the first results to characterize the conditions under which causal representations are identifiable without making any parametric assumptions in a general setting with unknown interventions and without faithfulness.
Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I identify three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as a preliminary attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function that encourages similar images to be projected close to each other. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Experiments on the CIFAR-10 dataset show the satisfactory performance of SRH.
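To illustrate the general family of losses referred to above, the sketch below shows a generic pairwise similarity-preserving hashing head and contrastive-style loss; it is not the SRH method or its shadow mechanism, and the layer sizes, bit length, and labels are assumptions for the example.

```python
# Generic deep-hashing head with a pairwise similarity-preserving loss (illustrative, not SRH).
import torch
import torch.nn as nn

class HashHead(nn.Module):
    def __init__(self, backbone_dim=512, n_bits=48):
        super().__init__()
        self.fc = nn.Linear(backbone_dim, n_bits)

    def forward(self, features):
        return torch.tanh(self.fc(features))          # relaxed binary codes in (-1, 1)

def pairwise_hash_loss(codes, sim, margin=2.0):
    """sim[i, j] = 1 for similar pairs; pull similar codes together, push others apart."""
    d = torch.cdist(codes, codes)                     # pairwise Euclidean distances
    pos = sim * d.pow(2)
    neg = (1 - sim) * torch.clamp(margin - d, min=0).pow(2)
    return (pos + neg).mean()

features = torch.randn(8, 512)                        # stand-in CNN backbone features
labels = torch.randint(0, 3, (8,))                    # stand-in class labels
sim = (labels[:, None] == labels[None, :]).float()
codes = HashHead()(features)
loss = pairwise_hash_loss(codes, sim)
loss.backward()                                       # gradients flow back to the hash layer
```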