It is well known that many global properties of a simplicial complex can be deduced from expansion properties of its links. This phenomenon was first discovered by Garland [G]. In this work we develop a local-to-global machinery for general posets. We first show that Garland's basic localization principle extends to more general posets. We then show that notable local-to-global theorems for simplicial complexes arise from general principles for posets with expanding links. Specifically, we prove the following theorems for general posets satisfying some assumptions: expanding links (one-sided expansion) imply fast convergence of high-dimensional random walks (generalizing [KO, AL]); expanding links imply a trickling-down theorem (generalizing [O]); and a poset has expanding links (with two-sided expansion) if and only if it satisfies a global random walk convergence property (generalizing [DDFH]). We axiomatize general conditions on posets that imply local-to-global theorems. By developing this machinery for general posets, we discover that some posets behave better than simplicial complexes with respect to local-to-global implications. Specifically, we obtain a trickling-down theorem for some posets (e.g. the Grassmannian poset) that is better behaved than the trickling-down theorem known for simplicial complexes. In addition to this machinery, we present a method to construct a new poset from a pair consisting of an initial poset and an auxiliary simplicial complex. Applying this procedure to the Grassmannian poset and a bounded-degree high-dimensional expander, we obtain a bounded-degree Grassmannian poset. We prove, using the tools described above, that this poset is a bounded-degree expanding Grassmannian poset, partially proving a conjecture of [DDFH].
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex - but very common - machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data for general compact metric spaces. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
Machine learning and computational intelligence technologies are gaining popularity as possible solutions to issues related to the power grid. One of these issues, the power flow calculation, is an iterative method to compute the voltage magnitudes of the power grid's buses from power values. Machine learning and, especially, artificial neural networks have been used successfully as surrogates for the power flow calculation. Artificial neural networks rely heavily on the quality and size of the training data, but this aspect of the process is apparently often neglected in the works we found. However, since the availability of high-quality historical data for power grids is limited, we propose the Correlation Sampling algorithm. We show that this approach covers a larger area of the sampling space than several random sampling algorithms from the literature and a copula-based approach, while also taking the inter-dependencies of the inputs into account, which among the other algorithms only the copula-based approach does.
Recent biological findings indicate that the static 'lock-and-key' theory is no longer applicable and that the flexibility of both the receptor and the ligand plays a significant role in understanding the principles of binding affinity prediction. Based on this mechanism, molecular dynamics (MD) simulations have been developed as a useful tool to investigate the dynamical properties of this molecular system. However, their computational cost limits the number of reported protein trajectories. To address this shortage, we present a novel spatial-temporal pre-training protocol, PretrainMD, to give the protein encoder the capacity to capture the time-dependent geometric mobility along MD trajectories. Specifically, we introduce two kinds of self-supervised learning tasks: an atom-level denoising generative task and a protein-level snapshot ordering task. We validate the effectiveness of PretrainMD on the PDBbind dataset for both linear probing and fine-tuning. Extensive experiments show that PretrainMD outperforms most state-of-the-art methods and is comparable to the others. More importantly, through visualization we discover that the representations learned by pre-training on MD trajectories, without any labels from the downstream task, follow patterns similar to the magnitudes of binding affinities. This aligns with the fact that the motion of protein-ligand interactions carries key information about their binding. Our work offers a promising perspective on self-supervised pre-training for protein representations with very fine temporal resolutions, and we hope it sheds light on the further use of MD simulations in the biomedical deep learning community.
In this work, we introduce a novel approach to formulating an artificial viscosity for shock capturing in nonlinear hyperbolic systems by utilizing the property that the solutions of hyperbolic conservation laws are not reversible in time in the vicinity of shocks. The proposed approach does not require any additional governing equations or a priori knowledge of the hyperbolic system in question, is independent of the mesh and approximation order, and requires only one tunable parameter. The primary novelty is that the resulting artificial viscosity is unique for each component of the conservation law, which is advantageous for systems in which some components exhibit discontinuities while others do not. The efficacy of the method is shown in numerical experiments on multi-dimensional hyperbolic conservation laws such as nonlinear transport, the Euler equations, and ideal magnetohydrodynamics using a high-order discontinuous spectral element method on unstructured grids.
The number of down-steps between pairs of up-steps in $k_t$-Dyck paths, a generalization of Dyck paths consisting of steps $\{(1, k), (1, -1)\}$ such that the path stays (weakly) above the line $y=-t$, is studied. Results are proved bijectively and by means of generating functions, and lead to several interesting identities as well as links to other combinatorial structures. In particular, there is a connection between $k_t$-Dyck paths and perforation patterns for punctured convolutional codes (binary matrices) used in coding theory. Surprisingly, upon restriction to usual Dyck paths this yields a new combinatorial interpretation of Catalan numbers.
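As a small illustration of the objects counted above, the following sketch enumerates $k_t$-Dyck paths by dynamic programming over heights. It assumes that a path of length $(k+1)n$ starts at the origin and returns to height $0$, a convention the abstract does not spell out; the function name `count_kt_dyck_paths` is likewise an illustrative choice. For $k=1$, $t=0$ the counts reduce to the Catalan numbers.

```python
from math import comb

def count_kt_dyck_paths(n, k, t):
    """Count paths of length (k + 1) * n with steps (1, k) and (1, -1) that
    start at (0, 0), end at height 0 (assumed convention), and never go
    below the line y = -t."""
    length = (k + 1) * n
    # heights[h] = number of valid prefixes currently ending at height h
    heights = {0: 1}
    for _ in range(length):
        nxt = {}
        for h, ways in heights.items():
            for step in (k, -1):
                new_h = h + step
                if new_h >= -t:  # stay weakly above y = -t
                    nxt[new_h] = nxt.get(new_h, 0) + ways
        heights = nxt
    return heights.get(0, 0)

# Ordinary Dyck paths (k = 1, t = 0) are counted by the Catalan numbers.
print([count_kt_dyck_paths(n, 1, 0) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```

Ending at height $0$ after $(k+1)n$ steps forces exactly $n$ up-steps and $kn$ down-steps, so no separate step count is needed.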
The shift towards end-to-end deep learning has brought unprecedented advances in many areas of computer vision. However, deep neural networks are trained on images with resolutions that rarely exceed $1,000 \times 1,000$ pixels. The growing use of scanners that create images with extremely high resolutions (which can average $100,000 \times 100,000$ pixels) thereby presents novel challenges to the field. Most of the published methods preprocess high-resolution images into a set of smaller patches, imposing an a priori belief on the best properties of the extracted patches (magnification, field of view, location, etc.). Herein, we introduce Magnifying Networks (MagNets) as an alternative deep learning solution for gigapixel image analysis that neither relies on a preprocessing stage nor requires the processing of billions of pixels. MagNets can learn to dynamically retrieve any part of a gigapixel image, at any magnification level and field of view, in an end-to-end fashion with minimal ground truth (a single global, slide-level label). Our results on the publicly available Camelyon16 and Camelyon17 datasets corroborate the effectiveness and efficiency of MagNets and the proposed optimization framework for whole slide image classification. Importantly, MagNets process far fewer patches from each slide than any of the existing approaches ($10$ to $300$ times fewer).
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving the sampling problems and non-convex optimization problems that appear in several machine learning applications. In particular, its variance-reduced versions have recently gained attention. In this paper, we study two variants of this kind, namely the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and a log-Sobolev inequality, which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which improves on all previous analyses. We also show some essential applications of our result to non-convex optimization.
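A minimal sketch of a variance-reduced Langevin update of the kind studied above, on a toy one-dimensional problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the potential $F(x) = \sum_i c_i (x - a_i)^2 / 2$, the sum-form scaling with unit temperature, mini-batches drawn with replacement, and the names `svrg_ld`, `data`, and `step`. Only the choice of batch size and inner loop length equal to $\sqrt{n}$ follows the abstract.

```python
import math
import random

def svrg_ld(data, step, epochs, inner_len, batch_size, x0=0.0, seed=0):
    """Toy 1-D SVRG-style Langevin sampler (illustrative assumption).

    data is a list of (a_i, c_i) pairs defining f_i(x) = c_i * (x - a_i)**2 / 2;
    the target density is proportional to exp(-sum_i f_i(x)).
    """
    rng = random.Random(seed)
    n = len(data)

    def grad_i(x, item):
        a, c = item
        return c * (x - a)  # gradient of a single component f_i

    x, samples = x0, []
    for _ in range(epochs):
        anchor = x  # snapshot point for this epoch
        full_grad = sum(grad_i(anchor, item) for item in data)
        for _ in range(inner_len):
            batch = [rng.choice(data) for _ in range(batch_size)]
            # Variance-reduced, unbiased estimate of the full gradient at x.
            correction = sum(grad_i(x, it) - grad_i(anchor, it) for it in batch)
            g = full_grad + (n / batch_size) * correction
            # Langevin step: gradient move plus Gaussian noise of variance 2 * step.
            x = x - step * g + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
            samples.append(x)
    return samples

# n = 100 components, so batch size and inner loop length are sqrt(n) = 10.
data = [(2.0 + 0.01 * (i - 50), 0.5 + 0.5 * (i % 3)) for i in range(100)]
samples = svrg_ld(data, step=0.001, epochs=500, inner_len=10, batch_size=10)
target_mean = sum(c * a for a, c in data) / sum(c for _, c in data)
tail = samples[1000:]  # discard an initial burn-in segment
print(sum(tail) / len(tail), target_mean)
```

For this quadratic potential the target is Gaussian with mean `target_mean`, so the empirical mean of the retained samples should land close to it; the fixed step size introduces a small discretization bias in the variance.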
We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
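To make the quantity being bounded concrete, here is a small sketch for the simplest exchangeable example: drawing without replacement from an urn with `m` ones and `n - m` zeros. In that case the empirical measure of the full sequence is deterministic, so the mixture over product distributions collapses to $k$ i.i.d. Bernoulli($m/n$) draws. The urn example and the function name `kl_first_k_vs_iid` are illustrative assumptions, not the paper's construction, and the code evaluates the relative entropy exactly rather than the paper's upper bound.

```python
from math import comb, log

def kl_first_k_vs_iid(n, m, k):
    """Relative entropy (in nats) between the law of the first k draws without
    replacement from an urn with m ones and n - m zeros, and the law of
    k i.i.d. Bernoulli(m / n) draws. Assumes 0 < m < n and k <= n."""
    p = m / n
    total = 0.0
    for j in range(k + 1):  # j = number of ones among the first k draws
        if j > m or k - j > n - m:
            continue  # this pattern has probability zero under the urn model
        # Probability of one specific 0/1 sequence of length k with j ones.
        p_seq = comb(n - k, m - j) / comb(n, m)
        q_seq = p ** j * (1 - p) ** (k - j)
        total += comb(k, j) * p_seq * log(p_seq / q_seq)
    return total

# The first k = 3 draws look more and more i.i.d. as the urn grows.
for n in (20, 100, 1000):
    print(n, kl_first_k_vs_iid(n, n // 2, 3))
```

With $k$ fixed, the printed relative entropies shrink as $n$ grows, which is the qualitative behaviour a finite de Finetti bound quantifies.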
Graph Neural Networks (GNNs) have been studied through the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the training speed of GNNs. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
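For orientation, the closed forms usually quoted for points (1) and (2) can be written as below. The notation is an assumption introduced here, since the abstract does not fix any: $A$ is the adjacency matrix, $D$ the diagonal degree matrix, $\mathrm{vol}(G) = \sum_{i,j} A_{ij}$, $T$ the context window size, and $b$ the number of negative samples. Setting $T = 1$ in the first expression gives the second, which matches claim (2).

```latex
% DeepWalk with window size T and b negative samples (implicitly factorized matrix)
\log\!\left( \frac{\mathrm{vol}(G)}{b\,T} \left( \sum_{r=1}^{T} \left( D^{-1} A \right)^{r} \right) D^{-1} \right)

% LINE: the special case T = 1
\log\!\left( \frac{\mathrm{vol}(G)}{b} \, D^{-1} A \, D^{-1} \right)
```

The logarithm is applied entrywise. The link to the normalized Laplacian comes from rewriting powers of $D^{-1}A$ in terms of $D^{-1/2} A D^{-1/2}$.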