Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired from the time-reversal principle commonly utilized in classical and quantum physics, we reverse the time direction of the graph heat equation. The resulted reversing process yields a class of high pass filtering functions that enhance the sharpness of graph node features. Leveraging this concept, we introduce the Multi-Scaled Heat Kernel based GNN (MHKG) by amalgamating diverse filtering functions' effects on node features. To explore more flexible filtering conditions, we further generalize MHKG into a model termed G-MHKG and thoroughly show the roles of each element in controlling over-smoothing, over-squashing and expressive power. Notably, we illustrate that all aforementioned issues can be characterized and analyzed via the properties of the filtering functions, and uncover a trade-off between over-smoothing and over-squashing: enhancing node feature sharpness will make model suffer more from over-squashing, and vice versa. Furthermore, we manipulate the time again to show how G-MHKG can handle both two issues under mild conditions. Our conclusive experiments highlight the effectiveness of proposed models. It surpasses several GNN baseline models in performance across graph datasets characterized by both homophily and heterophily.
Learning with Errors (LWE) is a hard math problem used in post-quantum cryptography. Homomorphic Encryption (HE) schemes rely on the hardness of the LWE problem for their security, and two LWE-based cryptosystems were recently standardized by NIST for digital signatures and key exchange (KEM). Thus, it is critical to continue assessing the security of LWE and specific parameter choices. For example, HE uses secrets with small entries, and the HE community has considered standardizing small sparse secrets to improve efficiency and functionality. However, prior work, SALSA and PICANTE, showed that ML attacks can recover sparse binary secrets. Building on these, we propose VERDE, an improved ML attack that can recover sparse binary, ternary, and narrow Gaussian secrets. Using improved preprocessing and secret recovery techniques, VERDE can attack LWE with larger dimensions ($n=512$) and smaller moduli ($\log_2 q=12$ for $n=256$), using less time and power. We propose novel architectures for scaling. Finally, we develop a theory that explains the success of ML LWE attacks.
Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations, it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with the surrogate, but this can result in bias and poor uncertainty quantification. In this paper we propose a new method for adjusting approximate posterior samples to reduce bias and produce more accurate uncertainty quantification. We do this by optimizing a transform of the approximate posterior that maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We demonstrate good performance of the new method on several examples of increasing complexity.
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's development principles and contents and highlight key features we include in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, through empirical studies, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.
We introduce a convergent hierarchy of lower bounds on the minimum value of a real homogeneous polynomial over the sphere. The main practical advantage of our hierarchy over the sum-of-squares (SOS) hierarchy is that the lower bound at each level of our hierarchy is obtained by a minimum eigenvalue computation, as opposed to the full semidefinite program (SDP) required at each level of SOS. In practice, this allows us to go to much higher levels than are computationally feasible for the SOS hierarchy. For both hierarchies, the underlying space at the $k$-th level is the set of homogeneous polynomials of degree $2k$. We prove that our hierarchy converges as $O(1/k)$ in the level $k$, matching the best-known convergence of the SOS hierarchy when the number of variables $n$ is less than the half-degree $d$ (the best-known convergence of SOS when $n \geq d$ is $O(1/k^2)$). More generally, we introduce a convergent hierarchy of minimum eigenvalue computations for minimizing the inner product between a real tensor and an element of the spherical Segre-Veronese variety, with similar convergence guarantees. As examples, we obtain hierarchies for computing the (real) tensor spectral norm, and for minimizing biquadratic forms over the sphere. Hierarchies of eigencomputations for more general constrained polynomial optimization problems are discussed.
Stein thinning is a promising algorithm proposed by (Riabiz et al., 2022) for post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle is to greedily minimize the kernelized Stein discrepancy (KSD), which only requires the gradient of the log-target distribution, and is thus well-suited for Bayesian inference. The main advantages of Stein thinning are the automatic remove of the burn-in period, the correction of the bias introduced by recent MCMC algorithms, and the asymptotic properties of convergence towards the target distribution. Nevertheless, Stein thinning suffers from several empirical pathologies, which may result in poor approximations, as observed in the literature. In this article, we conduct a theoretical analysis of these pathologies, to clearly identify the mechanisms at stake, and suggest improved strategies. Then, we introduce the regularized Stein thinning algorithm to alleviate the identified pathologies. Finally, theoretical guarantees and extensive experiments show the high efficiency of the proposed algorithm. An implementation of regularized Stein thinning as the kernax library in python and JAX is available at //gitlab.com/drti/kernax.
Academic writing is an indispensable yet laborious part of the research enterprise. This Perspective maps out principles and methods for using generative artificial intelligence (AI), specifically large language models (LLMs), to elevate the quality and efficiency of academic writing. We introduce a human-AI collaborative framework that delineates the rationale (why), process (how), and nature (what) of AI engagement in writing. The framework pinpoints both short-term and long-term reasons for engagement and their underlying mechanisms (e.g., cognitive offloading and imaginative stimulation). It reveals the role of AI throughout the writing process, conceptualized through a two-stage model for human-AI collaborative writing, and the nature of AI assistance in writing, represented through a model of writing-assistance types and levels. Building on this framework, we describe effective prompting techniques for incorporating AI into the writing routine (outlining, drafting, and editing) as well as strategies for maintaining rigorous scholarship, adhering to varied journal policies, and avoiding overreliance on AI. Ultimately, the prudent integration of AI into academic writing can ease the communication burden, empower authors, accelerate discovery, and promote diversity in science.
Neural networks (NNs) are primarily developed within the frequentist statistical framework. Nevertheless, frequentist NNs lack the capability to provide uncertainties in the predictions, and hence their robustness can not be adequately assessed. Conversely, the Bayesian neural networks (BNNs) naturally offer predictive uncertainty by applying Bayes' theorem. However, their computational requirements pose significant challenges. Moreover, both frequentist NNs and BNNs suffer from overfitting issues when dealing with noisy and sparse data, which render their predictions unwieldy away from the available data space. To address both these problems simultaneously, we leverage insights from a hierarchical setting in which the parameter priors are conditional on hyperparameters to construct a BNN by applying a semi-analytical framework known as nonlinear sparse Bayesian learning (NSBL). We call our network sparse Bayesian neural network (SBNN) which aims to address the practical and computational issues associated with BNNs. Simultaneously, imposing a sparsity-inducing prior encourages the automatic pruning of redundant parameters based on the automatic relevance determination (ARD) concept. This process involves removing redundant parameters by optimally selecting the precision of the parameters prior probability density functions (pdfs), resulting in a tractable treatment for overfitting. To demonstrate the benefits of the SBNN algorithm, the study presents an illustrative regression problem and compares the results of a BNN using standard Bayesian inference, hierarchical Bayesian inference, and a BNN equipped with the proposed algorithm. Subsequently, we demonstrate the importance of considering the full parameter posterior by comparing the results with those obtained using the Laplace approximation with and without NSBL.
We study the stability of randomized Taylor schemes for ODEs. We consider three notions of probabilistic stability: asymptotic stability, mean-square stability, and stability in probability. We prove fundamental properties of the probabilistic stability regions and benchmark them against the absolute stability regions for deterministic Taylor schemes.
Deep Learning (DL) is the most widely used tool in the contemporary field of computer vision. Its ability to accurately solve complex problems is employed in vision research to learn deep neural models for a variety of tasks, including security critical applications. However, it is now known that DL is vulnerable to adversarial attacks that can manipulate its predictions by introducing visually imperceptible perturbations in images and videos. Since the discovery of this phenomenon in 2013~[1], it has attracted significant attention of researchers from multiple sub-fields of machine intelligence. In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of year 2018. Many of those contributions have inspired new directions in this area, which has matured significantly since witnessing the first generation methods. Hence, as a legacy sequel of [2], this literature review focuses on the advances in this area since 2018. To ensure authenticity, we mainly consider peer-reviewed contributions published in the prestigious sources of computer vision and machine learning research. Besides a comprehensive literature review, the article also provides concise definitions of technical terminologies for non-experts in this domain. Finally, this article discusses challenges and future outlook of this direction based on the literature reviewed herein and [2].
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.