Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving partial differential equations (PDEs) in various scientific and engineering domains. However, traditional PINN architectures typically rely on fully connected multilayer perceptrons (MLPs), lacking the sparsity and modularity inherent in many traditional numerical solvers. This study investigates a novel approach that merges established PINN methodologies with brain-inspired neural network techniques to address this architectural limitation. We employ Brain-Inspired Modular Training (BIMT), which draws on concepts such as locality, sparsity, and modularity inspired by the organization of the brain. Through BIMT, we demonstrate the evolution of PINN architectures from fully connected structures to highly sparse and modular forms, resulting in reduced computational complexity and memory requirements. We showcase the efficacy of this approach by solving differential equations with varying spectral components, revealing insights into the spectral bias phenomenon and its impact on neural network architecture. Moreover, by applying BIMT to simple problems we derive basic PINN building blocks, akin to the convolutional and attention modules of deep neural networks, which enable the construction of modular PINN architectures. Our experiments show that these modular architectures achieve better accuracy than traditional fully connected MLP PINNs while reducing computational overhead. Overall, this study contributes to advancing the understanding and development of efficient and effective neural network architectures for solving PDEs, bridging the gap between PINNs and traditional numerical methods.
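To fix ideas for the approach above, a minimal PINN sketch with a simplified BIMT-flavored penalty is shown below; the toy ODE $u''(x) = -\sin(x)$, the penalty form, and all hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy problem (assumed for illustration): u''(x) = -sin(x) on [0, pi]
# with u(0) = u(pi) = 0, whose exact solution is u(x) = sin(x).

class MLP(nn.Module):
    def __init__(self, width=32, depth=3):
        super().__init__()
        layers, d_in = [], 1
        for _ in range(depth):
            layers += [nn.Linear(d_in, width), nn.Tanh()]
            d_in = width
        layers += [nn.Linear(d_in, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def locality_l1(model):
    # Simplified BIMT-style penalty: |W_ij| weighted by the distance
    # between neuron indices, so surviving weights connect nearby units
    # (locality) and most weights are driven to zero (sparsity).
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            rows = torch.arange(m.weight.shape[0]).unsqueeze(1)
            cols = torch.arange(m.weight.shape[1]).unsqueeze(0)
            dist = (rows - cols).abs().float() + 1.0
            penalty = penalty + (m.weight.abs() * dist).sum()
    return penalty

model = MLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.linspace(0, torch.pi, 128).reshape(-1, 1).requires_grad_(True)
x_bc = torch.tensor([[0.0], [torch.pi]])

for step in range(2000):
    u = model(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + torch.sin(x)       # PDE residual of u'' = -sin(x)
    loss = (residual ** 2).mean() \
        + (model(x_bc) ** 2).mean() \
        + 1e-4 * locality_l1(model)     # physics + boundary + BIMT-style
    opt.zero_grad()
    loss.backward()
    opt.step()
```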
A recent development in Bayesian optimization is the use of local optimization strategies, which can deliver strong empirical performance on high-dimensional problems compared to traditional global strategies. The "folk wisdom" in the literature is that the focus on local optimization sidesteps the curse of dimensionality; however, little is known concretely about the expected behavior or convergence of Bayesian local optimization routines. We first study the behavior of the local approach, and find that the statistics of individual local solutions of Gaussian process sample paths are surprisingly good compared to what we would expect to recover from global methods. We then present the first rigorous analysis of such a Bayesian local optimization algorithm recently proposed by M\"uller et al. (2021), and derive convergence rates in both the noisy and noiseless settings.
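As a rough illustration of the algorithmic template (a sketch in the spirit of gradient-based local Bayesian optimization, not the exact procedure of Müller et al.), each iteration fits a GP to a small local design around the incumbent and descends the posterior-mean gradient; the kernel, step size, and design radius below are all assumptions.

```python
import numpy as np

def rbf(X, Y, ell=0.3):
    # Squared-exponential kernel matrix.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def posterior_mean_grad(x, X, y, ell=0.3, noise=1e-4):
    # Gradient of the GP posterior mean at x (standard closed form
    # for the RBF kernel).
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    kx = rbf(x[None, :], X, ell).ravel()
    grads = -(x[None, :] - X) / ell**2 * kx[:, None]
    return grads.T @ alpha

def local_bo(f, x0, n_iters=50, batch=8, step=0.1, radius=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        # Sample a local design around the incumbent and evaluate f.
        X = x + radius * rng.standard_normal((batch, x.size))
        y = np.array([f(xi) for xi in X])
        # Descend along the estimated gradient of the GP surrogate.
        x = x - step * posterior_mean_grad(x, X, y)
    return x

# Hypothetical test objective: a simple quadratic bowl in 5 dimensions.
x_star = local_bo(lambda z: np.sum((z - 1.0) ** 2), x0=np.zeros(5))
```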
We introduce Tune without Validation (Twin), a pipeline for tuning learning rate and weight decay without validation sets. We leverage a recent theoretical framework concerning learning phases in hypothesis space to devise a heuristic that predicts which hyper-parameter (HP) combinations yield better generalization. Twin performs a grid search of trials according to an early-/non-early-stopping scheduler and then segments the region that provides the best results in terms of training loss. Among the trials in this region, the weight norm is a strong predictor of generalization. To assess the effectiveness of Twin, we run extensive experiments on 20 image classification datasets and train several families of deep networks, including convolutional, transformer, and feed-forward models. We demonstrate proper HP selection when training from scratch and fine-tuning, emphasizing small-sample scenarios.
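A deliberately hedged sketch of the selection logic follows; the quantile threshold, the direction of the weight-norm criterion (smaller norm taken here as the better-generalization proxy), and the `train_fn` interface are all assumptions for this sketch, not details from the paper.

```python
import numpy as np
from itertools import product

def twin_select(train_fn, lrs, wds, loss_quantile=0.25):
    """Hypothetical Twin-style selection: grid-search (lr, wd), keep the
    trials that fit the training data best, then pick by weight norm.

    train_fn(lr, wd) -> (train_loss, weight_norm) is assumed to train a
    model (possibly with early stopping) and report both quantities.
    """
    trials = []
    for lr, wd in product(lrs, wds):
        train_loss, w_norm = train_fn(lr, wd)
        trials.append((lr, wd, train_loss, w_norm))

    losses = np.array([t[2] for t in trials])
    cutoff = np.quantile(losses, loss_quantile)
    # Region of best training-loss results.
    fit_region = [t for t in trials if t[2] <= cutoff]
    # Assumption: among well-fit trials, smaller weight norm is taken
    # as the proxy for better generalization.
    best = min(fit_region, key=lambda t: t[3])
    return best[0], best[1]
```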
We revisit the problem of spurious modes that are sometimes encountered in discretizations of partial differential equations. It is generally suspected that one cause of spurious modes is the treatment of boundary conditions, and we use this as the starting point of our investigation. By regarding boundary conditions as algebraic constraints on a differential equation, we point out that any differential equation with homogeneous boundary conditions also admits a typically infinite number of hidden, or implicit, boundary conditions. Most discretization schemes violate these additional implicit boundary conditions, and we argue that this violation is what leads to the emergence of spurious modes. These observations motivate two definitions of the quality of computed eigenvalues: one based on violations of derivatives of boundary conditions, the other on the Grassmann distance between subspaces associated with computed eigenspaces. Both tests rest on a standardized treatment of boundary conditions and do not require a priori knowledge of eigenvalue locations. Their effectiveness is demonstrated on several examples known to have spurious modes. In addition, these quality tests show that in most problems about half the computed spectrum of a differential operator is of low quality. The tests also specifically identify the low-accuracy modes, which can then be projected out as a type of model reduction scheme.
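The Grassmann-distance test can be made concrete with a short sketch; it assumes eigenspaces computed at two resolutions have been restricted to a common grid, which is one natural (assumed) way to instantiate such a comparison.

```python
import numpy as np
from scipy.linalg import subspace_angles

def grassmann_distance(U, V):
    # Geodesic distance between the subspaces spanned by the columns
    # of U and V: the 2-norm of the vector of principal angles.
    theta = subspace_angles(U, V)
    return np.linalg.norm(theta)

# Hypothetical usage: flag low-quality eigenpairs by comparing the
# leading k-dimensional eigenspace computed at resolution N against
# one at resolution 2N (both restricted to a common grid beforehand).
def low_quality_modes(U_coarse, U_fine, k_max, tol=1e-2):
    flags = []
    for k in range(1, k_max + 1):
        d = grassmann_distance(U_coarse[:, :k], U_fine[:, :k])
        flags.append(d > tol)
    return flags
```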
We define a general formulation of quantum PCPs, which captures adaptivity and multiple unentangled provers, and give a detailed construction of the quantum reduction to a local Hamiltonian with a constant promise gap. The reduction turns out to be a versatile subroutine to prove properties of quantum PCPs, allowing us to show: (i) Non-adaptive quantum PCPs can simulate adaptive quantum PCPs when the number of proof queries is constant. In fact, this can even be shown to hold when the non-adaptive quantum PCP picks the proof indices simply uniformly at random from a subset of all possible index combinations, answering an open question by Aharonov, Arad, Landau and Vazirani (STOC '09). (ii) If the $q$-local Hamiltonian problem with constant promise gap can be solved in $\mathsf{QCMA}$, then $\mathsf{QPCP}[q] \subseteq \mathsf{QCMA}$ for any $q \in O(1)$. (iii) If $\mathsf{QMA}(k)$ has a quantum PCP for any $k \leq \text{poly}(n)$, then $\mathsf{QMA}(2) = \mathsf{QMA}$, connecting two of the longest-standing open problems in quantum complexity theory. Moreover, we also show that there exist (quantum) oracles relative to which certain quantum PCP statements are false. Hence, any attempt to prove the quantum PCP conjecture requires, just as was the case for the classical PCP theorem, (quantumly) non-relativizing techniques.
Particle-based fluid simulations have emerged as a powerful tool for solving the Navier-Stokes equations, especially in cases that include intricate physics and free surfaces. The recent addition of machine learning methods to the toolbox for solving such problems is pushing the boundary of the quality vs. speed tradeoff of such numerical simulations. In this work, we lead the way to Lagrangian fluid simulators compatible with deep learning frameworks, and propose JAX-SPH - a Smoothed Particle Hydrodynamics (SPH) framework implemented in JAX. JAX-SPH builds on the code for dataset generation from the LagrangeBench project (Toshev et al., 2023) and extends this code in multiple ways: (a) integration of further key SPH algorithms, (b) restructuring the code toward a Python library, (c) verification of the gradients through the solver, and (d) demonstration of the utility of the gradients for solving inverse problems as well as a Solver-in-the-Loop application. Our code is available at https://github.com/tumaer/jax-sph.
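To make the setting concrete (this is a generic textbook SPH kernel summation, not the JAX-SPH API; see the repository for the latter), a density evaluation with the standard 2D cubic spline kernel might look as follows:

```python
import numpy as np

def cubic_spline_w(r, h):
    # Standard 2D cubic (M4) spline kernel with support radius 2h.
    q = r / h
    sigma = 10.0 / (7.0 * np.pi * h**2)
    w = np.where(q < 1.0,
                 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

def sph_density(pos, mass, h):
    # rho_i = sum_j m_j W(|x_i - x_j|, h); written O(N^2) for clarity,
    # whereas production solvers use neighbor lists instead.
    diff = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return (mass[None, :] * cubic_spline_w(r, h)).sum(axis=1)

# Hypothetical usage on a small random particle cloud.
rng = np.random.default_rng(0)
pos = rng.uniform(size=(100, 2))
rho = sph_density(pos, mass=np.full(100, 1e-2), h=0.1)
```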
An asymptotic theory is established for linear functionals of the predictive function given by kernel ridge regression, when the reproducing kernel Hilbert space is equivalent to a Sobolev space. The theory covers a wide variety of linear functionals, including point evaluations, evaluation of derivatives, $L_2$ inner products, etc. We establish the upper and lower bounds of the estimates and their asymptotic normality. It is shown that $\lambda\sim n^{-1}$ is the universal optimal order of magnitude for the smoothing parameter to balance the variance and the worst-case bias. The theory also implies that the optimal $L_\infty$ error of kernel ridge regression can be attained under the optimal smoothing parameter $\lambda\sim n^{-1}\log n$. These optimal rates for the smoothing parameter differ from the known optimal rate $\lambda\sim n^{-\frac{2m}{2m+d}}$ that minimizes the $L_2$ error of the kernel ridge regression.
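For orientation, the estimator in question can be written in standard form (notation assumed here, not taken from the paper): given data $(x_i, y_i)_{i=1}^n$,

\[
\hat f_\lambda = \operatorname*{arg\,min}_{f \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 + \lambda \|f\|_{\mathcal{H}}^2,
\]

and a linear functional $F$ (e.g., $F(f) = f(x_0)$, $F(f) = f'(x_0)$, or an $L_2$ inner product) is estimated by the plug-in value $F(\hat f_\lambda)$. The three regimes above then read: $\lambda \sim n^{-1}$ for estimating such functionals, $\lambda \sim n^{-1}\log n$ for the optimal $L_\infty$ error, and $\lambda \sim n^{-\frac{2m}{2m+d}}$ for the optimal $L_2$ error, where $m$ is the Sobolev smoothness and $d$ the input dimension.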
Equivalence testing, a fundamental problem in the field of distribution testing, seeks to infer whether two unknown distributions on $[n]$ are the same or far apart in total variation distance. Conditional sampling has emerged as a powerful query model and has been investigated by theoreticians and practitioners alike, leading to the design of optimal algorithms, albeit in a sequential setting (also referred to as an adaptive tester). Given the profound impact of parallel computing over the past decades, there has been a strong push to design algorithms that admit high parallelization. Despite significant algorithmic advances over the last decade, parallelizable techniques (also termed non-adaptive testers) have $\tilde{O}(\log^{12}n)$ query complexity, far too large to be of practical use. The primary challenge is therefore whether one can design algorithms that admit high parallelization while achieving efficient query complexity. Our work answers this challenge in the affirmative: we present a highly parallelizable tester with a query complexity of $\tilde{O}(\log n)$, achieved through a single round of adaptivity, marking a significant stride towards harmonizing parallelizability and efficiency in equivalence testing.
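For readers unfamiliar with the query model: a COND query draws a sample from the unknown distribution restricted to a chosen subset of the domain. A minimal simulation (assuming full access to $p$, which a real tester of course would not have) is:

```python
import numpy as np

def cond_sample(p, S, rng):
    """Simulate a COND query: draw from distribution p (a length-n
    probability vector) conditioned on the index set S. Assumes p(S) > 0."""
    S = np.asarray(S)
    weights = p[S]
    weights = weights / weights.sum()
    return S[rng.choice(len(S), p=weights)]

# Hypothetical usage: condition a distribution over [8] on the pair {2, 5};
# pair queries of this kind are a common building block of COND testers.
rng = np.random.default_rng(0)
p = np.array([0.3, 0.1, 0.2, 0.05, 0.05, 0.2, 0.05, 0.05])
draws = [cond_sample(p, [2, 5], rng) for _ in range(5)]
```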
Recently, promptable segmentation models such as the Segment Anything Model (SAM) have demonstrated robust zero-shot generalization on static images. These promptable models exhibit denoising abilities for imprecise prompt inputs, such as imprecise bounding boxes. In this paper, we explore the potential of applying SAM to track and segment objects in videos, casting the tracking task as a prompt-denoising task. Specifically, we iteratively propagate the bounding box of each object's mask in the preceding frame as the prompt for the next frame. Furthermore, to enhance SAM's robustness to position and size variations, we propose a multi-prompt strategy: we provide multiple jittered and scaled box prompts for each object and keep the mask prediction with the highest semantic similarity to the template mask. We also introduce a point-based refinement stage to handle occlusions and reduce cumulative errors. Without involving tracking modules, our approach demonstrates comparable performance on video object/instance segmentation tasks on three datasets: DAVIS2017, YouTubeVOS2018, and UVO, serving as a concise baseline and endowing SAM-based downstream applications with tracking capabilities.
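A minimal sketch of this propagate-and-denoise loop follows; `sam_predict` (box-prompted segmentation) and `embed` (semantic features for a masked region) are hypothetical stand-ins for a promptable model such as SAM, and all jitter parameters are illustrative assumptions.

```python
import numpy as np

def mask_to_box(mask):
    # Tight bounding box (x0, y0, x1, y1) of a binary mask.
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=float)

def jitter_boxes(box, n=8, pos_std=5.0, scale_std=0.05, rng=None):
    # Multiple jittered-and-scaled variants of one box prompt
    # (jitter magnitudes are illustrative assumptions).
    if rng is None:
        rng = np.random.default_rng(0)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = box[2] - box[0], box[3] - box[1]
    boxes = [box]
    for _ in range(n):
        dcx, dcy = rng.normal(0.0, pos_std, size=2)
        sw = w * (1.0 + rng.normal(0.0, scale_std))
        sh = h * (1.0 + rng.normal(0.0, scale_std))
        boxes.append(np.array([cx + dcx - sw / 2, cy + dcy - sh / 2,
                               cx + dcx + sw / 2, cy + dcy + sh / 2]))
    return boxes

def track(frames, init_mask, sam_predict, embed):
    # sam_predict(frame, box) -> mask and embed(frame, mask) -> feature
    # are hypothetical interfaces to a promptable segmenter.
    template = embed(frames[0], init_mask)
    mask, masks_out = init_mask, [init_mask]
    for frame in frames[1:]:
        candidates = [sam_predict(frame, b)
                      for b in jitter_boxes(mask_to_box(mask))]
        # Keep the prediction most similar to the template feature.
        sims = [float(np.dot(embed(frame, m), template)) for m in candidates]
        mask = candidates[int(np.argmax(sims))]
        masks_out.append(mask)
    return masks_out
```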
Recently, Mutual Information (MI) has attracted attention for bounding the generalization error of Deep Neural Networks (DNNs). However, accurately estimating the MI in DNNs is intractable, so most previous works have to relax the MI bound, which in turn weakens the information-theoretic explanation for generalization. To address this limitation, this paper introduces a probabilistic representation of DNNs for accurately estimating the MI. Leveraging the proposed MI estimator, we validate the information-theoretic explanation for generalization and derive a tighter generalization bound than the state-of-the-art relaxations.
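For context, the classical bound that such works relax is the mutual-information generalization bound of Xu and Raginsky (2017): for a loss that is $\sigma$-sub-Gaussian,

\[
\bigl|\mathbb{E}\,[\mathrm{gen}(S, W)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)},
\]

where $S$ is the training set of $n$ samples, $W$ the learned weights, and $\mathrm{gen}(S,W)$ the gap between population and empirical risk; estimating $I(S;W)$ accurately is precisely the hard step the abstract refers to.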
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect GNN training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.
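A minimal numpy sketch of the linearized setting analyzed above, assuming a two-layer GNN of the form $\hat Y = (AX)W_1W_2$ with no nonlinearities; despite the non-convex weight product, gradient descent on this model is the regime where linear-rate (i.e., geometric) loss decay can be established. The graph, features, and step size below are illustrative.

```python
import numpy as np

def train_linear_gnn(A, X, Y, lr=0.05, steps=500, seed=0):
    """Two-layer linearized GNN: Y_hat = (A X) W1 W2, trained by gradient
    descent on squared loss. Despite the non-convex product W1 W2, the
    loss typically decays at a geometric (linear) rate in this regime."""
    rng = np.random.default_rng(seed)
    Phi = A @ X                                   # fixed propagation step
    n, d, c = X.shape[0], X.shape[1], Y.shape[1]
    W1 = rng.standard_normal((d, d)) / np.sqrt(d)
    W2 = rng.standard_normal((d, c)) / np.sqrt(d)
    losses = []
    for _ in range(steps):
        H = Phi @ W1
        E = (H @ W2 - Y) / n                      # scaled residual
        losses.append(0.5 * n * (E ** 2).sum())   # squared loss / (2n)
        W1 -= lr * Phi.T @ E @ W2.T               # chain rule for W1
        W2 -= lr * H.T @ E                        # chain rule for W2
    return W1, W2, losses

# Hypothetical usage on a small random graph with self-loops.
rng = np.random.default_rng(1)
A = (rng.uniform(size=(20, 20)) < 0.2).astype(float)
A = ((A + A.T) > 0).astype(float) + np.eye(20)
A = A / A.sum(1, keepdims=True)                   # row-normalize
X = rng.standard_normal((20, 8))
Y = rng.standard_normal((20, 3))
_, _, losses = train_linear_gnn(A, X, Y)
```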