In this article, we propose and analyze a new numerical method for eigenvalue problems of self-adjoint Schr{\"o}dinger operators, combining Feshbach-Schur perturbation theory with a spectral Fourier discretization. To analyze the method, we establish an abstract framework for Feshbach-Schur perturbation theory under minimal regularity assumptions on the potential, which we then apply to the setting of the new spectral Fourier discretization method. Finally, we present numerical results that corroborate the theoretical findings.
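For orientation, the classical Feshbach-Schur map underlying such schemes can be sketched as follows (a standard textbook formulation, not necessarily the exact variant used in the paper). Given an orthogonal projection $P$ with complement $P^{\perp} = I - P$ (in the spectral setting, $P$ would typically project onto the low Fourier modes),
\[
F_P(H - \lambda) \;=\; P(H - \lambda)P \;-\; P H P^{\perp}\bigl(P^{\perp}(H - \lambda)P^{\perp}\bigr)^{-1} P^{\perp} H P,
\]
and, whenever $P^{\perp}(H - \lambda)P^{\perp}$ is invertible on $\operatorname{ran} P^{\perp}$, $\lambda$ is an eigenvalue of $H$ if and only if $0$ is an eigenvalue of $F_P(H - \lambda)$ restricted to $\operatorname{ran} P$.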
The present paper continues our investigation of an implementation of a least-squares collocation method for higher-index differential-algebraic equations. In earlier papers, we substantiated the choice of basis functions and collocation points for a robust implementation, as well as algorithms for the solution of the discrete system. The present paper is devoted to an analytic estimation of the condition numbers of the different components of the implementation. We present error estimates that identify the sources of the different errors.
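As a loose illustration of why conditioning hinges on the choice of basis functions and collocation points (a generic Python sketch, not the authors' implementation), one can compare the condition numbers of collocation matrices for the scalar operator $x \mapsto x' + x$ in a monomial versus a Chebyshev basis:
\begin{verbatim}
import numpy as np
from numpy.polynomial import chebyshev as C

N, M = 20, 42                                 # polynomial degree, collocation points
t = np.cos(np.pi * np.arange(M) / (M - 1))    # Chebyshev-Lobatto points on [-1, 1]

# Monomial basis t^k and its derivative values at the collocation points
V = np.vander(t, N + 1, increasing=True)
dV = np.hstack([np.zeros((M, 1)), V[:, :-1] * np.arange(1, N + 1)])

# Chebyshev basis T_k(t) and its derivative values
T = C.chebvander(t, N)
dT = np.column_stack([C.chebval(t, C.chebder(np.eye(N + 1)[k]))
                      for k in range(N + 1)])

# Least-squares collocation matrices for the operator x -> x' + x
for name, A in [("monomial", dV + V), ("Chebyshev", dT + T)]:
    print(f"{name:9s} cond = {np.linalg.cond(A):.2e}")
\end{verbatim}
The monomial matrix becomes numerically unusable already at moderate degrees, while the orthogonal-polynomial basis stays well conditioned.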
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the Hard Concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids, which we call "mixed random variables." Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables: an extrinsic one ("sample-and-project") and an intrinsic one (based on face stratification). We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables.
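As a concrete illustration of the two families mentioned above (textbook versions, not the new constructions of the paper): Gumbel-Softmax yields a dense continuous relaxation that approaches a one-hot sample as the temperature goes to zero, while sparsemax projects onto the simplex and can return genuinely sparse, hence partly discrete, outputs.
\begin{verbatim}
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Continuous relaxation: softmax of Gumbel-perturbed logits."""
    u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1 - 1e-12)
    z = (logits - np.log(-np.log(u))) / temperature  # add Gumbel(0,1) noise
    z -= z.max()                                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (can be sparse)."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(z) + 1)
    k_max = k[u + (1 - css) / k > 0].max()
    tau = (css[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
print(gumbel_softmax(logits, temperature=0.1, rng=rng))  # near one-hot
print(sparsemax(logits))                                 # exactly sparse
\end{verbatim}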
The first focus of this paper is the characterization of the spectrum and the singular values of the coefficient matrices stemming from the discretization, on a space-time grid, of a parabolic diffusion problem and from the approximation of distributed-order fractional equations. For this purpose we use the classical GLT theory and the new concept of GLT momentary symbols. The former permits us to describe the asymptotic singular value or eigenvalue distribution of the sequence of coefficient matrices; the latter permits us to derive a function that describes the singular value or eigenvalue distribution of the matrices of the sequence even for small matrix sizes, under suitable assumptions. The note concludes with a list of open problems, including the use of our machinery in the study of iteration matrices, especially those arising in multigrid-type techniques.
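To give a flavor of what a symbol is (a standard Toeplitz example, not taken from the paper): the $n \times n$ tridiagonal matrix $T_n = \mathrm{tridiag}(-1, 2, -1)$, stemming from the centered second-order discretization of $-u''$, is generated by the symbol
\[
f(\theta) = 2 - 2\cos\theta, \qquad \lambda_j(T_n) = 2 - 2\cos\frac{j\pi}{n+1}, \quad j = 1, \dots, n,
\]
so here the eigenvalues are exact samples of $f$. GLT theory extends this kind of description, asymptotically, to far more general matrix sequences, while momentary symbols aim to capture the behavior already at small matrix sizes.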
The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem is computationally expensive because computing the Wasserstein distance takes time quadratic in the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed-support tree-Wasserstein barycenter (FS-TWB), and its extension, called the fixed-support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems that can be solved by projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and the objective function value by exploiting the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, with the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter problem.
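As a generic sketch of the projected subgradient template that such convex fixed-support formulations admit (illustrative only; the paper's efficient subgradient computation for tree-Wasserstein objectives is not reproduced here, and the toy objective below is a stand-in):
\begin{verbatim}
import numpy as np

def project_simplex(z):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(z) + 1)
    rho = k[u - css / k > 0].max()
    tau = css[rho - 1] / rho
    return np.maximum(z - tau, 0.0)

def projected_subgradient(subgrad, w0, steps=500):
    """Minimize a convex objective over the simplex via projected subgradients."""
    w = project_simplex(w0)
    for k in range(1, steps + 1):
        w = project_simplex(w - subgrad(w) / np.sqrt(k))  # diminishing steps
    return w

# Toy stand-in objective: squared distance to a fixed target distribution p
p = np.array([0.5, 0.3, 0.2])
w = projected_subgradient(lambda w: 2 * (w - p), np.ones(3) / 3)
print(w)  # approaches p
\end{verbatim}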
We propose a framework for single-shot predictive uncertainty quantification of a neural network that replaces the conventional Bayesian notion of a weight probability density function (PDF) with a functional defined on the model weights in a reproducing kernel Hilbert space (RKHS). The resulting RKHS-based analysis yields a potential-field interpretation of the model weight PDF, which allows us to use a perturbation-theoretic approach in the RKHS to formulate a moment decomposition problem over the model weight-output uncertainty. We show that the moments extracted by this approach automatically decompose the weight PDF around the local neighborhood of the specified model output and capture, with great sensitivity, the local heterogeneity and anisotropy of the weight PDF around a given model prediction. Consequently, these functional moments provide much sharper estimates of model predictive uncertainty than the central stochastic moments characterized by Bayesian and ensemble methods. We demonstrate this experimentally by evaluating the error-detection capability of the uncertainty quantification methods on test data that has undergone a covariate shift away from the training PDF learned by the model. We find our proposed measure of uncertainty to be significantly more precise and better calibrated than baseline methods on various benchmark datasets, while also being much faster to compute.
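As a very loose illustration of the underlying idea (a generic Gaussian-kernel "potential field" over weight samples and a finite-difference probe of its local anisotropy; this is not the authors' actual functional or moment decomposition):
\begin{verbatim}
import numpy as np

def rkhs_potential(w, samples, gamma=1.0):
    """Gaussian-kernel potential field induced by weight samples, at a point w."""
    d2 = ((samples - w) ** 2).sum(axis=1)
    return np.exp(-gamma * d2).mean()

def local_gradient(w, samples, gamma=1.0, eps=1e-5):
    """Finite-difference gradient of the potential: probes local anisotropy."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (rkhs_potential(w + e, samples, gamma)
                - rkhs_potential(w - e, samples, gamma)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 2)) * np.array([1.0, 0.2])  # anisotropic cloud
w = np.array([0.5, 0.1])
print(rkhs_potential(w, samples), local_gradient(w, samples))
\end{verbatim}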
Tensors, i.e., multilinear functions, are a fundamental building block of machine learning algorithms. In order to train on large data sets, it is common practice to distribute the computation among workers. However, stragglers and other faults can severely impact the performance and overall training time. A novel strategy to mitigate these failures is the use of coded computation. We introduce a new metric for analysis, called the typical recovery threshold, which focuses on the most likely event, and provide a novel construction of distributed coded tensor operations that are optimal with respect to this measure. We show that our general framework encompasses many other computational schemes and metrics as special cases. In particular, we prove that the recovery threshold and the tensor rank can be recovered as special cases of the typical recovery threshold when the probability of noise, i.e., of a fault, is equal to zero, thereby providing, serendipitously, a noisy generalization of noiseless computation. Far from being a purely theoretical construction, these definitions lead us to practical random code constructions, namely locally random p-adic alloy codes, which are optimal with respect to these measures. Experiments conducted on Amazon EC2 establish that, as predicted by the theory, these codes are faster and more numerically stable in practice than many other benchmark computation schemes.
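To convey the flavor of coded computation with a generic random linear code (illustrative only; not the locally random p-adic alloy codes of the paper): a master encodes $k$ blocks of a matrix into $n > k$ coded blocks, any $k$ worker results suffice to decode, and that subset size is the recovery threshold.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5                       # k data blocks, n workers (tolerates n - k stragglers)
A = rng.normal(size=(6, 4))
x = rng.normal(size=4)
blocks = np.split(A, k)           # row blocks A_1, ..., A_k

# Encode: worker j receives the coded block sum_i G[j, i] * A_i
G = rng.normal(size=(n, k))       # random generator matrix (invertible k-subsets w.h.p.)
coded = [sum(G[j, i] * blocks[i] for i in range(k)) for j in range(n)]

# Each worker computes its coded product; suppose only workers 0, 2, 4 respond
results = {j: coded[j] @ x for j in (0, 2, 4)}

# Decode: invert the surviving rows of G against the stacked responses
S = sorted(results)
Y = np.stack([results[j] for j in S])
decoded = np.linalg.solve(G[S], Y)                   # rows are A_i @ x
print(np.allclose(np.concatenate(decoded), A @ x))   # True
\end{verbatim}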
In this work, we consider fracture propagation in nearly incompressible and (fully) incompressible materials using a phase-field formulation. We use a mixed form of the elasticity equation to overcome volume-locking effects and develop robust nonlinear and linear solvers and a preconditioner for the resulting system. The coupled variational inequality system, which is solved monolithically, consists of three unknowns: displacements, pressure, and phase-field. Nonlinearities due to the coupling, the constitutive laws, and crack irreversibility are treated with a combined Newton algorithm for the nonlinearities in the partial differential equation and a primal-dual active set strategy for the crack irreversibility constraint. The linear system in each Newton step is solved iteratively with a flexible generalized minimal residual method (GMRES). The key contribution of this work is the development of a problem-specific preconditioner that leverages the saddle-point structure of the displacement and pressure variables. Four numerical examples of pure solids and pressure-driven fractures are conducted on uniformly and locally refined meshes to investigate the robustness of the solver with respect to the Poisson ratio as well as the discretization and regularization parameters.
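For orientation, the displacement-pressure coupling in each Newton step has the familiar saddle-point block form (schematic only; the precise blocks and the additional phase-field coupling in the paper differ):
\[
\begin{pmatrix} A & B^{T} \\ B & -C \end{pmatrix}
\begin{pmatrix} \delta u \\ \delta p \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},
\]
for which block-triangular preconditioners of the type
\[
\mathcal{P} = \begin{pmatrix} A & B^{T} \\ 0 & -\widehat{S} \end{pmatrix},
\qquad \widehat{S} \approx C + B A^{-1} B^{T},
\]
with $\widehat{S}$ an approximation of the Schur complement, are a natural starting point inside GMRES.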
A stabilized finite element method is introduced for the simulation of time-periodic creeping flows, such as those found in the cardiorespiratory system. The new technique, which is formulated in the frequency rather than the time domain, strictly uses real arithmetic and permits the use of the same shape functions for pressure and velocity for ease of implementation. It involves adding the Laplacian of the pressure to the continuity equation, with a complex-valued stabilization parameter that is derived systematically from the momentum equation. Numerical experiments show the excellent accuracy and robustness of the proposed method in simulating flows in complex and canonical geometries over a wide range of conditions. The present method significantly outperforms a traditional solver in terms of both computational cost and scalability, lowering the overall solution turnover time by several orders of magnitude.
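Schematically, for a single angular frequency $\omega$ the stabilized system takes a form like (illustrative; the signs and the precise derivation of the parameter follow the paper):
\[
i\omega\rho\,\hat{u} - \mu\Delta\hat{u} + \nabla\hat{p} = \hat{f},
\qquad
\nabla\cdot\hat{u} - \nabla\cdot\bigl(\tau\,\nabla\hat{p}\bigr) = 0,
\]
where $\tau$ is the complex-valued stabilization parameter that licenses the use of the same interpolation for velocity and pressure.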
The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm for building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph's spectral properties, such as the first few eigenvectors of the original graph Laplacian. Our approach immediately leads to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to be recovered, an eigenvalue stability checking scheme is proposed, which enables robust preservation of the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results on a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.
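Downstream, the point of preserving the first few Laplacian eigenvectors is that standard spectral clustering consumes only those; here is a generic sketch of that consumer (not the paper's sparsification pipeline itself), which works unchanged on a sparsified graph:
\begin{verbatim}
import numpy as np
from scipy.sparse import csr_matrix, csgraph
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_clustering(W, n_clusters):
    """Cluster the nodes of a sparse weighted adjacency matrix W."""
    L = csgraph.laplacian(W, normed=True)
    # Only the first few eigenvectors are needed downstream
    vals, vecs = eigsh(L, k=n_clusters, which="SM")
    vecs /= np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vecs)

# Toy usage: two 10-cliques joined by one weak edge
A = np.kron(np.eye(2), np.ones((10, 10))) - np.eye(20)
A[0, 10] = A[10, 0] = 0.1
print(spectral_clustering(csr_matrix(A), n_clusters=2))
\end{verbatim}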
In this paper, we study the optimal convergence rates for distributed convex optimization problems over networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is: strongly convex and smooth, strongly convex only, smooth only, or merely convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and achieves the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improved dependence on the condition numbers.
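For reference, the centralized building block is plain Nesterov acceleration on a smooth convex function (a textbook sketch; the distributed execution over the network and the dual formulation are the contribution of the paper):
\begin{verbatim}
import numpy as np

def nesterov(grad, x0, L, steps=200):
    """Accelerated gradient descent for an L-smooth convex objective."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_next = y - grad(y) / L                        # step from extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + ((t - 1) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Toy quadratic: f(x) = 0.5 * ||A x - b||^2, smoothness constant L = ||A||_2^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
L = np.linalg.norm(A, 2) ** 2
x = nesterov(lambda x: A.T @ (A @ x - b), np.zeros(5), L)
print(np.linalg.norm(A.T @ (A @ x - b)))  # near-zero gradient at the optimum
\end{verbatim}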