We design and analyze an iterative two-grid algorithm for the finite element discretizations of strongly nonlinear elliptic boundary value problems in this paper. We propose an iterative two-grid algorithm, in which a nonlinear problem is first solved on the coarse space, and then a symmetric positive definite problem is solved on the fine space. The innovation of this paper lies in the establishment of a first convergence analysis, which requires simultaneous estimation of four interconnected error estimates. We also present some numerical experiments to confirm the efficiency of the proposed algorithm.
In this study, we examine numerical approximations for 2nd-order linear-nonlinear differential equations with diverse boundary conditions, followed by the residual corrections of the first approximations. We first obtain numerical results using the Galerkin weighted residual approach with Bernstein polynomials. The generation of residuals is brought on by the fact that our first approximation is computed using numerical methods. To minimize these residuals, we use the compact finite difference scheme of 4th-order convergence to solve the error differential equations in accordance with the error boundary conditions. We also introduce the formulation of the compact finite difference method of fourth-order convergence for the nonlinear BVPs. The improved approximations are produced by adding the error values derived from the approximations of the error differential equation to the weighted residual values. Numerical results are compared to the exact solutions and to the solutions available in the published literature to validate the proposed scheme, and high accuracy is achieved in all cases
Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler. To characterize how the tails change under various operations, we develop an algebra which acts on a three-parameter family of tail asymptotics and which is based on the generalized Gamma distribution. Our algebraic operations are closed under addition and multiplication; they are capable of distinguishing sub-Gaussians with differing scales; and they handle ratios sufficiently well to reproduce the tails of most important statistical distributions directly from their definitions. Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
In this paper, we use the optimization formulation of nonlinear Kalman filtering and smoothing problems to develop second-order variants of iterated Kalman smoother (IKS) methods. We show that Newton's method corresponds to a recursion over affine smoothing problems on a modified state-space model augmented by a pseudo measurement. The first and second derivatives required in this approach can be efficiently computed with widely available automatic differentiation tools. Furthermore, we show how to incorporate line-search and trust-region strategies into the proposed second-order IKS algorithm in order to regularize updates between iterations. Finally, we provide numerical examples to demonstrate the method's efficiency in terms of runtime compared to its batch counterpart.
In this paper we consider the weighted $k$-Hamming and $k$-Edit distances, that are natural generalizations of the classical Hamming and Edit distances. As main results of this paper we prove that for any $k\geq 2$ the DECIS-$k$-Hamming problem is $\mathbb{P}$-SPACE-complete and the DECIS-$k$-Edit problem is NEXPTIME-complete.
In a Subgraph Problem we are given some graph and want to find a feasible subgraph that optimizes some measure. We consider Multistage Subgraph Problems (MSPs), where we are given a sequence of graph instances (stages) and are asked to find a sequence of subgraphs, one for each stage, such that each is optimal for its respective stage and the subgraphs for subsequent stages are as similar as possible. We present a framework that provides a $(1/\sqrt{2\chi})$-approximation algorithm for the $2$-stage restriction of an MSP if the similarity of subsequent solutions is measured as the intersection cardinality and said MSP is preficient, i.e., we can efficiently find a single-stage solution that prefers some given subset. The approximation factor is dependent on the instance's intertwinement $\chi$, a similarity measure for multistage graphs. We also show that for any MSP, independent of similarity measure and preficiency, given an exact or approximation algorithm for a constant number of stages, we can approximate the MSP for an unrestricted number of stages. Finally, we combine and apply these results and show that the above restrictions describe a very rich class of MSPs and that proving membership for this class is mostly straightforward. As examples, we explicitly state these proofs for natural multistage versions of Perfect Matching, Shortest s-t-Path, Minimum s-t-Cut and further classical problems on bipartite or planar graphs, namely Maximum Cut, Vertex Cover, Independent Set, and Biclique.
Thepaperprovesconvergenceofone-levelandmultilevelunsymmetriccollocationforsecondorderelliptic boundary value problems on the bounded domains. By using Schaback's linear discretization theory,L2 errors are obtained based on the kernel-based trial spaces generated by the compactly supported radial basis functions. For the one-level unsymmetric collocation case, we obtain convergence when the testing discretization is finer than the trial discretization. The convergence rates depend on the regularity of the solution, the smoothness of the computing domain, and the approximation of scaled kernel-based spaces. The multilevel process is implemented by employing successive refinement scattered data sets and scaled compactly supported radial basis functions with varying support radii. Convergence of multilevel collocation is further proved based on the theoretical results of one-level unsymmetric collocation. In addition to having the same dependencies as the one-level collocation, the convergence rates of multilevel unsymmetric collocation especially depends on the increasing rules of scattered data and the selection of scaling parameters.
This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small ``extent,'' that is a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.
We consider finding flat, local minimizers by adding average weight perturbations. Given a nonconvex function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a $d$-dimensional distribution $\mathcal{P}$ which is symmetric at zero, we perturb the weight of $f$ and define $F(W) = \mathbb{E}[f({W + U})]$, where $U$ is a random sample from $\mathcal{P}$. This injection induces regularization through the Hessian trace of $f$ for small, isotropic Gaussian perturbations. Thus, the weight-perturbed function biases to minimizers with low Hessian trace. Several prior works have studied settings related to this weight-perturbed function by designing algorithms to improve generalization. Still, convergence rates are not known for finding minima under the average perturbations of the function $F$. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of $\mathcal{P}$ to reduce variance. We then provide a rigorous analysis, showing matching upper and lower bounds of our algorithm for finding an approximate first-order stationary point of $F$ when the gradient of $f$ is Lipschitz-continuous. We empirically validate our algorithm for several image classification tasks with various architectures. Compared to sharpness-aware minimization, we note a 12.6% and 7.8% drop in the Hessian trace and top eigenvalue of the found minima, respectively, averaged over eight datasets. Ablation studies validate the benefit of the design of our algorithm.
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.