We propose iterative projection methods for solving square or rectangular consistent linear systems $Ax = b$. Projection methods use sketching matrices (possibly randomized) to generate a sequence of small projected subproblems, but even these smaller systems can be costly to solve. We develop a process that appends one column to the sketching matrix at each iteration and that converges in a finite number of iterations, independently of whether the sketch is random or deterministic. In general, our process generates orthogonal updates to the approximate solution $x_k$. By choosing the sketch to be the set of all previous residuals, we obtain a simple recursive update and convergence in at most $\operatorname{rank}(A)$ iterations (in exact arithmetic). By choosing a sequence of identity columns for the sketch, we develop a generalization of the Kaczmarz method. In experiments on large sparse systems, our method (PLSS) with residual sketches is competitive with LSQR, and our method with residual and identity sketches compares favorably with state-of-the-art randomized methods.
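To make the structure concrete, below is a minimal Python sketch of a sketch-and-project iteration with residual sketches: the current residual is appended as a new sketch column, and the correction is the minimal-norm update satisfying the sketched equations $S_k^\top A x = S_k^\top b$. This is only an illustration of the general scheme with made-up names; it is not the paper's recursive PLSS update.

```python
import numpy as np

def sketch_and_project_residuals(A, b, iters=100, tol=1e-10):
    """Sketch-and-project for a consistent system Ax = b, appending the
    current residual as a new sketch column at every iteration.
    Illustrative only; per the abstract, PLSS replaces the growing
    least-squares solve below with a simple recursive update."""
    m, n = A.shape
    x = np.zeros(n)
    S = np.empty((m, 0))                  # sketch: one column per iteration
    for _ in range(iters):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        S = np.hstack([S, r[:, None]])    # append the residual column
        W = A.T @ S                       # n x k
        # minimal-norm correction d with S^T A (x + d) = S^T b
        y, *_ = np.linalg.lstsq(W.T @ W, S.T @ r, rcond=None)
        x = x + W @ y
    return x
```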
Numerically solving ordinary differential equations (ODEs) is a naturally serial process, and as a result the vast majority of ODE solver software is serial. In this manuscript we develop a set of parallelized ODE solvers using extrapolation methods that exploit "parallelism within the method", so that arbitrary user ODEs can be parallelized. We describe the specific implementation choices for the explicit and implicit extrapolation methods that allow for generating low-overhead static schedules, which are then exploited by optimized multi-threaded implementations. We demonstrate that while multi-threading gives a noticeable acceleration on both explicit and implicit problems, the explicit parallel extrapolation methods give no significant improvement over the state of the art, even with a multi-threading advantage over current optimized high-order Runge-Kutta tableaus. However, we demonstrate that the implicit parallel extrapolation methods achieve state-of-the-art performance (2x-4x faster) on standard multicore x86 CPUs for systems of $<200$ stiff ODEs solved at low tolerance, a typical setup for the vast majority of users of high-level-language equation solver suites. The resulting methods are distributed as the first widely available open-source software for within-method parallel acceleration targeting typical modest compute architectures.
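As an illustration of "parallelism within the method", the following Python sketch runs the independent sub-integrations of one explicit extrapolation step on separate threads and then combines them through the Aitken-Neville tableau. It assumes an order-one base method (explicit Euler) and the sequence $n_j = j+1$; all names are illustrative, and this is not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def euler_substeps(f, t, y, H, n):
    """Integrate over [t, t+H] with n explicit Euler substeps."""
    h = H / n
    for i in range(n):
        y = y + h * f(t + i * h, y)
    return y

def extrapolated_euler_step(f, t, y, H, J=4):
    # The J sub-integrations are independent: this is the work that
    # within-method parallelism distributes across threads. (In pure
    # Python the GIL limits the speedup; a compiled implementation
    # realizes it.)
    ns = [j + 1 for j in range(J)]
    with ThreadPoolExecutor(max_workers=J) as pool:
        T = list(pool.map(lambda n: euler_substeps(f, t, y, H, n), ns))
    # Aitken-Neville extrapolation tableau (serial, negligible cost).
    for k in range(1, J):
        for j in range(J - 1, k - 1, -1):
            T[j] = T[j] + (T[j] - T[j - 1]) / (ns[j] / ns[j - k] - 1)
    return T[-1]  # order-J approximation of the solution at t + H
```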
A graph is rectilinear planar if it admits a planar orthogonal drawing without bends. While testing rectilinear planarity is NP-hard in general, it is a long-standing open problem to establish a tight upper bound on its complexity for partial 2-trees, i.e., graphs whose biconnected components are series-parallel. We describe a new $O(n^2 \log^2 n)$-time algorithm to test rectilinear planarity of partial 2-trees, which improves over the current best bound of $O(n^3 \log n)$. Moreover, for series-parallel graphs in which no two parallel components share a pole, we achieve optimal $O(n)$-time complexity. Our algorithms are based on an extensive study and a deeper understanding of the notion of orthogonal spirality, introduced in 1998 to describe how much an orthogonal drawing of a subgraph is rolled up within an orthogonal drawing of the whole graph.
On a given graph, the Multi-Agent Pathfinding (MAPF) problem consists in finding collision-free paths for multiple agents. Finding a minimum-length solution is known to be NP-hard, and computation times grow exponentially with the number of agents. In industrial applications, however, it is important to find feasible, suboptimal solutions in a time that grows polynomially with the number of agents. Such algorithms exist for undirected graphs and for biconnected directed graphs. Our main contribution is to generalize these algorithms to the more general case of strongly connected directed graphs. In particular, given a MAPF problem with at least two holes (i.e., unoccupied nodes), we present an algorithm that checks the feasibility of the problem in time linear in the number of nodes, and provides a feasible solution in polynomial time.
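As a small illustration of the setting, the sketch below checks the two structural preconditions named above, strong connectivity and the presence of at least two holes, for a MAPF instance given as a directed graph plus agent start positions. It is only a precondition check, not the paper's feasibility algorithm; the representation (a networkx digraph and an agent-to-node dict) is an assumption for illustration.

```python
import networkx as nx

def mapf_preconditions(G: nx.DiGraph, starts: dict) -> bool:
    """Check the preconditions only: G strongly connected and at least
    two nodes left unoccupied by the agents' start positions."""
    holes = G.number_of_nodes() - len(set(starts.values()))
    return nx.is_strongly_connected(G) and holes >= 2
```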
In this research note, I derive explicit dynamical systems for language within an acquisition-driven framework (Niyogi \& Berwick, 1997; Niyogi, 2006), assuming that children/learners follow the Tolerance Principle (Yang, 2016) to determine whether a rule is productive during language acquisition. I consider different theoretical parameters, such as the population size (finite vs. infinite) and the number of previous generations that provide learners with data. Multiple simulations of the dynamics obtained here and applications to diachronic language data are in preparation, and so are not included in this first note.
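For reference, the Tolerance Principle's productivity threshold is $\theta_N = N/\ln N$: a rule over $N$ lexical items is productive if and only if its number of exceptions $e$ satisfies $e \leq \theta_N$. A one-line check (illustrative names):

```python
import math

def tolerance_threshold(N: int) -> float:
    """Tolerance Principle (Yang, 2016): a rule over N items is
    productive iff its exceptions e satisfy e <= N / ln N."""
    return N / math.log(N)

def is_productive(N: int, e: int) -> bool:
    return e <= tolerance_threshold(N)

# Example: with N = 100 items, up to 100/ln(100) ~ 21.7 exceptions are tolerated.
```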
In this paper, we propose a nonlinear sampling Kaczmarz-Motzkin (NSKM) method for solving large-scale nonlinear equations. Based on the local tangential cone condition and Jensen's inequality, we prove convergence of our method under two different sets of assumptions. Then, for solving nonlinear equations with convex constraints, we present two variants of the NSKM method: the projected sampling Kaczmarz-Motzkin (PSKM) method and the accelerated projected sampling Kaczmarz-Motzkin (APSKM) method. Using the nonexpansiveness of the projection and the convergence of the NSKM method, we obtain the corresponding convergence analysis. Numerical results show that the NSKM method with a suitably sized sample outperforms the nonlinear randomized Kaczmarz (NRK) method in terms of computation time. The APSKM and PSKM methods are practical and promising for constrained nonlinear problems.
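The basic iteration can be sketched as follows for $F(x)=0$, assuming access to $F$ and its Jacobian $J$: sample a subset of equations, select the sampled equation with the largest absolute residual (the Motzkin rule), and take a Kaczmarz-type projection step along its gradient row. This is a minimal illustration; the paper's step-size conditions and the projected/accelerated variants are omitted.

```python
import numpy as np

def nskm_sketch(F, J, x0, sample_size, iters=500, tol=1e-8, seed=None):
    """Minimal nonlinear sampling Kaczmarz-Motzkin sketch for F(x) = 0.
    F: R^n -> R^m, J(x): m x n Jacobian. Illustrative only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = len(F(x))
    for _ in range(iters):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        idx = rng.choice(m, size=sample_size, replace=False)
        i = idx[np.argmax(np.abs(Fx[idx]))]   # Motzkin: max residual in sample
        g = J(x)[i]                           # i-th row of the Jacobian
        x = x - (Fx[i] / (g @ g)) * g         # Kaczmarz projection step
    return x
```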
A generalized finite element method is proposed for solving a heterogeneous reaction-diffusion equation with a singular perturbation parameter $\varepsilon$, based on locally approximating the solution on each subdomain by the solution of a local reaction-diffusion equation together with eigenfunctions of a local eigenproblem. These local problems are posed on domains slightly larger than the subdomains, with oversampling size $\delta^{\ast}$. The method is formulated at the continuous level as a direct discretization of the continuous problem, and at the discrete level as a coarse-space approximation of its standard FE discretizations. Exponential decay rates are established for the local approximation errors with respect to $\delta^{\ast}/\varepsilon$ and $\delta^{\ast}/h$ (at the discrete level, with $h$ denoting the fine FE mesh size) and with respect to the number of local degrees of freedom. In particular, it is shown that the method at the continuous level converges uniformly with respect to $\varepsilon$ in the standard $H^{1}$ norm, and that if the oversampling size is relatively large with respect to $\varepsilon$ and $h$ (at the discrete level), the solutions of the local reaction-diffusion equations alone provide good local approximations for the solution, so that the local eigenfunctions are not needed. Numerical results are provided to verify the theoretical findings.
We extend results known for the randomized Gauss-Seidel and the Gauss-Southwell methods in the case of a Hermitian and positive definite matrix to certain classes of non-Hermitian matrices. We obtain convergence results for a whole range of parameters describing the probabilities in the randomized method or the greedy choice strategy in the Gauss-Southwell-type methods, and we identify those choices that make our convergence bounds best possible. Our main tool is the use of weighted $\ell^1$-norms to measure the residuals. A major result is that the best convergence bounds we obtain for the expected values in the randomized algorithm are as good as the best bounds for the deterministic, but more costly, algorithms of Gauss-Southwell type. Numerical experiments illustrate the convergence of the method and the bounds obtained. Comparisons with the randomized Kaczmarz method are also presented.
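For orientation, here is the classical coordinate-update iteration both methods build on, shown in Python for a Hermitian positive definite matrix: `greedy=True` gives a Gauss-Southwell-type choice (largest residual entry in modulus), `greedy=False` a uniformly randomized Gauss-Seidel step. The weighted $\ell^1$-norm analysis for non-Hermitian matrices is not reproduced here; names are illustrative.

```python
import numpy as np

def coordinate_residual_iteration(A, b, iters=1000, tol=1e-10,
                                  greedy=True, seed=None):
    """Gauss-Seidel / Gauss-Southwell coordinate updates for Ax = b,
    A Hermitian positive definite (classical setting; illustrative)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n, dtype=A.dtype)
    r = b - A @ x
    for _ in range(iters):
        if np.linalg.norm(r) < tol:
            break
        i = int(np.argmax(np.abs(r))) if greedy else int(rng.integers(n))
        delta = r[i] / A[i, i]
        x[i] += delta
        r -= delta * A[:, i]   # rank-one residual update
    return x
```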
In this paper, we study a second-order accurate, linear numerical scheme for the nonlocal Cahn-Hilliard equation. The scheme is established by combining a modified Crank-Nicolson approximation and Adams-Bashforth extrapolation for the temporal discretization, and by applying Fourier spectral collocation for the spatial discretization. In addition, two stabilization terms in different forms are added for the sake of numerical stability. We conduct a complete convergence analysis using a higher-order consistency estimate for the numerical scheme, combined with a rough error estimate and a refined error estimate. By regarding the numerical solution as a small perturbation of the exact solution, we are able to justify a discrete $\ell^\infty$ bound of the numerical solution as a consequence of the rough error estimate. Subsequently, the refined error estimate is derived to obtain the optimal rate of convergence, building on the established $\ell^\infty$ bound of the numerical solution. Moreover, the energy stability is rigorously proved with respect to a modified energy. The proposed scheme can be viewed as a generalization of the second-order scheme presented in an earlier work, and the energy stability estimate greatly improves the corresponding result therein.
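Schematically, for an abstract evolution equation $u_t = -Lu - N(u)$ with linear part $L$ and nonlinear part $N$, a scheme of the type described above reads

$$\frac{u^{n+1}-u^{n}}{\Delta t} = -L\,\frac{u^{n+1}+u^{n}}{2} - N\!\left(\frac{3}{2}u^{n}-\frac{1}{2}u^{n-1}\right) + \text{(stabilization terms)},$$

with $N$ treated by second-order Adams-Bashforth extrapolation so that each time step only requires solving a linear system. This display is only a generic sketch; the precise operators and the two stabilization terms for the nonlocal Cahn-Hilliard equation are as specified in the paper.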
Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing strategy, i.e., an algorithm, and normally many alternative algorithms exist for one linear algebra expression. Although mathematically equivalent, those algorithms might exhibit significant differences in performance. Several high-level languages and tools for matrix computations, such as Julia, Armadillo, and Linnea, make algorithmic choices by minimizing the number of Floating Point Operations (FLOPs). However, several algorithms can share the same (or nearly identical) number of FLOPs; in many cases, these algorithms exhibit execution times that are statistically equivalent, and one could arbitrarily select one of them as the best algorithm. It is, however, not uncommon to find cases where the execution times differ significantly from one another (despite the FLOP counts being almost the same). It is also possible that the algorithm that minimizes FLOPs is not the one that minimizes execution time. In this work, we develop a methodology to test the reliability of FLOPs as a discriminant for linear algebra algorithms. Given a set of algorithms (for an instance of a linear algebra expression) as input, the methodology ranks them into performance classes; algorithms in the same class are statistically equivalent in performance. To this end, we measure the algorithms iteratively until the changes in the ranks converge to a value close to zero. FLOPs are a valid discriminant for an instance if all the algorithms with minimum FLOPs are assigned the best rank; otherwise, the instance is regarded as an anomaly, which can then be used in the investigation of the root cause of performance differences.
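The ranking step can be pictured with the following toy Python sketch: each algorithm is timed repeatedly, and algorithms whose median times lie within a relative tolerance of one another fall into the same performance class. The actual methodology iterates the measurements until the ranks stabilize; this one-shot version, with made-up names and a naive grouping rule, is only illustrative.

```python
import time
import numpy as np

def rank_algorithms(algorithms, n_reps=30, rel_tol=0.05):
    """Time each callable n_reps times and group algorithms with
    median times within rel_tol into the same performance class."""
    medians = []
    for alg in algorithms:
        times = []
        for _ in range(n_reps):
            t0 = time.perf_counter()
            alg()
            times.append(time.perf_counter() - t0)
        medians.append(np.median(times))
    order = np.argsort(medians)
    ranks, rank, ref = {}, 0, medians[order[0]]
    for i in order:
        if medians[i] > ref * (1 + rel_tol):   # start a new, slower class
            rank += 1
            ref = medians[i]
        ranks[i] = rank                        # class 0 = fastest
    return ranks
```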
We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial-time multitask learning algorithm for the concept class of halfspaces with margin $\gamma$, which is based on a simultaneous boosting technique and requires only $\textrm{poly}(k/\gamma)$ samples per task and $\textrm{poly}(k\log(d)/\gamma)$ samples in total. In addition, we prove a computational separation: assuming there exists a concept class that cannot be learned in the attribute-efficient model, we construct another concept class that can be learned in the attribute-efficient model but cannot be multitask learned efficiently -- multitask learning this concept class requires either super-polynomial time complexity or a much larger total number of samples.