In this work, a family of approximate implicit multiderivative Runge-Kutta (MDRK) time integrators for stiff initial value problems is presented. The approximation procedure is based on the recent Approximate Implicit Taylor method (Baeza et al. in Comput. Appl. Math. 39:304, 2020). As a Taylor method can be written in MDRK format, the novel family constitutes a multistage generalization. Two alternatives are investigated for the computation of the higher-order derivatives: either directly as part of the stage equation, or as a separate formula for each derivative added on top of the stage equation itself. Upon linearizing through Newton's method, it turns out that the conditioning of the Newton matrix behaves significantly differently in the two cases. We show that direct computation results in a matrix whose conditioning depends strongly on the stiffness, increasing exponentially in the stiffness parameter with the number of derivatives. Adding separate formulas behaves more favorably, the matrix conditioning depending only linearly on the stiffness, regardless of the number of derivatives. Although this enlarges the Newton system considerably, several numerical results demonstrate that doing so can be considerably beneficial.
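To fix ideas, consider a schematic two-derivative example (a sketch only, not the paper's exact scheme) for an autonomous system $y' = f(y)$ with second derivative $g(y) := f_y(y)\,f(y)$. The stage equations of a two-derivative MDRK method read
\[
Y_i = y_n + h \sum_{j} a_{ij}\, f(Y_j) + h^2 \sum_{j} \hat a_{ij}\, g(Y_j), \qquad i = 1, \dots, s .
\]
In the first alternative, $g(Y_j)$ is evaluated (or approximated) directly inside these $s$ equations, so the Newton unknowns are the stages $Y_i$ only; in the second alternative, auxiliary unknowns $G_j$ are introduced together with the extra equations $G_j = f_y(Y_j)\, f(Y_j)$ (or their approximate-Taylor counterparts), which enlarges the Newton system but, as discussed above, improves its conditioning.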
It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems and for their evolution. We put this hypothesis to the test in a system of evolving foraging agents controlled by neural networks that can adapt the agents' dynamical regime throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. By a resilience analysis, we find that there are still benefits to starting the evolution in the critical regime. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often unable to withstand changes in the lifespan and degrade catastrophically under genetic perturbations. Furthermore, we find that the optimal distance to criticality depends on the task complexity. To test this, we introduce a hard and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the chosen evolutionary mechanism by testing them on two fundamentally different approaches: a genetic algorithm and an evolution strategy. In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important for efficiently finding optimal solutions for new tasks of unknown complexity.
Speeding has been acknowledged as a critical determinant in increasing the risk of crashes and their resulting injury severities. This paper demonstrates that severe speeding-related crashes in the state of Pennsylvania show a spatial clustering trend, and accordingly four crash datasets are extracted from four hotspot districts. Two log-likelihood ratio (LR) tests were conducted to determine whether speeding-related crashes classified by hotspot district should be modeled separately. The results suggest that separate modeling is necessary. To capture the unobserved heterogeneity, four correlated random parameter ordered models with heterogeneity in means are employed to explore the factors contributing to the severity of crashes involving at least one speeding vehicle. Overall, the findings show that several indicators exhibit spatial instability, including hit-pedestrian crashes, head-on crashes, speed limits, work zones, light conditions (dark), rural areas, older drivers, running stop signs, and running red lights. Moreover, drunk driving, exceeding the speed limit, and being unbelted exhibit relative spatial stability across the four district models. This paper provides insights into preventing speeding-related crashes and can potentially facilitate the development of corresponding crash injury mitigation policies.
Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in the dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which require super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method based on a family of reweighted covariance matrices. We prove that in sufficiently high dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when the sparsity is linear in the dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property'': the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
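For intuition only, the following is a deliberately simplified sketch of a generic reweighted-covariance spectral step of the kind such methods build on; the specific weights, the oracle refinement, and the guarantees of SPORADIC differ from this toy version, and all choices below (hard-threshold weights, a single reference sample) are illustrative assumptions.

import numpy as np

def reweighted_covariance_step(Y, y_ref, tau):
    """One generic reweighted-covariance spectral step (illustrative only).

    Y     : (M, N) array whose columns are samples y_i = D x_i.
    y_ref : (M,) reference sample used to define the weights.
    tau   : correlation threshold; only strongly correlated samples contribute.

    Returns a unit vector intended to approximate one dictionary column.
    """
    corr = Y.T @ y_ref                            # correlation of each sample with the reference
    w = (np.abs(corr) > tau).astype(float)        # hard-threshold weights (a toy choice)
    Sigma_w = (Y * w) @ Y.T / max(w.sum(), 1.0)   # weighted covariance of the selected samples
    eigvals, eigvecs = np.linalg.eigh(Sigma_w)
    return eigvecs[:, -1]                         # top eigenvector as the column estimate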
In 2006, Biere, Jussila, and Sinz made the key observation that the underlying logic behind algorithms for constructing Reduced, Ordered Binary Decision Diagrams (BDDs) can be encoded as steps in a proof in the extended resolution logical framework. Through this, a BDD-based Boolean satisfiability (SAT) solver can generate a checkable proof of unsatisfiability for a set of clauses. Such a proof indicates that the formula is truly unsatisfiable without requiring the user to trust the BDD package or the SAT solver built on top of it. We extend their work to enable arbitrary existential quantification of the formula variables, a critical capability for BDD-based SAT solvers. We demonstrate the utility of this approach by applying a BDD-based solver, implemented by modifying an existing BDD package, to several challenging Boolean satisfiability problems. Our results demonstrate scaling for parity formulas, as well as for the Urquhart, mutilated chessboard, and pigeonhole problems, far beyond that of other proof-generating SAT solvers.
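In its standard form (the handling of existential quantification added in this work is more involved), the construction introduces a fresh extension variable $u$ for each BDD node with decision variable $x$, high child $u_1$, and low child $u_0$, defined by $u \leftrightarrow \mathrm{ITE}(x, u_1, u_0)$, which the extended resolution proof records via the four defining clauses
\[
(\neg u \vee \neg x \vee u_1), \quad (\neg u \vee x \vee u_0), \quad (u \vee \neg x \vee \neg u_1), \quad (u \vee x \vee \neg u_0).
\]
Each BDD operation performed by the solver can then be justified by resolution steps over these defining clauses.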
We study the recovery of the amplitudes and nodes of a finite impulse train from noisy frequency samples. This problem is known as super-resolution under sparsity constraints and has numerous applications. An especially challenging scenario occurs when the separation between Dirac pulses is smaller than the Nyquist-Shannon-Rayleigh limit. Despite large volumes of research and well-established worst-case recovery bounds, there is currently no known computationally efficient method which achieves these bounds in practice. In this work we combine the well-known Prony's method for exponential fitting with a recently established decimation technique for analyzing the super-resolution problem in the above-mentioned regime. We show that our approach attains optimal asymptotic stability in the presence of noise and has lower computational complexity than the current state-of-the-art methods.
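For reference, here is a minimal sketch of the classical Prony step that the approach builds on (the decimation pre-processing and the stability analysis are not shown): the nodes are recovered as roots of the linear-prediction polynomial and the amplitudes by linear least squares.

import numpy as np

def prony(moments, s):
    """Classical Prony's method (illustrative sketch, no decimation).

    moments : complex samples m_k = sum_j a_j * z_j**k, for k = 0, ..., 2s-1.
    s       : number of exponential terms (Dirac pulses).

    Returns (nodes z_j, amplitudes a_j).
    """
    m = np.asarray(moments, dtype=complex)
    # Hankel system for the linear-prediction (Prony) polynomial coefficients.
    H = np.array([[m[i + j] for j in range(s)] for i in range(s)])
    rhs = -m[s:2 * s]
    coeffs = np.linalg.solve(H, rhs)               # q_0, ..., q_{s-1}
    poly = np.concatenate(([1.0], coeffs[::-1]))   # z^s + q_{s-1} z^{s-1} + ... + q_0
    nodes = np.roots(poly)
    # Vandermonde least squares for the amplitudes.
    V = np.vander(nodes, N=2 * s, increasing=True).T
    amps = np.linalg.lstsq(V, m, rcond=None)[0]
    return nodes, amps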
A recent line of works, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is some information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a learning rate is typically considered to be ``slow'' compared to a ``fast rate'' of $O(\lambda/n)$ in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound under appropriate assumptions. Furthermore, we identify the critical conditions needed for the fast rate generalization error, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a fast convergence rate for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, several analytical examples are given to show the effectiveness of the bounds.
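As a point of reference, the classical $\eta$-central condition from the fast-rate literature, which the $(\eta,c)$ variant above builds on, requires the existence of a comparator hypothesis $h^*$ such that
\[
\mathbb{E}_{Z}\!\left[\exp\!\big(\eta\,(\ell(h^*, Z) - \ell(h, Z))\big)\right] \le 1 \quad \text{for all hypotheses } h,
\]
i.e., the exponential moments of the negative excess loss are controlled; conditions of this type are the standard route to $O(\lambda/n)$-type rates.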
In this work, we study discrete minimizers of the Ginzburg-Landau energy in finite element spaces. Special focus is given to the influence of the Ginzburg-Landau parameter $\kappa$. This parameter is of physical interest as large values can trigger the appearance of vortex lattices. Since the vortices have to be resolved on sufficiently fine computational meshes, it is important to translate the size of $\kappa$ into a mesh resolution condition, which can be done through error estimates that are explicit with respect to $\kappa$ and the spatial mesh width $h$. For that, we first work in an abstract framework for a general class of discrete spaces, where we present convergence results in a problem-adapted $\kappa$-weighted norm. Afterwards we apply our findings to Lagrangian finite elements and a particular generalized finite element construction. In numerical experiments we confirm that our derived $L^2$- and $H^1$-error estimates are indeed optimal in $\kappa$ and $h$.
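In one common nondimensionalization (stated here only to indicate where $\kappa$ enters; the precise scaling in the paper may differ), the Ginzburg-Landau energy of an order parameter $u$ with a given magnetic potential $A$ reads
\[
E(u) = \frac{1}{2} \int_\Omega \Big| \Big( \tfrac{i}{\kappa} \nabla + A \Big) u \Big|^2 \, dx + \frac{1}{4} \int_\Omega \big( 1 - |u|^2 \big)^2 \, dx ,
\]
so that large $\kappa$ produces vortex cores of width $O(1/\kappa)$, which is precisely what drives the mesh resolution condition mentioned above.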
Force-free plasmas are a good approximation when the plasma pressure is tiny compared with the magnetic pressure, which is the case during the cold vertical displacement event (VDE) of a major disruption in a tokamak. On time scales long compared with the transit time of Alfvén waves, the evolution of a force-free plasma is most efficiently described by the quasi-static magnetohydrodynamic (MHD) model, which ignores the plasma inertia. Here we consider a regularized quasi-static MHD model for force-free plasmas in tokamak disruptions and propose a mimetic finite difference (MFD) algorithm. The full geometry of an ITER-like tokamak reactor is treated, with a blanket module region, a vacuum vessel region, and the plasma region. Specifically, we develop a parallel, fully implicit, and scalable MFD solver based on PETSc and its DMStag data structure for the discretization of the five-field quasi-static perpendicular plasma dynamics model on a 3D structured mesh. The MFD spatial discretization is coupled with a fully implicit DIRK scheme. The algorithm exactly preserves the divergence-free condition of the magnetic field under the resistive Ohm's law. The preconditioner is a four-level fieldsplit preconditioner, created by combining separate preconditioners for the individual fields, which calls multigrid or direct solvers on the sub-blocks or exact factorizations on the separate fields. The numerical results confirm that the divergence-free constraint is strongly satisfied and demonstrate the performance of the fieldsplit preconditioner and the overall algorithm. The simulation of ITER VDE cases over the actual plasma current diffusion time is also presented.
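For orientation, the following is a generic sketch of how a per-field (fieldsplit) preconditioner is typically composed at runtime with PETSc; the split names and sub-solvers below are placeholders chosen for illustration, and the actual four-level composition and DMStag field layout used above are not reproduced here.

# Generic illustration only: composing a PETSc fieldsplit preconditioner from
# per-field sub-solvers via runtime options (petsc4py). Split names "0", "1"
# are PETSc's default split names, not the actual field names of the model.
from petsc4py import PETSc

opts = PETSc.Options()
opts["ksp_type"] = "fgmres"                     # flexible Krylov method around the split
opts["pc_type"] = "fieldsplit"
opts["pc_fieldsplit_type"] = "multiplicative"   # apply the per-field solves in sequence
opts["fieldsplit_0_pc_type"] = "gamg"           # e.g. algebraic multigrid on one block
opts["fieldsplit_1_pc_type"] = "lu"             # e.g. a direct factorization on another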
With the maturity of web services, containers, and cloud computing technologies, large services in traditional systems (e.g. the computation services of machine learning and artificial intelligence) are gradually being broken down into many microservices to increase service reusability and flexibility. Therefore, this study proposes an efficiency analysis framework based on queuing models to analyze the efficiency difference of breaking down a traditional large service into n microservices. For generality, this study considers different service time distributions (e.g. exponentially distributed and fixed service times) and explores the system efficiency in the worst-case and best-case scenarios through queuing models (i.e. the M/M/1 and M/D/1 queuing models). In each experiment, the total time required by the original large service was higher than that required after breaking it down into multiple microservices, so the decomposition into multiple microservices can improve system efficiency. It can also be observed that in the best-case scenario, the improvement becomes more significant as the arrival rate increases. However, in the worst-case scenario, only a slight improvement was achieved. This study found that breaking a service down into multiple microservices can effectively improve system efficiency and proved that the best improvement is achieved when the computation time of the large service is evenly distributed among the multiple microservices. Therefore, this study's findings can serve as a reference guide for the future development of microservice architectures.
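As an illustration of the kind of comparison involved (a simplified sketch under the assumption that the n microservices form a tandem of independent queues, each handling 1/n of the original work; this is not necessarily the exact experimental setup of the study), the mean time in system can be compared with the textbook M/M/1 and M/D/1 formulas:

# Simplified sketch: mean sojourn time of one large service vs. a tandem of n
# microservices, each doing 1/n of the work (stages treated as independent
# queues, an assumption made here for illustration).
def mm1_time(lam, mu):
    # M/M/1 mean time in system: W = 1 / (mu - lam)
    return 1.0 / (mu - lam)

def md1_time(lam, mu):
    # M/D/1 mean time in system (Pollaczek-Khinchine): W = 1/mu + rho / (2*mu*(1-rho))
    rho = lam / mu
    return 1.0 / mu + rho / (2.0 * mu * (1.0 - rho))

lam, mu, n = 0.6, 1.0, 4   # arrival rate, service rate of the large service, number of microservices
for label, f in [("M/M/1", mm1_time), ("M/D/1", md1_time)]:
    single = f(lam, mu)           # one large service
    chain = n * f(lam, n * mu)    # n microservices, each n times faster
    print(label, "single:", round(single, 3), "chain of", n, ":", round(chain, 3))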
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient-based (or optimization-based) meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner-loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not on the path taken by the inner-loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner-loop optimizer. As a result, our approach is agnostic to the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner-loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
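Concretely, in the proximally regularized formulation commonly used for this purpose, with inner solution $\phi^*(\theta) = \arg\min_\phi \hat{\mathcal{L}}_{\mathrm{tr}}(\phi) + \tfrac{\lambda}{2}\|\phi - \theta\|^2$, the implicit function theorem gives the meta-gradient
\[
\frac{d}{d\theta}\, \mathcal{L}_{\mathrm{test}}\big(\phi^*(\theta)\big) = \Big( I + \tfrac{1}{\lambda} \nabla^2_\phi \hat{\mathcal{L}}_{\mathrm{tr}}(\phi^*) \Big)^{-1} \nabla_\phi \mathcal{L}_{\mathrm{test}}(\phi^*),
\]
which involves only $\phi^*$ and not the optimization path, and can be approximated by solving the linear system iteratively, e.g. with conjugate gradients using Hessian-vector products.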