A modification of Newton's method for solving systems of $n$ nonlinear equations is presented. The new matrix-free method relies on a given decomposition of the invertible Jacobian of the residual into invertible sparse local Jacobians, following the chain rule of differentiation. It is motivated in the context of local Jacobians with bandwidth $2m+1$ for $m\ll n$, where a reduction of the computational cost by $\mathcal{O}(\frac{n}{m})$ can be observed. Supporting run time measurements are presented for the tridiagonal case, showing a cost reduction by $\mathcal{O}(n)$. Generalization yields the combinatorial Matrix-Free Newton Step problem. We prove its NP-completeness and present algorithmic components for building methods that approximate its solution. Inspired by adjoint Algorithmic Differentiation, the new method shares several challenges with the latter, including the DAG Reversal problem. Further challenges arise from combinatorial problems in sparse linear algebra such as Bandwidth and Directed Elimination Ordering.
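To illustrate the kind of cost reduction described above, the following sketch (with illustrative functions and names not taken from the paper) computes a Newton step for a residual $F = g \circ h$ whose two local Jacobians are tridiagonal as two $\mathcal{O}(n)$ banded solves, instead of forming and factorizing a dense $n \times n$ Jacobian.
\begin{verbatim}
# Hypothetical sketch: Newton step for F = g o h with tridiagonal local Jacobians,
# solved as J_g(h(x)) z = -F(x) followed by J_h(x) dx = z (two O(n) banded solves).
import numpy as np
from scipy.linalg import solve_banded

def tridiag_ab(sub, diag, sup):
    """Pack sub-/main-/super-diagonals into LAPACK banded storage for (l, u) = (1, 1)."""
    n = diag.size
    ab = np.zeros((3, n))
    ab[0, 1:] = sup        # superdiagonal entries a[j-1, j]
    ab[1, :] = diag        # main diagonal entries a[j, j]
    ab[2, :-1] = sub       # subdiagonal entries a[j+1, j]
    return ab

def h(x):                  # first elemental function (tridiagonal Jacobian)
    y = x.copy()
    y[1:-1] += 0.1 * x[:-2] * x[2:]
    return y

def jac_h_ab(x):
    n = x.size
    sub, sup = np.zeros(n - 1), np.zeros(n - 1)
    sub[:-1] = 0.1 * x[2:]   # dh_i/dx_{i-1}
    sup[1:] = 0.1 * x[:-2]   # dh_i/dx_{i+1}
    return tridiag_ab(sub, np.ones(n), sup)

def g(y):                  # second elemental function (bidiagonal Jacobian)
    z = y + 0.05 * y**3
    z[:-1] += 0.1 * y[1:]
    return z

def jac_g_ab(y):
    n = y.size
    return tridiag_ab(np.zeros(n - 1), 1.0 + 0.15 * y**2, np.full(n - 1, 0.1))

def newton_step(x):
    y = h(x)
    r = g(y)                                       # residual F(x)
    z = solve_banded((1, 1), jac_g_ab(y), -r)      # solve J_g z = -F(x)
    return solve_banded((1, 1), jac_h_ab(x), z)    # solve J_h dx = z

x = 0.5 * np.ones(1000)
for _ in range(20):
    x = x + newton_step(x)
print(np.max(np.abs(g(h(x)))))                     # residual after the Newton iterations
\end{verbatim}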
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual PC algorithm is a novel scheme to carry out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in terms of run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies to Gaussian copula models, and demonstrate its performance in that setting.
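The following sketch (illustrative, not the dual PC algorithm itself) shows the identity the method builds on: a partial correlation, and hence a Fisher-z CI test, can be read off the inverse of the covariance matrix restricted to the tested pair and its conditioning set; the dual scheme additionally obtains tests for complementary conditioning sets via block inversions of the precision matrix, which is not reproduced here.
\begin{verbatim}
# Illustrative sketch (not the dual PC algorithm itself) of the identity it exploits:
# the partial correlation of X_i and X_j given S can be read off the inverse of the
# covariance matrix restricted to {i, j} U S.
import numpy as np
from scipy import stats

def partial_corr(Sigma, i, j, S):
    idx = [i, j] + list(S)
    P = np.linalg.inv(Sigma[np.ix_(idx, idx)])   # precision of the marginal over {i,j} U S
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for rho_{ij|S} = 0 with |S| = k and sample size n."""
    z = np.arctanh(r) * np.sqrt(n - k - 3)
    return 2.0 * (1.0 - stats.norm.cdf(abs(z)))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 1]                               # chain X0 -> X1 -> X2
Sigma = np.cov(X, rowvar=False)
r = partial_corr(Sigma, 0, 2, [1])               # X0 _||_ X2 | X1 should hold
print(r, fisher_z_pvalue(r, n=500, k=1))
\end{verbatim}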
Minimizing the weight of an edge set satisfying parity constraints is a challenging branch of combinatorial optimization, as witnessed by the binary hypergraph chapter of Alexander Schrijver's book ``Combinatorial Optimization'' (Chapter 80). This area contains relevant graph theory problems, including open cases of the Max Cut problem and some multiflow problems. We clarify the interconnections between some of these problems and establish three levels of difficulty. On the one hand, we prove that the Shortest Odd Path problem in undirected graphs without cycles of negative total weight, together with several related problems, is NP-hard, settling a long-standing open question asked by Lov\'asz (Open Problem 27 in Schrijver's book ``Combinatorial Optimization''). On the other hand, we provide an efficient algorithm for the closely related and well-studied Minimum-weight Odd $T$-Join problem with non-negative weights: our algorithm runs in FPT time parameterized by $c$, the number of connected components in some efficiently computed minimum-weight $T$-join. If negative weights are also allowed, then finding a minimum-weight odd $\{s,t\}$-join is equivalent to the Minimum-weight Odd $T$-Join problem with arbitrary weights, which is still only conjectured to be polynomial-time solvable. The analogous problems for digraphs are also considered.
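To make the hardness boundary concrete, the following sketch (illustrative, not from the paper) contrasts the NP-hard simple-path version with the tractable walk relaxation: if vertex repetitions are allowed and weights are non-negative, a minimum-weight odd $s$-$t$ walk is found by Dijkstra's algorithm on the graph augmented with a parity bit.
\begin{verbatim}
# For contrast with the NP-hard simple-path version: a minimum-weight odd s-t *walk*
# (vertex repetitions allowed, non-negative weights) is easy via Dijkstra on the
# graph crossed with a parity bit. Illustrative sketch, not from the paper.
import heapq

def shortest_odd_walk(n, edges, s, t):
    """edges: (u, v, w) with w >= 0; minimum weight of an s-t walk using an odd
    number of edges, or None if no such walk exists."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    INF = float("inf")
    dist = [[INF, INF] for _ in range(n)]        # dist[v][parity of #edges]
    dist[s][0] = 0.0
    pq = [(0.0, s, 0)]
    while pq:
        d, v, p = heapq.heappop(pq)
        if d > dist[v][p]:
            continue
        for u, w in adj[v]:
            nd, q = d + w, 1 - p                 # traversing an edge flips parity
            if nd < dist[u][q]:
                dist[u][q] = nd
                heapq.heappush(pq, (nd, u, q))
    return dist[t][1] if dist[t][1] < INF else None

print(shortest_odd_walk(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 1)], 0, 3))  # -> 1.0
\end{verbatim}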
We examine the problem of variance components testing in general mixed effects models using the likelihood ratio test. We account for the presence of nuisance parameters, i.e. the fact that some untested variances might also be equal to zero. Two main issues arise in this context, leading to a non-regular setting. First, under the null hypothesis the true parameter value lies on the boundary of the parameter space; moreover, due to the presence of nuisance parameters, the exact location of these boundary points is not known, which prevents the use of classical asymptotic theory of maximum likelihood estimation. Second, in the specific context of nonlinear mixed-effects models, the Fisher information matrix is singular at the true parameter value. We address these two points by proposing a shrinked parametric bootstrap procedure, which is straightforward to apply even for nonlinear models. We show that the procedure is consistent, solving both the boundary and the singularity issues, and we provide a verifiable criterion for the applicability of our theoretical results. We show through a simulation study that, compared to the asymptotic approach, our procedure has better small-sample performance and is more robust to the presence of nuisance parameters. A real data application is also provided.
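As a rough illustration of the bootstrap idea (a plain parametric bootstrap for a balanced random-intercept model, omitting the shrinkage correction and the nonlinear setting of the proposed procedure; all names are illustrative), one may calibrate the likelihood ratio statistic for a variance component at the boundary as follows.
\begin{verbatim}
# Hypothetical sketch: plain parametric bootstrap calibration of the LRT for
# H0: sigma_b^2 = 0 in a balanced random-intercept model y_ij = mu + b_i + e_ij.
# (This omits the shrinkage correction of the proposed procedure.)
import numpy as np
from scipy.optimize import minimize

def negloglik(theta, y):
    mu, s2e, s2b = theta
    g, m = y.shape
    V = s2e * np.eye(m) + s2b * np.ones((m, m))       # within-group covariance
    sign, logdet = np.linalg.slogdet(V)
    r = y - mu
    quad = np.einsum('ij,jk,ik->', r, np.linalg.inv(V), r)
    return 0.5 * (g * logdet + quad + g * m * np.log(2 * np.pi))

def lrt_stat(y):
    mu0, s2e0 = y.mean(), y.var()                     # closed-form MLEs under H0
    ll0 = -negloglik([mu0, s2e0, 0.0], y)
    res = minimize(negloglik, x0=[mu0, s2e0, 0.1 * s2e0], args=(y,),
                   bounds=[(None, None), (1e-8, None), (0.0, None)])
    return 2.0 * (-res.fun - ll0)

def bootstrap_pvalue(y, B=200, seed=0):
    rng = np.random.default_rng(seed)
    mu0, s2e0 = y.mean(), y.var()                     # nuisance parameters under H0
    obs = lrt_stat(y)
    null_stats = [lrt_stat(mu0 + rng.normal(0.0, np.sqrt(s2e0), size=y.shape))
                  for _ in range(B)]
    return obs, float(np.mean(np.array(null_stats) >= obs))

rng = np.random.default_rng(1)
y = 1.0 + rng.normal(0.0, 1.0, size=(30, 5))          # data simulated under H0
print(bootstrap_pvalue(y, B=100))
\end{verbatim}
Because the tested variance lies on the boundary of the parameter space, the usual chi-square calibration fails; resampling the statistic under the fitted null sidesteps this, which is the starting point that the proposed shrinkage correction then refines.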
The parameterized matching problem is a variant of string matching whose goal is to find all parameterized occurrences of a pattern $P$ in a text $T$. In the design of matching algorithms, combinatorial properties of strings, especially periodicity, play an important role. In this paper, we analyze the properties of periods of parameterized strings and propose a generalization of Galil and Seiferas's exact matching algorithm (1980) to parameterized matching, which runs in $O(\pi|T|+|P|)$ time and $O(\log{|P|}+|{\rm\Pi}|)$ space in addition to the input space, where ${\rm\Pi}$ is the parameter alphabet and $\pi$ is the number of parameter characters appearing in $P$ plus one.
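For background, the sketch below (a naive baseline, not the proposed algorithm) shows Baker's prev-encoding, under which two strings parameterized-match exactly when their encodings coincide, and a straightforward $O(|T||P|)$ matcher built on it.
\begin{verbatim}
# Hypothetical baseline (not the proposed algorithm): Baker's prev-encoding and a
# naive O(|T||P|) parameterized matcher built on it.
def prev_encode(s, params):
    """Replace each parameter character by the distance to its previous occurrence
    (0 if none); constant characters are kept as-is."""
    last, out = {}, []
    for i, c in enumerate(s):
        if c in params:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
        else:
            out.append(c)
    return out

def param_matches(text, pattern, params):
    """All positions i such that text[i:i+|pattern|] parameterized-matches pattern."""
    ep, m = prev_encode(pattern, params), len(pattern)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m], params) == ep]

# 'x' and 'y' are parameter characters; 'a' is a constant character.
print(param_matches("xyaxyayxayx", "xyax", {"x", "y"}))   # -> [0, 6]
\end{verbatim}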
We study the problem of estimating a large, low-rank matrix corrupted by additive noise of unknown covariance, assuming one has access to additional side information in the form of noise-only measurements. We analyze the Whiten-Shrink-reColor (WSC) workflow, where a "noise covariance whitening" transformation is applied to the observations, followed by appropriate singular value shrinkage and a "noise covariance re-coloring" transformation. We show that under the mean square error loss, a unique, asymptotically optimal shrinkage nonlinearity exists for the WSC denoising workflow, and calculate it in closed form. To this end, we calculate the asymptotic eigenvector rotation of the random spiked F-matrix ensemble, a result which may be of independent interest. With sufficiently many pure-noise measurements, our optimally tuned WSC denoising workflow outperforms, in mean square error, matrix denoising algorithms based on optimal singular value shrinkage that do not make similar use of noise-only side information; numerical experiments show that our procedure's relative performance is particularly strong in challenging statistical settings with high dimensionality and a large degree of heteroscedasticity.
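A schematic version of the WSC workflow is sketched below, with a simple rank-truncation standing in for the optimal shrinkage nonlinearity derived in the paper; all names and parameters are illustrative.
\begin{verbatim}
# Schematic sketch of the WSC pipeline with a placeholder rank-truncation shrinker;
# the paper's optimal shrinkage nonlinearity is not reproduced here.
import numpy as np

def wsc_denoise(Y, noise_only, rank):
    """Y: p x n observations; noise_only: p x m pure-noise measurements."""
    Sigma_hat = noise_only @ noise_only.T / noise_only.shape[1]   # noise covariance estimate
    L = np.linalg.cholesky(Sigma_hat)
    Yw = np.linalg.solve(L, Y)                                    # 1. whiten
    U, s, Vt = np.linalg.svd(Yw, full_matrices=False)             # 2. shrink (placeholder:
    s[rank:] = 0.0                                                #    keep top `rank` values)
    return L @ ((U * s) @ Vt)                                     # 3. re-color

rng = np.random.default_rng(0)
p, n, m, r = 50, 200, 400, 3
C = np.linalg.cholesky(np.cov(rng.standard_normal((2 * p, p)), rowvar=False))
X = rng.standard_normal((p, r)) @ rng.standard_normal((r, n))     # low-rank signal
Y = X + C @ rng.standard_normal((p, n))                           # noisy observations
noise_only = C @ rng.standard_normal((p, m))                      # pure-noise side information
X_hat = wsc_denoise(Y, noise_only, rank=r)
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))              # relative error
\end{verbatim}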
In multivariate time series analysis, the coherence measures the linear dependency between two time series at different frequencies. However, real data applications often exhibit nonlinear dependency in the frequency domain, which conventional coherence analysis fails to capture. The quantile coherence, on the other hand, characterizes nonlinear dependency by defining the coherence at a set of quantile levels based on trigonometric quantile regression. Although quantile coherence is a more powerful tool, its estimation remains challenging due to the high level of noise. This paper introduces a new estimation technique for quantile coherence. The proposed method is semi-parametric: it uses the parametric form of the spectrum of the vector autoregressive (VAR) model as an approximation to the quantile spectral matrix, along with nonparametric smoothing across quantiles. For each fixed quantile level, we obtain the VAR parameters from the quantile periodograms and then, using the Durbin-Levinson algorithm, calculate a preliminary estimate of the quantile coherence from these parameters. Finally, we smooth the preliminary estimate across quantiles using a nonparametric smoother. Numerical results show that the proposed estimation method outperforms nonparametric methods. We show that quantile coherence-based bivariate time series clustering has advantages over clustering based on the ordinary VAR coherence. In an application, the clusters of financial stocks identified by quantile coherence with a market benchmark exhibit an intriguing and more accurate structure of diversified investment portfolios, which investors may use to make better decisions.
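For orientation, the sketch below computes the ordinary coherence of a bivariate VAR(1) process from its parametric spectral matrix; the proposed method replaces these quantities by quantile periodograms and smooths across quantile levels, which is not reproduced here.
\begin{verbatim}
# Illustrative sketch: ordinary coherence of a bivariate VAR(1) computed from its
# parametric spectral matrix. The quantile version replaces these quantities by
# quantile periodograms and smooths across quantile levels (not reproduced here).
import numpy as np

def var1_spectrum(A, Sigma, omega):
    """Spectral density f(omega) of X_t = A X_{t-1} + e_t with Cov(e_t) = Sigma."""
    H = np.linalg.inv(np.eye(A.shape[0]) - A * np.exp(-1j * omega))
    return H @ Sigma @ H.conj().T / (2.0 * np.pi)

def coherence(A, Sigma, omegas):
    out = []
    for w in omegas:
        f = var1_spectrum(A, Sigma, w)
        out.append(np.abs(f[0, 1])**2 / (f[0, 0].real * f[1, 1].real))
    return np.array(out)

A = np.array([[0.5, 0.3],
              [0.0, 0.4]])               # the second series drives the first
Sigma = np.eye(2)
print(coherence(A, Sigma, np.linspace(0.01, np.pi, 5)))
\end{verbatim}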
The entropy production rate is a central quantity in non-equilibrium statistical physics, scoring how far a stochastic process is from being time-reversible. In this paper, we compute the entropy production of diffusion processes at non-equilibrium steady state under the condition that the time-reversal of the diffusion remains a diffusion. We start by characterising the entropy production of both discrete- and continuous-time Markov processes. We investigate the time-reversal of time-homogeneous stationary diffusions and recall the most general conditions under which the diffusion property is preserved by time reversal, which include hypoelliptic and degenerate diffusions and locally Lipschitz vector fields. We decompose the drift into its time-reversible and irreversible parts, or equivalently, the generator into symmetric and antisymmetric operators. We show the equivalence with a decomposition of the backward Kolmogorov equation considered in hypocoercivity theory, and with a decomposition of the Fokker-Planck equation in GENERIC form. The main result shows that when the time-irreversible part of the drift lies in the range of the volatility matrix (almost everywhere), the forward and time-reversed path space measures of the process are mutually equivalent, and it evaluates the entropy production. When this does not hold, the measures are mutually singular and the entropy production is infinite. We verify these results using exact numerical simulations of linear diffusions. We illustrate the discrepancy between the entropy production of non-linear diffusions and that of their numerical simulations in several examples, and show how the entropy production can be used for accurate numerical simulation. Finally, we discuss the relationship between time-irreversibility and sampling efficiency, and how the definition of entropy production can be modified to score how far a process is from being generalised reversible.
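As a sketch of the kind of formula involved (stated in one common convention; signs and constants may differ from the paper's), consider a stationary diffusion with stationary density $\rho$,
\[
dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad D := \tfrac{1}{2}\,\sigma\sigma^\top,
\]
with drift decomposition
\[
b_{\mathrm{rev}} := D\,\nabla\log\rho + \nabla\!\cdot\! D, \qquad b_{\mathrm{irr}} := b - b_{\mathrm{rev}}.
\]
When $b_{\mathrm{irr}}(x) \in \operatorname{Range} D(x)$ for $\rho$-a.e. $x$, the entropy production rate takes the form
\[
e_p \;=\; \mathbb{E}_{\rho}\!\left[\, b_{\mathrm{irr}}^{\top} D^{+}\, b_{\mathrm{irr}} \,\right],
\]
with $D^{+}$ the Moore--Penrose pseudoinverse, and $e_p = +\infty$ otherwise.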
The color video inpainting problem is one of the most challenging problems in modern imaging science. It aims to recover a color video from a small subset of pixels that may contain noise. However, few robust models can simultaneously preserve the coupling of the color channels and the evolution of the color video frames. In this paper, we present a new robust quaternion tensor completion (RQTC) model to solve this challenging problem and derive the exact recovery theory. The main idea is to build a quaternion tensor optimization model that recovers a low-rank quaternion tensor representing the targeted color video and a sparse quaternion tensor representing the noise. This new model is very efficient at recovering high-dimensional data that satisfy the prior low-rank assumption. To handle data without the low-rank property, we introduce a low-rank learning RQTC model, which rearranges similar patches, classified by a quaternion learning method, into smaller tensors satisfying the prior low-rank assumption. We also propose fast algorithms with global convergence guarantees. In numerical experiments, the proposed methods successfully recover color videos, eliminating color contamination and preserving the continuity of the video scenery, and their solutions are of higher quality in terms of PSNR and SSIM values than those of state-of-the-art algorithms.
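The quaternion-algebra machinery is specific to the paper; as a sketch, RQTC-type models instantiate the generic robust completion template
\[
\min_{\mathcal{L},\,\mathcal{S}} \; \|\mathcal{L}\|_{*} + \lambda\,\|\mathcal{S}\|_{1}
\quad \text{subject to} \quad
P_{\Omega}(\mathcal{L} + \mathcal{S}) = P_{\Omega}(\mathcal{T}),
\]
where $\mathcal{T}$ is the observed quaternion tensor, $\Omega$ the index set of observed entries, $\mathcal{L}$ the low-rank component representing the video, $\mathcal{S}$ the sparse component absorbing the noise, $\|\cdot\|_{*}$ a (quaternion tensor) nuclear norm, and $\lambda$ a balance parameter.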
Despite the growing interest in parallel-in-time methods as an approach to accelerate numerical simulations in atmospheric modelling, improving their stability and convergence remains a substantial challenge for their application to operational models. In this work, we study the temporal parallelization of the shallow water equations on the rotating sphere combined with time-stepping schemes commonly used in atmospheric modelling due to their stability properties, namely an Eulerian implicit-explicit (IMEX) method and a semi-Lagrangian semi-implicit method (SL-SI-SETTLS). The main goal is to investigate the performance of parallel-in-time methods, namely Parareal and Multigrid Reduction in Time (MGRIT), when these well-established schemes are used on the coarse discretization levels, and to provide insights into how they can be improved for better performance. We begin by performing an analytical stability study of Parareal and MGRIT applied to a linearized ordinary differential equation, depending on the choice of coarse scheme. Next, we perform numerical simulations of two standard tests to evaluate the stability, convergence and speedup provided by the parallel-in-time methods compared to a fine reference solution computed serially. We also conduct a detailed investigation of the influence of artificial viscosity and hyperviscosity approaches, applied on the coarse discretization levels, on the performance of the temporal parallelization. Both the analytical stability study and the numerical simulations indicate poorer stability behaviour when SL-SI-SETTLS is used on the coarse levels, compared to the IMEX scheme. With the IMEX scheme, a better trade-off between convergence, stability and speedup compared to serial simulations can be obtained under proper parameter and artificial viscosity choices, opening the perspective of potential competitiveness for realistic models.
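For readers unfamiliar with the temporal parallelization being tested, the sketch below implements the standard Parareal predictor-corrector update $U_{k+1}^{n+1} = \mathcal{G}(U_{k+1}^{n}) + \mathcal{F}(U_{k}^{n}) - \mathcal{G}(U_{k}^{n})$ on a scalar test equation; the propagators are illustrative and unrelated to the paper's shallow-water solvers.
\begin{verbatim}
# Minimal Parareal sketch on the Dahlquist test equation u' = lam * u
# (illustrative propagators, not the shallow-water solvers of the paper).
import numpy as np

lam = -1.0 + 5.0j                # mildly stiff, oscillatory test problem
T, N = 2.0, 20                   # time horizon and number of time slices
dt = T / N

def propagate(u, h, steps):
    """Implicit Euler with `steps` substeps of size h."""
    for _ in range(steps):
        u = u / (1.0 - h * lam)
    return u

def G(u):                        # coarse propagator: one implicit Euler step per slice
    return propagate(u, dt, 1)

def F(u):                        # fine propagator: 100 substeps per slice
    return propagate(u, dt / 100, 100)

U = np.empty(N + 1, dtype=complex)
U[0] = 1.0
for n in range(N):               # initial coarse prediction
    U[n + 1] = G(U[n])

for k in range(5):               # Parareal corrections
    Fk = np.array([F(U[n]) for n in range(N)])   # fine solves: independent per slice
    Gk = np.array([G(U[n]) for n in range(N)])   # coarse solves at the old iterate
    for n in range(N):           # sequential coarse sweep with correction
        U[n + 1] = G(U[n]) + Fk[n] - Gk[n]

serial = U[0]
for n in range(N):               # serial fine reference solution
    serial = F(serial)
print(abs(U[-1] - serial))       # Parareal error w.r.t. the serial fine solution
\end{verbatim}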
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the back-propagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition, given the limited scope of this discussion; for example, we omit a separate analysis of Euclidean spaces, Hermitian spaces, Hilbert spaces, and matters specific to the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
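As a concrete reminder of the factorization referred to above, the sketch below implements the textbook (unpivoted) Doolittle LU decomposition; it is illustrative and not taken from the survey.
\begin{verbatim}
# Illustrative sketch: Doolittle LU factorization A = L U with unit lower-triangular L
# (no pivoting, so all leading principal minors of A must be nonzero).
import numpy as np

def lu_doolittle(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]                    # k-th row of U
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]  # k-th column of L
    return L, U

A = np.array([[4.0, 3.0, 2.0],
              [2.0, 4.0, 1.0],
              [1.0, 2.0, 3.0]])
L, U = lu_doolittle(A)
print(np.allclose(L @ U, A))     # True
\end{verbatim}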