Many numerical problems with input $x$ and output $y$ can be formulated as a system of equations $F(x, y) = 0$ where the goal is to solve for $y$. The condition number measures the sensitivity of $y$ to small perturbations of $x$. From this numerical problem, one can derive a (typically underdetermined) subproblem by omitting any number of constraints from $F$. We propose a condition number for underdetermined systems that relates the condition number of a numerical problem to those of its subproblems. We illustrate the use of our technique by computing the condition of two problems that do not have a finite condition number in the classical sense: two-factor matrix decompositions and Tucker decompositions.
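For intuition, a minimal numerical sketch (in Python; not the paper's construction): when $F(x, y) = 0$ defines $y$ locally as a function of $x$, the implicit function theorem gives the sensitivity $\mathrm{d}y/\mathrm{d}x = -(\partial F/\partial y)^{-1}\,\partial F/\partial x$, and the classical absolute condition number is its norm.

```python
import numpy as np

def absolute_condition(dF_dx, dF_dy):
    """Spectral-norm condition of y(x), from the two Jacobians of F."""
    J = -np.linalg.solve(dF_dy, dF_dx)   # dy/dx via the implicit function theorem
    return np.linalg.norm(J, 2)

# Toy example: F(x, y) = y**2 - x, i.e. y = sqrt(x); dy/dx = 1/(2y).
x, y = 4.0, 2.0
print(absolute_condition(np.array([[-1.0]]), np.array([[2 * y]])))  # 0.25
```

The paper's setting is precisely the one where this classical recipe breaks down: in an underdetermined subproblem $\partial F/\partial y$ is not invertible, which is why a new notion of condition is needed.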
The Strahler number was originally proposed to characterize the complexity of river bifurcation and has since found various applications. This article proposes computing upper and lower limits of the Strahler number for the tree structures of natural language sentences, which are available in large annotated datasets and thus allow for analysis in the manner of statistical mechanics. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost always 3 or 4, similar to the case of river bifurcation as reported by Strahler (1957) and Horton (1945). From the theory behind the number, we show that it is a lower limit on the amount of memory required to process sentences under a particular model. A mathematical analysis of random trees provides a further conjecture on the nature of the Strahler number: it is not a constant but grows logarithmically with sentence length. This finding uncovers the statistical basis of the Strahler number as a characteristic of general tree structures.
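For reference, a minimal sketch of the classic Strahler number of a rooted tree (the paper computes upper and lower limits of this quantity over sentence tree structures, which this sketch does not do):

```python
# Strahler number: a leaf has order 1; an internal node has order
# max(children) + 1 if the maximum is attained by at least two
# children, and max(children) otherwise. Trees are nested lists,
# with a leaf represented as an empty list.

def strahler(node):
    if not node:                                # leaf
        return 1
    orders = sorted((strahler(c) for c in node), reverse=True)
    if len(orders) > 1 and orders[0] == orders[1]:
        return orders[0] + 1                    # two children tie at the maximum
    return orders[0]

# A complete binary tree of depth d has Strahler number d + 1.
print(strahler([[[], []], [[], []]]))           # 3
```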
We present an exponentially convergent numerical method to approximate the solution of the Cauchy problem for the inhomogeneous fractional differential equation with an unbounded operator coefficient and a Caputo fractional derivative in time. The method is based on a newly obtained solution formula that consolidates the mild solution representations of sub-parabolic, parabolic and sub-hyperbolic equations with a sectorial operator coefficient $A$ and non-zero initial data. The involved integral operators are approximated using sinc-quadrature formulas tailored to the spectral parameters of $A$, the fractional order $\alpha$ and the smoothness of the first initial condition, as well as to the properties of the equation's right-hand side $f(t)$. The resulting method possesses exponential convergence for positive sectorial $A$, any finite $t$, including $t = 0$, and the whole range $\alpha \in (0,2)$. It is suitable for the practically important case in which no knowledge of $f(t)$ is available outside the considered interval $t \in [0, T]$. The algorithm admits multi-level parallelism. We provide numerical examples that confirm the theoretical error estimates.
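To illustrate the quadrature family only (not the paper's contour-tailored rule for the operator solution formula), here is a minimal sketch of a basic sinc quadrature: for $f$ analytic in a strip around the real axis and suitably decaying, the rule $h \sum_k f(kh)$ converges exponentially in the number of nodes.

```python
import numpy as np

def sinc_quadrature(f, N, h):
    """Truncated sinc (trapezoidal-on-the-line) rule for integrals over R."""
    k = np.arange(-N, N + 1)
    return h * np.sum(f(k * h))

f = lambda x: np.exp(-x**2)            # exact integral: sqrt(pi)
for N in (4, 8, 16):
    h = np.pi / np.sqrt(N)             # a standard step-size balance
    approx = sinc_quadrature(f, N, h)
    print(N, abs(approx - np.sqrt(np.pi)))   # error decays exponentially
```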
Tensor networks are an important concept and technique in many research areas, such as quantum computation and machine learning. We study the exponential complexity of contracting tensor networks on two special graph structures: planar graphs and finite element graphs. We prove that every finite element graph has an edge separator of size $O(d\sqrt{\max\{\Delta,d\}N})$. Furthermore, building on the $2^{O(\sqrt{\Delta N})}$-time algorithm \cite{fastcounting} for planar Boolean tensor network contraction, we develop a $2^{O(d\sqrt{\max\{\Delta,d\}N})}$-time algorithm for contracting a tensor network consisting of $N$ Boolean tensors whose underlying graph is a finite element graph with maximum degree $\Delta$ and no face with more than $d$ boundary edges in the planar skeleton. We use two methods to accelerate these exponential algorithms by transforming high-dimensional tensors into low-dimensional ones. First, we construct an $O(k)$-size planar gadget for any Boolean symmetric tensor of dimension $k$, where the gadget consists only of Boolean tensors of dimension at most $5$. Second, we decompose any tensor into a series of vectors (unary functions) according to its \emph{CP decomposition} \cite{tensor-rank}. We also prove a sub-exponential time lower bound for contracting tensor networks under the counting \emph{Exponential Time Hypothesis} (\#ETH).
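A minimal sketch of the contraction model itself (the paper's algorithms exploit planar and finite element structure for speed; this toy only shows what contracting a Boolean tensor network computes): counting proper 2-colorings of the path $a$-$b$-$c$, where each edge carries the Boolean "not equal" tensor and contracting all shared indices sums over all assignments.

```python
import numpy as np

NE = np.array([[0, 1], [1, 0]])        # Boolean inequality constraint on an edge
count = np.einsum('ab,bc->', NE, NE)   # sum over all colorings of a, b, c
print(count)                           # 2: the two alternating colorings
```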
The goal of this work is to study waves interacting with partially immersed objects allowed to move freely in the vertical direction, in a regime in which the propagation of the waves is described by the one-dimensional Boussinesq-Abbott system. The problem can be reduced to a transmission problem for this Boussinesq system, in which the transmission conditions between the components of the domain to the left and to the right of the object are determined through the resolution of coupled forced ODEs in time satisfied by the vertical displacement of the object and the average discharge in the portion of the fluid located under the object. We propose a new extended formulation in which these ODEs are complemented by two other forced ODEs satisfied by the trace of the surface elevation at the contact points. The interest of this new extended formulation is that the forcing terms are easy to compute numerically and that the surface elevation at the contact points is furnished for free. Based on this formulation, we propose a second order scheme that involves a generalization of the MacCormack scheme with a nonlocal flux and a source term, coupled to a second order Heun scheme for the ODEs. In order to validate this scheme, several explicit solutions of this wave-structure interaction problem are derived and can serve as benchmarks for future codes. As a byproduct, our method provides a second order scheme for the generation of waves at the entrance of the numerical domain for the Boussinesq-Abbott system.
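For reference, a minimal sketch of the Heun ingredient (the second-order predictor-corrector used for the forced ODEs; the coupled wave-structure solver itself is not reproduced here), applied to a toy forced ODE:

```python
import numpy as np

def heun(f, y0, t0, t1, n):
    """Second-order Heun scheme for y' = f(t, y) on [t0, t1] with n steps."""
    t, y = t0, np.asarray(y0, dtype=float)
    h = (t1 - t0) / n
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)      # predictor (forward Euler)
        y = y + 0.5 * h * (k1 + k2)    # corrector (trapezoidal average)
        t += h
    return y

# Forced test problem y' = -y + sin(t), y(0) = 0.
print(heun(lambda t, y: -y + np.sin(t), 0.0, 0.0, 1.0, 100))
```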
Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and create counterfactual scenarios described by the time series. Distributional similarity (which we refer to as realism) and the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation. For instance, the US Federal Reserve publishes synthetic market stress scenarios, given as constrained time series, for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize the training loss to enforce constraints or reject non-conforming samples. However, these approaches require re-training whenever the constraints change, and rejection sampling can be computationally expensive or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem in a constrained optimization framework and propose a set of generative methods, including ``GuidedDiffTime'', a guided diffusion model that generates realistic time series. Empirically, we evaluate our work on several financial and energy datasets, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution that requires no re-training for new constraints, resulting in a significant reduction in carbon footprint.
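A minimal sketch of gradient guidance in this spirit (a toy score function and Langevin sampler, not the ``GuidedDiffTime'' model): sampling steps follow the data score plus the gradient of a differentiable constraint penalty, so the constraint can be changed without retraining any generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def data_score(x):                     # toy prior: standard-normal "series"
    return -x

def penalty_grad(x, target=1.0, w=5.0):
    # gradient of -w * (sum(x) - target)**2, a soft sum constraint
    return -w * 2 * (x.sum() - target) * np.ones_like(x)

x, eps = rng.normal(size=8), 0.01
for _ in range(2000):                  # unadjusted Langevin dynamics
    noise = rng.normal(size=x.shape)
    x += eps * (data_score(x) + penalty_grad(x)) + np.sqrt(2 * eps) * noise
print(x.sum())                         # concentrates near the target 1.0
```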
Ensuring the long-term reproducibility of data analyses requires results stability tests to verify that analysis results remain within acceptable variation bounds despite inevitable software updates and hardware evolution. This paper introduces a numerical variability approach for results stability tests, which determines acceptable variation bounds using random rounding of floating-point calculations. By applying the resulting stability test to fMRIPrep, a widely-used neuroimaging tool, we show that the test is sensitive enough to detect subtle updates in image processing methods while remaining specific enough to accept numerical variations within a reference version of the application. This result contributes to enhancing the reliability and reproducibility of data analyses by providing a robust and flexible method for stability testing.
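A minimal sketch in the spirit of random rounding (a hand-rolled Monte Carlo arithmetic toy, not the instrumented build used for fMRIPrep): each arithmetic result is nudged to an adjacent float in a random direction, and the spread over repeated runs yields the acceptable variation bounds.

```python
import numpy as np

rng = np.random.default_rng()

def rr(x):
    """Randomly move x to one of its two neighbouring floats."""
    return np.nextafter(x, -np.inf if rng.random() < 0.5 else np.inf)

def unstable_sum(values):
    total = 0.0
    for v in values:
        total = rr(total + v)          # randomly round every addition
    return total

# A cancellation-heavy sum whose result is sensitive to rounding.
runs = [unstable_sum([1e16, 1.0, -1e16] * 100) for _ in range(20)]
print(min(runs), max(runs))            # this spread bounds the variability
```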
Statistical data by their very nature are indeterminate in the sense that if one repeated the process of collecting the data, the new data set would be somewhat different from the original. Therefore, a statistical method, a map $\Phi$ taking a data set $x$ to a point in some space $F$, should be stable at $x$: small perturbations in $x$ should result in a small change in $\Phi(x)$. Otherwise, $\Phi$ is useless at $x$ or -- and this is important -- near $x$. So one doesn't want $\Phi$ to have "singularities," data sets $x$ such that the limit of $\Phi(y)$ as $y$ approaches $x$ doesn't exist. (Yes, the same issue arises elsewhere in applied math.) However, broad classes of statistical methods have topological obstructions to continuity: they must have singularities. We show why and give lower bounds on the Hausdorff dimension, even the Hausdorff measure, of the set of singularities of such data maps. There seem to be numerous examples. We apply mainly topological methods to study the (topological) singularities of functions defined (on dense subsets of) "data spaces" and taking values in spaces with nontrivial homology. At least in this book, data spaces are usually compact manifolds. The purpose is to gain insight into the numerical conditioning of statistical description, data summarization, and inference and learning methods. We prove general results that can often be used to bound the dimension of the singular set from below. We apply our topological results to develop lower bounds on the Hausdorff measure of the singular set, and we apply these methods to the study of plane fitting and of measuring the location of data on spheres. This is not a "final" version, merely another attempt.
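A concrete toy instance of such a singularity (our choice of example, chosen to match the spherical-location theme): the extrinsic mean direction on the circle, $\Phi(\text{data}) = \bar{x}/\|\bar{x}\|$, is undefined when the mean vanishes, and near such a data set an arbitrarily small perturbation flips the output.

```python
import numpy as np

def mean_direction(points):
    """Extrinsic mean direction of unit vectors; singular when the mean is 0."""
    m = points.mean(axis=0)
    return m / np.linalg.norm(m)

antipodal = np.array([[1.0, 0.0], [-1.0, 0.0]])  # mean is exactly zero
for eps in (1e-8, -1e-8):
    nudged = antipodal + np.array([[0.0, eps], [0.0, eps]])
    print(mean_direction(nudged))      # jumps between (0, 1) and (0, -1)
```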
Nowadays, numerical models are widely used in most engineering fields to simulate the behaviour of complex systems, such as power plants or wind turbines in the energy sector. These models are nevertheless affected by uncertainties of different natures (numerical, epistemic) which can affect the reliability of their predictions. We develop here a new method for quantifying conditional parameter uncertainty within a chain of two numerical models in the context of multiphysics simulation. More precisely, we aim to calibrate the parameters $\theta$ of the second model of the chain conditionally on the value of the parameters $\lambda$ of the first model, while assuming that the probability distribution of $\lambda$ is known. This conditional calibration is carried out from the available experimental data of the second model. In doing so, we aim to quantify as well as possible the impact of the uncertainty of $\lambda$ on the uncertainty of $\theta$. To perform this conditional calibration, we set out a nonparametric Bayesian formalism to estimate the functional dependence between $\theta$ and $\lambda$, denoted $\theta(\lambda)$. First, each component of $\theta(\lambda)$ is assumed to be the realization of a Gaussian process prior. Then, if the second model is written as a linear function of $\theta(\lambda)$, the Bayesian machinery allows us to compute analytically the posterior predictive distribution of $\theta(\lambda)$ for any set of realizations of $\lambda$. The effectiveness of the proposed method is illustrated on several analytical examples.
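A minimal sketch of the Gaussian-process ingredient (toy data; the paper conditions on experimental data through a linearized second model rather than on direct observations of $\theta$): a GP prior on one component of $\theta(\lambda)$, conditioned on a few calibrated values, gives a closed-form posterior predictive mean and variance.

```python
import numpy as np

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel between two 1-D arrays of lambda values."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

lam_obs = np.array([0.1, 0.4, 0.7, 1.0])
theta_obs = np.sin(3 * lam_obs)                # pretend calibrated theta values
lam_new = np.linspace(0, 1, 5)

K = rbf(lam_obs, lam_obs) + 1e-4 * np.eye(4)   # kernel matrix plus noise/jitter
Ks = rbf(lam_new, lam_obs)
mean = Ks @ np.linalg.solve(K, theta_obs)      # posterior predictive mean
cov = rbf(lam_new, lam_new) - Ks @ np.linalg.solve(K, Ks.T)
print(mean)
print(np.sqrt(np.clip(np.diag(cov), 0, None))) # predictive standard deviations
```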
In this paper, we devise a scheme for kernelizing, in sublinear space and polynomial time, various problems on planar graphs. The scheme exploits planarity to ensure that the resulting algorithms run in polynomial time and use $O((\sqrt{n} + k) \log n)$ bits of space, where $n$ is the number of vertices in the input instance and $k$ is the intended solution size. As examples, we apply the scheme to Dominating Set and Vertex Cover. For Dominating Set, we also show that a well-known kernelization algorithm due to Alber et al. (JACM 2004) can be carried out in polynomial time and space $O(k \log n)$. Along the way, we devise restricted-memory procedures for computing region decompositions and approximating the aforementioned problems, which might be of independent interest.
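To make the notion of a kernel concrete, here is a minimal sketch of a classic kernelization (the standard Buss kernel for Vertex Cover, shown only for illustration; the paper's contribution is carrying such reductions out in sublinear space, which this sketch does not attempt):

```python
def buss_kernel(edges, k):
    """Reduce (G, k) to an equivalent Vertex Cover instance with <= k^2 edges."""
    edges = set(map(frozenset, edges))
    while True:
        deg = {}
        for e in edges:
            for v in e:
                deg[v] = deg.get(v, 0) + 1
        high = [v for v, d in deg.items() if d > k]
        if not high:
            break
        v = high[0]                    # degree > k, so v is in every small cover
        edges = {e for e in edges if v not in e}
        k -= 1
        if k < 0:
            return None                # no vertex cover of the given size
    if len(edges) > k * k:
        return None                    # a yes-instance has at most k^2 edges left
    return edges, k                    # the kernel: a small equivalent instance

print(buss_kernel([(0, i) for i in range(1, 6)], 2))  # (set(), 1): center taken
```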
The volume function $V(t)$ of a compact set $S \subset \mathbb{R}^d$ is the Lebesgue measure of the set of points within a distance to $S$ not larger than $t$. According to some classical results in geometric measure theory, the volume function turns out to be a polynomial, at least on a finite interval, under a quite intuitive, easy-to-interpret sufficient condition (called ``positive reach'') which can be seen as an extension of the notion of convexity. However, many other simple sets, not fulfilling the positive reach condition, also have a polynomial volume function. To our knowledge, there is no general, simple geometric description of such sets. Still, the polynomial character of $V(t)$ has some relevant consequences, since the polynomial coefficients carry useful geometric information. In particular, the constant term is the volume of $S$ and the first-order coefficient is the boundary measure (in Minkowski's sense). This paper is focused on sets whose volume function is polynomial on some interval starting at zero, whose length (which we call the ``polynomial reach'') might be unknown. Our main goal is to approximate this polynomial reach by statistical means, using only a large enough random sample of points inside $S$. The practical motivation is simple: when the value of the polynomial reach, or rather a lower bound for it, is approximately known, the polynomial coefficients can be estimated from the sample points using standard methods of polynomial approximation. As a result, we get a quite general method to estimate the volume and boundary measure of the set, relying only on an inner sample of points and not requiring any smoothing parameter. This paper explores the theoretical and practical aspects of this idea.
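A minimal sketch of the coefficient-estimation idea on the unit disc in $\mathbb{R}^2$, whose volume function $V(t) = \pi(1+t)^2$ is polynomial for all $t \geq 0$ (rejection sampling stands in for the inner sample, and a Monte Carlo dilation estimate stands in for the paper's machinery; this does not estimate the polynomial reach itself):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
inner = rng.uniform(-1, 1, (8000, 2))
inner = inner[(inner**2).sum(axis=1) <= 1]     # inner sample of the disc

box = rng.uniform(-1.5, 1.5, (100000, 2))      # Monte Carlo points, box area 9
dist, _ = cKDTree(inner).query(box)            # distance to the inner sample

ts = np.linspace(0.05, 0.3, 6)
V = np.array([(dist <= t).mean() * 9.0 for t in ts])   # estimated V(t)
c = np.polynomial.polynomial.polyfit(ts, V, 2)
print(c[0], c[1])     # roughly pi (area) and 2*pi (boundary measure)
```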