The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. We introduce a framework for extending techniques of multivariate analysis to such settings. The proposed framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as B-splines. This is formally identical to the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We describe how to translate the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering to the continuous-time setting. We illustrate the methods with a novel perspective on a well-known Canadian weather data set, and with applications to neurobiological and environmetric data. The methods are implemented in the publicly available R package \texttt{ctmva}.
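To make the construction concrete, the following is a minimal numpy/scipy sketch, not the \texttt{ctmva} API, of how a continuous-time covariance matrix and PCA can be computed once each of the $p$ variables is expanded in a common B-spline basis (the basis size, grid, and random coefficients here are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
T, K, deg = 10.0, 12, 3

# Clamped knot vector giving K cubic B-spline basis functions on [0, T]
knots = np.concatenate((np.repeat(0.0, deg),
                        np.linspace(0.0, T, K - deg + 1),
                        np.repeat(T, deg)))
t = np.linspace(0.0, T, 2000, endpoint=False) + T / 4000   # midpoint grid
dt = T / 2000
Phi = np.column_stack([BSpline(knots, np.eye(K)[k], deg)(t)
                       for k in range(K)])                 # (grid, K)

a = Phi.sum(axis=0) * dt        # a_k  = integral of phi_k(t) dt
G = Phi.T @ Phi * dt            # G_kl = integral of phi_k(t) phi_l(t) dt

# p variables, each a curve x_j(t) = sum_k C[j, k] phi_k(t)
p = 4
C = rng.standard_normal((p, K))
mu = (C @ a) / T                                    # time means of each variable
Sigma = (C @ G @ C.T) / T - np.outer(mu, mu)        # continuous-time covariance

# Continuous-time PCA: eigenstructure of Sigma, time playing the role of n
evals, evecs = np.linalg.eigh(Sigma)
print("principal component variances:", evals[::-1])
```

The key point is that all integrals over time reduce to quadratic forms in the basis coefficients via the Gram matrix $G$, so the "infinite $n$" never has to be discretized back into rows.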
Simulating physical problems that couple processes on multiple time scales is challenging because these processes must be solved simultaneously. In response to this challenge, this paper proposes an explicit multi-time step algorithm coupled with a solid dynamic relaxation scheme. The explicit scheme simplifies the equation system relative to an implicit scheme, while the multi-time step algorithm allows the equations of different physical processes to be solved with different time step sizes. Furthermore, an implicit viscous damping relaxation technique is applied to significantly reduce the number of iterations required to reach equilibrium in the comparatively fast solid response process. To validate the accuracy and efficiency of the proposed algorithm, two distinct scenarios, i.e., a nonlinear hardening bar under stretching and fluid diffusion coupled with Nafion membrane flexure, are simulated. The results show good agreement with experimental data and with results from other numerical methods, and the simulation time is reduced, first by addressing the different processes independently with the multi-time step algorithm, and second by shortening the solid dynamic relaxation phase through the incorporation of damping.
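The following toy Python sketch illustrates the structure of such an algorithm under stated assumptions: scalar surrogates stand in for the diffusion field and the solid displacement (rather than the paper's finite element discretizations), the slow process takes a large explicit step $\Delta t$, the fast solid response is subcycled with $\delta t = \Delta t/m$, and a viscous damping force $c\,v$ plays the role of the dynamic relaxation:

```python
import numpy as np

# Toy sketch of explicit multi-time-step coupling: a slow diffusion
# variable phi takes a large step DT while the fast solid response (u, v)
# is subcycled with dt = DT / m and driven toward equilibrium by an added
# viscous damping force c * v (the dynamic relaxation).

DT, n_outer = 1e-2, 100              # large step for the slow process
m = 50                               # subcycles per slow step
dt = DT / m
k_stiff, mass, c = 100.0, 1.0, 15.0  # toy stiffness, mass, damping

phi = 1.0                            # diffusion variable (scalar surrogate)
u = v = 0.0                          # solid displacement and velocity

for _ in range(n_outer):
    phi += DT * (-0.5 * phi)         # explicit update: d(phi)/dt = -0.5 phi
    load = 0.2 * phi                 # one-way coupling: diffusion loads solid
    for _ in range(m):               # subcycle damped solid dynamics
        accel = (load - k_stiff * u - c * v) / mass
        v += dt * accel              # semi-implicit Euler update
        u += dt * v

print(f"phi = {phi:.4f}  u = {u:.6f}  static u = {0.2 * phi / k_stiff:.6f}")
```

Without the damping term the inner loop would oscillate about equilibrium indefinitely; with it, far fewer subcycles are needed to reach the quasi-static solid state, which is the source of the second speedup claimed above.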
For a singular integral equation on an interval of the real line, we study the behavior of the error of a delta-delta discretization. We show that the convergence is non-uniform: it is of order $O(h^{2})$ in the interior of the interval, while in a boundary layer near the endpoints the consistency error does not tend to zero.
Multiscale stochastic dynamical systems have been widely adopted in scientific and engineering problems because of their capability to depict complex phenomena in many real-world applications. This work is devoted to investigating the effective reduced dynamics of a slow-fast stochastic dynamical system. Given short-term observation data satisfying some unknown slow-fast stochastic system, we propose a novel algorithm, built around a neural network called Auto-SDE, to learn the invariant slow manifold. Our approach captures the evolutionary nature of the data through a series of time-dependent autoencoder neural networks, with the loss constructed from a discretized stochastic differential equation. Our algorithm is also demonstrated to be accurate, stable and effective through numerical experiments under various evaluation metrics.
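A minimal PyTorch sketch of the kind of loss the abstract describes is given below; the architecture, dimensions, and the purely deterministic drift residual are assumptions of this illustration, not the paper's exact Auto-SDE:

```python
import torch
import torch.nn as nn

# Sketch: an autoencoder whose latent path is penalized to satisfy an
# Euler-Maruyama discretization of a latent SDE dz = f(z) dt + noise.
class AutoSDESketch(nn.Module):
    def __init__(self, d_obs=10, d_lat=2, width=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_obs, width), nn.Tanh(),
                                 nn.Linear(width, d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, width), nn.Tanh(),
                                 nn.Linear(width, d_obs))
        self.drift = nn.Sequential(nn.Linear(d_lat, width), nn.Tanh(),
                                   nn.Linear(width, d_lat))

    def loss(self, x, dt):
        # x: (batch, time, d_obs) short trajectories of the slow-fast system
        z = self.enc(x)
        recon = ((self.dec(z) - x) ** 2).mean()     # reconstruction term
        # Euler-Maruyama residual on the latent (slow) dynamics; the noise
        # contribution is folded into the residual in this simplified sketch
        dz = z[:, 1:] - z[:, :-1]
        sde = ((dz - dt * self.drift(z[:, :-1])) ** 2).mean()
        return recon + sde

model = AutoSDESketch()
x = torch.randn(32, 20, 10)       # synthetic stand-in for observation data
print(model.loss(x, dt=0.01))
```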
Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric and applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable, running at speeds of the same order as parametric boosted survival models, in part because its core is written in C++; it also supports GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and from www.github.com/BoXHED.
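As a point of reference, time-dependent covariates in survival data are commonly laid out in a start-stop (counting-process) format, with one row per subject per epoch over which the covariates are constant. The snippet below shows only this generic layout; the column names are illustrative assumptions, not BoXHED2.0's API:

```python
import pandas as pd

# Generic start-stop layout for time-dependent survival data; each row is
# an epoch during which the covariate value is constant, and events may
# recur within a subject. Column names here are assumptions.
df = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2],
    "t_start":   [0.0, 1.5, 3.0, 0.0, 2.0],
    "t_end":     [1.5, 3.0, 4.2, 2.0, 5.0],
    "biomarker": [0.8, 1.1, 1.9, 0.5, 0.7],   # covariate updated over time
    "event":     [0, 0, 1, 0, 0],             # 1 = event occurs at t_end
})
print(df)
```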
In 1-equation URANS models of turbulence, the eddy viscosity is given by $\nu_{T}=0.55l(x,t)\sqrt{k(x,t)}$. The length scale $l$ must be pre-specified, and $k(x,t)$ is determined by solving a nonlinear partial differential equation. We show that in interesting cases the spatial mean of $k(x,t)$ satisfies a simple ordinary differential equation. Using its solution in $\nu_{T}$ results in a 1/2-equation model. This model has attractive analytic properties. Further, in comparative tests in 2d and 3d, the velocity statistics produced by the 1/2-equation model are comparable to those of the full 1-equation model.
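To indicate how such a reduction can arise, here is a hedged sketch assuming the standard Prandtl-Kolmogorov one-equation model with $l$ constant in space (the paper's constants and boundary treatment may differ). The $k$-equation reads
\[ k_t + u\cdot\nabla k - \nabla\cdot\big((\nu+\nu_T)\nabla k\big) + \frac{k^{3/2}}{l} = \nu_T\,|\nabla^s u|^2 . \]
Averaging over the flow domain makes the transport and diffusion terms vanish (e.g., under periodic or no-slip boundary conditions), and approximating $\overline{k^{3/2}}\approx\overline{k}^{\,3/2}$ leaves the scalar ODE
\[ \frac{d\overline{k}}{dt} + \frac{\overline{k}^{\,3/2}}{l} \approx \overline{\nu_T\,|\nabla^s u|^2}, \]
whose solution can then be substituted into $\nu_{T}=0.55\,l\sqrt{\overline{k}}$, replacing the PDE solve by an ODE solve.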
Binary responses arise in a multitude of statistical problems, including binary classification, bioassay, current status data problems and sensitivity estimation. There has been interest in such problems in the Bayesian nonparametrics community since the early 1970s, but inference given binary data is intractable for a wide range of modern simulation-based models, even when MCMC methods are employed. Recently, Christensen (2023) introduced a novel simulation technique based on counting permutations, which can estimate both posterior distributions and marginal likelihoods for any model from which a random sample can be generated. However, the accompanying implementation of this technique struggles when the sample size is large ($n > 250$). Here we present perms, a new implementation of this technique that is substantially faster and able to handle larger data problems than the original implementation. It is available both as an R package and as a Python library. The basic usage of perms is illustrated via two simple examples: a tractable toy problem and a bioassay problem. A more complex example involving changepoint analysis is also considered. We also cover the details of the implementation and illustrate the computational speed gains of perms via a simple simulation study.
The nonlocality of the fractional operator causes numerical difficulties for the long-time computation of time-fractional evolution equations. This paper develops a high-order fast time-stepping discontinuous Galerkin finite element method for time-fractional diffusion equations, which saves storage and computational time. The optimal error estimate $O(N^{-p-1} + h^{m+1} + \varepsilon N^{r\alpha})$ of the current time-stepping discontinuous Galerkin method is rigorously proved, where $N$ denotes the number of time intervals, $p$ is the degree of the polynomial approximation on each time subinterval, $h$ is the maximum space step, $r\ge1$, $m$ is the order of the finite element space, and $\varepsilon>0$ can be arbitrarily small. Numerical simulations verify the theoretical analysis.
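For context, a statement of the model problem, assuming the Caputo derivative standard in this literature: the time-fractional diffusion equation is
\[ {}^{C}D_{t}^{\alpha}u(x,t) - \Delta u(x,t) = f(x,t), \qquad 0<\alpha<1, \]
where
\[ {}^{C}D_{t}^{\alpha}u(x,t) = \frac{1}{\Gamma(1-\alpha)}\int_{0}^{t}(t-s)^{-\alpha}\,\partial_{s}u(x,s)\,ds . \]
The integral over the entire solution history is the nonlocality referred to above: a naive scheme must store and revisit all past time levels, which is precisely the cost that a fast time-stepping method is designed to avoid.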
In Bayesian statistics, posterior contraction rates (PCRs) quantify the speed at which the posterior distribution concentrates on arbitrarily small neighborhoods of a true model, in a suitable way, as the sample size goes to infinity. In this paper, we develop a new approach to PCRs, with respect to strong norm distances on parameter spaces of functions. Critical to our approach is the combination of a local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, which allows us to set forth an interesting connection between PCRs and some classical problems arising in mathematical analysis, probability and statistics, e.g., Laplace methods for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of mean Glivenko-Cantelli theorems, and estimates of weighted Poincar\'e-Wirtinger constants. We first present a theorem on PCRs for a model in the regular infinite-dimensional exponential family, which exploits sufficient statistics of the model, and then extend such a theorem to a general dominated model. These results rely on the development of novel techniques to evaluate Laplace integrals and weighted Poincar\'e-Wirtinger constants in infinite dimensions, which are of independent interest. The proposed approach is applied to the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our approach leads to optimal PCRs in finite-dimensional models, whereas for infinite-dimensional models it is shown explicitly how the prior distribution affects PCRs.
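For reference, the standard formalization that results of this kind refine: a sequence $\varepsilon_{n}\to0$ is a posterior contraction rate at the true parameter $\theta_{0}$, with respect to a distance $d$, if
\[ \mathbb{E}_{\theta_{0}}\Big[\Pi_{n}\big(\theta : d(\theta,\theta_{0}) > M_{n}\varepsilon_{n} \mid X_{1},\dots,X_{n}\big)\Big] \longrightarrow 0 \]
for every sequence $M_{n}\to\infty$, where $\Pi_{n}(\cdot \mid X_{1},\dots,X_{n})$ denotes the posterior distribution given $n$ observations.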
Recently, quantum computing experiments have for the first time exceeded the capability of classical computers to perform certain computations -- a milestone termed "quantum computational advantage." However, verifying the output of the quantum device in these experiments required extremely large classical computations. An exciting next step for demonstrating quantum capability would be to implement tests of quantum computational advantage with efficient classical verification, such that larger system sizes can be tested and verified. One of the first proposals for an efficiently verifiable test of quantumness consists of hiding a secret classical bitstring inside a circuit of the class IQP, in such a way that samples from the circuit's output distribution are correlated with the secret (arXiv:0809.0847). The classical hardness of this protocol has been supported by evidence that directly simulating IQP circuits is hard, but the security of the protocol against other (non-simulating) classical attacks has remained an open question. In this work we demonstrate that the protocol is not secure against classical forgery. We describe a classical algorithm that can not only convince the verifier that the (classical) prover is quantum, but can in fact extract the secret key underlying a given protocol instance. Furthermore, we show that the key extraction algorithm is efficient in practice for problem sizes of hundreds of qubits. Finally, we provide an implementation of the algorithm, and give the secret vector underlying the "$25 challenge" posted online by the authors of the original paper.
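The correlation test at the heart of the protocol is simple to state: the verifier accepts if the parity of the prover's samples with respect to the secret $s$ is biased away from $1/2$ (for the quadratic-residue-code construction of the original paper, the honest quantum bias is $\cos^{2}(\pi/8)\approx0.854$). The sketch below shows this check; the acceptance threshold and the parity convention are assumptions of the illustration:

```python
import numpy as np

# Verifier's check (sketch): accept if the parity x . s (mod 2) of the
# prover's samples is biased away from 1/2. Threshold is an assumption.
def verify(samples: np.ndarray, s: np.ndarray, threshold: float = 0.80) -> bool:
    parity = (samples @ s) % 2                  # inner products mod 2
    bias = abs(float(np.mean(parity)) - 0.5) + 0.5
    return bias >= threshold

rng = np.random.default_rng(1)
n = 200                                         # problem size in bits
s = rng.integers(0, 2, n)
uncorrelated = rng.integers(0, 2, (1000, n))    # a prover with no information
print(verify(uncorrelated, s))                  # bias ~ 0.5: rejected (False)
```

The classical attack described in the abstract defeats exactly this check: once the forger has extracted $s$ itself, it can output bitstrings with any desired bias.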
Deep learning is usually described as an experiment-driven field and is under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamical systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamical systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns regarding ethics and security and their relationship with generalizability.
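As an illustration of group (2), one standard continuous-time surrogate in this literature replaces SGD with learning rate $\eta$ by the stochastic differential equation
\[ d\theta_{t} = -\nabla L(\theta_{t})\,dt + \sqrt{\eta}\,\Sigma(\theta_{t})^{1/2}\,dW_{t}, \]
where $L$ is the training loss and $\Sigma$ is the covariance of the minibatch gradient noise; the surveyed works study how the dynamics and invariant measures of such equations bear on both optimization and generalization.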