In this paper, we investigate the almost sure convergence, in supremum norm, of the rank-based linear wavelet estimator for a multivariate copula density. Based on empirical process tools, we prove a uniform limit law for the deviation, from its expectation, of an oracle estimator (obtained for known margins), from which we derive the exact convergence rate of the rank-based linear estimator. This rate reveals to be optimal in a minimax sense over Besov balls for the supremum norm loss, whenever the resolution level is suitably chosen.
Gaussian copula mixture models (GCMM) are the generalization of Gaussian Mixture models using the concept of copula. Its mathematical definition is given and the properties of likelihood function are studied in this paper. Based on these properties, extended Expectation Maximum algorithms are developed for estimating parameters for the mixture of copulas while marginal distributions corresponding to each component is estimated using separate nonparametric statistical methods. In the experiment, GCMM can achieve better goodness-of-fitting given the same number of clusters as GMM; furthermore, GCMM can utilize unsynchronized data on each dimension to achieve deeper mining of data.
This work studies an experimental design problem where {the values of a predictor variable, denoted by $x$}, are to be determined with the goal of estimating a function $m(x)$, which is observed with noise. A linear model is fitted to $m(x)$ but it is not assumed that the model is correctly specified. It follows that the quantity of interest is the best linear approximation of $m(x)$, which is denoted by $\ell(x)$. It is shown that in this framework the ordinary least squares estimator typically leads to an inconsistent estimation of $\ell(x)$, and rather weighted least squares should be considered. An asymptotic minimax criterion is formulated for this estimator, and a design that minimizes the criterion is constructed. An important feature of this problem is that the $x$'s should be random, rather than fixed. Otherwise, the minimax risk is infinite. It is shown that the optimal random minimax design is different from its deterministic counterpart, which was studied previously, and a simulation study indicates that it generally performs better when $m(x)$ is a quadratic or a cubic function. Another finding is that when the variance of the noise goes to infinity, the random and deterministic minimax designs coincide. The results are illustrated for polynomial regression models and the general case is also discussed.
An inner-product Hilbert space formulation of the Kemeny distance is defined over the domain of all permutations with ties upon the extended real line, and results in an unbiased minimum variance (Gauss-Markov) correlation estimator upon a homogeneous i.i.d. sample. In this work, we construct and prove the necessary requirements to extend this linear topology for both Spearman's \(\rho\) and Kendall's \(\tau_{b}\), showing both spaces to be both biased and inefficient upon practical data domains. A probability distribution is defined for the Kemeny \(\tau_{\kappa}\) estimator, and a Studentisation adjustment for finite samples is provided as well. This work allows for a general purpose linear model duality to be identified as a unique consistent solution to many biased and unbiased estimation scenarios.
Fern\'andez-Dur\'an and Gregorio-Dom\'inguez (2014) defined a family of probability distributions for a vector of circular random variables by considering multiple nonnegative trigonometric sums. These distributions are highly flexible and can present numerous modes and skewness. Several operations on these multivariate distributions were translated into operations on the vector of parameters; for instance, marginalization involves calculating the eigenvectors and eigenvalues of a matrix, and independence among subsets of the vector of circular variables translates to a Kronecker product of the corresponding subsets of the vector of parameters. The derivation of marginal and conditional densities from the joint multivariate density is important when applying this model in practice to real datasets. A goodness-of-fit test based on the characteristic function and an alternative parameter estimation algorithm for high-dimensional circular data was presented and applied to a real dataset on the daily times of occurrence of maxima and minima of prices in financial markets.
We characterize the uniqueness condition in the hardcore model for bipartite graphs with degree bounds only on one side, and provide a nearly linear time sampling algorithm that works up to the uniqueness threshold. We show that the uniqueness threshold for bipartite graph has almost the same form of the tree uniqueness threshold for general graphs, except with degree bounds only on one side of the bipartition. The hardcore model from statistical physics can be seen as a weighted enumeration of independent sets. Its bipartite version (#BIS) is a central open problem in approximate counting. Compared to the same problem in a general graph, surprising tractable regime have been identified that are believed to be hard in general. This is made possible by two lines of algorithmic approach: the high-temperature algorithms starting from Liu and Lu (STOC 2015), and the low-temperature algorithms starting from Helmuth, Perkins, and Regts (STOC 2019). In this work, we study the limit of these algorithms in the high-temperature case. Our characterization of the uniqueness condition is obtained by proving decay of correlations for arguably the best possible regime, which involves locating fixpoints of multivariate iterative rational maps and showing their contraction. We also give a nearly linear time sampling algorithm based on simulating field dynamics only on one side of the bipartite graph that works up to the uniqueness threshold. Our algorithm is very different from the original high-temperature algorithm of Liu and Lu, and it makes use of a connection between correlation decay and spectral independence of Markov chains. Last but not the least, we are able to show that the standard Glauber dynamics on both side of the bipartite graph mixes in polynomial time up to the uniqueness.
In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.
Chernoff approximations are a flexible and powerful tool of functional analysis, which can be used, in particular, to find numerically approximate solutions of some differential equations with variable coefficients. For many classes of equations such approximations have already been constructed since pioneering papers of Prof. O.G.Somlyanov in 2000, however, the speed of their convergence to the exact solution has not been properly studied. We select the heat equation (because its exact solutions are already known) as a simple yet informative model example for the study of the rate of convergence of Chernoff approximations. Examples illustrating the rate of convergence of Chernoff approximations to the solution of the Cauchy problem for the heat equation are constructed in the paper. Numerically we show that for initial conditions that are smooth enough the order of approximation is equal to the order of Chernoff tangency of the Chernoff function used. We also consider not smooth enough initial conditions and show how H\"older class of initial condition is related to the rate of convergence. This method of study in the future can be applied to general second order parabolic equation with variable coefficients by a slight modification of our Python 3 code. This arXiv version of the text is a supplementary material for our journal article. Here we include all the written text from the article and additionally all illustrations (Appendix A) and full text of the Python 3 code (Appendix B).
We study the sharp interface limit of the stochastic Cahn-Hilliard equation with cubic double-well potential and additive space-time white noise $\epsilon^{\sigma}\dot{W}$ where $\epsilon>0$ is an interfacial width parameter. We prove that, for sufficiently large scaling constant $\sigma >0$, the stochastic Cahn-Hilliard equation converges to the deterministic Mullins-Sekerka/Hele-Shaw problem for $\epsilon\rightarrow 0$. The convergence is shown in suitable fractional Sobolev norms as well as in the $L^p$-norm for $p\in (2, 4]$ in spatial dimension $d=2,3$. This generalizes the existing result for the space-time white noise to dimension $d=3$ and improves the existing results for smooth noise, which were so far limited to $p\in \left(2, frac{2d+8}{d+2}\right]$ in spatial dimension $d=2,3$. As a byproduct of the analysis of the stochastic problem with space-time white noise, we identify minimal regularity requirements on the noise which allow convergence to the sharp interface limit in the $\mathbb{H}^1$-norm and also provide improved convergence estimates for the sharp interface limit of the deterministic problem.
According to ICH Q8 guidelines, the biopharmaceutical manufacturer submits a design space (DS) definition as part of the regulatory approval application, in which case process parameter (PP) deviations within this space are not considered a change and do not trigger a regulatory post approval procedure. A DS can be described by non-linear PP ranges, i.e., the range of one PP conditioned on specific values of another. However, independent PP ranges (linear combinations) are often preferred in biopharmaceutical manufacturing due to their operation simplicity. While some statistical software supports the calculation of a DS comprised of linear combinations, such methods are generally based on discretizing the parameter space - an approach that scales poorly as the number of PPs increases. Here, we introduce a novel method for finding linear PP combinations using a numeric optimizer to calculate the largest design space within the parameter space that results in critical quality attribute (CQA) boundaries within acceptance criteria, predicted by a regression model. A precomputed approximation of tolerance intervals is used in inequality constraints to facilitate fast evaluations of this boundary using a single matrix multiplication. Correctness of the method was validated against different ground truths with known design spaces. Compared to stateof-the-art, grid-based approaches, the optimizer-based procedure is more accurate, generally yields a larger DS and enables the calculation in higher dimensions. Furthermore, a proposed weighting scheme can be used to favor certain PPs over others and therefore enabling a more dynamic approach to DS definition and exploration. The increased PP ranges of the larger DS provide greater operational flexibility for biopharmaceutical manufacturers.
This paper investigates the mean square error (MSE)-optimal conditional mean estimator (CME) in one-bit quantized systems in the context of channel estimation with jointly Gaussian inputs. We analyze the relationship of the generally nonlinear CME to the linear Bussgang estimator, a well-known method based on Bussgang's theorem. We highlight a novel observation that the Bussgang estimator is equal to the CME for different special cases, including the case of univariate Gaussian inputs and the case of multiple pilot signals in the absence of additive noise prior to the quantization. For the general cases we conduct numerical simulations to quantify the gap between the Bussgang estimator and the CME. This gap increases for higher dimensions and longer pilot sequences. We propose an optimal pilot sequence, motivated by insights from the CME, and derive a novel closed-form expression of the MSE for that case. Afterwards, we find a closed-form limit of the MSE in the asymptotically large number of pilots regime that also holds for the Bussgang estimator. Lastly, we present numerical experiments for various system parameters and for different performance metrics which illuminate the behavior of the optimal channel estimator in the quantized regime. In this context, the well-known stochastic resonance effect that appears in quantized systems can be quantified.