This note reports partial results related to the Gaussian product inequality (GPI) conjecture for the joint distribution of traces of Wishart matrices. In particular, several GPI-related results from Wei (2014) and Liu et al. (2015) are extended in two ways: by replacing the power functions with more general classes of functions, and by replacing the usual Gaussian and multivariate gamma distributional assumptions with the more general trace-Wishart distribution assumption. These findings suggest that a Kronecker product form of the GPI holds for diagonal blocks of any Wishart distribution.
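For context, the classical GPI conjecture, whose trace-Wishart analogues the note extends, states that for any centered Gaussian vector $(X_1,\dots,X_n)$ and nonnegative integers $m_1,\dots,m_n$,
\[
\mathbb{E}\Bigl[\prod_{i=1}^{n} X_i^{2m_i}\Bigr] \;\ge\; \prod_{i=1}^{n} \mathbb{E}\bigl[X_i^{2m_i}\bigr],
\]
and the extensions described above replace the power functions $x\mapsto x^{2m_i}$ by broader function classes.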
Consider the sum $Y=B+B(H)$ of a Brownian motion $B$ and an independent fractional Brownian motion $B(H)$ with Hurst parameter $H\in(0,1)$. Surprisingly, even though $B(H)$ is not a semimartingale, Cheridito proved in [Bernoulli 7 (2001) 913--934] that $Y$ is a semimartingale if $H>3/4$. Moreover, $Y$ is locally equivalent to $B$ in this case, so $H$ cannot be consistently estimated from local observations of $Y$. This paper pivots on a second surprise in this model: if $B$ and $B(H)$ become correlated, then $Y$ will never be a semimartingale, and $H$ can be identified, regardless of its value. This and other results will follow from a detailed statistical analysis of a more general class of processes called mixed semimartingales, which are semiparametric extensions of $Y$ with stochastic volatility in both the martingale and the fractional component. In particular, we derive consistent estimators and feasible central limit theorems for all parameters and processes that can be identified from high-frequency observations. We further show that our estimators achieve optimal rates in a minimax sense. The estimation of mixed semimartingales with correlation is motivated by applications to high-frequency financial data contaminated by rough noise.
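A minimal simulation sketch of the mixed process (all parameter values and the correlation mechanism are illustrative choices, not the paper's construction; exact fBm via Cholesky factorization of its covariance):

```python
import numpy as np

def mixed_fbm_path(n=500, T=1.0, H=0.6, rho=0.5, seed=0):
    """Sample Y = B + B^H on a grid, with B correlated with the noise
    driving B^H through a shared Gaussian factor (illustrative only)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)
    # fBm covariance: Cov(B^H_s, B^H_t) = (s^{2H} + t^{2H} - |s - t|^{2H}) / 2
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    z = rng.standard_normal(n)
    fbm = np.linalg.cholesky(cov + 1e-12 * np.eye(n)) @ z  # exact fBm values
    # Brownian increments sharing the factor z, so B and B^H are correlated
    w = rho * z + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    bm = np.sqrt(T / n) * np.cumsum(w)
    return t, bm + fbm

t, y = mixed_fbm_path()
print(y[:5])
```

For $H>3/4$ and zero correlation, sampled paths of $Y$ are locally indistinguishable from Brownian paths, which is exactly the identifiability obstruction the abstract describes.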
Many proposals for the identification of causal effects in the presence of unmeasured confounding require an instrumental variable or negative control that satisfies strong, untestable exclusion restrictions. In this paper, we will instead show how one can identify causal effects for a point exposure by using a measured confounder as a 'bespoke instrumental variable'. This strategy requires an external reference population that does not have access to the exposure, and a stability condition on the confounder-outcome association between reference and target populations. Building on recent identification results of Richardson and Tchetgen Tchetgen (2021), we develop the semiparametric efficiency theory for a general bespoke instrumental variable model, and obtain a multiply robust locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study, concerning atomic bomb survivors in Japan.
Mark-point dependence plays a critical role in research problems that can be fitted into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators ignoring the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function and establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic based on local linear estimators for mark-point independence, which is shown to converge to a normal distribution at the parametric $\sqrt{n}$ rate. Model diagnostic tools are developed for key model assumptions, and a robust functional permutation test is proposed for a more general class of marked point processes. The effectiveness of the proposed methods is demonstrated using extensive simulations and applications to two real data examples.
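As a hedged illustration of the permutation-testing idea only (the statistic below is a crude stand-in, not the paper's local linear or functional statistic):

```python
import numpy as np

def mark_point_permutation_test(points, marks, n_perm=999, seed=0):
    """Permutation test for mark-point independence on one replicate.

    Statistic: |correlation| between each point's mark and a local
    intensity proxy (inverse nearest-neighbour distance).
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    d = np.abs(pts[:, None] - pts[None, :])
    np.fill_diagonal(d, np.inf)
    density = 1.0 / d.min(axis=1)               # crude local intensity proxy
    obs = abs(np.corrcoef(density, marks)[0, 1])
    null = np.array([abs(np.corrcoef(density, rng.permutation(marks))[0, 1])
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (1 + n_perm)  # permutation p-value

rng = np.random.default_rng(1)
pts = np.sort(rng.uniform(0, 1, 80))
marks = rng.normal(0, 1, 80)      # marks independent of the points
print(mark_point_permutation_test(pts, marks))   # large p-value expected
```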
Importance sampling (IS) is valuable in reducing the variance of Monte Carlo sampling in many areas, including finance, rare event simulation, and Bayesian inference. It is natural to combine quasi-Monte Carlo (QMC) methods with IS to achieve a faster rate of convergence. However, naively replacing Monte Carlo with QMC may not work well. This paper investigates the convergence rates of randomized QMC-based IS for estimating integrals with respect to a Gaussian measure, where the IS measure is a Gaussian or $t$ distribution. We prove that if the target function satisfies the so-called boundary growth condition and the covariance matrix of the IS density has eigenvalues no smaller than 1, then randomized QMC with the Gaussian proposal has a root mean squared error of $O(N^{-1+\epsilon})$ for arbitrarily small $\epsilon>0$. Similar results are established for the $t$ distribution as the proposal. These sufficient conditions help to assess the effectiveness of IS in QMC. For some particular applications, we find that Laplace IS, a very general approach that approximates the target function by a quadratic Taylor expansion around its mode, yields an IS covariance matrix with eigenvalues smaller than 1, making the resulting integrand less favorable for QMC. From this point of view, when using Gaussian distributions as the IS proposal, a change of measure via Laplace IS may transform a favorable integrand into an unfavorable one for QMC, even though the variance of Monte Carlo sampling is reduced. We also give examples to verify our propositions and to warn against naively replacing MC with QMC under IS proposals. Numerical results suggest that Laplace IS with $t$ distributions is more robust than with Gaussian distributions.
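A minimal sketch of the favorable regime (Gaussian proposal with covariance eigenvalues $\ge 1$), using scrambled Sobol' points from scipy; the target function is a toy choice with a known answer:

```python
import numpy as np
from scipy.stats import norm, qmc

def rqmc_is_estimate(f, dim=4, m=10, sigma=1.5, n_rep=16, seed=0):
    """Estimate E[f(X)], X ~ N(0, I), by IS from N(0, sigma^2 I),
    with randomized QMC (scrambled Sobol') driving the proposal."""
    est = []
    for r in range(n_rep):
        u = qmc.Sobol(d=dim, scramble=True, seed=seed + r).random_base2(m=m)
        x = sigma * norm.ppf(u)                       # proposal samples
        # likelihood ratio N(0, I) / N(0, sigma^2 I)
        lr = sigma**dim * np.exp(-0.5 * np.sum(x**2, axis=1) * (1 - 1 / sigma**2))
        est.append(np.mean(f(x) * lr))
    est = np.array(est)
    return est.mean(), est.std(ddof=1) / np.sqrt(n_rep)

# E[exp(x_1 + ... + x_4)] = e^2 under N(0, I_4)
mean, se = rqmc_is_estimate(lambda x: np.exp(x.sum(axis=1)))
print(mean, se, np.exp(2.0))
```

Taking sigma < 1 in this sketch reproduces the unfavorable regime: the likelihood ratio then grows in the tails and the boundary growth condition degrades.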
Groundwater cannot be excluded from any sound environmental protection system. In addition to over-exploitation, which is at odds with the concept of sustainable development, another non-negligible issue is groundwater contamination, mainly due to intensive agricultural activities or industrialized areas. In the literature, several papers have dealt with the transport problem, especially inverse problems in which the release history or the source location is identified. The innovative aim of this paper is to develop a data-driven model able to analyze multiple scenarios, even strongly non-linear ones, in order to solve forward and inverse transport problems while preserving the reliability of the results and reducing the uncertainty. Furthermore, this tool provides extremely fast responses, which is essential for identifying remediation strategies immediately. The advantages produced by the model are compared with literature studies. The data-driven model is a feedforward artificial neural network trained to handle different cases: first, to identify the concentration of the pollutant at specific observation points in the study area (forward problem); second, to deal with inverse problems by identifying the release history at a known source location; and then, in the case of one contaminant source, to identify the release history and, at the same time, the location of the source in a specific sub-domain of the investigated area. Finally, the observation error is investigated and estimated. The results are satisfactory, highlighting the capability of the ANN to handle multiple scenarios by approximating nonlinear functions without a physics-based description of the phenomenon, providing reliable results with very low computational burden and uncertainty.
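A minimal sketch of the forward-problem surrogate idea on synthetic data (all shapes, the response map, and hyperparameters are illustrative; the paper trains on transport-simulation data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in: map a release history (10 time steps at the source)
# to pollutant concentrations at 5 observation points.
rng = np.random.default_rng(0)
n, n_release, n_obs = 2000, 10, 5
X = rng.uniform(0, 1, size=(n, n_release))               # release histories
A = rng.uniform(0, 1, size=(n_release, n_obs))           # toy transport map
Y = X @ A + 0.1 * np.tanh(X).sum(axis=1, keepdims=True)  # mildly nonlinear

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X_tr, Y_tr)
print("test R^2:", net.score(X_te, Y_te))
# The inverse problem swaps the roles of X and Y: train a network from
# observed concentrations back to the release history.
```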
A new decomposition method for nonstationary signals, named Adaptive Local Iterative Filtering (ALIF), has recently been proposed in the literature. Given its similarity with the Empirical Mode Decomposition (EMD) and its more rigorous mathematical structure, which, unlike EMD, makes it feasible to study its convergence, ALIF has strong potential to become a reference method for the analysis of signals containing strong nonstationary components, such as chirps, multipaths, and whistles, in many applications, including physics, engineering, medicine, and finance, to name a few. In [11], the authors analyzed the spectral properties of the matrices produced by the ALIF method in order to study its stability. Various results are achieved in that work through the theory of Generalized Locally Toeplitz (GLT) sequences, a powerful tool originally designed to extract information on the asymptotic behavior of the spectra of PDE discretization matrices. In this manuscript we focus on answering some of the open questions posed in [11], and in doing so we also develop new theory and results for GLT sequences.
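A minimal sketch of one (non-adaptive) iterative-filtering step, to convey the averaging operator whose matrices [11] analyzes; ALIF itself varies the mask length locally, which this fixed-mask version does not:

```python
import numpy as np

def iterative_filtering_imf(signal, mask_len=51, n_iter=10):
    """Extract a first IMF by repeated subtraction of a local average:
    s <- s - L(s). Triangular mask, whose frequency response lies in
    [0, 1], as convergence of iterative filtering requires."""
    s = np.asarray(signal, dtype=float).copy()
    w = np.bartlett(mask_len)
    w /= w.sum()
    for _ in range(n_iter):
        s = s - np.convolve(s, w, mode="same")   # remove the local mean
    return s                                      # fast oscillating part

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 3 * t)  # fast + slow
imf = iterative_filtering_imf(x)
print(np.corrcoef(imf, np.sin(2 * np.pi * 40 * t))[0, 1])   # close to 1
```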
Measurement error in the covariate of main interest (e.g. the exposure variable or the risk factor) is common in epidemiologic and health studies. It can affect the relative risk estimator or other coefficients derived from the fitted regression model. In order to perform a measurement error analysis, one needs information about the error structure. Two sources of validation data are an internal subset of the main data and an external or independent study. For both sources, either the true covariate is measured (that is, without error), or its surrogate, an error-prone covariate, is measured several times (repeated measures). This paper compares the precision of estimation under the different validation sources in the Cox model with a changepoint in the main covariate, using the bias correction methods RC and RR. The theoretical properties under each validation source are presented. In a simulation study, it is found that the best validation source, in terms of smaller mean squared error and narrower confidence interval, is internal validation with measurement of the true covariate in the common-disease case, and external validation with repeated measures of the surrogate in the rare-disease case. In addition, it is found that accounting for the correlation between the true covariate and its surrogate, and for the value of the changepoint, is needed, especially in the rare-disease case.
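A minimal sketch of the RC (regression calibration) idea with internal validation (synthetic data and column names are illustrative; `lifelines` is assumed for the Cox fit, and the changepoint aspect is omitted):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0, 1, n)                         # true covariate
w = x + rng.normal(0, 0.8, n)                   # error-prone surrogate
t_event = rng.exponential(1 / np.exp(0.5 * x))  # hazard increases with x
c = rng.exponential(2.0, n)                     # censoring times
df = pd.DataFrame({"time": np.minimum(t_event, c),
                   "event": (t_event <= c).astype(int)})

val = rng.random(n) < 0.2           # internal validation subset: x observed
beta = np.polyfit(w[val], x[val], 1)            # calibration model E[x | w]
df["x_hat"] = np.where(val, x, np.polyval(beta, w))

CoxPHFitter().fit(df, duration_col="time", event_col="event").print_summary()
```

Naively fitting on `w` instead of `x_hat` attenuates the log hazard ratio toward zero; RC corrects much of that bias at the price of relying on the calibration model.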
Historically used in settings where the outcome is rare or data collection is expensive, outcome-dependent sampling is relevant to many modern settings where data is readily available for a biased sample of the target population, such as public administrative data. Under outcome-dependent sampling, common effect measures such as the average risk difference and the average risk ratio are not identified, but the conditional odds ratio is. Aggregation of the conditional odds ratio is challenging since summary measures are generally not identified. Furthermore, the marginal odds ratio can be larger (or smaller) than all conditional odds ratios. This so-called non-collapsibility of the odds ratio is avoidable if we use an alternative aggregation to the standard arithmetic mean. We provide a new definition of collapsibility that makes this choice of aggregation method explicit, and we demonstrate that the odds ratio is collapsible under geometric aggregation. We describe how to partially identify, estimate, and do inference on the geometric odds ratio under outcome-dependent sampling. Our proposed estimator is based on the efficient influence function and therefore has doubly robust-style properties.
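A small numeric illustration of the aggregation point (odds ratios and weights are made up for illustration):

```python
import numpy as np

cond_or = np.array([2.0, 8.0])     # conditional odds ratios in two strata
weights = np.array([0.5, 0.5])     # stratum proportions

arith = np.sum(weights * cond_or)                  # arithmetic mean: 5.0
geom = np.exp(np.sum(weights * np.log(cond_or)))   # geometric mean:  4.0
print(arith, geom)
```

A weighted geometric mean always lies between the smallest and largest conditional odds ratios, which is the collapsibility property that the arithmetic aggregation can fail to deliver for the marginal odds ratio.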
In this paper, we consider a class of symmetry groups associated to communication channels, which can informally be viewed as the transformations of the set of inputs that ``commute'' with the action of the channel. These groups were first studied by Polyanskiy (IEEE Trans. Inf. Theory, 2013). We show the simple result that the input distribution that attains the maximum mutual information for a given channel is a ``fixed point'' of its group. We conjecture (and give empirical evidence) that the channel group of the deletion channel is extremely small (it contains a number of elements that is constant in the blocklength). We prove a special case of this conjecture. This serves as some formal justification for why the analysis of the binary deletion channel has proved much more difficult than its memoryless counterparts.
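A small sketch of the fixed-point observation for a channel whose group is easy to see, the binary symmetric channel, where the bit flip commutes with the channel (Blahut--Arimoto used to find the maximizing input; this is an illustration, not the deletion-channel setting):

```python
import numpy as np

def blahut_arimoto(W, n_iter=200):
    """Capacity-achieving input for channel matrix W[x, y] = P(y | x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(n_iter):
        q = p @ W                                  # induced output law
        d = np.sum(W * np.log(W / q), axis=1)      # D( W(.|x) || q )
        p *= np.exp(d)
        p /= p.sum()
    return p

eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])     # BSC(0.1)
print(blahut_arimoto(W))   # ~[0.5, 0.5]: invariant under the bit flip,
                           # i.e. a fixed point of the channel group
```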
In a series of recent papers, the spectral behavior of the matrix sequence $\{Y_nT_n(f)\}$ has been studied in the sense of the spectral distribution, where $Y_n$ is the flip (exchange) matrix, with ones on the main antidiagonal, and $T_n(f)$ is the Toeplitz matrix generated by the function $f$, with $f$ being Lebesgue integrable and with real Fourier coefficients. This kind of study is also motivated by computational purposes, namely the solution of the related large linear systems using the (preconditioned) MINRES algorithm. Here we complement the spectral study with further results holding both asymptotically and for fixed dimension $n$, regarding the eigenvalues, singular values, and eigenvectors of $T_n(f)$ and $Y_nT_n(f)$ and several relationships among them: besides fast linear solvers, a further target is the design of ad hoc procedures for the computation of the related spectra via matrix-less algorithms, with cost linear in the number of computed eigenvalues. We emphasize that the current work tackles the challenging case of non-monotone generating functions, for which previous matrix-less algorithms fail. Numerical experiments are reported and discussed, with the aim of illustrating the theoretical analysis visually.
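A small sketch of the symmetrization behind the MINRES motivation (generating function chosen for illustration): real Fourier coefficients make $T_n(f)$ persymmetric, hence $Y_nT_n(f)$ symmetric:

```python
import numpy as np
from scipy.linalg import toeplitz

n = 6
# T_n(f) for hat_f_{-1} = 1, hat_f_0 = 2, hat_f_1 = 3 (real, nonsymmetric)
col = np.r_[2.0, 3.0, np.zeros(n - 2)]    # hat_f_j,    j = 0, 1, ...
row = np.r_[2.0, 1.0, np.zeros(n - 2)]    # hat_f_{-j}, j = 0, 1, ...
T = toeplitz(col, row)
Y = np.fliplr(np.eye(n))                  # flip (exchange) matrix

print(np.allclose(T, T.T))                # False: T is not symmetric
print(np.allclose(Y @ T, (Y @ T).T))      # True: Y T is symmetric (Hankel)
# Persymmetry of Toeplitz matrices (Y T^T Y = T) gives (YT)^T = YT, so
# MINRES applies to the flipped system even when T itself is nonsymmetric.
```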