Given $n$ independent and identically distributed observations and measuring the value of obtaining an additional observation in terms of Le Cam's notion of deficiency between experiments, we show for certain types of non-parametric experiments that the value of an additional observation decreases at a rate of $1/\sqrt{n}$. This is distinct from the known typical decrease at a rate of $1/n$ for parametric experiments and the non-decreasing value in the case of very large experiments. In particular, the rate of $1/\sqrt{n}$ holds for the experiment given by observing samples from a density about which we know only that it is bounded from below by some fixed constant. Thus there exists an experiment where the value of additional observations tends to zero but for which no estimator that is consistent (in total variation distance) exists.
We show that the problem of counting the number of $n$-variable unate functions reduces to the problem of counting the number of $n$-variable monotone functions. Using recently obtained results on $n$-variable monotone functions, we obtain counts of $n$-variable unate functions up to $n=9$. We use an enumeration strategy to obtain the number of $n$-variable balanced monotone functions up to $n=7$. We show that the problem of counting the number of $n$-variable balanced unate functions reduces to the problem of counting the number of $n$-variable balanced monotone functions, and consequently, we obtain the number of $n$-variable balanced unate functions up to $n=7$. Using enumeration, we obtain the numbers of equivalence classes of $n$-variable balanced monotone functions, unate functions and balanced unate functions up to $n=6$. Further, for each of the considered sub-class of $n$-variable monotone and unate functions, we also obtain the corresponding numbers of $n$-variable non-degenerate functions.
Branching process inspired models are widely used to estimate the effective reproduction number -- a useful summary statistic describing an infectious disease outbreak -- using counts of new cases. Case data is a real-time indicator of changes in the reproduction number, but is challenging to work with because cases fluctuate due to factors unrelated to the number of new infections. We develop a new model that incorporates the number of diagnostic tests as a surveillance model covariate. Using simulated data and data from the SARS-CoV-2 pandemic in California, we demonstrate that incorporating tests leads to improved performance over the state-of-the-art.
I propose an alternative algorithm to compute the MMS voting rule. Instead of using linear programming, in this new algorithm the maximin support value of a committee is computed using a sequence of maximum flow problems.
We investigate a class of parametric elliptic semilinear partial differential equations of second order with homogeneous essential boundary conditions, where the coefficients and the right-hand side (and hence the solution) may depend on a parameter. This model can be seen as a reaction-diffusion problem with a polynomial nonlinearity in the reaction term. The efficiency of various numerical approximations across the entire parameter space is closely related to the regularity of the solution with respect to the parameter. We show that if the coefficients and the right-hand side are analytic or Gevrey class regular with respect to the parameter, the same type of parametric regularity is valid for the solution. The key ingredient of the proof is the combination of the alternative-to-factorial technique from our previous work [1] with a novel argument for the treatment of the power-type nonlinearity in the reaction term. As an application of this abstract result, we obtain rigorous convergence estimates for numerical integration of semilinear reaction-diffusion problems with random coefficients using Gaussian and Quasi-Monte Carlo quadrature. Our theoretical findings are confirmed in numerical experiments.
We consider the problem of counting 4-cycles ($C_4$) in an undirected graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also often referred to as $\textit{butterflies}$). There have been a number of previous algorithms for this problem based on sorting the graph by degree and using randomized hash tables. These are appealing in theory due to compact storage and fast access on average. But, the performance of hash tables can degrade unpredictably and are also vulnerable to adversarial input. We develop a new simpler algorithm for counting $C_4$ requiring $O(m\bar\delta(G))$ time and $O(n)$ space, where $\bar \delta(G) \leq O(\sqrt{m})$ is the $\textit{average degeneracy}$ parameter introduced by Burkhardt, Faber \& Harris (2020). It has several practical improvements over previous algorithms; for example, it is fully deterministic, does not require any sorting of the input graph, and uses only addition and array access in its inner loops. To the best of our knowledge, all previous efficient algorithms for $C_4$ counting have required $\Omega(m)$ space in addition to storing the input graph. Our algorithm is very simple and easily adapted to count 4-cycles incident to each vertex and edge. Empirical tests demonstrate that our array-based approach is $4\times$ -- $7\times$ faster on average compared to popular hash table implementations.
Effective application of mathematical models to interpret biological data and make accurate predictions often requires that model parameters are identifiable. Approaches to assess the so-called structural identifiability of models are well-established for ordinary differential equation models, yet there are no commonly adopted approaches that can be applied to assess the structural identifiability of the partial differential equation (PDE) models that are requisite to capture spatial features inherent to many phenomena. The differential algebra approach to structural identifiability has recently been demonstrated to be applicable to several specific PDE models. In this brief article, we present general methodology for performing structural identifiability analysis on partially observed linear reaction-advection-diffusion (RAD) PDE models. We show that the differential algebra approach can always, in theory, be applied to linear RAD models. Moreover, despite the perceived complexity introduced by the addition of advection and diffusion terms, identifiability of spatial analogues of non-spatial models cannot decrease structural identifiability. Finally, we show that our approach can also be applied to a class of non-linear PDE models that are linear in the unobserved variables, and conclude by discussing future possibilities and computational cost of performing structural identifiability analysis on more general PDE models in mathematical biology.
Both two-valued and three-valued conditional logic (CL), defined by Guzm\'an and Squier (1990) and based on McCarthy's non-commutative connectives, axiomatise a short-circuit logic (SCL) that defines more identities than MSCL (Memorising SCL), which also has a two- and a three-valued variant. This follows from the fact that the definable connective that prescribes full left-sequential conjunction is commutative in CL. We show that in CL, the full left-sequential connectives and negation define Bochvar's three-valued strict logic. In two-valued CL, the full left-sequential connectives and negation define a commutative logic that is weaker than propositional logic because the absorption laws do not hold. Next, we show that the original, equational axiomatisation of CL is not independent and give several alternative, independent axiomatisations.
Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of Red Sea and it has decreased slightly over time.
We investigate the frequentist guarantees of the variational sparse Gaussian process regression model. In the theoretical analysis, we focus on the variational approach with spectral features as inducing variables. We derive guarantees and limitations for the frequentist coverage of the resulting variational credible sets. We also derive sufficient and necessary lower bounds for the number of inducing variables required to achieve minimax posterior contraction rates. The implications of these results are demonstrated for different choices of priors. In a numerical analysis we consider a wider range of inducing variable methods and observe similar phenomena beyond the scope of our theoretical findings.
The complexity class Quantum Statistical Zero-Knowledge ($\mathsf{QSZK}$) captures computational difficulties of the time-bounded quantum state testing problem with respect to the trace distance, known as the Quantum State Distinguishability Problem (QSDP) introduced by Watrous (FOCS 2002). However, QSDP is in $\mathsf{QSZK}$ merely within the constant polarizing regime, similar to its classical counterpart shown by Sahai and Vadhan (JACM 2003) due to the polarization lemma (error reduction for SDP). Recently, Berman, Degwekar, Rothblum, and Vasudevan (TCC 2019) extended the $\mathsf{SZK}$ containment for SDP beyond the polarizing regime via the time-bounded distribution testing problems with respect to the triangular discrimination and the Jensen-Shannon divergence. Our work introduces proper quantum analogs for these problems by defining quantum counterparts for triangular discrimination. We investigate whether the quantum analogs behave similarly to their classical counterparts and examine the limitations of existing approaches to polarization regarding quantum distances. These new $\mathsf{QSZK}$-complete problems improve $\mathsf{QSZK}$ containments for QSDP beyond the polarizing regime and establish a simple $\mathsf{QSZK}$-hardness for the quantum entropy difference problem (QEDP) defined by Ben-Aroya, Schwartz, and Ta-Shma (ToC 2010). Furthermore, we prove that QSDP with some exponentially small errors is in $\mathsf{PP}$, while the same problem without error is in $\mathsf{NQP}$.