We consider a stationary Markov process that models certain queues with a bulk service of fixed number m of admitted customers. We find an integral expression of its transition probability function in terms of certain multi-orthogonal polynomials. We study the convergence of the appropriate scheme of simultaneous quadrature rules to design an algorithm for computing this integral expression.
Over-the-air computation has the potential to increase the communication-efficiency of data-dependent distributed wireless systems, but is vulnerable to eavesdropping. We consider over-the-air computation over block-fading additive white Gaussian noise channels in the presence of a passive eavesdropper. The goal is to design a secure over-the-air computation scheme. We propose a scheme that achieves MSE-security against the eavesdropper by employing zero-forced artificial noise, while keeping the distortion at the legitimate receiver small. In contrast to former approaches, the security does not depend on external helper nodes to jam the eavesdropper's receive signal. We thoroughly design the system parameters of the scheme, propose an artificial noise design that harnesses unused transmit power for security, and give an explicit construction rule. Our design approach is applicable both if the eavesdropper's channel coefficients are known and if they are unknown in the signal design. Simulations demonstrate the performance, and show that our noise design outperforms other methods.
This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering, while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy game-inspired algorithm to which we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins -- but leaves the joint unspecified -- it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a `full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicate an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
Via operator theoretic methods, we formalize the concentration phenomenon for a given observable `$r$' of a discrete time Markov chain with `$\mu_{\pi}$' as invariant ergodic measure, possibly having support on an unbounded state space. The main contribution of this paper is circumventing tedious probabilistic methods with a study of a composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/ reward function is unbounded, but for some for some $q>2$, $\|e^{r}\|_{q \rightarrow 2} \propto \exp\big(\mu_{\pi}(r) +\frac{2q}{q-2}\big) $ and $P$ is hyperbounded with norm control $\|P\|_{2 \rightarrow q }< e^{\frac{1}{2}[\frac{1}{2}-\frac{1}{q}]}$, sharp non-asymptotic concentration bounds follow. \emph{Transport-entropy} inequality ensures the aforementioned upper bound on multiplication operator for all $q>2$. The role of \emph{reversibility} in concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities as they allow for concentration inequalities w.r.t standard unbounded obersvables/reward functions where exact knowledge of the system is not available, let alone the reversibility of stationary measure.
The optimal error estimate that depending only on the polynomial degree of $ \varepsilon^{-1}$ is established for the temporal semi-discrete scheme of the Cahn-Hilliard equation, which is based on the scalar auxiliary variable (SAV) formulation. The key to our analysis is to convert the structure of the SAV time-stepping scheme back to a form compatible with the original format of the Cahn-Hilliard equation, which makes it feasible to use spectral estimates to handle the nonlinear term. Based on the transformation of the SAV numerical scheme, the optimal error estimate for the temporal semi-discrete scheme which depends only on the low polynomial order of $\varepsilon^{-1}$ instead of the exponential order, is derived by using mathematical induction, spectral arguments, and the superconvergence properties of some nonlinear terms. Numerical examples are provided to illustrate the discrete energy decay property and validate our theoretical convergence analysis.
Failure is common in clinical trials since the successful failures presented in negative results always indicate the ways that should not be taken. In this paper, we proposed an automated approach to extracting positive and negative clinical research results by introducing a PICOE (Population, Intervention, Comparation, Outcome, and Effect) framework to represent randomized controlled trials (RCT) reports, where E indicates the effect between a specific I and O. We developed a pipeline to extract and assign the corresponding statistical effect to a specific I-O pair from natural language RCT reports. The extraction models achieved a high degree of accuracy for ICO and E descriptive words extraction through two rounds of training. By defining a threshold of p-value, we find in all Covid-19 related intervention-outcomes pairs with statistical tests, negative results account for nearly 40%. We believe that this observation is noteworthy since they are extracted from the published literature, in which there is an inherent risk of reporting bias, preferring to report positive results rather than negative results. We provided a tool to systematically understand the current level of clinical evidence by distinguishing negative results from the positive results.
Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations of responses and features. It has a coverage guarantee at any nominal level without additional assumptions on their distribution. Its computation deplorably requires a refitting procedure for all replacement candidates of the target response. In regression settings, this corresponds to an infinite number of model fits. Apart from relatively simple estimators that can be written as pieces of linear function of the response, efficiently computing such sets is difficult, and is still considered as an open problem. We exploit the fact that, \emph{often}, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding algorithms. We investigate how this approach can overcome many limitations of formerly used strategies; we discuss its complexity and drawbacks.
We address the problem of integrating data from multiple observational and interventional studies to eventually compute counterfactuals in structural causal models. We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm from the case of a single study to that of multiple ones. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. On this basis, it delivers interval approximations to counterfactual results, which collapse to points in the identifiable case. The algorithm is very general, it works on semi-Markovian models with discrete variables and can compute any counterfactual. Moreover, it automatically determines if a problem is feasible (the parameter region being nonempty), which is a necessary step not to yield incorrect results. Systematic numerical experiments show the effectiveness and accuracy of the algorithm, while hinting at the benefits of integrating heterogeneous data to get informative bounds in case of unidentifiability.
Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a follow up sample allows researchers to conduct hypothesis tests for MNAR, which are not possible when using only the original incomplete data. Investigating how properties of these tests are affected by the follow up sample design is little explored in the literature. Our results provide comprehensive insight into the properties of one such test, based on the commonly used selection model framework. We determine conditions for recovery samples that allow the test to be applied appropriately and effectively, i.e. with known Type I error rates and optimized with respect to power. We thus provide an integrated framework for testing for the presence of MNAR and designing follow up samples in an efficient cost-effective way. The performance of our methodology is evaluated through simulation studies as well as on a real data sample.
The efficacy of numerical methods like integral estimates via Gaussian quadrature formulas depends on the localization of the zeros of the associated family of orthogonal polynomials. In this regard, following the renewed interest in quadrature formulas on the unit circle, and $R_{II}$-type polynomials, which include the complementary Romanovski-Routh polynomials, in this work we present a collection of properties of their zeros. Our results include extreme bounds, convexity, and density, alongside the connection of such polynomials to classical orthogonal polynomials via asymptotic formulas.
We develop a new second-order unstaggered path-conservative central-upwind (PCCU) scheme for ideal and shallow water magnetohydrodynamics (MHD) equations. The new scheme possesses several important properties: it locally preserves the divergence-free constraint, it does not rely on any (approximate) Riemann problem solver, and it robustly produces high-resolution and non-oscillatory results. The derivation of the scheme is based on the Godunov-Powell nonconservative modifications of the studied MHD systems. The local divergence-free property is enforced by augmenting the modified systems with the evolution equations for the corresponding derivatives of the magnetic field components. These derivatives are then used to design a special piecewise linear reconstruction of the magnetic field, which guarantees a non-oscillatory nature of the resulting scheme. In addition, the proposed PCCU discretization accounts for the jump of the nonconservative product terms across cell interfaces, thereby ensuring stability. We test the proposed PCCU scheme on several benchmarks for both ideal and shallow water MHD systems. The obtained numerical results illustrate the performance of the new scheme, its robustness, and its ability not only to achieve high resolution, but also preserve the positivity of computed quantities such as density, pressure, and water depth.