Treatment effects on asymmetric and heavy-tailed distributions are better reflected at the extreme tails than at the averages or intermediate quantiles. For such distributions, standard methods for estimating quantile treatment effects can yield misleading inference because of the high variability of the estimators at the extremes. In this work, we propose a novel method that incorporates a heavy-tailed component in the outcome distribution to estimate the extreme tails while simultaneously employing quantile regression to model the remainder of the distribution. The threshold between the bulk of the distribution and the extreme tails is estimated using a state-of-the-art technique. Simulation results show the superiority of the proposed method over existing estimators of quantile causal effects at the extremes for heavy-tailed distributions. The method is applied to a real dataset on the London transport network. In this application, the proposed methodology can support effective decision making to improve network performance, where causal inference in the extremes of heavy-tailed distributions is often a key aim.
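The bulk-versus-tail idea can be illustrated with a minimal sketch (not the authors' estimator): empirical quantiles serve for the bulk, while extreme quantiles are obtained from a generalized Pareto fit to exceedances above a threshold. The groups, tail index, scaling factor, and threshold probability `p_u` below are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Hypothetical heavy-tailed outcomes for control and treated groups.
control = rng.pareto(2.5, 5000) + 1.0
treated = 1.3 * (rng.pareto(2.5, 5000) + 1.0)

def tail_quantile(x, p, p_u=0.90):
    """Estimate an extreme quantile by fitting a generalized Pareto
    distribution to exceedances above an empirical threshold."""
    u = np.quantile(x, p_u)                  # bulk/tail threshold
    exc = x[x > u] - u                       # exceedances over threshold
    xi, _, sigma = genpareto.fit(exc, floc=0)
    # GPD-based quantile above the threshold (Pickands-Balkema-de Haan).
    return u + (sigma / xi) * (((1 - p) / (1 - p_u)) ** (-xi) - 1)

p = 0.99
qte = tail_quantile(treated, p) - tail_quantile(control, p)
emp = np.quantile(treated, p) - np.quantile(control, p)
print(f"GPD-based 99% QTE: {qte:.2f}  empirical: {emp:.2f}")
```

Pooling all exceedances stabilizes the tail estimate relative to a single raw order statistic, which is the motivation for modelling the tail parametrically.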
Randomized controlled trials (RCTs) are considered the gold standard for testing causal hypotheses in the clinical domain. However, investigating prognostic variables of patient outcome along a hypothesized cause-effect route is not feasible using standard statistical methods. Here, we propose a new automated causal inference method (AutoCI), built upon the invariant causal prediction (ICP) framework, for the causal re-interpretation of clinical trial data. Compared to existing methods, we show that AutoCI efficiently determines the causal variables, with a clear differentiation, on two real-world RCTs of endometrial cancer patients with mature outcome and extensive clinicopathological and molecular data. This is achieved by suppressing the causal probability of non-causal variables by a wide margin. In ablation studies, we further demonstrate that the causal probabilities assigned by AutoCI remain consistent in the presence of confounders. In conclusion, these results confirm the robustness and feasibility of AutoCI for future applications in real-world clinical analysis.
The Wiener-Hopf equations are a Toeplitz system of linear equations that naturally arise in several applications in time series. These include the update and prediction steps of the stationary Kalman filter equations and the prediction of bivariate time series. The celebrated Wiener-Hopf technique is usually used for solving these equations and is based on a comparison of coefficients in a Fourier series expansion. However, the statistical interpretation of both the method and its solution is opaque. The purpose of this note is to revisit the (discrete) Wiener-Hopf equations and obtain an alternative solution that is more aligned with classical techniques in time series analysis. Specifically, we propose a solution to the Wiener-Hopf equations that combines linear prediction with deconvolution. The Wiener-Hopf solution requires the spectral factorization of the underlying spectral density function. For ease of evaluation it is often assumed that the spectral density is rational, which allows one to obtain a computationally tractable solution. However, this leads to an approximation error when the underlying spectral density is not a rational function. We use the proposed solution together with Baxter's inequality to derive an error bound for the rational spectral density approximation.
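A toy illustration of the kind of Toeplitz system involved (not the note's deconvolution-based solution): the one-step prediction equations of a stationary AR(1) process form a Yule-Walker-type Toeplitz system, solvable with a standard Levinson-type routine. The process and its parameters are assumptions for the sketch.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# For a stationary AR(1) process X_t = phi*X_{t-1} + e_t, the
# autocovariances are gamma(h) = phi^h / (1 - phi^2), and the best
# linear predictor coefficients solve a Toeplitz system.
phi, p = 0.6, 5
gamma = phi ** np.arange(p + 1) / (1 - phi ** 2)

# Solve T(gamma[0..p-1]) a = (gamma[1], ..., gamma[p]) for coefficients a.
a = solve_toeplitz(gamma[:p], gamma[1:])
print(np.round(a, 6))  # AR(1): only the most recent value matters
```

For an AR(1) process the solution recovers `a = (phi, 0, ..., 0)`, since `gamma(i+1) = phi * gamma(i)` makes the first column of the Toeplitz matrix proportional to the right-hand side.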
This paper introduces a new Bayesian changepoint approach, called the decoupled approach, that separates the modeling step from the changepoint analysis. The approach uses a Bayesian dynamic linear model (DLM) for the modeling step and a weighted penalized likelihood estimator applied to the posterior of the Bayesian DLM to identify changepoints. A Bayesian DLM with shrinkage priors can provide smooth estimates of the underlying trend in the presence of complex noise components; however, its inability to shrink estimates exactly to zero makes changepoint analysis difficult. Penalized likelihood estimators can be effective in estimating the locations of changepoints; however, they require a relatively smooth estimate of the data. The decoupled approach combines the flexibility of the Bayesian DLM with the hard-thresholding property of penalized likelihood estimators to extend the application of changepoint analysis. The approach provides a robust framework that allows changepoints to be identified in highly complex Bayesian models, and it can detect changes in the mean, in higher-order trends, and in regression coefficients. We illustrate the approach's flexibility and robustness by comparing it against several alternative methods in a wide range of simulations and two real-world examples.
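A minimal sketch of the penalized-likelihood step in isolation (not the paper's decoupled DLM pipeline): on a smooth series, a single mean changepoint can be located by minimizing the residual sum of squares plus a BIC-style penalty. The data, noise level, and penalty choice below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Smooth series with one jump in the mean at index 50 (illustrative).
y = np.concatenate([np.zeros(50), np.full(50, 2.0)])
y = y + 0.2 * rng.standard_normal(100)
n = len(y)

def rss(seg):
    """Residual sum of squares around the segment mean."""
    return np.sum((seg - seg.mean()) ** 2)

no_change = rss(y)
# Search candidate split points, keeping a minimum segment length.
costs = [rss(y[:t]) + rss(y[t:]) for t in range(5, n - 5)]
t_hat = int(np.argmin(costs)) + 5
penalty = 2 * np.log(n)  # BIC-style cost for the extra parameters
change_found = min(costs) + penalty < no_change
print(t_hat, change_found)
```

The penalty is what supplies the hard-thresholding behaviour: a split is accepted only when the fit improves by more than the penalty, so small wiggles in a smooth posterior estimate do not trigger spurious changepoints.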
Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such, it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators which take the form of U-process-like dyadic empirical processes. We provide uniform point estimation and distributional results for the dyadic kernel density estimator, giving valid and feasible procedures for robust uniform inference. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized $t$-process. A consistent variance estimator is introduced in order to obtain analogous results for the Studentized $t$-process, enabling the construction of provably valid and feasible uniform confidence bands for the unknown density function. A crucial feature of U-process-like dyadic empirical processes is that they may be "degenerate" at some or possibly all points in the support of the data, a property making our uniform analysis somewhat delicate. Nonetheless we show formally that our proposed methods for uniform inference remain robust to the potential presence of such unknown degenerate points. For the purpose of implementation, we discuss uniform inference procedures based on positive semi-definite covariance estimators, mean-squared-error-optimal bandwidth selectors and robust bias-correction methods. We illustrate the empirical finite-sample performance of our robust inference methods in a simulation study. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.
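The basic point estimator can be sketched in a few lines: dyadic outcomes $W_{ij}$ are pairwise observations generated from node-level effects, and the kernel density estimator averages a kernel over all $n(n-1)/2$ dyads. The data-generating process, kernel, and bandwidth below are illustrative assumptions, and the sketch covers only point estimation, not the paper's uniform inference procedures.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
A = rng.standard_normal(n)                      # node-level latent effects
i, j = np.triu_indices(n, k=1)                  # all dyads (i < j)
W = A[i] + A[j] + rng.standard_normal(i.size)   # dyadic outcomes W_ij

def dyadic_kde(w, grid, h):
    """Kernel density estimate averaging an Epanechnikov kernel
    over all dyadic observations."""
    u = (grid[:, None] - w[None, :]) / h
    k = 0.75 * np.clip(1 - u ** 2, 0, None)     # Epanechnikov kernel
    return k.mean(axis=1) / h

grid = np.linspace(-4, 4, 9)
fhat = dyadic_kde(W, grid, h=0.5)
print(np.round(fhat, 3))
```

Because every pair of dyads sharing a node is correlated through the common latent effect, the estimator behaves like a U-process rather than an i.i.d. average, which is the source of the degeneracy issues the paper analyses.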
Data and science have stood out in the generation of results, whether in projects in the scientific domain or the business domain. The CERN project, scientific institutes, and companies such as Walmart, Google, and Apple, among others, need data to present their results and make predictions in the competitive data world. Data and science are words that together culminated in a globally recognized term: Data Science. Data Science is in its initial phase, possibly forming part of the formal sciences while also being presented as part of the applied sciences, capable of generating value and supporting decision making. Data Science draws on science and, consequently, the scientific method to promote decision making through data intelligence. In many cases, the application of the method (or part of it) is considered in Data Science projects in the scientific domain (social sciences, bioinformatics, geospatial projects) or the business domain (finance, logistics, retail), among others. In this sense, this article addresses the perspectives of Data Science as a multidisciplinary area, considering science and the scientific method, and its formal structure, which integrates Statistics, Computer Science, and Business Science, also taking into account Artificial Intelligence, with an emphasis on Machine Learning, among others. The article also deals with the perspective of applied Data Science, since Data Science is used to generate value through scientific and business projects. The Data Science persona is also discussed, concerning the education of Data Science professionals and their corresponding profiles, since the field's growth is changing the world of data.
Monte Carlo simulation (MCS) is a statistical methodology used in a large number of applications. It uses repeated random sampling to solve problems that have a probabilistic interpretation and to obtain high-quality numerical results. MCS is simple and easy to develop, implement, and apply. However, its computational cost and total runtime can be quite high, as it requires many samples to obtain an accurate approximation with low variance. In this paper, a novel variant of MCS, called the self-adaptive BAT-MCS, based on the binary-adaption-tree algorithm (BAT) and our proposed self-adaptive simulation-number algorithm, is introduced to simply and effectively reduce the runtime and variance of MCS. The proposed self-adaptive BAT-MCS was applied to a simple benchmark problem to demonstrate its application to network reliability. Its statistical characteristics, including the expectation, variance, and simulation number, and its time complexity are discussed. Furthermore, its performance is compared extensively to that of the traditional MCS on a large-scale problem.
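For context, plain MCS for network reliability can be sketched as follows (this is the traditional baseline, not the proposed BAT-MCS): sample each edge's up/down state independently and count the fraction of samples in which the terminals stay connected. The bridge network, edge reliability, and trial count are illustrative assumptions.

```python
import random

# Two-terminal reliability of a 4-node bridge network: each edge works
# independently with probability p; estimate P(node 0 reaches node 3).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
p, trials = 0.9, 20000
random.seed(0)

def connected(up_edges, s=0, t=3):
    """Depth-first search over the surviving edges."""
    adj = {}
    for a, b in up_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, stack = {s}, [s]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return t in seen

hits = sum(
    connected([e for e in edges if random.random() < p])
    for _ in range(trials)
)
print(f"Estimated reliability: {hits / trials:.3f}")
```

The estimator's standard error shrinks only at rate $O(1/\sqrt{\text{trials}})$, which is exactly the cost that variance-reduction schemes such as the proposed self-adaptive approach aim to cut.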
When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This paper answers this question by offering a new asymptotic theory of inference with missing data that is more general than existing theories. Using more powerful tools from real analysis and probability theory than those used in previous research, it proves that, as the sample size increases and the extent of missingness decreases, the mean-loglikelihood function generated by partial data, ignoring the missingness mechanism, will almost surely converge uniformly to the one that would have been generated by complete data; if the data are Missing at Random, this convergence depends only on the sample size. Thus, inferences from partial data, such as posterior modes, uncertainty estimates, confidence intervals, likelihood ratios, test statistics, and indeed all quantities or features derived from the partial-data loglikelihood function, will be consistently estimated: they will approximate their complete-data analogues. This adds to previous research, which has only proved the consistency and asymptotic normality of the posterior mode and developed separate theories for Direct-Likelihood, Bayesian, and Frequentist inference. Practical implications of this result are discussed, and the theory is verified using a previous study of International Human Rights Law.
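The convergence claim can be illustrated numerically with a toy example (an illustration of the phenomenon, not the paper's proof): under missingness completely at random, a special case of Missing at Random, the mean loglikelihood of the observed data approaches that of the full data as the missing fraction shrinks. The normal model and missing fractions below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, 100_000)   # "complete" data (illustrative)

def mean_loglik(sample):
    """Mean Gaussian loglikelihood evaluated at the sample's own MLEs."""
    mu, sd = sample.mean(), sample.std()
    return np.mean(
        -0.5 * np.log(2 * np.pi * sd ** 2)
        - (sample - mu) ** 2 / (2 * sd ** 2)
    )

full = mean_loglik(x)
gaps = []
for frac in (0.5, 0.1, 0.01):
    obs = x[rng.random(x.size) > frac]   # drop values completely at random
    gaps.append(abs(mean_loglik(obs) - full))
print([round(g, 5) for g in gaps])
```

Because the missingness here does not depend on the data, ignoring the mechanism leaves the mean loglikelihood essentially unchanged, matching the theory's conclusion that convergence then hinges on sample size alone.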
Process mining bridges the gap between process management and data science by discovering process models from event logs derived from real-world data. Besides the mandatory event attributes, additional attributes representing domain data, such as human resources and costs, can be part of an event. Data-enhanced process models visualize domain data associated with process activities directly in the process model, allowing analysts to monitor the actual values of domain data in the form of event attribute aggregations. However, event logs can have so many attributes that it is difficult to decide which ones are of interest to observe throughout the process. This paper introduces three mechanisms to support domain data selection, allowing process analysts and domain experts to progressively reach the information of interest to them. We applied the proposed technique to the MIMIC-IV real-world dataset on hospitalizations in the US.
The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.
Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, and speech recognition. It has outperformed conventional methods in various fields and achieved great success. Unfortunately, our understanding of how it works remains unclear, and laying down a theoretical foundation for deep learning is of central importance. In this work, we give a geometric view for understanding deep learning: we show that the fundamental principle behind its success is the manifold structure of data; namely, natural high-dimensional data concentrate close to a low-dimensional manifold, and deep learning learns the manifold and the probability distribution on it. We further introduce the concept of rectified linear complexity, both of a deep neural network, measuring its learning capability, and of an embedding manifold, describing the difficulty of learning it. We then show that for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.