The univariate Weibull distribution is a well-known lifetime distribution that has been widely used in reliability and survival analysis. In this paper, we introduce a new family of bivariate generalized Weibull (BGW) distributions whose univariate marginals are exponentiated Weibull distributions. Several statistical quantities, such as the marginals, conditional distributions, conditional expectations, product moments, correlation, and a measure of component reliability, are derived. Various measures of dependence and statistical properties, along with ageing properties, are examined. Further, the copula associated with the BGW distribution and several of its important properties are also considered. The methods of maximum likelihood and Bayesian estimation are employed to estimate the unknown parameters of the model. A Monte Carlo simulation and a real-data study are carried out to demonstrate the performance of the estimators, and the results confirm the effectiveness of the distribution in real-life situations.
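The exponentiated Weibull marginal of the BGW family has a simple closed form, $F(x) = (1 - e^{-(x/\sigma)^\beta})^\alpha$. Below is a minimal sketch of this CDF and an inverse-transform sampler; the parameter names $\beta$, $\sigma$, $\alpha$ are ours, not necessarily the paper's notation.

```python
import math
import random

def exp_weibull_cdf(x, shape, scale, alpha):
    """CDF of the exponentiated Weibull: F(x) = (1 - exp(-(x/scale)^shape))^alpha."""
    return (1.0 - math.exp(-((x / scale) ** shape))) ** alpha

def exp_weibull_sample(shape, scale, alpha, rng=random):
    """Inverse-transform sampling: solve F(x) = u for a uniform u in (0, 1)."""
    u = rng.random()
    return scale * (-math.log(1.0 - u ** (1.0 / alpha))) ** (1.0 / shape)
```

With $\alpha = 1$ this reduces to the ordinary Weibull CDF, which gives a quick correctness check.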
Working with so-called linkages allows one to define a copula-based, $[0,1]$-valued multivariate dependence measure $\zeta^1(\boldsymbol{X},Y)$ quantifying the scale-invariant extent of dependence of a random variable $Y$ on a $d$-dimensional random vector $\boldsymbol{X}=(X_1,\ldots,X_d)$, which exhibits various good and natural properties. In particular, $\zeta^1(\boldsymbol{X},Y)=0$ if and only if $\boldsymbol{X}$ and $Y$ are independent, $\zeta^1(\boldsymbol{X},Y)$ is maximal exclusively if $Y$ is a function of $\boldsymbol{X}$, and ignoring one or several coordinates of $\boldsymbol{X}$ cannot increase the resulting dependence value. After introducing and analyzing the metric $D_1$ underlying the construction of the dependence measure, and after deriving examples showing how much information can be lost by considering only the pairwise dependence values $\zeta^1(X_1,Y),\ldots,\zeta^1(X_d,Y)$, we derive a so-called checkerboard estimator for $\zeta^1(\boldsymbol{X},Y)$ and show that it is strongly consistent in full generality, i.e., without any smoothness restrictions on the underlying copula. Simulations illustrating the small-sample performance of the estimator complement the established theoretical results.
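The information loss from pairwise values alone is easy to see in an extreme case: for independent Bernoulli coordinates and $Y = X_1 \oplus X_2$, $Y$ is independent of each $X_i$ separately yet is a deterministic function of the pair. A quick numerical check of this textbook example (not the paper's estimator):

```python
import random
random.seed(0)

n = 100_000
x1 = [random.randint(0, 1) for _ in range(n)]
x2 = [random.randint(0, 1) for _ in range(n)]
y = [a ^ b for a, b in zip(x1, x2)]  # Y is a deterministic function of (X1, X2)

# Yet marginally Y carries no information about either coordinate alone:
p_y1_given_x1_0 = sum(yy for xx, yy in zip(x1, y) if xx == 0) / x1.count(0)
p_y1_given_x1_1 = sum(yy for xx, yy in zip(x1, y) if xx == 1) / x1.count(1)
print(round(p_y1_given_x1_0, 2), round(p_y1_given_x1_1, 2))  # both close to 0.5
```

Here any pairwise measure satisfying the independence axiom returns 0 for each coordinate, while a measure of the dependence of $Y$ on the full vector should be maximal.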
We obtain explicit $p$-Wasserstein distance error bounds between the distribution of the multi-parameter MLE and the multivariate normal distribution. Our general bounds, given for possibly high-dimensional, independent and identically distributed random vectors, are of the optimal $\mathcal{O}(n^{-1/2})$ order. Explicit numerical constants are given when $p\in(1,2]$; in the case $p>2$ the bounds are explicit up to a constant factor that depends only on $p$. We apply our general bounds to derive Wasserstein distance error bounds for the multivariate normal approximation of the MLE in several settings: single-parameter exponential families, the normal distribution under canonical parametrisation, and the multivariate normal distribution under non-canonical parametrisation. In addition, we provide upper bounds with respect to the bounded Wasserstein distance when the MLE is implicitly defined.
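The kind of approximation being bounded can be illustrated numerically. The sketch below (our construction, not the paper's bounds) compares the standardized MLE of an exponential rate with $N(0,1)$ using an empirical 1-Wasserstein distance computed from matched quantiles:

```python
import math
import random
from statistics import NormalDist

random.seed(1)
n, m, lam = 200, 2000, 2.0  # sample size, Monte Carlo replications, true rate
Z = []
for _ in range(m):
    xs = [random.expovariate(lam) for _ in range(n)]
    lam_hat = n / sum(xs)                           # MLE of the exponential rate
    Z.append(math.sqrt(n) * (lam_hat - lam) / lam)  # asymptotic sd of the MLE is lam/sqrt(n)

# Empirical 1-Wasserstein distance to N(0, 1) via matched quantiles
Z.sort()
nd = NormalDist()
w1 = sum(abs(z - nd.inv_cdf((i + 0.5) / m)) for i, z in enumerate(Z)) / m
print(w1)  # small, and shrinks roughly like n^{-1/2} as n grows
```

Rerunning with larger `n` shows the distance shrinking, consistent with the $\mathcal{O}(n^{-1/2})$ rate.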
A recent UK Biobank study clustered 156 parameterised models associating risk factors with common diseases, to identify shared causes of disease. Parametric models are often more familiar and interpretable than clustered data, can build in prior knowledge, adjust for known confounders, and use marginalisation to emphasise parameters of interest. Estimates include a Maximum Likelihood Estimate (MLE) that is (approximately) normally distributed, and its covariance. Clustering models rarely consider the covariances of data points, which are usually unavailable. Here a clustering model is formulated that accounts for the covariances of the data and assumes that all MLEs in a cluster are the same. The log-likelihood is calculated exactly in terms of the fitted parameters, with the unknown cluster means removed by marginalisation. The procedure is equivalent to calculating the Bayesian Information Criterion (BIC) without approximation, and can be used to assess the optimal number of clusters for a given clustering algorithm. The log-likelihood has terms that penalise poor fits and model complexity, and can be maximised to determine the number and composition of clusters. Results can be similar to those of the ad hoc "elbow criterion", but are less subjective. The model is also formulated as a Dirichlet process mixture model (DPMM). The overall approach is equivalent to a multi-layer algorithm that characterises features through the normally distributed MLEs of a fitted model, and then clusters the normal distributions. Examples include simulated data, and the clustering of diseases in UK Biobank data using estimated associations with risk factors. The results can be applied directly to measured data and their estimated covariances, to the output of clustering models, or the DPMM implementation can be used to cluster fitted models directly.
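The complexity penalty at the heart of BIC-based cluster selection can be sketched with a toy 1-D example. The hard assignments and unit variances below are our simplifying assumptions, not the paper's exact marginalized likelihood:

```python
import math
import random

random.seed(2)
# Two well-separated groups of 1-D "MLEs", each with unit sampling variance
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(6, 1) for _ in range(100)]

def bic(xs, centers):
    """BIC = k ln(n) - 2 ln(L): k free cluster means, Gaussian likelihood
    with known unit variance and hard nearest-center assignment."""
    ll = sum(max(-0.5 * (x - c) ** 2 - 0.5 * math.log(2 * math.pi) for c in centers)
             for x in xs)
    return len(centers) * math.log(len(xs)) - 2.0 * ll

print(bic(data, [3.0]))       # one cluster: the poor fit dominates
print(bic(data, [0.0, 6.0]))  # two clusters: much lower BIC despite the extra parameter
```

The penalty term `k * ln(n)` is what keeps the likelihood's monotone improvement with `k` from always favouring more clusters.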
Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial membership across clusters. With this flexibility come challenges in uniquely identifying, estimating, and interpreting the parameters. In this article, we propose a new class of Dimension-Grouped MMMs (Gro-M$^3$s) for multivariate categorical data, which improve parsimony and interpretability. In Gro-M$^3$s, observed variables are partitioned into groups such that the latent membership is constant across variables within a group but can differ across groups. Traditional latent class models are obtained when all variables are in one group, while traditional MMMs are obtained when each variable is in its own group. The new model corresponds to a novel decomposition of probability tensors. Theoretically, we propose transparent identifiability conditions for both the unknown grouping structure and the associated model parameters in general settings. Methodologically, we propose a Bayesian approach for Dirichlet Gro-M$^3$s that infers the variable grouping structure and estimates the model parameters. Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through an application to a functional disability dataset.
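To make the grouping idea concrete, here is a hedged generative sketch of a dimension-grouped MMM: each subject draws Dirichlet membership weights, each *group* of variables draws one latent class from those weights, and each variable is emitted from its group's class. All names and the toy configuration are illustrative, not the paper's:

```python
import random
random.seed(5)

def sample_subject(groups, item_probs, alpha, K, rng=random):
    """Draw one subject's responses from a dimension-grouped MMM (illustrative sketch).

    groups[j]        -- group index of categorical variable j
    item_probs[k][j] -- category probabilities of variable j under latent class k
    """
    # Subject-specific membership weights: normalized Gammas ~ Dirichlet(alpha, ..., alpha)
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # One latent class per *group*, shared by every variable in that group
    n_groups = max(groups) + 1
    z = [rng.choices(range(K), weights=weights)[0] for _ in range(n_groups)]
    # Emit each variable from its group's class-conditional categorical
    return [rng.choices(range(len(item_probs[z[groups[j]]][j])),
                        weights=item_probs[z[groups[j]]][j])[0]
            for j in range(len(groups))]

# Three binary items, K = 2 classes; items 0 and 1 share a group, item 2 has its own
item_probs = [[[0.9, 0.1]] * 3, [[0.1, 0.9]] * 3]
subject = sample_subject([0, 0, 1], item_probs, alpha=0.5, K=2)
print(subject)
```

Setting `groups = [0, 0, 0]` recovers a latent class model, while `groups = [0, 1, 2]` recovers a traditional MMM, mirroring the two limiting cases described above.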
Extreme-value copulas arise as the limiting dependence structure of component-wise maxima. Defined in terms of a functional parameter, they are one of the most widespread copula families due to their flexibility and ability to capture asymmetry. Despite this, satisfying the complex analytical properties of this parameter in an unconstrained setting remains a challenge, restricting most uses to either models with very few parameters or non-parametric models. In this paper we focus on the bivariate case and propose a novel approach for estimating this functional parameter in a semiparametric manner. Our procedure relies on a series of basic transformations starting from a zero-integral spline. Spline coordinates are fit through maximum likelihood estimation, leveraging gradient optimization, without imposing further constraints. We conduct several experiments on both simulated and real data. Specifically, we test our method on scarce data gathered by the LIGO and Virgo gravitational-wave detection collaborations.
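In the bivariate case the functional parameter is the Pickands dependence function $A$, which must satisfy $A(0)=A(1)=1$, convexity, and $\max(t,1-t)\le A(t)\le 1$; these are the analytical constraints that make unconstrained estimation hard. A quick check for the classical Gumbel (logistic) family, shown here only as a reference parametric model:

```python
def pickands_gumbel(t, theta):
    """Pickands dependence function A(t) of the Gumbel (logistic) extreme-value copula."""
    return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)

# A valid Pickands function satisfies A(0) = A(1) = 1, is convex,
# and is pinched between max(t, 1 - t) and 1 on [0, 1]:
theta = 2.5
for i in range(101):
    t = i / 100
    a = pickands_gumbel(t, theta)
    assert max(t, 1.0 - t) - 1e-12 <= a <= 1.0 + 1e-12
print(pickands_gumbel(0.5, theta))  # dependence is strongest at t = 1/2
```

The boundary cases $A \equiv 1$ and $A(t) = \max(t, 1-t)$ correspond to independence and comonotonicity respectively, which is why any estimator must stay inside this envelope.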
We propose a framework for Bayesian Likelihood-Free Inference (LFI) based on Generalized Bayesian Inference. To define the generalized posterior, we use Scoring Rules (SRs), which evaluate probabilistic models given an observation. As in LFI we can sample from the model (but not evaluate the likelihood), we employ SRs with easy empirical estimators. Our framework includes novel approaches and popular LFI techniques (such as Bayesian Synthetic Likelihood), which benefit from the generalized Bayesian interpretation. Our method enjoys posterior consistency in a well-specified setting when a strictly-proper SR is used (i.e., one whose expectation is uniquely minimized when the model corresponds to the data generating process). Further, we prove a finite sample generalization bound and outlier robustness for the Kernel and Energy Score posteriors, and propose a strategy suitable for the LFI setup for tuning the learning rate in the generalized posterior. We run simulation studies with pseudo-marginal Markov Chain Monte Carlo (MCMC) and compare with related approaches, which we show do not enjoy robustness and consistency.
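As an example of a strictly proper SR with an easy empirical estimator, the energy score in one dimension is $\mathrm{ES}(P,y)=\mathbb{E}|X-y|-\tfrac12\mathbb{E}|X-X'|$ for independent model draws $X, X'$. A sketch of its plug-in estimate on two toy models (our setup, not the paper's experiments):

```python
import random
random.seed(3)

def energy_score(samples, y):
    """Plug-in empirical energy score for 1-D model samples; lower is better."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(samples[i] - samples[j])
                for i in range(m) for j in range(i + 1, m)) / (m * (m - 1) / 2)
    return term1 - 0.5 * term2

y = 0.3                                                # the "observation"
good = [random.gauss(0.3, 1.0) for _ in range(300)]    # model centred at the truth
bad = [random.gauss(3.0, 1.0) for _ in range(300)]     # misspecified model
print(energy_score(good, y), energy_score(bad, y))     # good model scores lower
```

Only model samples are needed, never a likelihood evaluation, which is what makes such estimators suitable for the LFI setting.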
In recent years, conditional copulas, which allow the dependence between variables to vary according to the values of one or more covariates, have attracted increasing attention. In high dimension, vine copulas offer greater flexibility compared to multivariate copulas, since they are constructed using bivariate copulas as building blocks. In this paper we present a novel inferential approach for multivariate distributions, which combines the flexibility of vine constructions with the advantages of Bayesian nonparametrics, not requiring the specification of parametric families for each pair copula. Expressing multivariate copulas using vines allows us to easily account for covariate specifications driving the dependence between response variables. More precisely, we specify the vine copula density as an infinite mixture of Gaussian copulas, defining a Dirichlet process (DP) prior on the mixing measure, and we perform posterior inference via Markov chain Monte Carlo (MCMC) sampling. Our approach is successful both for clustering and for density estimation. We carry out intensive simulation studies and apply the proposed approach to investigate the impact of natural disasters on financial development. Our results show that the methodology is able to capture the heterogeneity in the dataset and to reveal different behaviours of different country clusters in relation to natural disasters.
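The mixture components here are ordinary bivariate Gaussian copulas, whose density has the closed form $c(u,v;\rho)=\frac{1}{\sqrt{1-\rho^2}}\exp\!\big({-\tfrac{\rho^2(x^2+y^2)-2\rho xy}{2(1-\rho^2)}}\big)$ with $x=\Phi^{-1}(u)$, $y=\Phi^{-1}(v)$. A minimal implementation of a single component (not the paper's DP mixture or MCMC machinery):

```python
import math
from statistics import NormalDist

def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho, on (0,1)^2."""
    x, y = NormalDist().inv_cdf(u), NormalDist().inv_cdf(v)
    q = (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / math.sqrt(1.0 - rho * rho)

print(gaussian_copula_density(0.5, 0.5, 0.0))  # rho = 0 gives the independence copula
print(gaussian_copula_density(0.5, 0.5, 0.7))  # positive rho concentrates mass on the diagonal
```

An infinite mixture of such components over $\rho$ (and location, in the conditional setting) is what lets the DP prior capture non-Gaussian pairwise dependence.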
Thanks to technological advances leading to near-continuous time observations, emerging multivariate point process data offer new opportunities for causal discovery. However, a key obstacle in achieving this goal is that many relevant processes may not be observed in practice. Naive estimation approaches that ignore these hidden variables can generate misleading results because of the unadjusted confounding. To plug this gap, we propose a deconfounding procedure to estimate high-dimensional point process networks with only a subset of the nodes being observed. Our method allows flexible connections between the observed and unobserved processes. It also allows the number of unobserved processes to be unknown and potentially larger than the number of observed nodes. Theoretical analyses and numerical studies highlight the advantages of the proposed method in identifying causal interactions among the observed processes.
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution, but can fail significantly otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either known heterogeneity of the training data (e.g. domain labels) or approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of these assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we demonstrate the effectiveness of our method compared with state-of-the-art counterparts.
Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, which model a gate output as a value in [0,1] through a sigmoid function. However, due to the gradual nature of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. Moreover, previous models lack modeling of the correlation between the gates, which would be a new way to adopt an inductive bias for the relationship between previous and current inputs. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell, so that modelers can customize the cell-state flow with priors and distributions. Moreover, we theoretically show a higher upper bound on the gradient compared to the sigmoid function, and we empirically observe that the bivariate Beta gate structure provides higher gradient values in training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation.
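One standard way to build a correlated pair of Beta-distributed gates is the Olkin-Liu construction, in which a shared Gamma component couples two marginally-Beta variables; whether this matches the paper's exact parametrisation is an assumption on our part, so the sketch below is only illustrative:

```python
import random
random.seed(4)

def bivariate_beta_gates(a, b, c, rng=random):
    """Correlated (input, forget)-style gate pair via an Olkin-Liu bivariate Beta."""
    g1 = rng.gammavariate(a, 1.0)
    g2 = rng.gammavariate(b, 1.0)
    g3 = rng.gammavariate(c, 1.0)  # shared component couples the two gates
    # Marginals: g1/(g1+g3) ~ Beta(a, c) and g2/(g2+g3) ~ Beta(b, c)
    return g1 / (g1 + g3), g2 / (g2 + g3)

pairs = [bivariate_beta_gates(2.0, 2.0, 2.0) for _ in range(20_000)]
mi = sum(p[0] for p in pairs) / len(pairs)
mf = sum(p[1] for p in pairs) / len(pairs)
cov = sum((p[0] - mi) * (p[1] - mf) for p in pairs) / len(pairs)
print(cov > 0)  # the shared Gamma component induces positive dependence between gates
```

In contrast to two independent sigmoid outputs, the parameters `a`, `b`, `c` jointly control the marginal shapes (including skewness) and the strength of the coupling between the gates.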