Robust inference based on the minimization of statistical divergences has proved to be a useful alternative to classical techniques based on maximum likelihood and related methods. Basu et al. (1998) introduced the density power divergence (DPD) family as a measure of discrepancy between two probability density functions and used this family for robust estimation of the parameter for independent and identically distributed data. Ghosh et al. (2017) proposed a more general class of divergence measures, namely the S-divergence family and discussed its usefulness in robust parametric estimation through several asymptotic properties and some numerical illustrations. In this paper, we develop the results concerning the asymptotic breakdown point for the minimum S-divergence estimators (in particular the minimum DPD estimator) under general model setups. The primary result of this paper provides lower bounds to the asymptotic breakdown point of these estimators which are independent of the dimension of the data, in turn corroborating their usefulness in robust inference under high dimensional data.
Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.
Singularly perturbed problems present inherent difficulty due to the presence of a thin boundary layer in its solution. To overcome this difficulty, we propose using deep operator networks (DeepONets), a method previously shown to be effective in approximating nonlinear operators between infinite-dimensional Banach spaces. In this paper, we demonstrate for the first time the application of DeepONets to one-dimensional singularly perturbed problems, achieving promising results that suggest their potential as a robust tool for solving this class of problems. We consider the convergence rate of the approximation error incurred by the operator networks in approximating the solution operator, and examine the generalization gap and empirical risk, all of which are shown to converge uniformly with respect to the perturbation parameter. By utilizing Shishkin mesh points as locations of the loss function, we conduct several numerical experiments that provide further support for the effectiveness of operator networks in capturing the singular boundary layer behavior.
The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its weighted average version. In several use cases, both implementations will be shown not to require the inversion of a Hessian estimate at each iteration, but a direct update of the estimate of the inverse Hessian instead will be favored. This generalizes a trick introduced in [2] for the specific case of logistic regression, by directly updating the estimate of the inverse Hessian. Under mild assumptions such as local strong convexity at the optimum, we establish almost sure convergences and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates. The unified framework considered in this paper covers the case of linear, logistic or softmax regressions to name a few. Numerical experiments on simulated data give the empirical evidence of the pertinence of the proposed methods, which outperform popular competitors particularly in case of bad initializa-tions.
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used to simulate from parameter distributions of interest, such as generalized linear model parameters. The "Metropolis step" is a keystone concept that underlies classical and modern MCMC methods and facilitates simple analysis of complex statistical models. Beyond Bayesian analysis, MCMC is useful for generating uncertainty intervals, even under the common scenario in causal inference in which the target parameter is not directly estimated by a single, fitted statistical model. We demonstrate, with a worked example, pseudo-code, and R code, the basic mechanics of the Metropolis algorithm. We use the Metropolis algorithm to estimate the odds ratio and risk difference contrasting the risk of childhood leukemia among those exposed to high versus low level magnetic fields. This approach can be used for inference from Bayesian and frequentist paradigms and, in small samples, offers advantages over large-sample methods like the bootstrap.
Partial orders are a natural model for the social hierarchies that may constrain "queue-like" rank-order data. However, the computational cost of counting the linear extensions of a general partial order on a ground set with more than a few tens of elements is prohibitive. Vertex-series-parallel partial orders (VSPs) are a subclass of partial orders which admit rapid counting and represent the sorts of relations we expect to see in a social hierarchy. However, no Bayesian analysis of VSPs has been given to date. We construct a marginally consistent family of priors over VSPs with a parameter controlling the prior distribution over VSP depth. The prior for VSPs is given in closed form. We extend an existing observation model for queue-like rank-order data to represent noise in our data and carry out Bayesian inference on "Royal Acta" data and Formula 1 race data. Model comparison shows our model is a better fit to the data than Plackett-Luce mixtures, Mallows mixtures, and "bucket order" models and competitive with more complex models fitting general partial orders.
While there exists several inferential methods for analyzing functional data in factorial designs, there is a lack of statistical tests that are valid (i) in general designs, (ii) under non-restrictive assumptions on the data generating process and (iii) allow for coherent post-hoc analyses. In particular, most existing methods assume Gaussianity or equal covariance functions across groups (homoscedasticity) and are only applicable for specific study designs that do not allow for evaluation of interactions. Moreover, all available strategies are only designed for testing global hypotheses and do not directly allow a more in-depth analysis of multiple local hypotheses. To address the first two problems (i)-(ii), we propose flexible integral-type test statistics that are applicable in general factorial designs under minimal assumptions on the data generating process. In particular, we neither postulate homoscedasticity nor Gaussianity. To approximate the statistics' null distribution, we adopt a resampling approach and validate it methodologically. Finally, we use our flexible testing framework to (iii) infer several local null hypotheses simultaneously. To allow for powerful data analysis, we thereby take the complex dependencies of the different local test statistics into account. In extensive simulations we confirm that the new methods are flexibly applicable. Two illustrate data analyses complete our study. The new testing procedures are implemented in the R package multiFANOVA, which will be available on CRAN soon.
The primary objective of this scholarly work is to develop two estimation procedures - maximum likelihood estimator (MLE) and method of trimmed moments (MTM) - for the mean and variance of lognormal insurance payment severity data sets affected by different loss control mechanism, for example, truncation (due to deductibles), censoring (due to policy limits), and scaling (due to coinsurance proportions), in insurance and financial industries. Maximum likelihood estimating equations for both payment-per-payment and payment-per-loss data sets are derived which can be solved readily by any existing iterative numerical methods. The asymptotic distributions of those estimators are established via Fisher information matrices. Further, with a goal of balancing efficiency and robustness and to remove point masses at certain data points, we develop a dynamic MTM estimation procedures for lognormal claim severity models for the above-mentioned transformed data scenarios. The asymptotic distributional properties and the comparison with the corresponding MLEs of those MTM estimators are established along with extensive simulation studies. Purely for illustrative purpose, numerical examples for 1500 US indemnity losses are provided which illustrate the practical performance of the established results in this paper.
The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily, and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of previously unknown 42 malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.
Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.