Criminal networks arise from the unique attempt to balance the need to establish frequent ties among affiliates, which facilitates the coordination of illegal activities, with the necessity of sparsifying the overall connectivity architecture to hide from law enforcement. This efficiency-security tradeoff is further combined with the creation of groups of redundant criminals that exhibit similar connectivity patterns, thus guaranteeing resilient network architectures. State-of-the-art models for such data are not designed to infer these unique structures. In contrast to such solutions, we develop a computationally tractable Bayesian zero-inflated Poisson stochastic block model (ZIP-SBM), which identifies groups of redundant criminals with similar connectivity patterns and infers both overt and covert block interactions within and across such groups. This is accomplished by modeling weighted ties (counts of interactions among pairs of criminals) via zero-inflated Poisson distributions with block-specific parameters that quantify complex patterns in the excess of zero ties in each block (security) relative to the distribution of the observed weighted ties within that block (efficiency). The performance of ZIP-SBM is illustrated in simulations and in a study of co-attendance at summits in a complex Mafia organization, where we unveil efficiency-security structures adopted by the criminal organization that were hidden from previous analyses.
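As a rough illustration of the efficiency-security mechanism described above, the following minimal Python sketch evaluates a zero-inflated Poisson likelihood for a count-valued adjacency matrix under a fixed block assignment; the names (pi_block for the excess-zero probabilities, lam_block for the Poisson rates) are illustrative rather than the paper's notation, and the actual ZIP-SBM performs Bayesian inference over block assignments and parameters rather than evaluating them at fixed values.

```python
# Minimal sketch of a zero-inflated Poisson block likelihood for a weighted
# (count-valued) adjacency matrix, given a fixed block assignment z.
import numpy as np
from scipy.stats import poisson

def zip_block_loglik(Y, z, pi_block, lam_block):
    """Log-likelihood of the upper-triangular counts under a zero-inflated
    Poisson with block-pair parameters.

    Y         : (n, n) symmetric matrix of interaction counts
    z         : (n,) block labels in {0, ..., K-1}
    pi_block  : (K, K) excess-zero probabilities (security)
    lam_block : (K, K) Poisson rates for observed ties (efficiency)
    """
    n = Y.shape[0]
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pi = pi_block[z[i], z[j]]
            lam = lam_block[z[i], z[j]]
            if Y[i, j] == 0:
                # a zero tie comes either from the structural-zero component
                # or from the Poisson component itself
                ll += np.log(pi + (1 - pi) * np.exp(-lam))
            else:
                ll += np.log(1 - pi) + poisson.logpmf(Y[i, j], lam)
    return ll
```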
Image classification from independent and identically distributed random variables is considered. Image classifiers are defined that are based on a linear combination of deep convolutional networks with a max-pooling layer, where all the weights are learned by stochastic gradient descent. A general result is presented which shows that these image classifiers are able to approximate the best possible deep convolutional network. In case the a posteriori probability satisfies a suitable hierarchical composition model, it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence that is independent of the dimension of the images.
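A minimal PyTorch sketch of the kind of classifier the abstract describes, a convolutional network with max-pooling followed by a linear output layer, with all weights learned by plain stochastic gradient descent; the layer sizes, input resolution, and data are placeholders, not the architecture analyzed in the paper.

```python
# Minimal convolutional image classifier with max-pooling, trained by SGD;
# sizes are illustrative only.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # max-pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # linear combination of the learned convolutional features
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):                          # x: (batch, 1, 28, 28)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = SmallConvNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()

# one SGD step on a random mini-batch (placeholder for i.i.d. training data)
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()
```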
Basic general properties are considered for the Fisher-type information involving higher-order derivatives. They are used to explore various properties of probability densities and to derive Stam-type inequalities.
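For context, the classical first-order objects behind such results are the Fisher information $I(f)=\int (f'(x))^2/f(x)\,dx$ and the entropy power $N(f)=\frac{1}{2\pi e}e^{2h(f)}$ with differential entropy $h(f)=-\int f\log f$, for which Stam's inequality gives $N(f)\,I(f)\ge 1$, with equality exactly for Gaussian densities. A natural higher-order Fisher-type functional is $I_k(f)=\int \bigl(f^{(k)}(x)\bigr)^2/f(x)\,dx$; this is stated here only as one possible convention and may differ from the definition adopted in the paper.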
Leveraging the large body of work devoted in recent years to describing redundancy and synergy in multivariate interactions among random variables, we propose a novel approach to quantify cooperative effects in feature importance, one of the most widely used techniques for explainable artificial intelligence. In particular, we propose an adaptive version of a well-known feature-importance metric, Leave One Covariate Out (LOCO), to disentangle high-order effects involving a given input feature in regression problems. LOCO is the reduction in prediction error obtained when the feature under consideration is added to the set of the other features used for regression. Instead of calculating LOCO using all the features at hand, as in its standard version, our method searches for the multiplet of features that maximizes LOCO and for the one that minimizes it. This provides a decomposition of LOCO as the sum of a two-body component and higher-order components (redundant and synergistic), also highlighting the features that contribute to building these high-order effects alongside the driving feature. We report an application to proton/pion discrimination from detector measurements simulated with GEANT.
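A minimal scikit-learn sketch of the standard (non-adaptive) LOCO score that the proposed method builds on: the increase in prediction error when one feature is removed from the regression. The adaptive version described above would replace the fixed set of remaining features with the multiplets that maximize or minimize this quantity; the model choice and data below are placeholders.

```python
# Standard LOCO score for one feature:
# LOCO(j) = prediction error without feature j - prediction error with all features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def loco_score(X, y, j, make_model=lambda: RandomForestRegressor(random_state=0)):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    full = make_model().fit(X_tr, y_tr)
    err_full = mean_squared_error(y_te, full.predict(X_te))

    keep = [k for k in range(X.shape[1]) if k != j]
    reduced = make_model().fit(X_tr[:, keep], y_tr)
    err_reduced = mean_squared_error(y_te, reduced.predict(X_te[:, keep]))

    return err_reduced - err_full   # > 0: feature j reduces the error

# example with synthetic data containing a pairwise and a three-body interaction
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=500)
print([round(loco_score(X, y, j), 3) for j in range(5)])
```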
We study the problem of constructing an estimator of the average treatment effect (ATE) with observational data. The celebrated doubly-robust, augmented-IPW (AIPW) estimator generally requires consistent estimation of both nuisance functions for standard root-n inference and, moreover, requires the product of the nuisance errors to shrink at a rate faster than $n^{-1/2}$. A recent strand of research has aimed to understand the extent to which the AIPW estimator can be improved upon (in a minimax sense). Under structural assumptions on the nuisance functions, the AIPW estimator is typically not minimax-optimal, and improvements can be made using higher-order influence functions (Robins et al, 2017). Conversely, without any assumptions on the nuisances beyond the mean-square-error rates at which they can be estimated, the rate achieved by the AIPW estimator is already optimal (Balakrishnan et al, 2023; Jin and Syrgkanis, 2024). We make three main contributions. First, we propose a new hybrid class of distributions that combines structural agnosticism regarding the nuisance function space with additional smoothness constraints. Second, we calculate minimax lower bounds for estimating the ATE in the new class, as well as in the pure structure-agnostic one. Third, we propose a new estimator of the ATE that enjoys doubly-robust asymptotic linearity; it can yield asymptotically valid Wald-type confidence intervals even when the propensity score or the outcome model is inconsistently estimated or estimated at a slow rate. Under certain conditions, we show that its rate of convergence in the new class can be much faster than that achieved by the AIPW estimator and, in particular, matches the minimax lower bound rate, thereby establishing its optimality. Finally, we complement our theoretical findings with simulations.
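A minimal cross-fitted sketch of the classical AIPW estimator discussed above (not the new doubly-robust asymptotically linear estimator proposed in the paper); the nuisance models used here are illustrative placeholders.

```python
# Cross-fitted AIPW estimator of the ATE with simple plug-in nuisance models.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import KFold

def aipw_ate(X, A, Y, n_splits=2):
    """X: covariates, A: binary treatment, Y: outcome. Returns (estimate, SE)."""
    n = len(Y)
    psi = np.zeros(n)
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        # nuisances fitted on the training fold only (cross-fitting)
        e = LogisticRegression().fit(X[train], A[train]).predict_proba(X[test])[:, 1]
        mu1 = LinearRegression().fit(X[train][A[train] == 1], Y[train][A[train] == 1])
        mu0 = LinearRegression().fit(X[train][A[train] == 0], Y[train][A[train] == 0])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        # influence-function-based pseudo-outcome
        psi[test] = (m1 - m0
                     + A[test] * (Y[test] - m1) / e
                     - (1 - A[test]) * (Y[test] - m0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)
```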
Density dependence occurs at the individual level but is often evaluated at the population level, leading to difficulties or even controversies in detecting such a process. Bayesian individual-based models such as spatial capture-recapture (SCR) models provide opportunities to study density dependence at the individual level, but such an approach remains to be developed and evaluated. In this study, we developed an SCR model that links habitat use to apparent survival and recruitment through density-dependent processes at the individual level. Using simulations, we found that the model can properly inform habitat use, but tends to underestimate the effect of density dependence on apparent survival and recruitment. Such underestimation likely arises because SCR models have difficulty identifying the locations of unobserved individuals, which are assumed to be uniformly distributed. How to accurately estimate the locations of unobserved individuals, and thus density dependence, remains a challenging topic in spatial statistics and statistical ecology.
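A minimal sketch of the half-normal detection component that is standard in SCR models, linking detection probability at a trap to the distance from an individual's latent activity center; this shows only the observation model, not the density-dependent survival and recruitment processes developed in the study, and all parameter values are placeholders.

```python
# Half-normal detection model of spatial capture-recapture: detection
# probability decays with distance between activity center and trap.
import numpy as np

def detection_prob(activity_centers, traps, p0=0.3, sigma=1.0):
    """activity_centers: (N, 2), traps: (J, 2) -> (N, J) detection probabilities."""
    d2 = ((activity_centers[:, None, :] - traps[None, :, :]) ** 2).sum(axis=-1)
    return p0 * np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
centers = rng.uniform(0, 10, size=(20, 2))                      # latent activity centers
traps = np.array([[x, y] for x in range(2, 9, 2) for y in range(2, 9, 2)])
p = detection_prob(centers, traps)
capture_history = rng.binomial(1, p)                            # one sampling occasion
```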
Biological neural networks seem qualitatively superior (e.g., in learning, flexibility, robustness) to current artificial ones such as the Multi-Layer Perceptron (MLP) or the Kolmogorov-Arnold Network (KAN). In contrast to these architectures, biological networks feature fundamentally multidirectional signal propagation \cite{axon}, including propagation of probability distributions, e.g., for uncertainty estimation, and are believed to be unable to use standard backpropagation training \cite{backprop}. We propose novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) that remove these low-level differences: each neuron contains a local joint distribution model of its connections, representing the joint density of normalized variables as a linear combination of orthonormal polynomials $(f_\mathbf{j})$: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$ and some chosen basis $B\subset \mathbb{N}^d$. By various index summations of the tensor $(a_\mathbf{j})_{\mathbf{j}\in B}$ of neuron parameters, we obtain simple formulas for, e.g., conditional expected values for propagation in any direction, such as $E[x|y,z]$ or $E[y|x]$, which reduce to a KAN-like parametrization when restricted to pairwise dependencies. Such an HCR network can also propagate probability distributions (including joint ones) such as $\rho(y,z|x)$. It additionally allows alternative training approaches, such as direct estimation of $(a_\mathbf{j})$, tensor decomposition, or a more biologically plausible information bottleneck training in which each layer directly influences only its neighbors, optimizing its content to maximize information about the next layer while minimizing information about the previous one, in order to remove noise and extract the crucial information.
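A minimal numpy sketch of the pairwise case of the HCR parametrization described above: coefficients $a_{jk}$ of the joint density of two normalized variables are estimated as sample means of products of orthonormal (rescaled Legendre) polynomials on $[0,1]$, from which conditional densities can be read off. The basis size and function names are illustrative, and only the two-variable case is shown.

```python
# HCR-style coefficient estimation for a pair of normalized variables (x, y):
# rho(x, y) = sum_{j,k} a[j,k] f_j(x) f_k(y), with f_j orthonormal on [0, 1],
# and a[j,k] estimated by the sample mean of f_j(x_i) f_k(y_i).
import numpy as np
from numpy.polynomial.legendre import legval

def f(j, x):
    """j-th orthonormal polynomial on [0, 1] (rescaled Legendre)."""
    c = np.zeros(j + 1); c[j] = 1.0
    return np.sqrt(2 * j + 1) * legval(2 * x - 1, c)

def estimate_coeffs(x, y, degree=3):
    """a[j, k] = mean_i f_j(x_i) f_k(y_i), for j, k = 0..degree."""
    Fx = np.stack([f(j, x) for j in range(degree + 1)])   # shape (J, n)
    Fy = np.stack([f(k, y) for k in range(degree + 1)])
    return Fx @ Fy.T / len(x)

def conditional_density(a, x0, y_grid):
    """rho(y | x0) under the truncated polynomial model."""
    J = a.shape[0]
    num = sum(a[j, k] * f(j, x0) * f(k, y_grid) for j in range(J) for k in range(J))
    den = sum(a[j, 0] * f(j, x0) for j in range(J))        # marginal rho(x0)
    return num / den
```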
The goal of uplift modeling is to recommend actions that optimize specific outcomes by determining which entities should receive treatment. One common approach involves two steps: first, an inference step that estimates conditional average treatment effects (CATEs), and second, an optimization step that ranks entities based on their CATE values and assigns treatment to the top k within a given budget. While uplift modeling typically focuses on binary treatments, many real-world applications are characterized by continuous-valued treatments, i.e., a treatment dose. This paper presents a predict-then-optimize framework that allows for continuous treatments in uplift modeling. First, in the inference step, conditional average dose responses (CADRs) are estimated from data using causal machine learning techniques. Second, in the optimization step, we frame the assignment of continuous treatments as a dose-allocation problem and solve it using integer linear programming (ILP). This approach allows decision-makers to efficiently and effectively allocate treatment doses while balancing resource availability, with the possibility of adding extra constraints, such as fairness considerations, or of adapting the objective function to account for instance-dependent costs and benefits in order to maximize utility. The experiments compare several CADR estimators and illustrate the trade-offs between policy value and fairness, as well as the impact of an adapted objective function, showcasing the framework's advantages and flexibility across diverse applications in healthcare, lending, and human resource management. All code is available at github.com/SimonDeVos/UMCT.
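A minimal sketch of the optimization step as an integer linear program, written here with the PuLP modeling library (the abstract does not prescribe a specific solver): binary variables assign at most one dose level to each entity, the total allocated dose respects the budget, and the objective sums the estimated CADR values. Fairness constraints or instance-dependent costs could be added as extra linear terms; the CADR values below are random placeholders.

```python
# Dose-allocation ILP: choose at most one dose per entity, subject to a budget,
# maximizing the sum of estimated conditional average dose responses.
import numpy as np
import pulp

rng = np.random.default_rng(0)
n_entities, doses, budget = 10, [1, 2, 3], 12
v = rng.uniform(0, 1, size=(n_entities, len(doses)))   # placeholder CADR estimates

prob = pulp.LpProblem("dose_allocation", pulp.LpMaximize)
x = {(i, d): pulp.LpVariable(f"x_{i}_{d}", cat="Binary")
     for i in range(n_entities) for d in doses}

# objective: total estimated dose response of the allocation
prob += pulp.lpSum(v[i, k] * x[i, d]
                   for i in range(n_entities) for k, d in enumerate(doses))
# each entity receives at most one dose level
for i in range(n_entities):
    prob += pulp.lpSum(x[i, d] for d in doses) <= 1
# total allocated dose may not exceed the budget
prob += pulp.lpSum(d * x[i, d] for i in range(n_entities) for d in doses) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
allocation = {i: d for (i, d), var in x.items() if var.value() == 1}
```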
This manuscript describes the notions of blocker and interdiction applied to well-known optimization problems. The main interest of these two concepts is that they make it possible to analyze whether a combinatorial structure still exists after some modifications. We focus on graph modifications, such as removing vertices or edges from a network. In the interdiction version, we have a budget of modifications and aim to reduce as much as possible the size of a given combinatorial structure, whereas in the blocker version we minimize the number of modifications such that the network no longer contains a given combinatorial structure. Blocker and interdiction problems share many similarities. We consider matching, connectivity, shortest path, max flow, and clique problems, and for each of them we analyze either the blocker version or the interdiction one. Applying the concept of blocker or interdiction to well-known optimization problems can change their complexity: some optimization problems become harder when one of these two notions is applied. For this reason, we provide complexity analyses to show when an optimization problem, or the associated decision problem, becomes harder. Another fundamental aspect developed in the manuscript is the use of exact methods to tackle these optimization problems. The main way to solve them is to model them as integer linear programs. An interesting aspect of integer linear programming is the possibility of theoretically analyzing the strength of these models using cutting planes. For most of the problems studied in this manuscript, a polyhedral analysis is performed to prove the strength of inequalities or to describe new families of inequalities. The exact algorithms proposed are based on Branch-and-Cut or Branch-and-Price algorithms, with dedicated separation and pricing algorithms.
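To make the interdiction notion concrete, the following tiny brute-force illustration removes at most a budgeted number of edges so as to minimize the maximum s-t flow of a toy network; this is only a definitional example, not the Branch-and-Cut or Branch-and-Price methods developed in the manuscript, and the graph is a placeholder.

```python
# Brute-force edge interdiction for maximum s-t flow: with a budget of b edge
# removals, find the removal that minimizes the resulting max flow.
from itertools import combinations
import networkx as nx

G = nx.DiGraph()
edges = [("s", "a", 3), ("s", "b", 2), ("a", "t", 2), ("b", "t", 3), ("a", "b", 1)]
G.add_weighted_edges_from(edges, weight="capacity")

def interdict_max_flow(G, s, t, budget):
    best = (nx.maximum_flow_value(G, s, t, capacity="capacity"), ())
    for removed in combinations(G.edges, budget):
        H = G.copy()
        H.remove_edges_from(removed)
        val = nx.maximum_flow_value(H, s, t, capacity="capacity")
        if val < best[0]:
            best = (val, removed)
    return best   # (minimized max-flow value, edges removed)

print(interdict_max_flow(G, "s", "t", budget=1))
```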
Extreme events over large spatial domains may exhibit highly heterogeneous tail dependence characteristics, yet most existing spatial extremes models yield only one dependence class over the entire spatial domain. To accurately characterize "data-level dependence" in the analysis of extreme events, we propose a mixture model that achieves flexible dependence properties and allows high-dimensional inference for extremes of spatial processes. We modify the popular random scale construction that multiplies a Gaussian random field by a single radial variable; we allow the radial variable to vary smoothly across space and add non-stationarity to the Gaussian process. As the level of extremeness increases, this single model exhibits both asymptotic independence at long ranges and either asymptotic dependence or independence at short ranges. We make joint inference on the dependence model and a marginal model using a copula approach within a Bayesian hierarchical model. Three different simulation scenarios show close-to-nominal frequentist coverage rates. Lastly, we apply the model to a dataset of extreme summertime precipitation over the central United States. We find that the joint tail of precipitation exhibits a non-stationary dependence structure that cannot be captured by limiting extreme value models or current state-of-the-art sub-asymptotic models.
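A minimal sketch of the basic Gaussian random scale construction that the proposed model extends, simulating $X(s) = R\,W(s)$ on a one-dimensional grid with a stationary Gaussian process $W$ and a single radial variable $R$; the spatially varying radial component and the non-stationary Gaussian process of the paper are not shown, and all parameter choices are placeholders.

```python
# Random scale construction X(s) = R * W(s): a Gaussian process with exponential
# covariance multiplied by a single positive radial variable.
import numpy as np

rng = np.random.default_rng(0)
s = np.linspace(0, 10, 200)                                   # spatial locations
range_par = 2.0
cov = np.exp(-np.abs(s[:, None] - s[None, :]) / range_par)    # exponential covariance
L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(s)))          # jitter for stability
W = L @ rng.standard_normal(len(s))                           # Gaussian process sample
R = rng.weibull(0.5)                                          # heavy-ish tailed radial variable
X = R * W                                                     # random scale mixture
```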
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that first uses a 3D FCN to roughly define a candidate region, which is then used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on a more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital, which includes 150 CT scans targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve significantly higher performance on small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN-based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.
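A minimal PyTorch sketch of the two-stage, coarse-to-fine idea: a first 3D FCN produces a coarse foreground probability map, a bounding box around the candidate region is cropped, and a second 3D FCN is applied to the cropped sub-volume only; the tiny networks and random volume below are placeholders, not the paper's 3D U-Net-style cascade.

```python
# Two-stage, coarse-to-fine 3D FCN cascade (toy version).
import torch
import torch.nn as nn

def tiny_fcn_3d(num_classes):
    return nn.Sequential(
        nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Conv3d(8, num_classes, 1),            # voxel-wise class scores
    )

stage1, stage2 = tiny_fcn_3d(2), tiny_fcn_3d(4)  # e.g. fg/bg, then organ labels

volume = torch.randn(1, 1, 64, 64, 64)           # one CT volume (placeholder)
coarse = stage1(volume).softmax(dim=1)[:, 1]     # coarse foreground probability map

mask = coarse[0] > 0.5                           # candidate region
if mask.any():
    idx = mask.nonzero()
    lo = idx.min(dim=0).values.tolist()
    hi = (idx.max(dim=0).values + 1).tolist()
    crop = volume[:, :, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine = stage2(crop)                          # detailed segmentation of the crop only
```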