Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
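The core idea can be illustrated with a toy consistency check (a minimal sketch, not the mlscorecheck API): given a test set with p positives and n negatives, the reported accuracy, sensitivity, and specificity, rounded to a stated number of decimals, must be reproducible by some integer confusion matrix; the function name and rounding convention below are assumptions.
\begin{verbatim}
from itertools import product

def scores_consistent(p, n, acc, sens, spec, decimals=4):
    """True if some integer confusion matrix on a test set with p positives
    and n negatives reproduces the reported scores after rounding."""
    tol = 0.5 * 10 ** (-decimals)        # numerical tolerance of the rounding
    for tp, tn in product(range(p + 1), range(n + 1)):
        if (abs((tp + tn) / (p + n) - acc) <= tol
                and abs(tp / p - sens) <= tol
                and abs(tn / n - spec) <= tol):
            return True
    return False

# No confusion matrix with 50 positives and 50 negatives yields this triplet
print(scores_consistent(50, 50, acc=0.91, sens=0.94, spec=0.86))  # False
\end{verbatim}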
In the analysis of cluster randomized trials, two typical features are that individuals within a cluster are correlated and that the total number of clusters can sometimes be limited. While model-robust treatment effect estimators have recently been developed, their asymptotic theory requires the number of clusters to approach infinity, and one often has to empirically assess the applicability of those methods in finite samples. To address this challenge, we propose a conformal causal inference framework that achieves the target coverage probability of treatment effects in finite samples without the need for asymptotic approximations. Moreover, we prove that this framework is compatible with arbitrary working models, including machine learning algorithms leveraging baseline covariates, is robust against arbitrary misspecification of working models, and accommodates a variety of within-cluster correlations. Under this framework, we offer efficient algorithms to make inferences on treatment effects at both the cluster and individual levels, applicable to user-specified covariate subgroups and two types of test data. Finally, we demonstrate our methods via simulations and a real data application based on a cluster randomized trial for treating chronic pain.
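A minimal sketch of the split-conformal construction underlying this type of finite-sample coverage guarantee, applied to held-out (e.g., cluster-level) outcomes; the function and variable names are illustrative assumptions, not the authors' procedure.
\begin{verbatim}
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Calibrate absolute residuals of an arbitrary fitted working model on
    held-out data, then widen its test predictions by the finite-sample
    (1 - alpha) quantile; coverage holds however poor the model fit is."""
    resid = np.abs(y_cal - model.predict(X_cal))     # calibration residuals
    n = len(resid)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)  # finite-sample quantile index
    q = np.sort(resid)[k - 1]
    pred = np.asarray(model.predict(X_test))
    return pred - q, pred + q
\end{verbatim}
The validity of the resulting interval does not depend on the fitted model being correct, which mirrors the robustness to arbitrary working-model misspecification claimed above.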
In recent years, social media has gained an unprecedented amount of attention, playing a pivotal role in shaping the contemporary landscape of communication and connection. However, Coordinated Inauthentic Behaviour (CIB), defined as orchestrated efforts by entities to deceive or mislead users about their identity and intentions, has emerged as a tactic to exploit the online discourse. In this study, we quantify the efficacy of CIB tactics by defining a general framework for evaluating the influence of a subset of nodes in a directed tree. We design two algorithms that provide optimal and greedy post-hoc placement strategies that maximise the influence of the placed configuration. We then consider cascades from information spreading on Twitter to compare the observed behaviour with our algorithms. The results show that, according to our model, coordinated accounts are quite inefficient in terms of their network influence, suggesting that they may play a less pivotal role than expected. Moreover, this poor performance can be traced to two separate aspects: a poor placement strategy and a scarcity of resources.
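For concreteness, a greedy placement strategy of the kind described above can be sketched as follows; the influence measure used here, the number of nodes reachable from the placed set in the directed tree, is an illustrative assumption and may differ from the paper's objective.
\begin{verbatim}
import networkx as nx

def greedy_placement(tree: nx.DiGraph, k: int):
    """Greedily place k accounts in a directed tree so that the union of
    their descendant sets (a proxy for influence) is as large as possible."""
    placed, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1
        for v in tree.nodes:
            if v in placed:
                continue
            gain = len((nx.descendants(tree, v) | {v}) - covered)
            if gain > best_gain:
                best, best_gain = v, gain
        placed.append(best)
        covered |= nx.descendants(tree, best) | {best}
    return placed, len(covered)
\end{verbatim}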
We analyze the prediction error of principal component regression (PCR) and prove high probability bounds for the corresponding squared risk conditional on the design. Our first main result shows that PCR performs comparably to the oracle method obtained by replacing empirical principal components by their population counterparts, provided that an effective rank condition holds. On the other hand, if the latter condition is violated, then empirical eigenvalues start to have a significant upward bias, resulting in a self-induced regularization of PCR. Our approach relies on the behavior of empirical eigenvalues, empirical eigenvectors and the excess risk of principal component analysis in high-dimensional regimes.
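A small simulation sketch of the comparison made above, contrasting PCR built on empirical principal components with the oracle that uses the (here known) population components; the spiked-covariance setup and all parameter values are illustrative assumptions.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 3
# Spiked population covariance: k strong directions plus isotropic noise
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
evals = np.concatenate([np.array([25.0, 16.0, 9.0]), np.ones(d - k)])
Sigma_half = U @ np.diag(np.sqrt(evals)) @ U.T
X = rng.standard_normal((n, d)) @ Sigma_half
beta = U[:, :k] @ rng.standard_normal(k)          # signal in the spiked directions
y = X @ beta + rng.standard_normal(n)

def pcr_fit(X, y, components):
    Z = X @ components                            # project onto chosen directions
    theta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return components @ theta

emp_pcs = np.linalg.eigh(X.T @ X / n)[1][:, -k:]  # top empirical eigenvectors
beta_pcr = pcr_fit(X, y, emp_pcs)
beta_oracle = pcr_fit(X, y, U[:, :k])             # population eigenvectors (oracle)
X_test = rng.standard_normal((2000, d)) @ Sigma_half
for name, b in [("PCR   ", beta_pcr), ("oracle", beta_oracle)]:
    print(name, np.mean((X_test @ (b - beta)) ** 2))  # excess prediction risk
\end{verbatim}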
Survival analysis can sometimes involve individuals who will not experience the event of interest, forming what is known as the cured group. Identifying such individuals is not always possible beforehand, as they provide only right-censored data. Ignoring the presence of the cured group can introduce bias in the final model. This paper presents a method for estimating a semiparametric additive hazards model that accounts for the cured fraction. Unlike regression coefficients in a hazard ratio model, those in an additive hazard model measure hazard differences. The proposed method uses a primal-dual interior point algorithm to obtain constrained maximum penalized likelihood estimates of the model parameters, including the regression coefficients and the baseline hazard, subject to certain non-negativity constraints.
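To make this contrast explicit, the two working models can be written side by side (standard formulations; the cure-fraction component of the proposed model is not detailed here):
\[
h(t \mid x) = h_0(t)\exp(x^\top \beta) \quad \text{(proportional hazards: $\beta$ are log hazard ratios)},
\qquad
h(t \mid x) = h_0(t) + x^\top \beta \quad \text{(additive hazards: $\beta$ are hazard differences)},
\]
and the requirement that $h_0(t) \ge 0$ and $h_0(t) + x^\top \beta \ge 0$ is what makes the non-negativity constraints in the penalized likelihood optimization necessary.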
A term in a corpus is said to be ``bursty'' (or overdispersed) when its occurrences are concentrated in few out of many documents. In this paper, we propose Residual Inverse Collection Frequency (RICF), a heuristic inspired by statistical significance testing for quantifying term burstiness. The chi-squared test is, to our knowledge, the sole test of statistical significance among existing term burstiness measures. Term burstiness scores under the chi-squared test are computed from the collection frequency statistic (i.e., the proportion that a specified term constitutes in relation to all terms within a corpus). Other widely used term burstiness measures, however, also exploit the document frequency of a term (i.e., the proportion of documents within a corpus in which the term occurs). RICF addresses this shortcoming of the chi-squared test by systematically incorporating both the collection frequency and document frequency statistics into its term burstiness scores. We evaluate the RICF measure on a domain-specific technical terminology extraction task using the GENIA Term corpus benchmark, which comprises 2,000 annotated biomedical article abstracts. RICF generally outperformed the chi-squared test in terms of precision at k (P@k), with percent improvements of 0.00% (P@10), 6.38% (P@50), 6.38% (P@100), 2.27% (P@500), 2.61% (P@1000), and 1.90% (P@5000). Furthermore, RICF was competitive with other well-established measures of term burstiness. Based on these findings, we consider our contributions in this paper a promising starting point for future exploration of statistical significance testing in text analysis.
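The two statistics combined by RICF can be computed as follows; the exact RICF formula is not reproduced here, and the toy corpus and tokenization are illustrative assumptions.
\begin{verbatim}
from collections import Counter

def burstiness_inputs(docs, term):
    """Collection frequency: share of all tokens in the corpus that are `term`.
    Document frequency: share of documents in which `term` occurs."""
    tokens = [tok for doc in docs for tok in doc]
    cf = Counter(tokens)[term] / len(tokens)
    df = sum(term in doc for doc in docs) / len(docs)
    return cf, df

docs = [["gene", "expression", "gene"], ["protein"], ["gene", "protein"]]
print(burstiness_inputs(docs, "gene"))   # (0.5, 0.666...)
\end{verbatim}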
Accounting for exposure measurement errors has been recognized as a crucial problem in environmental epidemiology for over two decades. Bayesian hierarchical models offer a coherent probabilistic framework for evaluating associations between environmental exposures and health effects, one that takes into account exposure measurement errors introduced by uncertainty in the estimated exposure as well as spatial misalignment between the exposure and health outcome data. While two-stage Bayesian analyses are often regarded as a good alternative to fully Bayesian analyses when joint estimation is not feasible, there has been minimal research on how to properly propagate uncertainty from the first-stage exposure model to the second-stage health model, especially when there are many participant locations and exposures are spatially correlated. We propose a scalable two-stage Bayesian approach, called the sparse multivariate normal (sparse MVN) prior approach, based on the Vecchia approximation, for assessing associations between exposures and health outcomes in environmental epidemiology. We compare its performance with existing approaches through simulation. Our sparse MVN prior approach shows performance comparable to the fully Bayesian approach, which is a gold standard but is impossible to implement in some cases. We investigate the association between birth outcomes and both source-specific and pollutant-specific (nitrogen dioxide, NO$_2$) exposures for 2012 in Harris County, Texas, using several approaches, including the newly developed method.
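A minimal sketch of the Vecchia-style approximation on which such a sparse MVN prior can be built: each observation is conditioned only on a small set of nearest previously ordered neighbours, replacing a dense multivariate normal likelihood with a product of low-dimensional conditionals. The coordinate ordering, neighbour rule, and user-supplied covariance function cov_fn(A, B) (returning the cross-covariance between two sets of locations) are illustrative assumptions, not the authors' implementation.
\begin{verbatim}
import numpy as np

def vecchia_loglik(y, locs, cov_fn, m=10):
    """Vecchia approximation to a zero-mean Gaussian log-likelihood."""
    order = np.argsort(locs[:, 0])          # simple coordinate-based ordering
    y, locs = y[order], locs[order]
    ll = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if len(prev) > m:                   # keep the m nearest earlier points
            d = np.linalg.norm(locs[prev] - locs[i], axis=1)
            prev = prev[np.argsort(d)[:m]]
        Sii = cov_fn(locs[[i]], locs[[i]])[0, 0]
        if len(prev) == 0:
            mu, var = 0.0, Sii
        else:
            Spp = cov_fn(locs[prev], locs[prev])
            Spi = cov_fn(locs[prev], locs[[i]])[:, 0]
            w = np.linalg.solve(Spp, Spi)
            mu, var = w @ y[prev], Sii - Spi @ w
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll
\end{verbatim}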
Over the past decade, characterizing the exact asymptotic risk of regularized estimators in high-dimensional regression has emerged as a popular line of work. This literature considers the proportional asymptotics framework, where the number of features and samples both diverge, at a rate proportional to each other. Substantial work in this area relies on Gaussianity assumptions on the observed covariates. Further, these studies often assume the design entries to be independent and identically distributed. Parallel research investigates the universality of these findings, revealing that results based on the i.i.d.~Gaussian assumption extend to a broad class of designs, such as i.i.d.~sub-Gaussians. However, universality results examining dependent covariates have so far focused on correlation-based dependence or on a highly structured form of dependence, as permitted by right rotationally invariant designs. In this paper, we break this barrier and study a dependence structure that in general falls outside the purview of these established classes. We seek to pin down the extent to which results based on i.i.d.~Gaussian assumptions persist. We identify a class of designs characterized by a block dependence structure that ensures the universality of i.i.d.~Gaussian-based results. We establish that the optimal values of the regularized empirical risk and the risk associated with convex regularized estimators, such as the Lasso and ridge, converge to the same limit under block-dependent designs as they do for designs with i.i.d.~Gaussian entries. Our dependence structure differs significantly from correlation-based dependence and enables, for the first time, asymptotically exact risk characterization in prevalent nonparametric regression problems in high dimensions. Finally, we illustrate through experiments that this universality becomes evident quite early, even for relatively moderate sample sizes.
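A toy numerical check of this universality phenomenon; the block-dependent design below (uncorrelated entries within a block that share a common magnitude) and all parameter choices are our own illustrative assumptions, not the paper's construction.
\begin{verbatim}
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, block = 400, 800, 8

def block_dependent_design(n, p, block):
    """Entries within a block share a common magnitude but carry independent
    random signs: uncorrelated, yet dependent, with standard normal margins."""
    mag = np.abs(rng.standard_normal((n, p // block)))
    signs = rng.choice([-1.0, 1.0], size=(n, p))
    return signs * np.repeat(mag, block, axis=1)

beta = np.zeros(p); beta[:20] = 1.0

def lasso_risk(X):
    y = X @ beta + rng.standard_normal(n)
    b = Lasso(alpha=0.1, fit_intercept=False, max_iter=5000).fit(X, y).coef_
    return np.sum((b - beta) ** 2)

print("iid Gaussian:", np.mean([lasso_risk(rng.standard_normal((n, p))) for _ in range(5)]))
print("block design:", np.mean([lasso_risk(block_dependent_design(n, p, block)) for _ in range(5)]))
\end{verbatim}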
In many causal studies, outcomes are censored by death, in the sense that they are neither observed nor defined for units who die. In such studies, the focus is usually on the stratum of always-survivors up to a single fixed time s. Building on a recent strand of the literature, we propose an extended framework for the analysis of longitudinal studies, where units can die at different time points and the main endpoints are observed and well defined only up to the death time. We develop a Bayesian longitudinal principal stratification framework, in which units are cross-classified according to their longitudinal death status. Under this framework, the focus is on causal effects for the principal strata of units that would be alive up to a time point s irrespective of their treatment assignment, where these strata may vary as a function of s. We can gain valuable insights into the effects of treatment by inspecting the distribution of baseline characteristics within each longitudinal principal stratum, and by investigating the time trend of both principal stratum membership and survivor-average causal effects. We illustrate our approach with the analysis of a longitudinal observational study aimed at assessing, under the assumption of strong ignorability of treatment assignment, the causal effects of a policy promoting start-ups on firms' survival and hiring policies, where a firm's hiring status is censored by death.
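In potential-outcome notation, the estimands described above can be written as follows (a standard formulation consistent with the description; the notation is ours):
\[
\mathcal{S}(s) = \{\, i : D_i(0) > s \ \text{and}\ D_i(1) > s \,\}, \qquad
\mathrm{SACE}(s) = E\!\left[\, Y_i(1, s) - Y_i(0, s) \mid i \in \mathcal{S}(s) \,\right],
\]
where $D_i(z)$ denotes the potential death time and $Y_i(z, s)$ the potential outcome at time $s$ under assignment $z$; the stratum $\mathcal{S}(s)$ collects the units that would be alive up to $s$ irrespective of treatment assignment, and both its composition and $\mathrm{SACE}(s)$ may vary with $s$.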
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and attempts to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state of the art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale, unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently also entered the domain of agriculture. In this paper, we survey 40 research efforts that employ deep learning techniques applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature, and pre-processing of the data used, and the overall performance achieved according to the metrics used in each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques with respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.