This article offers a new paradigm for analyzing the behavior of uncertain multivariable systems using a set of quantities we call \emph{inferential moments}. Marginalization is an uncertainty quantification process that averages conditional probabilities to quantify the \emph{expected value} of a probability of interest. Inferential moments are higher-order conditional probability moments that describe how a distribution is expected to respond to new information. Of particular interest in this article is the \emph{inferential deviation}, which is the expected fluctuation of the probability of one variable in response to an inferential update of another. We find a power series expansion of the Mutual Information in terms of inferential moments, which implies that inferential moment logic may be useful for tasks typically performed with information-theoretic tools. We explore this in two applications that analyze the inferential deviations of a Bayesian Network to improve situational awareness and decision-making. We implement a simple greedy algorithm for optimal sensor tasking, based on inferential deviations, that generally outperforms a comparable greedy Mutual Information algorithm in terms of predictive probabilistic error.
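As a minimal sketch of the inferential deviation for two discrete variables, the code below assumes (our assumption, not necessarily the article's exact definition) that it is the $P(B)$-weighted standard deviation of $P(A=a \mid B)$ around $P(A=a)$. It also compares the mutual information with the standard second-order (chi-square) approximation $I(A;B) \approx \frac{1}{2}\sum_a \sigma_a^2 / P(a)$, which is the leading term obtained from this definition by expanding the Kullback-Leibler divergence; the article's full power series is not reproduced here.

```python
import math

def marginals(joint):
    """joint[a][b] = P(A=a, B=b); returns the lists P(A) and P(B)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    return p_a, p_b

def inferential_deviation(joint, a):
    """Assumed definition: std. dev. of P(A=a | B) over B, weighted by P(B)."""
    p_a, p_b = marginals(joint)
    var = sum(p_b[b] * (joint[a][b] / p_b[b] - p_a[a]) ** 2
              for b in range(len(p_b)) if p_b[b] > 0)
    return math.sqrt(var)

def mutual_information(joint):
    """I(A;B) in nats."""
    p_a, p_b = marginals(joint)
    return sum(p * math.log(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

def mi_second_order(joint):
    """Second-order term 1/2 * sum_a sigma_a^2 / P(a), in nats."""
    p_a, _ = marginals(joint)
    return 0.5 * sum(inferential_deviation(joint, a) ** 2 / p_a[a]
                     for a in range(len(p_a)) if p_a[a] > 0)

# Weakly dependent toy example: B is uniform and P(A=1 | B) is 0.45 or 0.55.
joint = [[0.275, 0.225],
         [0.225, 0.275]]
print(inferential_deviation(joint, 1))  # about 0.05
print(mutual_information(joint), mi_second_order(joint))  # both about 0.0050
```

For weak dependence the two numbers agree closely; for strong dependence the higher-order terms matter.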
The prevalence and importance of algorithmic two-sided marketplaces has drawn attention to the issue of fairness in such settings. Algorithmic decisions are used in assigning students to schools, users to advertisers, and applicants to job interviews. These decisions should heed the preferences of individuals, and simultaneously be fair with respect to their merits (synonymous with fit, future performance, or need). Merits conditioned on observable features are always \emph{uncertain}, a fact that is exacerbated by the widespread use of machine learning algorithms to infer merit from the observables. As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and an approach to address it. We design a linear programming framework to find fair utility-maximizing distributions over allocations, and we show that the linear program is robust to perturbations in the estimated parameters of the uncertain merit distributions, a key property in combining the approach with machine learning techniques.
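The linear-programming idea can be illustrated in the simplest possible case: one slot, several applicants with uncertain merits, and a distribution $x$ over who receives the slot. The fairness constraint used here, a floor $x_i \ge \lambda\, P(i \text{ has the highest merit})$, is a hypothetical stand-in chosen for illustration, not the axiom developed in the paper; the merit distributions and utilities are likewise made up. With a single slot and only floor constraints, the LP optimum has a closed form, so no solver is needed.

```python
import random

def prob_highest_merit(merit_samplers, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(individual i has the highest merit)."""
    rng = random.Random(seed)
    wins = [0] * len(merit_samplers)
    for _ in range(n_samples):
        draws = [sample(rng) for sample in merit_samplers]
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    return [w / n_samples for w in wins]

def fair_single_slot(utilities, p_top, lam):
    """Maximize sum_i x_i * utilities[i] over distributions x (sum x = 1),
    subject to the hypothetical floor x_i >= lam * p_top[i], 0 <= lam <= 1.
    Assumes sum(p_top) == 1. Optimum: pay every floor, then give the
    remaining mass 1 - lam to the highest-utility individual."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * p for p in p_top]
    best = max(range(len(utilities)), key=utilities.__getitem__)
    x[best] += 1.0 - sum(x)
    return x

# Hypothetical applicants: merit ~ Normal(mean, sd), e.g. as predicted by a model.
samplers = [lambda r: r.gauss(1.0, 0.5),
            lambda r: r.gauss(0.9, 0.5),
            lambda r: r.gauss(0.2, 0.5)]
p_top = prob_highest_merit(samplers)
x = fair_single_slot(utilities=[1.0, 0.8, 0.3], p_top=p_top, lam=0.5)
print([round(v, 3) for v in x])
```

In this toy case, a perturbation of the estimated probabilities that keeps them summing to one moves each $x_i$ by exactly $\lambda$ times the change in $p_i$, a simple instance of the kind of robustness to estimated merit distributions that the abstract emphasizes.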
We uncover how SGD interacts with batch normalization and how this interaction can produce undesirable training dynamics such as divergence. More precisely, we study how Single Shuffle (SS) and Random Reshuffle (RR), two widely used variants of SGD, behave surprisingly differently in the presence of batch normalization: RR leads to a much more stable evolution of the training loss than SS. As a concrete example, for regression using a linear network with batch normalization, we prove that SS and RR converge to distinct global optima that are "distorted" away from those of gradient descent. Thereafter, for classification, we characterize conditions under which training divergence for SS and RR can and cannot occur. We present explicit constructions showing how SS leads to distorted optima in regression and to divergence in classification, whereas RR avoids both distortion and divergence. We confirm our results empirically in realistic settings and conclude that the separation between SS and RR under batch normalization is relevant in practice.
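The only difference between the two schemes is when the permutation is drawn. The sketch below (our own illustration, not the paper's construction) generates the per-epoch sample orders for each scheme and counts the distinct mini-batches seen. Under SS the same partition into batches recurs every epoch, so batch-normalization statistics are always computed on the same fixed batches; under RR the batch composition changes from epoch to epoch.

```python
import random

def epoch_orders(n, epochs, scheme, seed=0):
    """Return the sample order used in each epoch.
    'SS' (Single Shuffle): one permutation, drawn once and reused.
    'RR' (Random Reshuffle): a fresh permutation at every epoch."""
    if scheme not in ("SS", "RR"):
        raise ValueError("scheme must be 'SS' or 'RR'")
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    orders = []
    for epoch in range(epochs):
        if scheme == "RR" and epoch > 0:
            rng.shuffle(perm)  # new permutation for this epoch
        orders.append(list(perm))  # store a copy of this epoch's order
    return orders

def distinct_batches(orders, batch_size):
    """Number of distinct mini-batches (as sets of indices) over all epochs."""
    seen = set()
    for order in orders:
        for i in range(0, len(order), batch_size):
            seen.add(frozenset(order[i:i + batch_size]))
    return len(seen)

ss = epoch_orders(n=8, epochs=5, scheme="SS")
rr = epoch_orders(n=8, epochs=5, scheme="RR")
print(distinct_batches(ss, 2), distinct_batches(rr, 2))  # SS gives 4; RR typically more
```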
Empirical evidence demonstrates that citations received by scholarly publications follow a pattern of preferential attachment, resulting in a power-law distribution. Such asymmetry has sparked significant debate regarding the use of citations for research evaluation. However, a consensus has yet to be established concerning the historical trends in citation concentration. Are citations becoming more concentrated in a small number of articles? Or have recent geopolitical and technical changes in science led to more decentralized distributions? This ongoing debate stems from a lack of technical clarity in measuring inequality. Given the variations in citation practices across disciplines and over time, it is crucial to account for multiple factors that can influence the findings. This article explores how reference-based and citation-based approaches, uncited articles, citation inflation, the expansion of bibliometric databases, disciplinary differences, and self-citations affect the evolution of citation concentration. Our results indicate a decreasing trend in citation concentration, primarily driven by a decline in uncited articles, which, in turn, can be attributed to the growing significance of Asia and Europe. On the whole, our findings clarify current debates on citation concentration and show that, contrary to a widely held belief, citations are increasingly scattered.
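Because the debate turns on how inequality is measured, a small example helps. The sketch below uses the Gini coefficient, one common concentration measure (we do not assume it is the measure, or the only measure, used in the article), on made-up citation counts, and shows how strongly the inclusion of uncited articles moves the result.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i i * x_(i) / (n * total) - (n + 1) / n, ranks i = 1..n ascending
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

citations = [0, 0, 0, 0, 1, 2, 3, 5, 8, 21]  # hypothetical citation counts
cited_only = [c for c in citations if c > 0]
print(round(gini(citations), 3), round(gini(cited_only), 3))  # 0.7 0.5
```

The same 40 citations look markedly more concentrated once the four uncited articles are counted, which is why a decline in uncited articles alone can lower measured concentration.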
The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.
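For contrast with the one-step approach, the sketch below shows the traditional two-step construction the abstract describes: separately estimated treatment and participation probabilities are turned into an inverse probability weight and an inverse odds weight, and the two are multiplied. It assumes the probabilities have already been estimated (for example, by two logistic regressions) and that the target is the non-participant population ($S=0$); the paper's one-step weights are not reproduced here, and the numbers are hypothetical.

```python
def two_step_weights(treated, e_hat, s_hat):
    """Weights for study units (S = 1) when transporting to the target S = 0.
    treated[i]: 1 if unit i received treatment, else 0.
    e_hat[i]:   estimated P(A = 1 | X_i, S = 1)   (treatment model).
    s_hat[i]:   estimated P(S = 1 | X_i)          (selection model)."""
    weights = []
    for a, e, s in zip(treated, e_hat, s_hat):
        inverse_probability = 1.0 / e if a else 1.0 / (1.0 - e)
        inverse_odds = (1.0 - s) / s  # P(S = 0 | X) / P(S = 1 | X)
        weights.append(inverse_probability * inverse_odds)
    return weights

def weighted_ate(y, treated, weights):
    """Normalized (Hajek) weighted difference in mean outcomes."""
    def weighted_mean(group):
        pairs = [(w, yi) for w, yi, a in zip(weights, y, treated) if a == group]
        return sum(w * yi for w, yi in pairs) / sum(w for w, _ in pairs)
    return weighted_mean(1) - weighted_mean(0)

treated = [1, 0, 1, 0]
y = [3.0, 1.0, 3.0, 1.0]
w = two_step_weights(treated, e_hat=[0.5, 0.5, 0.6, 0.4], s_hat=[0.2, 0.5, 0.8, 0.5])
print(w, weighted_ate(y, treated, w))
```

Each factor of the product depends on a separately fitted model, which is the source of the instability that a single-step weighting approach aims to avoid.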
Mendelian randomization (MR) is an instrumental variable (IV) approach to inferring causal relationships between exposures and outcomes from genome-wide association study (GWAS) summary data. However, the multivariable inverse-variance weighting (IVW) approach, which serves as the foundation for most MR approaches, cannot yield unbiased causal effect estimates in the presence of many weak IVs. To address this problem, we proposed MR using a Bias-corrected Estimating Equation (MRBEE), which can infer unbiased causal relationships with many weak IVs while simultaneously accounting for horizontal pleiotropy. While the practical significance of MRBEE was demonstrated in our parallel work (Lorincz-Comi (2023)), this paper establishes the statistical theory of multivariable IVW and MRBEE with many weak IVs. First, we show that the bias of the multivariable IVW estimate is caused by errors-in-variables bias, whose scale is inflated by weak instrument bias and whose direction is influenced by sample overlap between the exposure and outcome GWAS cohorts. Second, we investigate the asymptotic properties of multivariable IVW and MRBEE, showing that MRBEE outperforms multivariable IVW in terms of unbiasedness of causal effect estimation and asymptotic validity of causal inference. Finally, we apply MRBEE to myopia and reveal that education and outdoor activity are causal for myopia, whereas indoor activity is not.
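The errors-in-variables mechanism can be seen in a few lines. The simulation below is a simplified, univariable version of the setting (the paper treats the multivariable case): SNP-exposure associations are estimated with noise, outcome errors are independent of exposure errors (no sample overlap), and all numbers are hypothetical. The IVW estimate is close to the true effect when the true exposure associations are plugged in, but is pulled toward zero when their noisy estimates are used, by roughly the factor $\mathrm{sd}_{bx}^2 / (\mathrm{sd}_{bx}^2 + \mathrm{se}_x^2)$. MRBEE's bias correction is not implemented here.

```python
import random

def ivw_estimate(bx, by, se_y):
    """Univariable IVW: sum(bx*by/se^2) / sum(bx^2/se^2)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_y))
    return num / den

def simulate_summary_stats(n_snps, beta, sd_bx, se_x, se_y, seed=0):
    """Hypothetical GWAS summary statistics: true SNP-exposure effects,
    their noisy estimates, and noisy SNP-outcome estimates.
    Exposure and outcome errors are independent (no sample overlap)."""
    rng = random.Random(seed)
    true_bx = [rng.gauss(0.0, sd_bx) for _ in range(n_snps)]
    bx_hat = [b + rng.gauss(0.0, se_x) for b in true_bx]
    by_hat = [beta * b + rng.gauss(0.0, se_y) for b in true_bx]
    return true_bx, bx_hat, by_hat

true_bx, bx_hat, by_hat = simulate_summary_stats(
    n_snps=5000, beta=0.5, sd_bx=0.01, se_x=0.01, se_y=0.01)
se_y = [0.01] * 5000
print(ivw_estimate(true_bx, by_hat, se_y))  # should be near the true 0.5
print(ivw_estimate(bx_hat, by_hat, se_y))   # attenuated, roughly 0.5 * 0.5
```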
Bayesian inference and the use of posterior or posterior predictive probabilities for decision making have become increasingly popular in clinical trials. The current approach toward Bayesian clinical trials is, however, a hybrid Bayesian-frequentist approach where the design and decision criteria are assessed with respect to frequentist operating characteristics such as power and type I error rate. These operating characteristics are commonly obtained via simulation studies. In this article we propose methodology to utilize large sample theory of the posterior distribution to define simple parametric models for the sampling distribution of the Bayesian test statistics, i.e., posterior tail probabilities. The parameters of these models are then estimated using a small number of simulation scenarios, thereby refining these models to capture the sampling distribution for small to moderate sample size. The proposed approach toward assessment of operating characteristics and sample size determination can be considered as simulation-assisted rather than simulation-based and significantly reduces the computational burden for design of Bayesian trials.
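To make the idea concrete, the sketch below uses a deliberately simple conjugate model (normal data with known variance and a normal prior), an assumption made for illustration. The Bayesian test statistic is the posterior tail probability $T = P(\theta > 0 \mid \text{data})$, and the trial succeeds if $T > c$. We model $\Phi^{-1}(T)$ as normal, estimate its mean and standard deviation from a few hundred simulated trials, and read off power. In this toy model the normal model happens to be exact, so the result can be checked against a closed form; the parametric family used in the article may differ.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()

def probit_tail_prob(ybar, n, sigma, tau):
    """Phi^{-1}(T) for T = P(theta > 0 | data), assuming a N(0, tau^2) prior
    and n observations y_i ~ N(theta, sigma^2) with sigma known."""
    precision = 1.0 / tau ** 2 + n / sigma ** 2
    post_mean = (n * ybar / sigma ** 2) / precision
    return post_mean * math.sqrt(precision)  # post_mean / post_sd

def power_from_normal_model(mu, omega, c):
    """P(T > c) when Phi^{-1}(T) ~ N(mu, omega^2)."""
    return 1.0 - STD_NORMAL.cdf((STD_NORMAL.inv_cdf(c) - mu) / omega)

def simulation_assisted_power(theta, n, sigma, tau, c, n_sims=300, seed=1):
    """Fit the normal model to a small number of simulated trials.
    The sample mean is sufficient, so we simulate it directly."""
    rng = random.Random(seed)
    z = [probit_tail_prob(rng.gauss(theta, sigma / math.sqrt(n)), n, sigma, tau)
         for _ in range(n_sims)]
    return power_from_normal_model(fmean(z), stdev(z), c)

def exact_power(theta, n, sigma, tau, c):
    """Closed form, available only because this toy model is conjugate."""
    precision = 1.0 / tau ** 2 + n / sigma ** 2
    shrink = (n / sigma ** 2) / precision
    mu = shrink * theta * math.sqrt(precision)
    omega = shrink * (sigma / math.sqrt(n)) * math.sqrt(precision)
    return power_from_normal_model(mu, omega, c)

print(simulation_assisted_power(0.3, 100, 1.0, 1.0, 0.975),
      exact_power(0.3, 100, 1.0, 1.0, 0.975))
```

Setting `theta=0` gives the type I error rate from the same fitted model. Working on the probit scale avoids numerical saturation of $T$ near 1.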
In this work we connect two notions: that of the nonparametric mode of a probability measure, defined by asymptotic small ball probabilities, and that of the Onsager-Machlup functional, a generalized density also defined via asymptotic small ball probabilities. We show that in a separable Hilbert space setting and under mild conditions on the likelihood, modes of a Bayesian posterior distribution based upon a Gaussian prior exist and agree with the minimizers of its Onsager-Machlup functional and thus also with weak posterior modes. We apply this result to inverse problems and derive conditions on the forward mapping under which this variational characterization of posterior modes holds. Our results show rigorously that in the limit case of infinite-dimensional data corrupted by additive Gaussian or Laplacian noise, nonparametric maximum a posteriori estimation is equivalent to Tikhonov-Phillips regularization. In comparison with the work of Dashti, Law, Stuart, and Voss (2013), the assumptions on the likelihood are relaxed so that they cover in particular the important case of white Gaussian process noise. We illustrate our results by applying them to a severely ill-posed linear problem with Laplacian noise, where we express the maximum a posteriori estimator analytically and study its rate of convergence in the small noise limit.
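For orientation, here is the standard form of the correspondence when the data are finite-dimensional (a heuristic sketch, not the paper's infinite-dimensional statement). With a centered Gaussian prior $\mathcal{N}(0, C_0)$ whose Cameron-Martin space is $E = \operatorname{ran} C_0^{1/2}$, a forward map $G$, and negative log-likelihood $\Phi$ (up to an additive constant), the Onsager-Machlup functional is
\[
I(u) = \Phi(u) + \tfrac{1}{2}\,\|C_0^{-1/2}u\|^2, \qquad u \in E .
\]
For data $y = G(u) + \eta$ with $\eta \sim \mathcal{N}(0, \sigma^2 I)$, so that $\Phi(u) = \frac{1}{2\sigma^2}\|G(u) - y\|_2^2$, and, respectively, with $\eta$ having i.i.d. Laplace entries of scale $b$, so that $\Phi(u) = \frac{1}{b}\|G(u) - y\|_1$, minimizing $I$ becomes
\[
\min_{u \in E}\; \|G(u) - y\|_2^2 + \sigma^2\,\|C_0^{-1/2}u\|^2
\qquad\text{and}\qquad
\min_{u \in E}\; \tfrac{1}{b}\,\|G(u) - y\|_1 + \tfrac{1}{2}\,\|C_0^{-1/2}u\|^2 ,
\]
the first being Tikhonov-Phillips regularization with parameter $\sigma^2$ and penalty operator $C_0^{-1/2}$, the second its $\ell^1$-fidelity counterpart.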
This paper presents an end-to-end framework for robust structure/control optimization of an industrial benchmark. When dealing with space structures, reducing the spacecraft mass is paramount to minimize the mission cost and maximize the propellant availability. However, a lighter design comes with greater structural flexibility and a resulting impact on control performance. Two optimization architectures (distributed and monolithic) are proposed to address this issue. In particular, the Linear Fractional Transformation (LFT) framework is exploited to formally set up the two optimization problems including parametric uncertainties. Large sets of uncertainties must indeed be taken into account in spacecraft control design, because structural models cannot be completely validated for micro-gravity conditions through on-ground experiments and because the spacecraft dynamics evolve during the mission (structural degradation and fuel consumption). Specifically, the Two-Input Two-Output Port (TITOP) multi-body approach is used to build the flexible dynamics in a minimal LFT form. The two proposed optimization algorithms are detailed, and their performance is compared on a future ESA exploration mission, the ENVISION benchmark. With both approaches, a substantial mass reduction is obtained while coping with the mission's control performance/stability requirements and a large set of uncertainties.
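The LFT machinery can be illustrated on the simplest uncertain parameters. The sketch below writes an uncertain stiffness $k = k_0(1 + r\delta)$ and an uncertain inverse mass $1/m$ with $m = m_0(1 + r\delta)$, $\delta \in [-1, 1]$, as scalar upper LFTs $F_u(M,\delta) = M_{22} + M_{21}\delta(1 - M_{11}\delta)^{-1}M_{12}$. The numerical values are hypothetical, and the TITOP multi-body assembly used in the paper is not shown; in the matrix case the same formula holds with $\delta$ replaced by a block-diagonal $\Delta$.

```python
def upper_lft(m11, m12, m21, m22, delta):
    """Scalar upper LFT F_u(M, delta) = m22 + m21*delta*(1 - m11*delta)^(-1)*m12."""
    denom = 1.0 - m11 * delta
    if denom == 0.0:
        raise ZeroDivisionError("LFT is ill-posed for this delta")
    return m22 + m21 * delta * m12 / denom

def stiffness_lft(k0, r):
    """k = k0 * (1 + r*delta) as the entries (M11, M12, M21, M22)."""
    return (0.0, 1.0, k0 * r, k0)

def inverse_mass_lft(m0, r):
    """1 / (m0 * (1 + r*delta)) as the entries (M11, M12, M21, M22)."""
    return (-r, 1.0 / m0, -r, 1.0 / m0)

# delta in [-1, 1] spans a relative variation of +/- r (hypothetical values).
for delta in (-1.0, 0.0, 0.5, 1.0):
    k = upper_lft(*stiffness_lft(k0=200.0, r=0.1), delta)
    inv_m = upper_lft(*inverse_mass_lft(m0=50.0, r=0.2), delta)
    print(delta, k, 1.0 / inv_m)
```

The inverse-mass case shows why LFTs are convenient: a parameter that enters rationally still yields a single normalized $\delta$ with a feedback (the $M_{11}$ term) rather than a new uncertainty per occurrence.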
On small neighborhoods of the capacity-achieving input distributions, and for all channels with finite input sets and (possibly multiple) linear constraints, the decrease of the mutual information with the distance to the capacity-achieving input distributions is bounded below by a linear function of the square of that distance; the proof uses an identity due to Tops{\o}e and Pinsker's inequality. Counterexamples demonstrating the non-existence of such a quadratic bound are provided for the case of infinitely many linear constraints and the case of infinite input sets. With a Taylor series approximation in place of Pinsker's inequality, the slowest decrease of the mutual information with the distance to the capacity-achieving input distributions is characterized exactly on small neighborhoods of those distributions. Analogous results are established for classical-quantum channels whose output density operators are defined on separable Hilbert spaces. Implications of these observations for the channel coding problem and applications of the proof technique to related problems are discussed.
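The Tops{\o}e-identity-plus-Pinsker argument can be checked numerically on the simplest example, a binary symmetric channel with crossover probability $\varepsilon$ and no input constraints (the paper's setting with linear constraints and classical-quantum channels is more general). Here the capacity-achieving input is uniform, both rows satisfy $D(W_x \,\|\, q^*) = C$, so $C - I(p) = D(q_p \,\|\, q^*)$ exactly, and Pinsker's inequality gives $C - I(p) \ge \tfrac12 (1-2\varepsilon)^2 \|p - p^*\|_1^2$ in nats, a bound quadratic in the distance to the optimal input.

```python
import math

def binary_entropy(p):
    """Binary entropy in nats."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def bsc_mutual_information(t, eps):
    """I(X;Y) in nats for a BSC with crossover eps and input P(X=1) = t."""
    q1 = eps + t * (1.0 - 2.0 * eps)  # P(Y = 1)
    return binary_entropy(q1) - binary_entropy(eps)

def check_quadratic_bound(eps, steps=200):
    """Worst slack of C - I(p) >= 0.5 * (1-2eps)^2 * ||p - p*||_1^2 on a grid."""
    capacity = math.log(2.0) - binary_entropy(eps)
    worst = float("inf")
    for i in range(steps + 1):
        t = i / steps
        gap = capacity - bsc_mutual_information(t, eps)
        dist_l1 = 2.0 * abs(t - 0.5)  # ||p - p*||_1 with p* = (1/2, 1/2)
        bound = 0.5 * (1.0 - 2.0 * eps) ** 2 * dist_l1 ** 2
        worst = min(worst, gap - bound)
    return worst

print(check_quadratic_bound(0.1))  # approximately >= 0, up to rounding at t = 0.5
```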
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction, and graph clustering. However, they exhibit uncertainty and unreliability when faced with carefully designed inputs, i.e., adversarial examples. Accordingly, various studies on both attack and defense have emerged for different graph analysis tasks, leading to an arms race in graph adversarial learning. For instance, attackers employ poisoning and evasion attacks, and defenders correspondingly employ preprocessing-based and adversarial-based methods. Despite this booming body of work, a unified problem definition and a comprehensive review are still lacking. To bridge this gap, we systematically investigate and summarize the existing work on graph adversarial learning tasks. Specifically, we survey and unify the existing work on attack and defense in graph analysis tasks, and give proper definitions and taxonomies along the way. In addition, we emphasize the importance of related evaluation metrics and investigate and summarize them comprehensively. We hope our work can serve as a reference for researchers in this area and assist them in their studies. More details of our work are available at //github.com/gitgiter/Graph-Adversarial-Learning.