Statistical power is a measure of the replicability of a categorical hypothesis test. Formally, it is the probability of detecting an effect, if there is a true effect present in the population. Hence, optimizing statistical power as a function of some parameters of a hypothesis test is desirable. However, for most hypothesis tests, the explicit functional form of statistical power for individual model parameters is unknown; but calculating power for a given set of values of those parameters is possible using simulated experiments. These simulated experiments are usually computationally expensive. Hence, developing the entire statistical power manifold using simulations can be very time-consuming. We propose a novel genetic algorithm-based framework for learning statistical power manifolds. For a multiple linear regression $F$-test, we show that the proposed algorithm/framework learns the statistical power manifold much faster as compared to a brute-force approach as the number of queries to the power oracle is significantly reduced. We also show that the quality of learning the manifold improves as the number of iterations increases for the genetic algorithm. Such tools are useful for evaluating statistical power trade-offs when researchers have little information regarding a priori best guesses of primary effect sizes of interest or how sampling variability in non-primary effects impacts power for primary ones.
$1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape \textsc{Gril} for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules.
Using administrative patient-care data such as Electronic Health Records and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of these multiple sources of biases, in this paper, we focus on understanding selection bias. We present an analytic framework using directed acyclic graphs for guiding applied researchers to dissect how different sources of selection bias may affect their parameter estimates of interest. We review four easy-to-implement weighting approaches to reduce selection bias and explain through a simulation study when they can rescue us in practice with analysis of real world data. We provide annotated R codes to implement these methods.
Hierarchical Federated Learning (HFL) is a distributed machine learning paradigm tailored for multi-tiered computation architectures, which supports massive access of devices' models simultaneously. To enable efficient HFL, it is crucial to design suitable incentive mechanisms to ensure that devices actively participate in local training. However, there are few studies on incentive mechanism design for HFL. In this paper, we design two-level incentive mechanisms for the HFL with a two-tiered computing structure to encourage the participation of entities in each tier in the HFL training. In the lower-level game, we propose a coalition formation game to joint optimize the edge association and bandwidth allocation problem, and obtain efficient coalition partitions by the proposed preference rule, which can be proven to be stable by exact potential game. In the upper-level game, we design the Stackelberg game algorithm, which not only determines the optimal number of edge aggregations for edge servers to maximize their utility, but also optimize the unit reward provided for the edge aggregation performance to ensure the interests of cloud servers. Furthermore, numerical results indicate that the proposed algorithms can achieve better performance than the benchmark schemes.
With the rising complexity of numerous novel applications that serve our modern society comes the strong need to design efficient computing platforms. Designing efficient hardware is, however, a complex multi-objective problem that deals with multiple parameters and their interactions. Given that there are a large number of parameters and objectives involved in hardware design, synthesizing all possible combinations is not a feasible method to find the optimal solution. One promising approach to tackle this problem is statistical modeling of a desired hardware performance. Here, we propose a model-based active learning approach to solve this problem. Our proposed method uses Bayesian models to characterize various aspects of hardware performance. We also use transfer learning and Gaussian regression bootstrapping techniques in conjunction with active learning to create more accurate models. Our proposed statistical modeling method provides hardware models that are sufficiently accurate to perform design space exploration as well as performance prediction simultaneously. We use our proposed method to perform design space exploration and performance prediction for various hardware setups, such as micro-architecture design and OpenCL kernels for FPGA targets. Our experiments show that the number of samples required to create performance models significantly reduces while maintaining the predictive power of our proposed statistical models. For instance, in our performance prediction setting, the proposed method needs 65% fewer samples to create the model, and in the design space exploration setting, our proposed method can find the best parameter settings by exploring less than 50 samples.
To estimate causal effects, analysts performing observational studies in health settings utilize several strategies to mitigate bias due to confounding by indication. There are two broad classes of approaches for these purposes: use of confounders and instrumental variables (IVs). Because such approaches are largely characterized by untestable assumptions, analysts must operate under an indefinite paradigm that these methods will work imperfectly. In this tutorial, we formalize a set of general principles and heuristics for estimating causal effects in the two approaches when the assumptions are potentially violated. This crucially requires reframing the process of observational studies as hypothesizing potential scenarios where the estimates from one approach are less inconsistent than the other. While most of our discussion of methodology centers around the linear setting, we touch upon complexities in non-linear settings and flexible procedures such as target minimum loss-based estimation (TMLE) and double machine learning (DML). To demonstrate the application of our principles, we investigate the use of donepezil off-label for mild cognitive impairment (MCI). We compare and contrast results from confounder and IV methods, traditional and flexible, within our analysis and to a similar observational study and clinical trial.
This paper reconsiders end-to-end learning approaches to the Optimal Power Flow (OPF). Existing methods, which learn the input/output mapping of the OPF, suffer from scalability issues due to the high dimensionality of the output space. This paper first shows that the space of optimal solutions can be significantly compressed using principal component analysis (PCA). It then proposes Compact Learning, a new method that learns in a subspace of the principal components before translating the vectors into the original output space. This compression reduces the number of trainable parameters substantially, improving scalability and effectiveness. Compact Learning is evaluated on a variety of test cases from the PGLib with up to 30,000 buses. The paper also shows that the output of Compact Learning can be used to warm-start an exact AC solver to restore feasibility, while bringing significant speed-ups.
The dynamic nature of resource allocation and runtime conditions on Cloud can result in high variability in a job's runtime across multiple iterations, leading to a poor experience. Identifying the sources of such variation and being able to predict and adjust for them is crucial to cloud service providers to design reliable data processing pipelines, provision and allocate resources, adjust pricing services, meet SLOs and debug performance hazards. In this paper, we analyze the runtime variation of millions of production SCOPE jobs on Cosmos, an exabyte-scale internal analytics platform at Microsoft. We propose an innovative 2-step approach to predict job runtime distribution by characterizing typical distribution shapes combined with a classification model with an average accuracy of >96%, out-performing traditional regression models and better capturing long tails. We examine factors such as job plan characteristics and inputs, resource allocation, physical cluster heterogeneity and utilization, and scheduling policies. To the best of our knowledge, this is the first study on predicting categories of runtime distributions for enterprise analytics workloads at scale. Furthermore, we examine how our methods can be used to analyze what-if scenarios, focusing on the impact of resource allocation, scheduling, and physical cluster provisioning decisions on a job's runtime consistency and predictability.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Deep learning models on graphs have achieved remarkable performance in various graph analysis tasks, e.g., node classification, link prediction and graph clustering. However, they expose uncertainty and unreliability against the well-designed inputs, i.e., adversarial examples. Accordingly, various studies have emerged for both attack and defense addressed in different graph analysis tasks, leading to the arms race in graph adversarial learning. For instance, the attacker has poisoning and evasion attack, and the defense group correspondingly has preprocessing- and adversarial- based methods. Despite the booming works, there still lacks a unified problem definition and a comprehensive review. To bridge this gap, we investigate and summarize the existing works on graph adversarial learning tasks systemically. Specifically, we survey and unify the existing works w.r.t. attack and defense in graph analysis tasks, and give proper definitions and taxonomies at the same time. Besides, we emphasize the importance of related evaluation metrics, and investigate and summarize them comprehensively. Hopefully, our works can serve as a reference for the relevant researchers, thus providing assistance for their studies. More details of our works are available at //github.com/gitgiter/Graph-Adversarial-Learning.