In studies of maternal exposure to air pollution, a child's health outcome is regressed on exposures observed during pregnancy. The distributed lag nonlinear model (DLNM) is a statistical method commonly used to estimate an exposure-time-response function when the exposure effect is postulated to be nonlinear. Previous implementations of the DLNM estimate an exposure-time-response surface parameterized with a bivariate basis expansion. However, basis functions such as splines assume smoothness across the entire exposure-time-response surface, which may be unrealistic in settings where the exposure is associated with the outcome only in a specific time window. We propose a framework for estimating the DLNM based on Bayesian additive regression trees. Our method uses a set of regression trees that each assume piecewise constant relationships across the exposure-time space. In a simulation, we show that our model outperforms spline-based models when the exposure-time surface is not smooth, while both methods perform similarly when the true surface is smooth. Importantly, the proposed approach has lower variance and more precisely identifies critical windows during which exposure is associated with a future health outcome. We apply our method to estimate the association between maternal exposure to PM$_{2.5}$ and birth weight in a birth cohort from Colorado, USA.
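As a point of reference, a generic DLNM specification (the notation here is assumed for illustration, not taken from the paper) can be written as

$$g\big(\mathbb{E}[y_i]\big) \;=\; \alpha \;+\; \sum_{t=0}^{T} w(x_{i,t},\, t) \;+\; \mathbf{z}_i^{\top}\boldsymbol{\gamma},$$

where $y_i$ is the child's outcome, $x_{i,t}$ is the exposure at lagged time $t$ of pregnancy, $\mathbf{z}_i$ collects additional covariates, and $w(x,t)$ is the exposure-time-response surface. Spline-based implementations expand $w$ in a smooth bivariate basis, whereas the tree-based approach described above represents $w$ as a sum of regression trees that are piecewise constant over the exposure-time space.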
Empirical researchers are often interested in the impact that baseline covariates have when uncovering sample heterogeneity and separating samples into more homogeneous groups. However, many studies in the structural equation modeling (SEM) framework start with vague hypotheses about heterogeneity and its possible causes. This suggests that (1) the determination and specification of a proper model with covariates is not straightforward, and (2) the exploration process may be computationally intensive, given that a model in the SEM framework is usually complicated and the pool of candidate covariates is usually large in the psychological and educational domains where the SEM framework is widely employed. Following Bakk and Kuha (2017), this article presents a two-step growth mixture model (GMM) that examines the relationship between latent classes of nonlinear trajectories and baseline characteristics. Our simulation studies demonstrate that the proposed model clusters the nonlinear change patterns and estimates the parameters of interest without bias, with high precision, and with appropriate confidence interval coverage. Because the pool of candidate covariates is usually large and highly correlated, this study also proposes using exploratory factor analysis (EFA) to reduce the dimension of the covariate space. We illustrate how to use the hybrid method, combining the two-step GMM and EFA, to efficiently explore the heterogeneity of nonlinear trajectories in longitudinal mathematics achievement data.
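A minimal sketch of the hybrid idea, reducing a large, correlated covariate pool with EFA and then relating the factor scores to latent trajectory classes, is given below. It uses scikit-learn for illustration and a naive classify-then-analyze step rather than the bias-adjusted two-step estimator of Bakk and Kuha (2017); all variable names and simulated data are hypothetical.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_waves, n_covariates = 500, 5, 30

# Hypothetical data: repeated achievement scores and many correlated covariates.
scores = rng.normal(size=(n, n_waves)).cumsum(axis=1)   # trajectories
covariates = rng.normal(size=(n, n_covariates))         # baseline covariates

# Step 1: cluster the change patterns (here a plain Gaussian mixture on the
# repeated measures stands in for the growth mixture model).
classes = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)

# Step 2: reduce the covariate space with exploratory factor analysis ...
factor_scores = FactorAnalysis(n_components=3, random_state=0).fit_transform(covariates)

# ... and relate the extracted factors to the latent classes.
class_model = LogisticRegression().fit(factor_scores, classes)
print(class_model.coef_)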
Imputing missing values is common practice in label-free quantitative proteomics. Imputation replaces a missing value with a user-defined one. However, the imputation itself is often not properly accounted for downstream, as imputed datasets are treated as if they had always been complete. Hence, the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimate of the parameters' variability thanks to Rubin's rules. The resulting variance estimator of the imputed peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. The workflow can be used at both the peptide and protein levels in quantification datasets; for protein-level results based on peptide-level quantification data, an aggregation step is also included. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, with mi4p outperforming DAPAR in overall performance as measured by the F-score.
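The Rubin's-rules pooling step at the core of multiple imputation can be sketched as follows. This is a generic NumPy illustration, not the mi4p implementation; the Bayesian moderation of the variance and the moderated t-test are omitted, and the example numbers are hypothetical.

import numpy as np

def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates : (D,) point estimate from each of D imputed datasets
    variances : (D,) corresponding within-imputation variances
    Returns the pooled estimate and its total variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    d = len(estimates)
    pooled = estimates.mean()                 # pooled point estimate
    within = variances.mean()                 # average within-imputation variance
    between = estimates.var(ddof=1)           # between-imputation variance
    total = within + (1 + 1 / d) * between    # Rubin's total variance
    return pooled, total

# Example with D = 3 imputed datasets (hypothetical numbers).
est, var = pool_rubin([1.02, 0.97, 1.10], [0.040, 0.050, 0.045])
print(est, var)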
We address the problem of network quantization, that is, reducing the bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized ones, but this operation is not differentiable. There are two main approaches to training quantized networks with gradient-based optimizers. First, a straight-through estimator (STE) replaces the zero derivative of the rounding with that of an identity function, which causes a gradient mismatch problem. Second, soft quantizers approximate the rounding with continuous functions at training time and exploit the rounding for quantization at test time. This alleviates the gradient mismatch but causes a quantizer gap problem. We alleviate both problems in a unified framework. To this end, we introduce a novel quantizer, dubbed a distance-aware quantizer (DAQ), that mainly consists of a distance-aware soft rounding (DASR) and a temperature controller. To alleviate the gradient mismatch problem, DASR approximates the discrete rounding with the kernel soft argmax, which is based on our insight that quantization can be formulated as a distance-based assignment problem between full-precision values and quantized ones. The controller adjusts the temperature parameter in DASR adaptively according to the input, addressing the quantizer gap problem. Experimental results on standard benchmarks show that DAQ outperforms the state of the art significantly for various bit-widths without bells and whistles.
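A minimal NumPy sketch of the general idea behind distance-based soft rounding is shown below: each full-precision value is softly assigned to quantization levels via a softmax over negative distances, with a temperature controlling how closely the assignment approximates hard rounding. This is an illustration of the mechanism only, not the DAQ/DASR implementation (which uses a kernel soft argmax and an input-adaptive temperature controller); the level grid and numbers are hypothetical.

import numpy as np

def soft_round(x, levels, temperature):
    """Softly assign each value in x to the given quantization levels.

    A small temperature approaches hard (nearest-level) rounding; a larger
    temperature yields a smoother, easier-to-differentiate mapping.
    """
    x = np.asarray(x, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Negative squared distances between each value and each level.
    logits = -(x[..., None] - levels) ** 2 / temperature
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over levels
    return weights @ levels                             # soft assignment

levels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])          # hypothetical quantization grid
x = np.array([0.12, -0.73, 0.90])
print(soft_round(x, levels, temperature=0.05))           # close to hard rounding
print(soft_round(x, levels, temperature=1.0))            # smoother assignment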
Suppose that particles are randomly distributed in $\mathbb{R}^d$ and are subject to identical stochastic motion independently of each other. The Smoluchowski process describes fluctuations of the number of particles in an observation region over time. This paper studies properties of the Smoluchowski process and considers related statistical problems. In the first part of the paper we revisit probabilistic properties of the Smoluchowski process in a unified and principled way: explicit formulas for generating functionals and moments are derived, conditions for stationarity and Gaussian approximation are discussed, and relations to other stochastic models are highlighted. The second part deals with statistics of Smoluchowski processes. We consider two different models of the particle displacement process: undeviated uniform motion (a particle moves with random constant velocity along a straight line) and Brownian motion. In the setting of undeviated uniform motion we study the problems of estimating the mean speed and the speed distribution, while for the Brownian displacement model we consider the problem of estimating the diffusion coefficient. In all these settings we develop estimators with provable accuracy guarantees.
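For intuition, under the additional assumption (made here only for illustration) that the initial particle positions form a homogeneous Poisson process with intensity $\lambda$, the displacement theorem implies that the configuration at any fixed time is again Poisson with the same intensity, so the count in an observation region $A$ satisfies

$$N_t(A) \sim \mathrm{Poisson}\big(\lambda\,|A|\big) \quad \text{for each fixed } t,$$

while the temporal dependence of the Smoluchowski process $\{N_t(A)\}_{t\ge 0}$ is governed by the displacement model (undeviated uniform motion or Brownian motion), which is what makes the displacement parameters identifiable from the observed fluctuations.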
In randomized trials with continuous-valued outcomes, the goal is often to estimate the difference in average outcomes between two treatment groups. However, the outcome in some trials is longitudinal, meaning that multiple measurements of the same outcome are taken over time for each subject. The target of inference in this case is often still the difference in averages at a given timepoint. One way to analyze these data is to ignore the measurements at intermediate timepoints and proceed with a standard covariate-adjusted analysis (e.g. ANCOVA) on the complete cases. However, it is generally thought that exploiting information from intermediate timepoints using mixed models for repeated measures (MMRM) (a) increases power and (b) more naturally "handles" missing data. Here we prove that neither of these conclusions is entirely correct when baseline covariates are adjusted for without including time-by-covariate interactions. We back these claims up with simulations. MMRM provides benefits over complete-case ANCOVA in many cases, but covariate-time interaction terms should always be included to guarantee the best results.
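To make the distinction concrete (notation assumed for illustration), an MMRM that adjusts for a baseline covariate $X_i$ without time-by-covariate interactions fits

$$Y_{ij} = \beta_{0j} + \theta_j A_i + \beta_1 X_i + \varepsilon_{ij},$$

forcing a single covariate coefficient $\beta_1$ across all visits $j$, whereas including the interaction replaces $\beta_1$ with visit-specific coefficients $\beta_{1j}$,

$$Y_{ij} = \beta_{0j} + \theta_j A_i + \beta_{1j} X_i + \varepsilon_{ij},$$

where $A_i$ is the treatment indicator, $\theta_j$ the treatment effect at visit $j$, and $\varepsilon_{ij}$ has a within-subject covariance structure. The claims above concern the first, interaction-free specification.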
In this work, we study how to implement a distributed power method in a parallel manner. Because the existing distributed power method usually updates the eigenvectors sequentially, it has two obvious disadvantages: 1) when it calculates the $h$th eigenvector, it must wait for the results of the previous $(h-1)$ eigenvectors, which delays the acquisition of all the eigenvalues; 2) when calculating each eigenvector, it incurs a certain cost of information exchange among neighboring nodes for every power iteration, which can become prohibitive when the number of eigenvectors or the number of nodes is large. This motivates us to propose a parallel distributed power method, which simultaneously calculates all the eigenvectors at each power iteration so that more information can be exchanged in a single round of communication. We are particularly interested in the distributed power method for both the eigenvalue decomposition (EVD) and the singular value decomposition (SVD), wherein the distributed computation proceeds via a gossip algorithm. It can be shown that, under the same conditions, the communication cost of the gossip-based parallel method is only $1/H$ times that of the sequential counterpart, where $H$ is the number of eigenvectors to compute, while the convergence time and error performance of the proposed parallel method are both comparable to those of its sequential counterpart.
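The centralized analogue of updating all $H$ eigenvectors within each power iteration is orthogonal (subspace) iteration, sketched below in NumPy; the gossip-based averaging across nodes and the EVD/SVD variants of the method are omitted, and all names are illustrative.

import numpy as np

def parallel_power_iteration(A, H, num_iters=200, seed=0):
    """Approximate the top-H eigenpairs of a symmetric matrix A by updating
    all H vectors simultaneously in each power iteration (orthogonal iteration),
    instead of extracting them one at a time with deflation."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(A.shape[0], H))
    V, _ = np.linalg.qr(V)                    # orthonormal start
    for _ in range(num_iters):
        V = A @ V                             # one power step for all H vectors at once
        V, _ = np.linalg.qr(V)                # re-orthonormalize
    eigvals = np.diag(V.T @ A @ V)            # Rayleigh quotients
    return eigvals, V

# Example on a random symmetric matrix.
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))
A = (M + M.T) / 2
vals, vecs = parallel_power_iteration(A, H=3)
print(vals)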
Distributionally robust optimization (DRO) is a worst-case framework for stochastic optimization under uncertainty that has attracted fast-growing interest in recent years. When the underlying probability distribution is unknown and observed only through data, DRO computes the worst-case distribution within a so-called uncertainty set that captures the statistical uncertainty involved. In particular, DRO with an uncertainty set constructed as a statistical-divergence neighborhood ball has been shown to provide a tool for constructing valid confidence intervals for nonparametric functionals, and bears a duality with empirical likelihood (EL). In this paper, we show how adjusting the ball size in this type of DRO can reduce higher-order coverage errors, similar to the Bartlett correction. Our correction, which applies to general von Mises differentiable functionals, is more general than the existing EL literature, which focuses only on smooth function models or $M$-estimation. Moreover, we demonstrate a higher-order "self-normalizing" property of DRO regardless of the choice of divergence. Our approach builds on a higher-order expansion of DRO, which we obtain through an asymptotic analysis of a fixed-point equation arising from the Karush-Kuhn-Tucker conditions.
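Concretely, in generic notation rather than the paper's, the divergence-ball DRO confidence interval for a functional $\psi(P)$ based on the empirical distribution $P_n$ takes the form

$$\Big[\; \min_{Q:\, D_\phi(Q \,\|\, P_n) \le \rho/n} \psi(Q), \;\; \max_{Q:\, D_\phi(Q \,\|\, P_n) \le \rho/n} \psi(Q) \;\Big],$$

where $D_\phi$ is a $\phi$-divergence and the ball radius $\rho$ is calibrated to attain the nominal coverage; a first-order calibration takes $\rho$ as a $\chi^2_1$ quantile, and the correction described above adjusts $\rho$ to remove higher-order coverage errors, in the spirit of the Bartlett correction for empirical likelihood.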
There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name. This method leverages the societal biases that are encoded in word embeddings, eliminating the need for access to protected attributes. Crucially, it only requires access to individuals' names at training time and not at deployment time. We evaluate two variations of our proposed method using a large-scale dataset of online biographies. We find that both variations simultaneously reduce race and gender biases, with almost no reduction in the classifier's overall true positive rate.
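A rough sketch of a covariance-style penalty in the spirit of one variation of this idea is given below: it measures, over a mini-batch, how strongly the predicted probability of the true occupation co-varies with each coordinate of the name embedding, so that adding it to the training loss discourages that correlation. This is a generic NumPy illustration of such a penalty term, not the authors' implementation or training loop, and the data are hypothetical.

import numpy as np

def name_covariance_penalty(true_class_probs, name_embeddings):
    """Penalty that grows when the predicted probability of the true occupation
    is correlated with the (centered) word embedding of the individual's name.

    true_class_probs : (B,) predicted probability of each individual's true occupation
    name_embeddings  : (B, E) word embedding of each individual's name
    """
    p = np.asarray(true_class_probs, dtype=float)
    e = np.asarray(name_embeddings, dtype=float)
    p_centered = p - p.mean()
    e_centered = e - e.mean(axis=0)
    # Covariance between the probability and each embedding dimension.
    cov = p_centered @ e_centered / len(p)    # shape (E,)
    return float(np.sum(cov ** 2))            # penalize any such covariance

# Hypothetical mini-batch.
rng = np.random.default_rng(0)
probs = rng.uniform(size=8)
embeddings = rng.normal(size=(8, 5))
print(name_covariance_penalty(probs, embeddings))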
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
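Schematically, and in notation assumed here only for illustration, the construction adds a shared completely random measure $\mu_0$ to a group-specific one $\mu_\ell$ and normalizes,

$$\tilde p_\ell \;=\; \frac{\mu_0 + \mu_\ell}{\mu_0(\mathbb{X}) + \mu_\ell(\mathbb{X})}, \qquad \ell = 1, \dots, d,$$

so that the common component $\mu_0$ induces dependence across the groups' random probability measures while the idiosyncratic components $\mu_\ell$ preserve heterogeneity, avoiding the degeneracy to full exchangeability described above.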