In multiple testing, several criteria exist to control type I errors. The false discovery rate, which evaluates the expected proportion of false discoveries among the rejected null hypotheses, has become the standard approach in this setting. However, false discovery rate control may be too conservative when the effects are weak. In this paper we propose, as an alternative, to control the number of significant effects, where 'significant' refers to a pre-specified threshold $\gamma$. This means that a $(1-\alpha)$-lower confidence bound $L$ for the number of non-true null hypotheses with p-values below $\gamma$ is provided. When one rejects the nulls corresponding to the $L$ smallest p-values, the probability that the number of false positives exceeds the number of false negatives among the significant effects is bounded by $\alpha$. The relative merits of the proposed criterion are discussed. Procedures to control the number of significant effects in practice are introduced and investigated both theoretically and through simulations. Illustrative real data applications are given.
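As a hedged illustration of what such a bound can look like (not the procedure proposed in the paper), note that under independent, uniformly distributed null p-values the number of true nulls below $\gamma$ is stochastically dominated by a Binomial$(m,\gamma)$ variable, which yields a conservative lower bound:

```python
import numpy as np
from scipy.stats import binom

def lower_bound_significant_effects(pvals, gamma=0.05, alpha=0.05):
    """Conservative (1 - alpha) lower confidence bound on the number of
    non-true nulls with p-values below gamma. Illustrative sketch only:
    it assumes independent, uniform null p-values and bounds the number
    of true nulls below gamma by a Binomial(m, gamma) quantile."""
    pvals = np.asarray(pvals)
    m = pvals.size
    r = int(np.sum(pvals <= gamma))                # significant p-values
    v_upper = int(binom.ppf(1 - alpha, m, gamma))  # bound on false positives
    return max(r - v_upper, 0)
```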
While generalized linear mixed models (GLMMs) are a fundamental tool in applied statistics, many specifications -- such as those involving categorical factors with many levels or interaction terms -- can be computationally challenging to estimate due to the need to compute or approximate high-dimensional integrals. Variational inference (VI) methods are a popular way to perform such computations, especially in the Bayesian context. However, naive VI methods can provide unreliable uncertainty quantification. We show that this is indeed the case in the GLMM context, proving that standard VI (i.e., mean-field VI) dramatically underestimates posterior uncertainty in high dimensions. We then show how appropriately relaxing the mean-field assumption leads to VI methods whose uncertainty quantification does not deteriorate in high dimensions, and whose total computational cost scales linearly with the number of parameters and observations. Our theoretical and numerical results focus on GLMMs with Gaussian or binomial likelihoods, and rely on connections to random graph theory to obtain a sharp high-dimensional asymptotic analysis. We also provide generic results, which are of independent interest, relating the accuracy of variational inference to the convergence rate of the corresponding coordinate ascent variational inference (CAVI) algorithm for Gaussian targets. Our proposed partially factorized VI (PF-VI) methodology for GLMMs is implemented in the R package vglmer, see https://github.com/mgoplerud/vglmer. Numerical results with simulated and real data examples illustrate the favourable computational cost versus accuracy trade-off of PF-VI.
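A toy numpy illustration of why mean-field VI underestimates uncertainty (a two-dimensional correlated Gaussian target, not the GLMM posteriors analysed in the paper): the optimal fully factorized Gaussian approximation has variances equal to the reciprocal diagonal of the precision matrix, which never exceed the true marginal variances.

```python
import numpy as np

# Mean-field VI for a correlated Gaussian target N(0, Sigma): the optimal
# factorized approximation has variance 1 / Q_ii, where Q = Sigma^{-1}.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Q = np.linalg.inv(Sigma)

true_marginal_var = np.diag(Sigma)    # 1.0 for both coordinates
mean_field_var = 1.0 / np.diag(Q)     # 1 - rho**2 = 0.19: badly underestimated
print(true_marginal_var, mean_field_var)
```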
This study investigates the misclassification excess risk bound in the context of 1-bit matrix completion, a significant problem in machine learning involving the recovery of an unknown matrix from a limited subset of its entries. Matrix completion has garnered considerable attention over the last two decades due to its diverse applications across various fields. Unlike conventional approaches that deal with real-valued samples, 1-bit matrix completion is concerned with binary observations. While prior research has predominantly focused on the estimation error of proposed estimators, our study shifts attention to the prediction error. This paper offers a theoretical analysis of the prediction errors of two previous works utilizing the logistic regression model: one employing max-norm constrained minimization and the other nuclear-norm penalization. Notably, our findings demonstrate that the latter achieves the minimax-optimal rate without the need for an additional logarithmic term. These novel results contribute to a deeper understanding of 1-bit matrix completion by shedding light on the predictive performance of specific methodologies.
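For concreteness, here is a generic proximal-gradient sketch of nuclear-norm penalized logistic 1-bit matrix completion (a standard formulation; the step size, penalty level, and stopping rule below are placeholders rather than the settings analysed in the paper).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def svd_soft_threshold(M, tau):
    """Proximal operator of the nuclear norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def one_bit_mc_nuclear(Y, mask, lam=1.0, step=0.5, n_iter=200):
    """Proximal gradient for the nuclear-norm penalized logistic model.
    Y holds +/-1 observations; mask is 1 on observed entries, 0 otherwise."""
    M = np.zeros(Y.shape)
    for _ in range(n_iter):
        grad = -mask * Y * sigmoid(-Y * M)   # gradient of the logistic loss
        M = svd_soft_threshold(M - step * grad, step * lam)
    return M
```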
We consider the inverse problem of reconstructing an unknown function $u$ from a finite set of measurements, under the assumption that $u$ is the trajectory of a transport-dominated problem with unknown input parameters. We propose an algorithm based on the Parameterized Background Data-Weak (PBDW) method, in which dynamical sensor placement is combined with approximation spaces that evolve in time. We prove that the method ensures an accurate reconstruction at all times and allows us to incorporate relevant physical properties into the reconstructed solutions by suitably evolving the dynamical approximation space. As an application of this strategy, we consider Hamiltonian systems modeling wave-type phenomena, where preservation of the geometric structure of the flow plays a crucial role in the accuracy and stability of the reconstructed trajectory.
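As background, a minimal Euclidean sketch of a static PBDW-type reconstruction (assuming noise-free linear measurements and fixed spaces; the time-evolving spaces and dynamical sensor placement of the paper are not shown):

```python
import numpy as np

def pbdw_reconstruct(Z, W, y):
    """Static PBDW-type reconstruction in a discrete Euclidean setting.
    Z : (d, n) basis of the background (prior knowledge) space.
    W : (d, m) representers of the m linear measurement functionals.
    y : (m,) noise-free observations, y = W.T @ u_true.
    Returns the state that matches the data exactly while staying as close
    as possible to the background space."""
    Q, R = np.linalg.qr(W)                   # orthonormalize the observation space
    y = np.linalg.solve(R.T, y)              # data in the orthonormal coordinates
    c, *_ = np.linalg.lstsq(Q.T @ Z, y, rcond=None)
    z_star = Z @ c                           # background component
    return z_star + Q @ (y - Q.T @ z_star)   # data-consistent correction
```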
We present a discretization of the dynamic optimal transport problem for which we obtain a rate of convergence of the discrete transport cost to its continuous value as the temporal and spatial step sizes vanish. This convergence result does not require any regularity assumption on the measures, though experiments suggest that the rate is not sharp. Via an analysis of the duality gap, we also obtain convergence rates for the gradient of the optimal potentials and for the velocity field under mild regularity assumptions. To obtain such rates, we discretize the dual formulation of the dynamic optimal transport problem and draw on the mature literature on the error incurred when discretizing the Hamilton-Jacobi equation.
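For reference, in one standard normalization the dynamic (Benamou-Brenier) formulation of optimal transport and its dual read (notation ours; the discretization described above targets the dual side):
$$
\tfrac12 W_2^2(\mu_0,\mu_1)=\inf_{(\rho,v)}\int_0^1\!\!\int \tfrac12\,\rho_t(x)\,|v_t(x)|^2\,\mathrm{d}x\,\mathrm{d}t
\quad\text{s.t.}\quad \partial_t\rho+\nabla\cdot(\rho v)=0,\ \ \rho_0=\mu_0,\ \rho_1=\mu_1,
$$
$$
\tfrac12 W_2^2(\mu_0,\mu_1)=\sup_{\varphi}\Big\{\int\varphi(1,\cdot)\,\mathrm{d}\mu_1-\int\varphi(0,\cdot)\,\mathrm{d}\mu_0
\;:\; \partial_t\varphi+\tfrac12|\nabla\varphi|^2\le 0\Big\},
$$
where the dual constraint is the Hamilton-Jacobi inequality whose discretization error theory the convergence analysis relies on.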
The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting, where the complier and noncomplier average causal effects (CACE and NACE) are of interest, we address outcome missingness of the \textit{latent missing at random} type (LMAR, also known as \textit{latent ignorability}). That is, conditional on covariates and assigned treatment, the missingness may depend on compliance type. Within the instrumental variable (IV) approach to noncompliance, methods have been proposed for handling LMAR outcomes that additionally invoke an exclusion-restriction-type assumption on missingness, but no solution has been proposed for settings where a non-IV approach is used. This paper focuses on effect identification in the presence of LMAR outcomes, with a view to flexibly accommodating different principal identification approaches. We show that under treatment assignment ignorability and LMAR only, effect nonidentifiability boils down to a set of two connected mixture equations involving unidentified stratum-specific response probabilities and outcome means (sketched below). This clarifies that (except for a special case) effect identification generally requires two additional assumptions: a \textit{specific missingness mechanism} assumption and a \textit{principal identification} assumption. This provides a template for identifying effects based on separate choices of these assumptions. We consider a range of specific missingness assumptions, including those that have appeared in the literature and some new ones. Incidentally, we find an issue in the existing assumptions and propose a modification that avoids it. Results under different assumptions are illustrated using data from the Baltimore Experience Corps Trial.
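As a hedged sketch of the kind of mixture structure involved (notation ours, not necessarily the paper's): under one-sided noncompliance, compliance type is latent in the control arm, so with $\pi_c(X)$ the complier proportion, and $\rho_{s0}(X)$, $\mu_{s0}(X)$ the response (non-missingness) probability and outcome mean for stratum $s\in\{c,n\}$ under control, the observed control-arm data constrain only
$$
\Pr(R=1\mid Z=0,X)=\pi_c(X)\,\rho_{c0}(X)+\{1-\pi_c(X)\}\,\rho_{n0}(X),
$$
$$
\mathbb{E}[Y\mid R=1,Z=0,X]\,\Pr(R=1\mid Z=0,X)=\pi_c(X)\,\rho_{c0}(X)\,\mu_{c0}(X)+\{1-\pi_c(X)\}\,\rho_{n0}(X)\,\mu_{n0}(X),
$$
which leaves the stratum-specific quantities on the right-hand side unidentified without further assumptions.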
In this article, we study nonparametric inference for a covariate-adjusted regression function. This parameter captures the average association between a continuous exposure and an outcome after adjusting for other covariates. In particular, under certain causal conditions, this parameter corresponds to the average outcome had all units been assigned to a specific exposure level, known as the causal dose-response curve. We propose a debiased local linear estimator of the covariate-adjusted regression function, and demonstrate that our estimator converges pointwise to a mean-zero normal limit distribution. We use this result to construct asymptotically valid confidence intervals for function values and differences thereof. In addition, we use approximation results for the distribution of the supremum of an empirical process to construct asymptotically valid uniform confidence bands. Our methods do not require undersmoothing and permit the use of data-adaptive estimators of nuisance functions, and our estimator attains the optimal rate of convergence for a twice-differentiable function. We illustrate the practical performance of our estimator using numerical studies and an analysis of the effect of air pollution exposure on cardiovascular mortality.
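For orientation, a bare-bones local linear smoother is sketched below (the Gaussian kernel, bandwidth, and function names are ours; the estimator above additionally debiases and adjusts for covariates via estimated nuisance functions, which is not shown).

```python
import numpy as np

def local_linear(x0, X, Y, h):
    """Plain local linear smoother at a point x0: kernel-weighted least
    squares with an intercept and a linear term in (X - x0)."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)          # Gaussian kernel weights
    D = np.column_stack([np.ones_like(X), X - x0])  # local design matrix
    beta = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Y))
    return beta[0]                                   # fitted value at x0
```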
Many existing covariate shift adaptation methods estimate sample weights to be used in the risk estimation in order to mitigate the gap between the source and the target distributions. However, non-parametrically estimating the optimal weights typically involves computationally expensive hyper-parameter tuning that is crucial to the final performance. In this paper, we propose a new non-parametric approach to covariate shift adaptation which avoids estimating weights and has no hyper-parameter to be tuned. Our basic idea is to label unlabeled target data according to the $k$-nearest neighbors in the source dataset. Our analysis indicates that setting $k = 1$ is an optimal choice. Thanks to this property, there is no need to tune any hyper-parameters, unlike other non-parametric methods. Moreover, our method achieves a running time that is quasi-linear in the sample size with a theoretical guarantee; to the best of our knowledge, this is the first such result in the literature. Our results include sharp rates of convergence for estimating the joint probability distribution of the target data. In particular, the variance of our estimators has the same rate of convergence as for standard parametric estimation, despite their non-parametric nature. Our numerical experiments show that the proposed method brings a drastic reduction in running time with accuracy comparable to that of state-of-the-art methods.
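A minimal sketch of the nearest-neighbour labeling idea described above (the function name and the use of scikit-learn are ours; the theoretical guarantees concern the statistical and computational properties of this construction, not this particular implementation):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def nn_pseudo_label(X_source, y_source, X_target, k=1):
    """Label each unlabeled target point with the response of its k nearest
    source points (k = 1 by default, matching the analysis above). The
    pseudo-labeled target sample can then be used for downstream learning
    without estimating importance weights."""
    nn = KNeighborsRegressor(n_neighbors=k)
    nn.fit(X_source, y_source)
    return nn.predict(X_target)
```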
This paper presents a thorough discussion of the locality of the local neural operator (LNO), the property at the core of LNO's great flexibility across varied computational domains when solving transient partial differential equations (PDEs). We investigate the locality of the LNO through its receptive field and receptive range, with a focus on how locality affects LNO training and applications. In a large group of LNO training experiments on learning fluid dynamics, we find that an initial receptive range compatible with the learning task is crucial for the LNO to perform well. On the one hand, a receptive range that is too small is fatal and usually drives the LNO into numerical oscillation; on the other hand, a receptive range that is too large prevents the LNO from achieving its best accuracy. We expect the rules found in this paper to hold generally when applying LNOs to learn and solve transient PDEs in diverse fields. Practical examples of applying pre-trained LNOs to flow prediction are presented to further confirm the findings. Overall, with the architecture properly designed to have a compatible receptive range, the pre-trained LNO shows commendable accuracy and efficiency in solving practical cases.
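As a rough aid to reasoning about receptive ranges, a generic convolutional-stack receptive field formula with hypothetical layer settings (not the specific LNO architecture studied in the paper):

```python
def receptive_field(kernel_sizes, strides):
    """Receptive field (in grid points) of a stack of convolution layers:
    r = 1 + sum_i (k_i - 1) * prod_{j < i} s_j."""
    r, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r

# e.g. four layers with kernel size 3 and stride 1 cover 9 grid points
print(receptive_field([3, 3, 3, 3], [1, 1, 1, 1]))
```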
We use Stein characterisations to derive new moment-type estimators for the parameters of several multivariate distributions in the i.i.d. case; we also derive the asymptotic properties of these estimators. Our examples include the multivariate truncated normal distribution and several spherical distributions. The estimators are explicit and therefore provide an interesting alternative to the maximum-likelihood estimator. The quality of these estimators is assessed through simulation studies in which we compare their performance with that of other estimators available in the literature.
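To illustrate the general mechanism, consider the multivariate normal case, stated here only as a prototype (the paper works with Stein characterisations of truncated normal and spherical laws): $X\sim N(\mu,\Sigma)$ if and only if
$$
\mathbb{E}\big[(X-\mu)\,f(X)\big]=\Sigma\,\mathbb{E}\big[\nabla f(X)\big]
$$
for all sufficiently regular test functions $f$. Choosing a finite family of such $f$ and replacing expectations with sample averages yields explicit moment-type estimating equations for $(\mu,\Sigma)$.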
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
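A toy numpy illustration of the minimum-norm interpolation phenomenon discussed above (an overparametrized linear regression, not the neural-network settings of the survey): with more features than samples, the minimum-Euclidean-norm least-squares solution, to which gradient descent initialized at zero converges, fits the training data perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                        # many more features than samples
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:5] = 1.0                  # sparse "signal" component
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = np.linalg.pinv(X) @ y     # minimum-norm interpolator
print(np.allclose(X @ theta_hat, y))  # True: perfect fit to training data
```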