亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The simultaneous estimation of multiple unknown parameters lies at heart of a broad class of important problems across science and technology. Currently, the state-of-the-art performance in the such problems is achieved by nonparametric empirical Bayes methods. However, these approaches still suffer from two major issues. First, they solve a frequentist problem but do so by following Bayesian reasoning, posing a philosophical dilemma that has contributed to somewhat uneasy attitudes toward empirical Bayes methodology. Second, their computation relies on certain density estimates that become extremely unreliable in some complex simultaneous estimation problems. In this paper, we study these issues in the context of the canonical Gaussian sequence problem. We propose an entirely frequentist alternative to nonparametric empirical Bayes methods by establishing a connection between simultaneous estimation and penalized nonparametric regression. We use flexible regularization strategies, such as shape constraints, to derive accurate estimators without appealing to Bayesian arguments. We prove that our estimators achieve asymptotically optimal regret and show that they are competitive with or can outperform nonparametric empirical Bayes methods in simulations and an analysis of spatially resolved gene expression data.

相關內容

This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hern\'an and Robins (2020). CausalModels addresses this deficiency of software in R concerning causal inference by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be ignored and may be more appropriate or efficient in solving particular problems. While implementations of these statistical models are distributed among a number of causal packages, CausalModels introduces a simple and accessible framework for a consistent modeling pipeline among a variety of statistical methods for estimating causal effects in a single R package. It consists of common methods including standardization, IP weighting, G-estimation, outcome regression, instrumental variables and propensity matching.

We propose a general purpose confidence interval procedure (CIP) for statistical functionals constructed using data from a stationary time series. The procedures we propose are based on derived distribution-free analogues of the $\chi^2$ and Student's $t$ random variables for the statistical functional context, and hence apply in a wide variety of settings including quantile estimation, gradient estimation, M-estimation, CVAR-estimation, and arrival process rate estimation, apart from more traditional statistical settings. Like the method of subsampling, we use overlapping batches of time series data to estimate the underlying variance parameter; unlike subsampling and the bootstrap, however, we assume that the implied point estimator of the statistical functional obeys a central limit theorem (CLT) to help identify the weak asymptotics (called OB-x limits, x=I,II,III) of batched Studentized statistics. The OB-x limits, certain functionals of the Wiener process parameterized by the size of the batches and the extent of their overlap, form the essential machinery for characterizing dependence, and consequently the correctness of the proposed CIPs. The message from extensive numerical experimentation is that in settings where a functional CLT on the point estimator is in effect, using \emph{large overlapping batches} alongside OB-x critical values yields confidence intervals that are often of significantly higher quality than those obtained from more generic methods like subsampling or the bootstrap. We illustrate using examples from CVaR estimation, ARMA parameter estimation, and NHPP rate estimation; R and MATLAB code for OB-x critical values is available at~\texttt{web.ics.purdue.edu/~pasupath/}.

Granger causality is among the widely used data-driven approaches for causal analysis of time series data with applications in various areas including economics, molecular biology, and neuroscience. Two of the main challenges of this methodology are: 1) over-fitting as a result of limited data duration, and 2) correlated process noise as a confounding factor, both leading to errors in identifying the causal influences. Sparse estimation via the LASSO has successfully addressed these challenges for parameter estimation. However, the classical statistical tests for Granger causality resort to asymptotic analysis of ordinary least squares, which require long data duration to be useful and are not immune to confounding effects. In this work, we address this disconnect by introducing a LASSO-based statistic and studying its non-asymptotic properties under the assumption that the true models admit sparse autoregressive representations. We establish fundamental limits for reliable identification of Granger causal influences using the proposed LASSO-based statistic. We further characterize the false positive error probability and test power of a simple thresholding rule for identifying Granger causal effects and provide two methods to set the threshold in a data-driven fashion. We present simulation studies and application to real data to compare the performance of our proposed method to ordinary least squares and existing LASSO-based methods in detecting Granger causal influences, which corroborate our theoretical results.

Difference-in-differences is without a doubt the most widely used method for evaluating the causal effect of a hypothetical intervention in the possible presence of confounding bias due to hidden factors. The approach is typically used when both pre- and post-exposure outcome measurements are available, and one can reasonably assume that the additive association of the unobserved confounder with the outcome is equal in the two exposure arms, and constant over time; a so-called parallel trends assumption. The parallel trends assumption may not be credible in many practical settings, including if the outcome is binary, a count, or polytomous, and more generally, when the unmeasured confounder exhibits non-additive effects on the distribution of the outcome, even if such effects are constant over time. We introduce an alternative approach that replaces the parallel trends assumption with an odds ratio equi-confounding assumption, which states that confounding bias for the causal effect of interest, encoded by an association between treatment and the potential outcome under no-treatment can be identified with a well-specified generalized linear model relating the pre-exposure outcome and the exposure. As the proposed method identifies any causal effect that is conceivably identified in the absence of confounding bias, including nonlinear effects such as quantile treatment effects, the approach is aptly called Universal Difference-in-differences (UDiD). Both fully parametric and more robust semiparametric UDiD estimators are described and illustrated in a real-world application concerning the causal effects of a Zika virus outbreak on birth rate in Brazil.

Statistical inference of the high-dimensional regression coefficients is challenging because the uncertainty introduced by the model selection procedure is hard to account for. A critical question remains unsettled; that is, is it possible and how to embed the inference of the model into the simultaneous inference of the coefficients? To this end, we propose a notion of simultaneous confidence intervals called the sparsified simultaneous confidence intervals. Our intervals are sparse in the sense that some of the intervals' upper and lower bounds are shrunken to zero (i.e., $[0,0]$), indicating the unimportance of the corresponding covariates. These covariates should be excluded from the final model. The rest of the intervals, either containing zero (e.g., $[-1,1]$ or $[0,1]$) or not containing zero (e.g., $[2,3]$), indicate the plausible and significant covariates, respectively. The proposed method can be coupled with various selection procedures, making it ideal for comparing their uncertainty. For the proposed method, we establish desirable asymptotic properties, develop intuitive graphical tools for visualization, and justify its superior performance through simulation and real data analysis.

We initiate the study of repeated game dynamics in the population model, in which we are given a population of $n$ nodes, each with its local strategy, which interact uniformly at random by playing multi-round, two-player games. After each game, the two participants receive rewards according to a given payoff matrix, and may update their local strategies depending on this outcome. In this setting, we ask how the distribution of player strategies evolves with respect to the number of node interactions (time complexity), as well as the number of possible player states (space complexity), determining the stationary properties of such game dynamics. Our main technical results analyze the behavior of a family of Repeated Prisoner's Dilemma dynamics in this model, for which we provide an exact characterization of the stationary distribution, and give bounds on convergence time and on the optimality gap of its expected rewards. Our results follow from a new connection between Repeated Prisoner's Dilemma dynamics in a population, and a class of high-dimensional, weighted Ehrenfest random walks, which we analyze for the first time. The results highlight non-trivial trade-offs between the state complexity of each node's strategy, the convergence of the process, and the expected average reward of nodes in the population. Our approach opens the door towards the characterization of other natural evolutionary game dynamics in the population model.

We introduce a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher order influence functions that extend ordinary linear influence functions of the parameter of interest, and represent higher derivatives of this parameter. For parameters for which the representation cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than $\sqrt n$-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at $\sqrt n$-rate, but we also consider efficient $\sqrt n$-estimation using novel nonlinear estimators. The general approach is applied in detail to the example of estimating a mean response when the response is not always observed.

Analysis of high-dimensional data, where the number of covariates is larger than the sample size, is a topic of current interest. In such settings, an important goal is to estimate the signal level $\tau^2$ and noise level $\sigma^2$, i.e., to quantify how much variation in the response variable can be explained by the covariates, versus how much of the variation is left unexplained. This thesis considers the estimation of these quantities in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no responses $Y$. Our main research question is: how can one use the unlabeled data to better estimate $\tau^2$ and $\sigma^2$? We consider two frameworks: a linear regression model and a linear projection model in which linearity is not assumed. In the first framework, while linear regression is used, no sparsity assumptions on the coefficients are made. In the second framework, the linearity assumption is also relaxed and we aim to estimate the signal and noise levels defined by the linear projection. We first propose a naive estimator which is unbiased and consistent, under some assumptions, in both frameworks. We then show how the naive estimator can be improved by using zero-estimators, where a zero-estimator is a statistic arising from the unlabeled data, whose expected value is zero. In the first framework, we calculate the optimal zero-estimator improvement and discuss ways to approximate the optimal improvement. In the second framework, such optimality does no longer hold and we suggest two zero-estimators that improve the naive estimator although not necessarily optimally. Furthermore, we show that our approach reduces the variance for general initial estimators and we present an algorithm that potentially improves any initial estimator. Lastly, we consider four datasets and study the performance of our suggested methods.

This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.

北京阿比特科技有限公司