
Anecdotally, using an estimated propensity score is superior to using the true propensity score when estimating the average treatment effect from observational data. However, this claim comes with several qualifications: it holds only if the propensity score model is correctly specified and the number of covariates $d$ is small relative to the sample size $n$. We revisit this phenomenon by studying the inverse propensity score weighting (IPW) estimator based on a logistic model with a diverging number of covariates. We first show that the IPW estimator based on the estimated propensity score is consistent and asymptotically normal with smaller variance than the oracle IPW estimator (using the true propensity score) if and only if $n \gtrsim d^2$. We then propose a debiased IPW estimator that achieves the same guarantees in the regime $n \gtrsim d^{3/2}$. Our proofs rely on a novel non-asymptotic decomposition of the IPW error along with careful control of the higher-order terms.
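As a rough illustration of the two estimators being compared, the sketch below computes an IPW estimate of the average treatment effect once with the true propensity score and once with a logistic-regression estimate. Everything here is an assumption made for illustration: the function names (`fit_logistic`, `ipw_ate`, `simulate`), the Gaussian covariates, the linear outcome model, and the plain Newton solver are not taken from the paper, and the debiased estimator proposed in the abstract is not implemented.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Fit a logistic model P(T=1 | X) = sigmoid(X @ beta) by Newton's method.

    Illustrative solver only: no intercept, no regularization, fixed iteration count.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                        # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def ipw_ate(y, t, e):
    """Inverse propensity weighting estimate of the average treatment effect."""
    return np.mean(t * y / e - (1 - t) * y / (1 - e))


def simulate(n=2000, d=5, tau=1.0, seed=0):
    """Return (oracle IPW estimate, estimated-propensity IPW estimate).

    Hypothetical data-generating process: Gaussian covariates, a correctly
    specified logistic propensity, and a linear outcome with true ATE `tau`.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    beta_true = np.full(d, 0.3 / np.sqrt(d))
    e_true = 1.0 / (1.0 + np.exp(-X @ beta_true))
    t = rng.binomial(1, e_true)
    gamma = np.full(d, 1.0 / np.sqrt(d))
    y = tau * t + X @ gamma + rng.normal(size=n)

    beta_hat = fit_logistic(X, t)
    e_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    return ipw_ate(y, t, e_true), ipw_ate(y, t, e_hat)


if __name__ == "__main__":
    oracle, fitted = simulate()
    print("oracle IPW:", oracle, "estimated-propensity IPW:", fitted)
```

Repeating `simulate` over many seeds and comparing the spread of the two estimates is one way to explore the variance claim empirically in the small-$d$ regime; the sketch makes no attempt to reproduce the $n \gtrsim d^2$ threshold.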

Related content

We derive upper bounds for random design linear regression with dependent ($\beta$-mixing) data absent any realizability assumptions. In contrast to the strictly realizable martingale noise regime, no sharp instance-optimal non-asymptotics are available in the literature. Up to constant factors, our analysis correctly recovers the variance term predicted by the Central Limit Theorem -- the noise level of the problem -- and thus exhibits graceful degradation as we introduce misspecification. Past a burn-in, our result is sharp in the moderate deviations regime, and in particular does not inflate the leading order term by mixing time factors.

The finite sample variance of an inverse propensity weighted estimator is derived in the case of discrete control variables with finite support. The obtained expressions generally corroborate widely-cited asymptotic theory showing that estimated propensity scores are superior to true propensity scores in the context of inverse propensity weighting. However, similar analysis of a modified estimator demonstrates that foreknowledge of the true propensity function can confer a statistical advantage when estimating average treatment effects.

The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n times and the best classifier among those n, based on its performance on training data, is chosen. We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set as well as any boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity. We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O(ln n). In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n. Thus, there is a potentially large gain in generalization properties from selecting parameters at random rather than by optimization.

Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions.

The goal of unbiased learning to rank (ULTR) is to leverage implicit user feedback for optimizing learning-to-rank systems. Among existing solutions, automatic ULTR algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice. Despite their theoretical soundness, their effectiveness is usually justified under a weak logging policy, where the ranking model can barely rank documents according to their relevance to the query. However, when the logging policy is strong, e.g., an industry-deployed ranking policy, the reported effectiveness cannot be reproduced. In this paper, we first investigate ULTR from a causal perspective and uncover a negative result: existing ULTR algorithms fail to address the issue of propensity overestimation caused by the query-document relevance confounder. Then, we propose a new learning objective based on backdoor adjustment and highlight its differences from conventional propensity models, which reveal the prevalence of propensity overestimation. On top of that, we introduce a novel propensity model called the Logging-Policy-aware Propensity (LPP) model and its distinctive two-step optimization strategy, which allows for the joint learning of LPP and ranking models within the automatic ULTR framework and actualizes unconfounded propensity estimation for ULTR. Extensive experiments on two benchmarks demonstrate the effectiveness and generalizability of the proposed method.

We present a novel extension of the traditional neural network approach to classification tasks, referred to as variational classification (VC). By incorporating latent variable modeling, akin to the relationship between variational autoencoders and traditional autoencoders, we derive a training objective based on the evidence lower bound (ELBO), optimized using an adversarial approach. Our VC model allows for more flexibility in design choices, in particular class-conditional latent priors, in place of the implicit assumptions made in off-the-shelf softmax classifiers. Empirical evaluation on image and text classification datasets demonstrates the effectiveness of our approach in terms of maintaining prediction accuracy while improving other desirable properties such as calibration and adversarial robustness, even when applied to out-of-domain data.

This paper deals with inference in a class of stable but nearly-unstable processes. Autoregressive processes are considered, in which the bridge between stability and instability is expressed by a time-varying companion matrix $A_{n}$ with spectral radius $\rho(A_{n}) < 1$ satisfying $\rho(A_{n}) \rightarrow 1$. This framework is particularly suitable to understand unit root issues by focusing on the inner boundary of the unit circle. Consistency is established for the empirical covariance and the OLS estimation together with asymptotic normality under appropriate hypotheses when $A$, the limit of $A_n$, has a real spectrum, and a particular case is deduced when $A$ also contains complex eigenvalues. The asymptotic process is integrated with either one unit root (located at 1 or $-1$), or even two unit roots located at 1 and $-1$. Finally, a set of simulations illustrates the asymptotic behavior of the OLS. The results are essentially proved by $L^2$ computations and the limit theory of triangular arrays of martingales.

This correspondence studies the wireless powered over-the-air computation (AirComp) for achieving sustainable wireless data aggregation (WDA) by integrating AirComp and wireless power transfer (WPT) into a joint design. In particular, we consider that a multi-antenna hybrid access point (HAP) employs the transmit energy beamforming to charge multiple single-antenna low-power wireless devices (WDs) in the downlink, and the WDs use the harvested energy to simultaneously send their messages to the HAP for AirComp in the uplink. Under this setup, we minimize the computation mean square error (MSE), by jointly optimizing the transmit energy beamforming and the receive AirComp beamforming at the HAP, as well as the transmit power at the WDs, subject to the maximum transmit power constraint at the HAP and the wireless energy harvesting constraints at individual WDs. To tackle the non-convex computation MSE minimization problem, we present an efficient algorithm to find a converged high-quality solution by using the alternating optimization technique. Numerical results show that the proposed joint WPT-AirComp approach significantly reduces the computation MSE, as compared to other benchmark schemes.

Medical image segmentation requires consensus ground truth segmentations to be derived from multiple expert annotations. A novel approach is proposed that obtains consensus segmentations from experts using graph cuts (GC) and semi-supervised learning (SSL). Popular approaches use iterative Expectation Maximization (EM) to estimate the final annotation and quantify annotator performance. Such techniques pose the risk of getting trapped in local minima. We propose a self-consistency (SC) score to quantify annotator consistency using low-level image features. SSL is used to predict missing annotations by considering global features and local image consistency. The SC score also serves as the penalty cost in a second-order Markov random field (MRF) cost function optimized using graph cuts to derive the final consensus label. Graph cut obtains a global maximum without an iterative procedure. Experimental results on synthetic images, real data of Crohn's disease patients and retinal images show our final segmentation to be accurate and more consistent than competing methods.

Sufficient training data is normally required to train deeply learned models. However, the number of pedestrian images per ID in person re-identification (re-ID) datasets is usually limited, since manual annotations are required for multiple camera views. To produce more data for training deeply learned models, a generative adversarial network (GAN) can be leveraged to generate samples for person re-ID. However, the samples generated by a vanilla GAN usually do not have labels. In this paper, we therefore propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated images. With MpRL, the generated samples are used as a supplement to real training data to train a deep model in a semi-supervised learning fashion. Considering data bias between generated and real samples, MpRL utilizes different contributions from predefined training classes. The contribution-based virtual labels are automatically assigned to generated samples to reduce ambiguous prediction in training. Meanwhile, MpRL relies only on predefined training classes without using extra classes. Furthermore, to reduce over-fitting, regularization is applied to MpRL during the learning process. To verify the effectiveness of MpRL, two state-of-the-art convolutional neural networks (CNNs) are adopted in our experiments. Experiments demonstrate that by assigning MpRL to generated samples, we can further improve person re-ID performance on three datasets, i.e., Market-1501, DukeMTMCreID, and CUHK03. The proposed method obtains improvements of +6.29%, +6.30% and +5.58% in rank-1 accuracy over a strong CNN baseline, respectively, and outperforms the state-of-the-art methods.
