
We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data drawn from that distribution and labeled data whose feature distribution may differ. We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill in the missing labels and then select the best candidate model accordingly. Our non-asymptotic excess risk bounds show that, in quite general scenarios, our estimator adapts to the structure of the target distribution as well as to the covariate shift. It achieves the minimax optimal error rate up to a logarithmic factor. The use of pseudo-labels in model selection does not have a major negative impact.
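
As a rough illustration of this split-and-impute scheme, the sketch below fits candidate kernel ridge regressors on one half of the labeled source data, fits an imputation model on the other half, and selects the candidate whose predictions best match the pseudo-labels on the unlabeled target covariates. The data-generating process, kernel choice, and regularization grid are illustrative assumptions, not the paper's actual tuning procedure.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical data: (X_src, y_src) labeled source sample, X_tgt unlabeled target sample.
rng = np.random.default_rng(0)
X_src = rng.uniform(-1, 1, size=(400, 1))
y_src = np.sin(3 * X_src[:, 0]) + 0.1 * rng.standard_normal(400)
X_tgt = rng.uniform(0, 1, size=(300, 1))          # shifted covariate distribution

# Split the labeled data: one half for candidate models, one half for the imputation model.
half = len(X_src) // 2
X_cand, y_cand = X_src[:half], y_src[:half]
X_imp, y_imp = X_src[half:], y_src[half:]

# Candidate models: kernel ridge fits over a grid of regularization strengths.
candidates = [KernelRidge(kernel="rbf", alpha=a).fit(X_cand, y_cand)
              for a in (1e-3, 1e-2, 1e-1, 1.0)]

# Imputation model fills in pseudo-labels on the unlabeled target covariates.
imputer = KernelRidge(kernel="rbf", alpha=1e-2).fit(X_imp, y_imp)
pseudo_y = imputer.predict(X_tgt)

# Select the candidate with the smallest pseudo-label squared error on the target sample.
scores = [np.mean((m.predict(X_tgt) - pseudo_y) ** 2) for m in candidates]
best_model = candidates[int(np.argmin(scores))]
```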

Related content

One of the most fundamental problems in network analysis is community detection. The stochastic block model (SBM) is a widely used model for which various estimation methods have been developed, and their community detection consistency has been established. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to solve PCABM efficiently. Under certain conditions, we derive the error bound of community detection under SCWA and show that it is community detection consistent. In addition, we investigate model selection in terms of the number of communities and feature selection for the pairwise covariates, and propose two corresponding algorithms. PCABM compares favorably with the SBM and the degree-corrected stochastic block model (DCBM) across a wide range of simulated and real networks when covariate information is accessible.
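
For context, the snippet below is a minimal sketch of plain (unadjusted) spectral clustering on an adjacency matrix, the baseline that SCWA builds on; the covariate adjustment itself is not shown, and the function name and interface are hypothetical.

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_communities(A, k, seed=0):
    """Vanilla spectral clustering on a symmetric adjacency matrix A with k communities."""
    # Leading eigenvectors (by absolute eigenvalue) of the adjacency matrix.
    vals, vecs = eigh(A)
    order = np.argsort(-np.abs(vals))[:k]
    U = vecs[:, order]
    # Cluster the rows of the spectral embedding.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)
```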

Bayesian nonparametric hierarchical priors provide flexible models for sharing information within and across groups. We focus on latent feature allocation models, where the data structures correspond to multisets or unbounded sparse matrices. The fundamental development in this regard is the hierarchical Indian buffet process (HIBP), devised by Thibaux and Jordan (2007). However, little is known in terms of explicit tractable descriptions of the joint, marginal, posterior and predictive distributions of the HIBP. We provide explicit novel descriptions of these quantities in the Bernoulli HIBP and general spike-and-slab HIBP settings, which allow for exact sampling and simpler practical implementation. We then extend these results to the more complex setting of hierarchies of general HIBPs (HHIBP). The generality of our framework allows one to recognize important structure that may otherwise be masked in the Bernoulli setting, and involves characterizations via dynamic mixed Poisson random count matrices. Our analysis shows that the standard choice of hierarchical Beta processes for modeling across-group sharing is not ideal in the classic Bernoulli HIBP setting proposed by Thibaux and Jordan (2007) or in other spike-and-slab HIBP settings, and we thus indicate tractable alternative priors.
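
As background, the sketch below samples a binary feature-allocation matrix from the standard (non-hierarchical) Indian buffet process, the building block that the HIBP stacks across groups; it does not implement the hierarchical constructions or the posterior characterizations discussed above, and the interface is hypothetical.

```python
import numpy as np

def sample_ibp(n_customers, alpha, seed=None):
    """Draw a binary feature matrix Z from the Indian buffet process with rate alpha."""
    rng = np.random.default_rng(seed)
    dish_counts = []          # number of previous customers who took each dish
    rows = []
    for i in range(1, n_customers + 1):
        row = []
        # Existing dishes are taken with probability proportional to their popularity.
        for k, m_k in enumerate(dish_counts):
            take = rng.random() < m_k / i
            row.append(int(take))
            if take:
                dish_counts[k] += 1
        # A Poisson number of brand-new dishes.
        n_new = rng.poisson(alpha / i)
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    # Pad rows to a common width.
    K = len(dish_counts)
    Z = np.zeros((n_customers, K), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z
```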

We propose a new data-driven approach for learning the fundamental solutions (Green's functions) of various linear partial differential equations (PDEs) given sample pairs of input-output functions. Building on the theory of functional linear regression (FLR), we estimate the best-fit Green's function and bias term of the fundamental solution in a reproducing kernel Hilbert space (RKHS), which allows us to regularize their smoothness and impose various structural constraints. We derive a general representer theorem for operator RKHSs to approximate the original infinite-dimensional regression problem by a finite-dimensional one, reducing the search space to a parametric class of Green's functions. In order to study the prediction error of our Green's function estimator, we extend prior results on FLR with scalar outputs to the case with functional outputs. Finally, we demonstrate our method on several linear PDEs, including the Poisson, Helmholtz, Schr\"{o}dinger, Fokker-Planck, and heat equations. We highlight its robustness to noise as well as its ability to generalize to new data with varying degrees of smoothness and mesh discretization without any additional training.
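
The following is a minimal sketch of the underlying idea on a fixed grid: treating the discretized integral operator u_j(x_i) ≈ sum_k G(x_i, y_k) f_j(y_k) Δy as a linear model and fitting G by ridge regression. It omits the bias term, the operator-RKHS representer-theorem machinery, and any smoothness constraints; the grid-based parameterization and the regularization level are assumptions, not the paper's estimator.

```python
import numpy as np

def fit_greens_function(F, U, dy, lam=1e-6):
    """Estimate a discretized Green's function G from input-output samples.

    F : (n_pairs, n_y) forcing functions f_j sampled on the grid y_k
    U : (n_pairs, n_x) corresponding solutions u_j sampled on the grid x_i
    Model: u_j(x_i) ~ sum_k G(x_i, y_k) f_j(y_k) * dy, fit by ridge regression.
    """
    Phi = F * dy                                   # quadrature-weighted design matrix
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])   # regularized normal equations
    G_T = np.linalg.solve(A, Phi.T @ U)            # shape (n_y, n_x)
    return G_T.T                                   # shape (n_x, n_y)
```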

In practice, the use of rounding is ubiquitous. Although researchers have looked at the implications of rounding continuous random variables, rounding may be applied to functions of discrete random variables as well. For example, to draw inferences about excess suicide deaths after a national emergency, authorities may provide a rounded average of deaths before and after the emergency started. Suicide rates tend to be relatively low around the world, and such rounding may seriously affect inference on the change in the suicide rate. In this paper, we study the scenario in which an average rounded to the nearest integer is used as a proxy for a non-negative discrete random variable. Specifically, our interest is in drawing inference on a parameter from the pmf of Y when we observe U = n[Y/n] as a proxy for Y. The probability generating function of U, E(U), and Var(U) capture the effect of the coarsening of the support of Y. Moments and estimators of distribution parameters are also explored for some special cases. We show that under certain conditions there is little impact from rounding. However, we also find scenarios where rounding can significantly affect statistical inference, as demonstrated in three examples. The simple methods we propose are able to partially counter rounding error effects. While for some probability distributions it may be difficult to derive maximum likelihood estimators as a function of U, we provide a framework for obtaining an estimator numerically.
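
A quick Monte Carlo check of the coarsening effect, under the illustrative assumptions that the latent count Y is Poisson and that the rounding granularity n is known (the rate and n below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12                       # rounding granularity: U = n * [Y / n]
lam = 30.0                   # hypothetical Poisson rate for the latent count Y
Y = rng.poisson(lam, size=200_000)
U = n * np.rint(Y / n)       # rounded-to-nearest-integer average, scaled back by n

print("E(Y), Var(Y):", Y.mean(), Y.var())
print("E(U), Var(U):", U.mean(), U.var())   # coarsening distorts the moments
```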

To estimate causal effects, analysts performing observational studies in health settings use several strategies to mitigate bias due to confounding by indication. There are two broad classes of approaches for this purpose: the use of confounders and the use of instrumental variables (IVs). Because such approaches are largely characterized by untestable assumptions, analysts must operate under the presumption that these methods will work imperfectly. In this tutorial, we formalize a set of general principles and heuristics for estimating causal effects with the two approaches when their assumptions are potentially violated. This crucially requires reframing the process of observational studies as hypothesizing potential scenarios in which the estimates from one approach are less inconsistent than those from the other. While most of our discussion of methodology centers on the linear setting, we touch upon complexities in non-linear settings and flexible procedures such as targeted minimum loss-based estimation (TMLE) and double machine learning (DML). To demonstrate the application of our principles, we investigate the off-label use of donepezil for mild cognitive impairment (MCI). We compare and contrast results from confounder and IV methods, both traditional and flexible, within our analysis and with those of a similar observational study and a clinical trial.
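
To make the DML reference concrete, here is a minimal cross-fitted partialling-out sketch for a partially linear model with a single continuous treatment. The random-forest nuisance learners, the fold count, and the data layout are assumptions for illustration and are unrelated to the donepezil analysis itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_partialling_out(X, d, y, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of the effect of treatment d on outcome y.

    X : (n, p) confounders; d, y : length-n arrays for treatment and outcome.
    """
    res_d = np.zeros_like(d, dtype=float)
    res_y = np.zeros_like(y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance regressions fitted on the training folds only.
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        res_d[test] = d[test] - m_d.predict(X[test])
        res_y[test] = y[test] - m_y.predict(X[test])
    # Final stage: OLS of outcome residuals on treatment residuals.
    return float(res_d @ res_y / (res_d @ res_d))
```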

Fr\'echet global regression is extended to the context of bivariate curve stochastic processes with values in a Riemannian manifold. The proposed regression predictor arises as a reformulation of the standard least-squares parametric linear predictor in terms of a weighted Fr\'echet functional mean. Specifically, in this reformulation the Euclidean distance is replaced by the integrated quadratic geodesic distance. The regression predictor is then obtained from the weighted Fr\'echet curve mean, lying in the time-varying geodesic submanifold generated by the regressor process components involved in the time correlation range. The regularized Fr\'echet weights are computed in the time-varying tangent spaces. The uniform weak consistency of the regression predictor is proved. Model selection is also addressed. A simulation study illustrates the performance of the proposed spherical curve variable selection algorithm in a multivariate framework.
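
As a pointwise building block, the sketch below computes a weighted Fr\'echet mean on the unit sphere by Riemannian gradient descent with the usual log/exp maps. It assumes nonnegative weights and a single time point; it is not the curve-level regression predictor or the regularized weight computation described above.

```python
import numpy as np

def log_map(p, x):
    """Riemannian log map on the unit sphere: tangent vector at p pointing to x."""
    c = np.clip(p @ x, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = x - c * p
    return theta * v / np.linalg.norm(v)

def exp_map(p, v):
    """Riemannian exp map on the unit sphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def weighted_frechet_mean(points, weights, n_iter=100, step=1.0):
    """Weighted Fr\'echet mean of unit vectors (rows of `points`) with nonnegative weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = points[np.argmax(w)]                      # initialize at the heaviest point
    for _ in range(n_iter):
        grad = sum(wi * log_map(mu, xi) for wi, xi in zip(w, points))
        mu = exp_map(mu, step * grad)              # fixed-point / gradient update
    return mu
```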

Recent work in the matrix completion literature has shown that prior knowledge of a matrix's row and column spaces can be successfully incorporated into reconstruction programs to substantially improve matrix recovery. This paper proposes a novel methodology that exploits more general forms of known matrix structure in terms of subspaces. The work derives reconstruction error bounds that are informative in practice, providing insight into previous approaches in the literature while introducing novel programs that severely reduce sampling complexity. The main result shows that a family of weighted nuclear norm minimization programs incorporating an $M_1 r$-dimensional subspace of $n\times n$ matrices (where $M_1\geq 1$ conveys structural properties of the subspace) allows accurate approximation of a rank-$r$ matrix aligned with the subspace from a near-optimal number of observed entries (within a logarithmic factor of $M_1 r$). The result is robust: the error is proportional to the measurement noise, the bound applies to full-rank matrices, and the guarantee reflects degraded output when erroneous prior information is utilized. Numerical experiments are presented that validate the theoretical behavior derived for several example weighted programs.
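
For reference, the sketch below implements the standard unweighted nuclear-norm heuristic via iterative singular-value soft-thresholding (soft-impute style); the weighted programs analyzed in the paper modify this objective with subspace information, which is not shown, and the iteration count and threshold are assumptions.

```python
import numpy as np

def soft_impute(M_obs, mask, lam, n_iter=200):
    """Nuclear-norm matrix completion by iterative SVD soft-thresholding.

    M_obs : matrix with observed entries filled in (arbitrary values elsewhere)
    mask  : boolean matrix, True where an entry was observed
    """
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        # Keep observed entries fixed, fill unobserved entries with the current estimate.
        Z = np.where(mask, M_obs, X)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)               # soft-threshold the singular values
        X = (U * s) @ Vt
    return X
```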

Sobol' indices and Shapley effects are attractive methods for assessing how a function depends on its various inputs. The existing literature contains various estimators for these two classes of sensitivity indices, but few estimators of Sobol' indices, and no estimators of Shapley effects, are computationally tractable for moderate-to-large input dimensions. This article provides a Shapley-effect estimator that is computationally tractable for moderate-to-large input dimensions. The estimator uses a metamodel-based approach, first fitting a Bayesian additive regression trees (BART) model, which is then used to compute Shapley-effect estimates. This article also establishes posterior contraction rates on a large function class for this Shapley-effect estimator and for the analogous existing Sobol'-index estimator. Finally, this paper explores the performance of these estimators on four different test functions for moderate-to-large input dimensions and numbers of observations.
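
For orientation on the Sobol' side, here is a plain Monte Carlo pick-freeze estimator of first-order Sobol' indices for independent inputs on [0,1]^d; it can be applied directly to a cheap metamodel, but it is not the BART-based Shapley-effect estimator studied in the article, and the sample size and the toy function in the usage comment are assumptions.

```python
import numpy as np

def first_order_sobol(f, d, n=10_000, seed=0):
    """Saltelli-style pick-freeze estimates of first-order Sobol' indices of f on [0,1]^d.

    f : vectorized function taking an (n, d) array and returning a length-n array.
    """
    rng = np.random.default_rng(seed)
    A, B = rng.random((n, d)), rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.concatenate([fA, fB]).var()
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                        # A with column i taken from B
        # Estimator of V_i = Var(E[f | X_i]); the f(A) term removes the mean-squared offset.
        S[i] = np.mean(fB * (f(ABi) - fA)) / var
    return S

# Example usage on a toy function of 5 inputs (only the first two matter):
# first_order_sobol(lambda X: np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2, d=5)
```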

Backward Stochastic Differential Equations (BSDEs) have been widely employed in various areas of the social and natural sciences, such as the pricing and hedging of financial derivatives, stochastic optimal control problems, optimal stopping problems, and gene expression. Most BSDEs cannot be solved analytically, so numerical methods must be applied to approximate their solutions. A variety of numerical methods have been proposed over the past few decades, and many more are currently being developed. For the most part, they exist in a complex and scattered manner, each requiring its own set of assumptions and conditions. The aim of the present work is thus to systematically survey numerical methods for BSDEs and, in particular, to compare and categorize them to support further developments and improvements. To achieve this goal, we focus on the core features of each method, drawing on an extensive collection of 333 references: the main assumptions, the numerical algorithm itself, key convergence properties, and advantages and disadvantages. In this way we provide up-to-date coverage of numerical methods for BSDEs, with insightful summaries of each and a useful comparison and categorization.

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting issues during adaptation.
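
As a generic illustration of the distributionally robust flavor of such methods (not the parametric reformulation developed in the thesis), the snippet below performs one exponentiated-gradient update of group weights so that worse-performing groups receive more weight in the training objective; the group definitions, step size, and interface are assumptions.

```python
import numpy as np

def update_dro_weights(q, group_losses, eta=0.1):
    """One group-DRO style weight update: upweight groups with higher current loss.

    q : current weights over groups (nonnegative, summing to one)
    group_losses : current mean loss of the model on each group
    """
    losses = np.asarray(group_losses, dtype=float)
    q = q * np.exp(eta * losses)                   # exponentiated-gradient ascent on weights
    q = q / q.sum()
    robust_loss = float(q @ losses)                # weighted objective to minimize next
    return q, robust_loss
```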
