亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure. This necessitates the development of models that can simultaneously handle both the network peer-effect (arising from neighborhood interactions) and the effect of high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates, which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Consequently, our method is computationally efficient and, under various standard regularity conditions, our estimate attains the classical high-dimensional rate of consistency. In particular, our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical logistic regression, when the true parameter is sparse and the underlying network is not too dense. As a consequence of the general results, we derive the rates of consistency of our estimator for various natural graph ensembles, such as bounded degree graphs, sparse Erd\H{o}s-R\'{e}nyi random graphs, and stochastic block models.

相關內容

A completely randomized experiment allows us to estimate the causal effect by the difference in the averages of the outcome under the treatment and control. But, difference-in-means type estimators behave poorly if the potential outcomes have a heavy-tail, or contain a few extreme observations or outliers. We study an alternative estimator by Rosenbaum that estimates the causal effect by inverting a randomization test using ranks. We study the asymptotic properties of this estimator and develop a framework to compare the efficiencies of different estimators of the treatment effect in the setting of randomized experiments. In particular, we show that the Rosenbaum estimator has variance that is asymptotically, in the worst case, at most about 1.16 times the variance of the difference-in-means estimator, and can be much smaller when the potential outcomes are not light-tailed. We further derive a consistent estimator of the asymptotic standard error for the Rosenbaum estimator which immediately yields a readily computable confidence interval for the treatment effect, thereby alleviating the expensive numerical calculations needed to implement the original proposal of Rosenbaum. Further, we propose a regression adjusted version of the Rosenbaum estimator to incorporate additional covariate information in randomization inference. We prove gain in efficiency by this regression adjustment method under a linear regression model. Finally, we illustrate through simulations that, unlike the difference-in-means based estimators, either unadjusted or regression adjusted, these rank-based estimators are efficient and robust against heavy-tailed distributions, contamination, and various model misspecifications.

Logistic regression is a widely used statistical model to describe the relationship between a binary response variable and predictor variables in data sets. It is often used in machine learning to identify important predictor variables. This task, variable selection, typically amounts to fitting a logistic regression model regularized by a convex combination of $\ell_1$ and $\ell_{2}^{2}$ penalties. Since modern big data sets can contain hundreds of thousands to billions of predictor variables, variable selection methods depend on efficient and robust optimization algorithms to perform well. State-of-the-art algorithms for variable selection, however, were not traditionally designed to handle big data sets; they either scale poorly in size or are prone to produce unreliable numerical results. It therefore remains challenging to perform variable selection on big data sets without access to adequate and costly computational resources. In this paper, we propose a nonlinear primal-dual algorithm that addresses these shortcomings. Specifically, we propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty in $O(T(m,n)\log(1/\epsilon))$ operations, where $\epsilon \in (0,1)$ denotes the tolerance and $T(m,n)$ denotes the number of arithmetic operations required to perform matrix-vector multiplication on a data set with $m$ samples each comprising $n$ features. This result improves on the known complexity bound of $O(\min(m^2n,mn^2)\log(1/\epsilon))$ for first-order optimization methods such as the classic primal-dual hybrid gradient or forward-backward splitting methods.

The connection between Bayesian neural networks and Gaussian processes gained a lot of attention in the last few years, with the flagship result that hidden units converge to a Gaussian process limit when the layers width tends to infinity. Underpinning this result is the fact that hidden units become independent in the infinite-width limit. Our aim is to shed some light on hidden units dependence properties in practical finite-width Bayesian neural networks. In addition to theoretical results, we assess empirically the depth and width impacts on hidden units dependence properties.

Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The bound derived can be applied when building multiple surrogate models in a reasonable computing time.

Feature selection is an extensively studied technique in the machine learning literature where the main objective is to identify the subset of features that provides the highest predictive power. However, in causal inference, our goal is to identify the set of variables that are associated with both the treatment variable and outcome (i.e., the confounders). While controlling for the confounding variables helps us to achieve an unbiased estimate of causal effect, recent research shows that controlling for purely outcome predictors along with the confounders can reduce the variance of the estimate. In this paper, we propose an Outcome Adaptive Elastic-Net (OAENet) method specifically designed for causal inference to select the confounders and outcome predictors for inclusion in the propensity score model or in the matching mechanism. OAENet provides two major advantages over existing methods: it performs superiorly on correlated data, and it can be applied to any matching method and any estimates. In addition, OAENet is computationally efficient compared to state-of-the-art methods.

We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.

We propose a general method for distributed Bayesian model choice, using the marginal likelihood, where a data set is split in non-overlapping subsets. These subsets are only accessed locally by individual workers and no data is shared between the workers. We approximate the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset generating a model evidence per subset. The results are combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. Our divide-and-conquer approach enables Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. We derive theoretical error bounds that quantify the resulting trade-off between computational gain and loss in precision. The embarrassingly parallel nature yields important speed-ups when used on massive data sets as illustrated by our real world experiments. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple feature combinations within one run.

Recent studies on deep convolutional neural networks present a simple paradigm of architecture design, i.e., models with more MACs typically achieve better accuracy, such as EfficientNet and RegNet. These works try to enlarge all the stages in the model with one unified rule by sampling and statistical methods. However, we observe that some network architectures have similar MACs and accuracies, but their allocations on computations for different stages are quite different. In this paper, we propose to enlarge the capacity of CNN models by improving their width, depth and resolution on stage level. Under the assumption that the top-performing smaller CNNs are a proper subcomponent of the top-performing larger CNNs, we propose an greedy network enlarging method based on the reallocation of computations. With step-by-step modifying the computations on different stages, the enlarged network will be equipped with optimal allocation and utilization of MACs. On EfficientNet, our method consistently outperforms the performance of the original scaling method. In particular, with application of our method on GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies under the setting of 600M and 4.4B MACs, respectively.

Given a real-valued hypothesis class $\mathcal{H}$, we investigate under what conditions there is a differentially private algorithm which learns an optimal hypothesis from $\mathcal{H}$ given i.i.d. data. Inspired by recent results for the related setting of binary classification (Alon et al., 2019; Bun et al., 2020), where it was shown that online learnability of a binary class is necessary and sufficient for its private learnability, Jung et al. (2020) showed that in the setting of regression, online learnability of $\mathcal{H}$ is necessary for private learnability. Here online learnability of $\mathcal{H}$ is characterized by the finiteness of its $\eta$-sequential fat shattering dimension, ${\rm sfat}_\eta(\mathcal{H})$, for all $\eta > 0$. In terms of sufficient conditions for private learnability, Jung et al. (2020) showed that $\mathcal{H}$ is privately learnable if $\lim_{\eta \downarrow 0} {\rm sfat}_\eta(\mathcal{H})$ is finite, which is a fairly restrictive condition. We show that under the relaxed condition $\lim \inf_{\eta \downarrow 0} \eta \cdot {\rm sfat}_\eta(\mathcal{H}) = 0$, $\mathcal{H}$ is privately learnable, establishing the first nonparametric private learnability guarantee for classes $\mathcal{H}$ with ${\rm sfat}_\eta(\mathcal{H})$ diverging as $\eta \downarrow 0$. Our techniques involve a novel filtering procedure to output stable hypotheses for nonparametric function classes.

Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.

北京阿比特科技有限公司