亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small excess risk. If the true signal is $t$-sparse, information-theoretically, it is possible to achieve strong recovery guarantees with only $O(t\log n)$ samples. However, computationally efficient algorithms have sample complexity linear in (some variant of) the condition number of $\Sigma$. Classical algorithms such as the Lasso can require significantly more samples than necessary even if there is only a single sparse approximate dependency among the covariates. We provide a polynomial-time algorithm that, given $\Sigma$, automatically adapts the Lasso to tolerate a small number of approximate dependencies. In particular, we achieve near-optimal sample complexity for constant sparsity and if $\Sigma$ has few ``outlier'' eigenvalues. Our algorithm fits into a broader framework of feature adaptation for sparse linear regression with ill-conditioned covariates. With this framework, we additionally provide the first polynomial-factor improvement over brute-force search for constant sparsity $t$ and arbitrary covariance $\Sigma$.

相關內容

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for small problems) the approach is more stable and faster than gradient-based methods.

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression, in particular, are supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are also always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically linearly and slower than the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. We assess the procedures through a real-data application with millions of observations, and in high-dimensional logistic regression, where a large-scale simulation experiment produces concrete evidence for the existence of a simple adjustment to the maximum Jeffreys'-penalized likelihood estimates that delivers high accuracy in terms of signal recovery even in cases where estimates from ML and other recently-proposed corrective methods do not exist.

Analysis of high-dimensional data, where the number of covariates is larger than the sample size, is a topic of current interest. In such settings, an important goal is to estimate the signal level $\tau^2$ and noise level $\sigma^2$, i.e., to quantify how much variation in the response variable can be explained by the covariates, versus how much of the variation is left unexplained. This thesis considers the estimation of these quantities in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no responses $Y$. Our main research question is: how can one use the unlabeled data to better estimate $\tau^2$ and $\sigma^2$? We consider two frameworks: a linear regression model and a linear projection model in which linearity is not assumed. In the first framework, while linear regression is used, no sparsity assumptions on the coefficients are made. In the second framework, the linearity assumption is also relaxed and we aim to estimate the signal and noise levels defined by the linear projection. We first propose a naive estimator which is unbiased and consistent, under some assumptions, in both frameworks. We then show how the naive estimator can be improved by using zero-estimators, where a zero-estimator is a statistic arising from the unlabeled data, whose expected value is zero. In the first framework, we calculate the optimal zero-estimator improvement and discuss ways to approximate the optimal improvement. In the second framework, such optimality does no longer hold and we suggest two zero-estimators that improve the naive estimator although not necessarily optimally. Furthermore, we show that our approach reduces the variance for general initial estimators and we present an algorithm that potentially improves any initial estimator. Lastly, we consider four datasets and study the performance of our suggested methods.

We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration (FQI), suggest a $O(1/\sqrt{n})$ convergence for regret, empirical behavior exhibits \emph{much} faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for the regret convergence. First, we show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate, thus speeding it up. The level of exponentiation depends on the level of noise in the \emph{decision-making} problem, rather than the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply $O(1/n)$ regret rates in linear cases and $\exp(-\Omega(n))$ regret rates in tabular cases. We extend our findings to general function approximation by extending our results to regret guarantees based on $L_p$-convergence rates for estimating $Q^*$ rather than pointwise rates, where $L_2$ guarantees for nonparametric $Q^*$-estimation can be ensured under mild conditions.

Sensor devices have been increasingly used in engineering and health studies recently, and the captured multi-dimensional activity and vital sign signals can be studied in association with health outcomes to inform public health. The common approach is the scalar-on-function regression model, in which health outcomes are the scalar responses while high-dimensional sensor signals are the functional covariates, but how to effectively interpret results becomes difficult. In this study, we propose a new Functional Adaptive Double-Sparsity (FadDoS) estimator based on functional regularization of sparse group lasso with multiple functional predictors, which can achieve global sparsity via functional variable selection and local sparsity via zero-subinterval identification within coefficient functions. We prove that the FadDoS estimator converges at a bounded rate and satisfies the oracle property under mild conditions. Extensive simulation studies confirm the theoretical properties and exhibit excellent performances compared to existing approaches. Application to a Kinect sensor study that utilized an advanced motion sensing device tracking human multiple joint movements and conducted among community-dwelling elderly demonstrates how the FadDoS estimator can effectively characterize the detailed association between joint movements and physical health assessments. The proposed method is not only effective in Kinect sensor analysis but also applicable to broader fields, where multi-dimensional sensor signals are collected simultaneously, to expand the use of sensor devices in health studies and facilitate sensor data analysis.

For smooth optimization problems with a Hermitian positive semi-definite fixed-rank constraint, we consider three existing approaches including the simple Burer--Monteiro method, and Riemannian optimization over quotient geometry and the embedded geometry. These three methods can be all represented via quotient geometry with three Riemannian metrics $g^i(\cdot, \cdot)$ $(i=1,2,3)$. By taking the nonlinear conjugate gradient method (CG) as an example, we show that CG in the factor-based Burer--Monteiro approach is equivalent to Riemannian CG on the quotient geometry with the Bures-Wasserstein metric $g^1$. Riemannian CG on the quotient geometry with the metric $g^3$ is equivalent to Riemannian CG on the embedded geometry. For comparing the three approaches, we analyze the condition number of the Riemannian Hessian near the minimizer under the three different metrics. Under certain assumptions, the condition number from the Bures-Wasserstein metric $g^1$ is significantly different from the other two metrics. Numerical experiments show that the Burer--Monteiro CG method has obviously slower asymptotic convergence rate when the minimizer is rank deficient, which is consistent with the condition number analysis.

Estimating the conditional mean function that relates predictive covariates to a response variable of interest is a fundamental task in statistics. In this paper, we propose some general nonparametric regression approaches that are widely applicable under very mild conditions. The method decomposes a function with a Lipschitz continuous $k$-th derivative into a sum of a $(k-1)$-monotone function and a parametric component. We implement well-established shape-restricted estimation procedures (such as isotonic regression) to handle the "nonparametric" components of the true regression function and combine them with a simple sample-splitting procedure to estimate the parametric components. The resulting estimators inherit several favorable properties from the shape-restricted regression estimators. Notably, it is (practically) tuning parameter-free, converges at the minimax rate, and exhibits a locally adaptive rate when the true regression function is "simple". Finally, a series of numerical studies are presented, confirming these theoretical properties.

We consider Bayesian linear regression with sparsity-inducing prior and design efficient sampling algorithms leveraging posterior contraction properties. A quasi-likelihood with Gaussian spike-and-slab (that is favorable both statistically and computationally) is investigated and two algorithms based on Gibbs sampling and Stochastic Localization are analyzed, both under the same (quite natural) statistical assumptions that also enable valid inference on the sparse planted signal. The benefit of the Stochastic Localization sampler is particularly prominent for data matrix that is not well-designed.

We consider error-correction coding schemes for adversarial wiretap channels (AWTCs) in which the channel can a) read a fraction of the codeword bits up to a bound $r$ and b) flip a fraction of the bits up to a bound $p$. The channel can freely choose the locations of the bit reads and bit flips via a process with unbounded computational power. Codes for the AWTC are of broad interest in the area of information security, as they can provide data resiliency in settings where an attacker has limited access to a storage or transmission medium. We investigate a family of non-linear codes known as pseudolinear codes, which were first proposed by Guruswami and Indyk (FOCS 2001) for constructing list-decodable codes independent of the AWTC setting. Unlike general non-linear codes, pseudolinear codes admit efficient encoders and have succinct representations. We focus on unique decoding and show that random pseudolinear codes can achieve rates up to the binary symmetric channel (BSC) capacity $1-H_2(p)$ for any $p,r$ in the less noisy region: $p<1/2$ and $r<1-H_2(p)$ where $H_2(\cdot)$ is the binary entropy function. Thus, pseudolinear codes are the first known optimal-rate binary code family for the less noisy AWTC that admit efficient encoders. The above result can be viewed as a derandomization result of random general codes in the AWTC setting, which in turn opens new avenues for applying derandomization techniques to randomized constructions of AWTC codes. Our proof applies a novel concentration inequality for sums of random variables with limited independence which may be of interest as an analysis tool more generally.

Existing few-shot learning (FSL) methods assume that there exist sufficient training samples from source classes for knowledge transfer to target classes with few training samples. However, this assumption is often invalid, especially when it comes to fine-grained recognition. In this work, we define a new FSL setting termed few-shot fewshot learning (FSFSL), under which both the source and target classes have limited training samples. To overcome the source class data scarcity problem, a natural option is to crawl images from the web with class names as search keywords. However, the crawled images are inevitably corrupted by large amount of noise (irrelevant images) and thus may harm the performance. To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images. Further, with the cleaned web images as well as the original clean training images, we propose a GCN-based FSL method. For both the LDN and FSL tasks, a novel adaptive aggregation GCN (AdarGCN) model is proposed, which differs from existing GCN models in that adaptive aggregation is performed based on a multi-head multi-level aggregation module. With AdarGCN, how much and how far information carried by each graph node is propagated in the graph structure can be determined automatically, therefore alleviating the effects of both noisy and outlying training samples. Extensive experiments show the superior performance of our AdarGCN under both the new FSFSL and the conventional FSL settings.

北京阿比特科技有限公司