
We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise-augmented data until convergence to obtain the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $l_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions of the regularized parameters, based on which inference can be obtained simultaneously with variable selection. PANDA exhibits ensemble learning behaviors that help further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for fitting GLMs without resorting to complicated optimization techniques. We demonstrate superior or comparable performance of PANDA against existing approaches employing the same types of regularizers on simulated and real-life data. We show that inference through PANDA achieves nominal or near-nominal coverage and is far more efficient than a popular existing post-selection procedure.
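To make the iterative noise-augmentation scheme concrete, the following is a minimal sketch for a Gaussian GLM (ordinary least squares), in which pseudo-observations whose scale grows as the current coefficient estimate shrinks induce an adaptive, lasso-like penalty. The function name, noise design, and stopping rule are illustrative assumptions and do not reproduce the authors' exact construction.

```python
import numpy as np

def panda_like_fit(X, y, lam=1.0, n_iter=100, tol=1e-6, eps=1e-3, seed=None):
    """Illustrative PANDA-style loop: refit on noise-augmented data until the
    coefficient estimates stabilize (Gaussian GLM / least squares only)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # unpenalized warm start
    for _ in range(n_iter):
        # Adaptive noise scale: coefficients near zero receive a stronger penalty,
        # so the expected augmented penalty mimics an adaptive (lasso-like) ridge.
        scale = np.sqrt(lam / (np.abs(beta) + eps))
        X_aug = np.vstack([X, np.diag(scale * rng.standard_normal(p))])
        y_aug = np.concatenate([y, np.zeros(p)])        # pseudo-responses are zero
        beta_new = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

The ensemble-like behavior noted above would come from aggregating estimates across noise draws and post-convergence iterations; that averaging step is omitted in this sketch.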

Related Content

We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited for capturing the complex correlation structures that often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Our rotationally invariant AMP has complexity of the same order as the existing AMP derived under the restrictive assumption of a Gaussian design; our algorithm also recovers this existing AMP as a special case. Numerical results show performance close to that of Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained at much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.
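For orientation, the sketch below shows the classical Gaussian-design AMP that the proposed rotationally invariant AMP recovers as a special case, using a soft-threshold denoiser for sparse signals. The threshold rule and the assumed column normalization are illustrative choices, not part of the paper.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_gaussian_design(A, y, n_iter=30, alpha=1.5):
    """Classical AMP for y = A x + w with i.i.d. Gaussian design
    (entries ~ N(0, 1/n)); soft-threshold denoiser with threshold alpha * tau."""
    n, p = A.shape
    x, z = np.zeros(p), y.copy()
    for _ in range(n_iter):
        tau = np.sqrt(np.mean(z ** 2))             # effective noise level (state evolution proxy)
        x_new = soft_threshold(x + A.T @ z, alpha * tau)
        onsager = np.count_nonzero(x_new) / n      # average denoiser derivative / (n/p)
        z = y - A @ x_new + onsager * z            # corrected residual (Onsager term)
        x = x_new
    return x
```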

Graph-based data structures have drawn great attention in recent years. Most of the large and rapidly growing body of work on graph processing systems focuses on improving performance by preprocessing the input graph and modifying its layout. These systems usually take several hours to days to process a single graph on high-end machines, not counting the preprocessing overhead, which is often dominant. Yet for many graph applications an exact answer is not crucial, and a rough estimate of the final result is adequate. Approximate computing trades accuracy of results for computation or energy savings that could not be achieved by conventional techniques alone. In this work, we design, implement, and evaluate GraphGuess, a technique inspired by approximate graph theory and extended into a general, practical graph processing system. GraphGuess is essentially an approximate graph processing technique with adaptive correction, which can be implemented on top of any graph processing system. We build a vertex-centric processing system based on GraphGuess that allows the user to trade off accuracy for better performance. Our experimental studies show that GraphGuess can significantly reduce the processing time for large-scale graphs while maintaining high accuracy.
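The abstract does not spell out GraphGuess's internal policy, so the sketch below only illustrates the general "run approximately, then adaptively correct" pattern in a vertex-centric setting, using PageRank over a sampled edge set followed by a few exact refinement sweeps; all parameters and the correction schedule are assumptions.

```python
import random
from collections import defaultdict

def approximate_pagerank(edges, num_vertices, sample_ratio=0.3,
                         damping=0.85, approx_iters=10, correct_iters=2):
    """Approximate phase on sampled edges, then a short exact correction phase."""
    out_deg = defaultdict(int)
    for u, _ in edges:
        out_deg[u] += 1
    rank = [1.0 / num_vertices] * num_vertices

    def sweep(edge_list, scale=1.0):
        contrib = [0.0] * num_vertices
        for u, v in edge_list:
            contrib[v] += scale * rank[u] / out_deg[u]
        return [(1 - damping) / num_vertices + damping * c for c in contrib]

    sampled = [e for e in edges if random.random() < sample_ratio]
    for _ in range(approx_iters):                  # cheap, approximate sweeps
        rank = sweep(sampled, scale=1.0 / sample_ratio)
    for _ in range(correct_iters):                 # correction sweeps on the full graph
        rank = sweep(edges)
    return rank
```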

Information about action costs is critical for real-world AI planning applications. Rather than relying solely on declarative action models, recent approaches also use black-box external action cost estimators, often learned from data, that are applied during the planning phase. These, however, can be computationally expensive and produce uncertain values. In this paper we propose a generalization of deterministic planning with action costs that allows selecting between multiple estimators for an action's cost, to balance computation time against bounded estimation uncertainty. This enables a much richer -- and correspondingly more realistic -- problem representation. Importantly, it allows planners to bound plan accuracy, thereby increasing reliability, while reducing unnecessary computational burden, which is critical for scaling to large problems. We introduce a search algorithm that generalizes $A^*$ to solve such planning problems, along with additional algorithmic extensions. In addition to theoretical guarantees, extensive experiments show considerable savings in runtime compared to alternatives.
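The paper's algorithm and bounds are not reproduced here; the following sketch only conveys the shape of an $A^*$-style search in which each action's cost is obtained by choosing among several black-box estimators that trade computation time against estimation uncertainty. The estimator interface (dictionaries with `estimate`, `compute_time`, and `uncertainty` fields) and the selection policy are hypothetical.

```python
import heapq
import itertools

def select_cost(estimators, max_uncertainty):
    # Prefer the cheapest-to-run estimator whose uncertainty is acceptable;
    # otherwise fall back to the most certain one (illustrative policy).
    ok = [e for e in estimators if e["uncertainty"] <= max_uncertainty]
    pool = ok if ok else [min(estimators, key=lambda e: e["uncertainty"])]
    return min(pool, key=lambda e: e["compute_time"])["estimate"]()

def a_star_with_estimated_costs(start, goal_test, successors, heuristic,
                                max_uncertainty=0.1):
    """A*-style search where successors(state) yields (next_state, estimators)."""
    tie = itertools.count()                        # tie-breaker for the heap
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g
        for nxt, estimators in successors(state):
            g_new = g + select_cost(estimators, max_uncertainty)
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                heapq.heappush(frontier, (g_new + heuristic(nxt), next(tie),
                                          g_new, nxt, path + [nxt]))
    return None, float("inf")
```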

In high-dimensional prediction settings, it remains challenging to reliably estimate test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves: it fits a smooth monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages compared to commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview helps assess the potential benefit of adding training samples and provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound, which may be further tightened by a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
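As a minimal illustration of the learning-curve idea (not the paper's exact parametric form, confidence bound, or bias correction), one can fit an inverse-power-law curve to performance estimates at several subsample sizes and read off the prediction at the full sample size:

```python
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_curve(n, a, b, c):
    # Smooth and monotonically increasing in n for b, c > 0; saturates at a.
    return a - b * np.power(n, -c)

def extrapolate_performance(subsample_sizes, performances, total_n):
    """Fit a monotone learning curve and predict performance at total_n.
    Assumes an AUC/accuracy-like metric bounded by 1 (hence the bound on a)."""
    p0 = [max(performances), 1.0, 0.5]
    bounds = ([0.0, 0.0, 0.0], [1.0, np.inf, np.inf])
    params, _ = curve_fit(inverse_power_curve, subsample_sizes, performances,
                          p0=p0, bounds=bounds)
    return inverse_power_curve(total_n, *params)

# e.g. extrapolate_performance([50, 100, 200, 400], [0.62, 0.68, 0.73, 0.76], 1000)
```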

Due to the widespread use of complex machine learning models in real-world applications, it is becoming critical to explain model predictions. However, these models are typically black-box deep neural networks, explained post hoc via methods with known faithfulness limitations. Generalized Additive Models (GAMs) are an inherently interpretable class of models that address this limitation by learning a non-linear shape function for each feature separately, followed by a linear model on top. However, such models are typically difficult to train, require numerous parameters, and scale poorly. We propose an entirely new subfamily of GAMs that utilizes basis decomposition of shape functions. A small number of basis functions are shared among all features and learned jointly for a given task, making our model scale much better to large-scale data with high-dimensional features, especially when features are sparse. We propose an architecture denoted the Neural Basis Model (NBM), which uses a single neural network to learn these bases. On a variety of tabular and image datasets, we demonstrate that for interpretable machine learning, NBMs are state-of-the-art in accuracy, model size, and throughput, and can easily model all higher-order feature interactions. Source code is available at https://github.com/facebookresearch/nbm-spam.
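A compact PyTorch sketch of the basis-decomposition idea follows: one shared network maps each scalar feature to B basis values, and each feature has its own linear coefficients over those bases. Layer sizes and the output head are illustrative; the authors' released code is the reference implementation.

```python
import torch
import torch.nn as nn

class NeuralBasisModel(nn.Module):
    """Sketch of an NBM-style additive model with shared, jointly learned bases."""
    def __init__(self, num_features, num_bases=16, hidden=64):
        super().__init__()
        self.basis_net = nn.Sequential(            # shared across all features
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_bases),
        )
        # Per-feature coefficients over the shared bases, plus a global bias.
        self.coef = nn.Parameter(torch.zeros(num_features, num_bases))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                          # x: (batch, num_features)
        b, d = x.shape
        bases = self.basis_net(x.reshape(b * d, 1)).reshape(b, d, -1)  # (b, d, B)
        shape_values = (bases * self.coef).sum(-1)  # per-feature shape functions
        return shape_values.sum(-1) + self.bias     # additive model output
```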

Based on the manifold hypothesis, real-world data often lie on a low-dimensional manifold, while normalizing flows, as likelihood-based generative models, are incapable of finding this manifold due to their structural constraints. An interesting question thus arises: $\textit{"Can we find sub-manifold(s) of data in normalizing flows and estimate the density of the data on the sub-manifold(s)?"}$. In this paper, we introduce two approaches, namely per-pixel penalized log-likelihood and hierarchical training, to answer this question. We propose a single-step method for joint manifold learning and density estimation by disentangling the transformed space obtained by normalizing flows into manifold and off-manifold parts. This is done by a per-pixel penalized likelihood function for learning a sub-manifold of the data. Normalizing flows assume the transformed data is Gaussianized, but this imposed assumption is not necessarily true, especially in high dimensions. To tackle this problem, a hierarchical training approach is employed to improve density estimation on the sub-manifold. The results validate the superiority of the proposed methods in simultaneous manifold learning and density estimation using normalizing flows, in terms of generated image quality and likelihood.
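As a rough sketch of the penalized log-likelihood idea (the exact per-pixel penalty and the manifold/off-manifold split used in the paper are not reproduced), one can add a term that shrinks designated latent coordinates toward zero on top of the usual change-of-variables objective:

```python
import math
import torch

def penalized_flow_nll(z, log_det_jac, off_manifold_idx, penalty_weight):
    # z: (batch, dim) latent codes from the flow; log_det_jac: (batch,)
    # Standard change-of-variables NLL under a standard Gaussian base density,
    # plus a penalty shrinking the chosen off-manifold coordinates toward zero.
    # The coordinate split and the quadratic penalty are illustrative assumptions.
    dim = z.shape[1]
    gauss_nll = 0.5 * (z ** 2).sum(dim=1) + 0.5 * dim * math.log(2 * math.pi)
    nll = gauss_nll - log_det_jac
    penalty = penalty_weight * (z[:, off_manifold_idx] ** 2).sum(dim=1)
    return (nll + penalty).mean()
```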

Most obstacle avoidance algorithms are effective only in specific environments and adapt poorly to new ones. In this paper, we propose a trajectory learning (TL)-based obstacle avoidance algorithm that learns an implicit obstacle avoidance mechanism from trajectories generated by general obstacle avoidance algorithms and achieves better adaptability. Specifically, we define a general data structure to describe the obstacle avoidance mechanism. Based on this structure, we transform the learning of the obstacle avoidance algorithm into a multiclass classification problem over direction selection. Then, we design an artificial neural network (ANN) to fit the multiclass classification function through supervised learning and finally obtain the obstacle avoidance mechanism that generates the observed trajectories. Our algorithm obtains an obstacle avoidance mechanism similar to that demonstrated in the trajectories and is adaptable to unseen environments. The automatic learning mechanism simplifies modification and debugging of obstacle avoidance algorithms in applications. Simulation results demonstrate that the proposed algorithm can learn an obstacle avoidance strategy from trajectories and achieve better adaptability.
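A minimal sketch of the supervised direction classifier described above is given below; the state descriptor, network width, and number of discretized directions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DirectionClassifier(nn.Module):
    """Map a state descriptor (e.g., goal direction plus range readings)
    to logits over discretized motion directions."""
    def __init__(self, state_dim, num_directions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_directions),
        )

    def forward(self, state):
        return self.net(state)

# (state, chosen direction) pairs extracted from trajectories of an existing
# obstacle avoidance algorithm would be fit with nn.CrossEntropyLoss,
# following the multiclass formulation described above.
```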

The crude Monte Carlo method approximates the integral $$S(f)=\int_a^b f(x)\,\mathrm dx$$ with expected error (deviation) $\sigma(f)N^{-1/2},$ where $\sigma(f)^2$ is the variance of $f$ and $N$ is the number of random samples. If $f\in C^r$, then special variance reduction techniques can lower this error to the level $N^{-(r+1/2)}.$ In this paper, we consider methods of the form $$\overline M_{N,r}(f)=S(L_{m,r}f)+M_n(f-L_{m,r}f),$$ where $L_{m,r}$ is the piecewise polynomial interpolation of $f$ of degree $r-1$ using a partition of the interval $[a,b]$ into $m$ subintervals, $M_n$ is a Monte Carlo approximation using $n$ samples of $f,$ and $N$ is the total number of function evaluations used. We derive asymptotic error formulas for the methods $\overline M_{N,r}$ that use nonadaptive as well as adaptive partitions. Although the convergence rate $N^{-(r+1/2)}$ cannot be beaten, the asymptotic constants make a huge difference. For example, for $\int_0^1(x+d)^{-1}\mathrm dx$ and $r=4,$ the best adaptive methods outperform the nonadaptive ones by a factor of roughly $10^{12}$ if $d=10^{-4},$ and $10^{29}$ if $d=10^{-8}.$ In addition, the proposed adaptive methods are easily implementable and well suited for automatic integration. We believe that the obtained results can be generalized to multivariate integration.
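A minimal sketch of the estimator $\overline M_{N,r}$ for the simplest case $r=2$ (piecewise linear interpolation on a nonadaptive, uniform partition) is given below; the adaptive partitioning that drives the constants discussed above is omitted.

```python
import numpy as np

def mc_with_interpolant_control(f, a, b, m, n, seed=None):
    """Compute S(L f) exactly (trapezoidal rule on the interpolant) and apply
    crude Monte Carlo only to the residual f - L f. Uniform partition, r = 2."""
    rng = np.random.default_rng(seed)
    knots = np.linspace(a, b, m + 1)
    fk = f(knots)
    exact_part = np.sum((fk[:-1] + fk[1:]) / 2.0 * np.diff(knots))   # integral of the interpolant
    x = rng.uniform(a, b, n)
    residual = f(x) - np.interp(x, knots, fk)                        # f - L f at random points
    return exact_part + (b - a) * residual.mean()

# Example with the integrand from the paper: integral of 1/(x + d) over [0, 1],
# whose exact value is log((1 + d) / d).
d = 1e-4
estimate = mc_with_interpolant_control(lambda x: 1.0 / (x + d), 0.0, 1.0, m=1000, n=1000)
```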

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we show that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting during adaptation.

Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some losses---such as k-means, its non-linear version kernelized k-means (centroid based), and DBSCAN (density based)---are popular choices due to their good empirical performance on a range of applications. Every so often, however, the clustering output under these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model is trained on pairs of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model generalizes well to unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets than standard benchmark clustering techniques. Our meta clustering model works well even for small datasets, where the usual deep learning models tend to perform worse.
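As an illustration of the training setup (the recurrent architecture itself is not sketched), one (input dataset, target cluster identities) pair could be synthesized as follows; the sampling ranges and the use of Gaussian blobs are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs

def make_meta_training_pair(rng, max_clusters=5, max_points=200, dim=2):
    """One synthetic (dataset, cluster identities) pair for learning to cluster."""
    k = int(rng.integers(2, max_clusters + 1))
    n = int(rng.integers(50, max_points + 1))
    X, labels = make_blobs(n_samples=n, centers=k, n_features=dim,
                           cluster_std=float(rng.uniform(0.5, 2.0)),
                           random_state=int(rng.integers(10**6)))
    return X.astype(np.float32), labels

# rng = np.random.default_rng(0); X, y = make_meta_training_pair(rng)
```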
