
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards achieving self-certified learning. In this context, learning and certification strategies based on PAC-Bayes bounds are especially attractive due to their ability to leverage all data to learn a posterior and simultaneously certify its risk. In this paper, we assess the progress towards self-certification in probabilistic neural networks learnt by PAC-Bayes inspired objectives. We empirically compare (on 4 classification datasets) classical test set bounds for deterministic predictors and a PAC-Bayes bound for randomised self-certified predictors. We first show that both of these generalisation bounds are not too far from out-of-sample test set errors. We then show that in data starvation regimes, holding out data for the test set bounds adversely affects generalisation performance, while self-certified strategies based on PAC-Bayes bounds do not suffer from this drawback, suggesting that they may be a suitable choice for the small data regime. We also find that probabilistic neural networks learnt by PAC-Bayes inspired objectives lead to certificates that can be surprisingly competitive with commonly used test set bounds.
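As a rough companion to the comparison above, the sketch below computes the two kinds of certificates the abstract contrasts: a classical kl-style test set bound for a deterministic predictor (valid with probability 1 - delta over the held-out sample) and the PAC-Bayes-kl bound for a randomised predictor, both obtained by inverting the binary KL divergence numerically. The numeric inputs in the usage lines are made up for illustration only.

```python
import math

def binary_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(q, c):
    """Largest p >= q with kl(q || p) <= c, found by bisection."""
    lo, hi = q, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binary_kl(q, mid) > c:
            hi = mid
        else:
            lo = mid
    return lo

def test_set_bound(test_error, n_test, delta=0.05):
    """Classical kl-style test set bound for a deterministic predictor."""
    return kl_inverse(test_error, math.log(1.0 / delta) / n_test)

def pac_bayes_kl_bound(emp_error, kl_posterior_prior, n, delta=0.05):
    """PAC-Bayes-kl risk certificate for a randomised (probabilistic) predictor."""
    c = (kl_posterior_prior + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse(emp_error, c)

# Toy comparison with made-up numbers (illustrative only).
print(test_set_bound(0.02, n_test=10_000))
print(pac_bayes_kl_bound(0.015, kl_posterior_prior=2_500.0, n=50_000))
```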

Related content

Neural Networks is the archival journal of the world's three oldest neural modelling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This uniquely broad scope facilitates the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles appear in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

Data are often stored on centralized storage servers. This is the case, for instance, in remote sensing and astronomy, where projects produce several petabytes of data every year. While machine learning models are often trained on relatively small subsets of the data, the inference phase typically requires transferring significant amounts of data between the servers and the clients. In many cases, the bandwidth available per user is limited, which makes data transfer one of the major bottlenecks. In this work, we propose a framework that automatically selects the relevant parts of the input data for a given neural network. The model as well as the associated selection masks are trained simultaneously such that a good model performance is achieved while only a minimal amount of data is selected. During the inference phase, only those parts of the data have to be transferred between the server and the client. We propose both instance-independent and instance-dependent selection masks. The former are the same for all instances to be transferred, whereas the latter allow for variable transfer sizes per instance. Our experiments show that it is often possible to substantially reduce the amount of data that needs to be transferred without significantly affecting model quality.
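The following is a minimal PyTorch sketch of the joint-training idea for an instance-independent selection mask: learnable per-feature gates multiply the input, and a sparsity penalty encourages selecting few features. The relaxed sigmoid gating, the tiny network, and the `sparsity_weight` hyperparameter are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class MaskedModel(nn.Module):
    """Joint model + instance-independent selection mask (illustrative sketch)."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(in_dim))  # one gate per input feature
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):
        gates = torch.sigmoid(self.mask_logits)   # relaxed 0/1 feature selection
        return self.net(x * gates), gates

model = MaskedModel(in_dim=128, hidden=64, n_classes=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sparsity_weight = 1e-2                             # assumed trade-off hyperparameter

x = torch.randn(32, 128)                           # dummy batch
y = torch.randint(0, 10, (32,))
logits, gates = model(x)
loss = nn.functional.cross_entropy(logits, y) + sparsity_weight * gates.mean()
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, only features whose gates exceed a chosen threshold would need to be transferred from the server to the client.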

Molecular Machine Learning (ML) bears promise for efficient molecule property prediction and drug discovery. However, labeled molecule data can be expensive and time-consuming to acquire. Due to the limited labeled data, it is a great challenge for supervised ML models to generalize across the vast chemical space. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large amounts of unlabeled data (~10M unique molecules). In MolCLR pre-training, we build molecule graphs and develop GNN encoders to learn differentiable representations. Three molecule graph augmentations are proposed: atom masking, bond deletion, and subgraph removal. A contrastive estimator maximizes the agreement of augmentations from the same molecule while minimizing the agreement of different molecules. Experiments show that our contrastive learning framework significantly improves the performance of GNNs on various molecular property benchmarks, including both classification and regression tasks. Benefiting from pre-training on the large unlabeled database, MolCLR even achieves state-of-the-art results on several challenging benchmarks after fine-tuning. Additionally, further investigations demonstrate that MolCLR learns to embed molecules into representations that can distinguish chemically reasonable molecular similarities.
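Below is a small sketch of an NT-Xent-style contrastive objective over graph-level embeddings of two augmented views per molecule, in the spirit of the contrastive estimator described above. The GNN encoder and the graph augmentations are omitted, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Contrastive loss over two views z1, z2 of shape (N, d) (illustrative)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                         # cosine similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                  # positive = other view of same molecule

# z1, z2 would be pooled GNN embeddings of two augmented graphs per molecule.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(nt_xent_loss(z1, z2))
```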

Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, we aim to learn, via empirical risk minimization, the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, which also take the thresholding parameters into account. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.
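For concreteness, here is a minimal sketch of a LISTA-style network with the standard parametrization x_{k+1} = S_theta(W1 y + W2 x_k) and full weight-sharing across layers (the recurrent end of the weight-sharing spectrum discussed above). The dimensions, initialisation, and symbol names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    """Elementwise soft-thresholding operator S_theta(x)."""
    return torch.sign(x) * torch.clamp(x.abs() - theta, min=0.0)

class LISTA(nn.Module):
    """Learned ISTA: x_{k+1} = S_theta(W1 y + W2 x_k), weights shared across layers."""
    def __init__(self, m, n, n_layers=10):
        super().__init__()
        self.W1 = nn.Parameter(0.01 * torch.randn(n, m))   # measurement -> signal
        self.W2 = nn.Parameter(0.01 * torch.randn(n, n))   # signal -> signal
        self.theta = nn.Parameter(torch.full((n,), 0.1))   # learned thresholds
        self.n_layers = n_layers

    def forward(self, y):                                   # y: (batch, m) measurements
        x = torch.zeros(y.size(0), self.W1.size(0), device=y.device)
        for _ in range(self.n_layers):                      # recurrent weight sharing
            x = soft_threshold(y @ self.W1.t() + x @ self.W2.t(), self.theta)
        return x

y = torch.randn(4, 20)               # 20 linear measurements per sample
print(LISTA(m=20, n=100)(y).shape)   # reconstructed 100-dim sparse codes
```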

The deep neural network suffers from many fundamental issues in machine learning. For example, it often gets trapped in a local minimum during training, and its prediction uncertainty is hard to assess. To address these issues, we propose the so-called kernel-expanded stochastic neural network (K-StoNet) model, which incorporates support vector regression (SVR) as the first hidden layer and reformulates the neural network as a latent variable model. The former maps the input vector into an infinite-dimensional feature space via a radial basis function (RBF) kernel, ensuring the absence of local minima on its training loss surface. The latter breaks the high-dimensional nonconvex neural network training problem into a series of low-dimensional convex optimization problems, and makes its prediction uncertainty easy to assess. The K-StoNet can be easily trained using the imputation-regularized optimization (IRO) algorithm. Compared to traditional deep neural networks, K-StoNet possesses a theoretical guarantee of asymptotic convergence to the global optimum and makes the prediction uncertainty easy to assess. The performance of the new model in training, prediction, and uncertainty quantification is illustrated with simulated and real data examples.
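As a loose illustration of the general idea of mapping inputs through an (approximate) RBF-kernel feature space before a simple convex readout, the sketch below uses scikit-learn's RBFSampler (random Fourier features) followed by ridge regression. This is not the authors' SVR-based first layer or the IRO training algorithm, and the data and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# Approximate RBF-kernel feature map followed by a convex (ridge) readout.
model = make_pipeline(RBFSampler(gamma=0.5, n_components=300, random_state=0),
                      Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y))
```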

The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), are invertible. Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block. We show that Momentum ResNets can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (ODEs) and exactly characterize how adding momentum progressively increases the representation capabilities of Momentum ResNets. Our analysis reveals that Momentum ResNets can learn any linear mapping up to a multiplicative factor, while ResNets cannot. In a learning to optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail. We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.
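The sketch below illustrates the momentum forward rule described above (v updated as gamma*v + (1 - gamma)*f(x), then x updated as x + v) together with its exact inversion, which is what removes the need to store activations. The residual block f, the choice gamma=0.9, and the toy reconstruction check are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MomentumResBlock(nn.Module):
    """Momentum residual step: v <- gamma*v + (1-gamma)*f(x); x <- x + v."""
    def __init__(self, dim, gamma=0.9):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.gamma = gamma

    def forward(self, x, v):
        v = self.gamma * v + (1.0 - self.gamma) * self.f(x)
        return x + v, v

    def inverse(self, x_next, v_next):
        """Recover (x, v) exactly, so activations need not be stored."""
        x = x_next - v_next
        v = (v_next - (1.0 - self.gamma) * self.f(x)) / self.gamma
        return x, v

block = MomentumResBlock(dim=16)
x, v = torch.randn(2, 16), torch.zeros(2, 16)
x1, v1 = block(x, v)
x0, v0 = block.inverse(x1, v1)
print(torch.allclose(x0, x), torch.allclose(v0, v, atol=1e-6))
```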

Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low-probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
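The following is a short sketch of one self-training update combining pseudolabels from a previously-learned teacher (kept only above a confidence threshold) with an input-consistency term between two augmented views. The `augment` function, the threshold value, and the equal loss weighting are assumptions for illustration, not the population objective analysed above.

```python
import torch
import torch.nn.functional as F

def self_training_step(student, teacher, opt, x_unlabeled, augment, threshold=0.9):
    """One self-training update on an unlabeled batch (illustrative sketch)."""
    with torch.no_grad():
        probs = F.softmax(teacher(x_unlabeled), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > threshold                      # only confident pseudolabels

    x1, x2 = augment(x_unlabeled), augment(x_unlabeled)
    logits1, logits2 = student(x1), student(x2)

    # Fit pseudolabels + input-consistency regularization between the two views.
    pseudo_loss = F.cross_entropy(logits1[keep], pseudo[keep]) if keep.any() else logits1.sum() * 0
    consistency = F.mse_loss(F.softmax(logits1, dim=1), F.softmax(logits2, dim=1))
    loss = pseudo_loss + consistency

    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```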

Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by the insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In the absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs), which augment sets or graphs with additional inferred edges for improved model generalisation ability. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets.
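The toy sketch below conveys only the pointer idea: each node scores every other node, points to its argmax, and one round of messages is passed over the union of the input edges and the inferred pointer edges. The layer name, score parametrization, and aggregation are invented for illustration and heavily compress the PGN architecture (in particular, no supervision of the pointers is shown).

```python
import torch
import torch.nn as nn

class PointerLayer(nn.Module):
    """Each node points to one other node; messages flow over input + pointer edges."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.msg = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):                     # h: (N, d), adj: (N, N) 0/1
        scores = self.q(h) @ self.k(h).t()         # pairwise pointer scores
        scores.fill_diagonal_(float('-inf'))       # no self-pointers
        pointers = scores.argmax(dim=1)            # hard pointer per node
        ptr_adj = torch.zeros_like(adj)
        ptr_adj[torch.arange(h.size(0)), pointers] = 1.0
        a = ((adj + ptr_adj + ptr_adj.t()) > 0).float()       # symmetrised union
        agg = a @ h / a.sum(dim=1, keepdim=True).clamp(min=1.0)  # mean aggregation
        return torch.relu(self.msg(torch.cat([h, agg], dim=1))), pointers

h, adj = torch.randn(6, 32), torch.eye(6)
h_next, ptrs = PointerLayer(32)(h, adj)
print(h_next.shape, ptrs)
```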

Learning powerful data embeddings has become a centerpiece in machine learning, especially in the natural language processing and computer vision domains. The crux of these embeddings is that they are pretrained on huge corpora of data in an unsupervised fashion, sometimes aided with transfer learning. However, in the graph learning domain, embeddings learned through existing graph neural networks (GNNs) are task dependent and thus cannot be shared across different datasets. In this paper, we present the first powerful and theoretically guaranteed graph neural network that is designed to learn task-independent graph embeddings, hereafter referred to as deep universal graph embedding (DUGNN). Our DUGNN model incorporates a novel graph neural network (as a universal graph encoder) and leverages rich Graph Kernels (as a multi-task graph decoder) for both unsupervised learning and (task-specific) adaptive supervised learning. By learning task-independent graph embeddings across diverse datasets, DUGNN also reaps the benefits of transfer learning. Through extensive experiments and ablation studies, we show that the proposed DUGNN model consistently outperforms both existing state-of-the-art GNN models and Graph Kernels, with an accuracy gain of 3%-8% on graph classification benchmark datasets.

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
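Below is a simplified sketch of the underlying idea only (not the paper's bilevel program): per-edge Bernoulli probabilities are parametrized by logits, a straight-through sample of the adjacency matrix drives a small graph convolution, and both the edge distribution and the GCN weights receive gradients from the same task loss. All names and the straight-through relaxation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LearnedGraphGCN(nn.Module):
    """Jointly learn a Bernoulli edge distribution and GCN weights (simplified sketch)."""
    def __init__(self, n_nodes, in_dim, out_dim):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                                  # x: (N, in_dim)
        probs = torch.sigmoid(self.edge_logits)
        sample = torch.bernoulli(probs)
        adj = sample + probs - probs.detach()              # straight-through estimator
        adj = (adj + adj.t()) / 2 + torch.eye(x.size(0))   # symmetrise, add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ x / deg))         # mean-aggregated convolution

x = torch.randn(10, 5)
print(LearnedGraphGCN(n_nodes=10, in_dim=5, out_dim=3)(x).shape)
```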

Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of the RNN models that can be trained. Reversible RNNs, for which the hidden-to-hidden transition can be reversed, offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10-15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5-10 in the encoder, and a factor of 10-15 in the decoder.
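The sketch below is an additive-coupling illustration of a reversible hidden-to-hidden transition: the hidden state is split into two halves that update each other, so the previous state can be recomputed exactly during backpropagation instead of being stored. It conveys the reversibility idea only and is not the authors' RevGRU/RevLSTM with its finite-precision forgetting buffer.

```python
import torch
import torch.nn as nn

class ReversibleRNNCell(nn.Module):
    """Additive-coupling reversible transition on a split hidden state (sketch)."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        half = hidden_dim // 2
        self.f = nn.Linear(input_dim + half, half)
        self.g = nn.Linear(input_dim + half, half)

    def forward(self, x, h1, h2):
        h1 = h1 + torch.tanh(self.f(torch.cat([x, h2], dim=1)))
        h2 = h2 + torch.tanh(self.g(torch.cat([x, h1], dim=1)))
        return h1, h2

    def inverse(self, x, h1_new, h2_new):
        """Recompute the previous hidden state instead of storing it."""
        h2 = h2_new - torch.tanh(self.g(torch.cat([x, h1_new], dim=1)))
        h1 = h1_new - torch.tanh(self.f(torch.cat([x, h2], dim=1)))
        return h1, h2

cell = ReversibleRNNCell(input_dim=8, hidden_dim=32)
x, h1, h2 = torch.randn(2, 8), torch.randn(2, 16), torch.randn(2, 16)
n1, n2 = cell(x, h1, h2)
r1, r2 = cell.inverse(x, n1, n2)
print(torch.allclose(r1, h1, atol=1e-6), torch.allclose(r2, h2, atol=1e-6))
```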
