
This paper proposes a theoretical framework for the mechanism of autoencoders. For the encoder, whose main use is dimensionality reduction, we investigate two fundamental properties: bijective maps and data disentangling. We give general construction methods for encoders that satisfy either or both of these properties, and we model the generalization mechanism of autoencoders. Based on this framework, we explain experimental results for variational autoencoders, denoising autoencoders, and linear-unit autoencoders, with emphasis on interpreting the lower-dimensional representation of data produced by the encoder; the mechanism of image restoration through autoencoders then follows naturally from these explanations. Compared to PCA and decision trees, we demonstrate the advantages of (generalized) autoencoders for dimensionality reduction and classification, respectively. Convolutional neural networks and randomly weighted neural networks are also interpreted within this framework.

Related Content

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, with the aim of forcing the learned representations of the input to assume useful properties. Autoencoders are effective for many applied problems, from face recognition to acquiring the semantic meaning of words.
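Under simplifying assumptions stated in the comments (a small MLP, 784-dimensional inputs, a 32-dimensional code, all sizes illustrative rather than taken from any particular paper), a minimal sketch of the encoder/decoder structure and reconstruction training described above might look like this:

```python
# Minimal autoencoder sketch: encoder compresses, decoder reconstructs,
# and a reconstruction loss drives training. All sizes are illustrative.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: maps the input to a lower-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        # Decoder: reconstructs the input from the code.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)       # stand-in batch; a real run would use data
x_hat = model(x)
loss = loss_fn(x_hat, x)      # reconstruction loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```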

Clustering in high-dimensional spaces is a difficult task; the usual distance metrics may no longer be appropriate under the curse of dimensionality. Indeed, the choice of the metric is crucial and highly dependent on the dataset's characteristics. However, a single metric could be used to correctly perform clustering on multiple datasets from different domains. We propose to do so, providing a framework for learning a transferable metric. We show that we can learn a metric on a labelled dataset and then apply it to cluster a different dataset, using an embedding space that characterises a desired clustering in the generic sense. We learn and test such metrics on several datasets of varying complexity (synthetic, MNIST, SVHN, Omniglot) and achieve results competitive with the state of the art while using only a small number of labelled training datasets and shallow networks.
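As a rough illustration of the pipeline described above, the sketch below learns an embedding on a labelled source dataset with a triplet loss and then clusters a different dataset in that embedding space. The loss, network sizes, and helper names are assumptions for illustration, not the paper's actual method.

```python
# Sketch: learn a metric (embedding) on a labelled source dataset, then
# cluster an unseen target dataset in the learned embedding space.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

embed = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)
triplet = nn.TripletMarginLoss(margin=1.0)

def train_step(anchor, positive, negative):
    # anchor/positive share a label, negative has a different label.
    loss = triplet(embed(anchor), embed(positive), embed(negative))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def cluster_target(x_target, n_clusters):
    # Apply the learned (frozen) embedding to a different dataset and
    # run k-means in the embedding space.
    with torch.no_grad():
        z = embed(x_target).numpy()
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z)
```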

We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's widths increase, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the \textit{mean field} limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics and network initializations. Central to the framework is the new idea of a \textit{neuronal embedding}, which comprises a non-evolving probability space that allows one to embed neural networks of arbitrary widths. Using our framework, we prove several properties of large-width multilayer neural networks. First, we show that independent and identically distributed initializations cause strong degeneracy effects on the network's learning trajectory when the network's depth is at least four. Second, we obtain several global convergence guarantees for feedforward multilayer networks under a number of different setups. These include two-layer and three-layer networks with independent and identically distributed initializations, and multilayer networks of arbitrary depths with a special type of correlated initialization that is motivated by the new concept of \textit{bidirectional diversity}. Unlike previous works that rely on convexity, our results admit non-convex losses and hinge on a certain universal approximation property, which is a distinctive feature of infinite-width neural networks and is shown to hold throughout the training process. Aside from being the first known results for global convergence of multilayer networks in the mean field regime, these results demonstrate the flexibility of our framework and incorporate several new ideas and insights that depart from conventional convex optimization wisdom.

Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are either computationally intensive when dealing with inference tasks, such as missing-data imputation, or cannot tackle them at all. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capability of modeling continuous functions using INRs with the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well as or better than previous approaches, while being less computationally demanding.
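To make the hypernetwork/INR combination concrete, here is a minimal sketch of a hypernetwork that emits the weights of a small coordinate-based network. The normalizing-flow prior and the mixture of hypernetworks from the abstract are omitted, and all sizes and names are illustrative assumptions.

```python
# Sketch: a hypernetwork maps a latent code to the parameters of a small
# implicit neural representation (coordinates -> values). Sizes are illustrative.
import torch
import torch.nn as nn

COORD_DIM, HIDDEN, OUT_DIM, LATENT = 2, 32, 1, 16
# Number of INR parameters: one hidden layer plus one output layer.
N_PARAMS = (COORD_DIM * HIDDEN + HIDDEN) + (HIDDEN * OUT_DIM + OUT_DIM)

hypernet = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, N_PARAMS))

def inr_forward(params, coords):
    # Unpack the flat parameter vector into the INR's two layers and
    # evaluate it at the given continuous coordinates.
    i = 0
    w1 = params[i:i + COORD_DIM * HIDDEN].view(HIDDEN, COORD_DIM); i += COORD_DIM * HIDDEN
    b1 = params[i:i + HIDDEN]; i += HIDDEN
    w2 = params[i:i + HIDDEN * OUT_DIM].view(OUT_DIM, HIDDEN); i += HIDDEN * OUT_DIM
    b2 = params[i:i + OUT_DIM]
    h = torch.relu(coords @ w1.T + b1)
    return h @ w2.T + b2

z = torch.randn(LATENT)                 # latent code, e.g. from a VAE encoder
params = hypernet(z)                    # hypernetwork emits INR weights
coords = torch.rand(100, COORD_DIM)     # query arbitrary continuous coordinates
values = inr_forward(params, coords)    # function values at those coordinates
```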

Despite all the benefits of automated hyperparameter optimization (HPO), most modern HPO algorithms are black boxes themselves. This makes it difficult to understand the decision process that leads to the selected configuration, reduces trust in HPO, and thus hinders its broad adoption. Here, we study the combination of HPO with interpretable machine learning (IML) methods such as partial dependence plots. These techniques are increasingly used to explain the marginal effect of hyperparameters on the black-box cost function or to quantify the importance of hyperparameters. However, if such methods are naively applied to the experimental data of the HPO process in a post-hoc manner, the underlying sampling bias of the optimizer can distort interpretations. We propose a modified HPO method which efficiently balances the search for the global optimum w.r.t. predictive performance \emph{and} the reliable estimation of IML explanations of an underlying black-box function by coupling Bayesian optimization and Bayesian Algorithm Execution. On benchmark cases of both synthetic objectives and HPO of a neural network, we demonstrate that our method returns more reliable explanations of the underlying black box without a loss of optimization performance.
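The sampling-bias issue the abstract points to arises from naive post-hoc analyses like the sketch below, which estimates a partial dependence curve for one hyperparameter from whatever configurations an HPO run happened to evaluate. The surrogate model and synthetic data are placeholders, not the paper's method.

```python
# Sketch of a naive post-hoc partial-dependence estimate computed from HPO
# trial data. Because an optimizer oversamples promising regions, this kind
# of estimate can be biased -- the issue the abstract addresses.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# X: evaluated configurations (n_trials x n_hyperparameters), y: validation loss.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = (X[:, 0] - 0.3) ** 2 + 0.1 * rng.normal(size=200)

surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    # Average the surrogate's prediction over the observed data while
    # sweeping one hyperparameter across a grid of values.
    pd = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd.append(model.predict(X_mod).mean())
    return np.array(pd)

grid = np.linspace(0, 1, 20)
pd_curve = partial_dependence(surrogate, X, feature=0, grid=grid)
```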

Federated Learning (FL) is a distributed machine learning paradigm where clients collaboratively train a model using their local (human-generated) datasets. While existing studies focus on FL algorithm development to tackle data heterogeneity across clients, the important issue of data quality (e.g., label noise) in FL is overlooked. This paper aims to fill this gap by providing a quantitative study on the impact of label noise on FL. We derive an upper bound for the generalization error that is linear in the clients' label noise level. Then we conduct experiments on MNIST and CIFAR-10 datasets using various FL algorithms. Our empirical results show that the global model accuracy linearly decreases as the noise level increases, which is consistent with our theoretical analysis. We further find that label noise slows down the convergence of FL training, and the global model tends to overfit when the noise level is high.
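A typical way to set up such an experiment (not necessarily the paper's exact protocol) is to inject symmetric label noise into each client's dataset before federated training, as in the following sketch; the uniform-flip noise model and per-client noise levels are assumptions for illustration.

```python
# Sketch: symmetric label-noise injection for studying label noise in FL.
import numpy as np

def add_symmetric_label_noise(labels, noise_level, num_classes, rng):
    """Flip each label to a uniformly random *different* class with
    probability `noise_level`."""
    labels = labels.copy()
    flip = rng.random(len(labels)) < noise_level
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels

rng = np.random.default_rng(0)
# One label array per client; random stand-ins for MNIST-style labels.
client_labels = [rng.integers(0, 10, size=600) for _ in range(5)]
noise_levels = [0.0, 0.1, 0.2, 0.3, 0.4]
noisy = [add_symmetric_label_noise(y, p, 10, rng)
         for y, p in zip(client_labels, noise_levels)]
# The noisy labels would then be used in standard FedAvg-style training rounds.
```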

Deep neural networks have achieved tremendous success due to their representation power and adaptation to low-dimensional structures. Their potential for estimating structured regression functions has been recently established in the literature. However, most of the studies require the input dimension to be fixed; they consequently ignore the effect of dimension on the rate of convergence, which hampers their application to modern big data with high dimensionality. In this paper, we bridge this gap by analyzing a $k^{th}$ order nonparametric interaction model in both the growing-dimension scenario ($d$ grows with $n$ but at a slower rate) and the high-dimension scenario ($d \gtrsim n$). In the latter case, sparsity assumptions and associated regularization are required in order to obtain optimal rates of convergence. A new challenge in the diverging-dimension setting arises in the calculation of the mean-square error: the covariance terms among the estimated additive components are an order of magnitude larger than the variance terms and can degrade the statistical properties without proper care. We introduce a critical debiasing technique to remedy this problem. We show that under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate in terms of both $n$ and $d$. Our proof techniques rely crucially on a novel debiasing technique that makes the covariances of additive components negligible in the mean-square error calculation. In addition, we establish the matching lower bounds.

A powerful feature in mechanism design is the ability to irrevocably commit to the rules of a mechanism. Commitment is achieved by public declaration, which enables players to verify incentive properties in advance and the outcome in retrospect. However, public declaration can reveal superfluous information that the mechanism designer might prefer not to disclose, such as her target function or private costs. Avoiding this may be possible via a trusted mediator; however, the availability of a trusted mediator, especially if mechanism secrecy must be maintained for years, might be unrealistic. We propose a new approach to commitment, and show how to commit to, and run, any given mechanism without disclosing it, while enabling the verification of incentive properties and the outcome -- all without the need for any mediators. Our framework is based on zero-knowledge proofs -- a cornerstone of modern cryptographic theory. Applications include non-mediated bargaining with hidden yet binding offers.
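For intuition only, the sketch below shows a much simpler hash-based commit-reveal scheme; unlike the paper's zero-knowledge construction, it reveals nothing about the mechanism until the reveal step and does not let players verify incentive properties in advance. It only illustrates the binding-commitment half of the problem.

```python
# Sketch: a plain commit-reveal scheme (NOT zero-knowledge). The designer
# publishes a hash commitment before the mechanism runs and reveals the
# mechanism plus nonce afterwards, so the commitment is binding.
import hashlib
import secrets

def commit(mechanism_description: bytes):
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + mechanism_description).hexdigest()
    return digest, nonce  # publish digest; keep nonce and description secret

def verify(digest: str, nonce: bytes, mechanism_description: bytes) -> bool:
    return hashlib.sha256(nonce + mechanism_description).hexdigest() == digest

mech = b"second-price auction with reserve 10"   # hypothetical mechanism text
digest, nonce = commit(mech)                     # announced in advance
assert verify(digest, nonce, mech)               # checked at reveal time
```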

We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis of how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results. We start by reviewing theoretical bounds on the performance of approximate dynamic programming algorithms. We then introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation, both online and offline. Finally, we experimentally assess the impact of data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. According to our results: (i) high-entropy data distributions are well-suited for learning in an offline manner; and (ii) a certain degree of data diversity (data coverage) and data quality (closeness to the optimal policy) are jointly desirable for offline learning.
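A stripped-down version of the offline setting the abstract studies: fit tabular Q-values to fixed datasets collected under behaviour distributions of different entropy on a toy chain MDP. The MDP and distributions below are illustrative stand-ins, not the paper's four-state MDP.

```python
# Sketch: offline (batch) Q-learning from fixed datasets whose state
# distributions have high vs. low entropy.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
rng = np.random.default_rng(0)

def step(s, a):
    # Action 1 moves right (reward 1 on reaching the last state), action 0 stays.
    s_next = min(s + 1, N_STATES - 1) if a == 1 else s
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

def collect(n, state_dist):
    # Behaviour data: states sampled from `state_dist`, actions uniform.
    data = []
    for _ in range(n):
        s = rng.choice(N_STATES, p=state_dist)
        a = rng.integers(N_ACTIONS)
        s2, r = step(s, a)
        data.append((s, a, r, s2))
    return data

def offline_q_learning(data, iters=200, lr=0.1):
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(iters):
        for s, a, r, s2 in data:
            target = r + GAMMA * Q[s2].max()
            Q[s, a] += lr * (target - Q[s, a])
    return Q

uniform = np.full(N_STATES, 1 / N_STATES)     # high-entropy coverage
narrow = np.array([0.85, 0.05, 0.05, 0.05])   # low-entropy coverage
Q_high = offline_q_learning(collect(500, uniform))
Q_low = offline_q_learning(collect(500, narrow))
```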

Variational Autoencoders (VAEs) were originally motivated (Kingma & Welling, 2014) as probabilistic generative models in which one performs approximate Bayesian inference. The proposal of $\beta$-VAEs (Higgins et al., 2017) breaks this interpretation and generalizes VAEs to application domains beyond generative modeling (e.g., representation learning, clustering, or lossy data compression) by introducing an objective function that allows practitioners to trade off between the information content ("bit rate") of the latent representation and the distortion of reconstructed data (Alemi et al., 2018). In this paper, we reconsider this rate/distortion trade-off in the context of hierarchical VAEs, i.e., VAEs with more than one layer of latent variables. We identify a general class of inference models for which one can split the rate into contributions from each layer, which can then be tuned independently. We derive theoretical bounds on the performance of downstream tasks as functions of the individual layers' rates and verify our theoretical findings in large-scale experiments. Our results provide guidance for practitioners on which region in rate-space to target for a given application.
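The per-layer rate decomposition can be written as an objective in which each latent layer's KL term is weighted separately. The sketch below uses factorized Gaussian posteriors and standard-normal priors as simplifying assumptions; it is an illustration of the rate/distortion split, not the paper's exact inference model.

```python
# Sketch: distortion + beta1 * rate(layer 1) + beta2 * rate(layer 2),
# where each layer's rate (KL term) is tuned independently.
import torch

def gaussian_kl(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims, averaged over batch.
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()

def hierarchical_beta_vae_loss(x, x_hat, mu1, logvar1, mu2, logvar2,
                               beta1=1.0, beta2=4.0):
    distortion = torch.nn.functional.mse_loss(x_hat, x, reduction="mean")
    rate1 = gaussian_kl(mu1, logvar1)   # rate contributed by latent layer 1
    rate2 = gaussian_kl(mu2, logvar2)   # rate contributed by latent layer 2
    return distortion + beta1 * rate1 + beta2 * rate2, (rate1, rate2)

# Stand-in tensors; in practice these come from the encoder/decoder networks.
x = torch.rand(8, 784)
loss, rates = hierarchical_beta_vae_loss(
    x, x_hat=torch.rand(8, 784),
    mu1=torch.zeros(8, 32), logvar1=torch.zeros(8, 32),
    mu2=torch.zeros(8, 16), logvar2=torch.zeros(8, 16),
)
```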

Consider a mechanism that cannot observe how many players there are directly, but instead must rely on their self-reports to know how many are participating. Suppose the players can create new identities to report to the auctioneer at some cost $c$. The usual mechanism design paradigm is equivalent to implicitly assuming that $c$ is infinite for all players, while the usual Sybil attacks literature assumes it is zero or finite for one player (the attacker) and infinite for everyone else (the 'honest' players). The false-name-proofness literature largely assumes the cost to be 0. We consider a model with variable costs that unifies these disparate streams. A paradigmatic normal-form game can be extended into a Sybil game by taking the action space to be the product of the feasible set of identities to create and the original action space, so that each player chooses both how many identities to present as in the game and their actions in the original normal-form game. A mechanism is (dominant) false-name proof if it is (dominant) incentive-compatible for all players to self-report as at most one identity. We study mechanisms proposed in the literature motivated by settings where anonymity and self-identification are the norms, and show conditions under which they are not Sybil-proof. We characterize a class of dominant Sybil-proof mechanisms for reward sharing and show that they achieve the efficiency upper bound. We consider the extension in which agents can credibly commit to the strategies of their sybils and show how this can break mechanisms that would otherwise be false-name proof.
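As a toy illustration of false-name incentives, the sketch below compares an agent's payoff from reporting a single identity against splitting its contribution across k identities at cost c per extra identity, under two simple reward-sharing rules. The rules and numbers are illustrative, not the paper's characterized class of Sybil-proof mechanisms.

```python
# Toy check: under a given reward-sharing rule, does splitting one agent's
# contribution across k fake identities (each extra identity costing c) pay off?
def proportional_share(contributions, reward):
    total = sum(contributions)
    return [reward * w / total for w in contributions]

def equal_share(contributions, reward):
    return [reward / len(contributions) for _ in contributions]

def best_sybil_gain(rule, my_weight, others, reward, cost, max_ids=5):
    # Payoff from honest single-identity reporting.
    honest = rule([my_weight] + others, reward)[0]
    best = honest
    for k in range(2, max_ids + 1):
        split = [my_weight / k] * k          # spread contribution over k identities
        payoff = sum(rule(split + others, reward)[:k]) - cost * (k - 1)
        best = max(best, payoff)
    return best - honest                      # > 0 means a profitable Sybil attack

# Proportional sharing is unaffected by splitting; equal sharing is not.
print(best_sybil_gain(proportional_share, 2.0, [1.0, 1.0], 12.0, cost=0.5))  # 0.0
print(best_sybil_gain(equal_share, 2.0, [1.0, 1.0], 12.0, cost=0.5))         # > 0
```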
