亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

A core principle in statistical learning is that smoothness of target functions allows to break the curse of dimensionality. However, learning a smooth function seems to require enough samples close to one another to get meaningful estimate of high-order derivatives, which would be hard in machine learning problems where the ratio between number of data and input dimension is relatively small. By deriving new lower bounds on the generalization error, this paper formalizes such an intuition, before investigating the role of constants and transitory regimes which are usually not depicted beyond classical learning theory statements while they play a dominant role in practice.

相關內容

We explore a few common models on how correlations affect information. The main model considered is the Shannon mutual information $I(S:R_1,\cdots, R_i)$ over distributions with marginals $P_{S,R_i}$ fixed for each $i$, with the analogy in which $S$ is the stimulus and $R_i$'s are neurons. We work out basic models in details, using algebro-geometric tools to write down discriminants that separate distributions with distinct qualitative behaviours in the probability simplex into toric chambers and evaluate the volumes of them algebraically. As a byproduct, we provide direct translation between a decomposition of mutual information inspired by a series expansion and one from partial information decomposition (PID) problems, characterising the synergistic terms of the former. We hope this paper serves for communication between communities especially mathematics and theoretical neuroscience on the topic. KEYWORDS: information theory, algebraic statistics, mathematical neuroscience, partial information decomposition

Using nonlinear projections and preserving structure in model order reduction (MOR) are currently active research fields. In this paper, we provide a novel differential geometric framework for model reduction on smooth manifolds, which emphasizes the geometric nature of the objects involved. The crucial ingredient is the construction of an embedding for the low-dimensional submanifold and a compatible reduction map, for which we discuss several options. Our general framework allows capturing and generalizing several existing MOR techniques, such as structure preservation for Lagrangian- or Hamiltonian dynamics, and using nonlinear projections that are, for instance, relevant in transport-dominated problems. The joint abstraction can be used to derive shared theoretical properties for different methods, such as an exact reproduction result. To connect our framework to existing work in the field, we demonstrate that various techniques for data-driven construction of nonlinear projections can be included in our framework.

Adversarial training is an important topic in robust deep learning, but the community lacks attention to its practical usage. In this paper, we aim to resolve a real-world challenge, i.e., training a model on an imbalanced and noisy dataset to achieve high clean accuracy and adversarial robustness, with our proposed Omnipotent Adversarial Training (OAT) strategy. OAT consists of two innovative methodologies to address the imperfection in the training set. We first introduce an oracle into the adversarial training process to help the model learn a correct data-label conditional distribution. This carefully-designed oracle can provide correct label annotations for adversarial training. We further propose logits adjustment adversarial training to overcome the data imbalance issue, which can help the model learn a Bayes-optimal distribution. Our comprehensive evaluation results show that OAT outperforms other baselines by more than 20% clean accuracy improvement and 10% robust accuracy improvement under complex combinations of data imbalance and label noise scenarios. The code can be found in //github.com/GuanlinLee/OAT.

Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is transformed by a monotonic nonlinearity to emit measurable fitness. Here we demonstrate that minimizing contrastive loss functions, such as the Bradley-Terry loss, is a simple and flexible technique for extracting the sparse latent function implied by global epistasis. We argue by way of a fitness-epistasis uncertainty principle that the nonlinearities in global epistasis models can produce observed fitness functions that do not admit sparse representations, and thus may be inefficient to learn from observations when using a Mean Squared Error (MSE) loss (a common practice). We show that contrastive losses are able to accurately estimate a ranking function from limited data even in regimes where MSE is ineffective. We validate the practical utility of this insight by showing contrastive loss functions result in consistently improved performance on benchmark tasks.

We consider the application of the generalized Convolution Quadrature (gCQ) to approximate the solution of an important class of sectorial problems. The gCQ is a generalization of Lubich's Convolution Quadrature (CQ) that allows for variable steps. The available stability and convergence theory for the gCQ requires non realistic regularity assumptions on the data, which do not hold in many applications of interest, such as the approximation of subdiffusion equations. It is well known that for non smooth enough data the original CQ, with uniform steps, presents an order reduction close to the singularity. We generalize the analysis of the gCQ to data satisfying realistic regularity assumptions and provide sufficient conditions for stability and convergence on arbitrary sequences of time points. We consider the particular case of graded meshes and show how to choose them optimally, according to the behaviour of the data. An important advantage of the gCQ method is that it allows for a fast and memory reduced implementation. We describe how the fast and oblivious gCQ can be implemented and illustrate our theoretical results with several numerical experiments.

Most studies of adaptive algorithm behavior consider performance measures based on mean values such as the mean-square error. The derived models are useful for understanding the algorithm behavior under different environments and can be used for design. Nevertheless, from a practical point of view, the adaptive filter user has only one realization of the algorithm to obtain the desired result. This letter derives a model for the variance of the squared-error sample curve of the least-mean-square (LMS) adaptive algorithm, so that the achievable cancellation level can be predicted based on the properties of the steady-state squared error. The derived results provide the user with useful design guidelines.

Machine learning approaches for solving partial differential equations require learning mappings between function spaces. While convolutional or graph neural networks are constrained to discretized functions, neural operators present a promising milestone toward mapping functions directly. Despite impressive results they still face challenges with respect to the domain geometry and typically rely on some form of discretization. In order to alleviate such limitations, we present CORAL, a new method that leverages coordinate-based networks for solving PDEs on general geometries. CORAL is designed to remove constraints on the input mesh, making it applicable to any spatial sampling and geometry. Its ability extends to diverse problem domains, including PDE solving, spatio-temporal forecasting, and inverse problems like geometric design. CORAL demonstrates robust performance across multiple resolutions and performs well in both convex and non-convex domains, surpassing or performing on par with state-of-the-art models.

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.

In structure learning, the output is generally a structure that is used as supervision information to achieve good performance. Considering the interpretation of deep learning models has raised extended attention these years, it will be beneficial if we can learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs) whose inner mechanism is still not clearly understood. We find that Finite State Automaton (FSA) that processes sequential data has more interpretable inner mechanism and can be learned from RNNs as the interpretable structure. We propose two methods to learn FSA from RNN based on two different clustering methods. We first give the graphical illustration of FSA for human beings to follow, which shows the interpretability. From the FSA's point of view, we then analyze how the performance of RNNs are affected by the number of gates, as well as the semantic meaning behind the transition of numerical hidden states. Our results suggest that RNNs with simple gated structure such as Minimal Gated Unit (MGU) is more desirable and the transitions in FSA leading to specific classification result are associated with corresponding words which are understandable by human beings.

北京阿比特科技有限公司