
Online learning holds the promise of enabling efficient long-term credit assignment in recurrent neural networks. However, current algorithms fall short of offline backpropagation by either not being scalable or failing to learn long-range dependencies. Here we present a high-performance online learning algorithm that merely doubles the memory and computational requirements of a single inference pass. We achieve this by leveraging independent recurrent modules in multi-layer networks, an architectural motif that has recently been shown to be particularly powerful. Experiments on synthetic memory problems and on the challenging long-range arena benchmark suite reveal that our algorithm performs competitively, establishing a new standard for what can be achieved through online learning. This ability to learn long-range dependencies offers a new perspective on learning in the brain and opens a promising avenue in neuromorphic computing.
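To see why independent recurrent modules make online credit assignment cheap, here is a minimal sketch assuming a simplified element-wise linear recurrence (an illustrative special case, not necessarily the paper's exact architecture). With hidden state $h_t = \lambda \odot h_{t-1} + x_t$ and recurrent parameters $\lambda$, the sensitivity of each state coordinate to its own recurrent parameter can be tracked online with a single eligibility trace of the same size as the state:

$$ e_t = h_{t-1} + \lambda \odot e_{t-1}, \qquad \frac{\partial \mathcal{L}_t}{\partial \lambda} = \frac{\partial \mathcal{L}_t}{\partial h_t} \odot e_t, $$

so the memory and compute overhead over a plain forward pass is only a small constant factor, whereas full real-time recurrent learning for a densely connected network maintains an influence tensor that grows cubically with the number of units.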

Related Content

This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually designed deep learning models. Using the Western Mediterranean Wetland Birds dataset, we investigated the use of AutoKeras, an automated machine learning framework, to automate neural architecture search and hyperparameter tuning. Comparative analysis validates our hypothesis that the AutoKeras-derived model consistently outperforms traditional models such as MobileNet, ResNet50 and VGG16. Our approach and findings underscore the transformative potential of automated deep learning for advancing bioacoustics research and models. In fact, the automated techniques eliminate the need for manual feature engineering and model design while improving performance. This study also illuminates best practices in sampling, evaluation and reporting to enhance reproducibility in this nascent field. All the code used is available at https://github.com/giuliotosato/AutoKeras-bioacustic. Keywords: AutoKeras; automated deep learning; audio classification; Wetlands Bird dataset; comparative analysis; bioacoustics; validation dataset; multi-class classification; spectrograms.
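As a rough illustration of the workflow described above, the sketch below runs AutoKeras's architecture and hyperparameter search on pre-computed spectrogram arrays; the file names, array shapes and trial budget are placeholders chosen for illustration rather than details taken from the study.

import numpy as np
import autokeras as ak

# Hypothetical pre-computed spectrograms of shape (n_samples, height, width, 1)
# with integer species labels; dataset preparation is not shown here.
x_train = np.load("spectrograms_train.npy")
y_train = np.load("labels_train.npy")
x_test = np.load("spectrograms_test.npy")
y_test = np.load("labels_test.npy")

# AutoKeras searches over architectures and hyperparameters automatically;
# max_trials bounds how many candidate models are trained and evaluated.
clf = ak.ImageClassifier(overwrite=True, max_trials=20)
clf.fit(x_train, y_train, validation_split=0.2, epochs=50)

print("test loss/accuracy:", clf.evaluate(x_test, y_test))
best_model = clf.export_model()  # the best Keras model found by the search
best_model.summary()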

Quantum Extreme Learning Machines (QELMs) have emerged as a promising framework for quantum machine learning. Their appeal lies in the rich feature map induced by the dynamics of a quantum substrate - the quantum reservoir - and the efficient post-measurement training via linear regression. Here we study the expressivity of QELMs by decomposing the prediction of QELMs into a Fourier series. We show that the achievable Fourier frequencies are determined by the data encoding scheme, while Fourier coefficients depend on both the reservoir and the measurement. Notably, the expressivity of QELMs is fundamentally limited by the number of Fourier frequencies and the number of observables, while the complexity of the prediction hinges on the reservoir. As a cautionary note on scalability, we identify four sources that can lead to the exponential concentration of the observables as the system size grows (randomness, hardware noise, entanglement, and global measurements) and show how this can turn QELMs into useless input-agnostic oracles. Our analysis elucidates the potential and fundamental limitations of QELMs, and lays the groundwork for systematically exploring quantum reservoir systems for other machine learning tasks.
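To make the expressivity claim concrete, the display below sketches the Fourier picture for a scalar input $x$ encoded by a Hamiltonian evolution followed by the reservoir dynamics; this particular layout and notation are an illustrative assumption, not necessarily the paper's exact setup:

$$ \hat{y}(x) = \sum_{k} w_k \langle O_k \rangle_x, \qquad \langle O_k \rangle_x = \operatorname{Tr}\!\left[ O_k\, U_{\mathrm{res}}\, e^{-i x H} \rho_0\, e^{i x H}\, U_{\mathrm{res}}^{\dagger} \right] = \sum_{\omega \in \Omega} c_{k,\omega}\, e^{i \omega x}, $$

where $\Omega = \{\lambda_j - \lambda_l\}$ is the set of differences of eigenvalues of the encoding Hamiltonian $H$, which fixes the achievable frequencies, while the coefficients $c_{k,\omega}$ depend on the reservoir unitary $U_{\mathrm{res}}$, the initial state $\rho_0$ and the measured observables $O_k$; only the readout weights $w_k$ are trained by linear regression.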

Neural networks excel at discovering statistical patterns in high-dimensional data sets. In practice, higher-order cumulants, which quantify the non-Gaussian correlations between three or more variables, are particularly important for the performance of neural networks. But how efficient are neural networks at extracting features from higher-order cumulants? We study this question in the spiked cumulant model, where the statistician needs to recover a privileged direction or "spike" from the order-$p\ge 4$ cumulants of~$d$-dimensional inputs. We first characterise the fundamental statistical and computational limits of recovering the spike by analysing the number of samples~$n$ required to strongly distinguish between inputs from the spiked cumulant model and isotropic Gaussian inputs. We find that statistical distinguishability requires $n\gtrsim d$ samples, while distinguishing the two distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide class of algorithms, i.e. those covered by the low-degree conjecture. These results suggest the existence of a wide statistical-to-computational gap in this problem. Numerical experiments show that neural networks learn to distinguish the two distributions with quadratic sample complexity, while "lazy" methods like random features are not better than random guessing in this regime. Our results show that neural networks extract information from higher-order correlations in the spiked cumulant model efficiently, and reveal a large gap in the amount of data required by neural networks and random features to learn from higher-order cumulants.
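Schematically, and hedging on the exact construction used in the paper, the detection problem described above can be summarised as follows: the low-order statistics match those of an isotropic Gaussian, and the privileged direction $u \in \mathbb{S}^{d-1}$ appears only as a rank-one spike in a higher-order cumulant tensor,

$$ \kappa_1(x) = 0, \qquad \kappa_2(x) = I_d, \qquad \kappa_p(x) = \beta\, u^{\otimes p} \quad (p \ge 4), $$

and the statistician must decide from $n$ samples whether $\beta = 0$ (isotropic Gaussian) or $\beta > 0$, or recover $u$; the thresholds quoted above place statistical distinguishability at $n \gtrsim d$ samples and low-degree (polynomial-time) distinguishability at $n \gtrsim d^2$.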

Advances in large language models (LLMs) have driven an explosion of interest in their societal impacts. Much of the discourse around how they will impact social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. There are many choices that determine the impact of AI, and a fundamental choice made very early in the pipeline is the set of problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases which intrinsically entrench power -- we will miss an important opportunity to guide them towards equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.

Dependence is undoubtedly a central concept in statistics. However, it proves difficult to locate in the literature a formal definition which goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term 'dependence' and its variants to be used vaguely and indiscriminately to qualify a variety of disparate notions, leading to numerous incongruities. For example, the classical Pearson's, Spearman's or Kendall's correlations are widely regarded as 'dependence measures' of major interest, despite returning 0 in some cases of deterministic relationships between the variables at play, hence evidently not measuring dependence at all. Arguing that research on such a fundamental topic would benefit from a slightly more rigid framework, this paper suggests a general definition of the dependence between two random variables defined on the same probability space. Natural enough to align with intuition, that definition is still sufficiently precise to allow unequivocal identification of a 'universal' representation of the dependence structure of any bivariate distribution. Links between this representation and familiar concepts are highlighted, and ultimately, the idea of a dependence measure based on that universal representation is explored and shown to satisfy Rényi's postulates.
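A standard example of the incongruity mentioned above: take $X \sim \mathcal{N}(0,1)$ and $Y = X^2$, a perfectly deterministic relationship. Then

$$ \operatorname{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0, $$

so Pearson's correlation equals 0, and by the symmetry of the relationship around 0 the rank-based Spearman and Kendall coefficients vanish as well, even though $Y$ is entirely determined by $X$.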

This work studies nonparametric Bayesian estimation of the intensity function of an inhomogeneous Poisson point process in the important case where the intensity depends on covariates, based on the observation of a single realisation of the point pattern over a large area. It is shown how the presence of covariates allows information to be borrowed from far-away locations in the observation window, enabling consistent inference in the growing-domain asymptotics. In particular, optimal posterior contraction rates under both global and point-wise loss functions are derived. The rates in global loss are obtained under conditions on the prior distribution resembling those in the well-established theory of Bayesian nonparametrics, here combined with concentration inequalities for functionals of stationary processes to control certain random covariate-dependent loss functions appearing in the analysis. The local rates are derived via an ad-hoc study that builds on recent advances in the theory of P\'olya tree priors, extended to the present multivariate setting through a novel construction that makes use of the random geometry induced by the covariates.
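For concreteness, a standard formulation consistent with this setting (the notation is assumed here, not taken from the paper): a point pattern $N = \{x_1, \dots, x_m\}$ observed in a window $W$ with covariate field $Z$ and covariate-dependent intensity $\lambda(u) = \rho\big(Z(u)\big)$, where $\rho$ is the unknown link function, has likelihood relative to a unit-rate Poisson process

$$ L\big(N \mid \rho\big) = \exp\!\left( |W| - \int_W \rho\big(Z(u)\big)\, du \right) \prod_{i=1}^{m} \rho\big(Z(x_i)\big). $$

A nonparametric prior (for instance a P\'olya tree) is placed on $\rho$, and the posterior contraction rates quantify how quickly the posterior concentrates around the true intensity as the observation window grows.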

Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest-neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-up amplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNNs in image classification.
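The core scoring idea can be sketched as follows; this is a minimal, hypothetical illustration of a gradient-space nearest-neighbour OOD score, not the authors' GROOD implementation (the OOD prototype, the targeted mix-up step and the intermediate-layer choice are omitted).

import torch
import torch.nn.functional as F

def head_gradient(backbone, head, x, prototypes):
    """Gradient of the prototype-matching loss w.r.t. the final linear head."""
    feats = backbone(x)                                    # (1, d) feature vector
    logits = head(feats)                                   # (1, C) class scores
    target = torch.cdist(feats, prototypes).argmin(dim=1)  # nearest class prototype
    loss = F.cross_entropy(logits, target)
    (grad,) = torch.autograd.grad(loss, head.weight)
    return grad.flatten()

def ood_score(grad, train_grads):
    """Distance to the nearest training-set gradient; larger means more OOD-like."""
    return torch.cdist(grad[None], train_grads).min().item()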

Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.

Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
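As a reference point for the generic methods mentioned above, the standard update rules for SGD with momentum and for Adam read, with gradient $g_t = \nabla_\theta L(\theta_{t-1})$, step size $\eta$, momentum $\mu$ and the usual $(\beta_1, \beta_2, \epsilon)$:

$$ \text{SGD with momentum:} \quad v_t = \mu\, v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta\, v_t; $$

$$ \text{Adam:} \quad m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \quad s_t = \beta_2 s_{t-1} + (1 - \beta_2) g_t^2, \quad \theta_t = \theta_{t-1} - \eta\, \frac{m_t / (1 - \beta_1^t)}{\sqrt{s_t / (1 - \beta_2^t)} + \epsilon}. $$

Adaptive methods such as Adam rescale each coordinate by a running estimate of the squared gradient, which is one reason they behave differently from plain SGD on the ill-conditioned spectra discussed above.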
