
Grokking is the phenomenon where neural networks (NNs) initially fit the training data and only later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this phenomenon across both synthetic and real datasets, offering a novel viewpoint that characterizes grokking through the lens of frequency dynamics during training. This empirical frequency-based analysis sheds new light on the grokking phenomenon and its underlying mechanisms.
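As a minimal illustration of this kind of frequency-dynamics analysis, the sketch below tracks how the Fourier spectrum of a small network's output evolves during training. The dataset, architecture, and schedule are our own illustrative assumptions, not the paper's setup.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative setup: a small MLP fit to a 1-D target; we monitor which
# frequency components of the learned function emerge first.
torch.manual_seed(0)
x = torch.linspace(0, 1, 256).unsqueeze(1)
y = torch.sin(2 * np.pi * x) + 0.2 * torch.sin(2 * np.pi * 20 * x)  # salient + less salient frequency

model = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5001):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            spectrum = np.abs(np.fft.rfft(model(x).squeeze().numpy()))
        # Amplitudes at the two target frequencies (bins 1 and 20 for a
        # unit-length domain sampled at 256 points).
        print(f"step {step}: |f=1| = {spectrum[1]:.2f}, |f=20| = {spectrum[20]:.2f}")
```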

Related Content

Networking: IFIP International Conferences on Networking. Explanation: an international conference series on networking. Publisher: IFIP.

The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial auditory attention is reflected in the topological distribution of EEG energy across different frequency bands. This insight motivates us to propose Prototype Training, a neuroscience-inspired method for Sp-AAD. The method constructs prototypes with enhanced energy-distribution representations and reduced trial-specific characteristics, enabling the model to better capture auditory attention features. To implement prototype training, we further propose EEGWaveNet, a network that operates on the wavelet transform of the EEG. Detailed experiments show that EEGWaveNet with prototype training outperforms competitive models on various datasets and validate the effectiveness of the proposed method. As a training method independent of model architecture, prototype training offers new insights for the field of Sp-AAD.
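A rough sketch of how we read the prototype idea: average per-band wavelet-energy features over same-label trials, so that trial-specific fluctuations cancel while the spatial energy topography survives. The shapes, wavelet choice, and pipeline are assumptions for illustration, not the paper's EEGWaveNet.

```python
import numpy as np
import pywt

def wavelet_energy(trial, wavelet="db4", level=4):
    """trial: (channels, samples) -> (channels, level+1) per-band energies."""
    feats = []
    for ch in trial:
        coeffs = pywt.wavedec(ch, wavelet, level=level)
        feats.append([np.sum(c ** 2) for c in coeffs])
    return np.array(feats)

def build_prototype(trials):
    """trials: list of (channels, samples) arrays sharing the same label.
    Averaging suppresses trial-specific variation in the energy topography."""
    return np.mean([wavelet_energy(t) for t in trials], axis=0)

rng = np.random.default_rng(0)
left_trials = [rng.standard_normal((64, 1024)) for _ in range(8)]
prototype_left = build_prototype(left_trials)  # (64, 5) energy topography
```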

We study the semistability of quiver representations from an algorithmic perspective. We present efficient algorithms for several fundamental computational problems on the semistability of quiver representations: deciding semistability and $\sigma$-semistability, finding maximizers of King's criterion, and finding the Harder-Narasimhan filtration. We also investigate a class of polyhedral cones defined by the linear system in King's criterion, which we call King cones. We demonstrate that the King cones for rank-one representations can be encoded by submodular flow polytopes, allowing us to decide the $\sigma$-semistability of rank-one representations in strongly polynomial time. Our argument employs submodularity in quiver representations, which may be of independent interest.
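For reference, King's criterion in one common formulation (notation ours; sign conventions vary across the literature):

```latex
Let $Q$ be a quiver with vertex set $Q_0$, and let $W$ be a finite-dimensional
representation with dimension vector $\underline{\dim}\,W \in \mathbb{Z}_{\ge 0}^{Q_0}$.
Given a weight $\sigma \in \mathbb{Z}^{Q_0}$ with $\sigma \cdot \underline{\dim}\,W = 0$,
the representation $W$ is \emph{$\sigma$-semistable} if
\[
  \sigma \cdot \underline{\dim}\,W' \le 0
  \quad\text{for every subrepresentation } W' \subseteq W,
\]
and \emph{$\sigma$-stable} if the inequality is strict for all nonzero proper $W'$.
```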

In this paper, we introduce a finite difference weighted essentially non-oscillatory (WENO) scheme based on a neural network for hyperbolic conservation laws. We employ supervised learning and design two loss functions, one with the mean squared error and the other with the mean squared logarithmic error, using the WENO3-JS weights as labels. Each loss function consists of two components: the first compares the weights produced by the neural network with the WENO3-JS weights, while the second matches the network's output weights to the linear weights. The first component enforces the WENO properties on the network, so no post-processing layer is needed; the second leads to better performance around discontinuities. As the network architecture, we choose a shallow neural network (SNN) for computational efficiency, with a Delta layer consisting of normalized undivided differences. The resulting WENO3-SNN schemes outperform WENO3-JS and WENO3-Z in one-dimensional examples and show improved behavior in two-dimensional examples.
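For context, the classical WENO3-JS weights that serve as training labels can be computed as follows. This is a textbook formulation; the value of eps and the stencil orientation are common defaults, not necessarily the paper's exact choices.

```python
import numpy as np

def weno3_js_weights(u_im1, u_i, u_ip1, eps=1e-6):
    """Classical WENO3-JS nonlinear weights for the left-biased stencil
    reconstructing u at x_{i+1/2} from {u_{i-1}, u_i, u_{i+1}}."""
    # Smoothness indicators of the two candidate substencils.
    beta0 = (u_ip1 - u_i) ** 2
    beta1 = (u_i - u_im1) ** 2
    # Linear (optimal) weights giving third-order accuracy in smooth regions.
    d0, d1 = 2.0 / 3.0, 1.0 / 3.0
    alpha0 = d0 / (eps + beta0) ** 2
    alpha1 = d1 / (eps + beta1) ** 2
    s = alpha0 + alpha1
    return alpha0 / s, alpha1 / s

# Smooth data -> weights near the linear values (2/3, 1/3); a jump in the
# stencil pushes the weight of the non-smooth substencil toward zero.
print(weno3_js_weights(1.0, 1.1, 1.2))
print(weno3_js_weights(1.0, 1.1, 5.0))
```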

We investigate a Tikhonov regularization scheme specifically tailored to shallow neural networks in the context of a classic inverse problem: approximating an unknown function and its derivatives on the unit cube from noisy measurements. The scheme incorporates a penalty term that takes one of three distinct yet intricately related network (semi)norms: the extended Barron norm, the variation norm, and the Radon-BV seminorm; the choice is contingent on the architecture of the network being used. We establish connections between the various network norms and, in particular, trace the dependence on the dimensionality index, aiming to deepen our understanding of how these norms interplay. We revisit the universality of function approximation through the various norms, establish a rigorous error-bound analysis for the Tikhonov regularization scheme, and explicitly elucidate the dependence on the dimensionality index, clarifying how dimensionality affects approximation performance and how one should design a neural network for diverse approximation tasks.
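Schematically, such a scheme minimizes an empirical misfit plus a norm penalty (our notation, with regularization parameter $\lambda$; the paper's exact functional may differ):

```latex
\[
  \min_{f_\theta \in \mathcal{F}_{\mathrm{NN}}}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl(f_\theta(x_i) - y_i^{\delta}\bigr)^2
  \;+\; \lambda\,\|f_\theta\|^2,
\]
where $y_i^{\delta}$ are the noisy measurements on the unit cube and
$\|\cdot\|$ is one of the extended Barron norm, the variation norm, or the
Radon-BV seminorm, matched to the network architecture.
```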

Linear causal disentanglement is a recent method in causal representation learning to describe a collection of observed variables via latent variables with causal dependencies between them. It can be viewed as a generalization of both independent component analysis and linear structural equation models. We study the identifiability of linear causal disentanglement, assuming access to data under multiple contexts, each given by an intervention on a latent variable. We show that one perfect intervention on each latent variable is sufficient and, in the worst case, necessary to recover parameters under perfect interventions, generalizing previous work to allow more latent than observed variables. We give a constructive proof that computes parameters via a coupled tensor decomposition. For soft interventions, we find the equivalence class of latent graphs and parameters that are consistent with observed data, via the study of a system of polynomial equations. Our results hold assuming the existence of non-zero higher-order cumulants, which amounts to a non-Gaussianity assumption on the variables.
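In standard linear-SEM notation (our rendering, not the paper's exact parameterization), the model reads:

```latex
\[
  Z = A Z + \varepsilon, \qquad X = G Z,
\]
where $A$ encodes the latent causal DAG (so $Z = (I - A)^{-1}\varepsilon$),
the entries of $\varepsilon$ are independent and non-Gaussian, and $G$
linearly mixes the latent vector $Z$ into the observed vector $X$; a perfect
intervention on $Z_j$ zeroes out the $j$-th row of $A$.
```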

We introduce the sequence classification problem CIViC Evidence to the field of medical NLP. CIViC Evidence denotes the multi-label classification problem of assigning labels of clinical evidence to abstracts of scientific papers which have examined various combinations of genomic variants, cancer types, and treatment approaches. We approach CIViC Evidence using different language models: we fine-tune pretrained checkpoints of BERT and RoBERTa on the CIViC Evidence dataset and compare their performance with models of the same architecture that were pretrained on domain-specific text. In this context, we find that BiomedBERT and BioLinkBERT can outperform BERT on CIViC Evidence (+0.8% and +0.9% absolute improvement in class-support-weighted F1 score). All transformer-based models show a clear performance edge over a logistic regression trained on bigram tf-idf scores (+1.5 to +2.7% F1). We compare the aforementioned BERT-like models to OpenAI's GPT-4 in a few-shot setting (on a small subset of our original test dataset) and find that, without additional prompt engineering or fine-tuning, GPT-4 performs worse on CIViC Evidence than our six fine-tuned models (66.1% weighted F1 score, compared to 71.8% for the best fine-tuned model), although it comes reasonably close to the benchmark of the logistic regression model trained on bigram tf-idf scores (67.7% weighted F1 score).
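A minimal sketch of the fine-tuning setup, using Hugging Face Transformers; the checkpoint name, label count, and example text are illustrative assumptions, not the paper's exact configuration. Since CIViC Evidence is multi-label, a sigmoid/BCE head is the natural choice:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 5  # hypothetical number of evidence labels
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid activations + BCE loss
)

abstract = "We examined variant X in cancer type Y under treatment Z ..."
labels = torch.zeros(1, NUM_LABELS)
labels[0, 2] = 1.0  # hypothetical positive evidence label

inputs = tokenizer(abstract, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # plug into any standard training loop
```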

Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analogs, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Łojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and it implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
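The discrete-continuous correspondence in question can be stated in one line (standard notation; our rendering):

```latex
\[
  x_{k+1} = x_k + \tfrac{1}{L}\, f(x_k, \theta_k), \quad k = 0, \dots, L-1
  \qquad\longleftrightarrow\qquad
  \dot{x}(t) = f\bigl(x(t), \theta(t)\bigr), \quad t \in [0, 1],
\]
i.e., a depth-$L$ residual network with $1/L$ residual scaling is the explicit
Euler discretization of the neural ODE with step size $1/L$ and
$\theta_k \approx \theta(k/L)$; the result above says that weights initialized
smooth in $k$ remain so along gradient-flow training.
```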

Implicit visual knowledge in a large latent diffusion model (LLDM) pre-trained on natural images is rich and hypothetically universal to both natural and medical images. To test this hypothesis from a practical perspective, we propose a novel framework for undersampled MRI Reconstruction by Prompting a large latent Diffusion model (MRPD). While existing methods trained on MRI datasets typically generalize poorly to diverse data-acquisition scenarios, MRPD supports unsupervised and universally adaptive MRI reconstruction. For unsupervised reconstruction, an MRSampler guides the LLDM with random-phase-modulated hard-to-soft control. With any single- or multi-source MRI dataset, MRPD's performance is boosted universally by a lightweight MRAdapter that fine-tunes only the LLDM's autoencoder. Experiments on FastMRI and IXI show that MRPD is the only model that supports both MRI-database-free and database-available scenarios, and it attains the best generalizability to out-of-domain (OOD) samplings, contrasts, and organs among the compared unsupervised, supervised, and MRI-diffusion methods. To our knowledge, MRPD is the first method to empirically show the universal prowess of an LLDM pre-trained on vast natural images for MRI. Our official implementation is at https://github.com/Z7Gao/MRPD.
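For background, the generic k-space data-consistency step underlying most undersampled-MRI pipelines is sketched below. This is a textbook operation, not MRPD's MRSampler, whose control schedule the abstract does not specify:

```python
import numpy as np

def data_consistency(x_est, y_measured, mask):
    """Keep measured k-space samples, fill unmeasured locations from the
    model's current estimate.
    x_est: current image estimate; y_measured: undersampled k-space;
    mask: 1 where k-space was sampled, 0 elsewhere."""
    k_est = np.fft.fft2(x_est)
    k_mixed = mask * y_measured + (1 - mask) * k_est
    return np.fft.ifft2(k_mixed)

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))
mask = (rng.random((128, 128)) < 0.3).astype(float)  # 30% random sampling
y = mask * np.fft.fft2(image)
x0 = np.fft.ifft2(y).real  # naive zero-filled reconstruction
x1 = data_consistency(x0, y, mask)
```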

We present a geometric framework for the processing of SPD-valued data that preserves subspace structures and is based on the efficient computation of extreme generalized eigenvalues. This is achieved through the use of the Thompson geometry of the semidefinite cone. We explore a particular geodesic space structure in detail and establish several properties associated with it. Finally, we review a novel inductive mean of SPD matrices based on this geometry.
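The central quantity here is the Thompson metric, which reduces to the two extreme generalized eigenvalues of a matrix pencil; the identity is standard, while the sketch itself is ours:

```python
import numpy as np
from scipy.linalg import eigh

def thompson_distance(A, B):
    """Thompson metric between SPD matrices A and B:
        d_T(A, B) = max(log lam_max, -log lam_min),
    where lam are the generalized eigenvalues of B v = lam A v. Only the two
    extremes are needed, which keeps this geometry cheap to compute with."""
    lam = eigh(B, A, eigvals_only=True)  # ascending generalized eigenvalues
    return max(np.log(lam[-1]), -np.log(lam[0]))

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
N = rng.standard_normal((5, 5))
B = N @ N.T + 5 * np.eye(5)
print(thompson_distance(A, B))  # symmetric: equals thompson_distance(B, A)
```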

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communication and synchronisation, and is itself difficult to scale. In this paper we present four algorithms to address these problems. In combination, these algorithms enable each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system according to how optimal it believes its current strategy is, given its past experience. We focus on distributed agent systems where agents' behaviour is constrained by resource-usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those tasks. We also simulate real-world system effects such as network instability. Our solution is shown to solve the task-allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than no-knowledge-retention approaches when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
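A toy sketch of the core idea as we read it: an agent whose exploration rate adapts to how good it believes its current strategy is. The class, update rule, and reward model below are illustrative only and do not reproduce the paper's four algorithms:

```python
import random

class TaskAllocator:
    """Epsilon-greedy Q-learning over candidate peers for a subtask, with
    epsilon adapted to recent performance: explore more when the current
    strategy seems poor, less when it seems good."""
    def __init__(self, peers, lr=0.1, eps=0.5):
        self.q = {p: 0.0 for p in peers}  # estimated value of each peer
        self.lr, self.eps = lr, eps
        self.recent = []  # recent rewards, proxy for strategy quality

    def choose_peer(self):
        if random.random() < self.eps:
            return random.choice(list(self.q))  # explore
        return max(self.q, key=self.q.get)      # exploit

    def update(self, peer, reward):
        self.q[peer] += self.lr * (reward - self.q[peer])
        self.recent = (self.recent + [reward])[-20:]
        avg = sum(self.recent) / len(self.recent)
        # Good recent rewards -> believe the strategy is good -> explore less.
        self.eps = min(0.9, max(0.05, 1.0 - avg))

allocator = TaskAllocator(peers=["agent_a", "agent_b", "agent_c"])
for _ in range(100):
    p = allocator.choose_peer()
    base = {"agent_a": 0.9, "agent_b": 0.5, "agent_c": 0.2}[p]
    allocator.update(p, base + random.gauss(0, 0.05))
```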
