
We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove \emph{last-iterate} convergence of the Riemannian gradient flow to the optimal in-class predictor at an \emph{exponential rate} that is independent of the conditioning of the Gram matrix, \emph{without} requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.
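
To make the dynamics concrete, the block below states the standard Gauss-Newton and Levenberg-Marquardt flows for a squared loss over network outputs; the notation (Jacobian J, damping λ, projection P) is generic textbook usage, and the precise scaling and initialization analyzed in the paper may differ.

```latex
% Squared loss L(\theta) = \tfrac12 \|f(\theta) - y\|^2 over network outputs f(\theta),
% with Jacobian J = \partial f / \partial \theta.
\begin{align}
  \dot\theta &= -\,(J^\top J)^{-1} J^\top \bigl(f(\theta) - y\bigr)
    && \text{(Gauss--Newton flow, $J^\top J$ invertible)} \\
  \dot f = J\dot\theta &= -\,P_{\operatorname{range}(J)}\bigl(f(\theta) - y\bigr)
    && \text{(induced projected/Riemannian flow in output space)} \\
  \dot\theta &= -\,(J^\top J + \lambda I)^{-1} J^\top \bigl(f(\theta) - y\bigr)
    && \text{(Levenberg--Marquardt dynamics, damping $\lambda > 0$)}
\end{align}
```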

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

In the post-deep-learning era, the Transformer architecture has demonstrated powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn increasing attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSMs. Specifically, we first give a detailed description of the underlying principles to help readers quickly grasp the key ideas of SSMs. After that, we review existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to further promote the development of the theory and applications of SSMs. More related works will be continuously updated on the following GitHub: //github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
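
As a reference point for readers new to SSMs, here is a minimal sketch of the discrete linear state space recurrence that these models build on. It is plain NumPy, is not the Mamba architecture or any specific model from the survey, and the matrices are random placeholders.

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, x):
    """Minimal discrete state space model: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t.

    A_bar: (N, N) state matrix, B_bar: (N, 1) input matrix, C: (1, N) output matrix,
    x: (T,) scalar input sequence. Returns y: (T,) scalar output sequence.
    """
    N = A_bar.shape[0]
    h = np.zeros((N, 1))
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar @ h + B_bar * x_t   # linear state update: O(T) in sequence length, no attention
        y[t] = (C @ h).item()         # readout from the hidden state
    return y

# Toy usage: a random, roughly stable SSM applied to a short input sequence.
rng = np.random.default_rng(0)
N = 4
A_bar = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))
B_bar = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
print(ssm_scan(A_bar, B_bar, C, rng.standard_normal(16)))
```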

Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, is significant for enhancing model performance and improving training efficiency during the pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short of providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted growing attention in the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at //github.com/ZigeW/data_management_LLM.
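
As a toy illustration of the domain/task composition aspect mentioned above, the sketch below draws a training stream from several corpora according to mixture weights; the corpus names, weights, and sampling scheme are purely illustrative and not taken from the survey.

```python
import random

def sample_mixture(domains, weights, n_examples, seed=0):
    """Draw a training stream from several domain corpora according to mixture weights.

    domains: dict mapping domain name -> list of examples (here, placeholder strings).
    weights: dict mapping domain name -> sampling weight (need not sum to 1).
    """
    rng = random.Random(seed)
    names = list(domains)
    probs = [weights[n] for n in names]
    stream = []
    for _ in range(n_examples):
        d = rng.choices(names, weights=probs, k=1)[0]  # pick a domain by mixture weight
        stream.append((d, rng.choice(domains[d])))     # then a uniform example from that domain
    return stream

# Illustrative corpus and mixture weights (not from the survey).
corpus = {"web": ["w1", "w2", "w3"], "code": ["c1", "c2"], "books": ["b1"]}
print(sample_mixture(corpus, {"web": 0.6, "code": 0.3, "books": 0.1}, 5))
```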

Graph neural networks (GNNs) are effective machine learning models for many graph-related applications. Despite their empirical success, many research efforts focus on the theoretical limitations of GNNs, i.e., the expressive power of GNNs. Early works in this domain mainly focus on studying the graph isomorphism recognition ability of GNNs, while recent works try to leverage properties such as subgraph counting and connectivity learning to characterize the expressive power of GNNs, which are more practical and closer to real-world settings. However, no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a first survey of models for enhancing expressive power under different forms of definition. Concretely, the models are reviewed based on three categories: graph feature enhancement, graph topology enhancement, and GNN architecture enhancement.
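
For readers unfamiliar with the graph isomorphism perspective on expressive power, the sketch below implements 1-dimensional Weisfeiler-Leman color refinement, the classical test that message-passing GNN expressiveness is commonly measured against; the WL connection is standard background rather than a claim specific to this survey.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) on an adjacency list.

    adj: dict node -> list of neighbours. Returns the histogram of final colors; two graphs
    with different histograms are certainly non-isomorphic, equal histograms are inconclusive.
    """
    colors = {v: 0 for v in adj}  # start from a uniform coloring
    for _ in range(rounds):
        # Each node's new color is determined by its color and the multiset of neighbour colors.
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

# Toy usage: a triangle vs. a path on three nodes.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_colors(triangle), wl_colors(path))  # different histograms -> not isomorphic
```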

Deep learning has become the dominant approach in coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best expressed with a graph structure. As a result, there is a surge of interest in developing new deep learning techniques on graphs for a large number of NLP tasks. In this survey, we present a comprehensive overview on Graph Neural Networks (GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, which systematically organizes existing research of GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models. We further introduce a large number of NLP applications that are exploiting the power of GNNs and summarize the corresponding benchmark datasets, evaluation metrics, and open-source codes. Finally, we discuss various outstanding challenges for making full use of GNNs for NLP as well as future research directions. To the best of our knowledge, this is the first comprehensive overview of Graph Neural Networks for Natural Language Processing.
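
As a minimal illustration of the first two axes (graph construction and graph representation learning), the sketch below builds a co-occurrence window graph over a token sequence and runs one round of mean-aggregation message passing; the construction scheme and features are toy choices, not any particular method from the survey.

```python
import numpy as np

def window_graph(tokens, window=2):
    """Toy graph construction: connect tokens that co-occur within a fixed window."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            edges.add((i, j))
            edges.add((j, i))
    return edges

def mean_aggregate(features, edges):
    """One round of neighbourhood mean aggregation, the simplest form of GNN message passing."""
    out = features.copy()
    for v in range(features.shape[0]):
        nbrs = [u for (u, w) in edges if w == v]
        if nbrs:
            out[v] = features[nbrs].mean(axis=0)
    return out

tokens = "graph neural networks for nlp".split()
feats = np.eye(len(tokens))        # placeholder one-hot node features
edges = window_graph(tokens)       # graph construction step
print(mean_aggregate(feats, edges))  # graph representation learning step
```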

It has been shown that deep neural networks are prone to overfitting on biased training data. To address this issue, meta-learning employs a meta model for correcting the training bias. Despite the promising performance, extremely slow training is currently the bottleneck of meta-learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show our method is able to save two-thirds of the training time while maintaining comparable or even better generalization performance. In particular, our method achieves state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.
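
For context on why the meta gradient is expensive, the sketch below computes the standard one-step-lookahead meta gradient of a validation loss with respect to per-example weights for a tiny linear model; differentiating through the virtual update is the second-order step the abstract refers to. FaMUS's layer-wise approximation itself is not reproduced here, and the model, data, and learning rate are illustrative.

```python
import torch

# A tiny linear model kept as an explicit tensor so the one-step lookahead stays differentiable.
w = torch.zeros(5, requires_grad=True)

def loss_fn(w, x, y):
    return (x @ w - y) ** 2  # per-example squared error

x_train, y_train = torch.randn(8, 5), torch.randn(8)
x_val, y_val = torch.randn(4, 5), torch.randn(4)

eps = torch.zeros(8, requires_grad=True)                     # per-example weights from the meta model
weighted = (eps * loss_fn(w, x_train, y_train)).sum()        # weighted training loss
g = torch.autograd.grad(weighted, w, create_graph=True)[0]   # keep the graph for second-order terms
w_lookahead = w - 0.1 * g                                    # one virtual SGD step
val_loss = loss_fn(w_lookahead, x_val, y_val).mean()         # meta (validation) objective
meta_grad = torch.autograd.grad(val_loss, eps)[0]            # the expensive step FaMUS approximates
print(meta_grad.shape)                                       # one meta gradient entry per example
```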

Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many approaches have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. To this end, in this survey we broadly review recent progress in GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.
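
As a toy instance of the joint structure-and-representation idea, the sketch below re-estimates a kNN adjacency from node features and then propagates features over the learned graph; this is only one simple family of structure learners and not any specific method from the survey.

```python
import numpy as np

def knn_graph(embeddings, k=2):
    """Toy graph structure learner: build a symmetric kNN adjacency from node embeddings."""
    n = embeddings.shape[0]
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -np.inf)          # exclude self-similarity
    adj = np.zeros((n, n))
    for v in range(n):
        for u in np.argsort(sims[v])[-k:]:   # k most similar nodes
            adj[v, u] = adj[u, v] = 1.0
    return adj

def propagate(features, adj):
    """One step of degree-normalized propagation over the learned adjacency."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    return (adj @ features) / deg

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))   # noisy node features
adj = knn_graph(x, k=2)           # (re)estimate the structure from features
print(propagate(x, adj))          # representations computed on the learned graph
```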

Deep learning methods are achieving ever-increasing performance on many artificial intelligence tasks. A major limitation of deep models is that they are not amenable to interpretability. This limitation can be circumvented by developing post hoc techniques to explain the predictions, giving rise to the area of explainability. Recently, explainability of deep models on images and texts has achieved significant progress. In the area of graph data, graph neural networks (GNNs) and their explainability are experiencing rapid developments. However, there is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations. In this survey, we provide a unified and taxonomic view of current GNN explainability methods. Our unified and taxonomic treatment of this subject sheds light on the commonalities and differences of existing methods and sets the stage for further methodological developments. To facilitate evaluations, we generate a set of benchmark graph datasets specifically for GNN explainability. We summarize current datasets and metrics for evaluating GNN explainability. Altogether, this work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
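
As a small example of one common family of explainers (perturbation-based methods), the sketch below scores each node by how much masking its features changes the output of a fixed toy GNN; the model and scoring rule are illustrative, not a benchmarked method from the survey.

```python
import numpy as np

def gnn_readout(features, adj):
    """A fixed toy GNN: one mean-aggregation layer followed by a sum readout."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    return float(((adj @ features) / deg).sum())

def node_importance(features, adj):
    """Perturbation-style explanation: score each node by how much masking it changes the output."""
    base = gnn_readout(features, adj)
    scores = []
    for v in range(features.shape[0]):
        masked = features.copy()
        masked[v] = 0.0                                  # mask this node's features
        scores.append(abs(base - gnn_readout(masked, adj)))
    return np.array(scores)

rng = np.random.default_rng(1)
feats = rng.standard_normal((4, 2))
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(node_importance(feats, adj))   # higher score = node matters more for this prediction
```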

We propose UniViLM: a Unified Video and Language pre-training Model for multimodal understanding and generation. Motivated by the recent success of BERT-based pre-training techniques for NLP and image-language tasks, VideoBERT and CBT were proposed to exploit the BERT model for video and language pre-training using narrated instructional videos. Unlike these works, which pre-train only the understanding task, we propose a unified video-language pre-training model for both understanding and generation tasks. Our model comprises four components: two single-modal encoders, a cross encoder, and a decoder, all built on the Transformer backbone. We first pre-train our model to learn the universal representation for both video and language on a large instructional video dataset. Then we fine-tune the model on two multimodal tasks, including an understanding task (text-based video retrieval) and a generation task (multimodal video captioning). Our extensive experiments show that our method can improve the performance of both understanding and generation tasks and achieves state-of-the-art results.
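
The schematic below mirrors the four-component layout described in the abstract (two single-modal encoders, a cross encoder, and a decoder on a Transformer backbone); all dimensions, layer counts, and the assumption of precomputed video features are guesses for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class UnifiedVideoLanguageModel(nn.Module):
    """Schematic of the four-component layout: text encoder, video encoder, cross encoder,
    and decoder, all Transformer-based. Sizes and conditioning are illustrative assumptions."""

    def __init__(self, vocab_size=30522, video_dim=1024, d_model=512, nhead=8):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)  # assumes precomputed video features
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.video_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.cross_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, video_feats, target_ids):
        t = self.text_encoder(self.text_embed(text_ids))        # single-modal text encoder
        v = self.video_encoder(self.video_proj(video_feats))    # single-modal video encoder
        cross = self.cross_encoder(torch.cat([t, v], dim=1))    # cross encoder fuses both streams
        dec = self.decoder(self.text_embed(target_ids), cross)  # decoder for generation tasks
        return self.lm_head(dec)

model = UnifiedVideoLanguageModel()
logits = model(torch.randint(0, 30522, (2, 8)), torch.randn(2, 16, 1024), torch.randint(0, 30522, (2, 8)))
print(logits.shape)  # (2, 8, 30522)
```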

Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer and thereby decrease performance. In order to still enjoy the benefits brought by the graph structure while preventing the dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node's relationship to its ancestors and descendants. A weighting scheme is further used to weigh their contribution depending on the distance to the node, to improve information propagation in the graph. Combined with fine-tuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches.
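
The sketch below illustrates the distance-weighted dense propagation idea: each node aggregates directly from all of its ancestors (or descendants) with a per-distance weight; the exact weighting and normalization here are illustrative choices, not the paper's scheme.

```python
import numpy as np

def dense_propagate(features, relatives, weights):
    """Toy dense propagation: a node aggregates from all its ancestors (or descendants),
    weighted by a per-distance coefficient, instead of only from immediate neighbours.

    relatives: dict node -> list of (relative_node, distance); weights: dict distance -> weight.
    """
    out = features.copy()
    for v, rels in relatives.items():
        if not rels:
            continue
        w = np.array([weights.get(d, 0.0) for (_, d) in rels])       # weight by distance
        nbrs = np.array([features[u] for (u, _) in rels])            # relatives' features
        out[v] = (w[:, None] * nbrs).sum(axis=0) / max(w.sum(), 1e-8)
    return out

feats = np.eye(4)                                        # 4 nodes with one-hot features
ancestors = {0: [], 1: [(0, 1)], 2: [(1, 1), (0, 2)], 3: [(2, 1), (1, 2), (0, 3)]}
dist_weights = {1: 1.0, 2: 0.5, 3: 0.25}                 # closer relatives contribute more
print(dense_propagate(feats, ancestors, dist_weights))
```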

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
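
A minimal sketch of the core idea: a network maps a 3D query point, conditioned on an encoding of the input, to an occupancy probability, so the surface is the 0.5 decision boundary and can be queried at arbitrary resolution; layer sizes and the conditioning mechanism are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    """Classifier over (3D point, input encoding) pairs whose decision boundary is the surface."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, latent):
        """points: (B, N, 3) query coordinates; latent: (B, latent_dim) input encoding."""
        z = latent.unsqueeze(1).expand(-1, points.shape[1], -1)       # broadcast code to each query
        logits = self.net(torch.cat([points, z], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)                                  # occupancy probability per point

net = OccupancyNetwork()
occ = net(torch.rand(2, 1024, 3), torch.randn(2, 128))  # can be queried at arbitrary resolution
print(occ.shape)  # (2, 1024)
```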
