
We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove \emph{last-iterate} convergence of the Riemannian gradient flow to the optimal in-class predictor at an \emph{exponential rate} that is independent of the conditioning of the Gram matrix, \emph{without} requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.
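
To make the dynamics concrete, the block below states the standard Gauss-Newton and Levenberg-Marquardt flows for a squared loss over network outputs; the notation (Jacobian J, damping λ, projection P) is generic textbook usage, and the precise scaling and initialization analyzed in the paper may differ.

```latex
% Squared loss L(\theta) = \tfrac12 \|f(\theta) - y\|^2 over network outputs f(\theta),
% with Jacobian J = \partial f / \partial \theta.
\begin{align}
  \dot\theta &= -\,(J^\top J)^{-1} J^\top \bigl(f(\theta) - y\bigr)
    && \text{(Gauss--Newton flow, $J^\top J$ invertible)} \\
  \dot f = J\dot\theta &= -\,P_{\operatorname{range}(J)}\bigl(f(\theta) - y\bigr)
    && \text{(induced projected/Riemannian flow in output space)} \\
  \dot\theta &= -\,(J^\top J + \lambda I)^{-1} J^\top \bigl(f(\theta) - y\bigr)
    && \text{(Levenberg--Marquardt dynamics, damping $\lambda > 0$)}
\end{align}
```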

Related content

Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes submissions of high-quality papers that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to systems engineering and technological applications that make substantial use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

In the post-deep-learning era, the Transformer architecture has demonstrated powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn increasing attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSMs. Specifically, we first give a detailed description of the underlying principles to help readers quickly grasp the key ideas of SSMs. After that, we review existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to further promote the development of the theory and applications of SSMs. More related works will be continuously updated on the following GitHub: //github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
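
As a reference point for readers new to SSMs, here is a minimal sketch of the discrete linear state space recurrence that these models build on. It is plain NumPy, is not the Mamba architecture or any specific model from the survey, and the matrices are random placeholders.

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, x):
    """Minimal discrete state space model: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t.

    A_bar: (N, N) state matrix, B_bar: (N, 1) input matrix, C: (1, N) output matrix,
    x: (T,) scalar input sequence. Returns y: (T,) scalar output sequence.
    """
    N = A_bar.shape[0]
    h = np.zeros((N, 1))
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar @ h + B_bar * x_t   # linear state update: O(T) in sequence length, no attention
        y[t] = (C @ h).item()         # readout from the hidden state
    return y

# Toy usage: a random, roughly stable SSM applied to a short input sequence.
rng = np.random.default_rng(0)
N = 4
A_bar = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))
B_bar = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
print(ssm_scan(A_bar, B_bar, C, rng.standard_normal(16)))
```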

Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, is significant for enhancing model performance and improving training efficiency during the pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short of providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted growing attention in the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at //github.com/ZigeW/data_management_LLM.
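
As a toy illustration of the domain/task composition aspect mentioned above, the sketch below draws a training stream from several corpora according to mixture weights; the corpus names, weights, and sampling scheme are purely illustrative and not taken from the survey.

```python
import random

def sample_mixture(domains, weights, n_examples, seed=0):
    """Draw a training stream from several domain corpora according to mixture weights.

    domains: dict mapping domain name -> list of examples (here, placeholder strings).
    weights: dict mapping domain name -> sampling weight (need not sum to 1).
    """
    rng = random.Random(seed)
    names = list(domains)
    probs = [weights[n] for n in names]
    stream = []
    for _ in range(n_examples):
        d = rng.choices(names, weights=probs, k=1)[0]  # pick a domain by mixture weight
        stream.append((d, rng.choice(domains[d])))     # then a uniform example from that domain
    return stream

# Illustrative corpus and mixture weights (not from the survey).
corpus = {"web": ["w1", "w2", "w3"], "code": ["c1", "c2"], "books": ["b1"]}
print(sample_mixture(corpus, {"web": 0.6, "code": 0.3, "books": 0.1}, 5))
```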

Graph neural networks (GNNs) are effective machine learning models for many graph-related applications. Despite their empirical success, many research efforts focus on the theoretical limitations of GNNs, i.e., the expressive power of GNNs. Early works in this domain mainly focus on studying the graph isomorphism recognition ability of GNNs, while recent works try to leverage properties such as subgraph counting and connectivity learning to characterize the expressive power of GNNs, which are more practical and closer to real-world settings. However, no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a first survey of models for enhancing expressive power under different forms of definition. Concretely, the models are reviewed based on three categories: graph feature enhancement, graph topology enhancement, and GNN architecture enhancement.
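
For readers unfamiliar with the graph isomorphism perspective on expressive power, the sketch below implements 1-dimensional Weisfeiler-Leman color refinement, the classical test that message-passing GNN expressiveness is commonly measured against; the WL connection is standard background rather than a claim specific to this survey.

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Leman (color refinement) on an adjacency list.

    adj: dict node -> list of neighbours. Returns the histogram of final colors; two graphs
    with different histograms are certainly non-isomorphic, equal histograms are inconclusive.
    """
    colors = {v: 0 for v in adj}  # start from a uniform coloring
    for _ in range(rounds):
        # Each node's new color is determined by its color and the multiset of neighbour colors.
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

# Toy usage: a triangle vs. a path on three nodes.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_colors(triangle), wl_colors(path))  # different histograms -> not isomorphic
```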

Deep learning has become the dominant approach in coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best expressed with a graph structure. As a result, there is a surge of interest in developing new deep learning techniques on graphs for a large number of NLP tasks. In this survey, we present a comprehensive overview on Graph Neural Networks (GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, which systematically organizes existing research of GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models. We further introduce a large number of NLP applications that are exploiting the power of GNNs and summarize the corresponding benchmark datasets, evaluation metrics, and open-source codes. Finally, we discuss various outstanding challenges for making full use of GNNs for NLP as well as future research directions. To the best of our knowledge, this is the first comprehensive overview of Graph Neural Networks for Natural Language Processing.
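
As a minimal illustration of the first two axes (graph construction and graph representation learning), the sketch below builds a co-occurrence window graph over a token sequence and runs one round of mean-aggregation message passing; the construction scheme and features are toy choices, not any particular method from the survey.

```python
import numpy as np

def window_graph(tokens, window=2):
    """Toy graph construction: connect tokens that co-occur within a fixed window."""
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            edges.add((i, j))
            edges.add((j, i))
    return edges

def mean_aggregate(features, edges):
    """One round of neighbourhood mean aggregation, the simplest form of GNN message passing."""
    out = features.copy()
    for v in range(features.shape[0]):
        nbrs = [u for (u, w) in edges if w == v]
        if nbrs:
            out[v] = features[nbrs].mean(axis=0)
    return out

tokens = "graph neural networks for nlp".split()
feats = np.eye(len(tokens))        # placeholder one-hot node features
edges = window_graph(tokens)       # graph construction step
print(mean_aggregate(feats, edges))  # graph representation learning step
```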

It has been shown that deep neural networks are prone to overfitting on biased training data. To address this issue, meta-learning employs a meta model for correcting the training bias. Despite the promising performance, extremely slow training is currently the bottleneck of meta-learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show our method is able to save two-thirds of the training time while maintaining comparable or even better generalization performance. In particular, our method achieves state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.
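
For context on why the meta gradient is expensive, the sketch below computes the standard one-step-lookahead meta gradient of a validation loss with respect to per-example weights for a tiny linear model; differentiating through the virtual update is the second-order step the abstract refers to. FaMUS's layer-wise approximation itself is not reproduced here, and the model, data, and learning rate are illustrative.

```python
import torch

# A tiny linear model kept as an explicit tensor so the one-step lookahead stays differentiable.
w = torch.zeros(5, requires_grad=True)

def loss_fn(w, x, y):
    return (x @ w - y) ** 2  # per-example squared error

x_train, y_train = torch.randn(8, 5), torch.randn(8)
x_val, y_val = torch.randn(4, 5), torch.randn(4)

eps = torch.zeros(8, requires_grad=True)                     # per-example weights from the meta model
weighted = (eps * loss_fn(w, x_train, y_train)).sum()        # weighted training loss
g = torch.autograd.grad(weighted, w, create_graph=True)[0]   # keep the graph for second-order terms
w_lookahead = w - 0.1 * g                                    # one virtual SGD step
val_loss = loss_fn(w_lookahead, x_val, y_val).mean()         # meta (validation) objective
meta_grad = torch.autograd.grad(val_loss, eps)[0]            # the expensive step FaMUS approximates
print(meta_grad.shape)                                       # one meta gradient entry per example
```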

Graph Neural Networks (GNNs) are widely used for analyzing graph-structured data. Most GNN methods are highly sensitive to the quality of graph structures and usually require a perfect graph structure for learning informative embeddings. However, the pervasiveness of noise in graphs necessitates learning robust representations for real-world problems. To improve the robustness of GNN models, many approaches have been proposed around the central concept of Graph Structure Learning (GSL), which aims to jointly learn an optimized graph structure and corresponding representations. To this end, in this survey we broadly review recent progress in GSL methods for learning robust representations. Specifically, we first formulate a general paradigm of GSL, and then review state-of-the-art methods classified by how they model graph structures, followed by applications that incorporate the idea of GSL in other graph tasks. Finally, we point out some issues in current studies and discuss future directions.
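
As a toy instance of the joint structure-and-representation idea, the sketch below re-estimates a kNN adjacency from node features and then propagates features over the learned graph; this is only one simple family of structure learners and not any specific method from the survey.

```python
import numpy as np

def knn_graph(embeddings, k=2):
    """Toy graph structure learner: build a symmetric kNN adjacency from node embeddings."""
    n = embeddings.shape[0]
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -np.inf)          # exclude self-similarity
    adj = np.zeros((n, n))
    for v in range(n):
        for u in np.argsort(sims[v])[-k:]:   # k most similar nodes
            adj[v, u] = adj[u, v] = 1.0
    return adj

def propagate(features, adj):
    """One step of degree-normalized propagation over the learned adjacency."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    return (adj @ features) / deg

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))   # noisy node features
adj = knn_graph(x, k=2)           # (re)estimate the structure from features
print(propagate(x, adj))          # representations computed on the learned graph
```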

Deep learning methods are achieving ever-increasing performance on many artificial intelligence tasks. A major limitation of deep models is that they are not amenable to interpretability. This limitation can be circumvented by developing post hoc techniques to explain the predictions, giving rise to the area of explainability. Recently, explainability of deep models on images and texts has achieved significant progress. In the area of graph data, graph neural networks (GNNs) and their explainability are experiencing rapid developments. However, there is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations. In this survey, we provide a unified and taxonomic view of current GNN explainability methods. Our unified and taxonomic treatment of this subject sheds light on the commonalities and differences of existing methods and sets the stage for further methodological developments. To facilitate evaluations, we generate a set of benchmark graph datasets specifically for GNN explainability. We summarize current datasets and metrics for evaluating GNN explainability. Altogether, this work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
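
As a small example of one common family of explainers (perturbation-based methods), the sketch below scores each node by how much masking its features changes the output of a fixed toy GNN; the model and scoring rule are illustrative, not a benchmarked method from the survey.

```python
import numpy as np

def gnn_readout(features, adj):
    """A fixed toy GNN: one mean-aggregation layer followed by a sum readout."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    return float(((adj @ features) / deg).sum())

def node_importance(features, adj):
    """Perturbation-style explanation: score each node by how much masking it changes the output."""
    base = gnn_readout(features, adj)
    scores = []
    for v in range(features.shape[0]):
        masked = features.copy()
        masked[v] = 0.0                                  # mask this node's features
        scores.append(abs(base - gnn_readout(masked, adj)))
    return np.array(scores)

rng = np.random.default_rng(1)
feats = rng.standard_normal((4, 2))
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(node_importance(feats, adj))   # higher score = node matters more for this prediction
```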

We propose UniViLM: a Unified Video and Language pre-training Model for multimodal understanding and generation. Motivated by the recent success of BERT-based pre-training techniques for NLP and image-language tasks, VideoBERT and CBT were proposed to exploit the BERT model for video and language pre-training using narrated instructional videos. Unlike these works, which pre-train only the understanding task, we propose a unified video-language pre-training model for both understanding and generation tasks. Our model comprises four components: two single-modal encoders, a cross encoder, and a decoder, all built on the Transformer backbone. We first pre-train our model to learn the universal representation for both video and language on a large instructional video dataset. Then we fine-tune the model on two multimodal tasks, including an understanding task (text-based video retrieval) and a generation task (multimodal video captioning). Our extensive experiments show that our method can improve the performance of both understanding and generation tasks and achieves state-of-the-art results.
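
The schematic below mirrors the four-component layout described in the abstract (two single-modal encoders, a cross encoder, and a decoder on a Transformer backbone); all dimensions, layer counts, and the assumption of precomputed video features are guesses for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class UnifiedVideoLanguageModel(nn.Module):
    """Schematic of the four-component layout: text encoder, video encoder, cross encoder,
    and decoder, all Transformer-based. Sizes and conditioning are illustrative assumptions."""

    def __init__(self, vocab_size=30522, video_dim=1024, d_model=512, nhead=8):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)  # assumes precomputed video features
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.video_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.cross_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, video_feats, target_ids):
        t = self.text_encoder(self.text_embed(text_ids))        # single-modal text encoder
        v = self.video_encoder(self.video_proj(video_feats))    # single-modal video encoder
        cross = self.cross_encoder(torch.cat([t, v], dim=1))    # cross encoder fuses both streams
        dec = self.decoder(self.text_embed(target_ids), cross)  # decoder for generation tasks
        return self.lm_head(dec)

model = UnifiedVideoLanguageModel()
logits = model(torch.randint(0, 30522, (2, 8)), torch.randn(2, 16, 1024), torch.randint(0, 30522, (2, 8)))
print(logits.shape)  # (2, 8, 30522)
```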

Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer and thereby decrease performance. In order to still enjoy the benefits brought by the graph structure while preventing the dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node's relationship to its ancestors and descendants. A weighting scheme is further used to weigh their contribution depending on the distance to the node, to improve information propagation in the graph. Combined with fine-tuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches.
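
The sketch below illustrates the distance-weighted dense propagation idea: each node aggregates directly from all of its ancestors (or descendants) with a per-distance weight; the exact weighting and normalization here are illustrative choices, not the paper's scheme.

```python
import numpy as np

def dense_propagate(features, relatives, weights):
    """Toy dense propagation: a node aggregates from all its ancestors (or descendants),
    weighted by a per-distance coefficient, instead of only from immediate neighbours.

    relatives: dict node -> list of (relative_node, distance); weights: dict distance -> weight.
    """
    out = features.copy()
    for v, rels in relatives.items():
        if not rels:
            continue
        w = np.array([weights.get(d, 0.0) for (_, d) in rels])       # weight by distance
        nbrs = np.array([features[u] for (u, _) in rels])            # relatives' features
        out[v] = (w[:, None] * nbrs).sum(axis=0) / max(w.sum(), 1e-8)
    return out

feats = np.eye(4)                                        # 4 nodes with one-hot features
ancestors = {0: [], 1: [(0, 1)], 2: [(1, 1), (0, 2)], 3: [(2, 1), (1, 2), (0, 3)]}
dist_weights = {1: 1.0, 2: 0.5, 3: 0.25}                 # closer relatives contribute more
print(dense_propagate(feats, ancestors, dist_weights))
```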

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
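
A minimal sketch of the core idea: a network maps a 3D query point, conditioned on an encoding of the input, to an occupancy probability, so the surface is the 0.5 decision boundary and can be queried at arbitrary resolution; layer sizes and the conditioning mechanism are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    """Classifier over (3D point, input encoding) pairs whose decision boundary is the surface."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, latent):
        """points: (B, N, 3) query coordinates; latent: (B, latent_dim) input encoding."""
        z = latent.unsqueeze(1).expand(-1, points.shape[1], -1)       # broadcast code to each query
        logits = self.net(torch.cat([points, z], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)                                  # occupancy probability per point

net = OccupancyNetwork()
occ = net(torch.rand(2, 1024, 3), torch.randn(2, 128))  # can be queried at arbitrary resolution
print(occ.shape)  # (2, 1024)
```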
