斗破苍穹第四季25集免费观看-在线欧美视频一区二区三区

We propose an alternative approach to neural network training using the monotone vector field, an idea inspired by the seminal work of Juditsky and Nemirovski [Juditsky & Nemirovsky, 2019] developed originally to solve parameter estimation problems for generalized linear models (GLM) by reducing the original non-convex problem to a convex problem of solving a monotone variational inequality (VI). Our approach leads to computationally efficient procedures that converge fast and offer guarantee in some special cases, such as training a single-layer neural network or fine-tuning the last layer of the pre-trained model. Our approach can be used for more efficient fine-tuning of a pre-trained model while freezing the bottom layers, an essential step for deploying many machine learning models such as large language models (LLM). We demonstrate its applicability in training fully-connected (FC) neural networks, graph neural networks (GNN), and convolutional neural networks (CNN) and show the competitive or better performance of our approach compared to stochastic gradient descent methods on both synthetic and real network data prediction tasks regarding various performance metrics.

相關內容

Neural Networks

關注 1648

神經網絡（Neural Networks）是世界上三個最古老的神經建模學會的檔案期刊:國際神經網絡學會(INNS)、歐洲神經網絡學會(ENNS)和日本神經網絡學會(JNNS)。神經網絡提供了一個論壇，以發展和培育一個國際社會的學者和實踐者感興趣的所有方面的神經網絡和相關方法的計算智能。神經網絡歡迎高質量論文的提交，有助于全面的神經網絡研究，從行為和大腦建模，學習算法，通過數學和計算分析，系統的工程和技術應用，大量使用神經網絡的概念和技術。這一獨特而廣泛的范圍促進了生物和技術研究之間的思想交流，并有助于促進對生物啟發的計算智能感興趣的跨學科社區的發展。因此，神經網絡編委會代表的專家領域包括心理學，神經生物學，計算機科學，工程，數學，物理。該雜志發表文章、信件和評論以及給編輯的信件、社論、時事、軟件調查和專利信息。文章發表在五個部分之一:認知科學，神經科學，學習系統，數學和計算分析、工程和應用。官網地址：

Neural Networks · Networking · Learning · 反向傳播 · Machine Learning ·

2024 年 4 月 23 日

Training all-mechanical neural networks for task learning through in situ backpropagation

Shuaifeng Li,Xiaoming Mao

from arxiv, 25 pages, 5 figures

Recent advances unveiled physical neural networks as promising machine learning platforms, offering faster and more energy-efficient information processing. Compared with extensively-studied optical neural networks, the development of mechanical neural networks (MNNs) remains nascent and faces significant challenges, including heavy computational demands and learning with approximate gradients. Here, we introduce the mechanical analogue of in situ backpropagation to enable highly efficient training of MNNs. We demonstrate that the exact gradient can be obtained locally in MNNs, enabling learning through their immediate vicinity. With the gradient information, we showcase the successful training of MNNs for behavior learning and machine learning tasks, achieving high accuracy in regression and classification. Furthermore, we present the retrainability of MNNs involving task-switching and damage, demonstrating the resilience. Our findings, which integrate the theory for training MNNs and experimental and numerical validations, pave the way for mechanical machine learning hardware and autonomous self-learning material systems.

Neural Networks · Networking · Networks · Learning · INFORMS ·

2024 年 4 月 23 日

Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks

Julia Gygax,Friedemann Zenke

from arxiv, 25 pages, 7 figures + 3 supplementary figures

Training spiking neural networks to approximate complex functions is essential for studying information processing in the brain and neuromorphic computing. Yet, the binary nature of spikes constitutes a challenge for direct gradient-based training. To sidestep this problem, surrogate gradients have proven empirically successful, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to lack of support for automatic differentiation, are impractical for training deep spiking neural networks, yet provide gradients equivalent to surrogate gradients in single neurons. On the other hand, we examine stochastic automatic differentiation, which is compatible with discrete randomness but has never been applied to spiking neural network training. We find that the latter provides the missing theoretical basis for surrogate gradients in stochastic spiking neural networks. We further show that surrogate gradients in deterministic networks correspond to a particular asymptotic case and numerically confirm the effectiveness of surrogate gradients in stochastic multi-layer spiking neural networks. Finally, we illustrate that surrogate gradients are not conservative fields and, thus, not gradients of a surrogate loss. Our work provides the missing theoretical foundation for surrogate gradients and an analytically well-founded solution for end-to-end training of stochastic spiking neural networks.

機器人 · GROUP · 設計 · 可辨認的 · 可理解性 ·

2024 年 4 月 22 日

A participatory design approach to using social robots for elderly care

Barbara Sienkiewicz,Zuzanna Radosz-Knawa,Bipin Indurkhya

from arxiv, ARSO 2024

We present our ongoing research on applying a participatory design approach to using social robots for elderly care. Our approach involves four different groups of stakeholders: the elderly, (non-professional) caregivers, medical professionals, and psychologists. We focus on card sorting and storyboarding techniques to elicit the concerns of the stakeholders towards deploying social robots for elderly care. This is followed by semi-structured interviews to assess their attitudes towards social robots individually. Then we are conducting two-stage workshops with different elderly groups to understand how to engage them with the technology and to identify the challenges in this task.

估計/估計量 · Performance · tuning · 泛函 · 優化器 ·

2024 年 4 月 21 日

A nonstandard application of cross-validation to estimate density functionals

José E. Chacón,Carlos Tenreiro

from arxiv, 20 pages main text + 14 pages, 2 figures supplementary material

Cross-validation is usually employed to evaluate the performance of a given statistical methodology. When such a methodology depends on a number of tuning parameters, cross-validation proves to be helpful to select the parameters that optimize the estimated performance. In this paper, however, a very different and nonstandard use of cross-validation is investigated. Instead of focusing on the cross-validated parameters, the main interest is switched to the estimated value of the error criterion at optimal performance. It is shown that this approach is able to provide consistent and efficient estimates of some density functionals, with the noteworthy feature that these estimates do not rely on the choice of any further tuning parameter, so that, in that sense, they can be considered to be purely empirical. Here, a base case of application of this new paradigm is developed in full detail, while many other possible extensions are hinted as well.

語言模型化 · MoDELS · 大語言模型 · Integration · Nuance ·

2024 年 4 月 20 日

Augmenting emotion features in irony detection with Large language modeling

Yucheng Lin,Yuhan Xia,Yunfei Long

from arxiv, 11 pages, 3 tables, 2 figures. Accepted by the 25th Chinese Lexical Semantics Workshop

This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, our methodology augments the detection process by integrating subtle emotional cues, augmented through LLMs, into three benchmark pre-trained NLP models - BERT, T5, and GPT-2 - which are widely recognized as foundational in irony detection. We assessed our method using the SemEval-2018 Task 3 dataset and observed substantial enhancements in irony detection capabilities.

圖像修復 · 標注 · MoDELS · 監督 · Learning ·

2024 年 4 月 19 日

COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images

Dmytro Shvetsov,Joonas Ariva,Marharyta Domnich,Raul Vicente,Dmytro Fishman

from arxiv, This work has been accepted to be presented to The 2nd World Conference on eXplainable Artificial Intelligence (xAI 2024), July 17-19, 2024 - Valletta, Malta

Deep learning is dramatically transforming the field of medical imaging and radiology, enabling the identification of pathologies in medical images, including computed tomography (CT) and X-ray scans. However, the performance of deep learning models, particularly in segmentation tasks, is often limited by the need for extensive annotated datasets. To address this challenge, the capabilities of weakly supervised semantic segmentation are explored through the lens of Explainable AI and the generation of counterfactual explanations. The scope of this research is development of a novel counterfactual inpainting approach (COIN) that flips the predicted classification label from abnormal to normal by using a generative model. For instance, if the classifier deems an input medical image X as abnormal, indicating the presence of a pathology, the generative model aims to inpaint the abnormal region, thus reversing the classifier's original prediction label. The approach enables us to produce precise segmentations for pathologies without depending on pre-existing segmentation masks. Crucially, image-level labels are utilized, which are substantially easier to acquire than creating detailed segmentation masks. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors from CT images acquired from Tartu University Hospital in Estonia. The findings indicate that COIN greatly surpasses established attribution methods, such as RISE, ScoreCAM, and LayerCAM, as well as an alternative counterfactual explanation method introduced by Singla et al. This evidence suggests that COIN is a promising approach for semantic segmentation of tumors in CT images, and presents a step forward in making deep learning applications more accessible and effective in healthcare, where annotated data is scarce.

Attention · 可辨認的 · 模態 · state-of-the-art · Networking ·

2024 年 4 月 18 日

A multi-modal approach for identifying schizophrenia using cross-modal attention

Gowtham Premananth,Yashish M. Siriwardena,Philip Resnik,Carol Espy-Wilson

from arxiv, Accepted to Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2024

This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.

Networking · 優化器 · Neural Networks · CNN · Performer ·

2024 年 4 月 18 日

PyTOaCNN: Topology optimization using an adaptive convolutional neural network in Python

Khaish Singh Chadha,Prabhat Kumar

from arxiv, 24 pages

This paper introduces an adaptive convolutional neural network (CNN) architecture capable of automating various topology optimization (TO) problems with diverse underlying physics. The proposed architecture has an encoder-decoder-type structure with dense layers added at the bottleneck region to capture complex geometrical features. The network is trained using datasets obtained by the problem-specific open-source TO codes. Tensorflow and Keras are the main libraries employed to develop and to train the model. Effectiveness and robustness of the proposed adaptive CNN model are demonstrated through its performance in compliance minimization problems involving constant and design-dependent loads and in addressing bulk modulus optimization. Once trained, the model takes user's input of the volume fraction as an image and instantly generates an output image of optimized design. The proposed CNN produces high-quality results resembling those obtained via open-source TO codes with negligible performance and volume fraction errors. The paper includes complete associated Python code (Appendix A) for the proposed CNN architecture and explains each part of the code to facilitate reproducibility and ease of learning.

Learning · 通道 · 估計/估計量 · 分離的 · 前饋 ·

2024 年 4 月 17 日

Tight bounds on Pauli channel learning without entanglement

Senrui Chen,Changhun Oh,Sisi Zhou,Hsin-Yuan Huang,Liang Jiang

from arxiv, 20 pages, 2 figure; v2: add Fig.2 comparing our lower bound with noisy-entanglement-assisted upper bound, close to accepted version

Quantum entanglement is a crucial resource for learning properties from nature, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize states, measurements, and operations that are separable between the main system of interest and an ancillary system. Interestingly, we show that these algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. Within this setting, we prove a tight lower bound for Pauli channel learning without entanglement that closes the gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ copies of the Pauli channel. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for Pauli noise characterization.

Networking · Neural Networks · Learning · MoDELS · 經驗風險 ·

2024 年 4 月 17 日

Learning time-scales in two-layers neural networks

Rapha?l Berthier,Andrea Montanari,Kangjie Zhou

from arxiv, 64 pages, 15 figures

Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically `simpler' or `easier to learn' although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high-dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.