State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data. Unlike traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), SSMs offer a structured and stable approach to sequence modelling, leveraging principles from control theory and dynamical systems. However, a key challenge in sequence modelling is compressing long-term dependencies into a compact hidden state representation without losing critical information. In this paper, we develop a rigorous mathematical framework for understanding memory compression in selective state space models. We introduce a selective gating mechanism that dynamically filters and updates the hidden state based on input relevance, allowing for efficient memory compression. We formalize the trade-off between memory efficiency and information retention using information-theoretic tools such as mutual information and rate-distortion theory. Our analysis provides theoretical bounds on the amount of information that can be compressed without sacrificing model performance. We also derive theorems that prove the stability and convergence of the hidden state in selective SSMs, ensuring reliable long-term memory retention. Computational complexity analysis reveals that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models. Through empirical validation on sequence modelling tasks such as time-series forecasting and natural language processing, we demonstrate that selective SSMs achieve state-of-the-art performance while using less memory and fewer computational resources.
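To make the selective-gating idea concrete, the following minimal sketch shows an input-dependent gate deciding how much of the previous hidden state is retained versus overwritten at each step. It is an illustrative toy, not the authors' architecture; all dimensions and projection names (`A`, `B_proj`, `C_proj`, `gate_proj`) are assumptions.

```python
import numpy as np

def selective_ssm_step(h, x, A, B_proj, C_proj, gate_proj):
    """One selective state update: a gate computed from the input decides
    how much of the old hidden state survives and how strongly the new
    input is written in (illustrative sketch only)."""
    g = 1.0 / (1.0 + np.exp(-gate_proj @ x))        # input-dependent gate in (0, 1)
    h_new = g * (A @ h) + (1.0 - g) * (B_proj @ x)  # selective retain/update
    y = C_proj @ h_new                              # read-out
    return h_new, y

# Toy run: hidden state of size 8, inputs of size 4.
rng = np.random.default_rng(0)
d_h, d_x = 8, 4
A = 0.9 * np.eye(d_h)                   # spectral radius < 1 keeps the state stable
B_proj = 0.1 * rng.normal(size=(d_h, d_x))
C_proj = 0.1 * rng.normal(size=(2, d_h))
gate_proj = rng.normal(size=(d_h, d_x))

h = np.zeros(d_h)
for t in range(16):
    h, y = selective_ssm_step(h, rng.normal(size=d_x), A, B_proj, C_proj, gate_proj)
```

With the gate bounded in (0, 1) and a contractive transition matrix, the hidden state stays bounded, which is the intuition behind the stability results mentioned above.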
Rate-splitting multiple access (RSMA) is regarded as a crucial and powerful physical-layer (PHY) paradigm for next-generation communication systems. In RSMA, users employ successive interference cancellation (SIC) to decode part of the interference while treating the remainder as noise. However, conventional RSMA systems rely on fixed-position antenna arrays, limiting their ability to fully exploit spatial diversity. This constraint reduces beamforming gain and significantly impairs RSMA performance. To address this problem, we propose a movable antenna (MA)-aided RSMA scheme that allows the antennas at the base station (BS) to dynamically adjust their positions. Our objective is to maximize the system sum rate of common and private messages by jointly optimizing the MA positions, the beamforming matrix, and the common rate allocation. To tackle the formulated non-convex problem, we apply fractional programming (FP) and develop an efficient two-stage, coarse-to-fine-grained searching (CFGS) algorithm to obtain high-quality solutions. Numerical results demonstrate that, with optimized antenna adjustments, the MA-enabled system achieves substantial gains in RSMA performance and reliability over fixed-position antenna setups.
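The abstract does not spell out the CFGS steps, but the generic two-stage coarse-to-fine pattern it names can be sketched as below for a one-dimensional antenna-position search; `sum_rate` is a stand-in for the FP-based sum-rate evaluation, not the paper's objective.

```python
import numpy as np

def sum_rate(p):
    """Placeholder objective standing in for the FP-based sum-rate
    evaluation at a candidate antenna position."""
    return np.sinc(p - 1.3) + 0.3 * np.cos(4.0 * p)

def coarse_to_fine_search(lo, hi, n_coarse=32, n_fine=32, shrink=0.1):
    # Stage 1: coarse grid over the whole feasible position interval.
    coarse = np.linspace(lo, hi, n_coarse)
    best = coarse[np.argmax([sum_rate(p) for p in coarse])]
    # Stage 2: fine grid confined to a small window around the coarse winner.
    half = shrink * (hi - lo) / 2.0
    fine = np.linspace(best - half, best + half, n_fine)
    return fine[np.argmax([sum_rate(p) for p in fine])]

p_star = coarse_to_fine_search(0.0, 4.0)
```

The coarse stage avoids evaluating the expensive objective densely everywhere; the fine stage recovers precision only where it matters.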
Mueller matrix polarimetry captures essential information about polarized light interactions with a sample, presenting unique challenges for data augmentation in deep learning due to its distinct structure. While augmentations are an effective and affordable way to enhance dataset diversity and reduce overfitting, standard transformations like rotations and flips do not preserve the polarization properties in Mueller matrix images. To address this, we introduce a versatile simulation framework that applies physically consistent rotations and flips to Mueller matrices, tailored to maintain polarization fidelity. Our experimental results across multiple datasets reveal that conventional augmentations can lead to misleading results when applied to polarimetric data, underscoring the necessity of our physics-based approach. In our experiments, we first compare our polarization-specific augmentations against real-world captures to validate their physical consistency. We then apply these augmentations in a semantic segmentation task, achieving substantial improvements in model generalization and performance. This study demonstrates the necessity of physics-informed data augmentation for polarimetric imaging in deep learning (DL), paving the way for broader adoption and more robust applications across diverse research in the field. In particular, our framework unlocks the potential of DL models for polarimetric datasets with limited sample sizes. Our code implementation is available at github.com/hahnec/polar_augment.
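For intuition on why naive augmentation fails here, recall that rotating a sample by an angle theta transforms its Mueller matrix by conjugation with the Stokes rotation matrix, so a spatial rotation of the image must be paired with a per-pixel matrix transformation. A minimal sketch of this standard conjugation follows (sign conventions vary; see the repository above for the authors' implementation).

```python
import numpy as np

def stokes_rotation(theta):
    """Rotation matrix acting on Stokes vectors for a frame rotation
    by theta radians (note the characteristic 2*theta dependence)."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1.0]])

def rotate_mueller(M, theta):
    """Physically consistent rotation of a single Mueller matrix:
    M' = R(theta) @ M @ R(-theta). For an image, this conjugation is
    applied per pixel alongside the spatial rotation of coordinates."""
    return stokes_rotation(theta) @ M @ stokes_rotation(-theta)

M_rot = rotate_mueller(np.eye(4), np.deg2rad(30))
```

A plain image rotation applies only the spatial part and silently breaks this identity, which is exactly the failure mode the experiments expose.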
Relation extraction, an important natural language processing (NLP) task, aims to identify relations between named entities in text. Recently, graph convolutional networks (GCNs) over dependency trees have been widely used to capture syntactic features and have achieved attractive performance. However, most existing dependency-based approaches ignore the positive influence of words outside the dependency trees, which sometimes convey rich and useful information for relation extraction. In this paper, we propose a novel model, the Entity-aware Self-attention Contextualized GCN (ESC-GCN), which efficiently incorporates the syntactic structure of input sentences and the semantic context of sequences. Specifically, relative-position self-attention captures the overall semantic pairwise correlations tied to word position, while contextualized graph convolutional networks capture rich intra-sentence dependencies between words through adequate pruning operations. Furthermore, an entity-aware attention layer dynamically selects which tokens are most decisive for the final relation prediction. In this way, our proposed model not only reduces the noise introduced by dependency trees but also retains easily overlooked entity-related semantic representations. Extensive experiments on various tasks demonstrate that our model achieves encouraging performance compared with existing dependency-based and sequence-based models. In particular, our model excels at extracting relations between entities in long sentences.
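As a point of reference for the GCN component named above, a single graph-convolution step over a dependency adjacency matrix looks like the sketch below; this is the standard building block, not the full ESC-GCN with self-attention and entity-aware layers.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: each word aggregates the representations
    of its syntactic neighbours (degree-normalized), then projects + ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    H_agg = (A_hat / A_hat.sum(1, keepdims=True)) @ H
    return np.maximum(H_agg @ W, 0.0)

# Toy sentence of 5 tokens with a small dependency tree.
A = np.zeros((5, 5))
for head, dep in [(1, 0), (1, 3), (3, 2), (3, 4)]:
    A[head, dep] = A[dep, head] = 1.0         # undirected dependency edges
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))                  # token representations
W = 0.1 * rng.normal(size=(16, 16))
H_out = gcn_layer(A, H, W)
```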
A method is introduced for approximate marginal likelihood inference via adaptive Gaussian quadrature in mixed models with a single grouping factor. The core technical contribution is an algorithm for computing the exact gradient of the approximate log-marginal likelihood. This leads to efficient maximum likelihood via quasi-Newton optimization that is demonstrated to be faster than existing approaches based on finite-differenced gradients or derivative-free optimization. The method is specialized to Bernoulli mixed models with multivariate, correlated Gaussian random effects; here computations are performed using an inverse log-Cholesky parameterization of the Gaussian density that involves no matrix decomposition during model fitting, while Wald confidence intervals are provided for variance parameters on the original scale. Simulations give evidence of these intervals attaining nominal coverage if enough quadrature points are used, for data comprising a large number of very small groups exhibiting large between-group heterogeneity. The Laplace approximation is well known to give especially poor coverage and high bias for such data. Adaptive quadrature mitigates this, and the methods in this paper improve the computational feasibility of this more accurate approach. All results may be reproduced using code available at \url{https://github.com/awstringer1/aghmm-paper-code}.
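To fix ideas, the sketch below shows the adaptive Gauss-Hermite approximation for a single group of a Bernoulli mixed model with a scalar random effect and no covariates: the quadrature nodes are recentred at the mode of the integrand and rescaled by its curvature. The exact-gradient computation that is the paper's core contribution is omitted; all names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def log_joint(u, y, beta, sigma):
    """log of (Bernoulli likelihood x Gaussian random-effect density)
    for one group, with linear predictor eta = beta + u."""
    eta = beta + u
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik - 0.5 * u**2 / sigma**2 - 0.5 * np.log(2 * np.pi * sigma**2)

def aghq_marginal(y, beta, sigma, k=15):
    """Adaptive Gauss-Hermite approximation of one group's marginal likelihood."""
    u = 0.0
    for _ in range(25):                      # Newton iterations for the mode
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = np.sum(y - p) - u / sigma**2
        hess = -len(y) * p * (1 - p) - 1.0 / sigma**2
        u -= grad / hess
    scale = 1.0 / np.sqrt(-hess)             # curvature-based rescaling
    x, w = hermgauss(k)                      # nodes/weights for weight exp(-x^2)
    vals = np.array([log_joint(u + np.sqrt(2) * scale * xi, y, beta, sigma)
                     for xi in x])
    return np.sqrt(2) * scale * np.sum(w * np.exp(vals + x**2))

print(aghq_marginal(np.array([1, 0, 1]), beta=0.2, sigma=1.5))
```

With k = 1 this recovers the Laplace approximation, which is why increasing the number of quadrature points repairs the coverage problems noted above.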
Autonomous reconfigurable intelligent surface (RIS) offers the potential to simplify deployment by reducing the need for real-time remote control between a base station (BS) and an RIS. However, we highlight two major challenges posed by autonomy. The first is implementation complexity, as autonomy requires hybrid RISs (HRISs) equipped with additional on-board hardware to monitor the propagation environment and conduct local channel estimation (CHEST), a process known as probing. The second challenge, termed probe distortion, reflects a form of the observer effect: during probing, an HRIS can inadvertently alter the propagation environment, potentially disrupting the operations of other communicating devices. While implementation complexity has been extensively studied, probe distortion remains largely unexplored. To further assess the potential of autonomous RISs, this paper comprehensively and pragmatically studies fundamental trade-offs posed by these challenges. We examine the robustness of an HRIS-assisted massive multiple-input multiple-output (mMIMO) system under minimal design choices that reflect the essential elements and stringent conditions, including (a) two extremes of implementation complexity realized through minimalist operational designs of two HRIS hardware architectures, and (b) an oblivious BS that fully embraces probe distortion. To make our analysis possible, we propose a physical-layer orchestration framework that aligns HRIS and mMIMO operations. We provide empirical evidence showing that autonomous RIS holds promise even under these strict conditions and propose new research directions, particularly for advancing the understanding of probe distortion.
In most multiple-input multiple-output (MIMO) communication systems, antennas are spaced at least half a wavelength apart to reduce mutual coupling. In this configuration, the maximum array gain equals the number of antennas. However, when the antenna spacing is significantly reduced, the array gain of a compact array can become proportional to the square of the number of antennas, greatly exceeding that of traditional MIMO systems. Achieving this "superdirectivity" requires complex calculation of the excitation coefficients (beamforming vector), which is a challenging task. In this paper, we address this problem with a novel double-coupling-based superdirective beamforming method. In particular, we categorize antenna coupling effects into impedance coupling and field coupling. By modelling these two coupling effects, we derive the beamforming vector for superdirective arrays. We prove that the field coupling matrix of an antenna array is unique and by itself fully characterizes the distorted coupling field. Based on this theorem, we propose a method that accurately calculates the coupling matrix using a number of angle sampling points only on the order of the number of antennas. Moreover, we develop a prototype of an independently controlled superdirective antenna array. Full-wave electromagnetic simulations and real-world experiments validate the effectiveness of the proposed approaches, and superdirectivity is achieved in practice by compact arrays of 4 and 8 dipole antennas.
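For context, the textbook form of the superdirectivity computation (without the paper's double-coupling refinement) maximizes the directivity ratio |w^H a|^2 / (w^H B w), whose solution is w = B^{-1} a. The sketch below uses the classical isotropic-element model, where B has sinc entries; it illustrates why directivity approaches N^2 as spacing shrinks, and is not the authors' coupling-aware method.

```python
import numpy as np

def superdirective_weights(n, spacing_wl, theta0=0.0):
    """Directivity-maximizing weights for an isotropic linear array:
    w = B^{-1} a (generalized Rayleigh quotient), D_max = a^H B^{-1} a."""
    k_d = 2 * np.pi * spacing_wl                   # k*d with spacing in wavelengths
    m = np.arange(n)
    # B_mn = sin(k d (m-n)) / (k d (m-n)); np.sinc uses the sin(pi x)/(pi x) convention
    B = np.sinc(k_d * (m[:, None] - m[None, :]) / np.pi)
    a = np.exp(1j * k_d * m * np.cos(theta0))      # endfire steering vector
    w = np.linalg.solve(B, a)
    return w, np.real(np.vdot(a, w))               # weights, directivity

# At 0.05-wavelength spacing, a 4-element array approaches D = 16 rather than 4.
w, D = superdirective_weights(n=4, spacing_wl=0.05)
print(D)
```

The catch, which motivates the paper, is that this idealized B ignores the impedance and field coupling of real closely spaced antennas, so the naive weights do not survive contact with hardware.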
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges on FL, as it can induce drifted global models that are slow to converge. Knowledge distillation has recently emerged to tackle this issue by refining the server model using aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by prior art, we propose a data-free knowledge distillation approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner; this generator is then broadcast to users, regulating local training by using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state of the art.
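A compressed sketch of the data-free server-side step is given below, under the assumption (ours, for illustration) that each user contributes a classifier head over a shared feature space: the generator is trained so that the user ensemble classifies its label-conditioned outputs correctly, with no proxy data involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: each "user head" maps a latent feature to class logits.
feat_dim, n_classes, noise_dim = 32, 10, 16
user_heads = [nn.Linear(feat_dim, n_classes) for _ in range(5)]

generator = nn.Sequential(
    nn.Linear(noise_dim + n_classes, 64), nn.ReLU(),
    nn.Linear(64, feat_dim),
)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    y = torch.randint(0, n_classes, (64,))
    z = torch.randn(64, noise_dim)
    # The generator conditions on noise plus a label and emits a pseudo-feature.
    feat = generator(torch.cat([z, F.one_hot(y, n_classes).float()], dim=1))
    # Data-free objective: the ensemble of user models should assign the
    # generated feature to its conditioning label.
    ens_logits = torch.stack([h(feat) for h in user_heads]).mean(dim=0)
    loss = F.cross_entropy(ens_logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The trained generator is what gets broadcast: users sample from it during local training, so the ensemble knowledge acts as the inductive bias described above.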
Graph Neural Networks (GNNs) have recently been used for node and graph classification tasks with great success, but GNNs model dependencies among the attributes of neighboring nodes rather than dependencies among observed node labels. In this work, we consider the task of inductive node classification using GNNs in supervised and semi-supervised settings, with the goal of incorporating label dependencies. Because current GNNs are not universal (i.e., most-expressive) graph representations, we propose a general collective learning approach to increase the representation power of any existing GNN. Our framework combines ideas from collective classification with self-supervised learning, and uses a Monte Carlo approach to sampling embeddings for inductive learning across graphs. We evaluate performance on five real-world network datasets and demonstrate consistent, significant improvements in node classification accuracy for a variety of state-of-the-art GNNs.
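One generic way to realize collective inference of this kind, sketched under our own simplifying assumptions rather than taken from the paper, is to feed Monte Carlo samples of predicted labels back into the node features and re-run the base model until the predictions stabilize:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 20, 8, 3                                   # nodes, features, classes
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.maximum(A, A.T)                               # symmetric toy graph
X = rng.normal(size=(n, d))
W_out = 0.5 * rng.normal(size=(d + c, c))            # read-out over features + labels

def base_gnn(X_aug):
    """Placeholder for any trained GNN: one normalized propagation step,
    then a linear read-out to class probabilities."""
    A_hat = A + np.eye(n)
    logits = (A_hat / A_hat.sum(1, keepdims=True)) @ X_aug @ W_out
    e = np.exp(logits - logits.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

# Collective inference loop: sample labels, append as features, re-predict.
probs = np.full((n, c), 1.0 / c)
for _ in range(10):
    sampled = np.array([rng.multinomial(1, p) for p in probs], dtype=float)
    probs = base_gnn(np.hstack([X, sampled]))
```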
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
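The smoothed operator at the heart of this framework is easy to state: with a negentropy regularizer, the smoothed min becomes -gamma * logsumexp(-x / gamma), and plugging it into the DTW recursion yields a differentiable alignment cost. A minimal sketch (values only; the paper also derives the backward pass):

```python
import numpy as np

def softmin(args, gamma):
    """Smoothed min: -gamma * logsumexp(-x / gamma); gamma -> 0
    recovers the hard minimum, gamma > 0 makes it differentiable."""
    a = np.asarray(args) / -gamma
    m = a.max()
    return -gamma * (m + np.log(np.exp(a - m).sum()))

def soft_dtw(D, gamma=1.0):
    """Smoothed DTW over a pairwise-cost matrix D: the classical DP
    recursion with min replaced by its smoothed counterpart."""
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i, j] = D[i - 1, j - 1] + softmin(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

x = np.array([0.0, 1.0, 2.0]); y = np.array([0.0, 2.0])
D = (x[:, None] - y[None, :]) ** 2           # squared-distance costs
print(soft_dtw(D, gamma=0.1))
```

The gradient of this value with respect to D is a soft alignment matrix, which is what enables its use as a differentiable layer.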