Training neural networks that require adversarial optimization, such as generative adversarial networks (GANs) and unsupervised domain adaptations (UDAs), suffers from instability. This instability problem comes from the difficulty of the minimax optimization, and there have been various approaches in GANs and UDAs to overcome this problem. In this study, we tackle this problem theoretically through a functional analysis. Specifically, we show the convergence property of the minimax problem by the gradient descent over the infinite-dimensional spaces of continuous functions and probability measures under certain conditions. Using this setting, we can discuss GANs and UDAs comprehensively, which have been studied independently. In addition, we show that the conditions necessary for the convergence property are interpreted as stabilization techniques of adversarial training such as the spectral normalization and the gradient penalty.
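
Spectral normalization, one of the stabilization techniques named above, rescales a weight matrix so that its largest singular value is 1. The sketch below is only an illustration in plain Python, not the analysis from the paper. The helper names (`spectral_norm`, `spectrally_normalize`) and the example matrix are assumptions introduced here, and power iteration is used as one common way to estimate the singular value.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, n_iter=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    nv = norm(v)
    v = [x / nv for x in v]
    for _ in range(n_iter):
        u = matvec(W, v)      # u = W v
        v = matvec(Wt, u)     # v = W^T W v
        nv = norm(v)
        v = [x / nv for x in v]
    return norm(matvec(W, v))


def spectrally_normalize(W, **kw):
    """Divide W by its estimated spectral norm (hypothetical helper)."""
    sigma = spectral_norm(W, **kw)
    return [[w / sigma for w in row] for row in W]


# Hypothetical 2x2 weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectrally_normalize(W)
print(spectral_norm(W), spectral_norm(W_sn))
```

After normalization the layer `x -> W_sn x` is 1-Lipschitz in the Euclidean norm, which is the kind of constraint on the discriminator that the abstract relates to its convergence conditions.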

Related content

Minimax

Networking · Precision/Accuracy · Model evaluation · Inference · Reducible

January 25, 2024

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators

Yaniv Blumenfeld,Itay Hubara,Daniel Soudry

The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible to high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high-accuracy core operations. Most significant is the operation of accumulating products. This high-precision accumulation operation is gradually becoming the main computational bottleneck. This is because, so far, the use of low-precision accumulators has led to a significant degradation in performance. In this work, we present a simple method to train and fine-tune high-end DNNs, to allow, for the first time, utilization of cheaper, 12-bit accumulators, with no significant degradation in accuracy. Lastly, we show that as we decrease the accumulation precision further, using fine-grained gradient approximations can improve the DNN accuracy.
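
To make the accumulator bottleneck concrete, the toy sketch below accumulates products in a narrow register. It assumes a signed integer accumulator that saturates on overflow. The abstract does not specify the accumulator format, so this is purely illustrative, and `saturating_dot` is a hypothetical helper, not the authors' method.

```python
def saturating_dot(xs, ws, acc_bits):
    """Dot product whose running sum is clamped to a signed acc_bits-wide range."""
    lo = -(1 << (acc_bits - 1))
    hi = (1 << (acc_bits - 1)) - 1
    acc = 0
    for x, w in zip(xs, ws):
        # Clamp after every addition, as a saturating hardware accumulator would.
        acc = max(lo, min(hi, acc + x * w))
    return acc


# Hypothetical integer activations and weights.
xs = [100, 100, 100, -100]
ws = [10, 10, 10, 10]

exact = sum(x * w for x, w in zip(xs, ws))
narrow = saturating_dot(xs, ws, 12)   # 12-bit range is [-2048, 2047]
wide = saturating_dot(xs, ws, 32)

print(exact, narrow, wide)  # 2000 1047 2000
```

The final result (2000) fits in 12 bits, but the intermediate sum (3000) does not, so the narrow accumulator returns a wrong value. This is the kind of error that training for low-bit accumulation has to tolerate or avoid.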

Moment · Approximation · Exact · Loop · Dynamical systems

January 25, 2024

Exact and Approximate Moment Derivation for Probabilistic Loops With Non-Polynomial Assignments

Andrey Kofnov,Marcel Moosbrugger,Miroslav Stanković,Ezio Bartocci,Efstathia Bura

from arxiv, Published in ACM Transactions on Modeling and Computer Simulation (TOMACS). Extended version of the conference paper 'Moment-based Invariants for Probabilistic Loops with Non-polynomial Assignments' published at QEST 2022 (Best paper award, see also the preprint arxiv.org/abs/2205.02577). arXiv admin note: substantial text overlap with arXiv:2205.02577

Many stochastic continuous-state dynamical systems can be modeled as probabilistic programs with nonlinear non-polynomial updates in non-nested loops. We present two methods, one approximate and one exact, to automatically compute, without sampling, moment-based invariants for such probabilistic programs as closed-form solutions parameterized by the loop iteration. The exact method applies to probabilistic programs with trigonometric and exponential updates and is embedded in the Polar tool. The approximate method for moment computation applies to any nonlinear random function as it exploits the theory of polynomial chaos expansion to approximate non-polynomial updates as the sum of orthogonal polynomials. This translates the dynamical system to a non-nested loop with polynomial updates, and thus renders it conformable with the Polar tool that computes the moments of any order of the state variables. We evaluate our methods on an extensive number of examples ranging from modeling monetary policy to several physical motion systems in uncertain environments. The experimental results demonstrate the advantages of our approach with respect to the current state-of-the-art.

Lipschitz · Networking · Neural Networks · Constraint · Activation function

January 25, 2024

Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted Activations

Patricia Pauli,Aaron Havens,Alexandre Araujo,Siddharth Garg,Farshad Khorrami,Frank Allgöwer,Bin Hu

from arxiv, accepted as a conference paper at ICLR 2024

Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation requires the activation functions to be slope-restricted on $[0,1]$, preventing its further use for more general activation functions such as GroupSort, MaxMin, and Householder. One can rewrite MaxMin activations for example as residual ReLU networks. However, a direct application of LipSDP to the resultant residual ReLU networks is conservative and even fails in recovering the well-known fact that the MaxMin activation is 1-Lipschitz. Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions. To this end, we provide novel quadratic constraints for GroupSort, MaxMin, and Householder activations via leveraging their underlying properties such as sum preservation. Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds for a rich class of neural network architectures, including non-residual and residual neural networks and implicit models, with GroupSort, MaxMin, and Householder activations. Finally, we illustrate the utility of our approach with a variety of experiments and show that our proposed SDPs generate less conservative Lipschitz bounds in comparison to existing approaches.
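
The properties of MaxMin mentioned above can be checked numerically. The sketch below is an illustration only and does not reproduce the paper's quadratic constraints. It implements MaxMin on consecutive pairs, one standard rewriting of it with a residual ReLU term (the function names are assumptions introduced here), and spot-checks sum preservation and the 1-Lipschitz property on two hypothetical inputs.

```python
import math


def relu(z):
    return max(0.0, z)


def maxmin_pair(a, b):
    """MaxMin on one pair: sort the two entries in descending order."""
    return max(a, b), min(a, b)


def maxmin_pair_relu(a, b):
    """Same map via ReLU: max(a, b) = b + relu(a - b), min(a, b) = a - relu(a - b)."""
    r = relu(a - b)
    return b + r, a - r


def maxmin(x):
    """Apply MaxMin to consecutive pairs of x (length assumed even)."""
    out = []
    for i in range(0, len(x), 2):
        out.extend(maxmin_pair(x[i], x[i + 1]))
    return out


# Hypothetical inputs.
x = [1.0, 3.0, 2.0, -1.0]
y = [2.5, 0.5, -1.0, 4.0]

# Sum preservation: MaxMin only permutes entries within each pair.
print(sum(x), sum(maxmin(x)))

# 1-Lipschitz in the Euclidean norm: output distance never exceeds input distance.
print(math.dist(maxmin(x), maxmin(y)), math.dist(x, y))
```

A check on two points is of course not a proof. It only illustrates the property that, according to the abstract, a direct application of LipSDP to the ReLU rewriting fails to recover.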

MoDELS · Robustness · Cost · Networking · Neural Networks

January 24, 2024

A Systematic Approach to Robustness Modelling for Deep Convolutional Neural Networks

Charles Meyers,Mohammad Reza Saleh Sedghpour,Tommy Löfstedt,Erik Elmroth

Convolutional neural networks have been shown to be widely applicable to a large number of fields when large amounts of labelled data are available. The recent trend has been to use models with increasingly larger sets of tunable parameters to increase model accuracy, reduce model loss, or create more adversarially robust models -- goals that are often at odds with one another. In particular, recent theoretical work raises questions about the ability of even larger models to generalize to data outside of the controlled train and test sets. As such, we examine the role of the number of hidden layers in the ResNet model, demonstrated on the MNIST, CIFAR10, and CIFAR100 datasets. We test a variety of parameters including the size of the model, the floating point precision, and the noise level of both the training data and the model output. To encapsulate the model's predictive power and computational cost, we provide a method that uses induced failures to model the probability of failure as a function of time and relate that to a novel metric that allows us to quickly determine whether or not the cost of training a model outweighs the cost of attacking it. Using this approach, we are able to approximate the expected failure rate using a small number of specially crafted samples rather than increasingly larger benchmark datasets. We demonstrate the efficacy of this technique on both the MNIST and CIFAR10 datasets using 8-, 16-, 32-, and 64-bit floating-point numbers, various data pre-processing techniques, and several attacks on five configurations of the ResNet model. Then, using empirical measurements, we examine the various trade-offs between cost, robustness, latency, and reliability to find that larger models do not significantly aid in adversarial robustness despite costing significantly more to train.

Learning · Reducible · Edge · Networking · MoDELS

January 24, 2024

Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks

Zheng Lin,Guangyu Zhu,Yiqin Deng,Xianhao Chen,Yue Gao,Kaibin Huang,Yuguang Fang

from arxiv, 15 pages, 13 figures

Increasingly deep neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL), allowing multiple client devices to offload substantial training workloads to an edge server via layer-wise model split. By observing that existing PSL schemes incur excessive training latency and a large volume of data transmission, we propose an innovative PSL framework, namely, efficient parallel split learning (EPSL), to accelerate model training. To be specific, EPSL parallelizes client-side model training and reduces the dimension of local gradients for back propagation (BP) via last-layer gradient aggregation, leading to a significant reduction in server-side training and communication latency. Moreover, by considering the heterogeneous channel conditions and computing capabilities at client devices, we jointly optimize subchannel allocation, power control, and cut layer selection to minimize the per-round latency. Simulation results show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy compared with the state-of-the-art benchmarks, and the tailored resource management and layer split strategy can considerably reduce latency compared with the counterpart without optimization.

PAC learning theory · Channel · Discretization · Decoding · Learning

January 24, 2024

PAC Learnability for Reliable Communication over Discrete Memoryless Channels

Jiakun Liu,Wenyi Zhang,H. Vincent Poor

from arxiv, 10 pages, 4 figures

In practical communication systems, knowledge of channel models is often absent, and consequently, transceivers need to be designed based on empirical data. In this work, we study data-driven approaches to reliably choosing decoding metrics and code rates that facilitate reliable communication over unknown discrete memoryless channels (DMCs). Our analysis is inspired by the PAC learning theory and does not rely on any assumptions on the statistical characteristics of DMCs. We show that a naive plug-in algorithm for choosing decoding metrics is likely to fail for finite training sets. We propose an alternative algorithm called the virtual sample algorithm and establish a non-asymptotic lower bound on its performance. The virtual sample algorithm is then used as a building block for constructing a learning algorithm that chooses a decoding metric and a code rate using which a transmitter and a receiver can reliably communicate at a rate arbitrarily close to the channel mutual information. Therefore, we conclude that DMCs are PAC learnable.

THz · MIMO-OFDM · Performer · MIMO · 6G

January 22, 2024

Performance Analysis of 6G Multiuser Massive MIMO-OFDM THz Wireless Systems with Hybrid Beamforming under Intercarrier Interference

Md Saheed Ullah,Zulqarnain Bin Ashraf,Sudipta Chandra Sarker

6G networks are expected to provide more diverse capabilities than their predecessors and are likely to support applications beyond current mobile applications, such as virtual and augmented reality (VR/AR), AI, and the Internet of Things (IoT). In contrast to typical multiple-input multiple-output (MIMO) systems, THz MIMO precoding cannot be conducted totally at baseband using digital precoders due to the restricted number of signal mixers and analog-to-digital converters that can be supported due to their cost and power consumption. In this thesis, we analyzed the performance of multiuser massive MIMO-OFDM THz wireless systems with hybrid beamforming. Carrier frequency offset (CFO) is one of the most well-known disturbances for OFDM. For practicality, we accounted for CFO, which results in Intercarrier Interference. Incorporating the combined impact of molecular absorption, high sparsity, and multi-path fading, we analyzed a three-dimensional wideband THz channel and the carrier frequency offset in multi-carrier systems. With this model, we first presented a two-stage wideband hybrid beamforming technique comprising Riemannian manifolds optimization for analog beamforming and then a zero-forcing (ZF) approach for digital beamforming. We adjusted the objective function to reduce complexity, and instead of maximizing the bit rate, we determined parameters by minimizing interference. Numerical results demonstrate the significance of considering ICI for practical implementation for the THz system. We demonstrated how our change in problem formulation minimizes latency without compromising results. We also evaluated spectral efficiency by varying the number of RF chains and antennas. The spectral efficiency grows as the number of RF chains and antennas increases, but the spectral efficiency of antennas declines when the number of users increases.
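
The zero-forcing step in the digital stage can be sketched with the textbook precoder W = H^H (H H^H)^{-1}, which cancels inter-user interference for a known channel. The example below is a minimal illustration under assumptions: `H` is a hypothetical 2-user, 3-antenna effective channel (in a hybrid design it would already include the analog beamformer), the helper names are introduced here, and power normalization, CFO, and the THz channel model from the abstract are all omitted.

```python
def conj_transpose(A):
    """Hermitian transpose of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2x2 matrix (enough for this two-user example)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def zero_forcing(H):
    """Textbook ZF precoder W = H^H (H H^H)^{-1} for a 2 x N channel H."""
    Hh = conj_transpose(H)
    return matmul(Hh, inv2(matmul(H, Hh)))


# Hypothetical effective channel: rows are users, columns are transmit antennas.
H = [
    [1 + 1j, 0.5j, 2.0],
    [0.3, 1 - 1j, -1j],
]

W = zero_forcing(H)
HW = matmul(H, W)  # approximately the 2x2 identity: each user sees only its own stream
for row in HW:
    print([round(abs(z), 6) for z in row])
```

Because `H W` is the identity up to rounding, each user receives its own symbol with no contribution from the other user's stream, which is the defining property of zero forcing.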

INFORMS · Identifiable · Networking · Neural Networks · Black box

October 4, 2021

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

Yang Zhang,Ashkan Khakzar,Yawei Li,Azade Farshad,Seong Tae Kim,Nassir Navab

from arxiv, Accepted in NeurIPS 2021 (Neural Information Processing Systems)

One principal approach for illuminating a black-box neural network is feature attribution, i.e. identifying the importance of input features for the network's prediction. The predictive information of features was recently proposed as a proxy for the measure of their importance. So far, the predictive information has only been identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features' information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.

entity · Neighborhood aggregation · Entity alignment · Graph · Networking

November 20, 2019

Knowledge Graph Alignment Network with Gated Multi-hop Neighborhood Aggregation

Zequn Sun,Chengming Wang,Wei Hu,Muhao Chen,Jian Dai,Wei Zhang,Yuzhong Qu

from arxiv, Accepted by the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)

Graph neural networks (GNNs) have emerged as a powerful paradigm for embedding-based entity alignment due to their capability of identifying isomorphic subgraphs. However, in real knowledge graphs (KGs), the counterpart entities usually have non-isomorphic neighborhood structures, which easily causes GNNs to yield different representations for them. To tackle this problem, we propose a new KG alignment network, namely AliNet, aiming at mitigating the non-isomorphism of neighborhood structures in an end-to-end manner. As the direct neighbors of counterpart entities are usually dissimilar due to the schema heterogeneity, AliNet introduces distant neighbors to expand the overlap between their neighborhood structures. It employs an attention mechanism to highlight helpful distant neighbors and reduce noises. Then, it controls the aggregation of both direct and distant neighborhood information using a gating mechanism. We further propose a relation loss to refine entity representations. We perform thorough experiments with detailed ablation studies and analyses on five entity alignment datasets, demonstrating the effectiveness of AliNet.

Neural Networks · Object tracking · Learned · Networking · RNN

January 6, 2018

Learning Hierarchical Features for Visual Object Tracking with Recursive Neural Networks

Li Wang,Ting Liu,Bing Wang,Xulei Yang,Gang Wang

Recently, deep learning has achieved very promising results in visual object tracking. Deep neural networks in existing tracking methods require a lot of training data to learn a large number of parameters. However, training data is not sufficient for visual object tracking as annotations of a target object are only available in the first frame of a test sequence. In this paper, we propose to learn hierarchical features for visual object tracking by using tree structure based Recursive Neural Networks (RNN), which have fewer parameters than other deep neural networks, e.g. Convolutional Neural Networks (CNN). First, we learn RNN parameters to discriminate between the target object and background in the first frame of a test sequence. A tree structure over local patches of an exemplar region is randomly generated by using a bottom-up greedy search strategy. Given the learned RNN parameters, we create two dictionaries regarding target regions and corresponding local patches based on the learned hierarchical features from both top and leaf nodes of multiple random trees. In each of the subsequent frames, we conduct sparse dictionary coding on all candidates to select the best candidate as the new target location. In addition, we update the two dictionaries online to handle appearance changes of target objects. Experimental results demonstrate that our feature learning algorithm can significantly improve tracking performance on benchmark datasets.