In this paper, we propose a new event memory architecture (MemNet) for recurrent neural networks, which is universal across different types of time series data such as scalar, multivariate, or symbolic. Unlike other external neural memory architectures, it stores key-value pairs, which separate the information used for addressing from the stored content to improve the representation, as in its digital archetype. Moreover, the key-value pairs also avoid the compromise between memory depth and resolution that afflicts memories constructed from the model state. A key characteristic of MemNet is that it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data. The MemNet architecture can be applied without modification to scalar time series, logic operators on strings, and natural language processing, providing state-of-the-art results in all of these application domains: chaotic time series, symbolic operation tasks, and question-answering tasks (bAbI). Finally, because it is controlled by only five linear layers, MemNet requires far fewer training parameters than other external memory networks as well as the transformer network. The space complexity of MemNet equals that of a single self-attention layer. It greatly improves the efficiency of the attention mechanism and opens the door to IoT applications.
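A minimal sketch of the key-value idea, assuming a softmax-addressed store: the keys handle addressing and the values carry the content. This illustrates only the addressing/content separation, not MemNet's exact update equations, and all names and sizes are illustrative.

```python
import numpy as np

# Minimal key-value memory read: keys address, values store content.
def kv_memory_read(query, keys, values):
    scores = keys @ query                    # similarity of the query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # soft addressing over memory slots
    return weights @ values                  # content blended from the values

rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))              # 16 slots, key dimension 8
values = rng.normal(size=(16, 8))
out = kv_memory_read(rng.normal(size=8), keys, values)
print(out.shape)                             # (8,)
```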
In this paper, we aim to maximize the weighted sum-rate (WSR) of rate-splitting multiple access (RSMA) in multi-user multi-antenna transmission networks through the joint optimization of rate allocation and beamforming. Unlike conventional methods such as weighted minimum mean square error (WMMSE) and standard fractional programming (FP), which tackle the non-convex WSR problem iteratively using disciplined convex subproblems and optimization toolboxes, our work pioneers a toolbox-free approach. For the first time, we identify the optimal beamforming structure and common rate allocation for WSR maximization in RSMA by leveraging FP and Lagrangian duality. We then propose an algorithm based on FP and fixed-point iteration that optimizes the beamforming and common rate allocation without the need for optimization toolboxes. Our numerical results demonstrate that the proposed algorithm attains the same performance as standard FP and classical WMMSE methods while significantly reducing computation time.
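As background, the quadratic transform from the FP literature that enables such toolbox-free updates (a standard identity, not the paper's full derivation): each ratio term $\frac{|A(\mathbf{x})|^2}{B(\mathbf{x})}$ in the sum-of-ratios WSR objective can be equivalently rewritten as
$$2\,\mathrm{Re}\{y^{*}A(\mathbf{x})\} - |y|^{2}B(\mathbf{x}),$$
where the auxiliary variable $y$ admits the closed-form optimum $y^{\star} = A(\mathbf{x})/B(\mathbf{x})$. Alternating this closed-form update with a fixed-point update of the beamformers yields a non-decreasing objective without invoking a convex-optimization toolbox.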
In this work, we seek to simulate rare transitions between metastable states using score-based generative models. An efficient method for generating high-quality transition paths is valuable for the study of molecular systems, since data is often difficult to obtain. We develop two novel methods for path generation in this paper: a chain-based approach and a midpoint-based approach. The first biases the original dynamics to facilitate transitions, while the second mirrors splitting techniques and breaks the original transition down into smaller transitions. Numerical results of generated transition paths for the M\"uller potential and for alanine dipeptide demonstrate the effectiveness of these approaches in both the data-rich and data-scarce regimes.
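As background on the score-based machinery such methods build on, here is a minimal denoising score matching sketch for 2-D configurations (e.g., points near the M\"uller potential). It is generic score-based modeling, not the paper's chain- or midpoint-specific constructions, and the network sizes are assumptions.

```python
import torch
import torch.nn as nn

# Minimal denoising score matching for 2-D data; illustrative sizes only.
score_net = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def dsm_loss(x0: torch.Tensor, sigma: float) -> torch.Tensor:
    """x0: clean samples (batch, 2); train s(x, sigma) to match the score of
    the Gaussian-perturbed data distribution."""
    noise = torch.randn_like(x0)
    x = x0 + sigma * noise                        # perturb the data
    cond = torch.full((len(x0), 1), sigma)        # condition on the noise level
    target = -noise / sigma                       # exact score of the perturbation kernel
    return ((score_net(torch.cat([x, cond], dim=1)) - target) ** 2).mean()

loss = dsm_loss(torch.randn(128, 2), sigma=0.1)   # toy batch of 2-D points
loss.backward()
```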
The recent surge of utilizing deep neural networks for geometric processing and shape modeling has opened up exciting avenues. However, there is a conspicuous lack of research efforts on using powerful neural representations to extend the capabilities of parametric surfaces, which are the prevalent surface representations in product design, CAD/CAM, and computer animation. We present Neural Parametric Surfaces, the first piecewise neural surface representation that allows coarse patch layouts of arbitrary $n$-sided surface patches to model complex surface geometries with high precision, offering greater flexibility than traditional parametric surfaces. By construction, this new surface representation guarantees $G^0$ continuity between adjacent patches and empirically achieves $G^1$ continuity, which cannot be attained by existing neural patch-based methods. The key ingredient of our neural parametric surface is a learnable feature complex $\mathcal{C}$ that is embedded in a high-dimensional space $\mathbb{R}^D$ and topologically equivalent to the patch layout of the surface; each face cell of the complex is defined by interpolating the feature vectors at its vertices. The learned feature complex is mapped by an MLP-encoded function $f:\mathcal{C} \rightarrow \mathcal{S}$ to produce the neural parametric surface $\mathcal{S}$. We present a surface fitting algorithm that optimizes the feature complex $\mathcal{C}$ and trains the neural mapping $f$ to reconstruct given target shapes with high accuracy. We further show that the proposed representation, along with a compact neural net, can learn a plausible shape space from a shape collection, which can be used for shape interpolation or for shape completion from noisy and incomplete input data. Extensive experiments show that neural parametric surfaces offer greater modeling capabilities than traditional parametric surfaces.
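To make the construction concrete, here is an illustrative sketch of the pipeline for a single quad face cell: learnable vertex features are interpolated across the cell (giving a point on $\mathcal{C}$), then an MLP-encoded $f$ maps the interpolated feature to a 3-D surface point. The sizes, the bilinear interpolant, and the 4-sided cell are assumptions; the paper supports general $n$-sided patches.

```python
import torch
import torch.nn as nn

# One quad face cell of a feature complex, mapped to a surface point by f.
D = 32                                            # feature space dimension
corner_feats = nn.Parameter(torch.randn(4, D))    # features at the cell's vertices
f = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, 3))

def surface_point(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Bilinear interpolation of vertex features at (u, v) in [0,1]^2,
    followed by the neural mapping f: C -> S."""
    w = torch.stack([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])
    feat = w @ corner_feats                       # point on the feature complex
    return f(feat)                                # point on the neural surface

p = surface_point(torch.tensor(0.3), torch.tensor(0.7))
print(p.shape)                                    # torch.Size([3])
```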
This paper presents a case for exemplar parallelism of neural networks using Go as the parallelization framework. It is further shown that even limited multi-core hardware, such as notebooks and single-board computer systems, is suitable for these parallelization tasks. The main question was how much speedup can be achieved specifically with concurrent Go goroutines. To answer it, a simple concurrent feedforward network for MNIST digit recognition was implemented in the programming language Go. The first findings on a notebook (Lenovo Yoga 2) showed a speedup of 252% when utilizing 4 goroutines. Testing a single-board computer (Banana Pi M3) delivered more convincing results: 320% with 4 goroutines, and 432% with 8 goroutines.
This paper presents the geometric aspect of the autoencoder framework, which, despite its importance, has received relatively little attention. Given a set of high-dimensional data points that approximately lie on some lower-dimensional manifold, an autoencoder simultaneously learns the \textit{manifold} and its \textit{coordinate chart}. This geometric perspective naturally raises questions such as "Does a finite set of data points correspond to a single manifold?" or "Is there only one coordinate chart that can represent the manifold?". The answers to these questions are negative, implying that there are multiple solution autoencoders for a given dataset. Consequently, they sometimes produce incorrect manifolds with severely distorted latent space representations. In this paper, we introduce recent geometric approaches that address these issues.
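A minimal sketch of this geometric reading, with illustrative architecture sizes: the decoder parametrizes the learned manifold and the encoder provides its coordinate chart.

```python
import torch
import torch.nn as nn

# Decoder g parametrizes the manifold; encoder h is its coordinate chart.
h = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 2))   # chart: R^100 -> R^2
g = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 100))   # manifold: R^2 -> R^100

x = torch.randn(64, 100)                 # data assumed to lie near a 2-D manifold
recon_loss = ((g(h(x)) - x) ** 2).mean()
# Many different (h, g) pairs can reach near-zero loss on a finite sample,
# which is precisely the non-uniqueness the geometric approaches address.
```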
In order to overcome the expressive limitations of graph neural networks (GNNs), we propose the first method that exploits vector flows over graphs to develop globally consistent directional and asymmetric aggregation functions. We show that our directional graph networks (DGNs) generalize convolutional neural networks (CNNs) when applied on a grid. Whereas recent theoretical works focus on understanding local neighbourhoods, local structures, and local isomorphism with no global information flow, our novel theoretical framework allows directional convolutional kernels in any graph. First, by defining a vector field on the graph, we develop a method of applying directional derivatives and smoothing by projecting node-specific messages onto the field. Then we propose using the Laplacian eigenvectors as such a vector field, and we show that the method generalizes CNNs on an n-dimensional grid and is provably more discriminative than standard GNNs with respect to the 1-Weisfeiler-Lehman (1-WL) test. Finally, we bring the power of CNN data augmentation to graphs by providing a means of performing reflection, rotation, and distortion on the underlying directional field. We evaluate our method on different standard benchmarks and observe a relative error reduction of 8% on the CIFAR10 graph dataset and of 11% to 32% on the molecular ZINC dataset. An important outcome of this work is that it makes it possible to translate any physical or biological problem with intrinsic directional axes into a graph network formalism with an embedded directional field.
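A rough sketch of the directional-derivative idea, using the first non-trivial Laplacian eigenvector as the field; the paper defines its normalized directional derivative and smoothing matrices more carefully, so treat every detail here as an assumption.

```python
import numpy as np

# Forward difference of node features along a field given by the Fiedler vector.
def directional_aggregation(A: np.ndarray, X: np.ndarray) -> np.ndarray:
    """A: adjacency matrix (n, n); X: node features (n, d)."""
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial graph Laplacian
    _, vecs = np.linalg.eigh(L)
    phi = vecs[:, 1]                               # Fiedler (first non-trivial) eigenvector
    F = A * (phi[None, :] - phi[:, None])          # edge weights along grad(phi)
    F = F / (np.abs(F).sum(axis=1, keepdims=True) + 1e-8)    # per-node normalization
    return F @ X - X * np.abs(F).sum(axis=1, keepdims=True)  # difference along the field

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
print(directional_aggregation(A, np.eye(3)))
```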
Deep neural network architectures have traditionally been designed and explored with human expertise in a long-lasting trial-and-error process. This process requires a huge amount of time, expertise, and resources. To address this tedious problem, we propose a novel algorithm to automatically and optimally find the hyperparameters of a deep network architecture. We specifically focus on designing neural architectures for the medical image segmentation task. Our proposed method is based on policy-gradient reinforcement learning, for which the reward function is a segmentation evaluation utility (i.e., the Dice index). We show the efficacy of the proposed method, along with its low computational cost, in comparison with state-of-the-art medical image segmentation networks. We also present a new architecture design, a densely connected encoder-decoder CNN, as a strong baseline architecture on which to apply the proposed hyperparameter search algorithm. We apply the proposed algorithm to each layer of the baseline architecture. As an application, we train the proposed system on cine cardiac MR images from the Automated Cardiac Diagnosis Challenge (ACDC) at MICCAI 2017. Starting from a baseline segmentation architecture, the resulting network architecture obtains state-of-the-art accuracy without any trial-and-error architecture design or close supervision of hyperparameter changes.
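A minimal REINFORCE-style sketch of such a search loop: a controller samples one choice per layer and is rewarded with a segmentation score such as the Dice index. The controller form, the moving-average baseline, and the black-box `train_and_eval_dice` stand-in are all illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Policy-gradient hyperparameter search with a Dice-style reward.
n_layers, n_choices = 8, 3
logits = nn.Parameter(torch.zeros(n_layers, n_choices))   # controller policy
opt = torch.optim.Adam([logits], lr=0.05)

def train_and_eval_dice(choices: torch.Tensor) -> float:
    return torch.rand(()).item()    # hypothetical stand-in: train net, return Dice

baseline = 0.0
for step in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    choices = dist.sample()                        # one hyperparameter per layer
    reward = train_and_eval_dice(choices)
    baseline = 0.9 * baseline + 0.1 * reward       # variance-reducing baseline
    loss = -(reward - baseline) * dist.log_prob(choices).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```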
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both the Exact Match and F1 metrics on two adversarial SQuAD datasets.
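A toy sketch of the reattention idea, where the attention memorized from the previous alignment round steers the current attention logits; the additive gating form and `gamma` are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Refine current attention logits with the previous round's attention.
def reattention(q: torch.Tensor, k: torch.Tensor,
                past_attn: torch.Tensor, gamma: float = 0.5) -> torch.Tensor:
    """q: (m, d) queries; k: (n, d) keys; past_attn: (m, n) previous round."""
    logits = q @ k.t() / q.shape[-1] ** 0.5        # current-round similarities
    refined = logits + gamma * past_attn           # refine with memorized attention
    return F.softmax(refined, dim=-1)

attn = reattention(torch.randn(5, 16), torch.randn(7, 16), torch.rand(5, 7))
print(attn.shape)                                  # torch.Size([5, 7])
```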
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify the knowledge representations in the high conv-layers of a CNN. In an interpretable CNN, each filter in a high conv-layer represents a specific object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer an object part during the learning process. Our method can be applied to different types of CNNs with various structures. The clear knowledge representation in an interpretable CNN can help people understand the logic inside a CNN, i.e., which patterns the CNN bases its decisions on. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.
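In the spirit of this filter-to-part assignment, the toy sketch below rewards a filter whose activation map concentrates around a single part-like location; the published method instead uses a mutual-information loss over a bank of location templates, so everything here is a simplified assumption for illustration only.

```python
import torch

# Penalty that prefers a filter firing at one localized, part-like site.
def part_template_penalty(fmap: torch.Tensor) -> torch.Tensor:
    """fmap: (h, w) activation map of one filter for one image."""
    h, w = fmap.shape
    i, j = divmod(int(fmap.argmax()), w)           # strongest activation site
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    template = torch.exp(-((ys - i) ** 2 + (xs - j) ** 2) / 4.0)  # localized bump
    return -(fmap * template).sum() + 0.1 * fmap.abs().sum()  # fit bump, penalize spread

print(part_template_penalty(torch.rand(7, 7)))
```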
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e., the additive margin softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and whose inter-class difference is large is of great importance for achieving good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the softmax loss, which is intuitively appealing and more interpretable than existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW (BLUFR protocol) and MegaFace show that our additive margin softmax loss consistently performs better than current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at //github.com/happynear/AMSoftmax
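The loss itself is compact enough to sketch directly: logits are scaled cosine similarities, with the margin $m$ subtracted from the target class's cosine before scaling by $s$ (the values $s = 30$ and $m = 0.35$ follow the paper's typical settings; other details are illustrative).

```python
import torch
import torch.nn.functional as F

# Additive margin softmax: psi(theta_y) = cos(theta_y) - m on the target class.
def am_softmax_loss(feats, weights, labels, s: float = 30.0, m: float = 0.35):
    """feats: (b, d) embeddings; weights: (c, d) class weights; labels: (b,)."""
    cos = F.normalize(feats) @ F.normalize(weights).t()   # cosine similarities
    onehot = F.one_hot(labels, num_classes=weights.shape[0]).float()
    logits = s * (cos - m * onehot)                # margin only on the target class
    return F.cross_entropy(logits, labels)

loss = am_softmax_loss(torch.randn(8, 128), torch.randn(10, 128),
                       torch.randint(0, 10, (8,)))
print(loss.item())
```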