Neural networks are popular and useful in many fields, but they tend to produce high-confidence responses for examples that lie far from the training data. This makes a neural network very confident in its predictions even when it makes gross mistakes, limiting its reliability in safety-critical applications such as autonomous driving and space exploration. This paper introduces a novel neuron generalization that contains the standard dot-product-based neuron and the radial basis function (RBF) neuron as two extreme cases of a shape parameter. Using a rectified linear unit (ReLU) as the activation function results in a novel neuron with compact support, meaning its output is zero outside a bounded domain. To address the difficulty of training the proposed neural network, the paper introduces a novel training method that starts from a pretrained standard neural network and fine-tunes it while gradually increasing the shape parameter to the desired value. The theoretical findings are a bound on the gradient of the proposed neuron and a proof that a neural network with such neurons has the universal approximation property, i.e., it can approximate any continuous and integrable function with an arbitrary degree of accuracy. Experiments on standard benchmark datasets show that the proposed approach yields smaller test errors than state-of-the-art competing methods and outperforms them at detecting out-of-distribution samples on two out of three datasets.
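The abstract does not give the exact parameterization, but a minimal sketch of such a shape-parameterized neuron, assuming a convex combination of the dot product and a negative squared distance (the function name and the blending form are illustrative assumptions, not the paper's formula), could look like:

```python
import numpy as np

def shape_neuron(x, w, b, alpha):
    """Illustrative neuron interpolating between a dot product (alpha = 0)
    and an RBF-like response (alpha = 1); the paper's exact form may differ."""
    dot = x @ w                          # standard dot-product pre-activation
    rbf = -np.sum((x - w) ** 2)          # negative squared distance (RBF-like)
    pre = (1.0 - alpha) * dot + alpha * rbf + b
    # ReLU activation; at alpha = 1 the output is nonzero only on the ball
    # ||x - w||^2 <= b, which is exactly the compact support described above.
    return np.maximum(pre, 0.0)
```

Under this reading, the proposed training schedule corresponds to sweeping alpha from 0 (a pretrained standard network) toward 1 during fine-tuning.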
In this paper, we propose a new deep unfolding neural network based on the ADMM algorithm for analysis Compressed Sensing. The proposed network jointly learns a redundant analysis operator for sparsification and reconstructs the signal of interest. We compare our network with a state-of-the-art unfolded ISTA decoder that learns an orthogonal sparsifier, and we consider not only image but also speech datasets as test examples. Computational experiments demonstrate that our network consistently outperforms the state-of-the-art deep unfolding baseline on both real-world image and speech datasets.
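For intuition, here is a minimal sketch of one unfolded layer, assuming the standard analysis-$\ell_1$ formulation $\min_x \tfrac{1}{2}\|Ax-y\|_2^2 + \lambda\|\Gamma x\|_1$ with a trainable analysis operator $\Gamma$ (the variable names, and the choice of fixed scalars rho and lam, are illustrative; the paper's layer may parameterize these differently):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: elementwise shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_layer(x, z, u, A, Gamma, y, rho, lam):
    """One unfolded ADMM iteration for 0.5*||Ax - y||^2 + lam*||Gamma x||_1,
    with splitting variable z = Gamma x and scaled dual u."""
    # x-update: solve the linear system from the augmented Lagrangian
    lhs = A.T @ A + rho * Gamma.T @ Gamma
    rhs = A.T @ y + rho * Gamma.T @ (z - u)
    x = np.linalg.solve(lhs, rhs)
    # z-update: proximal (soft-thresholding) step on the analysis coefficients
    z = soft_threshold(Gamma @ x + u, lam / rho)
    # dual update
    u = u + Gamma @ x - z
    return x, z, u
```

Stacking a fixed number of such layers and training Gamma (and optionally rho and lam) end-to-end gives the usual deep unfolding recipe.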
A quasi-score linearity test for continuous and count network autoregressive models is developed. We establish the asymptotic distribution of the test when the network dimension is fixed or increasing, under the null hypothesis of linearity and under Pitman's local alternatives. When the parameters are identifiable, the test statistic has a chi-square and a noncentral chi-square asymptotic distribution, respectively. These results still hold when the parameters tested belong to the boundary of their space. When the parameters are non-identifiable, a suitable test is proposed and its asymptotic distribution is established when the network dimension is fixed. Since, in general, the critical values of such a test cannot be tabulated, the empirical computation of the p-values is implemented using a feasible bound. Bootstrap approximations are also provided. Moreover, consistency and asymptotic normality of the quasi maximum likelihood estimator are established for continuous and count nonlinear network autoregressions, under standard smoothness conditions. A simulation study and two data examples complement this work.
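In the identifiable, fixed-dimension case, the test amounts to referring a quadratic form in the quasi-score to a chi-square law. A generic sketch of that computation (the inputs score, info, and df stand in for the model-specific quasi-score, its estimated information matrix, and the number of restrictions, all assumed available; this covers none of the boundary or non-identifiable settings):

```python
import numpy as np
from scipy.stats import chi2

def quasi_score_test(score, info, df):
    """Quadratic-form score statistic T = s' I^{-1} s with a chi2(df)
    reference distribution (identifiable, fixed-dimension case only)."""
    T = float(score @ np.linalg.solve(info, score))
    return T, chi2.sf(T, df)  # statistic and asymptotic p-value
```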
The presence of mislabeled observations in data is a notoriously challenging problem in statistics and machine learning, associated with poor generalization for both traditional classifiers and, perhaps even more so, for flexible classifiers like neural networks. Here we propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model with an optimal reweighting of training observations. The combined penalties yield improved generalization and strong robustness against overfitting in different settings of mislabeled training data, as well as against variation in initial parameter values during training. We provide a theoretical justification for the proposed method, derived for the simple case of logistic regression. We demonstrate the double regularization model, here denoted DRFit, for neural net classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling. We also show that DRFit identifies mislabeled data points with very good precision. This provides strong support for DRFit as a practical off-the-shelf classifier, since, without any sacrifice in performance, we get a classifier that simultaneously reduces overfitting from mislabeling and gives an accurate measure of the trustworthiness of the labels.
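A minimal sketch of such a doubly regularized loss, assuming an L2 complexity penalty and learnable per-observation weights kept positive and normalized via a softmax (the function, its penalties, and the pull toward uniform weights are illustrative assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def double_regularized_loss(logits, targets, sample_logits_w, model, lam, gamma):
    """Illustrative double regularization: learnable observation weights v
    reweight the cross-entropy; lam penalizes model complexity; gamma keeps
    the weights from collapsing onto a few observations."""
    n = targets.shape[0]
    v = torch.softmax(sample_logits_w, dim=0) * n      # positive, mean ~ 1
    ce = F.cross_entropy(logits, targets, reduction="none")
    data_term = (v * ce).mean()                        # reweighted fit term
    l2 = sum((p ** 2).sum() for p in model.parameters())
    uniform_pen = ((v - 1.0) ** 2).mean()              # stay near uniform
    return data_term + lam * l2 + gamma * uniform_pen
```

Under this reading, observations that end up with small learned weights v are the candidates the method flags as mislabeled.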
The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee for infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using neural tangent kernel (NTK) theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
Due to the limitations of realizing artificial neural networks on prevalent von Neumann architectures, recent studies have presented neuromorphic systems based on spiking neural networks (SNNs) to reduce power and computational cost. However, conventional analog voltage-domain integrate-and-fire (I&F) neuron circuits, based on either current mirrors or op-amps, suffer from serious issues such as nonlinearity or high power consumption, degrading either the inference accuracy or the energy efficiency of the SNN. To achieve excellent energy efficiency and high accuracy simultaneously, this paper presents a low-power, highly linear time-domain I&F neuron circuit. Designed and simulated in a 28 nm CMOS process, the proposed neuron achieves a more than 4.3x lower error rate on MNIST inference than conventional current-mirror-based neurons. In addition, the power consumed by the proposed neuron circuit is simulated to be 0.230 µW per neuron, which is orders of magnitude lower than that of existing voltage-domain neurons.
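At the behavioral level, the I&F dynamics such a circuit implements can be sketched algorithmically (a purely behavioral model: the membrane variable, ideal linear integration, and reset-by-subtraction are textbook assumptions, not the time-domain circuit itself):

```python
import numpy as np

def integrate_and_fire(currents, threshold=1.0, dt=1e-3, c=1.0):
    """Discrete-time behavioral model of an ideally linear I&F neuron:
    the membrane state integrates the input current and emits a spike,
    with reset-by-subtraction, when it crosses the threshold."""
    v, spikes = 0.0, []
    for i in currents:
        v += (dt / c) * i            # ideal linear integration
        fired = v >= threshold
        spikes.append(fired)
        if fired:
            v -= threshold           # reset by subtraction
    return np.array(spikes)
```

The nonlinearity the abstract criticizes in current-mirror designs corresponds to the integration step deviating from this ideal linear accumulation.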
Adder Neural Networks (ANNs), which contain only additions, offer a new way of developing deep neural networks with low energy consumption. Unfortunately, an accuracy drop occurs when all convolution filters are replaced by adder filters. The main reason is the optimization difficulty of the $\ell_1$-norm used in ANNs, which makes the gradient estimates in back propagation inaccurate. In this paper, we present a novel method for further improving the performance of ANNs, without increasing the number of trainable parameters, via a progressive kernel based knowledge distillation (PKKD) method. A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network; the features and weights of the ANN and the CNN are transformed into a new space to eliminate the accuracy drop. The similarity is measured in a higher-dimensional space to disentangle the difference between their distributions using a kernel based method. Finally, the desired ANN is learned from the information of both the ground truth and the teacher, progressively. The effectiveness of the proposed method for learning ANNs with higher performance is verified on several benchmarks. For instance, the ANN-50 trained with the proposed PKKD method obtains 76.8\% top-1 accuracy on the ImageNet dataset, which is 0.6\% higher than that of ResNet-50.
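The replacement the abstract refers to can be stated in a few lines: an adder filter scores a patch by a negative $\ell_1$ distance instead of a cross-correlation, so it needs only additions. A minimal single-patch sketch (the full layer slides this over all spatial positions):

```python
import numpy as np

def conv_response(patch, weight):
    """Standard convolution filter: dot product (multiplications + additions)."""
    return np.sum(patch * weight)

def adder_response(patch, weight):
    """Adder filter: negative L1 distance, computable with additions
    (and absolute values) only -- the source of the energy savings."""
    return -np.sum(np.abs(patch - weight))
```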
Graph neural networks (GNNs) have shown superior performance in dealing with graphs, which has attracted considerable research attention recently. However, most existing GNN models are designed primarily for graphs in Euclidean spaces, while recent research has shown that graph data exhibit a non-Euclidean latent geometry. Unfortunately, there has been little study of GNNs in non-Euclidean settings so far. To bridge this gap, in this paper we make a first attempt to study GNNs with an attention mechanism in hyperbolic spaces. This research poses some unique challenges: since hyperbolic spaces are not vector spaces, vector operations (e.g., vector addition, subtraction, and scalar multiplication) cannot be carried out directly. To tackle this problem, we employ gyrovector spaces, which provide an elegant algebraic formalism for hyperbolic geometry, to transform the features in a graph; we then propose a hyperbolic-proximity-based attention mechanism to aggregate the features. Moreover, as mathematical operations in hyperbolic spaces can be more complicated than those in Euclidean spaces, we further devise a novel acceleration strategy using logarithmic and exponential mappings to improve the efficiency of our model. Comprehensive experimental results on four real-world datasets demonstrate the performance of the proposed hyperbolic graph attention network, in comparison with other state-of-the-art baseline methods.
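On the Poincaré ball of curvature -1, the gyrovector operations and the logarithmic/exponential mappings mentioned above have standard closed forms; a minimal sketch (the at-origin maps shown here are the simplest case, and how the model composes them is an assumption of the illustration):

```python
import numpy as np

def mobius_add(u, v, eps=1e-9):
    """Mobius addition on the Poincare ball: the gyrovector-space
    analogue of vector addition."""
    uv = float(u @ v)
    nu2, nv2 = float(u @ u), float(v @ v)
    num = (1 + 2 * uv + nv2) * u + (1 - nu2) * v
    den = 1 + 2 * uv + nu2 * nv2
    return num / (den + eps)

def exp0(v, eps=1e-9):
    """Exponential map at the origin: tangent vector -> ball point."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * v / n

def log0(y, eps=1e-9):
    """Logarithmic map at the origin: ball point -> tangent vector."""
    n = np.linalg.norm(y) + eps
    return np.arctanh(n) * y / n
```

The acceleration idea is then natural: map neighbor features to the tangent space with log0, aggregate there with ordinary Euclidean operations, and map the result back with exp0.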
Self-attention networks (SANs) have recently attracted increasing interest due to their fully parallelized computation and flexibility in modeling dependencies. They can be further enhanced with a multi-headed attention mechanism that allows the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel convolutional self-attention network (CSAN), which offers SAN the ability to 1) capture neighboring dependencies and 2) model the interaction between multiple attention heads. Experimental results on the WMT14 English-to-German translation task demonstrate that the proposed approach outperforms both the strong Transformer baseline and other existing work on enhancing the locality of SAN. Compared with previous work, our model does not introduce any new parameters.
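One simple way to realize the first ability, restricting each query to a local neighborhood of keys, is a windowed attention mask. An illustrative 1-D sketch (CSAN's actual formulation, including the cross-head interaction, is not specified here):

```python
import numpy as np

def local_attention(Q, K, V, window):
    """Scaled dot-product self-attention in which each position may only
    attend to positions within `window` steps of itself."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9                        # block out-of-window positions
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ V
```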
Sentence classification is very challenging, since sentences contain limited contextual information. In this paper, we propose an Attention-Gated Convolutional Neural Network (AGCNN) for sentence classification, which generates attention weights from the features' context windows of different sizes by using specialized convolution encoders. It makes full use of the limited contextual information to extract and enhance the influence of important features in predicting a sentence's category. Experimental results demonstrate that our model can achieve up to 3.1% higher accuracy than standard CNN models, and it obtains competitive results against the baselines on four out of six tasks. In addition, we design an activation function, namely the Natural Logarithm rescaled Rectified Linear Unit (NLReLU). Experiments show that NLReLU outperforms ReLU and is comparable to other well-known activation functions on AGCNN.
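Reading the activation off its name, a natural-logarithm-rescaled ReLU can be sketched as below (the formula ln(beta * max(x, 0) + 1) and the scale parameter beta are our reading of the name, not a quotation from the paper):

```python
import numpy as np

def nlrelu(x, beta=1.0):
    """Natural Logarithm rescaled ReLU: compresses large positive
    activations while preserving ReLU's sparsity for x <= 0."""
    return np.log(beta * np.maximum(x, 0.0) + 1.0)
```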
For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that NNs in the reduced parameter space are mathematically equivalent to standard NNs with parameters in the whole space. The reduced parameter space should facilitate the optimization procedure for network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
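For the ReLU case, the equivalence rests on positive homogeneity: max(s*z, 0) = s*max(z, 0) for s > 0, so the weight norm can be factored out of each neuron and absorbed into the next layer. A minimal numerical check of this identity (the absorption into the next layer is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=3), rng.normal(size=3), 0.7
relu = lambda z: np.maximum(z, 0.0)

# A neuron with arbitrary (w, b) equals a rescaled neuron whose weight
# lies on the unit sphere and whose threshold is rescaled accordingly.
s = np.linalg.norm(w)
lhs = relu(w @ x + b)
rhs = s * relu((w / s) @ x + b / s)
assert np.isclose(lhs, rhs)
```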