Deep dilated temporal convolutional networks (TCN) have been proved to be very effective in sequence modeling. In this paper we propose several improvements of TCN for end-to-end approach to monaural speech separation, which consists of 1) multi-scale dynamic weighted gated dilated convolutional pyramids network (FurcaPy), 2) gated TCN with intra-parallel convolutional components (FurcaPa), 3) weight-shared multi-scale gated TCN (FurcaSh), 4) dilated TCN with gated difference-convolutional component (FurcaSu), that all these networks take the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. For the objective, we propose to train the network by directly optimizing utterance level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the the public WSJ0-2mix data corpus results in 18.4dB SDR improvement, which shows our proposed networks can leads to performance improvement on the speaker separation task.
In the present paper, we consider one-hidden layer ANNs with a feedforward architecture, also referred to as shallow or two-layer networks, so that the structure is determined by the number and types of neurons. The determination of the parameters that define the function, called training, is done via the resolution of the approximation problem, so by imposing the interpolation through a set of specific nodes. We present the case where the parameters are trained using a procedure that is referred to as Extreme Learning Machine (ELM) that leads to a linear interpolation problem. In such hypotheses, the existence of an ANN interpolating function is guaranteed. The focus is then on the accuracy of the interpolation outside of the given sampling interpolation nodes when they are the equispaced, the Chebychev, and the randomly selected ones. The study is motivated by the well-known bell-shaped Runge example, which makes it clear that the construction of a global interpolating polynomial is accurate only if trained on suitably chosen nodes, ad example the Chebychev ones. In order to evaluate the behavior when growing the number of interpolation nodes, we raise the number of neurons in our network and compare it with the interpolating polynomial. We test using Runge's function and other well-known examples with different regularities. As expected, the accuracy of the approximation with a global polynomial increases only if the Chebychev nodes are considered. Instead, the error for the ANN interpolating function always decays and in most cases we observe that the convergence follows what is observed in the polynomial case on Chebychev nodes, despite the set of nodes used for training.
In this paper, we investigate the physical layer security capabilities of reconfigurable intelligent surface (RIS) empowered wireless systems. In more detail, we consider a general system model, in which the links between the transmitter (TX) and the RIS as well as the links between the RIS and the legitimate receiver are modeled as mixture Gamma (MG) random variables (RVs). Moreover, the link between the TX and eavesdropper is also modeled as a MG RV. Building upon this system model, we derive the probability of zero-secrecy capacity as well as the probability of information leakage. Finally, we extract the average secrecy rate for both cases of TX having full and partial channel state information knowledge.
Building on the success of PC-JeDi we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we are able to achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the trade-off between generation speed and quality by comparing two attention based architectures, as well as the potential of consistency distillation to reduce the number of diffusion steps. Both the faster architecture and consistency models demonstrate performance surpassing many competing models, with generation time up to two orders of magnitude faster than PC-JeDi and three orders of magnitude faster than Delphes.
In this paper we analyze the $L_2$ error of neural network regression estimates with one hidden layer. Under the assumption that the Fourier transform of the regression function decays suitably fast, we show that an estimate, where all initial weights are chosen according to proper uniform distributions and where the weights are learned by gradient descent, achieves a rate of convergence of $1/\sqrt{n}$ (up to a logarithmic factor). Our statistical analysis implies that the key aspect behind this result is the proper choice of the initial inner weights and the adjustment of the outer weights via gradient descent. This indicates that we can also simply use linear least squares to choose the outer weights. We prove a corresponding theoretical result and compare our new linear least squares neural network estimate with standard neural network estimates via simulated data. Our simulations show that our theoretical considerations lead to an estimate with an improved performance in many cases.
Deep neural operators (DNOs) have been utilized to approximate nonlinear mappings between function spaces. However, DNOs face the challenge of increased dimensionality and computational cost associated with unaligned observation data. In this study, we propose a hybrid Decoder-DeepONet operator regression framework to handle unaligned data effectively. Additionally, we introduce a Multi-Decoder-DeepONet, which utilizes an average field of training data as input augmentation. The consistencies of the frameworks with the operator approximation theory are provided, on the basis of the universal approximation theorem. Two numerical experiments, Darcy problem and flow-field around an airfoil, are conducted to validate the efficiency and accuracy of the proposed methods. Results illustrate the advantages of Decoder-DeepONet and Multi-Decoder-DeepONet in handling unaligned observation data and showcase their potentials in improving prediction accuracy.
In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
In this paper we develop a novel neural network model for predicting implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates volatility smile is proposed, which is used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies which discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmarked models with the option data on the S&P 500 index over 20 years. More importantly, the domain knowledge is satisfied empirically, showing the model is consistent with the existing financial theories and conditions related to implied volatility surface.
This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language
Nowadays, the Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision related tasks, such as object detection, image recognition, image retrieval, etc. These achievements benefit from the CNNs outstanding capability to learn the input features with deep layers of neuron structures and iterative training process. However, these learned features are hard to identify and interpret from a human vision perspective, causing a lack of understanding of the CNNs internal working mechanism. To improve the CNN interpretability, the CNN visualization is well utilized as a qualitative analysis method, which translates the internal features into visually perceptible patterns. And many CNN visualization works have been proposed in the literature to interpret the CNN in perspectives of network structure, operation, and semantic concept. In this paper, we expect to provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of motivations, algorithms, and experiment results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of the CNN interpretability in areas of network design, optimization, security enhancement, etc.