Deep neural networks (DNNs) with a huge number of adjustable parameters remain largely black boxes. To shed light on the hidden layers of DNNs, we study supervised learning by a DNN of width $N$ and depth $L$, consisting of $NL$ perceptrons with $c$ inputs each, using a statistical mechanics approach called the teacher-student setting. We consider an ensemble of student machines that exactly reproduce $M$ sets of $N$-dimensional input/output relations provided by a teacher machine. We show that the problem becomes exactly solvable in what we call the 'dense limit': $N \gg c \gg 1$ and $M \gg 1$ with fixed $\alpha=M/c$, using the replica method developed in (H. Yoshino, 2020). We also study the model numerically by performing simple greedy Monte Carlo (MC) simulations. The simulations reveal that learning by the DNN is quite heterogeneous in the network space: the configurations of the teacher and the student machines are more strongly correlated within the layers closer to the input/output boundaries, while the central region remains much less correlated due to the over-parametrization, in qualitative agreement with the theoretical prediction. We evaluate the generalization error of the DNN for various depths $L$ both theoretically and numerically. Remarkably, both the theory and the simulations suggest that the generalization ability of the student machines, which are only weakly correlated with the teacher in the center, does not vanish even in the deep limit $L \gg 1$, where the system becomes heavily over-parametrized. We also consider the impact of the effective dimension $D(\leq N)$ of the data by incorporating the hidden manifold model (S. Goldt et al., 2020) into our model. The theory implies that the loop corrections to the dense limit are enhanced by either decreasing the width $N$ or decreasing the effective dimension $D$ of the data. Simulations suggest that both lead to significant improvements in generalization ability.
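As a rough illustration of this setting, the following toy sketch (not the paper's exact model, scaling, or normalization; all sizes are illustrative) builds a teacher network of sign perceptrons with fan-in $c$, generates $M$ training input/output pairs, and lets a student with the same wiring run a greedy MC search that only accepts weight flips which do not increase the number of mismatched outputs.

```python
# Toy teacher-student setup with greedy MC learning (illustrative sizes only).
import numpy as np

rng = np.random.default_rng(0)
N, L, c, M = 32, 4, 8, 200            # width, depth, fan-in per perceptron, training samples

def random_weights():
    return [rng.choice([-1.0, 1.0], size=(N, c)) for _ in range(L)]

# fixed random wiring: each node reads c nodes of the previous layer
wiring = [rng.integers(0, N, size=(N, c)) for _ in range(L)]

def forward(x, w):
    # x: (M, N) array of +/-1 values, propagated through L sign-perceptron layers
    for l in range(L):
        x = np.sign(np.einsum('mnc,nc->mn', x[:, wiring[l]], w[l]) + 1e-9)
    return x

w_teacher = random_weights()
X = rng.choice([-1.0, 1.0], size=(M, N))
Y = forward(X, w_teacher)              # training labels provided by the teacher

w_student = random_weights()
err = np.sum(forward(X, w_student) != Y)
for _ in range(10000):                 # greedy MC: keep flips that do not increase the error
    l, n, k = rng.integers(L), rng.integers(N), rng.integers(c)
    w_student[l][n, k] *= -1
    new_err = np.sum(forward(X, w_student) != Y)
    if new_err <= err:
        err = new_err
    else:
        w_student[l][n, k] *= -1       # revert rejected flip
print("remaining training mismatches:", err)
```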
Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and physics-informed Hamiltonian neural networks learning H\'enon-Heiles stellar orbits and the swing of a video-recorded pendulum clock. Such \textit{learned diversity} provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.
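The following PyTorch sketch gives one plausible reading of "neurons instantiated by sub-networks" (it is not the paper's meta-learning procedure): each unit of a layer owns a tiny trainable scalar-to-scalar sub-network that serves as its activation, so the nonlinear responses can diversify during training.

```python
# Each unit owns its own small trainable activation sub-network (illustrative sketch).
import torch
import torch.nn as nn

class LearnedActivation(nn.Module):
    """One trainable scalar-to-scalar activation per output unit."""
    def __init__(self, n_units: int, hidden: int = 8):
        super().__init__()
        # per-unit parameters of a one-hidden-layer sub-network: x -> tanh -> weighted sum
        self.w1 = nn.Parameter(torch.randn(n_units, hidden) * 0.5)
        self.b1 = nn.Parameter(torch.zeros(n_units, hidden))
        self.w2 = nn.Parameter(torch.randn(n_units, hidden) * 0.5)

    def forward(self, x):                                        # x: (batch, n_units)
        h = torch.tanh(x.unsqueeze(-1) * self.w1 + self.b1)      # (batch, n_units, hidden)
        return (h * self.w2).sum(-1)                             # (batch, n_units)

net = nn.Sequential(
    nn.Linear(2, 32), LearnedActivation(32),
    nn.Linear(32, 32), LearnedActivation(32),
    nn.Linear(32, 1),
)
y = net(torch.randn(16, 2))    # behaves like an ordinary MLP forward pass
```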
We explore the potential of the Time-Bin Conference Key Agreement (TB CKA) protocol as a means to achieve consensus among multiple parties. We provide an explanation of the underlying physical implementation, i.e., the TB CKA fundamentals, and illustrate how this process can be seen as a natural realization of the global common coin primitive. Next, we present how TB CKA could be embodied in classical consensus algorithms to create hybrid classical-quantum solutions to the Byzantine Agreement problem.
Gaussian elimination (GE) is the most widely used dense linear solver. Error analysis of GE with selected pivoting strategies on well-conditioned systems can focus on studying the behavior of growth factors. Although exponential growth is possible with GE with partial pivoting (GEPP), growth tends to stay much smaller in practice. Support for this behavior was provided last year by Huang and Tikhomirov's average-case analysis of GEPP, which showed that GEPP growth factors stay at most polynomial with very high probability when using small Gaussian perturbations. GE with complete pivoting (GECP) has also seen a lot of recent interest, with improved lower bounds on worst-case GECP growth provided by Edelman and Urschel earlier this year. We are interested in studying how GEPP and GECP behave on the same linear systems, as well as in studying large growth on particular subclasses of matrices, including orthogonal matrices. We also study systems for which GECP leads to larger growth than GEPP, yielding new empirical lower bounds on how much worse GECP can behave than GEPP in terms of growth. We also present an empirical study of a family of exponential GEPP growth matrices whose polynomial behavior in small neighborhoods limits to the initial GECP growth factor.
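For readers who want to reproduce the qualitative picture, the following sketch compares GEPP growth on a random Gaussian matrix with Wilkinson's classic matrix that attains exponential partial-pivoting growth; max|U|/max|A| is used here as a convenient lower-bound proxy for the growth factor rather than tracking all intermediate entries.

```python
# GEPP growth proxy on a random matrix vs. Wilkinson's exponential-growth matrix.
import numpy as np
from scipy.linalg import lu

def gepp_growth_proxy(A):
    _, _, U = lu(A)                      # LU factorization with partial pivoting
    return np.abs(U).max() / np.abs(A).max()

n = 60
A_random = np.random.default_rng(1).standard_normal((n, n))

# Wilkinson's matrix: 1 on the diagonal and last column, -1 strictly below the diagonal
A_wilkinson = np.tril(-np.ones((n, n)), -1) + np.eye(n)
A_wilkinson[:, -1] = 1.0

print("random matrix growth  ~", gepp_growth_proxy(A_random))       # modest
print("Wilkinson matrix growth ~", gepp_growth_proxy(A_wilkinson))  # ~2**(n-1)
```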
Deep neural networks (DNNs) have garnered significant attention in various fields of science and technology in recent years. Activation functions define how neurons in DNNs process incoming signals. They are essential for learning non-linear transformations and for performing diverse computations among successive neuron layers. In the last few years, researchers have investigated the approximation ability of DNNs to explain their power and success. In this paper, we explore the approximation ability of DNNs using a different activation function, called SignReLU. Our theoretical results demonstrate that SignReLU networks outperform rational and ReLU networks in terms of approximation performance. Numerical experiments comparing SignReLU with existing activations such as ReLU, Leaky ReLU, and ELU illustrate the competitive practical performance of SignReLU.
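For concreteness, the sketch below compares SignReLU with the other activations mentioned above; the negative branch $a\,x/(|x|+1)$ and the default value of $a$ are a commonly cited parameterization and should be treated as assumptions here rather than as the paper's exact definition.

```python
# Hedged sketch of SignReLU (identity for x >= 0, saturating softsign-like branch for x < 0)
# alongside ReLU, Leaky ReLU, and ELU; the SignReLU form used here is an assumption.
import numpy as np

def sign_relu(x, a=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, a * x / (np.abs(x) + 1.0))

def relu(x):       return np.maximum(x, 0.0)
def leaky_relu(x): return np.where(x >= 0, x, 0.01 * x)
def elu(x, a=1.0): return np.where(x >= 0, x, a * (np.exp(np.minimum(x, 0.0)) - 1.0))

xs = np.linspace(-5, 5, 11)
for f in (sign_relu, relu, leaky_relu, elu):
    print(f.__name__, np.round(f(xs), 3))
```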
Step selection functions (SSFs) are flexible models to jointly describe animals' movement and habitat preferences. Their popularity has grown rapidly and extensions have been developed to increase their utility, including various distributions to describe movement constraints, interactions to allow movements to depend on local environmental features, and random effects and latent states to account for within- and among-individual variability. Although the SSF is a relatively simple statistical model, its presentation has not been consistent in the literature, leading to confusion about model flexibility and interpretation. We believe that part of the confusion has arisen from the conflation of the SSF model with the methods used for parameter estimation. Notably, conditional logistic regression can be used to fit SSFs in exponential form, and this approach is often presented interchangeably with the actual model (the SSF itself). However, reliance on conditional logistic regression reduces model flexibility, and suggests a misleading interpretation of step selection analysis as being equivalent to a case-control study. In this review, we explicitly distinguish between model formulation and inference technique, presenting a coherent framework to fit SSFs based on numerical integration and maximum likelihood estimation. We provide an overview of common numerical integration techniques, and explain how they relate to step selection analyses. This framework unifies different model fitting techniques for SSFs, and opens the way for improved inference. In particular, it makes it straightforward to model movement with distributions outside the exponential family, and to apply different SSF formulations to a data set and compare them with AIC. By separating the model formulation from the inference technique, we hope to clarify many important concepts in step selection analysis.
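The following sketch illustrates the framework advocated above: the SSF likelihood is evaluated directly, with the normalizing constant approximated by Monte Carlo integration over endpoints drawn from the movement kernel, and maximized numerically. The gamma step-length kernel, the single simulated habitat covariate, and the toy data are illustrative assumptions, not a prescription.

```python
# Direct maximum-likelihood SSF fit with a Monte Carlo normalizing constant
# (toy data; the movement kernel and covariate are illustrative assumptions).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(2)

def habitat(xy):                        # toy habitat covariate over the landscape
    return np.sin(xy[..., 0] / 50.0) + np.cos(xy[..., 1] / 50.0)

def neg_log_lik(params, starts, ends, n_mc=500):
    shape, scale, beta = np.exp(params[0]), np.exp(params[1]), params[2]
    nll = 0.0
    for s, e in zip(starts, ends):
        step = np.linalg.norm(e - s)
        log_num = gamma.logpdf(step, a=shape, scale=scale) + beta * habitat(e)
        # Monte Carlo endpoints drawn from the movement kernel (random headings);
        # re-sampling each call makes the objective noisy, which is fine for a sketch
        r = gamma.rvs(a=shape, scale=scale, size=n_mc, random_state=rng)
        th = rng.uniform(0.0, 2.0 * np.pi, size=n_mc)
        z = s + np.stack([r * np.cos(th), r * np.sin(th)], axis=1)
        log_den = np.log(np.mean(np.exp(beta * habitat(z))))
        nll -= log_num - log_den
    return nll

# toy "observed" steps; in practice these come from animal tracking data
starts = rng.uniform(0.0, 500.0, size=(50, 2))
ends = starts + rng.normal(scale=20.0, size=(50, 2))
fit = minimize(neg_log_lik, x0=np.array([0.0, 2.0, 0.0]), args=(starts, ends),
               method="Nelder-Mead", options={"maxiter": 200})
print("estimated (log shape, log scale, beta):", np.round(fit.x, 2))
```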
Parkinson's Disease (PD) is associated with gait movement disorders, such as bradykinesia, stiffness, tremors and postural instability, caused by progressive dopamine deficiency. Today, some approaches have implemented learning representations to quantify kinematic patterns during locomotion, supporting clinical procedures such as diagnosis and treatment planning. These approaches assume a large amount of stratified and labeled data to optimize discriminative representations. Nonetheless, such requirements may restrict these approaches from being operable in real scenarios during clinical practice. This work introduces a self-supervised generative representation to learn gait-motion-related patterns, under the pretext of video reconstruction and an anomaly detection framework. This architecture is trained following a one-class, weakly supervised learning scheme to avoid inter-class variance and to approach the multiple relationships that represent locomotion. The proposed approach was validated with 14 PD patients and 23 control subjects, and trained with the control population only, achieving an AUC of 95%, a homoscedasticity level of 70% and a shapeness level of 70% in the classification task, considering its generalization.
Neural networks have been able to generate high-quality single-sentence speech with substantial expressiveness. However, paragraph-level speech synthesis remains a challenge due to the need for coherent acoustic features while delivering fluctuating speech styles. Meanwhile, training these models directly on over-length speech leads to a deterioration in the quality of the synthesized speech. To address these problems, we propose a high-quality and expressive paragraph speech synthesis system with a multi-step variational autoencoder. Specifically, we employ multi-step latent variables to capture speech information at different grammatical levels before utilizing these features in parallel to generate the speech waveform. We also propose a three-step training method to improve the decoupling ability. Our model was trained on a single-speaker French audiobook corpus released at the Blizzard Challenge 2023. Experimental results underscore the significant superiority of our system over baseline models.
Diffractive deep neural networks (D2NNs) are composed of successive transmissive layers optimized using supervised deep learning to all-optically implement various computational tasks between an input and output field-of-view (FOV). Here, we present a pyramid-structured diffractive optical network design (which we term P-D2NN), optimized specifically for unidirectional image magnification and demagnification. In this P-D2NN design, the diffractive layers are pyramidally scaled in alignment with the direction of the image magnification or demagnification. Our analyses revealed the efficacy of this P-D2NN design in unidirectional image magnification and demagnification tasks, producing high-fidelity magnified or demagnified images in only one direction, while inhibiting the image formation in the opposite direction - confirming the desired unidirectional imaging operation. Compared to conventional D2NN designs with uniform-sized successive diffractive layers, the P-D2NN design achieves similar performance in unidirectional magnification tasks using only half of the diffractive degrees of freedom within the optical processor volume. Furthermore, it maintains its unidirectional image magnification/demagnification functionality across a large band of illumination wavelengths despite being trained with a single illumination wavelength. With this pyramidal architecture, we also designed a wavelength-multiplexed diffractive network, where a unidirectional magnifier and a unidirectional demagnifier operate simultaneously in opposite directions, at two distinct illumination wavelengths. The efficacy of the P-D2NN architecture was also validated experimentally using monochromatic terahertz illumination, successfully matching our numerical simulations. P-D2NN offers a physics-inspired strategy for designing task-specific visual processors.
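To make the underlying forward model concrete, the sketch below propagates a field through a stack of phase-only layers using the angular-spectrum method, with progressively smaller apertures standing in for the pyramidally scaled layers; the grid, wavelength, spacings, and random (untrained) phases are illustrative assumptions, whereas the actual P-D2NN phases are obtained by supervised training.

```python
# Angular-spectrum forward model through phase-only layers with shrinking apertures
# (all numbers illustrative; real D2NN phases are learned, not random).
import numpy as np

N, dx, wavelength, z = 256, 4e-4, 7.5e-4, 0.04     # grid, pixel pitch (m), wavelength (m), layer spacing (m)
fx = np.fft.fftfreq(N, d=dx)
FX, FY = np.meshgrid(fx, fx)
arg = 1.0 / wavelength**2 - FX**2 - FY**2
H = np.where(arg > 0, np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0.0))), 0.0)

def propagate(u):                                   # free-space propagation by one layer spacing
    return np.fft.ifft2(np.fft.fft2(u) * H)

def aperture(n_open):                               # centered square aperture of n_open pixels
    m = np.zeros((N, N)); s = (N - n_open) // 2
    m[s:s + n_open, s:s + n_open] = 1.0
    return m

rng = np.random.default_rng(4)
layer_sizes = [192, 160, 128, 96, 64]               # pyramidally shrinking layer apertures
u = aperture(128) * rng.standard_normal((N, N))     # toy input field
for n_open in layer_sizes:
    phase = rng.uniform(0.0, 2.0 * np.pi, (N, N))   # would be trained in a real P-D2NN
    u = propagate(u) * aperture(n_open) * np.exp(1j * phase)
u_out = propagate(u)                                # field at the output FOV
print("output power:", float((np.abs(u_out) ** 2).sum()))
```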
The evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, the target may be inaccurately predicted due to the gradient shrinkage problem of the ENet's original loss function, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL framework, we define a Lipschitz-modified mean squared error (MSE) loss function as an additional loss and add it to the existing NLL loss. The Lipschitz-modified MSE loss is designed to mitigate gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant, so that it does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on synthetic datasets and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.
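The sketch below illustrates the general idea in PyTorch: the evidential NIG negative log-likelihood (following the standard deep evidential regression parameterization) is combined with an auxiliary squared-error term whose gradient is capped, Huber-style, so it cannot overwhelm the NLL; the specific cap used here is a stand-in assumption, not the paper's Lipschitz constant.

```python
# Schematic MT-ENet-style loss: NIG evidential NLL plus a gradient-capped MSE term.
import torch

def nig_nll(y, gamma, nu, alpha, beta):
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * torch.log(torch.pi / nu)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5))

def capped_mse(y, gamma, thresh=1.0):
    err2 = (y - gamma) ** 2
    # quadratic near the target, linear (bounded gradient) beyond the cap
    return torch.where(err2 <= thresh, err2,
                       2.0 * thresh**0.5 * (y - gamma).abs() - thresh)

def mt_enet_loss(y, gamma, nu, alpha, beta, lam=1.0):
    return (nig_nll(y, gamma, nu, alpha, beta) + lam * capped_mse(y, gamma)).mean()

# toy usage with dummy NIG head outputs (softplus keeps nu, alpha-1, beta positive)
raw = torch.randn(8, 4, requires_grad=True)
gamma, nu, alpha, beta = raw[:, 0], *torch.nn.functional.softplus(raw[:, 1:]).unbind(-1)
alpha = alpha + 1.0
loss = mt_enet_loss(torch.randn(8), gamma, nu, alpha, beta)
loss.backward()
print(float(loss))
```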
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
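The minimum-norm interpolation phenomenon discussed above is easy to see numerically; the sketch below (with illustrative sizes and an anisotropic feature covariance chosen to be favorable) fits more parameters than samples via the pseudoinverse, interpolates the noisy training data exactly, and still predicts better than the null predictor.

```python
# Minimum-norm interpolation in an overparametrized linear regression (illustrative sizes).
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 100, 2000, 0.5
# anisotropic features: a few strong directions plus a long tail of weak ones
scales = np.concatenate([np.ones(5), 0.1 * np.ones(d - 5)])   # sqrt of the covariance eigenvalues
w_star = np.zeros(d); w_star[0] = 1.0                          # signal lives in a strong direction

def sample(m):
    X = rng.standard_normal((m, d)) * scales
    return X, X @ w_star + sigma * rng.standard_normal(m)

X, y = sample(n)
w_hat = np.linalg.pinv(X) @ y            # minimum-norm solution that interpolates the data
print("max train residual:", np.max(np.abs(X @ w_hat - y)))   # ~1e-12: perfect fit

X_te, y_te = sample(5000)
print("test MSE (interpolant):   ", np.mean((X_te @ w_hat - y_te) ** 2))
print("test MSE (null predictor):", np.mean(y_te ** 2))
```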