Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge, we provide the first theoretical analysis of the group-level generalization of ERM in addition to the commonly studied average generalization performance. Sample insights of our theoretical results include that when all group-level co-variance is in the medium regime and all mean are close to zero, the learning performance is most desirable in the sense of a small sample complexity, a fast training rate, and a high average and group-level testing accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 in image classification.
Single-particle entangled states (SPES) can offer a more secure way of encoding and processing quantum information than their multi-particle counterparts. The SPES generated via a 2D alternate quantum-walk setup from initially separable states can be either 3-way or 2-way entangled. This letter shows that the generated genuine three-way and nonlocal two-way SPES can be used as cryptographic keys to securely encode two distinct messages simultaneously. We detail the message encryption-decryption steps and show the resilience of the 3-way and 2-way SPES-based cryptographic protocols against eavesdropper attacks like intercept-and-resend and man-in-the-middle. We also detail how these protocols can be experimentally realized using single photons, with the three degrees of freedom being OAM, path, and polarization. These have unparalleled security for quantum communication tasks. The ability to simultaneously encode two distinct messages using the generated SPES showcases the versatility and efficiency of the proposed cryptographic protocol. This capability could significantly improve the throughput of quantum communication systems.
In backbone networks, it is fundamental to quickly protect traffic against any unexpected event, such as failures or congestions, which may impact Quality of Service (QoS). Standard solutions based on Segment Routing (SR), such as Topology-Independent Loop-Free Alternate (TI-LFA), are used in practice to handle failures, but no distributed solutions exist for distributed and tactical congestion mitigation. A promising approach leveraging SR has been recently proposed to quickly steer traffic away from congested links over alternative paths. As the pre-computation of alternative paths plays a paramount role to efficiently mitigating congestions, we investigate the associated path computation problem aiming at maximizing the amount of traffic that can be rerouted as well as the resilience against any 1-link failure. In particular, we focus on two variants of this problem. First, we maximize the residual flow after all possible failures. We show that the problem is NP-Hard, and we solve it via a Benders decomposition algorithm. Then, to provide a practical and scalable solution, we solve a relaxed variant problem, that maximizes, instead of flow, the number of surviving alternative paths after all possible failures. We provide a polynomial algorithm. Through numerical experiments, we compare the two variants and show that they allow to increase the amount of rerouted traffic and the resiliency of the network after any 1-link failure.
When they occur, azimuthal thermoacoustic oscillations can detrimentally affect the safe operation of gas turbines and aeroengines. We develop a real-time digital twin of azimuthal thermoacoustics of a hydrogen-based annular combustor. The digital twin seamlessly combines two sources of information about the system (i) a physics-based low-order model; and (ii) raw and sparse experimental data from microphones, which contain both aleatoric noise and turbulent fluctuations. First, we derive a low-order thermoacoustic model for azimuthal instabilities, which is deterministic. Second, we propose a real-time data assimilation framework to infer the acoustic pressure, the physical parameters, and the model and measurement biases simultaneously. This is the bias-regularized ensemble Kalman filter (r-EnKF), for which we find an analytical solution that solves the optimization problem. Third, we propose a reservoir computer, which infers both the model bias and measurement bias to close the assimilation equations. Fourth, we propose a real-time digital twin of the azimuthal thermoacoustic dynamics of a laboratory hydrogen-based annular combustor for a variety of equivalence ratios. We find that the real-time digital twin (i) autonomously predicts azimuthal dynamics, in contrast to bias-unregularized methods; (ii) uncovers the physical acoustic pressure from the raw data, i.e., it acts as a physics-based filter; (iii) is a time-varying parameter system, which generalizes existing models that have constant parameters, and capture only slow-varying variables. The digital twin generalizes to all equivalence ratios, which bridges the gap of existing models. This work opens new opportunities for real-time digital twinning of multi-physics problems.
Physics-informed neural networks (PINN) is a extremely powerful paradigm used to solve equations encountered in scientific computing applications. An important part of the procedure is the minimization of the equation residual which includes, when the equation is time-dependent, a time sampling. It was argued in the literature that the sampling need not be uniform but should overweight initial time instants, but no rigorous explanation was provided for these choice. In this paper we take some prototypical examples and, under standard hypothesis concerning the neural network convergence, we show that the optimal time sampling follows a truncated exponential distribution. In particular we explain when the time sampling is best to be uniform and when it should not be. The findings are illustrated with numerical examples on linear equation, Burgers' equation and the Lorenz system.
Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to also improve the theoretical understanding of neural networks. Nevertheless, it seems limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analyzing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with the practical observations about CNNs.
In cluster-randomized experiments, there is emerging interest in exploring the causal mechanism in which a cluster-level treatment affects the outcome through an intermediate outcome. Despite an extensive development of causal mediation methods in the past decade, only a few exceptions have been considered in assessing causal mediation in cluster-randomized studies, all of which depend on parametric model-based estimators. In this article, we develop the formal semiparametric efficiency theory to motivate several doubly-robust methods for addressing several mediation effect estimands corresponding to both the cluster-average and the individual-level treatment effects in cluster-randomized experiments--the natural indirect effect, natural direct effect, and spillover mediation effect. We derive the efficient influence function for each mediation effect, and carefully parameterize each efficient influence function to motivate practical strategies for operationalizing each estimator. We consider both parametric working models and data-adaptive machine learners to estimate the nuisance functions, and obtain semiparametric efficient causal mediation estimators in the latter case. Our methods are illustrated via extensive simulations and two completed cluster-randomized experiments.
A neural architecture with randomly initialized weights, in the infinite width limit, is equivalent to a Gaussian Random Field whose covariance function is the so-called Neural Network Gaussian Process kernel (NNGP). We prove that a reproducing kernel Hilbert space (RKHS) defined by the NNGP contains only functions that can be approximated by the architecture. To achieve a certain approximation error the required number of neurons in each layer is defined by the RKHS norm of the target function. Moreover, the approximation can be constructed from a supervised dataset by a random multi-layer representation of an input vector, together with training of the last layer's weights. For a 2-layer NN and a domain equal to an $n-1$-dimensional sphere in ${\mathbb R}^n$, we compare the number of neurons required by Barron's theorem and by the multi-layer features construction. We show that if eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-\frac{2}{3}}$ where $k$ is an order of an eigenvalue, then our theorem guarantees a more succinct neural network approximation than Barron's theorem. We also make some computational experiments to verify our theoretical findings. Our experiments show that realistic neural networks easily learn target functions even when both theorems do not give any guarantees.
Correspondence analysis (CA) is a popular technique to visualize the relationship between two categorical variables. CA uses the data from a two-way contingency table and is affected by the presence of outliers. The supplementary points method is a popular method to handle outliers. Its disadvantage is that the information from entire rows or columns is removed. However, outliers can be caused by cells only. In this paper, a reconstitution algorithm is introduced to cope with such cells. This algorithm can reduce the contribution of cells in CA instead of deleting entire rows or columns. Thus the remaining information in the row and column involved can be used in the analysis. The reconstitution algorithm is compared with two alternative methods for handling outliers, the supplementary points method and MacroPCA. It is shown that the proposed strategy works well.
We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of several anatomical structures (ranging from the large organs to thin vessels) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training class-specific models. To this end, we propose a two-stage, coarse-to-fine approach that will first use a 3D FCN to roughly define a candidate region, which will then be used as input to a second 3D FCN. This reduces the number of voxels the second FCN has to classify to ~10% and allows it to focus on more detailed segmentation of the organs and vessels. We utilize training and validation sets consisting of 331 clinical CT images and test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans, targeting three anatomical organs (liver, spleen, and pancreas). In challenging organs such as the pancreas, our cascaded approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. We compare with a 2D FCN method on a separate dataset of 240 CT scans with 18 classes and achieve a significantly higher performance in small organs and vessels. Furthermore, we explore fine-tuning our models to different datasets. Our experiments illustrate the promise and robustness of current 3D FCN based semantic segmentation of medical images, achieving state-of-the-art results. Our code and trained models are available for download: //github.com/holgerroth/3Dunet_abdomen_cascade.