An underlying mechanism for successful deep learning (DL) with a limited deep architecture and dataset, namely VGG-16 on CIFAR-10, was recently presented based on a quantitative method to measure the quality of a single filter in each layer. In this method, each filter identifies small clusters of possible output labels, with additional noise selected as labels out of the clusters. This feature is progressively sharpened with the layers, resulting in an enhanced signal-to-noise ratio (SNR) and higher accuracy. In this study, the suggested universal mechanism is verified for VGG-16 and EfficientNet-B0 trained on the CIFAR-100 and ImageNet datasets with the following main results. First, the accuracy progressively increases with the layers, whereas the noise per filter typically progressively decreases. Second, for a given deep architecture, the maximal error rate increases approximately linearly with the number of output labels. Third, the average filter cluster size and the number of clusters per filter at the last convolutional layer adjacent to the output layer are almost independent of the number of dataset labels in the range [3, 1,000], while a high SNR is preserved. The presented DL mechanism suggests several techniques, such as applying filter's cluster connections (AFCC), to improve the computational complexity and accuracy of deep architectures and furthermore pinpoints the simplification of pre-existing structures while maintaining their accuracies.
The rising popularity of deep learning (DL) methods and techniques has invigorated interest in the topic of SE4DL, the application of software engineering (SE) practices on deep learning software. Despite the novel engineering challenges brought on by the data-driven and non-deterministic paradigm of DL software, little work has been invested into developing AI-targeted SE tools. On the other hand, tools tackling more general engineering issues in DL are actively used and referred to under the umbrella term of ``MLOps tools''. Furthermore, the available literature supports the utility of conventional SE tooling in DL software development. Building upon previous MSR research on tool usage in open-source software works, we identify conventional and MLOps tools adopted in popular applied DL projects that use Python as the main programming language. About 70% of the GitHub repositories mined contained at least one conventional SE tool. Software configuration management tools are the most adopted, while the opposite applies to maintenance tools. Substantially fewer MLOps tools were in use, with only 9 tools out of a sample of 80 used in at least one repository. The majority of them were open-source rather than proprietary. One of these tools, TensorBoard, was found to be adopted in about half of the repositories in our study. Consequently, the use of conventional SE tooling demonstrates its relevance to DL software. Further research is recommended on the adoption of MLOps tooling by open-source projects, focusing on the relevance of particular tool types, the development of required tools, as well as ways to promote the use of already available tools.
Purpose: To develop an open-source, fully-automatic deep learning algorithm, DeepGPET, for choroid region segmentation in optical coherence tomography (OCT) data. Methods: We used a dataset of 715 OCT B-scans (82 subjects, 115 eyes) from 3 clinical studies related to systemic disease. Ground truth segmentations were generated using a clinically validated, semi-automatic choroid segmentation method, Gaussian Process Edge Tracing (GPET). We finetuned a UNet with MobileNetV3 backbone pre-trained on ImageNet. Standard segmentation agreement metrics, as well as derived measures of choroidal thickness and area, were used to evaluate DeepGPET, alongside qualitative evaluation from a clinical ophthalmologist. Results: DeepGPET achieves excellent agreement with GPET on data from 3 clinical studies (AUC=0.9994, Dice=0.9664; Pearson correlation of 0.8908 for choroidal thickness and 0.9082 for choroidal area), while reducing the mean processing time per image on a standard laptop CPU from 34.49s ($\pm$15.09) using GPET to 1.25s ($\pm$0.10) using DeepGPET. Both methods performed similarly according to a clinical ophthalmologist, who qualitatively judged a subset of segmentations by GPET and DeepGPET, based on smoothness and accuracy of segmentations. Conclusions: DeepGPET, a fully-automatic, open-source algorithm for choroidal segmentation, will enable researchers to efficiently extract choroidal measurements, even for large datasets. As no manual interventions are required, DeepGPET is less subjective than semi-automatic methods and could be deployed in clinical practice without necessitating a trained operator.
This work presents a comparative study to numerically compute impulse approximate controls for parabolic equations with various boundary conditions. Theoretical controllability results have been recently investigated using a logarithmic convexity estimate at a single time based on a Carleman commutator approach. We propose a numerical algorithm for computing the impulse controls with minimal $L^2$-norms by adapting a penalized Hilbert Uniqueness Method (HUM) combined with a Conjugate Gradient (CG) method. We consider static boundary conditions (Dirichlet and Neumann) and dynamic boundary conditions. Some numerical experiments based on our developed algorithm are given to validate and compare the theoretical impulse controllability results.
The fundamental computational issues in Bayesian inverse problems (BIPs) governed by partial differential equations (PDEs) stem from the requirement of repeated forward model evaluations. A popular strategy to reduce such cost is to replace expensive model simulations by computationally efficient approximations using operator learning, motivated by recent progresses in deep learning. However, using the approximated model directly may introduce a modeling error, exacerbating the already ill-posedness of inverse problems. Thus, balancing between accuracy and efficiency is essential for the effective implementation of such approaches. To this end, we develop an adaptive operator learning framework that can reduce modeling error gradually by forcing the surrogate to be accurate in local areas. This is accomplished by fine-tuning the pre-trained approximate model during the inversion process with adaptive points selected by a greedy algorithm, which requires only a few forward model evaluations. To validate our approach, we adopt DeepOnet to construct the surrogate and use unscented Kalman inversion (UKI) to approximate the solution of BIPs, respectively. Furthermore, we present rigorous convergence guarantee in the linear case using the framework of UKI. We test the approach on several benchmarks, including the Darcy flow, the heat source inversion problem, and the reaction diffusion problems. Numerical results demonstrate that our method can significantly reduce computational costs while maintaining inversion accuracy.
We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that absent explanation humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrate that our theory quantitatively matches participants' predictions of the AI.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.