Disordered many-body systems exhibit a wide range of emergent phenomena across different scales. These complex behaviors can be utilized for various information processing tasks such as error correction, learning, and optimization. Despite the empirical success of utilizing these systems for intelligent tasks, the underlying principles that govern their emergent intelligent behaviors remain largely unknown. In this thesis, we aim to characterize such emergent intelligence in disordered systems through statistical physics. We chart a roadmap for our efforts in this thesis based on two axes: learning mechanisms (long-term memory vs. working memory) and learning dynamics (artificial vs. natural). Throughout our journey, we uncover relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems. We hope that our investigation into the emergent intelligence of seemingly disparate learning systems can expand our current understanding of intelligence beyond neural systems and uncover a wider range of computational substrates suitable for AI applications.
The aim of this study was to develop a model to accurately identify corresponding points between organ segmentations of different patients for radiotherapy applications. A model for simultaneous correspondence and interpolation estimation in 3D shapes was trained with head and neck organ segmentations from planning CT scans. We then extended the original model to incorporate imaging information using two approaches: 1) extracting features directly from image patches, and 2) including the mean square error between patches as part of the loss function. The correspondence and interpolation performance were evaluated using the geodesic error, Chamfer distance and conformal distortion metrics, as well as distances between anatomical landmarks. Each of the models produced significantly better correspondences than the baseline non-rigid registration approach. The original model performed similarly to the model with direct inclusion of image features. The best-performing model configuration incorporated imaging information as part of the loss function, which produced more anatomically plausible correspondences. We will use the best-performing model to identify corresponding anatomical points on organs to improve spatial normalisation, an important step in outcome modelling, or as an initialisation for anatomically informed registrations. All our code is publicly available at //github.com/rrr-uom-projects/Unsup-RT-Corr-Net
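The Chamfer distance used to evaluate interpolation compares two point sets through nearest-neighbour distances. Below is a minimal pure-Python sketch. Definitions vary in the literature (squared vs. unsquared distances, sum vs. mean), so the symmetric mean of unsquared Euclidean distances used here is an assumption. It is not necessarily the exact variant used in this work, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_nearest_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    # For every point in src, find the Euclidean distance to its nearest
    # neighbour in dst (brute force), then average over src.
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant: mean, unsquared)."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest_distance(a, b) + _mean_nearest_distance(b, a)
```

The brute-force nearest-neighbour search is quadratic in the number of points. For dense surface meshes, a spatial index such as a k-d tree would normally replace it.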
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
For randomized trials that use text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by trained human raters. This process, the current standard, is both time-consuming and limiting: even the largest human coding efforts are typically constrained to measure only a small set of dimensions across a subsample of available texts. In this work, we present an inferential framework that can be used to increase the power of an impact assessment, given a fixed human-coding budget. It does so by taking advantage of any "untapped" observations (those documents not manually scored due to time or resource constraints) as a supplementary resource. Our approach, a methodological combination of causal inference, survey sampling methods, and machine learning, has four steps: (1) select and code a sample of documents; (2) build a machine learning model to predict the human-coded outcomes from a set of automatically extracted text features; (3) generate machine-predicted scores for all documents and use these scores to estimate treatment impacts; and (4) adjust the final impact estimates using the residual differences between human-coded and machine-predicted outcomes. As an extension to this approach, we also develop a strategy for identifying an optimal subset of documents to code in Step 1 in order to further enhance precision. Through an extensive simulation study based on data from a recent field trial in education, we show that our proposed approach can be used to reduce the scope of a human-coding effort while maintaining nominal power to detect a significant treatment impact.
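Steps 3 and 4 can be sketched as two parts. The first is a difference in mean machine-predicted scores across all documents. The second is a correction equal to the difference in mean residuals (human-coded minus predicted) computed on the coded subsample.

This is a minimal illustration under stated assumptions:

- The coded documents are a simple random sample within each arm.
- Step 2 has already produced `y_hat`.
- The helper name `residual_adjusted_impact` is hypothetical.

The sketch does not reproduce the authors' variance estimation or the optimal-subset extension.

```python
from statistics import mean
from typing import Optional, Sequence


def residual_adjusted_impact(
    z: Sequence[int],
    y_hat: Sequence[float],
    y_human: Sequence[Optional[float]],
) -> float:
    """Hypothetical helper: impact estimate from predicted scores plus a residual correction.

    z:       treatment indicator (1 = treated, 0 = control) for every document
    y_hat:   machine-predicted outcome for every document
    y_human: human-coded outcome, or None if the document was not coded
    """

    # Step 3: difference in mean predicted scores over ALL documents in each arm.
    def arm_mean(values: Sequence[float], arm: int) -> float:
        return mean([v for v, t in zip(values, z) if t == arm])

    predicted_effect = arm_mean(y_hat, 1) - arm_mean(y_hat, 0)

    # Step 4: mean residual (human minus machine) among CODED documents in each arm.
    # Assumes each arm contains at least one coded document.
    def residual_mean(arm: int) -> float:
        return mean([y - p for y, p, t in zip(y_human, y_hat, z)
                     if t == arm and y is not None])

    return predicted_effect + residual_mean(1) - residual_mean(0)
```

If the predictions were systematically biased in the same way in both arms, the residual terms would cancel. The correction matters when the prediction error differs between treated and control documents.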
The numerical solution of continuum damage mechanics (CDM) problems suffers from convergence-related challenges during the material softening stage. Consequently, existing iterative solvers are subject to a trade-off between computational expense and solution accuracy. In this work, we present a novel unified arc-length (UAL) method, and we derive the formulation of the analytical tangent matrix and governing system of equations for both local and non-local gradient damage problems. Unlike existing versions of arc-length solvers that monolithically scale the external force vector, the proposed method treats the latter as an independent variable and determines the position of the system on the equilibrium path based on all the nodal variations of the external force vector. This approach renders the proposed solver substantially more efficient and robust than existing solvers used in CDM problems. We demonstrate the considerable advantages of the proposed algorithm through several benchmark 1D problems with sharp snap-backs and 2D examples under various boundary conditions and loading scenarios. The proposed UAL approach exhibits a superior ability to overcome critical increments along the equilibrium path. Moreover, the proposed UAL method is one to two orders of magnitude faster than force-controlled arc-length and monolithic Newton-Raphson solvers.
The eigenvalue method, suggested by the developer of the extensively used Analytic Hierarchy Process methodology, exhibits right-left asymmetry: the priorities derived from the right eigenvector do not necessarily coincide with the priorities derived from the reciprocal left eigenvector. This paper offers a comprehensive numerical experiment comparing the two eigenvector-based weighting procedures and a reasonable alternative, the row geometric mean, with respect to four measures. The underlying pairwise comparison matrices are constructed randomly with different dimensions and levels of inconsistency. The disagreement between the two eigenvectors turns out not always to be a monotonic function of these important characteristics of the matrix. The ranking contradictions can affect alternatives with relatively distant priorities. The row geometric mean is found to lie almost at the midpoint between the right and inverse left eigenvectors, making it a straightforward compromise between them.
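The three weighting procedures compared here can be sketched in a few lines of Python. The sketch makes the following choices, and all function names are illustrative:

- It assumes a positive reciprocal pairwise comparison matrix (a_ji = 1/a_ij).
- It obtains principal eigenvectors by power iteration.
- It derives the inverse left eigenvector priorities by taking element-wise reciprocals of the principal left eigenvector.
- It normalizes every weight vector to sum to one.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalize(v: List[float]) -> List[float]:
    s = sum(v)
    return [x / s for x in v]


def _principal_eigenvector(a: Matrix, iters: int = 1000, tol: float = 1e-12) -> List[float]:
    # Power iteration; for a positive matrix this converges to the Perron vector.
    n = len(a)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = _normalize([sum(a[i][j] * v[j] for j in range(n)) for i in range(n)])
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v


def right_eigenvector_weights(a: Matrix) -> List[float]:
    return _principal_eigenvector(a)


def inverse_left_eigenvector_weights(a: Matrix) -> List[float]:
    # Left eigenvector of A = right eigenvector of A^T; priorities are its reciprocals.
    n = len(a)
    a_t = [[a[j][i] for j in range(n)] for i in range(n)]
    left = _principal_eigenvector(a_t)
    return _normalize([1.0 / x for x in left])


def row_geometric_mean_weights(a: Matrix) -> List[float]:
    n = len(a)
    return _normalize([math.prod(row) ** (1.0 / n) for row in a])
```

Consider a consistent matrix, one built as a_ij = w_i / w_j. For such a matrix, all three procedures return w up to normalization. Differences between them can only arise for inconsistent matrices, which is the setting the experiment explores.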
Pneumonia is the leading infectious cause of infant death in the world. When it is identified early, the prognosis of the patient can be altered, and imaging exams can help confirm the diagnosis. Performing and interpreting these exams as soon as possible is vital for good treatment, and the most common exam for this pathology is the chest X-ray. The objective of this study was to develop software that identifies the presence or absence of pneumonia in chest radiographs. The software was developed as a computational model based on machine learning, using the transfer learning technique. For training, images were collected from a database available online containing children's chest X-ray images taken at a hospital in China. After training, the model was evaluated on new images, reaching 98% sensitivity and 97.3% specificity on the test sample. It can be concluded that it is possible to develop software that identifies pneumonia in chest X-ray images.
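Sensitivity and specificity, the two figures reported above, are computed from the confusion matrix of binary predictions. Below is a minimal sketch. It assumes labels coded as 1 for pneumonia and 0 for normal; this coding and the function name are illustrative.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = pneumonia, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pneumonia correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pneumonia missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)
```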
We consider a sharp interface formulation for the multi-phase Mullins-Sekerka flow. The flow is characterized by a network of curves evolving such that the total surface energy of the curves is reduced, while the areas of the enclosed phases are conserved. Making use of a variational formulation, we introduce a fully discrete finite element method. Our discretization features a parametric approximation of the moving interfaces that is independent of the discretization used for the equations in the bulk. The scheme can be shown to be unconditionally stable and to satisfy an exact volume conservation property. Moreover, an inherent tangential velocity for the vertices on the discrete curves leads to asymptotically equidistributed vertices, meaning no remeshing is necessary in practice. Several numerical examples, including a convergence experiment for the three-phase Mullins-Sekerka flow, demonstrate the capabilities of the introduced method.
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy-based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
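To illustrate the kind of definitions such a survey collects, a few textbook losses from the regression and classification families are sketched below in pure Python. This selection and the function names are illustrative and are not taken from the survey's taxonomy.

```python
import math
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Regression: average squared residual.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    # Classification with labels in {0, 1} and predicted probabilities;
    # probabilities are clamped to [eps, 1 - eps] to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y_true)


def hinge_loss(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Margin-based classification with labels in {-1, +1} and raw scores.
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)
```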
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
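Taken literally, the description above defines the conditional utilization rate of a modality as follows: the accuracy with both modalities, minus the accuracy with only the other modality. The sketch below assumes those accuracies have already been measured; how a single-modality accuracy is obtained from a multi-modal model is not specified here. It uses the plain difference, although the original work may normalize this gain differently. All function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def conditional_utilization_rate(acc_both: float, acc_other_only: float) -> float:
    # Gain in accuracy from adding the target modality on top of the other one
    # (assumed: unnormalized difference).
    return acc_both - acc_other_only


def utilization_imbalance(acc_both: float, acc_m0_only: float, acc_m1_only: float) -> float:
    # u(m1 | m0): benefit of adding m1 when m0 is already available.
    u_m1_given_m0 = conditional_utilization_rate(acc_both, acc_m0_only)
    # u(m0 | m1): benefit of adding m0 when m1 is already available.
    u_m0_given_m1 = conditional_utilization_rate(acc_both, acc_m1_only)
    return u_m1_given_m0 - u_m0_given_m1
```

A large absolute imbalance means that one modality adds much more on top of the other than the reverse. This is the pattern the hypothesis about greedy learning predicts.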
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
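The gradient explosion/vanishing issue can be seen in a deep linear network at initialization. In such a network, the input-output Jacobian is the product of the weight matrices, so the size of a propagated signal scales roughly like gain^depth.

Below is a minimal sketch that assumes Gaussian weights with standard deviation gain / sqrt(width). This is an illustrative variance-scaling choice, not a specific scheme from the article. With gain 1, the expected squared norm is preserved from layer to layer; gains below or above 1 make it shrink or grow geometrically. The function name `propagated_norm` is hypothetical.

```python
import math
import random


def propagated_norm(depth: int, width: int, gain: float, seed: int = 0) -> float:
    """Norm of a unit vector after passing through `depth` random linear layers."""
    rng = random.Random(seed)
    std = gain / math.sqrt(width)  # assumed scaling: Var(W_ij) = gain^2 / width
    v = [1.0 / math.sqrt(width)] * width  # unit-norm starting vector
    for _ in range(depth):
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        v = [sum(w[i][j] * v[j] for j in range(width)) for i in range(width)]
    return math.sqrt(sum(x * x for x in v))
```

With the same seed, the underlying random draws are identical. Changing the gain from 0.5 to 2.0 therefore multiplies the output norm by exactly 4^depth, which is the geometric blow-up (or decay) that careful initialization and normalization methods aim to avoid.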