The quality of training datasets for deep neural networks is a key factor contributing to the accuracy of resulting models. This effect is amplified in difficult tasks such as object detection. Dealing with errors in datasets is often limited to accepting that some fraction of examples are incorrect, estimating their confidence, and either assigning appropriate weights or ignoring uncertain ones during training. In this work, we propose a different approach. We introduce the Confident Learning for Object Detection (CLOD) algorithm for assessing the quality of each label in object detection datasets, identifying missing, spurious, mislabeled, and mislocated bounding boxes and suggesting corrections. By focusing on finding incorrect examples in the training datasets, we can eliminate them at the root. Suspicious bounding boxes can be reviewed to improve the quality of the dataset, leading to better models without further complicating their already complex architectures. The proposed method is able to point out nearly 80% of artificially disturbed bounding boxes with a false positive rate below 0.1. Cleaning the datasets by applying the most confident automatic suggestions improved mAP scores by 16% to 46%, depending on the dataset, without any modifications to the network architectures. This approach shows promising potential in rectifying state-of-the-art object detection datasets.
A mesh motion model based on deep operator networks is presented. The model is trained on and evaluated against a biharmonic mesh motion model on a fluid-structure interaction benchmark problem and further evaluated in a setting where biharmonic mesh motion fails. The performance of the proposed mesh motion model is comparable to the biharmonic mesh motion on the test problems.
Conventional neural network elastoplasticity models are often perceived as lacking interpretability. This paper introduces a two-step machine learning approach that returns mathematical models interpretable by human experts. In particular, we introduce a surrogate model where yield surfaces are expressed in terms of a set of single-variable feature mappings obtained from supervised learning. A post-processing step is then used to re-interpret the set of single-variable neural network mapping functions into mathematical form through symbolic regression. This divide-and-conquer approach provides several important advantages. First, it enables us to overcome the scaling issue of symbolic regression algorithms. From a practical perspective, it enhances the portability of learned models for partial differential equation solvers written in different programming languages. Finally, it enables us to have a concrete understanding of the attributes of the materials, such as convexity and symmetries of models, through automated derivations and reasoning. Numerical examples have been provided, along with an open-source code to enable third-party validation.
Accurate modeling of moving boundaries and interfaces is a difficulty present in many situations of computational mechanics. We use the eXtreme Mesh deformation approach (X-Mesh) to simulate the interaction between two immiscible flows using the finite element method, while maintaining an accurate and sharp description of the interface without remeshing. In this new approach, the mesh is locally deformed to conform to the interface at all times, which can result in degenerated elements. The surface tension between the two fluids is added by imposing the pressure jump condition at the interface, which, when combined with the X-Mesh framework, allows us to have an exactly sharp interface. If a numerical scheme fails to properly balance surface tension and pressure gradients, it leads to numerical artefacts called spurious or parasitic currents. The method presented here is well balanced and reduces such currents down to the level of machine precision.
Latitude on the choice of initialisation is a shared feature between one-step extended state-space and multi-step methods. The paper focuses on lattice Boltzmann schemes, which can be interpreted as examples of both previous categories of numerical schemes. We propose a modified equation analysis of the initialisation schemes for lattice Boltzmann methods, determined by the choice of initial data. These modified equations provide guidelines to devise and analyze the initialisation in terms of order of consistency with respect to the target Cauchy problem and time smoothness of the numerical solution. In detail, the larger the number of matched terms between modified equations for initialisation and bulk methods, the smoother the obtained numerical solution. This is particularly manifest for numerical dissipation. Starting from the constraints to achieve time smoothness, which can quickly become prohibitive for they have to take the parasitic modes into consideration, we explain how the distinct lack of observability for certain lattice Boltzmann schemes -- seen as dynamical systems on a commutative ring -- can yield rather simple conditions and be easily studied as far as their initialisation is concerned. This comes from the reduced number of initialisation schemes at the fully discrete level. These theoretical results are successfully assessed on several lattice Boltzmann methods.
In recent years, deep learning has gained increasing popularity in the fields of Partial Differential Equations (PDEs) and Reduced Order Modeling (ROM), providing domain practitioners with new powerful data-driven techniques such as Physics-Informed Neural Networks (PINNs), Neural Operators, Deep Operator Networks (DeepONets) and Deep-Learning based ROMs (DL-ROMs). In this context, deep autoencoders based on Convolutional Neural Networks (CNNs) have proven extremely effective, outperforming established techniques, such as the reduced basis method, when dealing with complex nonlinear problems. However, despite the empirical success of CNN-based autoencoders, there are only a few theoretical results supporting these architectures, usually stated in the form of universal approximation theorems. In particular, although the existing literature provides users with guidelines for designing convolutional autoencoders, the subsequent challenge of learning the latent features has been barely investigated. Furthermore, many practical questions remain unanswered, e.g., the number of snapshots needed for convergence or the neural network training strategy. In this work, using recent techniques from sparse high-dimensional function approximation, we fill some of these gaps by providing a new practical existence theorem for CNN-based autoencoders when the parameter-to-solution map is holomorphic. This regularity assumption arises in many relevant classes of parametric PDEs, such as the parametric diffusion equation, for which we discuss an explicit application of our general theory.
Recent advances in deep generative models have greatly expanded the potential to create realistic synthetic health datasets. These synthetic datasets aim to preserve the characteristics, patterns, and overall scientific conclusions derived from sensitive health datasets without disclosing patient identity or sensitive information. Thus, synthetic data can facilitate safe data sharing that supports a range of initiatives including the development of new predictive models, advanced health IT platforms, and general project ideation and hypothesis development. However, many questions and challenges remain, including how to consistently evaluate a synthetic dataset's similarity and predictive utility in comparison to the original real dataset and risk to privacy when shared. Additional regulatory and governance issues have not been widely addressed. In this primer, we map the state of synthetic health data, including generation and evaluation methods and tools, existing examples of deployment, the regulatory and ethical landscape, access and governance options, and opportunities for further development.
Regression with random data objects is becoming increasingly common in modern data analysis. Unfortunately, like the traditional regression setting with Euclidean data, random response regression is not immune to the trouble caused by unusual observations. A metric Cook's distance extending the classical Cook's distances of Cook (1977) to general metric-valued response objects is proposed. The performance of the metric Cook's distance in both Euclidean and non-Euclidean response regression with Euclidean predictors is demonstrated in an extensive experimental study. A real data analysis of county-level COVID-19 transmission in the United States also illustrates the usefulness of this method in practice.
We introduce a general abstract framework for database repairing in which the repair notions are defined using formal logic. We differentiate between integrity constraints and the so-called query constraints. The former are used to model consistency and desirable properties of the data (such as functional dependencies and independencies), while the latter relates two database instances according to their answers for the query constraints. The framework also admits a distinction between hard and soft queries, allowing to preserve the answers of a core set of queries as well as defining a distance between instances based on query answers. We exemplify how various notions of repairs from the literature can be modelled in our unifying framework. Furthermore, we initiate a complexity-theoretic analysis of the problems of consistent query answering, repair computation, and existence of repair within the new framework. We present both coNP- and NP-hard cases that illustrate the interplay between computationally hard problems and more flexible repair notions. We show general upper bounds in NP and the second level of the polynomial hierarchy. Finally, we relate the existence of a repair to model checking of existential second-order logic.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.