In the present work, we describe a framework for modeling how models are built, integrating concepts and methods from a wide range of fields. The information schism between the real world and what can be gathered and considered by any individual information-processing agent is characterized and discussed, followed by the presentation of a series of requisites adopted while developing the modeling approach. The issue of mapping from datasets into models is subsequently addressed, as well as some of the difficulties and limitations this implies. Based on these considerations, an approach to meta modeling how models are built is then progressively developed. First, the reference $M^*$ meta model framework is presented, which relies critically on associating whole datasets and their respective models through a strict equivalence relation. Among the interesting features of this model are its ability to bridge the gap between data and modeling and to pave the way to an algebra of both data and models, which can be employed to combine models in a hierarchical manner. After illustrating the $M^*$ model in terms of patterns derived from regular lattices, the reported modeling approach continues by discussing how sampling issues, error, and overlooked data can be addressed, leading to the $M^{<\epsilon>}$ variant. The situation in which the data needs to be represented in terms of respective probability densities is treated next, yielding the $M^{<\sigma>}$ meta model, which is then illustrated with respect to a real-world dataset (the iris flowers data). Several considerations about how the developed framework can provide insights into data clustering, complexity, collaborative research, deep learning, and creativity are then presented, followed by overall conclusions.
We consider the problem of optimizing the distribution operations at a drone hub that dispatches drones to different geographic locations generating stochastic demands for medical supplies. Drone delivery is an innovative method that introduces many benefits, such as low-contact delivery, thereby reducing the spread of pandemic and vaccine-preventable diseases. While we focus on medical supply delivery for this work, drone delivery is suitable for many other items, including food, postal parcels, and e-commerce. In this paper, our goal is to address drone delivery challenges related to the stochastic demands of different geographic locations. We consider different classes of demand related to geographic locations that require different flight ranges, which is directly related to the amount of charge held in a drone battery. We classify the stochastic demands based on their distance from the drone hub, use a Markov decision process to model the problem, and perform computational tests using realistic data representing a prominent drone delivery company. We solve the problem using a reinforcement learning method and show its high performance compared with the exact solution found using dynamic programming. Finally, we analyze the results and provide insights for managing the drone hub operations.
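The combination of a Markov decision process with a reinforcement learning solver can be sketched on a toy version of the dispatch problem. The setup below is hypothetical and much simpler than the paper's model: the state is a single drone's battery level, demands fall into a "near" and a "far" class with different battery costs and rewards, and tabular Q-learning is used in place of the paper's method.

```python
import random

# Toy MDP sketch (hypothetical, not the paper's model): a drone hub with a
# single drone whose battery level is the state. Demand classes differ in
# flight range, i.e. in battery cost. Actions:
# 0 = serve a near demand, 1 = serve a far demand, 2 = recharge.
LEVELS = 5               # battery levels 0..4
ACTIONS = 3

def step(battery, action):
    """Return (next_battery, reward) under the toy dynamics."""
    if action == 2:                       # recharge
        return min(battery + 2, LEVELS - 1), -0.1
    cost = 1 if action == 0 else 3        # far trips drain more charge
    if battery < cost:                    # not enough charge: failed dispatch
        return battery, -1.0
    reward = 1.0 if action == 0 else 2.0  # far demands pay more
    return battery - cost, reward

def q_learning(episodes=2000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * ACTIONS for _ in range(LEVELS)]
    for _ in range(episodes):
        s = rng.randrange(LEVELS)
        for _ in range(30):               # finite horizon per episode
            a = (rng.randrange(ACTIONS) if rng.random() < eps
                 else max(range(ACTIONS), key=lambda x: Q[s][x]))
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [max(range(ACTIONS), key=lambda a: Q[s][a]) for s in range(LEVELS)]
```

With an empty battery the learned policy recharges rather than attempting a doomed dispatch, which is the qualitative behavior one would verify against the exact dynamic programming solution.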
Cognitive Diagnosis Models (CDMs) are a special family of discrete latent variable models widely used in educational, psychological and social sciences. In many applications of CDMs, certain hierarchical structures among the latent attributes are assumed by researchers to characterize their dependence structure. Specifically, a directed acyclic graph is used to specify hierarchical constraints on the allowable configurations of the discrete latent attributes. In this paper, we consider the important yet unaddressed problem of testing the existence of latent hierarchical structures in CDMs. We first introduce the concept of testability of hierarchical structures in CDMs and present sufficient conditions. Then we study the asymptotic behaviors of the likelihood ratio test (LRT) statistic, which is widely used for testing nested models. Due to the irregularity of the problem, the asymptotic distribution of LRT becomes nonstandard and tends to provide unsatisfactory finite sample performance under practical conditions. We provide statistical insights on such failures, and propose to use parametric bootstrap to perform the testing. We also demonstrate the effectiveness and superiority of parametric bootstrap for testing the latent hierarchies over non-parametric bootstrap and the na\"ive Chi-squared test through comprehensive simulations and an educational assessment dataset.
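The parametric bootstrap procedure for a likelihood ratio test can be illustrated generically; the sketch below uses a deliberately simple Bernoulli model (testing $H_0\colon p = 0.5$ against a free $p$) rather than a CDM, since the point is only the mechanics: fit under the null, simulate datasets from the fitted null model, and compare the observed LRT statistic to its bootstrap distribution.

```python
import math
import random

def lrt_stat(k, n, p0=0.5):
    """LRT statistic for H0: p = p0 with k successes in n Bernoulli trials."""
    phat = min(max(k / n, 1e-9), 1 - 1e-9)   # clamp away from 0/1
    ll = lambda p: k * math.log(p) + (n - k) * math.log(1 - p)
    return 2 * (ll(phat) - ll(p0))

def parametric_bootstrap_pvalue(k, n, p0=0.5, B=2000, seed=0):
    """Simulate data under the (fitted) null and compare LRT statistics."""
    rng = random.Random(seed)
    observed = lrt_stat(k, n, p0)
    count = 0
    for _ in range(B):
        kb = sum(rng.random() < p0 for _ in range(n))  # bootstrap dataset
        if lrt_stat(kb, n, p0) >= observed:
            count += 1
    return (count + 1) / (B + 1)   # add-one correction avoids p = 0

p_alt = parametric_bootstrap_pvalue(60, 100)   # data at odds with the null
p_null = parametric_bootstrap_pvalue(50, 100)  # data exactly at the null
```

In this regular example the bootstrap p-value agrees with the $\chi^2_1$ approximation; in the irregular CDM setting discussed above, the bootstrap distribution replaces the unreliable asymptotic reference distribution.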
Automation and robotisation of the agricultural sector are seen as a viable solution to the socio-economic challenges faced by this industry. This technology often relies on intelligent perception systems providing information about crops, plants and the entire environment. The challenges faced by traditional 2D vision systems can be addressed by modern 3D vision systems, which enable straightforward localisation of objects, size and shape estimation, and handling of occlusions. So far, the use of 3D sensing has been mainly limited to indoor or structured environments. In this paper, we evaluate modern sensing technologies, including stereo and time-of-flight cameras, for 3D perception of shape in agriculture and study their usability for segmenting soft fruit from the background based on shape. To that end, we propose a novel 3D deep neural network which exploits the organised nature of information originating from camera-based 3D sensors. We demonstrate the superior performance and efficiency of the proposed architecture compared to state-of-the-art 3D networks. Through a simulated study, we also show the potential of the 3D sensing paradigm for object segmentation in agriculture and provide insights into, and analysis of, the shape quality needed and expected for further analysis of crops. The results of this work should encourage researchers and companies to develop more accurate and robust 3D sensing technologies to assure their wider adoption in practical agricultural applications.
Earth-observing satellite instruments obtain a massive number of observations every day. For example, tens of millions of sea surface temperature (SST) observations on a global scale are collected daily by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. Despite their size, such datasets are incomplete and noisy, necessitating spatial statistical inference to obtain complete, high-resolution fields with quantified uncertainties. Such inference is challenging due to the high computational cost, the nonstationary behavior of environmental processes on a global scale, and land barriers affecting the dependence of SST. In this work, we develop a multi-resolution approximation (M-RA) of a Gaussian process (GP) whose nonstationary, global covariance function is obtained using local fits. The M-RA requires domain partitioning, which can be tailored to the application at hand. In the SST case, we partition the domain purposefully to account for and weaken dependence across land barriers. Our M-RA implementation is tailored to distributed-memory computation in high-performance-computing environments. We analyze a MODIS SST dataset consisting of more than 43 million observations, to our knowledge the largest dataset ever analyzed using a probabilistic GP model. We show that our nonstationary model based on local fits provides substantially improved predictive performance relative to a stationary approach.
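The building block underlying approaches of this kind is GP prediction (kriging) from noisy observations. The sketch below is a minimal one-dimensional GP predictor with a squared-exponential covariance, written from the standard Cholesky-based formulas; it is not the M-RA itself, whose point is precisely to avoid forming and factorizing the full covariance matrix at scale, and the kernel parameters are illustrative.

```python
import numpy as np

def sq_exp_kernel(x1, x2, range_, var):
    """Squared-exponential (Gaussian) covariance between two 1-D point sets."""
    d = np.abs(x1[:, None] - x2[None, :])
    return var * np.exp(-0.5 * (d / range_) ** 2)

def gp_predict(x_obs, y_obs, x_new, range_=1.0, var=1.0, noise=1e-2):
    """Posterior mean and latent predictive sd of a GP given noisy data."""
    K = sq_exp_kernel(x_obs, x_obs, range_, var) + noise * np.eye(len(x_obs))
    Ks = sq_exp_kernel(x_new, x_obs, range_, var)
    L = np.linalg.cholesky(K)                       # O(n^3): the bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    sd = np.sqrt(np.maximum(var - np.sum(v ** 2, axis=0), 0.0))
    return mean, sd

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
mean, sd = gp_predict(x, y, np.array([1.0, 10.0]))
```

Near the data the predictive standard deviation is small and the mean interpolates; far from the data the prediction reverts to the prior. The cubic cost of the Cholesky factorization is what motivates multi-resolution approximations for tens of millions of observations.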
Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are typically controlled by free parameters that are estimated from data by maximum-likelihood estimation. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models, VAEs and normalising flows, from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.
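VGI itself is not reproduced here, but the incomplete-data estimation problem it addresses can be illustrated with the classical EM algorithm on a tractable case: a bivariate Gaussian where the second coordinate is missing at random (missingness depends on the observed first coordinate). All names and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate correlated bivariate Gaussian data; x2 is missing whenever x1 > 0,
# so missingness depends only on the observed variable (MAR).
n, rho = 4000, 0.8
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
miss = x1 > 0.0

def em_bivariate(x1, x2, miss, iters=50):
    """EM for (mu, Sigma) of a bivariate Gaussian with x2 partially missing."""
    mu = np.array([x1.mean(), x2[~miss].mean()])     # crude initialization
    S = np.cov(np.vstack([x1[~miss], x2[~miss]]))
    for _ in range(iters):
        beta = S[0, 1] / S[0, 0]
        resid_var = S[1, 1] - beta * S[0, 1]
        # E-step: conditional mean and second moment of the missing x2
        e2 = np.where(miss, mu[1] + beta * (x1 - mu[0]), x2)
        e22 = e2 ** 2 + np.where(miss, resid_var, 0.0)
        # M-step: update moments from the completed data
        mu = np.array([x1.mean(), e2.mean()])
        c12 = np.mean(x1 * e2) - mu[0] * mu[1]
        S = np.array([[np.mean(x1 ** 2) - mu[0] ** 2, c12],
                      [c12, np.mean(e22) - mu[1] ** 2]])
    return mu, S

mu_em, S_em = em_bivariate(x1, x2, miss)
cc_mean = x2[~miss].mean()   # complete-case estimate, biased under MAR
```

The complete-case mean of x2 is badly biased here, while EM recovers the true mean (zero) and covariance. The difficulty the abstract points to is that for general models and missingness patterns this E-step is no longer available in closed form, which is the gap VGI targets.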
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
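The workhorse of $l_1/l_2$ (group-lasso) regularization is its proximal operator, block soft-thresholding, which shrinks each group of coefficients jointly and hence zeroes an edge across all $K$ tasks at once. The sketch below shows this operator in isolation; the layout (one column per edge, one row per task) and the threshold value are illustrative, not the paper's algorithm.

```python
import numpy as np

def group_soft_threshold(V, lam):
    """Proximal operator of lam * sum_j ||V[:, j]||_2 (the l1/l2 penalty).

    Each column of V stacks the same edge coefficient across the K tasks,
    so a column is either shrunk or zeroed jointly for all tasks, which
    encourages a shared (union) support across the related DAGs."""
    norms = np.linalg.norm(V, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * V

# Column 0 has a large joint norm and survives (shrunk); column 1 is weak
# in every task and is eliminated for all tasks simultaneously.
V = np.array([[3.0, 0.1],
              [4.0, 0.1]])
out = group_soft_threshold(V, 1.0)
```

Inside a proximal-gradient loop, alternating a likelihood gradient step with this operator yields the kind of iterative algorithm the continuous formulation above admits.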
Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied in and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly sophisticated, new approaches have also been proposed and developed, particularly those that utilize deep learning and convert networked data into low-dimensional representations. Despite these recent advances, there is still a lack of insightful understanding of the theoretical and methodological underpinnings of community detection, which will be critically important for the future development of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art of the field of community detection. Specifically, we provide a comprehensive review of the existing community detection methods and introduce a new taxonomy that divides the existing methods into two categories, namely probabilistic graphical models and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote the future development of community detection, we release benchmark datasets from a range of problem domains and highlight their applications to various network analysis tasks. We conclude with discussions of the challenges of the field and suggestions of possible directions for future research.
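As a concrete, minimal instance of community detection, the classical spectral approach splits a graph by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). The toy graph below (two four-node cliques joined by a single edge) is hypothetical and serves only to make the mechanics visible; it represents neither of the two method families surveyed above.

```python
import numpy as np

# Toy graph: two 4-node cliques (nodes 0-3 and 4-7) joined by one edge.
A = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian
vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
fiedler = vecs[:, 1]                # eigenvector of 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)  # sign split recovers the two communities
```

The sign pattern of the Fiedler vector assigns each clique to one community, with the single inter-clique edge as the cut.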
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and how fine aspects of an architecture affect the behavior of a learning task. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.
This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at //github.com/mgong2/DA_Infer .
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
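The canonical FL training loop described above (clients train locally, a server aggregates) is commonly instantiated as federated averaging (FedAvg). The sketch below is a minimal single-process simulation on a toy linear-regression task; the client data, learning rates, and round counts are illustrative, and real deployments add client sampling, secure aggregation, and communication constraints.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])

# Toy setup: 5 clients, each holding a private linear-regression dataset of a
# different size. Raw data never leaves a client; only weights are exchanged.
clients = []
for i in range(5):
    m = 20 + 10 * i
    X = rng.standard_normal((m, 2))
    y = X @ w_true + 0.1 * rng.standard_normal(m)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    """A few steps of local gradient descent on mean squared error."""
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def fedavg(clients, rounds=30):
    w = np.zeros(2)                                  # server (global) model
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w, X, y) for X, y in clients]
        w = np.average(local_ws, axis=0, weights=sizes)  # size-weighted mean
    return w

w = fedavg(clients)
```

Weighting the average by local dataset size makes the aggregate approximate a gradient step on the pooled objective, even though the pooled data is never materialized, which is the "focused data collection and minimization" principle in action.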