Deciding on the unimodality of a dataset is an important problem in data analysis and statistical modeling. It provides knowledge about the structure of the dataset, i.e., whether the data points have been generated by a probability distribution with a single peak or with multiple peaks. Such knowledge is useful for several data analysis problems, such as deciding on the number of clusters and determining unimodal projections. We propose a technique called the UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset. The method operates on the empirical cumulative distribution function (ecdf) of the dataset. It attempts to build a piecewise linear approximation of the ecdf that is unimodal and models the data sufficiently well, in the sense that the data corresponding to each linear segment follow the uniform distribution. A unique feature of this approach is that, in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model. We present experimental results to assess the ability of the method to decide on unimodality and compare it with the well-known dip-test approach. In addition, for unimodal datasets, we evaluate the Uniform Mixture Models provided by the proposed method using the test set log-likelihood and the two-sample Kolmogorov-Smirnov (KS) test.
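As a rough illustration of the building block described above (data within an ecdf segment following the uniform distribution), the sketch below computes an ecdf and checks a single segment for uniformity with a KS test. The helper names and the segment choice are hypothetical; segment construction and the unimodality check of the full piecewise-linear approximation are not reproduced here, so this is not the UU-test itself.

```python
# Minimal sketch, assuming a KS test against Uniform(a, b) as the per-segment check.
import numpy as np
from scipy import stats

def ecdf(x):
    """Return sorted sample points and their empirical CDF values."""
    xs = np.sort(np.asarray(x))
    return xs, np.arange(1, len(xs) + 1) / len(xs)

def segment_is_uniform(x, a, b, alpha=0.05):
    """KS test of the sub-sample falling in [a, b] against Uniform(a, b)."""
    x = np.asarray(x)
    seg = x[(x >= a) & (x <= b)]
    if len(seg) < 2:
        return True  # too few points to reject uniformity
    stat, pval = stats.kstest(seg, "uniform", args=(a, b - a))
    return pval > alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)                 # unimodal example data
    xs, F = ecdf(sample)
    print(segment_is_uniform(sample, -0.5, 0.5))  # central segment of the ecdf
```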
In this article we formulate and implement a computational multiphase periporomechanics model for unguided fracturing in unsaturated porous media. The same governing equation for the solid phase applies both on and off cracks. Crack formation in this framework is autonomous, requiring no prior estimate of the crack topology. As a new contribution, an energy-based criterion for arbitrary crack formation is formulated using the peridynamic effective force state for unsaturated porous media. Unsaturated fluid flow in the fracture space is modeled in a simplified way, in line with the nonlocal formulation of unsaturated fluid flow in the bulk. The formulated unsaturated fracturing periporomechanics model is numerically implemented through a fractional-step algorithm in time and a two-phase mixed meshless method in space. The two-stage operator split converts the coupled periporomechanics problem into an undrained deformation and fracture problem and an unsaturated fluid flow problem in the deformed skeleton configuration. Numerical simulations of in-plane opening and shear cracking are conducted to validate the accuracy and robustness of the fracturing unsaturated periporomechanics model. Numerical examples of wing cracking and non-planar cracking in unsaturated soil specimens are then presented to demonstrate the efficacy of the proposed multiphase periporomechanics model for unguided cracking in unsaturated porous media.
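The structure of the two-stage operator split can be sketched as a time loop that alternates an undrained solid sub-step with a fluid-flow sub-step. The update rules below are toy placeholders, not the peridynamic force-state or nonlocal flow equations of the paper; only the splitting structure is meant.

```python
# Structure-only sketch of a fractional-step (operator-split) time loop,
# assuming toy linear updates in place of the actual periporomechanics physics.
import numpy as np

def undrained_step(u, p, dt):
    """Stage 1 (placeholder): update displacements u with pore pressure p frozen."""
    return u + dt * (-0.1 * u + 0.01 * p)

def flow_step(u, p, dt):
    """Stage 2 (placeholder): update pore pressure p on the deformed skeleton."""
    return p + dt * (-0.05 * p + 0.02 * u)

u = np.zeros(100)          # nodal displacements (toy)
p = np.full(100, 1.0)      # pore pressures (toy)
dt, n_steps = 0.01, 1000
for _ in range(n_steps):
    u = undrained_step(u, p, dt)   # deformation + fracture with fluid frozen
    p = flow_step(u, p, dt)        # fluid flow in the new configuration
```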
Reviews of any experiment involving living organisms require justification of the need for and the moral defensibility of the study. Statistical planning, design, and sample size calculation of the experiment are no less important review criteria than general medical and ethical points to consider. Errors made in the statistical planning and data evaluation phases can have severe consequences for both results and conclusions. They might proliferate and thus impact future trials, an unintended outcome of fundamental research with profound ethical consequences. Therefore, to be considered approvable, any trial must be efficient in both a medical and a statistical sense in answering the questions of interest. Unified statistical standards are currently missing for animal review boards in Germany. To accompany the application process, we developed a biometric form to be completed and handed in with the proposal at the local authority on animal welfare. It addresses the relevant points to consider for the biostatistical planning of animal experiments and can help both applicants and reviewers oversee the entire set of planned experiments. Furthermore, the form might also aid in meeting the current standards set by the 3+3Rs principle of animal experimentation (Replacement, Reduction, Refinement, as well as Robustness, Registration, and Reporting). The form is already in use by the local authority on animal welfare in Berlin, Germany. In addition, we provide a reference to our user guide, which gives more detailed explanations and examples for each section of the biometric form. Unifying this set of biostatistical aspects will hold both applicants and reviewers to equal standards and increase the quality of preclinical research projects, including translational, multicenter, and international studies.
Cities are complex products of human culture, characterised by a startling diversity of visible traits. Their form is constantly evolving, reflecting changing human needs and local contingencies, manifested in space by many urban patterns. Urban Morphology laid the foundation for understanding many such patterns, largely relying on qualitative research methods to extract the distinct spatial identities of urban areas. However, the manual, labour-intensive and subjective nature of such approaches represents an impediment to the development of a scalable, replicable and data-driven characterisation of urban form. Recent advances in Geographic Data Science and the availability of digital mapping products open the opportunity to overcome such limitations. And yet, our current capacity to systematically capture the heterogeneity of spatial patterns remains limited in terms of the spatial parameters included in the analysis, and is hardly scalable due to the highly labour-intensive nature of the task. In this paper, we present a method for the numerical taxonomy of urban form derived from biological systematics, which allows the rigorous detection and classification of urban types. Initially, we produce a rich numerical characterisation of urban space from minimal data input, minimising limitations due to inconsistent data quality and availability. The required inputs are the street network, building footprints, and the morphological tessellation, a spatial unit derived from the Voronoi tessellation of building footprints. From these, we derive homogeneous urban tissue types and, by determining the overall morphological similarity between them, generate a hierarchical classification of urban form. After framing and presenting the method, we test it on two cities, Prague and Amsterdam, and discuss potential applications and further developments.
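The classification step can be pictured, in spirit, as hierarchical clustering of spatial units described by morphometric characters. The sketch below uses hypothetical column names and generic Ward clustering from SciPy; it is not the authors' pipeline and omits the measurement of characters from the street network, footprints, and tessellation.

```python
# Hedged sketch: Ward clustering of (hypothetical) morphometric characters per
# spatial unit, cut into a small number of candidate urban tissue types.
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# hypothetical morphometric characters measured per tessellation cell
characters = pd.DataFrame({
    "building_area":     [120, 95, 400, 380, 60, 75],
    "tessellation_area": [300, 280, 900, 950, 150, 160],
    "street_width":      [8, 9, 20, 22, 5, 6],
})

Z = linkage(characters.apply(zscore).to_numpy(), method="ward")  # similarity tree
tissue_type = fcluster(Z, t=2, criterion="maxclust")             # cut into 2 types
print(tissue_type)
```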
This work presents a new recursive robust filtering approach for feature-based 3D registration. Unlike common state-of-the-art alignment algorithms, the proposed method has four advantages that have not previously been combined in a single solution: it is able to deal with the inherent noise contaminating sensory data; it is robust to uncertainties caused by noisy feature localisation; it combines the advantages of two norms for higher performance and better avoidance of local minima; and it yields an accurate and stable rigid-body transformation. The latter enables thorough control over the convergence of the alignment as well as a correct assessment of the quality of the registration. The mathematical rationale behind the proposed approach is explained, and the results are validated on physical and synthetic data.
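To make the problem setting concrete, the sketch below shows a standard baseline for feature-based rigid alignment (the Kabsch/Procrustes solution on matched point features). It is not the recursive robust filter proposed in the paper and has none of its robustness properties; it only illustrates what "estimating a rigid-body transformation from features" means.

```python
# Baseline sketch: least-squares rigid transform between matched 3D feature sets.
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) mapping points P onto Q (both N x 3), least squares."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

rng = np.random.default_rng(1)
P = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([0.5, -0.2, 1.0]) + 0.01 * rng.normal(size=(50, 3))
R, t = kabsch(P, Q)
print(np.linalg.norm(R - R_true))                   # small rotation residual
```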
Online allocation problems with resource constraints have a rich history in operations research. In this paper, we introduce the \emph{regularized online allocation problem}, a variant that includes a non-linear regularizer acting on the total resource consumption. In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources. The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints. Our primary motivation is allowing decision makers to trade off separable objectives such as the economic efficiency of an allocation with ancillary, non-separable objectives such as the fairness or equity of an allocation. We design an algorithm that is simple, fast, and attains good performance with both stochastic i.i.d.~and adversarial inputs. In particular, our algorithm is asymptotically optimal under stochastic i.i.d. input models and attains a fixed competitive ratio that depends on the regularizer when the input is adversarial. Furthermore, the algorithm and analysis do not require convexity or concavity of the reward function and the consumption function, which allows more model flexibility. Numerical experiments confirm the effectiveness of the proposed algorithm and of regularization in an internet advertising application.
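A simple way to picture dual-based online allocation is the scheme below: maintain a dual price per resource, score each request's actions by reward minus priced consumption, and nudge the prices toward the per-period budget rate. This is a generic hedged sketch in the spirit of such algorithms; it omits the non-separable regularizer that is the paper's focus, and the toy data are hypothetical.

```python
# Hedged sketch of a dual-price heuristic for online allocation with budgets.
import numpy as np

rng = np.random.default_rng(0)
T, m = 1000, 2                    # number of requests, number of resources
B = np.array([300.0, 300.0])      # total budgets
rho = B / T                       # per-period target consumption
mu = np.zeros(m)                  # dual prices, one per resource
eta = 0.05                        # dual step size
remaining = B.copy()
total_reward = 0.0

for _ in range(T):
    # each request offers 3 candidate actions with rewards and resource costs
    rewards = rng.uniform(0, 1, size=3)
    costs = rng.uniform(0, 0.5, size=(3, m))
    scores = rewards - costs @ mu             # reward adjusted by dual prices
    k = int(np.argmax(np.append(scores, 0)))  # index 3 = "take no action"
    if k < 3 and np.all(costs[k] <= remaining):
        remaining -= costs[k]
        total_reward += rewards[k]
        consumed = costs[k]
    else:
        consumed = np.zeros(m)
    mu = np.maximum(mu + eta * (consumed - rho), 0.0)  # raise prices if over-consuming

print(total_reward, remaining)
```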
Quantifying the inconsistency of a database is motivated by various goals including reliability estimation for new datasets and progress indication in data cleaning. Another goal is to attribute to individual tuples a level of responsibility to the overall inconsistency, and thereby prioritize tuples in the explanation or inspection of dirt. Therefore, inconsistency quantification and attribution have been a subject of much research in Knowledge Representation and, more recently, in Databases. As in many other fields, a conventional responsibility sharing mechanism is the Shapley value from cooperative game theory. In this paper, we carry out a systematic investigation of the complexity of the Shapley value in common inconsistency measures for functional-dependency (FD) violations. For several measures we establish a full classification of the FD sets into tractable and intractable classes with respect to Shapley-value computation. We also study the complexity of approximation in intractable cases.
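For intuition, the toy sketch below attributes inconsistency to tuples via the Shapley value for one simple measure, the number of tuple pairs violating an FD A -> B, by exact enumeration over all arrival orders. The data and names are hypothetical, and the exponential enumeration is exactly the cost whose tractability the paper classifies.

```python
# Toy Shapley attribution for a pairwise FD-violation count, by enumeration.
from itertools import permutations

tuples = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a2", "b3")]  # FD: A -> B

def violations(S):
    """Number of pairs in S that agree on A but disagree on B."""
    return sum(1 for i, t in enumerate(S) for s in S[i + 1:]
               if t[0] == s[0] and t[1] != s[1])

def shapley(i):
    """Average marginal contribution of tuple i over all arrival orders."""
    total, perms = 0.0, list(permutations(range(len(tuples))))
    for order in perms:
        pos = order.index(i)
        before = [tuples[j] for j in order[:pos]]
        total += violations(before + [tuples[i]]) - violations(before)
    return total / len(perms)

for i in range(len(tuples)):
    print(i, tuples[i], shapley(i))   # the two conflicting tuples share the blame
```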
In this paper we study the asymptotic theory for spectral analysis of stationary random fields, including linear and nonlinear fields. We obtain asymptotic properties of the Fourier coefficients and periodograms, including limiting distributions of the Fourier coefficients and the uniform consistency of kernel spectral density estimators, under various mild conditions on moments and dependence structures. The validity of these asymptotic results for estimated spatial fields is also established.
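The objects studied here can be illustrated numerically: the sketch below simulates a simple linear (moving-average) random field on a grid, computes its 2-D periodogram via the FFT, and smooths it with a box kernel as a crude kernel spectral density estimate. The field and the smoothing choice are assumptions for illustration only; none of the asymptotic theory is reproduced.

```python
# Hedged illustration: 2-D periodogram and a kernel-smoothed spectral estimate.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
n = 128
eps = rng.normal(size=(n + 2, n + 2))                     # i.i.d. innovations
# linear random field: 3x3 moving average of the innovations
X = sum(eps[i:i + n, j:j + n] for i in range(3) for j in range(3)) / 9.0
X -= X.mean()

periodogram = np.abs(np.fft.fft2(X))**2 / X.size          # periodogram on the Fourier grid
f_hat = uniform_filter(periodogram, size=5, mode="wrap")  # box-kernel smoothing

print(periodogram.shape, f_hat[0, 0])
```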
The autoregressive process is one of the most fundamental and important models for analyzing a time series. Theoretical results and practical tools for fitting an autoregressive process with i.i.d. innovations are well established. However, when the innovations are white noise but not i.i.d., those tools fail to generate consistent confidence intervals for the autoregressive coefficients. Focusing on an autoregressive process with \textit{dependent} and \textit{non-stationary} innovations, this paper provides a consistency result and a Gaussian approximation theorem for the Yule-Walker estimator. Moreover, it introduces a second-order wild bootstrap that constructs a consistent confidence interval for the estimator. Numerical experiments confirm the validity of the proposed algorithm for different kinds of white noise innovations, whereas the classical methods (e.g., the AR-sieve bootstrap) fail to generate correct confidence intervals when the innovations are dependent. According to Kreiss et al. \cite{10.1214/11-AOS900} and the Wold decomposition, it is reasonable to assume that a real-life time series satisfies an autoregressive process; however, the innovations in that process are more likely to be white noise than i.i.d. Therefore, our method provides a practical tool for handling real-life problems.
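For reference, the Yule-Walker estimator itself is straightforward to compute from sample autocovariances, as in the sketch below (standard textbook form; the second-order wild bootstrap for its confidence intervals is not shown, and the AR(2) example data are an assumption).

```python
# Minimal sketch of the Yule-Walker estimator for an AR(p) process.
import numpy as np
from scipy.linalg import toeplitz, solve

def yule_walker(x, p):
    """Estimate AR(p) coefficients by solving Gamma_p a = (gamma_1, ..., gamma_p)."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    gamma = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])  # sample autocovariances
    return solve(toeplitz(gamma[:p]), gamma[1:p + 1])

rng = np.random.default_rng(0)
n, phi = 2000, np.array([0.6, -0.3])          # true AR(2) coefficients
x = np.zeros(n)
for t in range(2, n):                          # AR(2) driven by white-noise innovations
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal()
print(yule_walker(x, 2))                       # close to [0.6, -0.3]
```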
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
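The implicit-regularization point above has a small, self-contained numerical illustration: on an overparametrized linear regression problem, plain gradient descent started at zero interpolates the training data and converges to the minimum-norm solution, with no explicit regularizer. The problem sizes below are arbitrary choices for the demonstration.

```python
# Gradient descent on overparametrized least squares finds the min-norm interpolant.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                           # fewer samples than parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
lr = 0.5 / np.linalg.norm(X, 2)**2       # step size below 1/L for the squared loss
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)          # plain gradient descent

w_min_norm = np.linalg.pinv(X) @ y       # minimum-norm interpolating solution
print(np.linalg.norm(X @ w - y))         # ~0: perfect fit to the training data
print(np.linalg.norm(w - w_min_norm))    # ~0: GD found the min-norm solution
```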
Biomedical image segmentation is an important task in many medical applications. Segmentation methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling datasets of medical images requires significant expertise and time, and is infeasible at large scales. To tackle the lack of labeled data, researchers use techniques such as hand-engineered preprocessing steps, hand-tuned architectures, and data augmentation. However, these techniques involve costly engineering efforts, and are typically dataset-specific. We present an automated data augmentation method for medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans, focusing on the one-shot segmentation scenario -- a practical challenge in many medical applications. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transforms from the images, and use the model along with the labeled example to synthesize additional labeled training examples for supervised segmentation. Each transform comprises a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. Augmenting the training of a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.
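The kind of transform described above (a spatial deformation plus an intensity change, applied jointly to an image and its label map) can be sketched in 2-D as below. The paper learns these transform models from unlabeled scans; in this hedged sketch they are simply random, and the "scan" and "segmentation" are synthetic stand-ins.

```python
# Hedged 2-D sketch: warp an image and its label map with a smooth deformation
# field (nearest-neighbour for labels) and apply a simple intensity change.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

rng = np.random.default_rng(0)
H, W = 64, 64
image = gaussian_filter(rng.normal(size=(H, W)), 3)       # stand-in "scan"
labels = (image > 0).astype(np.int32)                     # stand-in segmentation

# smooth random deformation field
du = gaussian_filter(rng.normal(size=(H, W)), 8) * 5.0
dv = gaussian_filter(rng.normal(size=(H, W)), 8) * 5.0
gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack([gy + du, gx + dv])

warped_image = map_coordinates(image, coords, order=1, mode="nearest")
warped_labels = map_coordinates(labels, coords, order=0, mode="nearest")

# simple intensity transform (global gain/offset as a crude stand-in)
augmented_image = 1.1 * warped_image + 0.05
print(augmented_image.shape, np.unique(warped_labels))
```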