Despite the substantial demand for high-quality, large-area building maps, no established open-source workflow for generating 2D and 3D building maps currently exists. This study introduces an automated, open-source workflow for large-scale 2D and 3D building mapping from airborne LiDAR data. Uniquely, our workflow operates entirely unsupervised, eliminating the need for any training procedure. We integrate a specifically tailored DTM generation algorithm into the workflow to prevent errors in complex urban landscapes, especially around highways and overpasses. Through fine rasterization of LiDAR point clouds, we enhance building-tree differentiation, reduce errors near water bodies, and improve computational efficiency by introducing a new planarity calculation. Our workflow offers a practical and scalable solution for the mass production of rasterized 2D and 3D building maps from raw airborne LiDAR data. We also elaborate on the influence of parameters and on potential error sources to give users practical guidance. The method has been rigorously optimized and tested for robustness on an extensive dataset (> 550 km$^2$) and further validated through comparison with deep learning-based and hand-digitized products. Through these large-scale comparisons, we provide a valuable analysis of building maps generated via different methodologies and evaluate the effectiveness of each approach. We anticipate that our highly scalable building mapping workflow will facilitate the production of reliable 2D and 3D building maps, fostering advances in large-scale urban analysis. The code will be released upon publication.
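As an illustration of the fine-rasterization step described above, the following is a minimal sketch that bins a LiDAR point cloud onto a regular grid, keeping the maximum return height per cell; the 0.5 m cell size and max-height aggregation rule are illustrative assumptions, not the workflow's actual parameters.

import numpy as np

def rasterize_max_height(points, cell_size=0.5):
    """Bin LiDAR points (an N x 3 array of x, y, z) onto a regular grid,
    keeping the maximum elevation per cell as a simple surface model."""
    xy_min = points[:, :2].min(axis=0)
    ij = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    n_rows, n_cols = ij.max(axis=0) + 1
    flat = np.full(n_rows * n_cols, -np.inf)
    # Unbuffered in-place maximum: a single pass over all points.
    np.maximum.at(flat, ij[:, 0] * n_cols + ij[:, 1], points[:, 2])
    flat[np.isinf(flat)] = np.nan  # cells that received no returns
    return flat.reshape(n_rows, n_cols)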
Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, accessible only through samples, while the other is taken as a simple, data-agnostic base density. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square-loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments on super-resolution and inpainting.
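For concreteness, one common instantiation of the square-loss regression referred to above uses a linear interpolant; the notation below is illustrative rather than the paper's own:
\[
  x_t = (1-t)\,x_0 + t\,x_1, \qquad (x_0, x_1) \sim \rho,
\]
\[
  \hat b = \operatorname*{arg\,min}_{b}\;
  \mathbb{E}_{t \sim \mathcal{U}[0,1]}\, \mathbb{E}_{(x_0, x_1) \sim \rho}
  \big\| b(t, x_t) - (x_1 - x_0) \big\|^2 .
\]
In the standard independent setting $\rho = \rho_0 \otimes \rho_1$; the coupled setting replaces this product with a joint density, for instance one conditioned on class labels or embeddings.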
We present a method to improve the calibration of deep ensembles in the small-training-data regime when unlabeled data are available. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point we fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on the unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on test samples. Empirically, through detailed experiments, we find that for small to moderately sized training sets, our ensembles are more diverse and provide better calibration than standard ensembles, sometimes significantly so.
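A minimal sketch of the labeling scheme just described, assuming a classification setting; the array sizes are placeholders, and the subsequent model-fitting step is omitted:

import numpy as np

rng = np.random.default_rng(0)
n_classes, n_members = 10, 5
y_train = rng.integers(0, n_classes, size=100)  # stand-in for the true labels
n_unlabeled = 1000

# One independent random labeling of the unlabeled set per ensemble member:
# each member fits the true labels on the training data and its own random
# labels on the unlabeled data.
member_targets = [
    np.concatenate([y_train, rng.integers(0, n_classes, size=n_unlabeled)])
    for _ in range(n_members)
]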
Flow-based models are widely used in generative tasks, including normalizing flows, in which a neural network transports a data distribution $P$ to a normal distribution. This work develops a flow-based model that transports $P$ to an arbitrary distribution $Q$, where both are accessible only via finite samples. We propose to learn the dynamic optimal transport between $P$ and $Q$ by training a flow neural network: the model finds an invertible transport map between $P$ and $Q$ by minimizing the transport cost. The trained optimal transport flow supports many downstream tasks, including infinitesimal density ratio estimation and distribution interpolation in the latent space of generative models. The effectiveness of the proposed model on high-dimensional data is demonstrated empirically in mutual information estimation, energy-based generative models, and image-to-image translation.
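The dynamic optimal transport problem mentioned above is standardly written in Benamou--Brenier form; minimizing over velocity fields $v$,
\[
  \min_{v} \int_0^1 \mathbb{E}_{x \sim \rho_t}\,\|v(x, t)\|^2 \, dt
  \quad \text{subject to} \quad
  \partial_t \rho_t + \nabla \cdot (\rho_t v) = 0, \qquad \rho_0 = P, \ \rho_1 = Q,
\]
with the flow network parametrizing $v$ and the two marginal constraints enforced from finite samples of $P$ and $Q$.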
This paper introduces gemact, a Python package for actuarial modelling based on the collective risk model. The library supports applications to risk costing and risk transfer, loss aggregation, and loss reserving. We add new probability distributions to those available in scipy, including the (a, b, 0) and (a, b, 1) discrete distributions, copulas of the Archimedean family, and the Gaussian, Student's t, and fundamental copulas. We provide an implementation of the AEP algorithm for calculating the cumulative distribution function of a sum of dependent, non-negative random variables, given a dependency structure specified by a copula. The theoretical framework is introduced at the beginning of each section to provide the reader with a sufficient understanding of the underlying actuarial models.
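As a point of reference for the AEP use case, the sketch below Monte Carlo-estimates the same quantity, the CDF of a sum of dependent, non-negative variables under a copula, using only scipy; this is not gemact's API, and the margins, copula, and correlation are arbitrary illustrative choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.5
# Bivariate Gaussian copula samples in [0, 1]^2.
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000)
u = stats.norm.cdf(z)
# Non-negative margins via inverse-CDF transforms.
x1 = stats.gamma(a=2.0).ppf(u[:, 0])
x2 = stats.lognorm(s=1.0).ppf(u[:, 1])
print("P(X1 + X2 <= 10) ~", np.mean(x1 + x2 <= 10.0))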
This study proposes a novel memory-efficient recurrent neural network (RNN) architecture designed to solve the object localization problem: recovering an object's state as it moves through a noisy environment. We take the idea of the classical particle filter and combine it with a GRU-based RNN architecture. The key feature of the resulting memory-efficient particle filter RNN (mePFRNN) model is that it requires the same number of parameters to process environments of different sizes. Thus, the proposed mePFRNN architecture consumes less memory to store parameters than the previously proposed PFRNN model. To demonstrate the performance of our model, we test it on symmetric and noisy environments that are especially challenging for filtering algorithms. In our experiments, the mePFRNN model provides more precise localization than the considered competitors while requiring fewer trainable parameters.
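The following is a schematic of the general particle-filter-plus-GRU idea, not the mePFRNN architecture itself; the dimensions and the soft reweighting rule are illustrative assumptions:

import torch
import torch.nn as nn

class ParticleGRUCell(nn.Module):
    """Each of K particles carries its own GRU hidden state; a learned
    likelihood reweights the particles after every observation."""
    def __init__(self, obs_dim, hidden_dim, n_particles):
        super().__init__()
        self.gru = nn.GRUCell(obs_dim, hidden_dim)
        self.weight_net = nn.Linear(hidden_dim, 1)
        self.n_particles = n_particles

    def forward(self, obs, h, log_w):
        # obs: (B, obs_dim); h: (B*K, hidden_dim); log_w: (B, K)
        k = self.n_particles
        h = self.gru(obs.repeat_interleave(k, dim=0), h)    # propagate particles
        log_lik = self.weight_net(h).view(-1, k)            # per-particle score
        log_w = torch.log_softmax(log_w + log_lik, dim=-1)  # renormalize weights
        return h, log_w

Note that the parameter count depends only on obs_dim and hidden_dim, consistent with the abstract's point that the same parameters serve environments of different sizes.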
Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method for identifying regimes in time series data, and in one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying different hyperparameters impacts clustering performance under different random initialisations. We compute simple metrics that we find useful for identifying high-quality clusterings. We then extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance with a sliced Wasserstein distance, resulting in a method we call `sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
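A minimal sketch of the sliced approximation described above, using SciPy's one-dimensional Wasserstein distance (here the 1-Wasserstein for simplicity; the number of projections is an arbitrary choice):

import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(x, y, n_projections=100, rng=None):
    """Average 1-D Wasserstein distances between two d-dimensional
    samples over random unit directions."""
    rng = rng or np.random.default_rng(0)
    dirs = rng.standard_normal((n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    return np.mean([wasserstein_distance(x @ u, y @ u) for u in dirs])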
B-spline-based hidden Markov models employ B-splines to specify the emission distributions, offering a more flexible modelling approach than conventional parametric HMMs. We introduce a Bayesian framework for inference, enabling the simultaneous estimation of all unknown model parameters, including the number of states. A parsimonious knot configuration for the B-splines is identified by a trans-dimensional Markov chain sampling algorithm, while model selection regarding the number of states can be performed based on the marginal likelihood within a parallel sampling framework. Through extensive simulation studies, we demonstrate the superiority of our methodology over alternative approaches, as well as its robustness and scalability. We illustrate the explorative use of our methods on animal activity data, namely for whitetip sharks. The flexibility of our Bayesian approach also facilitates the incorporation of more realistic assumptions, which we demonstrate by developing a novel hierarchical conditional HMM to analyse human activity for circadian and sleep modelling.
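In this framework the state-dependent emission densities take the form of convex combinations of (normalized) B-spline basis functions; schematically, for state $j$,
\[
  f_j(x) = \sum_{k=1}^{K} w_{jk}\, B_k(x), \qquad w_{jk} \ge 0, \quad \sum_{k=1}^{K} w_{jk} = 1,
\]
where the basis functions $B_k$ are determined by the knot configuration; the trans-dimensional sampler moves between configurations, effectively varying $K$. The notation here is illustrative rather than the paper's own.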
We give a fully polynomial-time randomized approximation scheme (FPRAS) for two-terminal reliability in directed acyclic graphs.
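For context, in the standard formulation: given a DAG $G$ with terminals $s, t$ and independent edge survival probabilities $p_e$, two-terminal reliability is
\[
  R(G, s, t) = \Pr\big[\, t \text{ is reachable from } s \text{ when each edge } e \text{ is retained independently with probability } p_e \,\big],
\]
and an FPRAS outputs a $(1 \pm \varepsilon)$-factor approximation with high probability in time polynomial in the size of $G$ and $1/\varepsilon$.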
We devise a projection-free iterative scheme for the approximation of harmonic maps that achieves second-order accuracy in the constraint violation and is unconditionally energy stable. A corresponding error estimate holds under a mild but necessary discrete regularity condition. The method is based on a BDF2 scheme, and the problem considered serves as a model for partial differential equations with a holonomic constraint. The performance of the method is illustrated via the computation of stationary harmonic maps and bending isometries.
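The scheme rests on the standard second-order backward differentiation formula; with time step $\tau$ and iterates $u^n$,
\[
  d_t^{(2)} u^n := \frac{3 u^n - 4 u^{n-1} + u^{n-2}}{2\tau} ,
\]
whose second-order consistency underlies the second-order accuracy in the constraint violation.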
This paper presents a physics- and data-co-driven surrogate modeling method for efficient rare event simulation of civil and mechanical systems with high-dimensional input uncertainties. The method fuses interpretable low-fidelity physical models with data-driven error corrections. The hypothesis is that a well-designed and well-trained simplified physical model can preserve salient features of the original model, while data-fitting techniques can fill the remaining gaps between surrogate and original model predictions. The coupled physics-data-driven surrogate model is adaptively trained using active learning, aiming to achieve high correlation and small bias between surrogate and original model responses in the critical parametric region of a rare event. A final importance sampling step is introduced to correct the surrogate model-based probability estimates. Static and dynamic problems with input uncertainties modeled by random fields and stochastic processes are studied to demonstrate the proposed method.
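A standard form such an importance sampling correction can take (the authors' exact estimator may differ): with input density $p$, a surrogate-informed sampling density $q$, and the original limit-state function $g$,
\[
  \hat{P}_f = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{ g(x_i) \le 0 \}\, \frac{p(x_i)}{q(x_i)}, \qquad x_i \sim q ,
\]
so that the rare-event indicator is evaluated on the original model and any residual surrogate bias is corrected in expectation.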