Cultural values vary significantly around the world. Despite this large heterogeneity, similarities across national cultures are to be expected. This paper studies cross-country cultural heterogeneity via the joint inference of copula graphical models. To this end, a random graph generative model is introduced, with a latent space that embeds cultural relatedness across countries. Taking world-wide country-specific survey data as the primary source of information, the modelling framework allows the integration of external data, both at the level of cultural traits and of their interdependence. In this way, we are able to identify several dimensions of culture.
Porous media processes involve various physical phenomena such as mechanical deformation, transport, and fluid flow. Accurate simulations must capture the strong couplings between these phenomena. Choosing an efficient solver for the multiphysics problem usually entails decoupling it into subproblems related to separate physical phenomena. Then, suitable solvers for each subproblem and the iteration scheme must be chosen. The wide range of options for the solver components makes finding the optimum difficult and time-consuming; moreover, solvers come with numerical parameters that need to be optimized. As a further complication, the solver performance may depend on the physical regime of the simulation model, which may vary with time. Switching solvers according to the dominant process can be beneficial, but the threshold at which to switch is unclear and complicated to analyze. We address this challenge by developing a machine learning framework that automatically searches for the optimal solver for a given multiphysics simulation setup, based on statistical data from previously solved problems. For a series of problems, exemplified by successive time steps in a time-dependent simulation, the framework updates and improves its decision model online during the simulation. We show how it outperforms preselected state-of-the-art solvers for test problem setups. The examples are based on simulations of poromechanics and simulations of flow and transport. For the quasi-static linear Biot model, we demonstrate automated tuning of numerical solver parameters by showing how the L-parameter of the so-called Fixed-Stress preconditioner can be optimized. Motivated by a test example where the main heat transfer mechanism changes between convection and diffusion, we discuss how the solver selector can dynamically switch solvers when the dominant physical phenomenon changes with time.
More than half of the world's population is exposed to the risk of mosquito-borne diseases, which leads to millions of cases and hundreds of thousands of deaths every year. Analyzing this type of data is often complex and poses several interesting challenges, mainly due to the vast geographic area, the peculiar temporal behavior, and the potential correlation between infections. Motivation stems from the analysis of tropical disease data, namely, the number of cases of two arboviruses, dengue and chikungunya, transmitted by the same mosquito, for all the 145 microregions in Southeast Brazil from 2018 to 2022. As a contribution to the literature on multivariate disease data, we develop a flexible Bayesian multivariate spatio-temporal model where temporal dependence is defined for areal clusters. The model features a prior distribution for the random partition of areal data that incorporates neighboring information, thus encouraging maps with few contiguous clusters and discouraging clusters with disconnected areas. The model also incorporates an autoregressive structure and terms related to seasonal patterns into temporal components that are disease- and cluster-specific. It also considers a multivariate directed acyclic graph autoregressive structure to accommodate spatial and inter-disease dependence, facilitating the interpretation of spatial correlation. We explore properties of the model by way of simulation studies and show results indicating that our proposal compares well to competing alternatives. Finally, we apply the model to the motivating dataset with a twofold goal: clustering areas where the temporal trends of certain diseases are similar, and exploring the potential existence of temporal and/or spatial correlation between two diseases transmitted by the same mosquito.
Optimal experimental design (OED) has far-reaching impacts in many scientific domains. We study OED over a continuous-valued design space, a setting that occurs often in practice. Optimization of a distributional function over an infinite-dimensional probability measure space is conceptually distinct from the discrete OED tasks that are conventionally tackled. We propose techniques based on optimal transport and Wasserstein gradient flow. A practical computational approach is derived from Monte Carlo simulation, which transforms the infinite-dimensional optimization problem into a finite-dimensional problem over Euclidean space, to which gradient descent can be applied. We discuss first-order criticality and study the convexity properties of the OED objective. We apply our algorithm to the tomography inverse problem, where the solution reveals optimal sensor placements for imaging.
As people's aesthetic preferences for images are far from understood, image aesthetic assessment is a challenging artificial intelligence task. The range of factors underlying this task is almost unlimited, but we know that some aesthetic attributes affect those preferences. In this study, we present a multi-task convolutional neural network that takes these attributes into account. The proposed neural network jointly learns the attributes along with the overall aesthetic scores of images. This multi-task learning framework allows for effective generalization through the use of shared representations. Our experiments demonstrate that the proposed method outperforms the state-of-the-art approaches in predicting overall aesthetic scores for images in one benchmark of image aesthetics. We achieve near-human performance in terms of overall aesthetic scores, as measured by Spearman's rank correlation. Moreover, our model pioneers the application of multi-tasking in another benchmark, serving as a new baseline for future research. Notably, our approach achieves this performance while using fewer parameters than existing multi-task neural networks in the literature, which makes our method more efficient in terms of computational complexity.
We characterize the quotients among lattice path matroids (LPMs) in terms of their diagrams. This characterization allows us to show that ordering LPMs by quotients yields a graded poset, whose rank polynomial has the Narayana numbers as coefficients. Furthermore, we study full lattice path flag matroids and show that -- contrary to arbitrary positroid flag matroids -- they correspond to points in the nonnegative flag variety. At the basis of this result lies an identification of certain intervals of the strong Bruhat order with lattice path flag matroids. A recent conjecture of Mcalmon, Oh, and Xiang states a characterization of quotients of positroids. We use our results to prove this conjecture in the case of LPMs.
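For readers unfamiliar with the Narayana numbers mentioned above, the following minimal sketch computes them from the standard closed form N(n, k) = C(n, k) C(n, k-1) / n for 1 <= k <= n. The function names `narayana` and `narayana_row` are illustrative, and the abstract does not state how the index n relates to the size of the LPM poset, so that correspondence is left open here.

```python
from math import comb


def narayana(n, k):
    """Narayana number N(n, k) = C(n, k) * C(n, k - 1) / n for 1 <= k <= n."""
    if not 1 <= k <= n:
        return 0
    # The product is always divisible by n, so integer division is exact.
    return comb(n, k) * comb(n, k - 1) // n


def narayana_row(n):
    """All Narayana numbers N(n, 1), ..., N(n, n) for a fixed n."""
    return [narayana(n, k) for k in range(1, n + 1)]


print(narayana_row(4))       # [1, 6, 6, 1]
print(sum(narayana_row(5)))  # 42, the fifth Catalan number
```

Each row sums to a Catalan number, which gives a quick sanity check on any polynomial claimed to have these coefficients.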
Thanks to recent advances in generative AI, we are able to prompt large language models (LLMs) to produce texts which are fluent and grammatical. In addition, it has been shown that we can elicit attempts at grammatical error correction (GEC) from LLMs when prompted with ungrammatical input sentences. We evaluate how well LLMs can perform at GEC by measuring their performance on established benchmark datasets. We go beyond previous studies, which only examined GPT* models on a selection of English GEC datasets, by evaluating seven open-source and three commercial LLMs on four established GEC benchmarks. We investigate model performance and report results against individual error types. Our results indicate that LLMs do not always outperform supervised English GEC models except in specific contexts -- namely commercial LLMs on benchmarks annotated with fluency corrections as opposed to minimal edits. We find that several open-source models outperform commercial ones on minimal edit benchmarks, and that in some settings zero-shot prompting is just as competitive as few-shot prompting.
Real-time perception of dynamic environments has become vital for autonomous robots in crowded spaces. Although the popular voxel-based mapping methods can efficiently represent 3D obstacles with arbitrarily complex shapes, they can hardly distinguish between static and dynamic obstacles, which limits obstacle avoidance performance. While plenty of sophisticated learning-based dynamic obstacle detection algorithms exist in autonomous driving, those approaches cannot run in real time on a quadcopter's limited computational resources. To address these issues, we propose a real-time dynamic obstacle tracking and mapping system for quadcopter obstacle avoidance using an RGB-D camera. The proposed system first utilizes a depth image with an occupancy voxel map to generate potential dynamic obstacle regions as proposals. With the obstacle region proposals, the Kalman filter and our continuity filter are applied to track each dynamic obstacle. Finally, an environment-aware trajectory prediction method based on the Markov chain is proposed, using the states of tracked dynamic obstacles. We implemented the proposed system with our custom quadcopter and navigation planner. The simulation and physical experiments show that our methods can successfully track and represent obstacles in dynamic environments in real time and safely avoid obstacles. Our software is available on GitHub as an open-source ROS package.
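To illustrate the tracking step, here is a minimal sketch of a one-dimensional constant-velocity Kalman filter. It is not the authors' implementation: the motion model, the noise levels `q` and `r`, the initial covariance, and the names `kalman_step` and `track` are all assumptions made for illustration, and the continuity filter and Markov-chain prediction described above are not covered.

```python
def kalman_step(state, cov, z, dt, q=1e-3, r=1e-2):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state = (position, velocity); cov = 2x2 covariance as nested tuples;
    z = measured position. q and r are assumed process/measurement noise levels.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict with F = [[1, dt], [0, 1]] and Q = diag(q, q).
    p_pred = p + v * dt
    a00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    a01 = p01 + dt * p11
    a10 = p10 + dt * p11
    a11 = p11 + q

    # Update with H = [1, 0]: only the position is measured.
    innovation = z - p_pred
    s = a00 + r
    k0, k1 = a00 / s, a10 / s
    new_state = (p_pred + k0 * innovation, v + k1 * innovation)
    new_cov = (
        ((1 - k0) * a00, (1 - k0) * a01),
        (a10 - k1 * a00, a11 - k1 * a01),
    )
    return new_state, new_cov


def track(measurements, dt):
    """Run the filter over a sequence of position measurements."""
    # Start at the first measurement with unknown (zero) velocity.
    state, cov = (measurements[0], 0.0), ((1.0, 0.0), (0.0, 10.0))
    for z in measurements[1:]:
        state, cov = kalman_step(state, cov, z, dt)
    return state


# Hypothetical obstacle moving at 1 m/s, observed at 10 Hz without noise.
positions = [0.1 * k for k in range(100)]
est_pos, est_vel = track(positions, dt=0.1)
print(est_pos, est_vel)
```

A real tracker would run one such filter per obstacle proposal in 2-D or 3-D and would need a data-association step to match proposals to existing tracks.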
This paper presents asymptotic results for the maximum likelihood and restricted maximum likelihood (REML) estimators within a two-way crossed mixed effect model as the sizes of the rows, columns, and cells tend to infinity. Under very mild conditions which do not require the assumption of normality, the estimators are proven to be asymptotically normal, possessing a structured covariance matrix. The growth rate for the number of rows, columns, and cells is unrestricted, whether considered pairwise or collectively.
DNA is a promising storage medium, but its stability and the occurrence of indel errors pose significant challenges. The relative occurrence of guanine (G) and cytosine (C) in DNA is crucial for its longevity, and reverse complementary base pairs should be avoided to prevent the formation of secondary structures in DNA strands. We overcome these challenges by selecting appropriate group homomorphisms. For storing and retrieving information in DNA strings, we use kernel codes and the Varshamov-Tenengolts algorithm, which corrects single indel errors. Additionally, we construct codes of any desired length n and calculate their reverse complement distance based on the value of n.
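As background on the single-indel correction mentioned above, the sketch below implements the classic binary Varshamov-Tenengolts (VT) code with Levenshtein's deletion decoder. It is an illustration only: the abstract works with DNA strings and group homomorphisms, whose construction is not reproduced here, and the function names `vt_syndrome` and `vt_correct_deletion` are hypothetical.

```python
def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions starting at 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_correct_deletion(received, n, a=0):
    """Recover the length-n codeword with syndrome a from a word missing one bit."""
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted bit")
    weight = sum(received)
    checksum = sum(i * b for i, b in enumerate(received, start=1))
    deficit = (a - checksum) % (n + 1)

    if deficit <= weight:
        # A 0 was deleted: reinsert it with exactly `deficit` ones to its right.
        pos, ones_right = len(received), 0
        while ones_right < deficit:
            pos -= 1
            ones_right += received[pos]
        return received[:pos] + [0] + received[pos:]

    # A 1 was deleted: reinsert it with deficit - weight - 1 zeros to its left.
    zeros_needed = deficit - weight - 1
    pos, zeros_left = 0, 0
    while zeros_left < zeros_needed:
        zeros_left += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]


codeword = [1, 0, 1, 1, 0, 0]      # syndrome is (1 + 3 + 4) mod 7 = 1
a = vt_syndrome(codeword)
damaged = codeword[:2] + codeword[3:]   # delete the bit at index 2
print(vt_correct_deletion(damaged, n=len(codeword), a=a))  # [1, 0, 1, 1, 0, 0]
```

The decoder cannot tell which bit inside a run was lost, but any position within the run yields the same codeword, so the recovered word is still unique.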
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion or link prediction, i.e., to predict whether a relationship not in the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.
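To make link prediction with embeddings concrete, the sketch below scores candidate triples with a translation-based function in the style of TransE, where a triple (head, relation, tail) is plausible when head + relation lies close to tail. TransE is chosen here as a well-known example and is not named in the abstract; the two-dimensional embeddings and entity names are invented toy values, not learned parameters.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail; higher is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy, hand-picked embeddings (assumptions for illustration only).
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
countries = {"france": [1.0, 1.0], "japan": [3.0, -1.0]}

print(rank_tails(paris, capital_of, countries))  # ['france', 'japan']
```

In practice the embeddings are learned by minimizing a ranking loss over observed and corrupted triples, and the models covered by such a survey differ mainly in how this scoring function is defined.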