This paper deals with surrogate modelling of a computer code output in a hierarchical multi-fidelity context, i.e., when the output can be evaluated at different levels of accuracy and computational cost. Using observations of the output at low- and high-fidelity levels, we propose a method, called GPBNN, that combines Gaussian process (GP) regression and a Bayesian neural network (BNN). The low-fidelity output is treated as a single-fidelity code using classical GP regression. The high-fidelity output is approximated by a BNN that incorporates, in addition to the high-fidelity observations, well-chosen realisations of the low-fidelity output emulator. The predictive uncertainty of the final surrogate model is then quantified by a complete characterisation of the uncertainties of the different models and their interaction. GPBNN is compared with most of the multi-fidelity regression methods that allow the prediction uncertainty to be quantified.
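As a rough illustration of the low-fidelity step described in the abstract above, the sketch below fits a GP to a few low-fidelity observations and draws realisations of the resulting emulator, which is the kind of input the abstract says is passed on to the BNN. The toy function `low_fidelity`, the RBF kernel, its length scale, and the jitter value are all assumptions made for this example; the BNN stage and the authors' rule for choosing realisations are not reproduced.

```python
import numpy as np

def low_fidelity(x):
    # Hypothetical cheap code output, used only for illustration.
    return np.sin(2.0 * np.pi * x)

def rbf(a, b, length=0.15, var=1.0):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    # Standard GP regression formulas, solved via a Cholesky factorisation.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = rbf(x_test, x_test) - v.T @ v
    return mean, cov

x_lo = np.linspace(0.0, 1.0, 8)      # low-fidelity design points
y_lo = low_fidelity(x_lo)
x_new = np.linspace(0.0, 1.0, 50)    # where the emulator is evaluated

mean, cov = gp_posterior(x_lo, y_lo, x_new)

# Realisations of the low-fidelity emulator (posterior sample paths).
rng = np.random.default_rng(0)
realisations = rng.multivariate_normal(mean, cov, size=3, check_valid="ignore")
```

Each row of `realisations` is one plausible low-fidelity curve; the spread between rows reflects the emulator's uncertainty away from the design points.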
This paper addresses dimensionality reduction in a regression setting when the predictors are a mixture of a functional variable and a high-dimensional vector. A flexible model, combining sparse linear ideas with semiparametric ones, is proposed. A wide scope of asymptotic results is provided, covering both the rates of convergence of the estimators and the asymptotic behaviour of the variable selection procedure. Practical issues are analysed through finite sample simulated experiments while an application to Tecator's data illustrates the usefulness of our methodology.
In this work, a Generalized Finite Difference (GFD) scheme is presented for effectively computing the numerical solution of a parabolic-elliptic system modelling a bacterial strain with density-suppressed motility. The GFD method is a meshless method known for its simplicity for solving non-linear boundary value problems over irregular geometries. The paper first introduces the basic elements of the GFD method, and then an explicit-implicit scheme is derived. The convergence of the method is proven under a bound for the time step, and an algorithm is provided for its computational implementation. Finally, some examples are considered comparing the results obtained with a regular mesh and an irregular cloud of points.
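The basic building block of a GFD scheme, as referred to in the abstract above, can be sketched in one dimension: derivatives at a node are recovered from a weighted least-squares fit of a second-order Taylor expansion over an irregular set of neighbours. The function name `gfd_derivatives` and the inverse-cube distance weighting are assumptions for this example; the paper's two-dimensional stars and its explicit-implicit time stepping are not reproduced.

```python
import numpy as np

def gfd_derivatives(x0, u0, xs, us):
    """Estimate u'(x0) and u''(x0) from values at irregular neighbours xs.

    Minimises sum_j w_j^2 * (u0 - u_j + h_j*u' + 0.5*h_j^2*u'')^2
    with h_j = x_j - x0.
    """
    h = xs - x0
    w = 1.0 / np.abs(h) ** 3                 # assumed weighting function
    A = np.column_stack([h, 0.5 * h ** 2]) * w[:, None]
    b = (us - u0) * w
    (du, d2u), *_ = np.linalg.lstsq(A, b, rcond=None)
    return du, d2u

# Irregular cloud of neighbours around the node x0 = 0.4.
u = lambda x: x ** 2 + 3.0 * x
x0 = 0.4
neighbours = np.array([0.1, 0.33, 0.52, 0.9])
du, d2u = gfd_derivatives(x0, u(x0), neighbours, u(neighbours))
```

Because the fit is based on a second-order expansion, it recovers the derivatives of a quadratic test function up to rounding error, whatever the spacing of the neighbours.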
We propose a novel algorithm for the support estimation of partially known Gaussian graphical models that incorporates prior information about the underlying graph. In contrast to classical approaches that provide a point estimate based on a maximum likelihood or a maximum a posteriori criterion using (simple) priors on the precision matrix, we consider a prior on the graph and rely on annealed Langevin diffusion to generate samples from the posterior distribution. Since the Langevin sampler requires access to the score function of the underlying graph prior, we use graph neural networks to effectively estimate the score from a graph dataset (either available beforehand or generated from a known distribution). Numerical experiments demonstrate the benefits of our approach.
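To make the sampling step in the abstract above concrete, here is a toy sketch of annealed Langevin dynamics in one dimension. It assumes a Gaussian target whose noise-perturbed score is known in closed form, whereas the paper estimates the score of a graph prior with a graph neural network. The step-size rule (step proportional to the squared noise level), the noise schedule, and all names below are choices made for this example only.

```python
import numpy as np

MU, SIGMA = 2.0, 1.0   # assumed toy target N(MU, SIGMA^2)

def gaussian_score(x, s):
    # Score of the target after adding Gaussian noise of std s:
    # d/dx log N(x; MU, SIGMA^2 + s^2).
    return -(x - MU) / (SIGMA ** 2 + s ** 2)

def annealed_langevin(score, noise_levels, n_chains=5000,
                      steps_per_level=100, base_step=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Start all chains from broad noise.
    x = rng.standard_normal(n_chains) * noise_levels[0]
    s_last = noise_levels[-1]
    for s in noise_levels:                       # from large to small noise
        step = base_step * (s / s_last) ** 2     # larger steps at high noise
        for _ in range(steps_per_level):
            z = rng.standard_normal(n_chains)
            x = x + 0.5 * step * score(x, s) + np.sqrt(step) * z
    return x

levels = np.geomspace(1.0, 0.05, 8)
samples = annealed_langevin(gaussian_score, levels)
```

In this toy setting the sample mean and standard deviation should end up close to `MU` and `SIGMA`; with a learned score, only the `score` argument would change.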
This simulation study evaluates the effectiveness of multiple imputation (MI) techniques for multilevel data. It compares the performance of traditional Multiple Imputation by Chained Equations (MICE) with tree-based methods such as Chained Random Forests with Predictive Mean Matching and Extreme Gradient Boosting. Adapted versions that include dummy variables for cluster membership are also included for the tree-based methods. Methods are evaluated for coefficient estimation bias, statistical power, and type I error rates on simulated hierarchical data with different cluster sizes (25 and 50) and levels of missingness (10\% and 50\%). Coefficients are estimated using random intercept and random slope models. The results show that while MICE is preferred for accurate rejection rates, Extreme Gradient Boosting is advantageous for reducing bias. Furthermore, the study finds that bias levels are similar across different cluster sizes, but rejection rates tend to be less favorable with fewer clusters (lower power, higher type I error). In addition, the inclusion of cluster dummies in tree-based methods improves estimation for Level 1 variables, but is less effective for Level 2 variables. When data become too complex and MICE is too slow, extreme gradient boosting is a good alternative for hierarchical data. Keywords: Multiple imputation; multi-level data; MICE; missRanger; mixgb
In this paper, we introduce CDL, a software library designed for the analysis of permutations and linear orders subject to various structural restrictions. Prominent examples of these restrictions include pattern avoidance, a topic of interest in both computer science and combinatorics, and "never conditions" utilized in social choice and voting theory. CDL offers a range of fundamental functionalities, including identifying the permutations that meet specific restrictions and determining the isomorphism of such sets. To facilitate exploration of large permutation sets or domains, CDL incorporates multiple search strategies and heuristics.
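For readers unfamiliar with pattern avoidance, one of the restrictions mentioned in the abstract above, the following brute-force sketch lists the permutations of a given size that avoid a pattern. It is plain Python written for this example and does not use or imitate the CDL interface; the function names are hypothetical.

```python
from itertools import combinations, permutations

def order_type(seq):
    # Positions of seq sorted by value; equal for order-isomorphic sequences.
    return tuple(sorted(range(len(seq)), key=lambda i: seq[i]))

def contains_pattern(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    target = order_type(pattern)
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        if order_type([perm[i] for i in idx]) == target:
            return True
    return False

def avoiders(n, pattern):
    """All permutations of 1..n that avoid the given pattern."""
    return [p for p in permutations(range(1, n + 1))
            if not contains_pattern(p, pattern)]

avoid_132 = avoiders(4, (1, 3, 2))
```

For any pattern of length three, the number of avoiding permutations of size n is the n-th Catalan number, so `avoid_132` has 14 elements. The exhaustive search grows factorially with n, which is why dedicated search strategies and heuristics matter for larger domains.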
We present the library lymph for the finite element numerical discretization of coupled multi-physics problems. lymph is a Matlab library for the discretization of partial differential equations based on high-order discontinuous Galerkin methods on polytopal grids (PolyDG) for spatial discretization coupled with suitable finite-difference time marching schemes. The objective of the paper is to introduce the library by describing it in terms of installation, input/output data, and code structure, highlighting - when necessary - key implementation aspects related to the method. A user guide, proceeding step-by-step in the implementation and solution of a Poisson problem, is also provided. In the last part of the paper, we show the results obtained for several differential problems, namely the Poisson problem, the heat equation, and the elastodynamics system. Through these examples, we show the convergence properties and highlight some of the main features of the proposed method, i.e. geometric flexibility, high-order accuracy, and robustness with respect to heterogeneous physical parameters.
We propose a two-step Newton's method for refining an approximation of a singular zero whose deflation process terminates after one step, also known as a deflation-one singularity. Given an isolated singular zero of a square analytic system, our algorithm exploits an invertible linear operator obtained by combining the Jacobian and a projection of the Hessian in the direction of the kernel of the Jacobian. We prove the quadratic convergence of the two-step Newton method when it is applied to an approximation of a deflation-one singular zero. Also, the algorithm works with smaller matrices than existing methods, making it more efficient. We present examples and experiments to show the efficiency of the method.
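The motivation behind the abstract above can be seen in a scalar toy case, sketched here under loud assumptions: this is not the paper's two-step algorithm, only an illustration of why singular zeros need special treatment. For the hypothetical function f(x) = x^2 + x^3, which has a double zero at 0, plain Newton iteration converges only linearly, while Newton applied to the derivative f', a scalar analogue of a single deflation step, converges quadratically.

```python
def newton(f, df, x, n_iter):
    # Plain Newton iteration x <- x - f(x) / f'(x).
    for _ in range(n_iter):
        x = x - f(x) / df(x)
    return x

# Hypothetical test function with a double zero at x = 0.
f = lambda x: x ** 2 + x ** 3
df = lambda x: 2 * x + 3 * x ** 2
d2f = lambda x: 2 + 6 * x

# Newton on f: the error is roughly halved at each step.
plain = newton(f, df, 0.1, 5)

# Newton on f' (0 is a simple zero of f'): the error is roughly squared.
deflated = newton(df, d2f, 0.1, 5)
```

After five iterations from 0.1, `plain` is still of order 1e-3, whereas `deflated` is at the level of rounding error.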
We present a novel combination of dynamic embedded topic models and change-point detection to explore diachronic change of lexical semantic modality in classical and early Christian Latin. We demonstrate several methods for finding and characterizing patterns in the output, and relating them to traditional scholarship in Comparative Literature and Classics. This simple approach to unsupervised models of semantic change can be applied to any suitable corpus, and we conclude with future directions and refinements aiming to allow noisier, less-curated materials to meet that threshold.
In classical federated learning, privacy can be breached through local gradient results by using engineered queries from the clients. However, quantum communication channels are considered more secure because measurements on the data cause some loss of information, which can be detected. Therefore, the quantum version of federated learning can be used to provide more privacy. Additionally, sending an $N$-dimensional data vector through a quantum channel requires sending $\log N$ entangled qubits, which can provide exponential efficiency if the data vector is obtained as quantum states. In this paper, we propose a quantum federated learning model where fixed design quantum chips are operated based on the quantum states sent by a centralized server. Based on the incoming superposition states, the clients compute and then send their local gradients as quantum states to the server, where they are aggregated to update parameters. Since the server does not send model parameters, but instead sends the operator as a quantum state, the clients are not required to share the model. This allows for the creation of asynchronous learning models. In addition, the model as a quantum state is fed into client-side chips directly; therefore, it does not require measurements on the incoming quantum state to obtain model parameters in order to compute gradients. This can provide efficiency over models where the parameter vector is sent via classical or quantum channels and local gradients are computed from the retrieved values of these parameters.
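The $\log N$ qubit count in the abstract above can be illustrated with a classical sketch of amplitude encoding, which is assumed here to be the encoding in question: a vector of length $N$ is zero-padded to the next power of two and normalised, so that its entries become the amplitudes of a state on $\lceil \log_2 N \rceil$ qubits. The function name `amplitude_encode` is hypothetical, and no quantum hardware or quantum library is involved.

```python
import math

def amplitude_encode(vec):
    """Return (amplitudes, n_qubits) for a classical vector.

    The vector is zero-padded to length 2**n_qubits and scaled to unit
    Euclidean norm, as required of a quantum state's amplitudes.
    """
    n = len(vec)
    n_qubits = max(1, math.ceil(math.log2(n)))
    padded = list(vec) + [0.0] * (2 ** n_qubits - n)
    norm = math.sqrt(sum(a * a for a in padded))
    if norm == 0.0:
        raise ValueError("cannot encode the zero vector")
    return [a / norm for a in padded], n_qubits

# A 1000-dimensional vector fits into the amplitudes of 10 qubits.
amplitudes, n_qubits = amplitude_encode([float(i) for i in range(1, 1001)])
```

The exponential saving applies to the number of qubits sent; preparing such a state from classical data has its own cost, which is why the abstract conditions the gain on the data already being available as quantum states.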
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently also entered the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used in each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, with respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.