For Hamiltonian systems with non-canonical structure matrices, a new family of fourth-order energy-preserving integrators is presented. The integrators take a form of a combination of Runge--Kutta methods and continuous-stage Runge--Kutta methods and feature a set of free parameters that offer greater flexibility and efficiency. Specifically, we demonstrate that by carefully choosing these free parameters a simplified Newton iteration applied to the integrators of order four can be parallelizable. This results in faster and more efficient integrators compared with existing fourth-order energy-preserving integrators.
Challenges to reproducibility and replicability have gained widespread attention, driven by large replication projects with lukewarm success rates. A nascent work has emerged developing algorithms to estimate the replicability of published findings. The current study explores ways in which AI-enabled signals of confidence in research might be integrated into the literature search. We interview 17 PhD researchers about their current processes for literature search and ask them to provide feedback on a replicability estimation tool. Our findings suggest that participants tend to confuse replicability with generalizability and related concepts. Information about replicability can support researchers throughout the research design processes. However, the use of AI estimation is debatable due to the lack of explainability and transparency. The ethical implications of AI-enabled confidence assessment must be further studied before such tools could be widely accepted. We discuss implications for the design of technological tools to support scholarly activities and advance replicability.
Supervised machine learning models and public surveillance data has been employed for infectious disease forecasting in many settings. These models leverage various data sources capturing drivers of disease spread, such as climate conditions or human behavior. However, few models have incorporated the organizational structure of different geographic locations for forecasting. Traveling waves of seasonal outbreaks have been reported for dengue, influenza, and other infectious diseases, and many of the drivers of infectious disease dynamics may be shared across different cities, either due to their geographic or socioeconomic proximity. In this study, we developed a machine learning model to predict case counts of four infectious diseases across Brazilian cities one week ahead by incorporating information from related cities. We compared selecting related cities using both geographic distance and GDP per capita. Incorporating information from geographically proximate cities improved predictive performance for two of the four diseases, specifically COVID-19 and Zika. We also discuss the impact on forecasts in the presence of anomalous contagion patterns and the limitations of the proposed methodology.
We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument. Those functions concentrate around their conditional expectations. We further extend the results to concentration in general metric spaces.
Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult as the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation model used for remote sensin research like PROSAIL. The testing results of the SOGM on new datasets that not included in model training proved that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible on: //github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.
Efficiently enumerating all the extreme points of a polytope identified by a system of linear inequalities is a well-known challenge issue. We consider a special case and present an algorithm that enumerates all the extreme points of a bisubmodular polyhedron in $\mathcal{O}(n^4|V|)$ time and $\mathcal{O}(n^2)$ space complexity, where $n$ is the dimension of underlying space and $V$ is the set of outputs. We use the reverse search and signed poset linked to extreme points to avoid the redundant search. Our algorithm is a generalization of enumerating all the extreme points of a base polyhedron which comprises some combinatorial enumeration problems.
Capturing the extremal behaviour of data often requires bespoke marginal and dependence models which are grounded in rigorous asymptotic theory, and hence provide reliable extrapolation into the upper tails of the data-generating distribution. We present a toolbox of four methodological frameworks, motivated by modern extreme value theory, that can be used to accurately estimate extreme exceedance probabilities or the corresponding level in either a univariate or multivariate setting. Our frameworks were used to facilitate the winning contribution of Team Yalla to the EVA (2023) Conference Data Challenge, which was organised for the 13$^\text{th}$ International Conference on Extreme Value Analysis. This competition comprised seven teams competing across four separate sub-challenges, with each requiring the modelling of data simulated from known, yet highly complex, statistical distributions, and extrapolation far beyond the range of the available samples in order to predict probabilities of extreme events. Data were constructed to be representative of real environmental data, sampled from the fantasy country of "Utopia"
We develop an inferential toolkit for analyzing object-valued responses, which correspond to data situated in general metric spaces, paired with Euclidean predictors within the conformal framework. To this end we introduce conditional profile average transport costs, where we compare distance profiles that correspond to one-dimensional distributions of probability mass falling into balls of increasing radius through the optimal transport cost when moving from one distance profile to another. The average transport cost to transport a given distance profile to all others is crucial for statistical inference in metric spaces and underpins the proposed conditional profile scores. A key feature of the proposed approach is to utilize the distribution of conditional profile average transport costs as conformity score for general metric space-valued responses, which facilitates the construction of prediction sets by the split conformal algorithm. We derive the uniform convergence rate of the proposed conformity score estimators and establish asymptotic conditional validity for the prediction sets. The finite sample performance for synthetic data in various metric spaces demonstrates that the proposed conditional profile score outperforms existing methods in terms of both coverage level and size of the resulting prediction sets, even in the special case of scalar and thus Euclidean responses. We also demonstrate the practical utility of conditional profile scores for network data from New York taxi trips and for compositional data reflecting energy sourcing of U.S. states.
We present the first formulation of the optimal polynomial approximation of the solution of linear non-autonomous systems of ODEs in the framework of the so-called $\star$-product. This product is the basis of new approaches for the solution of such ODEs, both in the analytical and the numerical sense. The paper shows how to formally state the problem and derives upper bounds for its error.
In this article, an efficient numerical method for computing finite-horizon controllability Gramians in Cholesky-factored form is proposed. The method is applicable to general dense matrices of moderate size and produces a Cholesky factor of the Gramian without computing the full product. In contrast to other methods applicable to this task, the proposed method is a generalization of the scaling-and-squaring approach for approximating the matrix exponential. It exploits a similar doubling formula for the Gramian, and thereby keeps the required computational effort modest. Most importantly, a rigorous backward error analysis is provided, which guarantees that the approximation is accurate to the round-off error level in double precision. This accuracy is illustrated in practice on a large number of standard test examples. The method has been implemented in the Julia package FiniteHorizonGramians.jl, which is available online under the MIT license. Code for reproducing the experimental results is included in this package, as well as code for determining the optimal method parameters. The analysis can thus easily be adapted to a different finite-precision arithmetic.
This paper analyses conforming and nonconforming virtual element formulations of arbitrary polynomial degrees on general polygonal meshes for the coupling of solid and fluid phases in deformable porous plates. The governing equations consist of one fourth-order equation for the transverse displacement of the middle surface coupled with a second-order equation for the pressure head relative to the solid with mixed boundary conditions. We propose novel enrichment operators that connect nonconforming virtual element spaces of general degree to continuous Sobolev spaces. These operators satisfy additional orthogonal and best-approximation properties (referred to as a conforming companion operator in the context of finite element methods), which play an important role in the nonconforming methods. This paper proves a priori error estimates in the best-approximation form, and derives residual--based reliable and efficient a posteriori error estimates in appropriate norms, and shows that these error bounds are robust with respect to the main model parameters. The computational examples illustrate the numerical behaviour of the suggested virtual element discretisations and confirm the theoretical findings on different polygonal meshes with mixed boundary conditions.