In this chapter we illustrate the use of some Machine Learning techniques in the context of omics data. More precisely, we review and evaluate the use of Random Forest and Penalized Multinomial Logistic Regression for the integrative analysis of genomics and immunomics in pancreatic cancer. Furthermore, we propose the use of association rules for predictive purposes to overcome the low predictive power of the previously mentioned models. Finally, we apply the reviewed methods to a real data set from TCGA consisting of 107 tumoral pancreatic samples and 117,486 germline SNPs, showing the good performance of the proposed methods in predicting the immunological infiltration in pancreatic cancer.
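As a purely illustrative sketch (not the chapter's pipeline), the snippet below fits a Random Forest and an L1-penalized multinomial logistic regression to a toy genotype matrix; the data shapes, class labels, and hyperparameters are invented stand-ins.

```python
# Illustrative only: toy stand-ins for a SNP genotype matrix (samples x SNPs, coded
# 0/1/2) and an immune-infiltration class label; shapes and settings are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(107, 500)).astype(float)   # 500 SNPs as a small stand-in
y = rng.integers(0, 3, size=107)                         # three hypothetical infiltration classes

rf = RandomForestClassifier(n_estimators=500, random_state=0)
# L1-penalized multinomial logistic regression (the saga solver supports this combination).
pmlr = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)

for name, model in [("random forest", rf), ("penalized multinomial LR", pmlr)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.2f}")
```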
We consider the problem of estimating log-determinants of large, sparse, positive definite matrices. Our algorithm, based on sparse approximate inverses, is designed to reduce computational cost. It can be implemented adaptively and uses graph spline approximation to improve accuracy. We illustrate our approach on classes of large sparse matrices.
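For readers who want a ground truth to compare any estimator against, the following sketch computes the exact log-determinant of a moderate-size sparse SPD matrix via a sparse LU factorization; it is a reference computation only, not the sparse-approximate-inverse algorithm proposed here, and the test matrix is a toy choice.

```python
# Reference computation only (not the proposed estimator): exact log-determinant of a
# sparse SPD matrix via sparse LU, feasible for moderate sizes and useful for validation.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 5000
# Toy SPD test matrix: a shifted 1D Laplacian.
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

lu = splu(A)
# det(A) > 0 for an SPD matrix, so log det A equals the sum of log|diag| of the LU
# factors (the row/column permutations only affect the sign).
logdet = np.sum(np.log(np.abs(lu.U.diagonal()))) + np.sum(np.log(np.abs(lu.L.diagonal())))
print(f"log det A = {logdet:.6f}")
```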
In this work, we present an efficient approach to solve nonlinear high-contrast multiscale diffusion problems. We incorporate the explicit-implicit-null (EIN) method to separate the nonlinear term into a linear term and a damping term, and then apply implicit and explicit time-marching schemes to the two parts, respectively. Due to the multiscale property of the linear part, we further introduce a temporal partially explicit splitting scheme and construct suitable multiscale subspaces to speed up the computation. The approximate solution is split across these subspaces, which are associated with different physics. The temporal splitting scheme employs implicit discretization in the low-dimensional subspace representing the high-contrast features and explicit discretization in the other subspace. We analyze the stability of the proposed scheme and give the condition for the choice of the linear diffusion coefficient. The convergence of the proposed method is also established. Several numerical tests are performed to show the efficiency and accuracy of the proposed approach.
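As a schematic (our notation, with $\bar{\kappa}$ a user-chosen constant linear diffusion coefficient), the EIN idea for $u_t=\nabla\cdot(\kappa(x,u)\nabla u)$ adds and subtracts a linear diffusion term and then treats the linear part implicitly and the remainder explicitly:

```latex
u_t = \underbrace{\nabla\cdot(\bar{\kappa}\,\nabla u)}_{\text{linear, implicit}}
    + \underbrace{\nabla\cdot\big((\kappa(x,u)-\bar{\kappa})\,\nabla u\big)}_{\text{remainder, explicit}},
\qquad
\frac{u^{n+1}-u^{n}}{\tau}
  = \nabla\cdot\!\big(\bar{\kappa}\,\nabla u^{n+1}\big)
  + \nabla\cdot\!\big((\kappa(x,u^{n})-\bar{\kappa})\,\nabla u^{n}\big).
```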
In this paper, we consider multiscale methods for nonlinear Cosserat elasticity. In particular, we explore the generalized multiscale finite element method (GMsFEM) to solve an isotropic Cosserat problem with the strain-limiting property (ensuring bounded linearized strains even under high stresses). Such a strain-limiting Cosserat model can find potential applications in solids and biological fibers. However, Cosserat media with naturally rotational degrees of freedom, nonlinear constitutive relations, high contrast, and heterogeneities may produce challenging multiscale characteristics in the solution, so upscaling by multiscale methods is necessary. Therefore, we utilize the offline and residual-based online (adaptive or uniform) GMsFEM in this context while handling the nonlinearity by Picard iteration. Through various two-dimensional experiments (for perforated, composite, and stochastically heterogeneous media with small and large strain-limiting parameters), our numerical results show the approaches' convergence, efficiency, and robustness. In addition, these results demonstrate that the approaches provide good accuracy, that the online GMsFEM gives more accurate solutions than the offline one, and that the online adaptive strategy achieves accuracy similar to the uniform one with fewer degrees of freedom.
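To illustrate only the linearization step (not the Cosserat discretization or the GMsFEM basis construction), the sketch below runs a Picard iteration in a reduced space for a toy 1D nonlinear diffusion problem; the coarse hat-function basis R stands in for multiscale basis functions, and all names and values are ours.

```python
# Minimal sketch: Picard iteration in a reduced space for the toy problem
#   -(kappa(u) u')' = 1 on (0,1), u(0) = u(1) = 0, kappa(u) = 1 + u**2.
import numpy as np

n_fine, n_coarse = 256, 16
h = 1.0 / n_fine
x = np.linspace(0.0, 1.0, n_fine + 1)[1:-1]      # interior fine-grid nodes
f = np.ones_like(x)                              # right-hand side

def assemble(u):
    """Finite-difference matrix of -(kappa(u) u')' with kappa frozen at u."""
    kappa = 1.0 + u**2
    k_full = np.concatenate(([kappa[0]], kappa, [kappa[-1]]))
    k_half = 0.5 * (k_full[:-1] + k_full[1:])    # kappa at cell interfaces (arithmetic mean)
    main = (k_half[:-1] + k_half[1:]) / h**2
    off = -k_half[1:-1] / h**2
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Coarse hat functions at the interior fine nodes (stand-in for a GMsFEM basis).
xc = np.linspace(0.0, 1.0, n_coarse + 1)[1:-1]
R = np.maximum(0.0, 1.0 - np.abs(x[:, None] - xc[None, :]) * n_coarse)

u = np.zeros_like(x)
for it in range(50):                             # Picard: freeze kappa at the previous iterate
    A = assemble(u)
    c = np.linalg.solve(R.T @ A @ R, R.T @ f)    # solve in the reduced space
    u_new = R @ c
    if np.linalg.norm(u_new - u) <= 1e-10 * max(np.linalg.norm(u_new), 1.0):
        u = u_new
        break
    u = u_new
print(f"Picard converged in {it + 1} iterations")
```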
In this paper we present a new "external verifier" for the Lean theorem prover, written in Lean itself. This is the first complete verifier for Lean 4 other than the reference implementation in C++ used by Lean itself. Our new verifier is competitive with the original, running between 20% and 50% slower, and is usable to verify all of Lean's mathlib library; it thus forms an additional step toward Lean's aim to self-host the full elaborator and compiler. Moreover, because the verifier is written in a language that admits formal verification, it is possible to state and prove properties about the kernel itself. We report on some initial steps taken in this direction to formalize the Lean type theory abstractly and to show that the kernel correctly implements this theory, in order to eliminate the possibility of implementation bugs in the kernel and increase the trustworthiness of proofs conducted in it. This work is still ongoing, but we plan to use this project to help justify any future changes to the kernel and type theory and to ensure that unsoundness does not sneak in through either the abstract theory or implementation bugs.
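Purely as a shape illustration (none of these names come from the project), a kernel-soundness property of the kind described here can be stated in Lean as a relation between an executable checker and an abstract typing judgment:

```lean
-- Hypothetical sketch only: `Expr`, `Ctx`, `Typechecks`, and `HasType` are stand-in
-- names, not the project's actual definitions.
axiom Expr : Type
axiom Ctx : Type
axiom Typechecks : Ctx → Expr → Expr → Bool   -- executable kernel check
axiom HasType : Ctx → Expr → Expr → Prop      -- abstract type theory

-- Soundness: whenever the kernel accepts `e : t` in context `Γ`,
-- the abstract theory derives the same judgment.
def KernelSound : Prop :=
  ∀ (Γ : Ctx) (e t : Expr), Typechecks Γ e t = true → HasType Γ e t
```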
In this paper, we propose a generic approach to perform global sensitivity analysis (GSA) for compartmental models based on continuous-time Markov chains (CTMC). This approach enables a complete GSA for epidemic models, quantifying not only the effects of uncertain parameters such as epidemic parameters (transmission rate, mean sojourn duration in compartments), but also those of intrinsic randomness and of the interactions between the two. The main step in our approach is to build a deterministic representation of the underlying continuous-time Markov chain by controlling the latent variables that model intrinsic randomness. The model output can then be written as a deterministic function of both the uncertain parameters and the controlled latent variables, so that it becomes possible to compute standard variance-based sensitivity indices, e.g. the so-called Sobol' indices. However, different simulation algorithms lead to different representations. We exhibit three such representations for CTMC stochastic compartmental models and discuss the results obtained by implementing and comparing GSAs based on each of them on a SARS-CoV-2 epidemic model.
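A minimal sketch of the key construction (not the paper's representations or model): write the CTMC simulator as a deterministic function of the parameters and a pre-drawn stream of uniforms, then estimate a Sobol' index by pick-and-freeze; the SIR model, parameter ranges, and output below are illustrative stand-ins.

```python
# Deterministic representation: the simulator is a function of (parameters, uniforms).
import numpy as np

def final_size(beta, gamma, u, n_pop=200, i0=2):
    """Gillespie-style SIR jump chain driven by the pre-drawn uniform stream u."""
    s, i, k = n_pop - i0, i0, 0
    while i > 0 and k + 1 < len(u):
        rate_inf = beta * s * i / n_pop
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        # u[k] would give the waiting time; only the jump chain matters for final size.
        if u[k + 1] * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i -= 1
        k += 2
    return n_pop - i0 - s                          # number ever infected (excluding seeds)

rng = np.random.default_rng(1)
N, max_steps = 2000, 1000
beta = rng.uniform(0.2, 0.6, N)                    # uncertain epidemic parameters
gamma = rng.uniform(0.1, 0.3, N)
U = rng.random((N, 2 * max_steps))                 # controlled latent variables
U2 = rng.random((N, 2 * max_steps))                # independent copy of the latent variables

Y = np.array([final_size(b, g, u) for b, g, u in zip(beta, gamma, U)])
Yp = np.array([final_size(b, g, u) for b, g, u in zip(beta, gamma, U2)])

# Pick-and-freeze: freezing (beta, gamma) while resampling the latent variables
# estimates the closed first-order Sobol' index of the epidemic parameters.
S_params = np.cov(Y, Yp)[0, 1] / np.var(Y, ddof=1)
print(f"estimated Sobol' index of (beta, gamma): {S_params:.2f}")
```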
In this paper, we study parameter identification for solutions to (possibly non-linear) SDEs driven by an additive Rosenblatt process, and the singularity of the induced laws on the path space. We propose a joint estimator of the drift parameter, the diffusion intensity, and the Hurst index that can be computed from discrete-time observations on a bounded time horizon, and we prove its strong consistency (as well as its speed of convergence) under in-fill asymptotics with a fixed time horizon. As a consequence of this strong consistency, we show the singularity of the measures generated by solutions with different drifts, which implies the invalidity of a Girsanov-type theorem for Rosenblatt processes.
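The following is a generic illustration only, not the paper's joint estimator: for a self-similar process with stationary increments, Var(X_{t+h} - X_t) scales like h^{2H}, so a log-log regression of empirical increment variances over dyadic lags gives a crude Hurst estimate from a discretely observed path.

```python
# Generic variance-scaling estimate of the Hurst index (illustrative, not the paper's method).
import numpy as np

def crude_hurst(path, n_scales=6):
    lags = 2 ** np.arange(1, n_scales + 1)
    v = [np.var(path[lag:] - path[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(v), 1)   # log Var ~ 2H * log(lag) + const
    return 0.5 * slope

# Sanity check on a path with known scaling: Brownian motion has H = 1/2.
rng = np.random.default_rng(2)
bm = np.cumsum(rng.standard_normal(100_000)) * np.sqrt(1e-4)
print(f"estimated H on Brownian motion: {crude_hurst(bm):.2f}")
```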
In this paper we apply the stochastic variance reduced gradient (SVRG) method, a popular variance reduction technique for accelerating the stochastic gradient method in optimization, to solve large-scale linear ill-posed systems in Hilbert spaces. Under {\it a priori} choices of stopping indices, we derive a convergence rate result when the sought solution satisfies a benchmark source condition and establish a convergence result without using any source condition. To terminate the method in an {\it a posteriori} manner, we consider the discrepancy principle and show that it terminates the method after finitely many iteration steps almost surely. Various numerical results are reported to demonstrate the performance of the method.
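A minimal finite-dimensional sketch (not the paper's Hilbert-space setting): SVRG applied to the least-squares functional (1/(2n)) Σ_i (a_i^T x - b_i)^2 of a noisy, mildly ill-conditioned system, with the discrepancy principle as the stopping rule; the test problem, step size, and epoch length are heuristic, illustrative choices.

```python
# SVRG with discrepancy-principle stopping on a toy noisy linear system.
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 100
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = 1.0 / np.sqrt(1.0 + np.arange(m))               # decaying singular values
A = (U[:, :m] * s) @ V.T                            # forward operator
x_true = V @ rng.standard_normal(m)
delta = 1e-2                                        # known noise level
e = rng.standard_normal(n)
b = A @ x_true + delta * e / np.linalg.norm(e)      # noisy data with ||noise|| = delta

tau, epochs, inner = 1.1, 200, 2 * n
step = 0.25 / np.max(np.sum(A**2, axis=1))          # conservative SVRG step size
x = np.zeros(m)
for ep in range(epochs):
    r = A @ x - b
    if np.linalg.norm(r) <= tau * delta:            # a posteriori stopping: ||Ax - b|| <= tau*delta
        print(f"discrepancy principle satisfied after {ep} epochs")
        break
    full_grad = A.T @ r / n                         # full gradient at the snapshot x
    x_snap, x_cur = x, x.copy()
    for _ in range(inner):                          # variance-reduced inner loop
        i = rng.integers(n)
        g = A[i] * (A[i] @ x_cur - b[i]) - A[i] * (A[i] @ x_snap - b[i]) + full_grad
        x_cur = x_cur - step * g
    x = x_cur
print(f"relative error: {np.linalg.norm(x - x_true) / np.linalg.norm(x_true):.3f}")
```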
Given the rapid advancement of artificial intelligence, understanding the foundations of intelligent behaviour is increasingly important. Active inference, regarded as a general theory of behaviour, offers a principled approach to probing the basis of sophistication in planning and decision-making. In this paper, we examine two decision-making schemes in active inference, based on 'planning' and 'learning from experience'. We also introduce a mixed model that navigates the data-complexity trade-off between these strategies, leveraging the strengths of both to facilitate balanced decision-making. We evaluate the proposed model in a challenging grid-world scenario that requires adaptability from the agent. Additionally, our model allows us to analyze the evolution of various parameters, offering valuable insights and contributing to an explainable framework for intelligent decision-making.
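A schematic sketch only (not the paper's generative model): one common way to blend planning with learned habits is to weight a planning score against a habit prior inside a softmax; G, the Dirichlet-style counts, and gamma below are illustrative values.

```python
# Blending a planning score with a habit prior learned from experience (schematic).
import numpy as np

def action_probabilities(G, habit_counts, gamma=2.0):
    """Softmax blend of a learned habit prior and a per-action planning cost G."""
    log_habit = np.log(habit_counts / habit_counts.sum())   # habit prior from past choices
    logits = log_habit - gamma * G                          # planning enters with precision gamma
    p = np.exp(logits - logits.max())
    return p / p.sum()

G = np.array([1.5, 0.4, 2.0])             # planning: action 1 looks best
habit_counts = np.array([9.0, 1.0, 1.0])  # experience: action 0 was chosen most often
print(action_probabilities(G, habit_counts))
```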
We propose a novel two-stage subsampling algorithm based on optimal design principles. In the first stage, we use a density-based clustering algorithm to identify an approximating design space for the predictors from an initial subsample. Next, we determine an optimal approximate design on this design space. Finally, we use matrix distances such as the Procrustes, Frobenius, and square-root distance to define the remaining subsample, such that its points are "closest" to the support points of the optimal design. Our approach reflects the specific nature of the information matrix as a weighted sum of non-negative definite Fisher information matrices evaluated at the design points and applies to a large class of regression models including models where the Fisher information is of rank larger than $1$.
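An illustrative sketch of the three steps in a simple linear-model setting (not the authors' full procedure): DBSCAN on a pilot subsample defines candidate design points, a multiplicative algorithm computes D-optimal weights on them, and the subsample is taken as the data points nearest to the support points. Euclidean distance is used here as a crude stand-in for the matrix distances; the data, model, and thresholds are invented.

```python
# Two-stage subsampling sketch: cluster -> D-optimal approximate design -> nearest points.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(4)
centres = np.array([[0, 0], [4, 0], [0, 4], [4, 4], [2, 2]], dtype=float)
X = np.vstack([rng.normal(c, 0.3, size=(5000, 2)) for c in centres])   # full data pool
pilot = X[rng.choice(len(X), 1000, replace=False)]                     # stage 1: initial subsample

labels = DBSCAN(eps=0.4, min_samples=10).fit_predict(pilot)
cand = np.array([pilot[labels == c].mean(axis=0)                       # cluster centres form the
                 for c in np.unique(labels) if c != -1])               # candidate design space

F = np.hstack([np.ones((len(cand), 1)), cand])       # regression vectors f(x) = (1, x)
w = np.full(len(cand), 1.0 / len(cand))              # D-optimal weights via the
for _ in range(500):                                 # multiplicative algorithm
    M = (F * w[:, None]).T @ F                       # information matrix sum_i w_i f_i f_i^T
    d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
    w *= d / F.shape[1]                              # w_i <- w_i * d_i / p
    w /= w.sum()

n_sub, keep = 500, w > 1e-3                          # target size; support of the optimal design
idx = []
for point, weight in zip(cand[keep], w[keep]):
    k = max(1, round(weight * n_sub))                # allocate points proportionally to weight
    dist = np.linalg.norm(X - point, axis=1)         # Euclidean stand-in for matrix distances
    idx.extend(np.argsort(dist)[:k])
subsample = X[np.unique(idx)]
print(f"{len(subsample)} points selected around {keep.sum()} support points")
```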
In this article we explore the concept of speedrunning as a representation of a simplified version of quantum mechanics within a classical simulation. This analogy offers a simplified way to approach the broader idea that quantum mechanics may emerge from a classical simulation because of the limitations of that simulation. The concept of speedrunning is explored from the perspective inside the simulation, where the player is seen as a "force of nature" that can be interpreted through Newton's first law. Starting from this general assumption, the aim is to build a bridge between the two fields by using the mathematical representation of path integrals. We also analyse the use of such an approach as an intermediate layer between a game simulation and machine learning techniques aimed at finding an optimal strategy. This article focuses primarily on the relationship between classical and quantum physics within the simulation, leaving aside more technical issues in field theory such as invariance with respect to Lorentz transformations and virtual particles.
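For reference, the standard form of the path-integral representation invoked here writes the transition amplitude as a sum over all paths weighted by the classical action:

```latex
K(x_b, t_b; x_a, t_a) = \int \mathcal{D}[x(t)]\, e^{\frac{i}{\hbar} S[x(t)]},
\qquad
S[x(t)] = \int_{t_a}^{t_b} L\big(x(t), \dot{x}(t), t\big)\, dt .
```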