In this paper we consider the numerical approximation of infinite horizon problems via the dynamic programming approach. The value function of the problem solves a Hamilton-Jacobi-Bellman (HJB) equation that is approximated by a fully discrete method. It is well known that this numerical problem is difficult to handle due to the so-called curse of dimensionality. To mitigate this issue we apply a model-order reduction by means of a new proper orthogonal decomposition (POD) method based on time derivatives. We carry out the error analysis of the method using recently proved optimal bounds for the fully discrete approximations. Moreover, the use of snapshots based on time derivatives allows us to bound some terms of the error that could not be bounded in a standard POD approach. Some numerical experiments show the good performance of the method in practice.
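As a concrete illustration of the snapshot-augmentation idea, here is a minimal Python sketch of a POD basis computed from solution snapshots together with finite-difference approximations of their time derivatives; the function name, the first-order difference, and the energy-based truncation rule are our own illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def pod_basis_with_derivatives(snapshots, dt, tol=1e-8):
    """snapshots: array of shape (n_dofs, n_times)."""
    # Approximate time derivatives by first-order differences.
    derivs = (snapshots[:, 1:] - snapshots[:, :-1]) / dt
    # Augmented snapshot matrix: values plus (scaled) derivatives.
    S = np.hstack([snapshots, dt * derivs])
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # Keep modes until the relative energy criterion is met.
    energy = np.cumsum(sigma**2) / np.sum(sigma**2)
    r = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return U[:, :r], sigma[:r]

# Toy example: 200 spatial dofs, 50 snapshots of a smooth decaying field.
x = np.linspace(0.0, 1.0, 200)[:, None]
t = np.linspace(0.0, 1.0, 50)[None, :]
Y = np.sin(np.pi * x) * np.exp(-t)
basis, sv = pod_basis_with_derivatives(Y, dt=t[0, 1] - t[0, 0])
print(basis.shape, sv[:3])
```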
In this paper we prove that rectified deep neural networks do not suffer from the curse of dimensionality when approximating McKean--Vlasov SDEs, in the sense that the number of parameters in the deep neural networks grows only polynomially in the space dimension $d$ of the SDE and the reciprocal of the accuracy $\epsilon$.
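Schematically, the polynomial-growth claim can be written as follows; this is our paraphrase of the shape of such bounds (with generic constants $C, c$ and $\Phi_{d,\epsilon}$ the approximating rectified network), not the paper's exact theorem.

```latex
\exists\, C, c > 0 \;\; \forall\, d \in \mathbb{N},\ \epsilon \in (0,1):\quad
\#\mathrm{params}\big(\Phi_{d,\epsilon}\big) \;\le\; C\, d^{c}\, \epsilon^{-c},
\qquad
\mathrm{error}\big(\Phi_{d,\epsilon}\big) \;\le\; \epsilon .
```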
We adopt the integral definition of the fractional Laplace operator and study an optimal control problem on Lipschitz domains that involves a fractional elliptic partial differential equation (PDE) as state equation and a control variable that enters the state equation as a coefficient; pointwise constraints on the control variable are considered as well. We establish the existence of optimal solutions and analyze first order optimality conditions as well as necessary and sufficient second order optimality conditions. Regularity estimates for optimal variables are also analyzed. We develop two finite element discretization strategies: a semidiscrete scheme in which the control variable is not discretized, and a fully discrete scheme in which the control variable is discretized with piecewise constant functions. For both schemes, we analyze the convergence properties of the discretizations and derive error estimates.
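To fix ideas on the fully discrete side, the sketch below shows a projected-gradient step for a piecewise-constant control under pointwise bounds $a \le u \le b$; it is our illustration of the control-discretization setting only, and the placeholder gradient stands in for the (unspecified here) gradient of the reduced cost.

```python
import numpy as np

def projected_gradient_step(u, grad, step, a, b):
    """u, grad: arrays of cellwise (piecewise-constant) values."""
    # One descent step followed by projection onto the box [a, b].
    return np.clip(u - step * grad, a, b)

u = np.full(10, 0.5)              # piecewise-constant control on 10 cells
g = np.linspace(-1.0, 1.0, 10)    # placeholder gradient of the reduced cost
u_new = projected_gradient_step(u, g, step=0.8, a=0.1, b=0.9)
print(u_new)
```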
In modern computational materials science, deep learning has shown the capability to predict interatomic potentials, thereby supporting and accelerating conventional simulations. However, existing models typically sacrifice either accuracy or efficiency. Moreover, lightweight models are in high demand for simulating systems at considerably larger scales with reduced computational cost. A century ago, Felix Bloch demonstrated how leveraging the equivariance of the translation operation on a crystal lattice (with geometric symmetry) could significantly reduce the computational cost of determining wavefunctions and of accurately calculating material properties. Here, we introduce a lightweight equivariant interaction graph neural network (LEIGNN) that enables accurate and efficient prediction of interatomic potentials and forces in crystals. Rather than relying on higher-order representations, LEIGNN employs a scalar-vector dual representation to encode equivariant features. By extracting both local and global structures from vector representations and learning geometric symmetry information, our model remains lightweight while ensuring prediction accuracy and robustness through equivariance. Our results show that LEIGNN consistently outperforms representative baselines in prediction accuracy while achieving significantly higher efficiency across diverse datasets, which include catalysts, molecules, and organic isomers. Finally, to further validate the interatomic potentials predicted by our model, we conduct classical molecular dynamics (MD) and ab initio MD simulations across various systems, including solids, liquids, and gases. We find that LEIGNN achieves the accuracy of ab initio MD while retaining the computational efficiency of classical MD across all examined systems, demonstrating its accuracy, efficiency, and universality.
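The following is a minimal PyTorch sketch of our reading of a scalar-vector dual representation layer, not the LEIGNN code itself: scalars are updated from rotation-invariant quantities (distances, vector norms), while vectors are updated only by linear mixing and gated relative directions, which preserves rotational equivariance without higher-order tensors. All names and shapes are our assumptions.

```python
import torch
import torch.nn as nn

class ScalarVectorLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scalar_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.SiLU(),
                                        nn.Linear(dim, 2 * dim))
        self.vec_mix = nn.Linear(dim, dim, bias=False)  # linear combos of vectors

    def forward(self, s, v, edge_index, rel_pos):
        # s: (n, dim) scalars; v: (n, dim, 3) vectors
        # edge_index: (2, e) source/target indices; rel_pos: (e, 3)
        src, dst = edge_index
        dist = rel_pos.norm(dim=-1, keepdim=True)       # invariant
        v_norm = v.norm(dim=-1)                         # (n, dim), invariant
        m = self.scalar_mlp(torch.cat([s[src], v_norm[src],
                                       dist.expand(-1, s.size(1))], dim=-1))
        gate_s, gate_v = m.chunk(2, dim=-1)
        # Scalar update: aggregate invariant messages on target nodes.
        s_out = s.clone()
        s_out.index_add_(0, dst, gate_s)
        # Vector update: gated unit directions, an equivariant operation.
        dir_msg = gate_v.unsqueeze(-1) * (rel_pos / (dist + 1e-9)).unsqueeze(1)
        v_out = self.vec_mix(v.transpose(1, 2)).transpose(1, 2).clone()
        v_out.index_add_(0, dst, dir_msg)
        return s_out, v_out

layer = ScalarVectorLayer(16)
s, v = torch.randn(5, 16), torch.randn(5, 16, 3)
ei = torch.tensor([[0, 1, 2], [1, 2, 3]])   # 3 directed edges
s2, v2 = layer(s, v, ei, torch.randn(3, 3))
print(s2.shape, v2.shape)
```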
In this paper we propose a number of KEM-based protocols to establish a shared secret between two parties, and study their resistance over unauthenticated channels. This means analyzing the security of the protocol itself and its robustness against man-in-the-middle attacks. We do this by constructing a variant of known unauthenticated models that applies the techniques used to construct the protocols, and we formalize their security under this model. We compare them with their KEX-based counterparts to highlight the differences that arise naturally, due to the nature of KEM constructions, in terms of the protocol itself and the types of attacks to which they are subject. We provide practical go-to KEM-based protocol instances to migrate to, based on the conditions of currently-in-use KEX-based protocols.
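For readers unfamiliar with the message pattern, here is a schematic of the basic KEM flow over an unauthenticated channel (pk over, ct back, both sides derive K). The `ToyKEM` below is our stand-in with the KeyGen/Encaps/Decaps interface of a real KEM; it is deliberately simplistic and NOT secure.

```python
import os
import hashlib

class ToyKEM:
    """Stand-in exposing the KeyGen/Encaps/Decaps interface of a KEM."""
    def keygen(self):
        sk = os.urandom(32)
        pk = hashlib.sha256(b"pk" + sk).digest()    # toy public key
        return pk, sk

    def encaps(self, pk):
        r = os.urandom(32)
        ct = r                                      # toy: ct leaks r; insecure
        return ct, hashlib.sha256(pk + r).digest()

    def decaps(self, sk, ct):
        pk = hashlib.sha256(b"pk" + sk).digest()
        return hashlib.sha256(pk + ct).digest()

kem = ToyKEM()
pk, sk = kem.keygen()          # initiator A generates and sends pk
ct, k_b = kem.encaps(pk)       # responder B encapsulates, sends ct
k_a = kem.decaps(sk, ct)       # A decapsulates
assert k_a == k_b              # both now hold the shared secret
# Without authentication, a MitM can substitute its own pk/ct; this is
# exactly the attack class the unauthenticated-channel model formalizes.
```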
In a one-way analysis-of-variance (ANOVA) model, the number of all pairwise comparisons can be large even when there are only a moderate number of groups. Motivated by this, we consider a regime with a growing number of groups, and prove that, for testing pairwise comparisons, the BH procedure can offer asymptotic control of false discoveries, even though the t-statistics involved do not exhibit the well-known positive dependence structure (PRDS) that guarantees exact false discovery rate (FDR) control. Sharing Tukey's viewpoint that the difference between the means of any two groups cannot be exactly zero, our main result is stated in terms of control of the directional false discovery rate and the directional false discovery proportion. A key technical contribution is showing that the dependence among the t-statistics is weak enough to induce a convergence result typically needed for establishing asymptotic FDR control. Our analysis does not rely on stylized assumptions such as normality, variance homogeneity, or a balanced design, and thus provides theoretical grounding for applications in more general situations.
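The setting is easy to reproduce numerically. Below is a small sketch, in Python, of all pairwise Welch t-tests in a one-way layout with the Benjamini-Hochberg step-up rule applied to the resulting p-values, reporting the sign of each t-statistic as the declared direction; the simulated group means are our own toy choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = [rng.normal(mu, 1.0, size=30) for mu in np.linspace(0, 0.5, 8)]

pairs, pvals, signs = [], [], []
for i in range(len(groups)):
    for j in range(i + 1, len(groups)):
        t, p = stats.ttest_ind(groups[i], groups[j], equal_var=False)
        pairs.append((i, j)); pvals.append(p); signs.append(np.sign(t))

pvals = np.array(pvals)
m = len(pvals)
order = np.argsort(pvals)
q = 0.1                                        # target FDR level
passed = pvals[order] <= q * np.arange(1, m + 1) / m
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
rejected = order[:k]                           # BH rejections (with direction)
print([(pairs[r], int(signs[r])) for r in rejected])
```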
We present a new stability and error analysis of fully discrete approximation schemes for the transient Stokes equation. For the spatial discretization, we consider a wide class of Galerkin finite element methods which includes both inf-sup stable spaces and symmetric pressure stabilized formulations. We extend the results from Burman and Fern\'andez [\textit{SIAM J. Numer. Anal.}, 47 (2009), pp. 409-439] and provide a unified theoretical analysis of backward difference formulae (BDF methods) of orders 1 to 6. The main novelty of our approach lies in the use of Dahlquist's G-stability concept together with multiplier techniques introduced by Nevanlinna-Odeh and recently by Akrivis et al. [\textit{SIAM J. Numer. Anal.}, 59 (2021), pp. 2449-2472] to derive optimal stability and error estimates for both the velocity and the pressure. When combined with a method-dependent Ritz projection for the initial data, unconditional stability can be shown, while for arbitrary interpolation pressure stability is subject to a mild inverse CFL-type condition between the space and time discretizations.
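For orientation, here is a minimal sketch (ours, unrelated to the paper's proofs) of constant-step implicit BDF-$k$ time stepping for the linear test system $y' = Ay$, a scalar stand-in for the semidiscrete Stokes system; with normalized coefficients, BDF-$k$ reads $\sum_{j=0}^{k} \alpha_j y^{n+1-j} = \Delta t\, A y^{n+1}$.

```python
import numpy as np

BDF_ALPHA = {1: [1.0, -1.0],               # implicit Euler
             2: [3/2, -2.0, 1/2],
             3: [11/6, -3.0, 3/2, -1/3]}

def bdf_solve(A, y0, dt, n_steps, k=2):
    d = len(y0)
    ys = [np.asarray(y0, dtype=float)]
    for n in range(n_steps):
        order = min(k, len(ys))            # lower-order bootstrap at startup
        a = BDF_ALPHA[order]
        rhs = -sum(a[j] * ys[-j] for j in range(1, order + 1))
        ys.append(np.linalg.solve(a[0] * np.eye(d) - dt * A, rhs))
    return np.array(ys)

A = np.array([[-2.0, 1.0], [0.0, -1.0]])   # a stable test matrix
print(bdf_solve(A, [1.0, 1.0], dt=0.05, n_steps=100, k=2)[-1])
```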
In this paper we explore the concept of sequential inductive prediction intervals using theory from sequential testing. We furthermore introduce a three-parameter PAC definition of prediction intervals that allows us, via simulation, to achieve almost sharp bounds with high probability.
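To give a flavor of the simulation-based viewpoint, the sketch below builds an inductive prediction interval from the order statistics of calibration residuals and estimates a PAC-style guarantee $P(\text{coverage} \ge 1 - \alpha) \ge 1 - \delta$ over repetitions; this is a generic split-calibration construction of ours, not the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n_cal, n_rep = 0.1, 200, 1000
hits = 0
for _ in range(n_rep):
    resid = np.abs(rng.standard_normal(n_cal))      # calibration residuals
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))     # conformal order statistic
    radius = np.sort(resid)[min(k, n_cal) - 1]      # interval half-width
    # True coverage of the interval, estimated on fresh test draws.
    cov = np.mean(np.abs(rng.standard_normal(10000)) <= radius)
    hits += (cov >= 1 - alpha)
print("fraction of repetitions with coverage >= 1-alpha:", hits / n_rep)
```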
In this paper, we first introduce and define several new information divergences in the space of transition matrices of finite Markov chains which measure the discrepancy between two Markov chains. These divergences offer natural generalizations of classical information-theoretic divergences, such as the $f$-divergences and the R\'enyi divergence between probability measures, to the context of finite Markov chains. We begin by detailing and deriving fundamental properties of these divergences; notably, we give Markov chain versions of Pinsker's inequality and the Chernoff information. We then utilize these notions in a few applications. First, we investigate the binary hypothesis testing problem for Markov chains, where the newly defined R\'enyi divergence between Markov chains and its geometric interpretation play an important role in the analysis. Second, we propose and analyze information-theoretic (Ces\`aro) mixing times and ergodicity coefficients, along with spectral bounds on these notions in the reversible setting. Examples of the random walk on the hypercube, as well as the connections between the critical height of the low-temperature Metropolis-Hastings chain and the proposed ergodicity coefficients, are highlighted.
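One standard way such divergences are built (our illustration; the paper's definitions may differ) is to average a rowwise divergence against the stationary distribution, e.g. the KL divergence rate $\mathrm{KL}(P \,\|\, Q) = \sum_x \pi_P(x)\, \mathrm{KL}(P(x,\cdot) \,\|\, Q(x,\cdot))$:

```python
import numpy as np

def stationary(P):
    # Left eigenvector of P for eigenvalue 1, normalized to a distribution.
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def kl_rate(P, Q):
    pi = stationary(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        rowwise = np.where(P > 0, P * np.log(P / Q), 0.0)
    return float(pi @ rowwise.sum(axis=1))

P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.8, 0.2], [0.3, 0.7]])
print(kl_rate(P, Q))
```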
In this paper we develop a novel neural network model for predicting the implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries, and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to propose a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on option data for the S&P 500 index spanning 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with existing financial theories and conditions related to the implied volatility surface.
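The sketch below illustrates the two ingredients generically in PyTorch: a convex "smile-shaped" activation on hidden units fed by moneyness, and a loss penalty for violating a no-arbitrage-style convexity condition, checked by finite differences. The architecture, activation, and penalty are our illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SmileActivation(nn.Module):
    def forward(self, x):
        return torch.sqrt(1.0 + x * x)       # convex, smile-shaped

class IVSNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.inp = nn.Linear(2, hidden)      # inputs: (moneyness, maturity)
        self.act = SmileActivation()
        self.out = nn.Linear(hidden, 1)

    def forward(self, m, tau):
        h = self.act(self.inp(torch.stack([m, tau], dim=-1)))
        return self.out(h).squeeze(-1)

def loss_with_penalty(model, m, tau, iv_obs, lam=1.0, eps=1e-2):
    iv = model(m, tau)
    fit = ((iv - iv_obs) ** 2).mean()
    # Finite-difference convexity check in the moneyness direction.
    second = model(m + eps, tau) - 2 * iv + model(m - eps, tau)
    return fit + lam * torch.relu(-second).mean()

m = torch.linspace(-0.5, 0.5, 64)
tau = torch.full_like(m, 0.25)
iv_obs = 0.2 + 0.3 * m ** 2                  # synthetic smile data
print(loss_with_penalty(IVSNet(), m, tau, iv_obs))
```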
Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision tasks, such as object detection, image recognition, and image retrieval. These achievements stem from the CNN's outstanding capability to learn input features through deep layers of neurons and an iterative training process. However, the learned features are hard to identify and interpret from a human vision perspective, leading to a limited understanding of the CNN's internal working mechanism. To improve interpretability, CNN visualization is widely used as a qualitative analysis method that translates internal features into visually perceptible patterns, and many CNN visualization works have been proposed in the literature to interpret CNNs from the perspectives of network structure, operation, and semantic concept. In this paper, we provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of their motivations, algorithms, and experimental results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of CNN interpretability in areas such as network design, optimization, and security enhancement.
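As a taste of the first surveyed method, here is a minimal Activation Maximization sketch in PyTorch: gradient ascent on the input image to maximize one class logit of a CNN, revealing the pattern that unit responds to. The model choice, target class, and regularizer weight are our own illustrative choices.

```python
import torch
import torchvision.models as models

model = models.vgg16(weights=None)    # load pretrained weights in practice
model.eval()

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
target_class = 130                    # an arbitrary ImageNet class index

for step in range(200):
    opt.zero_grad()
    score = model(img)[0, target_class]
    # Maximize the logit; the L2 term keeps the image from blowing up.
    loss = -score + 1e-4 * img.norm()
    loss.backward()
    opt.step()

print("final activation:", float(score))
```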