Global Climate Model (GCM) tuning (calibration) is a tedious and time-consuming process involving high-dimensional input and output fields. Experts typically tune by iteratively running climate simulations with hand-picked values of the tuning parameters. Many alternative calibration methods have been proposed in both the statistical and climate literature, but most are impractical or difficult to implement. We present a practical, robust, and rigorous calibration approach for the atmosphere-only model of the Department of Energy's Energy Exascale Earth System Model (E3SM) version 2. Our approach has two main parts: (1) training a surrogate that predicts E3SM output in a fraction of the time required to run E3SM, and (2) gradient-based parameter optimization. To train the surrogate, we generate a designed ensemble of runs spanning our input parameter space and fit polynomial chaos expansions to the E3SM output on a reduced output space. We use this surrogate in an optimization scheme to identify values of the input parameters for which our model best matches gridded spatial fields of climate observations. To validate our choice of parameters, we run E3SMv2 with the optimal parameter values and compare the resulting predictions to expertly tuned simulations across 45 different output fields. This flexible, robust, and automated approach is straightforward to implement, and we demonstrate that the resulting model output matches present-day climate observations as well as or better than the corresponding output from expert-tuned parameter values, while handling high-dimensional output and requiring a fraction of the time.
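As a rough illustration of the two-stage workflow described above, the sketch below fits a cheap surrogate on dimension-reduced ensemble output and then runs a gradient-based search against observations. The file names, the PCA plus polynomial-regression surrogate (a stand-in for the paper's polynomial chaos expansion), and all settings are hypothetical.

```python
# Hypothetical sketch of a surrogate-based calibration workflow:
# fit a cheap surrogate on PCA-reduced model output, then optimize the
# tuning parameters against observations. File names and settings are
# illustrative; the surrogate here stands in for a polynomial chaos expansion.
import numpy as np
from scipy.optimize import minimize
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# X: (n_runs, n_params) design of tuning-parameter values
# Y: (n_runs, n_grid) flattened spatial output fields from the ensemble
X = np.load("ensemble_params.npy")      # hypothetical file
Y = np.load("ensemble_outputs.npy")     # hypothetical file
y_obs = np.load("gridded_obs.npy")      # observations on the same grid

# 1) Reduce the output space, then fit a polynomial surrogate per mode.
pca = PCA(n_components=10).fit(Y)
Z = pca.transform(Y)
surrogate = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
surrogate.fit(X, Z)

# 2) Gradient-based search (finite-difference gradients via L-BFGS-B)
#    for parameters minimizing the misfit to observations.
def misfit(theta):
    z_pred = surrogate.predict(theta.reshape(1, -1))
    y_pred = pca.inverse_transform(z_pred)[0]
    return np.mean((y_pred - y_obs) ** 2)

theta0 = X.mean(axis=0)
bounds = list(zip(X.min(axis=0), X.max(axis=0)))
result = minimize(misfit, theta0, method="L-BFGS-B", bounds=bounds)
print("calibrated parameters:", result.x)
```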
Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km-resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 10 years, nearly conserves column moisture without explicit constraints, and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 80% of tracked variables. ACE requires nearly 100x less wall-clock time and is 100x more energy efficient than the reference model using typically available resources.
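For readers who want a concrete sense of what "evaluating physical laws" can look like for an emulator, here is an entirely illustrative diagnostic for the global column-moisture budget; it is not ACE's actual code, and the variable names and units are assumptions.

```python
# Illustrative diagnostic, not ACE's actual code: check how well an emulated
# sequence of states conserves global column moisture. Inputs are global,
# area-weighted means per time step (variable names are assumptions):
#   tpw    -- total precipitable water [kg m^-2]
#   precip -- precipitation rate       [kg m^-2 s^-1]
#   evap   -- evaporation rate         [kg m^-2 s^-1]
import numpy as np

def moisture_budget_residual(tpw, precip, evap, dt_seconds):
    """d(TPW)/dt - (E - P); near zero if column moisture is conserved."""
    dtpw_dt = np.diff(tpw) / dt_seconds
    source = (evap - precip)[:-1]
    return dtpw_dt - source

# Usage with synthetic 6-hourly data:
rng = np.random.default_rng(0)
tpw = 25.0 + np.cumsum(rng.normal(0.0, 1e-2, size=100))
precip = np.abs(rng.normal(3e-5, 5e-6, size=100))
evap = np.abs(rng.normal(3e-5, 5e-6, size=100))
res = moisture_budget_residual(tpw, precip, evap, dt_seconds=6 * 3600)
print("mean |residual| [kg m^-2 s^-1]:", np.abs(res).mean())
```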
Programs that have a continuous state space or that interact with physical processes often require notions of equivalence going beyond the standard binary setting in which equivalence either holds or does not hold. In this paper we explore the idea of equivalence taking values in a quantale V, which covers, among others, the cases of (in)equations and (ultra)metric equations. Our main result is the introduction of a V-equational deductive system for the linear $\lambda$-calculus, together with a proof that it is sound and complete. In fact we go further: we show that linear $\lambda$-theories based on this V-equational system form a category that is equivalent to a category of autonomous categories enriched over 'generalised metric spaces'. Instantiating this result to inequations yields an equivalence with autonomous categories enriched over partial orders; in the case of (ultra)metric equations, we obtain an equivalence with autonomous categories enriched over (ultra)metric spaces. We additionally show that this syntax-semantics correspondence extends to the affine setting. We use our results to develop examples of inequational and metric equational systems for higher-order programming in the setting of real-time, probabilistic, and quantum computing.
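As a purely illustrative reading (the notation below is ours, not necessarily the paper's), a V-equation in the metric instantiation can be thought of as a distance-annotated judgement
\[
  \Gamma \vdash t \equiv_{\varepsilon} s : A
  \qquad \text{read as ``$t$ and $s$ are at distance at most $\varepsilon$''},
\]
so that instantiating V with the two-element quantale recovers inequations $t \leq s$ (and enrichment over partial orders), while the quantale of extended non-negative reals yields the (ultra)metric equations mentioned above.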
Hamilton-Jacobi (HJ) partial differential equations (PDEs) have diverse applications spanning physics, optimal control, game theory, and imaging sciences. This work introduces a first-order optimization-based technique for HJ PDEs that formulates the time-implicit update of an HJ PDE as a saddle point problem. We remark that this saddle point formulation is aligned with the primal-dual formulation of optimal transport and potential mean-field games (MFGs). This connection enables us to extend MFG techniques and design numerical schemes for solving HJ PDEs. We employ the primal-dual hybrid gradient (PDHG) method to solve the saddle point problems, benefiting from its simple update structure, which enables fast computations. Remarkably, the method caters to a broad range of Hamiltonians, encompassing non-smooth and spatiotemporally dependent cases. The effectiveness of the approach is verified through various one-dimensional and two-dimensional numerical examples, such as quadratic and $L^1$ Hamiltonians with spatial and time dependence.
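For context, the PDHG (Chambolle-Pock) iteration referred to above solves generic saddle point problems of the form $\min_{x}\max_{y}\; \langle Kx, y\rangle + G(x) - F^*(y)$ via proximal updates; the specific operators $K$, $G$, $F^*$ arising from the HJ time-implicit step are the paper's and are not reproduced here:
\begin{align*}
  y^{k+1} &= \operatorname{prox}_{\sigma F^*}\!\big(y^{k} + \sigma K \bar{x}^{k}\big),\\
  x^{k+1} &= \operatorname{prox}_{\tau G}\!\big(x^{k} - \tau K^{\top} y^{k+1}\big),\\
  \bar{x}^{k+1} &= 2x^{k+1} - x^{k},
\end{align*}
with step sizes satisfying $\sigma\tau\|K\|^2 < 1$. Each update involves only a proximal map and an application of $K$ or $K^{\top}$, which is the simple structure that makes the iterations fast.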
Preference-based optimization algorithms are iterative procedures that seek the optimal calibration of a decision vector based only on comparisons between pairs of different tunings. At each iteration, a human decision-maker expresses a preference between two calibrations (samples), indicating which one, if any, is better than the other. The optimization procedure must use the observed preferences to find the tuning of the decision vector that the decision-maker most prefers, while also minimizing the number of comparisons. In this work, we formulate the preference-based optimization problem from a utility-theory perspective. We then propose GLISp-r, an extension of a recent preference-based optimization procedure called GLISp. The latter uses a radial basis function surrogate to describe the tastes of the decision-maker. Iteratively, GLISp proposes new samples to compare with the best calibration available by trading off exploitation of the surrogate model and exploration of the decision space. In GLISp-r, we propose a different criterion for selecting new candidate samples, inspired by MSRS, a popular procedure in the black-box optimization framework. Compared to GLISp, GLISp-r is less likely to get stuck on local optima of the preference-based optimization problem. We motivate this claim theoretically, with a proof of global convergence, and empirically, by comparing the performance of GLISp and GLISp-r on several benchmark optimization problems.
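To make the loop concrete, here is a generic, heavily simplified preference-based search with a radial basis function surrogate and a distance-based exploration term. It mimics the structure described above but is not GLISp or GLISp-r; the `ask_preference` interface, the acquisition weighting, and the crude latent-utility updates are all illustrative assumptions.

```python
# Generic sketch of a preference-based optimization loop (illustrative only;
# not GLISp/GLISp-r). ask_preference(a, b) stands in for the decision-maker:
# it returns 1 if b is preferred to a, -1 if a is preferred, 0 for a tie.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import differential_evolution

def preference_based_search(ask_preference, bounds, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.array(bounds, dtype=float).T
    X = rng.uniform(lb, ub, size=(n_init, len(bounds)))
    scores = np.zeros(n_init)                 # crude latent utilities (lower = better)
    best = 0
    for i in range(1, n_init):                # rank initial samples against the incumbent
        if ask_preference(X[best], X[i]) > 0:
            best = i
    scores[best] = -1.0

    for _ in range(n_iter):
        surrogate = RBFInterpolator(X, scores, kernel="linear")

        def acquisition(x):                   # exploitation minus an exploration bonus
            dist = np.min(np.linalg.norm(X - x, axis=1))
            return float(surrogate(x.reshape(1, -1))[0]) - 0.5 * dist

        x_new = differential_evolution(acquisition, bounds,
                                       seed=int(rng.integers(1 << 31))).x
        pref = ask_preference(X[best], x_new)
        new_score = scores[best] - 1.0 if pref > 0 else scores[best] + 1.0
        X = np.vstack([X, x_new])
        scores = np.append(scores, new_score)
        if pref > 0:
            best = len(X) - 1
    return X[best]

# Example: the "decision-maker" secretly prefers points closer to (0.3, 0.7).
target = np.array([0.3, 0.7])
pref = lambda a, b: int(np.sign(np.linalg.norm(a - target) - np.linalg.norm(b - target)))
print(preference_based_search(pref, bounds=[(0.0, 1.0), (0.0, 1.0)]))
```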
Semitopologies model consensus in distributed systems by equating the notion of a quorum -- a set of participants sufficient to make local progress -- with that of an open set. This yields a topology-like theory of consensus, but semitopologies generalise topologies, since the intersection of two quorums need not be a quorum. The semitopological model of consensus is naturally heterogeneous and local, just as topologies can be heterogeneous and local, and for the same reasons: points may have different quorums, and there is no restriction that open sets / quorums be uniformly generated (e.g. open sets can be something other than two-thirds majorities of the points in the space). Semiframes are an algebraic abstraction of semitopologies: they are to semitopologies as frames are to topologies. We give a notion of semifilter, which plays a role analogous to that of filters, and show how to build a semiframe out of the open sets of a semitopology, and a semitopology out of the semifilters of a semiframe. We define suitable notions of category and morphism, prove a categorical duality between (sober) semiframes and (spatial) semitopologies, and investigate well-behavedness properties of semitopologies and semiframes across the duality. Surprisingly, the structure of semiframes is not what one might initially expect from looking at semitopologies, and the canonical structure required for the duality result -- a compatibility relation *, generalising set intersection -- is also canonical for expressing well-behavedness properties. Overall, we deliver a new categorical, algebraic, abstract framework within which to study consensus in distributed systems, and one which is also simply interesting to consider as a mathematical theory in its own right.
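For orientation, and as a sketch only (the paper's precise definitions may differ in detail), the abstract suggests a structure of the following shape: a semitopology on a set $X$ is a family $\mathsf{Open} \subseteq \mathcal{P}(X)$ of open sets (quorums) containing $\varnothing$ and $X$ and closed under arbitrary unions, with no requirement that the intersection $O \cap O'$ of two opens be open; dropping that intersection axiom is what separates semitopologies from topologies.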
Network motifs are recurrent, small-scale patterns of interactions observed frequently in a system. They shed light on the interplay between the topology and the dynamics of complex networks across various domains. In this work, we focus on the problem of counting occurrences of small sub-hypergraph patterns in very large hypergraphs, where higher-order interactions connect arbitrary numbers of system units. We show how directly exploiting higher-order structures speeds up the counting process compared to traditional data-mining techniques for exact motif discovery. Moreover, with hyperedge sampling, performance is further improved at the cost of small errors in the estimated motif frequencies. We evaluate our method on several real-world datasets describing face-to-face interactions, co-authorship, and human communication. We show that our approximate algorithm allows us to extract higher-order motifs faster and on a larger scale, beyond the computational limits of an exact approach.
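As a toy illustration of the sample-and-rescale idea behind hyperedge sampling (not the paper's actual algorithm), the sketch below estimates counts of order-2 patterns, i.e. pairs of overlapping hyperedges classified by their sizes and overlap; the function names and the uniform pair sampling are assumptions.

```python
# Toy sketch of sampling-based estimation of order-2 hyperedge patterns;
# illustrates the sample-and-rescale idea, not the paper's algorithm.
import random
from collections import Counter
from math import comb

def estimate_pair_patterns(hyperedges, n_samples=10_000, seed=0):
    """Estimate counts of overlapping hyperedge pairs, grouped by (sizes, overlap)."""
    rng = random.Random(seed)
    m = len(hyperedges)
    total_pairs = comb(m, 2)
    counts = Counter()
    for _ in range(n_samples):
        i, j = rng.sample(range(m), 2)       # uniform pair of distinct hyperedges
        a, b = hyperedges[i], hyperedges[j]
        overlap = len(a & b)
        if overlap:                          # only overlapping pairs form a connected pattern
            sizes = tuple(sorted((len(a), len(b))))
            counts[sizes + (overlap,)] += 1
    scale = total_pairs / n_samples          # unbiased rescaling to the full hypergraph
    return {pattern: c * scale for pattern, c in counts.items()}

# Usage: hyperedges as frozensets of node ids
H = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({4, 5}), frozenset({1, 5, 6})]
print(estimate_pair_patterns(H, n_samples=1_000))
```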
We give a fully polynomial-time randomized approximation scheme (FPRAS) for two-terminal reliability in directed acyclic graphs.
Graph Neural Networks (GNNs) have emerged as powerful tools for processing graph-based information across diverse applications. While the expressive power of GNNs has traditionally been examined in the context of graph-level tasks, their potential for node-level tasks, such as node classification, where the goal is to interpolate missing node labels from the observed ones, remains relatively unexplored. In this study, we investigate the capability of GNNs for such classification, which can also be cast as a function interpolation problem. Explicitly, we focus on ascertaining the optimal configuration of weights and layers required for a GNN to successfully interpolate a band-limited function over Euclidean cubes. Our findings highlight a pronounced efficiency: a GNN can generalize a band-limited function within an $\varepsilon$-error margin using only $O_d((\log\varepsilon^{-1})^d)$ weights and $O_d((\log\varepsilon^{-1})^d)$ training samples. We compare this requirement against the explicit constructions of currently available neural networks (NNs) designed for similar tasks. Significantly, our result is obtained by drawing a novel connection between GNN structures and classical sampling theorems, advancing our understanding of practical GNN applications.
It has been well known since the 1960s that by exploiting the tensor-product structure of the discrete Laplacian on Cartesian meshes, one can develop a simple direct Poisson solver with $\mathcal O(N^{\frac{d+1}{d}})$ complexity in $d$ dimensions. GPU acceleration of numerical PDE solvers was explored successfully around fifteen years ago and has become increasingly popular over the past decade, driven by significant advances in both hardware and software, especially in the last few years. We present in this paper a simple but extremely fast MATLAB implementation on a modern GPU, which can be easily reproduced, for solving 3D Poisson-type equations using a spectral-element method. In particular, it takes less than one second on an Nvidia A100 to solve a Poisson equation with one billion degrees of freedom.
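The tensor-product idea the first sentence refers to can be illustrated in a few lines. The NumPy sketch below solves the 2D analogue with homogeneous Dirichlet boundary conditions via matrix diagonalization; the paper's solver is a 3D spectral-element version running on a GPU, which this sketch does not reproduce.

```python
# 2D analogue of the classical tensor-product (matrix diagonalization)
# Poisson solver on a uniform Cartesian grid with homogeneous Dirichlet
# boundary conditions (illustrative; the paper uses 3D spectral elements).
import numpy as np

def poisson_2d(F, h):
    """Solve -u_xx - u_yy = f on an n x n interior grid with spacing h."""
    n = F.shape[0]
    # 1D second-difference matrix (symmetric positive definite).
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    lam, Q = np.linalg.eigh(A)
    # Transform, divide by eigenvalue sums, transform back.
    W = Q.T @ F @ Q
    W /= lam[:, None] + lam[None, :]
    return Q @ W @ Q.T

# Quick check against a manufactured solution u = sin(pi x) sin(pi y):
n = 64
h = 1.0 / (n + 1)
x = np.arange(1, n + 1) * h
X, Y = np.meshgrid(x, x, indexing="ij")
U_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
F = 2 * np.pi**2 * U_exact
print("max error:", np.abs(poisson_2d(F, h) - U_exact).max())
```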
FP8 formats are gaining popularity as a way to boost the computational efficiency of training and inference for large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although ample literature exists on selecting such scalings for INT formats, this critical aspect has yet to be addressed for FP8. This paper presents a methodology for selecting the scalings of FP8 linear layers, based on dynamically updating per-tensor scales for the weights, gradients, and activations. We apply this methodology to train and validate GPT- and Llama 2-style large language models in FP8, for model sizes ranging from 111M to 70B parameters. To facilitate understanding of the FP8 dynamics, our results are accompanied by plots of the per-tensor scale distributions for weights, activations, and gradients during both training and inference.
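To fix ideas about what per-tensor scaling means in practice, here is a NumPy-only simulation of the core recipe: compute the tensor's absolute maximum, map it near the top of the FP8 range, and keep the scale for dequantization. It is illustrative only; real FP8 training stacks use hardware cast kernels and amax histories rather than the crude float16 truncation used here, and this is not the paper's method.

```python
# Illustrative per-tensor dynamic scaling for FP8, simulated in NumPy.
# Not the paper's method; the float16 round-trip is a crude stand-in for
# the E4M3 mantissa rounding performed by real FP8 cast kernels.
import numpy as np

E4M3_MAX = 448.0   # largest finite value representable in FP8 E4M3

def compute_scale(tensor, margin=0.0):
    amax = np.max(np.abs(tensor))
    if amax == 0.0:
        return 1.0
    return (E4M3_MAX / amax) * 2.0 ** (-margin)

def fake_fp8_cast(tensor, scale):
    """Scale, clip to the FP8 range, and 'store' at reduced precision."""
    scaled = np.clip(tensor * scale, -E4M3_MAX, E4M3_MAX)
    quantized = scaled.astype(np.float16).astype(np.float32)
    return quantized, scale

def dequantize(quantized, scale):
    return quantized / scale

w = np.random.randn(4, 4).astype(np.float32) * 0.02
q, s = fake_fp8_cast(w, compute_scale(w))
print("max abs round-trip error:", np.abs(dequantize(q, s) - w).max())
```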