Boundary element methods based on surface integral equations (SIEs) are widely used for analyzing electromagnetic scattering scenarios. However, after discretization of the SIEs, the spectra and eigenvectors of the boundary element matrices are usually not representative of the spectra and eigenfunctions of the underlying surface integral operators, which can be problematic for methods that rely heavily on spectral properties. To address this issue, we present efficient algorithms for computing matrix square roots and inverse square roots of the Gram matrices associated with the discretization scheme, which can be used to reveal the spectra of standard electromagnetic integral operators. The algorithms, which are based on suitably chosen expansions of the square root and inverse square root functions, are quite effective when applied to several of the most relevant Gram matrices used in boundary element discretizations in electromagnetics. Tables containing different sets of expansion coefficients are provided, along with comparative numerical experiments that highlight the advantages and disadvantages of the different approaches. In addition, to demonstrate the spectrum-revealing properties of the proposed techniques, we apply them to the discretization of the problem of scattering by a sphere, for which the analytic spectrum is known.
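To make the expansion idea concrete, the following is a minimal sketch of how f(G)v can be applied for f(x) = sqrt(x) or f(x) = 1/sqrt(x) via a generic Chebyshev expansion; this is an illustration only, not the coefficient sets tabulated in the paper, and it assumes spectral bounds for the Gram matrix G are available.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_matfunc_apply(G, v, f, lam_min, lam_max, degree=30):
    """Approximate f(G) @ v for an SPD Gram matrix G using a Chebyshev
    expansion of f on [lam_min, lam_max]; only matrix-vector products
    with G are needed."""
    # Chebyshev coefficients of f composed with the affine map [-1, 1] -> [lam_min, lam_max]
    g = lambda t: f(0.5 * (lam_max - lam_min) * t + 0.5 * (lam_max + lam_min))
    coeffs = C.chebinterpolate(g, degree)
    # three-term recurrence T_{k+1}(A) v = 2 A T_k(A) v - T_{k-1}(A) v,
    # with A = a*G + b*I mapping [lam_min, lam_max] onto [-1, 1]
    a = 2.0 / (lam_max - lam_min)
    b = -(lam_max + lam_min) / (lam_max - lam_min)
    Tv_prev = v                          # T_0(A) v
    Tv = a * (G @ v) + b * v             # T_1(A) v
    y = coeffs[0] * Tv_prev + coeffs[1] * Tv
    for k in range(2, degree + 1):
        Tv_next = 2.0 * (a * (G @ Tv) + b * Tv) - Tv_prev
        y += coeffs[k] * Tv_next
        Tv_prev, Tv = Tv, Tv_next
    return y

# Usage: apply G^{1/2} and G^{-1/2} to a vector, given spectral bounds of G.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
G = B @ B.T + 50.0 * np.eye(50)          # well-conditioned SPD test matrix
lam = np.linalg.eigvalsh(G)
v = rng.standard_normal(50)
y_sqrt = cheb_matfunc_apply(G, v, np.sqrt, lam[0], lam[-1])                      # ~ sqrtm(G) @ v
y_isqrt = cheb_matfunc_apply(G, v, lambda x: 1.0 / np.sqrt(x), lam[0], lam[-1])  # ~ G^{-1/2} @ v
```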
Machine-learning models consist of kernels, which are algorithms applying operations on tensors -- data indexed by a linear combination of natural numbers. Examples of kernels include convolutions, transpositions, and vector products. There are many ways to implement a kernel, and these implementations form the kernel's optimization space. Kernel scheduling is the problem of finding the best implementation, given an objective function -- typically execution speed. Kernel optimizers such as Ansor, Halide, and AutoTVM solve this problem via search heuristics that combine two phases: exploration and exploitation. The former evaluates many different kernel optimization spaces; the latter tries to improve the best implementations by further investigating kernels within the same space. For example, Ansor combines kernel generation through sketches for exploration and leverages an evolutionary algorithm to exploit the best sketches. In this work, we demonstrate the potential to reduce Ansor's search time while enhancing kernel quality by incorporating Droplet Search, an AutoTVM algorithm, into Ansor's exploration phase. The approach involves limiting the number of samples explored by Ansor, selecting the best one, and exploiting it with a coordinate descent algorithm. By applying this approach to the first 300 kernels that Ansor generates, we usually obtain better kernels in less time than if we let Ansor analyze 10,000 kernels. This result has been replicated in 20 well-known deep-learning models (AlexNet, ResNet, VGG, DenseNet, etc.) running on four architectures: an AMD Ryzen 7 (x86), an NVIDIA A100 tensor core GPU, an NVIDIA RTX 3080 GPU, and an ARM A64FX. A patch with this combined approach was approved in Ansor in February 2024. As evidence of the generality of this search methodology, a similar patch, achieving equally good results, was submitted to TVM's MetaSchedule in June 2024.
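The exploration-then-exploitation scheme can be sketched independently of TVM's actual APIs; in the toy code below, sample_candidates, cost, and neighbors are hypothetical placeholders for Ansor's sketch sampler, the hardware measurement step, and Droplet Search's coordinate moves.

```python
import random

def coordinate_descent(params, cost, neighbors, max_rounds=50):
    """Droplet-Search-style exploitation: greedily move one parameter
    (coordinate) at a time toward lower cost until no move helps."""
    best, best_cost = dict(params), cost(params)
    for _ in range(max_rounds):
        improved = False
        for key in best:
            for cand in neighbors(key, best[key]):
                trial = dict(best)
                trial[key] = cand
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, improved = trial, c, True
        if not improved:
            break
    return best, best_cost

def explore_then_exploit(sample_candidates, cost, neighbors, budget=300):
    """Exploration: measure a limited number of candidate schedules (e.g. the
    first `budget` samples), then exploit only the best one."""
    seeds = sample_candidates(budget)
    best_seed = min(seeds, key=cost)
    return coordinate_descent(best_seed, cost, neighbors)

# Toy usage with a made-up cost surface over two tile-size parameters.
cost = lambda p: (p["tile_x"] - 16) ** 2 + (p["tile_y"] - 8) ** 2
neighbors = lambda key, val: [max(1, val // 2), val * 2]      # power-of-two moves
sample = lambda n: [{"tile_x": random.choice([1, 2, 4, 64]),
                     "tile_y": random.choice([1, 2, 4, 64])} for _ in range(n)]
print(explore_then_exploit(sample, cost, neighbors, budget=300))
```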
Traditional regulation of chemical exposure tends to focus on single exposures, overlooking the potentially amplified toxicity of multiple concurrent exposures. We are interested in the average outcome if exposures were limited to fall under a multivariate threshold. Because threshold levels are often unknown a priori, we provide an algorithm that finds the exposure threshold levels at which the expected outcome is maximized or minimized. Because identifying thresholds and estimating policy effects on the same data would lead to overfitting bias, we also provide a data-adaptive estimation framework that allows for both threshold discovery and policy estimation. Simulation studies show asymptotic convergence to the optimal exposure region and to the true effect of an intervention. We demonstrate how our method identifies true interactions in a public synthetic mixture data set. Finally, we apply our method to NHANES data to discover the metal exposures with the most harmful effects on telomere length. An implementation is provided in the CVtreeMLE R package.
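As a rough illustration of the overfitting concern and the sample-splitting remedy, the sketch below searches a quantile grid for a multivariate threshold on one half of the data and estimates a simple policy contrast on the other half; it is a simplified stand-in for the cross-validated machinery in CVtreeMLE, with arbitrary choices of grid, support requirement, and effect definition.

```python
import numpy as np
from itertools import product

def find_threshold(X, y, quantiles=(0.25, 0.5, 0.75), minimize=True, min_support=20):
    """Grid-search a multivariate exposure threshold that extremizes the mean
    outcome among observations falling under it (grid and support are arbitrary)."""
    grids = [np.quantile(X[:, j], quantiles) for j in range(X.shape[1])]
    best_t, best_val = None, np.inf if minimize else -np.inf
    for t in product(*grids):
        mask = np.all(X <= np.asarray(t), axis=1)
        if mask.sum() < min_support:
            continue
        val = y[mask].mean()
        if (val < best_val) if minimize else (val > best_val):
            best_t, best_val = t, val
    return best_t

def split_and_estimate(X, y, seed=0):
    """Sample splitting: discover the threshold on one half of the data, then
    estimate the policy contrast (mean outcome under the threshold minus the
    overall mean) on the held-out half to avoid overfitting bias."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2 :]
    t = find_threshold(X[train], y[train])
    mask = np.all(X[test] <= np.asarray(t), axis=1)
    return t, y[test][mask].mean() - y[test].mean()
```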
Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last-layer computational cost, warmup duration, and scale-dependent optimizer tuning. With these factors corrected, we obtain excellent agreement with the Hoffmann et al. (i.e., "Chinchilla") scaling law. Counter to a hypothesis of Hoffmann et al., we find that careful learning rate decay is not essential for the validity of their scaling law. As a secondary result, we derive scaling laws for the optimal learning rate and batch size, finding that tuning the AdamW $\beta_2$ parameter is essential at lower batch sizes.
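For reference, the discrepancy can be stated in the standard parametrization of compute-optimal scaling; the exponents below are the commonly reported approximate values from the two papers, not values refit in this work.

```latex
% Common parametrization of compute-optimal scaling (approximate, widely
% reported exponents; not the values refit in this work).
C \approx 6\,N D, \qquad
N_{\mathrm{opt}}(C) \propto C^{a}, \qquad
D_{\mathrm{opt}}(C) \propto C^{1-a},
\qquad a_{\text{Kaplan}} \approx 0.73, \quad a_{\text{Chinchilla}} \approx 0.50.
```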
Particle-in-Cell Monte Carlo simulations of large-scale systems play a fundamental role in understanding the complexities of plasma dynamics in fusion devices. Efficient handling and analysis of the resulting vast datasets are essential for advancing these simulations. Previously, we addressed this challenge by integrating openPMD with BIT1, a Particle-in-Cell Monte Carlo code, to streamline data streaming and storage. This integration not only enhanced data management but also improved write throughput and storage efficiency. In this work, we delve deeper into the impact of BIT1's openPMD BP4 instrumentation, monitoring, and in-situ analysis. Using profiling and monitoring tools such as gprof, CrayPat, Cray Apprentice2, IPM, and Darshan, we dissect BIT1's performance after the integration, shedding light on its computation, communication, and I/O operations. Fine-grained instrumentation offers insight into BIT1's runtime behavior, while real-time monitoring aids in understanding system dynamics and resource utilization patterns, facilitating proactive performance optimization. Advanced visualization techniques further enrich our understanding, enabling us to optimize BIT1 simulation workflows aimed at controlling plasma-material interfaces, with improved data analysis and visualization at every checkpoint and without interrupting the simulation.
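For context, a minimal example of the kind of checkpoint output involved is shown below, writing one diagnostic mesh per step through the openPMD-api Python bindings with the ADIOS2 BP backend; the field name and extent are made up, and BIT1's actual instrumentation lives inside the simulation code and is considerably more involved.

```python
import numpy as np
import openpmd_api as io

# Write one diagnostic mesh per checkpoint step to an ADIOS2 BP series
# (the ".bp" suffix selects the ADIOS2 backend; "%T" expands to the step).
series = io.Series("diags/bit1_%T.bp", io.Access.create)
for step in range(3):
    it = series.iterations[step]
    rho = it.meshes["e_density"][io.Mesh_Record_Component.SCALAR]
    rho.reset_dataset(io.Dataset(np.dtype("float64"), [512]))
    data = np.random.rand(512)          # placeholder for a BIT1 diagnostic
    rho.store_chunk(data, [0], [512])
    series.flush()                      # hand the chunk to the I/O backend
del series                              # finalize and close the series
```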
Randomized subspace approximation with "matrix sketching" is an effective approach for constructing approximate partial singular value decompositions (SVDs) of large matrices. The performance of such techniques has been extensively analyzed, and very precise estimates of the distribution of the residual errors have been derived. However, our understanding of the accuracy of the computed singular vectors (measured in terms of the canonical angles between the spaces spanned by the exact and the computed singular vectors, respectively) remains relatively limited. In this work, we present practical bounds and estimates for the canonical angles of randomized subspace approximation that can be computed efficiently either a priori or a posteriori, without assuming prior knowledge of the true singular subspaces. Under moderate oversampling in the randomized SVD, our a priori probabilistic bounds are asymptotically tight and can be computed efficiently, while offering clear insight into the balance between oversampling and power iterations for a fixed budget of matrix-vector multiplications. Numerical experiments demonstrate the empirical effectiveness of these canonical angle bounds and estimates on different matrices under various algorithmic choices for the randomized SVD.
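The quantities involved are easy to reproduce in a small experiment: the sketch below computes a randomized SVD with oversampling p and q power iterations, then evaluates the canonical angles between the computed and reference left singular subspaces with scipy.linalg.subspace_angles; the test matrix and sizes are arbitrary and unrelated to the paper's experiments.

```python
import numpy as np
from scipy.linalg import qr, svd, subspace_angles

def randomized_svd(A, k, p=10, q=1, rng=np.random.default_rng(0)):
    """Basic randomized SVD: Gaussian sketch with oversampling p, followed by
    q power iterations (with re-orthonormalization) and a small dense SVD."""
    m, n = A.shape
    Q, _ = qr(A @ rng.standard_normal((n, k + p)), mode="economic")
    for _ in range(q):
        Q, _ = qr(A.T @ Q, mode="economic")
        Q, _ = qr(A @ Q, mode="economic")
    U_hat, s, Vt = svd(Q.T @ A, full_matrices=False)
    return (Q @ U_hat)[:, :k], s[:k], Vt[:k]

# A posteriori canonical angles against a known reference subspace.
rng = np.random.default_rng(1)
U0, _ = qr(rng.standard_normal((400, 50)), mode="economic")
V0, _ = qr(rng.standard_normal((300, 50)), mode="economic")
A = U0 @ np.diag(0.8 ** np.arange(50)) @ V0.T      # known singular subspaces
U, s, Vt = randomized_svd(A, k=10)
angles = subspace_angles(U, U0[:, :10])            # radians, largest first
print(np.rad2deg(angles))
```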
Although numerical models provide accurate solutions for ice sheet dynamics based on physical laws, they entail intensive computational demands for solving partial differential equations. In recent years, convolutional neural networks (CNNs) have been widely used as statistical emulators for such numerical models. However, since CNNs operate on regular grids, they cannot reproduce the refined meshes of finite-element numerical models or match their computational efficiency. Therefore, instead of CNNs, this study adopts an equivariant graph convolutional network (EGCN) as an emulator for ice sheet dynamics modeling. EGCN reproduces ice thickness and velocity changes in Helheim Glacier, Greenland, and Pine Island Glacier, Antarctica, with computation times 260 and 44 times faster, respectively. Compared to a traditional CNN and a graph convolutional network, EGCN shows outstanding accuracy in thickness prediction near fast ice streams by preserving equivariance to the translation and rotation of graphs.
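To indicate what "equivariant graph convolution" means operationally, here is a minimal E(n)-equivariant message-passing layer in PyTorch in the spirit of Satorras et al.; it is an illustrative sketch, not the emulator architecture, feature set, or mesh used in this study.

```python
import torch
import torch.nn as nn

class EGCLayer(nn.Module):
    """Minimal E(n)-equivariant graph layer: node features are updated from
    rotation/translation-invariant quantities (squared distances), while
    coordinates are updated along relative positions, which keeps the
    coordinate update equivariant."""
    def __init__(self, dim):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.phi_h = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.phi_x = nn.Linear(dim, 1, bias=False)

    def forward(self, h, x, edge_index):
        src, dst = edge_index                                   # directed edges dst <- src
        d2 = ((x[src] - x[dst]) ** 2).sum(-1, keepdim=True)     # invariant edge feature
        m = self.phi_e(torch.cat([h[src], h[dst], d2], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)         # sum messages per node
        h_new = h + self.phi_h(torch.cat([h, agg], dim=-1))
        dx = torch.zeros_like(x).index_add_(0, dst, (x[dst] - x[src]) * self.phi_x(m))
        return h_new, x + dx

# toy usage: 4 nodes with 8-dim features and 2-D coordinates, 3 directed edges
h, x = torch.randn(4, 8), torch.randn(4, 2)
edges = torch.tensor([[0, 1, 2], [1, 2, 3]])
h2, x2 = EGCLayer(8)(h, x, edges)
```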
We consider the incomplete multi-graph matching problem, a generalization of the NP-hard quadratic assignment problem to the matching of multiple finite sets. Multi-graph matching plays a central role in computer vision, e.g., for matching images or shapes, and a number of dedicated optimization techniques have been proposed for it. While the closely related NP-hard multi-dimensional assignment problem (MDAP) has been studied for decades in the operations research community, it considers only complete matchings and has a different cost structure. We bridge this gap and transfer well-known approximation algorithms for the MDAP to incomplete multi-graph matching. To this end, we revisit the respective algorithms, adapt them to incomplete multi-graph matching, and propose extended and parallelized versions. Our experimental validation shows that our new method substantially outperforms the previous state of the art in terms of both objective value and runtime. Our algorithm matches, for example, 29 images with more than 500 keypoints each in less than two minutes, whereas the fastest considered competitor requires at least half an hour while producing far worse results.
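The flavour of such construction heuristics can be conveyed with a drastically simplified example: the toy code below matches every graph to a reference graph via a linear assignment and leaves expensive pairs unmatched, which mimics the incomplete-matching aspect but is not the paper's sequential MDAP algorithms or their parallelization.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def star_multi_matching(costs, max_cost=0.0):
    """Toy incomplete multi-graph matching: match each graph to reference graph 0
    with a linear assignment, keep only pairs cheaper than `max_cost`, and read
    multi-matchings off the shared reference node indices.

    costs[k-1] is the (n0 x nk) cost matrix between graph 0 and graph k."""
    matching = {}                                    # (graph, node) -> reference node
    for k, C in enumerate(costs, start=1):
        rows, cols = linear_sum_assignment(C)
        for r, c in zip(rows, cols):
            if C[r, c] <= max_cost:                  # leave expensive pairs unmatched
                matching[(k, c)] = r
    return matching

# Tiny usage example with random costs (negative = attractive).
rng = np.random.default_rng(0)
costs = [rng.uniform(-1, 1, size=(5, 5)) for _ in range(3)]
print(star_multi_matching(costs))
```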
Investigating the solutions of nonlinear equation systems is challenging in a general setting, especially if the equations contain uncertain parameters modeled by probability densities. Such random equations, understood as stationary (non-dynamical) equations with parameters treated as random variables, have a long history and a broad range of applications. In this work, we study nonlinear random equations whose parameters follow mixture models, in order to investigate the combinatorial complexity of such equations and how it can be exploited in practice. We derive a general likelihood function and posterior density for approximate best-fit solutions, avoiding significant restrictions on the type of nonlinearity or mixture model, and demonstrate their numerically efficient application for the applied researcher. In the results section we focus on example simulations of approximate likelihood/posterior solutions for random linear equation systems and nonlinear systems of random conic section equations, as well as applications to portfolio optimization, stochastic control, and random matrix theory, in order to show the wide applicability of the presented methodology.
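A minimal numerical illustration of the idea, under assumptions of our own choosing (a Gaussian-kernel pseudo-likelihood on the residual and a Gaussian mixture over the parameters), is the following Monte Carlo approximation of the likelihood of a candidate solution x of F(x, theta) = 0; the paper's actual likelihood and posterior construction may differ.

```python
import numpy as np

def approx_loglik(x, F, mix_weights, mix_means, mix_covs, sigma=0.1, n_mc=2000,
                  rng=np.random.default_rng(0)):
    """Monte Carlo approximation of a residual-based (pseudo-)likelihood for a
    candidate solution x of the random equation F(x, theta) = 0, where theta
    is drawn from a Gaussian mixture (illustrative likelihood kernel)."""
    comp = rng.choice(len(mix_weights), size=n_mc, p=mix_weights)
    theta = np.stack([rng.multivariate_normal(mix_means[c], mix_covs[c]) for c in comp])
    resid = np.array([np.linalg.norm(F(x, t)) for t in theta])
    return np.log(np.mean(np.exp(-0.5 * (resid / sigma) ** 2)) + 1e-300)

# Example: random linear system with uncertain diagonal entries theta = (a11, a22).
F = lambda x, t: np.array([t[0] * x[0] + x[1] - 1.0, x[0] + t[1] * x[1] - 2.0])
w = [0.5, 0.5]
mu = [np.array([1.0, 1.0]), np.array([2.0, 2.0])]
cov = [0.05 * np.eye(2)] * 2
print(approx_loglik(np.array([0.5, 0.5]), F, w, mu, cov))
```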
Structural vector autoregressive (SVAR) processes are commonly used time series models for identifying and quantifying causal interactions between dynamically interacting processes from observational data. The causal relationships between these processes can be effectively represented by a finite directed process graph: a graph that connects two processes whenever there is a direct delayed or simultaneous effect between them. Recent research has introduced a framework for quantifying frequency-domain causal effects along paths in the process graph. This framework allows one to identify how the spectral density of one process contributes to the spectral density of another. In the present work, we characterise the asymptotic distribution of causal effect and spectral contribution estimators in terms of algebraic relations dictated by the process graph. Based on the asymptotic distribution, we construct approximate confidence intervals and Wald-type hypothesis tests for the estimated effects and spectral contributions. Under the assumption of causal sufficiency, we consider the class of differentiable estimators for frequency-domain causal quantities and, within this class, identify the asymptotically optimal estimator. We illustrate the frequency-domain Wald tests and uncertainty approximation on synthetic data, and apply them to analyse the impact of the 10-to-11-year solar cycle on the North Atlantic Oscillation (NAO). Our results confirm a significant effect of the solar cycle on the NAO at the 10-to-11-year time scale.
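To fix notation for the frequency-domain quantities involved, the sketch below fits a VAR to synthetic data with statsmodels and evaluates the standard spectral density matrix S(w) = H(w) Sigma_u H(w)* / (2 pi), where H(w) is the transfer function; the path-wise causal decomposition and the Wald tests of the paper are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def var_spectral_density(coefs, sigma_u, freqs):
    """Spectral density matrix S(w) = H(w) Sigma_u H(w)^* / (2*pi) of a VAR(p),
    where H(w) = (I - sum_k A_k exp(-i*w*k))^{-1} is the transfer function."""
    p, k, _ = coefs.shape
    S = []
    for w in freqs:
        A = np.eye(k, dtype=complex)
        for lag in range(1, p + 1):
            A -= coefs[lag - 1] * np.exp(-1j * w * lag)
        H = np.linalg.inv(A)
        S.append(H @ sigma_u @ H.conj().T / (2.0 * np.pi))
    return np.array(S)

# Fit a VAR to synthetic bivariate data and evaluate its spectrum.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])            # stable VAR(1) used for simulation
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A1 @ y[t - 1] + rng.standard_normal(2)
res = VAR(y).fit(maxlags=2)
S = var_spectral_density(res.coefs, res.sigma_u, np.linspace(0.01, np.pi, 64))
print(S.shape)                                     # (64, 2, 2)
```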
We propose an efficient offline pointing calibration method for operational antenna systems that does not require any downtime. Our approach minimizes the calibration effort and exploits technical signal information that is typically available for monitoring and control purposes in ground station operations. Using a standard antenna interface and data from an operational satellite contact, we devise a robust strategy for generating the training data set. Building on this, we learn the parameters of a suitable coordinate transform by means of linear regression. In our experiments, we show the usefulness of the method in a real-world setup.
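A minimal sketch of the regression step, with a made-up geometric feature set standing in for the paper's coordinate transform, could look as follows: pointing offsets derived from signal data during a contact are regressed on features of the commanded azimuth and elevation, and the fitted model is then used to correct future commands.

```python
import numpy as np

def pointing_features(commanded):
    """Hypothetical geometric feature set built from commanded azimuth/elevation
    angles (radians); the actual transform is a design choice of the method."""
    az, el = commanded[:, 0], commanded[:, 1]
    return np.column_stack([np.ones_like(az), np.sin(az), np.cos(az),
                            np.sin(el), np.cos(el)])

def fit_pointing_model(commanded, measured_offsets):
    """Least-squares fit of a linear pointing-correction model from offsets
    derived from operational signal data (e.g. during a satellite contact)."""
    W, *_ = np.linalg.lstsq(pointing_features(commanded), measured_offsets, rcond=None)
    return W

def correct(commanded, W):
    """Apply the learned correction to commanded (az, el) pairs."""
    return commanded - pointing_features(commanded) @ W
```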