A novel optimization procedure for generating stability polynomials of stabilized explicit Runge-Kutta methods is devised. Intended for semidiscretizations of hyperbolic partial differential equations, the approach developed herein allows the optimization of stability polynomials with more than a hundred stages. A potential application of these high-degree stability polynomials is problems with locally varying characteristic speeds, as arise from non-uniformly refined meshes and differing wave speeds. To demonstrate the applicability of the stability polynomials, we construct 2N-storage many-stage Runge-Kutta methods that match their designed second order of accuracy when applied to a range of linear and nonlinear hyperbolic PDEs with smooth solutions. The methods are constructed to reduce the amplification of round-off errors, which becomes a significant concern for these many-stage methods.
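As a hedged illustration of the stability constraint at play (not the paper's optimizer), the sketch below checks $|P(\Delta t\,\lambda)| \le 1$ on a sampled spectrum and bisects for the largest stable timestep; the purely imaginary spectrum and the polynomial coefficients are placeholder assumptions.

```python
# Minimal sketch: given fixed polynomial coefficients, find the largest
# timestep dt with |P(dt * lambda)| <= 1 on a sampled spectrum.
import numpy as np

def stable(coeffs, spectrum, dt):
    """Check |P(dt*lambda)| <= 1 for all sampled eigenvalues."""
    z = dt * spectrum
    p = np.polyval(coeffs[::-1], z)  # coeffs[k] multiplies z**k
    return np.all(np.abs(p) <= 1.0 + 1e-12)

def max_timestep(coeffs, spectrum, dt_hi=100.0, iters=60):
    """Bisect for the largest stable timestep."""
    dt_lo = 0.0
    for _ in range(iters):
        dt_mid = 0.5 * (dt_lo + dt_hi)
        if stable(coeffs, spectrum, dt_mid):
            dt_lo = dt_mid
        else:
            dt_hi = dt_mid
    return dt_lo

# Purely imaginary spectrum, typical of central discretizations of
# hyperbolic PDEs (an illustrative assumption).
spectrum = 1j * np.linspace(-1.0, 1.0, 201)
# Second-order consistency fixes the leading coefficients 1, 1, 1/2;
# the higher ones are the free parameters an optimizer would tune
# (classical RK4 values are used here as placeholders).
coeffs = np.array([1.0, 1.0, 0.5, 1.0 / 6.0, 1.0 / 24.0])
print(max_timestep(coeffs, spectrum))  # ~2.828 for these coefficients
```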
There is growing interest in using machine learning (ML) methods for structural metamodeling due to the substantial computational cost of traditional simulations. Purely data-driven strategies often face limitations in model robustness, interpretability, and dependency on extensive data. To address these challenges, this paper introduces a novel physics-informed machine learning (PiML) method that integrates scientific principles and physical laws into deep neural networks to model the seismic responses of nonlinear structures. The approach constrains the ML model's solution space within known physical bounds through three main features: dimensionality reduction via combined model order reduction and wavelet analysis, long short-term memory (LSTM) networks, and Newton's second law. Dimensionality reduction addresses the redundancy of structural systems and boosts efficiency, while wavelet analysis extracts essential features. LSTM networks capture temporal dependencies for accurate time-series predictions. Manipulating the equation of motion helps the model learn system nonlinearities and confines solutions to physically interpretable results. These attributes allow for model training with sparse data, enhancing accuracy, interpretability, and robustness. Furthermore, a dataset of archetype steel moment-resisting frames under seismic loading, available in the DesignSafe-CI Database [1], is considered for evaluation. The resulting metamodel handles complex data better than existing physics-guided LSTM models and outperforms other purely data-driven networks.
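A minimal sketch of how such a physics term can be attached to an LSTM loss is given below, assuming an equation-of-motion residual of the form $M\ddot{x} + C\dot{x} + Kx + f_{nl} = -M\iota\,\ddot{x}_g$ with $\iota$ a vector of ones; the shapes and names are illustrative, and the paper's model-order-reduction and wavelet pipeline is omitted.

```python
# Hypothetical sketch of a physics-informed LSTM loss for seismic response.
import torch
import torch.nn as nn

class SeismicLSTM(nn.Module):
    def __init__(self, n_dof, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        # Predict displacement and nonlinear restoring force per DOF.
        self.head = nn.Linear(hidden, 2 * n_dof)

    def forward(self, ag):               # ag: (batch, time, 1) ground accel.
        h, _ = self.lstm(ag)
        x, f_nl = self.head(h).chunk(2, dim=-1)  # each (batch, time, n_dof)
        return x, f_nl

def physics_loss(x, f_nl, ag, M, C, K, dt):
    """Residual of M*a + C*v + K*x + f_nl + M*iota*ag, iota = ones."""
    # Finite-difference velocity/acceleration from predicted displacement.
    v = torch.gradient(x, spacing=dt, dim=1)[0]
    a = torch.gradient(v, spacing=dt, dim=1)[0]
    residual = a @ M.T + v @ C.T + x @ K.T + f_nl \
        + (ag * torch.ones_like(x)) @ M.T
    return residual.pow(2).mean()
```

In training, this residual would be weighted against a standard data-misfit term so that predictions stay within the physically admissible solution space.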
Neural and Gaussian-based radiance field methods have achieved great success in the field of novel view synthesis. However, specular reflection remains non-trivial, as the high-frequency radiance field is notoriously difficult to fit stably and accurately. We present a deferred shading method to effectively render specular reflections with Gaussian splatting. The key challenge comes from the environment map reflection model, which requires accurate surface normals while simultaneously bottlenecking normal estimation with discontinuous gradients. We leverage the per-pixel reflection gradients generated by deferred shading to bridge the optimization of neighboring Gaussians, allowing nearly correct normal estimates to gradually propagate and eventually spread over all reflective objects. Our method significantly outperforms state-of-the-art techniques and concurrent work in synthesizing high-quality specular reflection effects, demonstrating a consistent improvement in peak signal-to-noise ratio (PSNR) on both synthetic and real-world scenes, while running at a frame rate almost identical to vanilla Gaussian splatting.
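For intuition, the numpy sketch below shows the per-pixel deferred step of an environment-map reflection model: reflecting the view direction about the shaded normal and looking up an equirectangular map. Names like `normal_buf` and `env_map` are assumptions, and the Gaussian rasterization that produces the normal buffer is omitted.

```python
# Minimal sketch of a deferred specular lookup (illustrative only).
import numpy as np

def reflect(view, normal):
    """Per-pixel reflection direction r = 2(n.v)n - v."""
    ndotv = np.sum(normal * view, axis=-1, keepdims=True)
    return 2.0 * ndotv * normal - view

def shade_specular(normal_buf, view_dir, env_map):
    """Sample an equirectangular environment map along reflections."""
    r = reflect(view_dir, normal_buf)
    r = r / np.linalg.norm(r, axis=-1, keepdims=True)
    theta = np.arccos(np.clip(r[..., 2], -1.0, 1.0))   # polar angle
    phi = np.arctan2(r[..., 1], r[..., 0])             # azimuth
    h, w, _ = env_map.shape
    u = ((phi / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    v = (theta / np.pi * (h - 1)).astype(int)
    return env_map[v, u]

env = np.random.rand(16, 32, 3)
normals = np.tile([0.0, 0.0, 1.0], (4, 4, 1))
views = np.tile([0.0, 0.0, 1.0], (4, 4, 1))
print(shade_specular(normals, views, env).shape)       # (4, 4, 3)
```

Because the lookup is done per pixel after splatting, its gradients flow back to every Gaussian covering that pixel, which is what lets normal corrections propagate between neighbors.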
We propose a distributed implementation of integrated sensing and communication (ISAC) built on a cell-free massive multiple-input multiple-output (CF-mMIMO) architecture. Distributed multi-antenna access points (APs) simultaneously serve communication users (UEs) and emit probing signals towards multiple specified zones for sensing. The APs can switch between communication and sensing modes and adjust their transmit power based on the network settings and the requirements of the sensing and communication operations. Considering local partial zero-forcing precoding at the APs for communication and maximum-ratio transmission for sensing, we first derive closed-form expressions for the spectral efficiency (SE) of the UEs and the mainlobe-to-average-sidelobe ratio (MASR) of the sensing zones. Then, a joint operation mode selection and power control design problem is formulated to maximize SE fairness among the UEs while ensuring specified MASR levels for the sensing zones. The complicated mixed-integer problem is relaxed and solved via a successive convex approximation approach. We further propose a low-complexity design in which AP mode selection is performed by a greedy algorithm and power control is then designed for the chosen modes. Our findings reveal that the proposed scheme can consistently ensure a sensing success rate of $100\%$ across different network setups with satisfactory fairness among all UEs.
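To illustrate the flavor of such a greedy mode selection (with a hypothetical scoring rule, not the paper's closed-form MASR/SE expressions): start with every AP in communication mode, then repeatedly switch to sensing the AP that buys the most MASR per unit of SE given up, until the sensing target is met.

```python
# Hypothetical greedy AP mode selection sketch.
import numpy as np

def greedy_mode_selection(se_gain, masr_gain, masr_target):
    """se_gain[a]: SE contribution of AP a in communication mode.
    masr_gain[a]: MASR contribution of AP a in sensing mode.
    Returns a boolean array, True = sensing mode."""
    sensing = np.zeros(len(se_gain), dtype=bool)
    while masr_gain[sensing].sum() < masr_target:
        candidates = np.flatnonzero(~sensing)
        if candidates.size == 0:
            raise RuntimeError("MASR target unreachable")
        # Pick the AP with the best sensing gain per unit SE lost.
        ratio = masr_gain[candidates] / np.maximum(se_gain[candidates], 1e-12)
        sensing[candidates[np.argmax(ratio)]] = True
    return sensing

rng = np.random.default_rng(0)
print(greedy_mode_selection(rng.uniform(0.5, 2.0, 16),
                            rng.uniform(0.1, 1.0, 16), masr_target=3.0))
```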
The all-to-all collective communication primitive is widely used in machine learning (ML) and high-performance computing (HPC) workloads, and optimizing its performance is of interest to both the ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This paper takes a holistic approach to optimizing the performance of all-to-all collective communication on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology and in lowering the schedules to various runtimes and interconnect technologies. We also propose a novel topology that delivers near-optimal all-to-all performance.
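For concreteness, here is a sketch of the classic linear-shift all-to-all schedule on a fully connected direct-connect topology (one chunk per peer per step). It illustrates what a schedule object looks like, not the paper's optimized schedules for arbitrary topologies.

```python
# Classic linear-shift all-to-all schedule (illustrative baseline).
def linear_shift_schedule(n_ranks):
    """Return a list of steps; each step is a list of (src, dst) pairs.
    In step k, rank i sends the chunk destined for rank (i + k) % n."""
    return [[(i, (i + k) % n_ranks) for i in range(n_ranks)]
            for k in range(1, n_ranks)]

for k, step in enumerate(linear_shift_schedule(4), start=1):
    print(f"step {k}: {step}")
```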
We develop a novel approach for efficiently applying the variational quantum linear solver (VQLS) in the context of structured sparse matrices. Such matrices frequently arise in the numerical solution of partial differential equations, which are ubiquitous in science and engineering. Conventionally, the Pauli basis is used for the linear combination of unitaries (LCU) decomposition of the underlying matrix to facilitate the evaluation of the global/local VQLS cost functions. However, in the worst case the Pauli basis can result in a number of LCU terms that scales quadratically with the matrix size. We show that by using an alternate basis one can better exploit the sparsity and underlying structure of the matrix, leading to a number of tensor-product terms that scales only logarithmically with the matrix size. Since this new basis comprises non-unitary operators, we employ the concept of unitary completion to design efficient quantum circuits for computing the global/local VQLS cost functions. We compare our approach with related concepts in the literature, including unitary dilation and measurement in the Bell basis, and discuss its pros and cons using VQLS applied to the heat equation as an example.
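To make the baseline concrete, the sketch below computes the conventional Pauli-basis LCU coefficients $c_P = \mathrm{tr}(P A)/2^n$ for a small tridiagonal matrix of the kind a heat-equation discretization produces; enumerating all $4^n$ Pauli strings is exactly the cost the alternate basis avoids.

```python
# Pauli-basis LCU decomposition of a small matrix (illustrative baseline).
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_decompose(A):
    n = int(np.log2(A.shape[0]))
    terms = {}
    for labels in itertools.product("IXYZ", repeat=n):
        P = np.eye(1, dtype=complex)
        for l in labels:
            P = np.kron(P, PAULIS[l])
        c = np.trace(P @ A) / 2**n  # Pauli strings are self-adjoint
        if abs(c) > 1e-12:
            terms["".join(labels)] = c
    return terms

# 1D Laplacian-like tridiagonal matrix (heat-equation stencil).
A = np.diag([2.0] * 4) - np.diag([1.0] * 3, 1) - np.diag([1.0] * 3, -1)
print(pauli_decompose(A))  # {'II': 2, 'IX': -1, 'XX': -0.5, 'YY': -0.5}
```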
We provide in this work an algorithm for approximating a very broad class of symmetric Toeplitz matrices to machine precision in $\mathcal{O}(n \log n)$ time. In particular, for a Toeplitz matrix $\mathbf{\Sigma}$ with values $\mathbf{\Sigma}_{j,k} = h_{|j-k|} = \int_{-1/2}^{1/2} e^{2 \pi i |j-k| \omega} S(\omega) \mathrm{d} \omega$ where $S(\omega)$ is piecewise smooth, we give an approximation $\mathbf{\mathcal{F}} \mathbf{\Sigma} \mathbf{\mathcal{F}}^H \approx \mathbf{D} + \mathbf{U} \mathbf{V}^H$, where $\mathbf{\mathcal{F}}$ is the DFT matrix, $\mathbf{D}$ is diagonal, and the matrices $\mathbf{U}$ and $\mathbf{V}$ are in $\mathbb{C}^{n \times r}$ with $r \ll n$. Studying these matrices in the context of time series, we offer a theoretical explanation of this structure and connect it to existing spectral-domain approximation frameworks. We then give a complete discussion of the numerical method for assembling the approximation and demonstrate its efficiency in improving Whittle-type likelihood approximations, including dramatic examples where a correction of rank $r = 2$ to the standard Whittle approximation increases the accuracy from $3$ to $14$ digits for a matrix $\mathbf{\Sigma} \in \mathbb{R}^{10^5 \times 10^5}$. The method and analysis of this work apply well beyond time series analysis, providing an algorithm for extremely accurate direct solves with a wide variety of symmetric Toeplitz matrices. The analysis employed here largely depends on asymptotic expansions of oscillatory integrals, and it also provides a new perspective on when existing spectral-domain approximation methods for Gaussian log-likelihoods can be particularly problematic.
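The claimed structure can be checked numerically on a small instance, as in the sketch below: build $\mathbf{\Sigma}$ from a piecewise-constant spectral density, conjugate by the unitary DFT, and inspect the singular values of the off-diagonal residual, which should decay rapidly if $\mathbf{\mathcal{F}} \mathbf{\Sigma} \mathbf{\mathcal{F}}^H$ is numerically diagonal plus low-rank. This dense check is illustrative only; the paper assembles $\mathbf{D}$, $\mathbf{U}$, $\mathbf{V}$ in $\mathcal{O}(n \log n)$.

```python
# Empirical check of the diagonal-plus-low-rank structure (small n).
import numpy as np

n = 256
omega = np.linspace(-0.5, 0.5, 4096, endpoint=False)
S = np.where(np.abs(omega) > 0.1, 2.0, 0.5)   # piecewise-constant density
lags = np.arange(n)
h = np.array([np.mean(np.exp(2j * np.pi * k * omega) * S)
              for k in lags]).real              # h_k via quadrature
Sigma = h[np.abs(lags[:, None] - lags[None, :])]

F = np.exp(-2j * np.pi * np.outer(lags, lags) / n) / np.sqrt(n)  # unitary DFT
T = F @ Sigma @ F.conj().T
R = T - np.diag(np.diag(T))                     # off-diagonal residual
sv = np.linalg.svd(R, compute_uv=False)
print(sv[:8] / sv[0])   # rapid decay => residual is numerically low-rank
```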
In this paper we consider the filtering problem associated with partially observed McKean-Vlasov stochastic differential equations (SDEs). The model consists of data that are observed at regular, discrete times, and the objective is to compute the conditional expectation of functionals of the solution of the SDE at the current time. This problem is challenging even in the ordinary SDE case and requires numerical approximations. Based upon the ideas in [3, 12], we develop a new particle filter (PF) and multilevel particle filter (MLPF) to approximate the aforementioned expectations. We prove under assumptions that, for $\epsilon>0$, to obtain a mean square error of $\mathcal{O}(\epsilon^2)$ the PF has a cost per observation time of $\mathcal{O}(\epsilon^{-5})$, while the MLPF costs $\mathcal{O}(\epsilon^{-4})$ (best case) or $\mathcal{O}(\epsilon^{-4}\log(\epsilon)^2)$ (worst case). Our theoretical results are supported by numerical experiments.
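As a hedged, toy-scale sketch of the setting (not the paper's PF/MLPF constructions or cost analysis), the code below runs a bootstrap particle filter for a McKean-Vlasov SDE whose drift depends on the law of the solution, here approximated by the particle empirical mean, with Gaussian observations at discrete times.

```python
# Bootstrap particle filter for a toy McKean-Vlasov SDE:
#   dX_t = theta * (E[X_t] - X_t) dt + sigma dW_t  (law ~ empirical mean).
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, obs_sd, dt, n_p = 0.5, 1.0, 0.2, 0.01, 1000
steps_per_obs = 10

def propagate(x):
    """Euler-Maruyama with mean-field drift via the empirical mean."""
    for _ in range(steps_per_obs):
        drift = theta * (x.mean() - x)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.size)
    return x

def particle_filter(observations):
    x = rng.standard_normal(n_p)
    means = []
    for y in observations:
        x = propagate(x)
        logw = -0.5 * ((y - x) / obs_sd) ** 2        # Gaussian likelihood
        w = np.exp(logw - logw.max()); w /= w.sum()
        means.append(np.dot(w, x))                    # filtered mean
        x = x[rng.choice(n_p, size=n_p, p=w)]         # multinomial resampling
    return means

print(particle_filter([0.1, 0.3, 0.2, 0.4]))
```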
Estimating heterogeneous treatment effects (HTEs) is crucial for precision medicine. While multiple studies can improve the generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical HTEs across studies, but this assumption may be violated due to various sources of between-study heterogeneity, including differences in study design, study populations, and data collection protocols. To this end, we propose a framework for multi-study HTE estimation that accounts for between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with machine learning (ML) in the multi-study setting. It involves a data-adaptive objective function that links study-specific treatment effects with nuisance functions through membership probabilities, which enable information to be borrowed across potentially heterogeneous studies. The multi-study R-learner framework can combine data from randomized controlled trials, observational studies, or a combination of both. It is easy to implement and flexible in its ability to incorporate ML for estimating HTEs, nuisance functions, and membership probabilities. In the series estimation framework, we show that the multi-study R-learner is asymptotically normal and more efficient than the R-learner when there is between-study heterogeneity in the propensity score model under homoscedasticity. We illustrate using cancer data that the proposed method performs favorably compared to existing approaches in the presence of between-study heterogeneity.
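For readers unfamiliar with the base estimator being extended, the sketch below implements the single-study R-learner with sklearn (model choices are assumptions, and cross-fitting is omitted for brevity): residualize outcome and treatment, then fit $\tau(x)$ by regressing $Y - \hat{m}(X)$ on $A - \hat{e}(X)$, which reduces to a weighted regression of the pseudo-outcome.

```python
# Single-study R-learner sketch (the multi-study version extends this).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import LinearRegression

def r_learner(X, A, Y):
    m_hat = RandomForestRegressor().fit(X, Y).predict(X)             # E[Y|X]
    e_hat = RandomForestClassifier().fit(X, A).predict_proba(X)[:, 1]  # P(A=1|X)
    e_hat = np.clip(e_hat, 0.01, 0.99)        # avoid zero treatment residuals
    y_res, a_res = Y - m_hat, A - e_hat
    # Linear tau(X) for illustration; weights a_res**2 recover the R-loss
    # sum_i (y_res_i - tau(x_i) * a_res_i)**2.
    pseudo = y_res / a_res
    return LinearRegression().fit(X, pseudo, sample_weight=a_res**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X[:, 1] + (1.0 + X[:, 2]) * A + rng.normal(size=500)
print(r_learner(X, A, Y).coef_)   # roughly [0, 0, 1]
```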
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize but suffer a strong underestimation bias when the true MI is large. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and applying the chain rule for MI to the decomposed views. This yields a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI that can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and that it learns better representations in a vision domain and for dialogue generation.
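The sketch below shows the standard InfoNCE bound whose saturation at $\log(\text{batch size})$ motivates the decomposition; the dot-product critic and random features are illustrative assumptions, and in DEMI the conditional terms would use a critic that also consumes the conditioning subview.

```python
# InfoNCE contrastive lower bound on MI (the non-decomposed baseline).
import math
import torch

def info_nce(scores):
    """scores[i, j] = critic f(x_i, y_j); positives on the diagonal.
    I(x; y) >= log N + E[log softmax over negatives], capped at log N."""
    n = scores.shape[0]
    log_softmax = scores.diagonal() - torch.logsumexp(scores, dim=1)
    return log_softmax.mean() + math.log(n)

x = torch.randn(128, 16)
y = x + 0.1 * torch.randn(128, 16)   # highly informative views
scores = x @ y.T                      # dot-product critic (assumption)
print(info_nce(scores))               # can never exceed log(128) ~ 4.85
```

The cap at $\log N$ is exactly the underestimation bias in question: splitting the MI across several conditional terms lets each bounded estimator cover a modest chunk, so the sum can exceed what any single bound could report.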
Multi-relation question answering is a challenging task, as it requires elaborate analysis of the question and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called the Interpretable Reasoning Network, which employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of the input question should be analyzed at each hop; predicts a relation that corresponds to the currently parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual intervention in predicting the final answer.
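A minimal sketch of a hop-by-hop loop of this kind is given below; the attention step, GRU state update, and shapes are assumptions for illustration, not the paper's exact architecture, but the structure mirrors the described cycle of attend, predict a relation, and update the reasoning state.

```python
# Hypothetical hop-by-hop reasoning loop sketch.
import torch
import torch.nn as nn

class HopReasoner(nn.Module):
    def __init__(self, dim, n_relations, n_hops=3):
        super().__init__()
        self.attend = nn.Linear(dim, dim)     # which part of q to analyze
        self.rel_clf = nn.Linear(dim, n_relations)
        self.update = nn.GRUCell(dim, dim)    # state of the reasoning process
        self.n_hops = n_hops

    def forward(self, q_tokens):              # (seq, dim) question encoding
        state = q_tokens.mean(dim=0)
        relations = []
        for _ in range(self.n_hops):
            # Attend to the question part relevant at this hop.
            att = torch.softmax(q_tokens @ self.attend(state), dim=0)
            focus = (att.unsqueeze(-1) * q_tokens).sum(dim=0)
            relations.append(self.rel_clf(focus).argmax())  # hop's relation
            state = self.update(focus.unsqueeze(0), state.unsqueeze(0))[0]
        return relations                       # inspectable per-hop trace

print(HopReasoner(dim=32, n_relations=10)(torch.randn(7, 32)))
```

Exposing the per-hop relation predictions as a list is what makes the intermediate reasoning traceable and open to manual intervention.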