We propose algorithms for efficient time integration of large systems of oscillatory second order ordinary differential equations (ODEs) whose solution can be expressed in terms of trigonometric matrix functions. Our algorithms are based on a residual notion for second order ODEs, which allows us to extend the ``residual-time restarting'' Krylov subspace framework -- recently introduced for exponential and $\varphi$-functions occurring in time integration of first order ODEs -- to our setting. We then show that in many cases the computational cost can be further reduced by using our restarting in the Gautschi cosine scheme. We analyze residual convergence in terms of Faber and Chebyshev series and supplement these theoretical results with numerical experiments illustrating the efficiency of the proposed methods.
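For concreteness, here is a minimal Python sketch (ours, not the paper's restarted Krylov algorithm) of the two-step Gautschi-type cosine recurrence for the homogeneous linear system $y'' = -Ay$ with symmetric positive definite $A$: the recurrence $y_{n+1} = 2\cos(h\Omega)y_n - y_{n-1}$, $\Omega = \sqrt{A}$, reproduces the exact solution at the grid points. The dense eigendecomposition used below for $\cos(h\sqrt{A})$ is precisely the step that Krylov approximations replace for large systems; all function names are ours.

```python
import numpy as np

def cos_sqrt(A, h):
    """cos(h*sqrt(A)) for symmetric positive definite A (dense toy version)."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.cos(h * np.sqrt(w))) @ Q.T

def gautschi_cosine(A, y0, v0, h, n_steps):
    C = cos_sqrt(A, h)
    w, Q = np.linalg.eigh(A)
    s = np.sqrt(w)
    # exact first step: y_1 = cos(h*Omega) y_0 + Omega^{-1} sin(h*Omega) v_0
    y_prev, y = y0, C @ y0 + (Q * (np.sin(h * s) / s)) @ (Q.T @ v0)
    ys = [y0, y]
    for _ in range(n_steps - 1):
        y_prev, y = y, 2 * C @ y - y_prev   # two-step cosine recurrence
        ys.append(y)
    return np.array(ys)

# quick check on the scalar oscillator y'' = -4y, y(0)=1, y'(0)=0, y(t)=cos(2t)
A = np.array([[4.0]])
ys = gautschi_cosine(A, np.array([1.0]), np.array([0.0]), h=0.1, n_steps=100)
assert abs(ys[50, 0] - np.cos(2 * 0.1 * 50)) < 1e-10
```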
Consider the Toeplitz matrix $T_n(f)$ generated by the symbol $f(\theta)=\hat{f}_r e^{\mathbf{i}r\theta}+\hat{f}_0+\hat{f}_{-s} e^{-\mathbf{i}s\theta}$, where $\hat{f}_r, \hat{f}_0, \hat{f}_{-s} \in \mathbb{C}$ and $0<r<n,~0<s<n$. For $r=s=1$ we have the classical tridiagonal Toeplitz matrices, for which the eigenvalues and eigenvectors are known. Similarly, the eigendecompositions are known for $1<r=s$, when the generated matrices are ``symmetrically sparse tridiagonal''. In the current paper we study the eigenvalues of $T_n(f)$ for $1\leq r<s$, in which case the generated matrices are ``non-symmetrically sparse tridiagonal''. We propose an algorithm that constructs one or two ad hoc matrices smaller than $T_n(f)$ whose eigenvalues suffice to determine the full spectrum of $T_n(f)$. The algorithm rests on a conjecture, which we support and illustrate with examples and numerical experiments. Open problems are briefly discussed.
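For illustration, the following sketch (ours, under the standard convention $(T_n(f))_{j,k}=\hat f_{j-k}$, which we assume) assembles such a matrix and computes its spectrum by a direct dense method, the brute-force baseline that the smaller ad hoc matrices are meant to avoid.

```python
import numpy as np

# Assemble T_n(f) for f(theta) = fr*e^{i r theta} + f0 + fms*e^{-i s theta}:
# fr sits on the r-th subdiagonal (j - k = r) and fms on the s-th
# superdiagonal (j - k = -s).

def toeplitz_symbol(n, r, s, fr, f0, fms):
    T = f0 * np.eye(n, dtype=complex)
    T += fr * np.eye(n, k=-r)    # coefficient fhat_r   on diagonal j - k = r
    T += fms * np.eye(n, k=s)    # coefficient fhat_{-s} on diagonal j - k = -s
    return T

# a "non-symmetrically sparse tridiagonal" case with r = 1 < s = 2
T = toeplitz_symbol(n=12, r=1, s=2, fr=1.0, f0=0.5, fms=2.0)
print(np.sort_complex(np.linalg.eigvals(T)))
```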
The recently introduced maximum-likelihood (ML) decoding scheme called guessing random additive noise decoding (GRAND) has demonstrated remarkably low time complexity in high signal-to-noise ratio (SNR) regimes. However, the complexity is not as low in low SNR regimes and at low code rates. To mitigate this concern, we propose a scheme for a near-ML variant of GRAND called ordered reliability bits GRAND (ORBGRAND). It divides codewords into segments based on the properties of the underlying code, generates sub-patterns for each segment consistent with the syndrome (thus avoiding the generation of inconsistent error patterns), and combines them in a near-ML order using two-level integer partitions of logistic weight. Numerical evaluation demonstrates that the proposed scheme, called segmented ORBGRAND, significantly reduces the average number of queries in every SNR regime. Moreover, segmented ORBGRAND with abandonment also improves the error correction performance.
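As background, the basic (non-segmented) ORBGRAND query order can be sketched as follows: error patterns are emitted in nondecreasing logistic weight, and the patterns of a given weight correspond to partitions of that weight into distinct parts. The sketch below is ours; it omits the paper's segmentation and syndrome consistency checks.

```python
# A test pattern flipping bit positions {i_1, ..., i_m}, counted from 1 in
# increasing reliability order, has logistic weight i_1 + ... + i_m; the
# patterns of weight w are exactly the partitions of w into distinct parts <= n.

def distinct_partitions(w, max_part):
    """Yield partitions of w into distinct parts <= max_part, largest first."""
    if w == 0:
        yield []
        return
    for p in range(min(w, max_part), 0, -1):
        for rest in distinct_partitions(w - p, p - 1):
            yield [p] + rest

def orbgrand_patterns(n, max_weight):
    """Yield flip-position lists in nondecreasing logistic weight."""
    yield []  # weight 0: test the received word itself
    for w in range(1, max_weight + 1):
        yield from distinct_partitions(w, n)

# first queries for n = 8: [], [1], [2], [3], [2, 1], [4], [3, 1], ...
for pattern in orbgrand_patterns(8, 4):
    print(pattern)
```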
A reproducing kernel Hilbert $C^*$-module (RKHM) is a generalization of a reproducing kernel Hilbert space (RKHS) by means of a $C^*$-algebra, and the Perron-Frobenius operator is a linear operator related to the composition of functions. Combining these two concepts, we present deep RKHM, a deep learning framework for kernel methods. We derive a new Rademacher generalization bound in this setting and provide a theoretical interpretation of benign overfitting by means of Perron-Frobenius operators. By virtue of the $C^*$-algebraic structure, the dependence of the bound on the output dimension is milder than for existing bounds. We show that $C^*$-algebra is a suitable tool for deep learning with kernels, enabling us to take advantage of the product structure of operators and to provide a clear connection with convolutional neural networks. Our theoretical analysis provides a new lens through which one can design and analyze deep kernel methods.
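As a toy illustration of the algebraic setting (ours; the paper's framework is far more general and its connection to convolutions goes beyond this example), one can take the $C^*$-algebra of $d\times d$ matrices, under which kernels become matrix-valued and Gram matrices become positive semidefinite block matrices.

```python
import numpy as np

# Toy matrix-valued kernel k(x, y) = g(x, y) * B, with g a scalar positive
# definite kernel and B positive semidefinite; the Gram "matrix" of N samples
# is the (N*d) x (N*d) block matrix with blocks g_ij * B.

def block_gram(X, B, gamma=1.0):
    g = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)  # scalar RBF Gram
    return np.kron(g, B)                                 # d x d blocks g_ij * B

rng = np.random.default_rng(0)
X = rng.normal(size=5)
C = rng.normal(size=(3, 3))
B = C @ C.T                                   # positive semidefinite block
G = block_gram(X, B)                          # shape (15, 15)
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True: the block Gram is PSD
```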
In this work, high order asymptotic preserving schemes are constructed and analysed for kinetic equations under a diffusive scaling. The framework enables us to consider different cases: the diffusion equation, the advection-diffusion equation, and the presence of inflow boundary conditions. Starting from the micro-macro reformulation of the original kinetic equation, high order time integrators are introduced. This class of numerical schemes enjoys the Asymptotic Preserving (AP) property for arbitrary initial data and degenerates, as $\epsilon$ goes to zero, into a high order scheme that is implicit in the diffusion term, which makes it free from the usual diffusion stability condition. The space discretization is also discussed, and high order methods based on classical finite difference schemes are proposed. The AP property is analysed, and numerical results are presented to illustrate the properties of the proposed schemes in different regimes.
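To make the AP mechanism concrete, here is a first-order toy sketch (ours; the paper builds high order schemes from the micro-macro reformulation) on the telegraph system $\partial_t \rho + \partial_x j = 0$, $\epsilon^2 \partial_t j + \partial_x \rho = -j$ under diffusive scaling: treating the relaxation and $\partial_x \rho$ implicitly yields, after elimination, an update for $\rho$ that is implicit in the diffusion term and degenerates, as $\epsilon \to 0$, to backward Euler for the limiting heat equation $\partial_t \rho = \partial_{xx}\rho$.

```python
import numpy as np

N = 200
dx = 2 * np.pi / N
x = np.arange(N) * dx
dt, eps, n_steps = 1e-2, 1e-4, 100

# periodic centered difference matrices for d_x and d_xx = (d_x)^2
D1 = (np.eye(N, k=1) - np.eye(N, k=-1)) / (2 * dx)
D1[0, -1] = -1 / (2 * dx)
D1[-1, 0] = 1 / (2 * dx)
D2 = D1 @ D1

rho = 1.0 + 0.5 * np.cos(x)
j = np.zeros(N)
c = 1.0 / (eps**2 + dt)
M = np.eye(N) - dt**2 * c * D2        # implicit-in-diffusion operator
for _ in range(n_steps):
    # j^{n+1} = c*(eps^2 j^n - dt*D1 rho^{n+1}) eliminated from the rho update
    rho = np.linalg.solve(M, rho - dt * c * eps**2 * (D1 @ j))
    j = c * (eps**2 * j - dt * (D1 @ rho))

# for eps -> 0 this is backward Euler for the heat equation, whose exact
# solution here is 1 + 0.5*exp(-t)*cos(x); the error below is O(dt)
print(np.max(np.abs(rho - (1 + 0.5 * np.exp(-dt * n_steps) * np.cos(x)))))
```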
The numerical solution of the generalized eigenvalue problem for a singular matrix pencil is challenging due to the discontinuity of its eigenvalues. Classically, such problems are addressed by first extracting the regular part through the staircase form and then applying a standard solver, such as the QZ algorithm. Recently, several novel approaches have been proposed to transform the singular pencil into a regular pencil by relatively simple randomized modifications. In this work, we analyze three such methods by Hochstenbach, Mehl, and Plestenjak that modify, project, or augment the pencil using random matrices. All three methods rely on the normal rank and do not alter the finite eigenvalues of the original pencil. We show that the eigenvalue condition numbers of the transformed pencils are unlikely to be much larger than the $\delta$-weak eigenvalue condition numbers, introduced by Lotz and Noferini, of the original pencil. This not only indicates favorable numerical stability but also shows that these condition numbers are a reliable criterion for detecting finite eigenvalues. We also provide evidence that, from a numerical stability perspective, the use of complex instead of real random matrices is preferable even for real singular matrix pencils and real eigenvalues.
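The flavor of these randomized transformations can be seen on a tiny example (ours, with a simple rank-one completion of $A$ only; the three analyzed methods differ in their details): the pencil below is singular, since $\det(A-\lambda B)\equiv 0$, has normal rank $2$, and its only finite eigenvalue is $\lambda = 2$, which survives a generic random modification of rank $k = n - \mathrm{nrank} = 1$.

```python
import numpy as np
from scipy.linalg import eig

A = np.diag([2.0, 1.0, 0.0])
B = np.diag([1.0, 0.0, 0.0])

rng = np.random.default_rng(1)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # complex random
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # vectors (cf. the
tau = 1e-2                                                # real-vs-complex point)

# rank-one completion: det(A + tau*u*v^* - lambda*B) = tau*conj(v_3)*u_3*(2-lambda),
# so the perturbed pencil is regular and keeps the true eigenvalue lambda = 2
vals = eig(A + tau * np.outer(u, v.conj()), B, right=False)
true_eigs = vals[np.isfinite(vals) & (np.abs(vals) < 1e8)]
print(true_eigs)   # ~ [2.]; the remaining eigenvalues are (numerically) infinite
```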
Temporal point processes (TPPs) are a natural tool for modeling event-based data. Among all TPP models, Hawkes processes have proven the most widely used, mainly because they adequately model many applications, particularly when considering exponential or non-parametric kernels. Although non-parametric kernels are an option, such models require large datasets. While exponential kernels are more data efficient and relevant for specific applications where events immediately trigger more events, they are ill-suited for applications where latencies need to be estimated, such as in neuroscience. This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support. The developed solution consists of a fast $\ell_2$ gradient-based solver leveraging a discretized version of the events. After theoretically supporting the use of discretization, the statistical and computational efficiency of the novel approach is demonstrated through various numerical experiments. Finally, the method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Given the use of general parametric kernels, results show that the proposed approach yields better estimates of pattern latency than the state of the art.
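A simplified sketch of the discretized $\ell_2$ approach (ours; the paper's solver handles general finite-support parametric kernels and multivariate processes, and the truncated-Gaussian kernel and all names below are our choices): the $\ell_2$ loss $\int_0^T \lambda(t)^2\,dt - 2\sum_i \lambda(t_i)$ is evaluated on a grid, where the Hawkes intensity becomes a discrete convolution of binned event counts with the kernel, and a generic optimizer stands in for the dedicated gradient-based solver.

```python
import numpy as np
from scipy.optimize import minimize

T, dt, W = 1000.0, 0.1, 5.0                  # horizon, grid step, kernel support
grid = np.arange(1, int(W / dt)) * dt        # kernel support grid (lags > 0)

def kernel(theta):
    mu, alpha, m, sigma = theta
    phi = np.exp(-0.5 * ((grid - m) / sigma) ** 2)
    return alpha * phi / (phi.sum() * dt)    # total mass alpha on [0, W]

def loss(theta, counts):
    # discretized intensity: lambda[t] = mu + (kernel * counts)[t]
    lam = theta[0] + dt * np.convolve(counts, kernel(theta))[: len(counts)]
    return dt * np.sum(lam**2) - 2.0 * np.sum(lam * counts)

# synthetic Hawkes events via the branching (cluster) representation
rng = np.random.default_rng(0)
mu0, alpha0, m0, s0 = 0.5, 0.6, 1.0, 0.3
events, queue = [], list(rng.uniform(0, T, rng.poisson(mu0 * T)))
while queue:
    t = queue.pop()
    events.append(t)
    for d in rng.normal(m0, s0, rng.poisson(alpha0)):
        if 0 < d < W and t + d < T:
            queue.append(t + d)
counts = np.histogram(events, bins=int(T / dt), range=(0, T))[0]

res = minimize(loss, x0=[1.0, 0.3, 2.0, 1.0], args=(counts,),
               bounds=[(1e-3, None), (1e-3, 0.99), (dt, W), (dt, 2.0)],
               method="L-BFGS-B")
print(res.x)   # estimated (mu, alpha, m, sigma); compare with (0.5, 0.6, 1.0, 0.3)
```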
The design of algorithms for political redistricting generally takes one of two approaches: optimize an objective such as compactness or, drawing on fair division, construct a protocol whose outcomes guarantee partisan fairness. We aim to have the best of both worlds by optimizing an objective subject to a binary fairness constraint. As the fairness constraint we adopt the geometric target, which requires the number of seats won by each party to be at least the average (rounded down) of its outcomes under the worst and best partitions of the state. To study the feasibility of this approach, we introduce a new model of redistricting that closely mirrors the classic model of cake-cutting. This model has two innovative features. First, in any part of the state there is an underlying ``density'' of voters with political leanings toward any given party, making it impossible to finely separate voters for different parties into different districts. This captures a realistic constraint that previous theoretical models of redistricting tend to ignore. Second, parties may disagree on the distribution of voters, whether out of genuine disagreement or attempted strategic behavior. In the absence of a ``ground truth'' distribution, a redistricting algorithm must therefore aim to be simultaneously fair to each party with respect to its own reported data. Our main theoretical result is that, surprisingly, the geometric target is always feasible even with respect to arbitrarily divergent reports of how voters are distributed. Any standard for fairness is only useful if it can be readily satisfied in practice. Our empirical results, which use real election data and maps of six US states, demonstrate that the geometric target is always feasible and that imposing it as a fairness constraint comes at almost no cost to three well-studied optimization objectives.
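The geometric target itself is easy to compute in a toy one-dimensional model (ours; the paper's model features continuous densities and disagreeing reports): with precincts of equal population on a line and districts as contiguous runs of near-equal size, dynamic programming gives the minimum and maximum number of districts a party can win over all valid partitions, and the target is the rounded-down average of the two.

```python
# Toy 1-D geometric target: a district is "won" if the mean party-A vote
# share over its precincts exceeds 1/2 (equal precinct populations assumed).

def seats_range(share, k, lo, hi):
    """(min, max) districts won over partitions of the precinct list `share`
    into k contiguous districts of lo..hi precincts each."""
    n = len(share)
    NEG, POS = float("-inf"), float("inf")
    best = [[NEG] * (k + 1) for _ in range(n + 1)]
    worst = [[POS] * (k + 1) for _ in range(n + 1)]
    best[0][0] = worst[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            for size in range(lo, hi + 1):
                if size > i:
                    break
                won = sum(share[i - size:i]) / size > 0.5
                if best[i - size][j - 1] != NEG:
                    best[i][j] = max(best[i][j], best[i - size][j - 1] + won)
                if worst[i - size][j - 1] != POS:
                    worst[i][j] = min(worst[i][j], worst[i - size][j - 1] + won)
    return worst[n][k], best[n][k]

# 12 precincts with party-A vote shares, 4 districts of 2-4 precincts each
shares = [0.7, 0.6, 0.4, 0.3, 0.55, 0.65, 0.45, 0.35, 0.5, 0.8, 0.2, 0.4]
s_min, s_max = seats_range(shares, k=4, lo=2, hi=4)
print(s_min, s_max, "geometric target:", (s_min + s_max) // 2)
```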
The study of robustness has received much attention due to its inevitability in data-driven settings where many systems face uncertainty. One such example of concern is Bayesian Optimization (BO), where uncertainty is multi-faceted, yet only a limited number of works are dedicated to this direction. In particular, the work of Kirschner et al. (2020) bridges the existing literature on Distributionally Robust Optimization (DRO) by casting the BO problem through the lens of DRO. While pioneering, this work admittedly suffers from various practical shortcomings, such as the assumption of finite contexts, leaving open the main question: can one devise a computationally tractable algorithm for solving this DRO-BO problem? In this work, we tackle this question with a large degree of generality by considering robustness against data shift in $\phi$-divergences, which subsume many popular choices, such as the $\chi^2$-divergence, Total Variation, and the standard Kullback-Leibler (KL) divergence. We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds. We then show experimentally that our method surpasses existing methods, attesting to the theoretical results.
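As an illustration of the kind of finite-dimensional reduction involved (a standard duality result for the KL ball; our toy code, not the paper's algorithm): the worst-case expected reward over all context distributions $Q$ with $\mathrm{KL}(Q\|P)\leq\varepsilon$ equals $-\min_{\lambda>0}\,\lambda\varepsilon + \lambda\log \mathbb{E}_{c\sim P}[e^{-f(x,c)/\lambda}]$, a one-dimensional convex problem even when the context space is continuous and $P$ is only accessible through samples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_value(f_vals, eps):
    """Worst-case mean of f over the KL ball of radius eps around the
    empirical distribution of the samples f_vals."""
    def dual(lam):
        z = -f_vals / lam
        # log-mean-exp, shifted for numerical stability
        return lam * eps + lam * (np.log(np.mean(np.exp(z - z.max()))) + z.max())
    res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun

rng = np.random.default_rng(0)
f_samples = rng.normal(1.0, 0.5, size=1000)   # f(x, c) over sampled contexts
for eps in [0.0, 0.1, 1.0]:
    print(eps, kl_dro_value(f_samples, eps))  # robust value decreases as eps grows
```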
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. Through detailed numerical experiments, we investigate the properties of weights trained by stochastic gradient descent and how they scale with network depth. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
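The scaling assumption behind the neural ODE link can be illustrated numerically (our sketch, not the paper's experiments): if the weights sample a smooth function of depth, $W_k = W(k/L)$, a residual step with $1/L$ scaling is exactly an explicit Euler step for $x'(t) = \tanh(W(t)x)$, so the network output converges as depth grows. Trained weights need not follow this scaling, which is what the paper probes.

```python
import numpy as np

def resnet_forward(x0, L, d=4, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, d))
    x = x0.copy()
    for k in range(L):
        W = np.cos(2 * np.pi * k / L) * G   # smooth weight schedule W(k/L)
        x = x + (1.0 / L) * np.tanh(W @ x)  # residual block = Euler step
    return x

x0 = np.ones(4)
for L in [8, 64, 512, 4096]:
    print(L, resnet_forward(x0, L))         # outputs converge as L grows
```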
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamics of the optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that mitigating both types of variance is necessary to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
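The forward-stage component can be illustrated in isolation (our toy sketch, not the proposed strategy): in one GNN layer, mean aggregation over a sampled subset of neighbors is an unbiased but noisy estimate of the full aggregation, and this embedding approximation variance shrinks with the sampling budget.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(1000, 16))          # embeddings of one node's neighbors
full = H.mean(axis=0)                    # exact (full-neighborhood) aggregation

for k in [5, 20, 100]:                   # neighbor sampling budgets
    errs = [np.linalg.norm(H[rng.choice(1000, k, replace=False)].mean(axis=0) - full)
            for _ in range(500)]
    print(k, np.mean(np.square(errs)))   # mean squared error decays roughly as 1/k
```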