We study the meta-learning of numerical algorithms for scientific computing, which combines the mathematically driven, handcrafted design of general algorithm structure with a data-driven adaptation to specific classes of tasks. This represents a departure from the classical approaches in numerical analysis, which typically do not feature such learning-based adaptations. As a case study, we develop a machine learning approach that automatically learns effective solvers for initial value problems in the form of ordinary differential equations (ODEs), based on the Runge-Kutta (RK) integrator architecture. By combining neural network approximations and meta-learning, we show that we can obtain high-order integrators for targeted families of differential equations without the need for computing integrator coefficients by hand. Moreover, we demonstrate that in certain cases we can obtain superior performance to classical RK methods. This can be attributed to certain properties of the ODE families being identified and exploited by the approach. Overall, this work demonstrates an effective, learning-based approach to the design of algorithms for the numerical solution of differential equations, an approach that can be readily extended to other numerical tasks.
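To make the idea concrete, here is a minimal sketch, in the spirit of the abstract, of meta-learning a Runge-Kutta integrator for a targeted family of ODEs. It learns the coefficients of an explicit two-stage RK method on the family $y' = a y$ by minimizing one-step error against the exact solution; the training setup, the test family, and the choice of trainable coefficients are illustrative assumptions, not the paper's actual procedure (which uses neural network approximations).

```python
import jax
import jax.numpy as jnp

def rk_step(theta, f, y, h):
    # Explicit 2-stage RK step with trainable Butcher coefficients (b1, b2, a21).
    b1, b2, a21 = theta
    k1 = f(y)
    k2 = f(y + h * a21 * k1)
    return y + h * (b1 * k1 + b2 * k2)

def loss(theta, rates, h=0.1):
    # One-step error against the exact solution y(h) = exp(a*h) with y(0) = 1.
    def err(a):
        y1 = rk_step(theta, lambda y: a * y, 1.0, h)
        return (y1 - jnp.exp(a * h)) ** 2
    return jnp.mean(jax.vmap(err)(rates))

rates = jnp.linspace(-2.0, 2.0, 64)      # the "task family": decay/growth rates
theta = jnp.array([0.5, 0.5, 1.0])       # initialize at Heun's method
grad = jax.jit(jax.grad(loss))
for _ in range(500):
    theta = theta - 0.1 * grad(theta, rates)   # plain gradient descent
print(theta)
```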
There has been a surge of interest in developing robust estimators for models with heavy-tailed data in statistics and machine learning. This paper proposes a log-truncated M-estimator for a large family of statistical regressions and establishes its excess risk bound under the condition that the data have a $(1+\varepsilon)$-th moment with $\varepsilon \in (0,1]$. With an additional assumption on the associated risk function, we obtain an $\ell_2$-error bound for the estimation. Our theorems are applied to establish robust M-estimators for concrete regressions. Besides convex regressions such as quantile regression and generalized linear models, many non-convex regressions also fit into our theorems; among these, we focus on robust deep neural network regression, which can be solved by stochastic gradient descent algorithms. Simulations and real data analysis demonstrate the superiority of log-truncated estimation over standard estimation.
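As a hedged illustration of the log-truncation idea, the sketch below fits a linear regression with heavy-tailed noise by minimizing a Catoni-type log-truncated empirical risk. The influence function $\varphi(x)=\mathrm{sign}(x)\log(1+|x|+x^2/2)$, the scale parameter $\tau$, and the squared per-sample loss are illustrative choices, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.optimize import minimize

def phi(x):
    # Catoni-type log-truncation: grows logarithmically, so outliers have bounded influence.
    return np.sign(x) * np.log(1.0 + np.abs(x) + 0.5 * x**2)

def truncated_risk(beta, X, y, tau=0.5):
    losses = 0.5 * (y - X @ beta) ** 2        # per-sample squared loss
    return np.mean(phi(tau * losses)) / tau   # log-truncated empirical risk

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_true = np.ones(p)
y = X @ beta_true + rng.standard_t(df=1.5, size=n)   # heavy-tailed noise

res = minimize(truncated_risk, np.zeros(p), args=(X, y))
print(res.x)   # should be close to beta_true despite the heavy tails
```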
Industry is moving towards large-scale systems where processor cores, memories, accelerators, etc.\ are bundled via 2.5D integration. These various components are fabricated separately as chiplets and then integrated using an interconnect carrier, a so-called interposer. This new design style provides benefits in terms of yield as well as economies of scale, as chiplets may come from various third-party vendors and be integrated into one sophisticated system. The benefits of this approach, however, come at the cost of new challenges for the system's security and integrity when many third-party component chiplets, some from not fully trusted vendors, are integrated. Here, we explore both these challenges and the promises of modern interposer-based systems of cache-coherent, multi-core chiplets. First, we introduce a new, coherence-based attack, GETXspy, wherein a single compromised chiplet can expose a high-bandwidth side/covert channel in an ostensibly secure system. We further show that prior art is insufficient to stop this new attack. Second, we propose using an active interposer as a generic, secure-by-construction platform that forms a physical root of trust for modern 2.5D systems. Our scheme has limited overhead, restricted to the active interposer, allowing the chiplets and the coherence system to remain untouched. We show that our scheme prevents a wide range of attacks, including but not limited to our GETXspy attack, with little overhead on system performance, $\sim$4\%. This overhead decreases as workloads increase, ensuring scalability of the scheme.
This paper is concerned with efficient spectral solutions for weakly singular nonlocal diffusion equations with Dirichlet-type volume constraints. This type of equation contains an integral operator that typically has a singularity at the midpoint of the integral domain, and the approximation of such an integral operator is one of the essential difficulties in solving nonlocal equations. To overcome this problem, two-sided Jacobi spectral quadrature rules are proposed to develop a Jacobi spectral collocation method for the nonlocal diffusion equations. A rigorous convergence analysis of the proposed method is presented in $L^\infty$ norms, and we further prove that the Jacobi collocation solution converges to its corresponding local limit as nonlocal interactions vanish. Numerical examples are given to verify the theoretical results.
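The quadrature idea can be illustrated with a toy version of a "two-sided" Gauss-Jacobi rule for an integral whose weak singularity sits at the midpoint of the domain: the interval is split at the singular point and each half is mapped to $[-1,1]$ so that the singular factor becomes the Jacobi weight. The model kernel $|s|^{-\gamma}$, the node count, and the interval are assumptions for illustration, not the paper's collocation scheme.

```python
import numpy as np
from scipy.special import roots_jacobi

def two_sided_jacobi_quad(g, delta, gamma, n=16):
    # Approximates I(g) = int_{-delta}^{delta} |s|^{-gamma} g(s) ds, 0 < gamma < 1.
    scale = (delta / 2.0) ** (1.0 - gamma)
    # Right half [0, delta]: weight (1+t)^(-gamma)  ->  Jacobi (alpha, beta) = (0, -gamma).
    t_r, w_r = roots_jacobi(n, 0.0, -gamma)
    right = scale * np.sum(w_r * g(delta * (1.0 + t_r) / 2.0))
    # Left half [-delta, 0]: weight (1-t)^(-gamma)  ->  Jacobi (alpha, beta) = (-gamma, 0).
    t_l, w_l = roots_jacobi(n, -gamma, 0.0)
    left = scale * np.sum(w_l * g(delta * (t_l - 1.0) / 2.0))
    return left + right

# Sanity check: g = 1, gamma = 0.5, delta = 1 gives exactly 4.
print(two_sided_jacobi_quad(lambda s: np.ones_like(s), 1.0, 0.5))
```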
In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $\epsilon$-approximate solutions to the ERM problem. As with standard optimization algorithms, the convergence analysis of FedAve relies only on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in the parameter, $\eta$-H\"older smoothness in the data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.
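A toy sketch of the mechanism, under strong simplifying assumptions: for federated linear least squares the gradient $(\theta^\top x - y)x$ is exactly low rank in (parameter, data), so querying every client once at a handful of "pivot" parameter values is enough for the server to reproduce the averaged gradient at any $\theta$ and then run gradient descent locally. The pivot choice, step size, and quadratic loss are illustrative assumptions, not the FedLRGD algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 10, 200, 4                      # clients, samples per client, dimension
Xs = [rng.normal(size=(n, p)) for _ in range(m)]
theta_true = rng.normal(size=p)
ys = [X @ theta_true + 0.1 * rng.normal(size=n) for X in Xs]

def client_grad(c, theta):
    X, y = Xs[c], ys[c]
    return X.T @ (X @ theta - y) / len(y)

# One round of communication: every client reports its averaged gradient
# at the pivots theta = 0, e_1, ..., e_p.
pivots = [np.zeros(p)] + [np.eye(p)[j] for j in range(p)]
pivot_grads = [np.mean([client_grad(c, t) for c in range(m)], axis=0) for t in pivots]

def approx_grad(theta):
    # For a quadratic loss, g(theta) = g(0) + sum_j theta_j (g(e_j) - g(0)).
    g0 = pivot_grads[0]
    return g0 + sum(theta[j] * (pivot_grads[j + 1] - g0) for j in range(p))

theta = np.zeros(p)
for _ in range(200):                      # server-side (inexact) gradient descent
    theta -= 0.1 * approx_grad(theta)
print(np.linalg.norm(theta - theta_true))
```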
In this paper, a nonlinear system of fractional ordinary differential equations with multiple scales in time is investigated. We are interested in the effective long-term computation of the solution, and the main challenge is how to obtain the solution of the coupled problem at a lower computational cost. We analyze a multiscale method for the nonlinear system in which the fast system has a periodic applied force and the slow equation contains fractional derivatives, as a simplified model of atherosclerosis with plaque growth. A local periodic equation is derived to approximate the original system, and error estimates are given. A finite difference method is then designed to approximate both the original and the approximate problems. We construct four examples, including three with exact solutions and one following the original problem setting, to test the accuracy and computational efficiency of the proposed method. We observe that the computational time is greatly reduced and that the multiscale method performs very well in comparison with fully resolved simulation, even in the case of small time scale separation. The larger the time scale separation, the more effective the multiscale method is.
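To give a feel for the time discretization involved, here is a minimal sketch of the standard uniform-step L1 scheme for the Caputo derivative applied to the scalar test problem $D^\alpha y = -\lambda y$, $y(0)=1$. The multiscale splitting, the periodic fast forcing, and the nonlinear coupling described in the abstract are not reproduced; the test problem and parameters are illustrative assumptions.

```python
import numpy as np
from math import gamma

def l1_solve(alpha, lam, T=1.0, N=200):
    # Implicit L1 scheme for D^alpha y = -lam * y, y(0) = 1, on a uniform grid.
    tau = T / N
    mu = tau ** (-alpha) / gamma(2.0 - alpha)
    # L1 kernel weights a_j = (j+1)^(1-alpha) - j^(1-alpha).
    a = np.arange(1, N + 1) ** (1.0 - alpha) - np.arange(0, N) ** (1.0 - alpha)
    y = np.empty(N + 1)
    y[0] = 1.0
    for n in range(1, N + 1):
        # History sum over past increments, excluding the newest one.
        hist = sum(a[n - 1 - k] * (y[k + 1] - y[k]) for k in range(n - 1))
        y[n] = (mu * a[0] * y[n - 1] - mu * hist) / (mu * a[0] + lam)
    return y

# As alpha -> 1 the solution approaches exp(-lam * t).
print(l1_solve(0.99, 1.0)[-1], np.exp(-1.0))
```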
The positive definiteness of discrete time-fractional derivatives is fundamental to the numerical stability (in the energy sense) of time-fractional phase-field models. A novel technique is proposed to estimate the minimum eigenvalue of the discrete convolution kernels generated by the nonuniform L1, half-grid based L1, and time-averaged L1 formulas of the Caputo fractional derivative. The main discrete tools are the discrete orthogonal convolution kernels and the discrete complementary convolution kernels. Certain variational energy dissipation laws at discrete levels of the variable-step L1-type methods are then established for the time-fractional Cahn-Hilliard model. They are shown to be asymptotically compatible, in the fractional-order limit $\alpha\rightarrow 1$, with the associated energy dissipation law for the classical Cahn-Hilliard equation. Numerical examples together with an adaptive time-stepping procedure are provided to demonstrate the effectiveness of the proposed methods.
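The quantity at stake can be computed directly in a small example: assemble the lower-triangular kernel matrix of the nonuniform L1 formula, acting on the increments $v^k - v^{k-1}$, and inspect the minimum eigenvalue of its symmetric part, which governs positive definiteness of the associated quadratic form. The random mesh and the brute-force eigenvalue computation below are illustrative assumptions; the paper's contribution is an analytical estimate, not this numerical check.

```python
import numpy as np
from math import gamma

def l1_kernel_matrix(t, alpha):
    # Nonuniform L1 kernels A[n,k] = ((t_n - t_{k-1})^{1-a} - (t_n - t_k)^{1-a})
    #                                / (Gamma(2-a) * tau_k),  k <= n.
    N = len(t) - 1
    A = np.zeros((N, N))
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            tau_k = t[k] - t[k - 1]
            A[n - 1, k - 1] = ((t[n] - t[k - 1]) ** (1 - alpha)
                               - (t[n] - t[k]) ** (1 - alpha)) / (gamma(2 - alpha) * tau_k)
    return A

alpha, N = 0.4, 50
interior = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, N - 1))
t = np.concatenate(([0.0], interior, [1.0]))      # an arbitrary nonuniform mesh
A = l1_kernel_matrix(t, alpha)
sym = 0.5 * (A + A.T)
print("minimum eigenvalue of symmetric part:", np.linalg.eigvalsh(sym).min())
```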
We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors and thus do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family of architectures that resembles a standard neural network, but in which the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue of deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.
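The architectural idea can be sketched as follows: the input is a linear operator $A$ (here, a matrix), and each layer lets $A$ act multiplicatively on the hidden state rather than flattening $A$ into a feature vector. The exact layer form, widths, nonlinearity, and the absence of the paper's structured/sparse weights are assumptions for illustration.

```python
import torch
import torch.nn as nn

class OperatorRecurrentLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim)   # acts on the hidden state h directly
        self.V = nn.Linear(dim, dim)   # acts on A @ h, the multiplicative branch

    def forward(self, A, h):
        return torch.relu(self.W(h) + self.V(A @ h))

class OperatorRecurrentNet(nn.Module):
    def __init__(self, dim, depth):
        super().__init__()
        self.layers = nn.ModuleList([OperatorRecurrentLayer(dim) for _ in range(depth)])
        self.readout = nn.Linear(dim, 1)

    def forward(self, A, h0):
        h = h0
        for layer in self.layers:
            h = layer(A, h)            # the operator re-enters at every layer
        return self.readout(h)

dim = 8
net = OperatorRecurrentNet(dim, depth=4)
A = torch.randn(dim, dim)              # the "data": a linear operator
h0 = torch.randn(dim)
print(net(A, h0))
```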
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
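A minimal sketch of a continuous-depth block: the hidden state evolves by $dh/dt = f(h, t)$ with $f$ given by a small network, and the block's output is $h(1)$. For brevity this backpropagates directly through a fixed-step RK4 solver rather than using the adjoint method or the black-box adaptive solver described in the abstract; the network shape and step count are assumptions.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        t_col = torch.full_like(h[..., :1], t)            # append time as a feature
        return self.net(torch.cat([h, t_col], dim=-1))

def odeint_rk4(func, h0, t0=0.0, t1=1.0, steps=10):
    # Fixed-step classical RK4; autograd flows through every step.
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = func(t, h)
        k2 = func(t + dt / 2, h + dt / 2 * k1)
        k3 = func(t + dt / 2, h + dt / 2 * k2)
        k4 = func(t + dt, h + dt * k3)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
    return h

func = ODEFunc(dim=4)
h0 = torch.randn(16, 4, requires_grad=True)   # a batch of inputs
h1 = odeint_rk4(func, h0)                     # "depth" is continuous time
h1.sum().backward()                           # gradients flow through the solver
print(h0.grad.shape)
```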
In this paper, we examine the use case of generative adversarial networks (GANs) in the field of marketing. In particular, we analyze how GAN models can replicate text patterns from successful product listings on Airbnb, a peer-to-peer online market for short-term apartment rentals. To do so, we define the Diehl-Martinez-Kamalu (DMK) loss function as a new class of functions that forces the model's generated output to include a set of user-defined keywords. This allows the generative adversarial network to recommend a way of rewording the phrasing of a listing description to increase the likelihood that it is booked. Although we tailor our analysis to Airbnb data, we believe this framework establishes a more general model for how generative algorithms can be used to produce text samples for the purposes of marketing.
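The exact form of the DMK loss is specific to the paper; below is one plausible, hedged realization of the stated requirement: a differentiable penalty that is small only if, for every user-defined keyword, some position in the generated sequence assigns that keyword high probability. It would be added to the usual GAN generator loss. The function name, keyword ids, and penalty form are assumptions.

```python
import torch

def keyword_inclusion_penalty(token_probs, keyword_ids):
    # token_probs: (seq_len, vocab) softmax outputs of the generator.
    # keyword_ids: vocabulary indices of keywords that must appear.
    penalty = 0.0
    for kid in keyword_ids:
        best = token_probs[:, kid].max()         # best chance of emitting this keyword
        penalty = penalty - torch.log(best + 1e-8)
    return penalty

seq_len, vocab = 12, 1000
token_probs = torch.softmax(torch.randn(seq_len, vocab), dim=-1)
keywords = [42, 7, 311]                          # e.g. hypothetical ids of desired words
loss = keyword_inclusion_penalty(token_probs, keywords)   # + standard generator loss
print(loss)
```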
Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches the performance of meta-learning methods that use hand-specified task distributions.
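To make the recipe concrete at the smallest possible scale, the sketch below (a toy, not the paper's method) proposes tasks without supervision by drawing random reward functions over a bandit-style environment, standing in for the skill-discovery exploration step, and then meta-trains a policy initialization with a first-order MAML-style inner/outer loop so that a single gradient step adapts it to any proposed task. The environment, reward proposals, and learning rates are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 5
inner_lr, outer_lr = 1.0, 0.1

def policy(theta):                       # softmax policy over actions
    e = np.exp(theta - theta.max())
    return e / e.sum()

def grad_expected_reward(theta, r):      # exact policy gradient of E[r] for the bandit
    p = policy(theta)
    return p * (r - p @ r)

theta = np.zeros(n_actions)
for _ in range(300):                     # meta-training on self-proposed tasks
    r = rng.normal(size=n_actions)       # unsupervised task proposal (random reward)
    adapted = theta + inner_lr * grad_expected_reward(theta, r)   # MAML inner step
    # First-order outer update: improve post-adaptation performance.
    theta += outer_lr * grad_expected_reward(adapted, r)
print("meta-learned initialization:", np.round(policy(theta), 3))
```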