We present an algorithmic framework for quantum-inspired classical algorithms on close-to-low-rank matrices, generalizing the series of results started by Tang's breakthrough quantum-inspired algorithm for recommendation systems [STOC'19]. Motivated by quantum linear algebra algorithms and the quantum singular value transformation (SVT) framework of Gily\'en, Su, Low, and Wiebe [STOC'19], we develop classical algorithms for SVT that run in time independent of input dimension, under suitable quantum-inspired sampling assumptions. Our results give compelling evidence that in the corresponding QRAM data structure input model, quantum SVT does not yield exponential quantum speedups. Since the quantum SVT framework generalizes essentially all known techniques for quantum linear algebra, our results, combined with sampling lemmas from previous work, suffice to generalize all recent results about dequantizing quantum machine learning algorithms. In particular, our classical SVT framework recovers and often improves the dequantization results on recommendation systems, principal component analysis, supervised clustering, support vector machines, low-rank regression, and semidefinite program solving. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis. Our improvements come from identifying the key feature of the quantum-inspired input model that is at the core of all prior quantum-inspired results: $\ell^2$-norm sampling can approximate matrix products in time independent of their dimension. We reduce all our main results to this fact, making our exposition concise, self-contained, and intuitive.
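As a toy illustration of this core primitive, the following Python sketch (ours, not code from the paper) approximates a matrix product via $\ell^2$-norm (length-squared) importance sampling of column-row outer products, the standard unbiased estimator whose expected squared Frobenius error decays like $\|A\|_F^2\|B\|_F^2/s$ with $s$ samples. Note that computing the sampling distribution explicitly costs time linear in the matrix size; the dimension-independent running times in the paper rely on the sampling data structures of the QRAM-style input model, which this demo does not implement.

```python
import numpy as np

def approx_matmul(A, B, s, seed=0):
    """Unbiased estimator of A @ B from s sampled outer products.

    Columns of A are drawn with probability proportional to their squared
    l2 norm (length-squared sampling); each sampled term A[:, k] B[k, :]
    is reweighted by 1 / (s * p_k) so that the estimate is unbiased.
    Computing `probs` explicitly below is only a correctness check of the
    estimator, not the paper's dimension-independent procedure.
    """
    rng = np.random.default_rng(seed)
    col_norms = np.einsum("ij,ij->j", A, A)      # ||A[:, k]||^2 for each k
    probs = col_norms / col_norms.sum()          # l2-norm sampling distribution
    idx = rng.choice(A.shape[1], size=s, p=probs)
    return sum(np.outer(A[:, k], B[k, :]) / (s * probs[k]) for k in idx)

A = np.random.randn(200, 500)
B = np.random.randn(500, 100)
est = approx_matmul(A, B, s=2000)
print(np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B))  # shrinks like 1/sqrt(s)
```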
We derive an intuitionistic version of G\"odel-L\"ob modal logic ($\sf{GL}$) in the style of Simpson, via proof-theoretic techniques. We recover a labelled system, $\sf{\ell IGL}$, by restricting a non-wellfounded labelled system for $\sf{GL}$ to have only one formula on the right. The latter is obtained using techniques from cyclic proof theory, sidestepping the barrier that $\sf{GL}$'s usual frame condition (converse well-foundedness) is not first-order definable. While existing intuitionistic versions of $\sf{GL}$ are typically defined over only the box (and not the diamond), our presentation includes both modalities. Our main result is that $\sf{\ell IGL}$ is sound and complete with respect to a corresponding condition in birelational semantics: the composition of the modal relation and the intuitionistic relation is conversely well-founded. We call the resulting logic $\sf{IGL}$. While the soundness direction is proved using standard ideas, the completeness direction is more complex and necessitates a detour through several intermediate characterisations of $\sf{IGL}$.
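For concreteness, the frame condition can be written out as follows (a sketch in our notation, with the composition order read off the sentence above; the paper's precise definition is authoritative). A birelational frame $(W, \le, R)$, with $\le$ the intuitionistic preorder and $R$ the modal relation, satisfies the $\sf{IGL}$ condition when the composite relation $S$ is conversely well-founded, i.e. admits no infinite ascending chain:
\[
u \mathrel{S} v \iff \exists x\,(u \mathrel{R} x \wedge x \le v), \qquad \nexists\,(w_n)_{n\in\mathbb{N}} \text{ with } w_n \mathrel{S} w_{n+1} \text{ for all } n.
\]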
Hawkes processes are often applied to model dependence and interaction phenomena in multivariate event data sets, such as neuronal spike trains, social interactions, and financial transactions. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is generally a computationally expensive task, all the more so with Bayesian estimation methods. In particular, for generalised nonlinear Hawkes processes, Markov chain Monte Carlo methods used to compute the doubly intractable posterior distribution are not scalable to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we first unify existing variational Bayes approaches under a general nonparametric inference framework and analyse the asymptotic properties of these methods under easily verifiable conditions on the prior, the variational class, and the nonlinear model. Second, we propose a novel sparsity-inducing procedure and derive an adaptive mean-field variational algorithm for the popular sigmoid Hawkes processes. Our algorithm is parallelisable and therefore computationally efficient in high-dimensional settings. Through an extensive set of numerical simulations, we also demonstrate that our procedure is able to adapt to the dimensionality of the parameter of the Hawkes process, and is partially robust to some types of model mis-specification.
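To fix ideas, a minimal Python sketch of the sigmoid link follows. The exponential interaction kernels, the single scale parameter lam_bar, and all variable names are illustrative simplifications of ours; the paper's setting is nonparametric.

```python
import numpy as np

def sigmoid_hawkes_intensity(t, events, nu, alpha, beta, lam_bar=1.0):
    """Conditional intensity of a d-dimensional sigmoid Hawkes process:

      lambda_i(t) = lam_bar * sigmoid( nu_i + sum_j sum_{t_k^j < t} h_ij(t - t_k^j) )

    Exponential kernels h_ij(u) = alpha[i, j] * exp(-beta * u) stand in for
    the paper's nonparametric kernels; negative alpha[i, j] gives inhibition,
    which the sigmoid link keeps well-defined.
    """
    lin = np.asarray(nu, dtype=float).copy()
    for j in range(len(nu)):
        past = events[j][events[j] < t]                        # past events of unit j
        lin += alpha[:, j] * np.exp(-beta * (t - past)).sum()  # excitation/inhibition
    return lam_bar / (1.0 + np.exp(-lin))

# Two units: unit 0 inhibited by unit 1, unit 1 excited by unit 0.
events = [np.array([0.2, 0.9]), np.array([0.5])]
alpha = np.array([[0.0, -1.0],
                  [2.0,  0.0]])
print(sigmoid_hawkes_intensity(1.0, events, nu=[0.0, -0.5], alpha=alpha, beta=1.0))
```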
Blind quantum computation (BQC) protocols enable quantum algorithms to be executed on third-party quantum agents while keeping the data and algorithm confidential. Previous proposals for measurement-based BQC require the preparation of a highly entangled cluster state. In this paper, we show that such a requirement is not necessary. Our protocol only requires pre-shared Bell pairs between delegated quantum agents, and it requires no classical or quantum information exchange between agents during the execution. Our proposal requires fewer quantum resources than previous proposals by eliminating the need for a universal cluster state.
This paper presents the error analysis of numerical methods on graded meshes for stochastic Volterra equations with weakly singular kernels. We first prove a novel regularity estimate for the exact solution by analyzing the associated convolution structure. This reveals that the exact solution exhibits an initial singularity, in the sense that its H\"older continuity exponent on any neighborhood of $t=0$ is lower than that on every compact subset of $(0,T]$. Motivated by the initial singularity, we then construct the Euler--Maruyama method, the fast Euler--Maruyama method, and the Milstein method based on graded meshes. By establishing their pointwise-in-time error estimates, we determine the grading exponents of the meshes that attain the optimal uniform-in-time convergence orders, which improve on those of the uniform-mesh case. Numerical experiments are finally reported to confirm the sharpness of the theoretical findings.
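The graded-mesh construction itself is simple; below is a minimal Python sketch pairing it with a left-point Euler--Maruyama discretization of a scalar equation with the weakly singular kernel $(t-s)^{\alpha-1}$, $0<\alpha<1$. The scalar setting, the left-point rule, and all names are our illustrative choices rather than the paper's exact schemes.

```python
import numpy as np

def graded_mesh(T, N, r):
    """Nodes t_k = T * (k/N)**r; r = 1 is uniform, r > 1 clusters nodes
    near t = 0 to resolve the initial singularity."""
    return T * (np.arange(N + 1) / N) ** r

def em_volterra(x0, b, sigma, alpha, T, N, r, seed=0):
    """Left-point Euler--Maruyama sketch for a scalar stochastic Volterra
    equation with weakly singular kernel (t - s)**(alpha - 1), 0 < alpha < 1:

        X_t = x0 + int_0^t (t-s)^(alpha-1) b(X_s) ds
                 + int_0^t (t-s)^(alpha-1) sigma(X_s) dW_s
    """
    rng = np.random.default_rng(seed)
    t = graded_mesh(T, N, r)
    dt = np.diff(t)
    dW = rng.normal(0.0, np.sqrt(dt))            # Brownian increments on the mesh
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(1, N + 1):
        K = (t[n] - t[:n]) ** (alpha - 1.0)      # kernel evaluated at past nodes
        X[n] = x0 + np.sum(K * (b(X[:n]) * dt[:n] + sigma(X[:n]) * dW[:n]))
    return t, X

t, X = em_volterra(1.0, b=lambda x: -x, sigma=lambda x: 0.5 * np.ones_like(x),
                   alpha=0.75, T=1.0, N=200, r=2.0)
```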
We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption, and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions than the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem needs to be solved in each iteration. This can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions, and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
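In the Euclidean special case, one iteration has the closed form $x \leftarrow x - \frac{f_i(x)}{\|\nabla f_i(x)\|^2}\nabla f_i(x)$. The Python sketch below (ours, for illustration) implements exactly this nonlinear Kaczmarz / stochastic Polyak update; the general Bregman version would replace the closed form with the one-dimensional convex subproblem mentioned above.

```python
import numpy as np

def nonlinear_kaczmarz(fs, grads, x0, iters=500, seed=0):
    """Euclidean special case: pick a random component equation f_i(x) = 0
    and project onto the solution space of its linearized (Newton) equation,

        x <- x - f_i(x) / ||grad f_i(x)||**2 * grad f_i(x),

    which is also SGD with the stochastic Polyak step size when f_i >= 0.
    General Bregman projections replace this closed form with a convex
    one-dimensional subproblem; only the Euclidean case is sketched here.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        i = rng.integers(len(fs))
        g = grads[i](x)
        x -= (fs[i](x) / (g @ g)) * g
    return x

# Solve x0^2 + x1^2 = 1 and x0 = x1; solutions are +-(1, 1)/sqrt(2).
fs    = [lambda x: x[0]**2 + x[1]**2 - 1.0, lambda x: x[0] - x[1]]
grads = [lambda x: np.array([2*x[0], 2*x[1]]), lambda x: np.array([1.0, -1.0])]
print(nonlinear_kaczmarz(fs, grads, x0=[2.0, 0.5]))
```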
Learning personalization has proven its effectiveness in enhancing learner performance. Therefore, modern digital learning platforms increasingly depend on recommendation systems to offer learners personalized suggestions of learning materials. Learners can utilize those recommendations to acquire certain skills for the labor market or for their formal education. Personalization can be based on several factors, such as personal preference, social connections, or learning context. In an educational environment, the learning context plays an important role in generating sound recommendations, which not only fulfill the preferences of the learner but also correspond to the pedagogical goals of the learning process. This is because a learning context describes the actual situation of the learner at the moment of requesting a learning recommendation. It provides information about the learner's current state of knowledge, goal orientation, motivation, needs, available time, and other factors that reflect their status and may influence how learning recommendations are perceived and utilized. Context-aware recommender systems have the potential to reflect the logic that a learning expert may follow in recommending materials to students with respect to their status and needs. In this paper, we review the state-of-the-art approaches for defining a user's learning context. We provide an overview of the available definitions, as well as the different factors that are considered when defining a context. Moreover, we investigate the links between those factors and their pedagogical foundations in learning theories. We aim to provide a comprehensive understanding of contextualized learning from both pedagogical and technical points of view. By combining those two viewpoints, we aim to bridge the gap between the two domains in terms of contextualizing learning recommendations.
We present a multigrid algorithm to efficiently solve the large saddle-point systems of equations that typically arise in PDE-constrained optimization under uncertainty. The algorithm is based on a collective smoother that at each iteration sweeps over the nodes of the computational mesh, and solves a reduced saddle-point system whose size depends on the number $N$ of samples used to discretize the probability space. We show that this reduced system can be solved with optimal $O(N)$ complexity. We test the multigrid method on three problems: a linear-quadratic problem, for which the multigrid method is used to solve the linear optimality system directly; a nonsmooth problem with box constraints and $L^1$-norm penalization on the control, in which the multigrid scheme is used within a semismooth Newton iteration; and a risk-averse problem with the smoothed CVaR risk measure, where the multigrid method is called within a preconditioned Newton iteration. In all cases, the multigrid algorithm exhibits very good performance and robustness with respect to all parameters of interest.
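Structurally, the collective smoother can be viewed as a block Gauss-Seidel sweep in which the block of a mesh node collects all unknowns (states and adjoints of the $N$ samples, plus the control) living at that node. The Python sketch below illustrates only this structure with a dense local solve; the matrix K, the node-major ordering, and the names are our assumptions, and the paper's $O(N)$ local solver is not reproduced here.

```python
import numpy as np

def collective_gauss_seidel(K, b, x, dofs_per_node, sweeps=1):
    """Collective (block Gauss-Seidel) smoother sketch: for each mesh node,
    gather all unknowns at that node -- states and adjoints of the N samples
    plus the control, e.g. dofs_per_node = 2 * N + 1 in the simplest layout --
    and solve the corresponding small saddle-point block exactly.  K is the
    full optimality-system matrix in node-major ordering; the dense local
    solve below only illustrates the structure, not the paper's O(N) solver.
    """
    n_nodes = len(b) // dofs_per_node
    for _ in range(sweeps):
        for p in range(n_nodes):
            sl = slice(p * dofs_per_node, (p + 1) * dofs_per_node)
            # Residual excluding this node's own block contribution:
            r = b[sl] - K[sl, :] @ x + K[sl, sl] @ x[sl]
            x[sl] = np.linalg.solve(K[sl, sl], r)   # local saddle-point solve
    return x

# Usage sketch: x = collective_gauss_seidel(K, b, np.zeros_like(b), 2 * N + 1)
```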
Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of applications, divided into classification, regression, ranking, sample generation, and energy-based modelling. Overall, we introduce 33 different loss functions and organise them into an intuitive taxonomy. Each loss function is given theoretical backing, and we describe where it is best used. This survey aims to provide a reference on the most essential loss functions for both beginner and advanced machine learning practitioners.
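As a flavour of the taxonomy's entries, here are two of its most common members, the canonical regression and classification losses, in a short Python sketch (our implementations, not code from the survey).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the canonical regression loss."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Categorical cross-entropy: the canonical classification loss.
    y_true is one-hot encoded; p_pred holds predicted class probabilities."""
    return -np.mean(np.sum(y_true * np.log(p_pred + eps), axis=1))

y_onehot = np.array([[1, 0, 0], [0, 1, 0]])
p_hat    = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.9])))   # 0.01
print(cross_entropy(y_onehot, p_hat))                    # ~0.29
```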
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
Deep learning is usually described as an experiment-driven field under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.