We describe a quantum algorithm based on an interior point method for solving a linear program with $n$ inequality constraints on $d$ variables. The algorithm explicitly returns a feasible solution that is $\epsilon$-close to optimal, and runs in time $\sqrt{n}\, \mathrm{poly}(d,\log(n),\log(1/\varepsilon))$ which is sublinear for tall linear programs (i.e., $n \gg d$). Our algorithm speeds up the Newton step in the state-of-the-art interior point method of Lee and Sidford [FOCS '14]. This requires us to efficiently approximate the Hessian and gradient of the barrier function, and these are our main contributions. To approximate the Hessian, we describe a quantum algorithm for the spectral approximation of $A^T A$ for a tall matrix $A \in \mathbb R^{n \times d}$. The algorithm uses leverage score sampling in combination with Grover search, and returns a $\delta$-approximation by making $O(\sqrt{nd}/\delta)$ row queries to $A$. This generalizes an earlier quantum speedup for graph sparsification by Apers and de Wolf [FOCS '20]. To approximate the gradient, we use a recent quantum algorithm for multivariate mean estimation by Cornelissen, Hamoudi and Jerbi [STOC '22]. While a naive implementation introduces a dependence on the condition number of the Hessian, we avoid this by pre-conditioning our random variable using our quantum algorithm for spectral approximation.
This paper presents the first application of the direct parametrisation method for invariant manifolds to a fully coupled multiphysics problem involving the nonlinear vibrations of deformable structures subjected to an electrostatic field. The formulation proposed is intended for model order reduction of electrostatically actuated resonating Micro-Electro-Mechanical Systems (MEMS). The continuous problem is first rewritten in a manner that can be directly handled by the parametrisation method, which relies upon automated asymptotic expansions. A new mixed fully Lagrangian formulation is thus proposed which contains only explicit polynomial nonlinearities, which is then discretised in the framework of finite element procedures. Validation is performed on the classical parallel plate configuration, where different formulations using either the general framework, or an approximation of the electrostatic field due to the geometric configuration selected, are compared. Reduced-order models along these formulations are also compared to full-order simulations operated with a time integration approach. Numerical results show a remarkable performance both in terms of accuracy and wealth of nonlinear effects that can be accounted for. In particular, the transition from hardening to softening behaviour of the primary resonance while increasing the constant voltage component of the electric actuation, is recovered. Secondary resonances leading to superharmonic and parametric resonances are also investigated with the reduced-order model.
Many interesting physical problems described by systems of hyperbolic conservation laws are stiff, and thus impose a very small time-step because of the restrictive CFL stability condition. In this case, one can exploit the superior stability properties of implicit time integration which allows to choose the time-step only from accuracy requirements, and thus avoid the use of small time-steps. We discuss an efficient framework to devise high order implicit schemes for stiff hyperbolic systems without tailoring it to a specific problem. The nonlinearity of high order schemes, due to space- and time-limiting procedures which control nonphysical oscillations, makes the implicit time integration difficult, e.g.~because the discrete system is nonlinear also on linear problems. This nonlinearity of the scheme is circumvented as proposed in (Puppo et al., Comm.~Appl.~Math.~\& Comput., 2023) for scalar conservation laws, where a first order implicit predictor is computed to freeze the nonlinear coefficients of the essentially non-oscillatory space reconstruction, and also to assist limiting in time. In addition, we propose a novel conservative flux-centered a-posteriori time-limiting procedure using numerical entropy indicators to detect troubled cells. The numerical tests involve classical and artificially devised stiff problems using the Euler's system of gas-dynamics.
Nowadays, machine learning algorithms continue to grow in complexity and require a substantial amount of computational resources and energy. For these reasons, there is a growing awareness of the development of new green algorithms and distributed AI can contribute to this. Federated learning (FL) is one of the most active research lines in machine learning, as it allows the training of collaborative models in a distributed way, an interesting option in many real-world environments, such as the Internet of Things, allowing the use of these models in edge computing devices. In this work, we present a FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round, unlike traditional FL methods that require multiple rounds for convergence. This allows obtaining an effective and efficient model that simplifies the management of the training process. Moreover, this method preserve data privacy by design, a crucial aspect in current data protection regulations. We conducted experiments with large datasets and a large number of federated clients. Despite being based on a network model without hidden layers, it maintains in all cases competitive accuracy results compared to more complex state-of-the-art machine learning models. Furthermore, we show that the method performs equally well in both identically and non-identically distributed scenarios. Finally, it is an environmentally friendly algorithm as it allows significant energy savings during the training process compared to its centralized counterpart.
This work studies nonparametric Bayesian estimation of the intensity function of an inhomogeneous Poisson point process in the important case where the intensity depends on covariates, based on the observation of a single realisation of the point pattern over a large area. It is shown how the presence of covariates allows to borrow information from far away locations in the observation window, enabling consistent inference in the growing domain asymptotics. In particular, optimal posterior contraction rates under both global and point-wise loss functions are derived. The rates in global loss are obtained under conditions on the prior distribution resembling those in the well established theory of Bayesian nonparametrics, here combined with concentration inequalities for functionals of stationary processes to control certain random covariate-dependent loss functions appearing in the analysis. The local rates are derived with an ad-hoc study that builds on recent advances in the theory of P\'olya tree priors, extended to the present multivariate setting with a novel construction that makes use of the random geometry induced by the covariates.
As a crossover frontier of physics and mechanics, quantum computing is showing its great potential in computational mechanics. However, quantum hardware noise remains a critical barrier to achieving accurate simulation results due to the limitation of the current hardware level. In this paper, we integrate error-mitigated quantum computing in data-driven computational mechanics, where the zero-noise extrapolation (ZNE) technique is employed to improve the accuracy of quantum computing. Numerical examples including multiscale simulation of a composite L-shaped beam are conducted with the quantum computer simulator Qpanda, and the results validate the effectiveness of the proposed method. We believe this work presents a promising step towards using the power of quantum computing in computational mechanics.
Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-upamplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNN in image classification.
High-order tensor methods for solving both convex and nonconvex optimization problems have generated significant research interest, leading to algorithms with optimal global rates of convergence and local rates that are faster than Newton's method. On each iteration, these methods require the unconstrained local minimization of a (potentially nonconvex) multivariate polynomial of degree higher than two, constructed using third-order (or higher) derivative information, and regularized by an appropriate power of regularization. Developing efficient techniques for solving such subproblems is an ongoing topic of research, and this paper addresses the case of the third-order tensor subproblem. We propose the CQR algorithmic framework, for minimizing a nonconvex Cubic multivariate polynomial with Quartic Regularisation, by minimizing a sequence of local quadratic models that incorporate simple cubic and quartic terms. The role of the cubic term is to crudely approximate local tensor information, while the quartic one controls model regularization and progress. We provide necessary and sufficient optimality conditions that fully characterise the global minimizers of these cubic-quartic models. We then turn these conditions into secular equations that can be solved using nonlinear eigenvalue techniques. We show, using our optimality characterisations, that a CQR algorithmic variant has the optimal-order evaluation complexity of $\mathcal{O}(\epsilon^{-3/2})$ when applied to minimizing our quartically-regularised cubic subproblem, which can be further improved in special cases. We propose practical CQR variants that use local tensor information to construct the local cubic-quartic models. We test these variants numerically and observe them to be competitive with ARC and other subproblem solvers on typical instances and even superior on ill-conditioned subproblems with special structure.
Partitioned neural network functions are used to approximate the solution of partial differential equations. The problem domain is partitioned into non-overlapping subdomains and the partitioned neural network functions are defined on the given non-overlapping subdomains. Each neural network function then approximates the solution in each subdomain. To obtain the convergent neural network solution, certain continuity conditions on the partitioned neural network functions across the subdomain interface need to be included in the loss function, that is used to train the parameters in the neural network functions. In our work, by introducing suitable interface values, the loss function is reformulated into a sum of localized loss functions and each localized loss function is used to train the corresponding local neural network parameters. In addition, to accelerate the neural network solution convergence, the localized loss function is enriched with an augmented Lagrangian term, where the interface condition and the boundary condition are enforced as constraints on the local solutions by using Lagrange multipliers. The local neural network parameters and Lagrange multipliers are then found by optimizing the localized loss function. To take the advantage of the localized loss function for the parallel computation, an iterative algorithm is also proposed. For the proposed algorithms, their training performance and convergence are numerically studied for various test examples.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.