The Landau-Lifshitz-Gilbert (LLG) equation is a widely used model for fast magnetization dynamics in ferromagnetic materials. Recently, the inertial LLG equation, which contains an inertial term, has been proposed to capture ultra-fast magnetization dynamics at the sub-picosecond timescale. Mathematically, this generalized model contains the first temporal derivative and a newly introduced second temporal derivative of the magnetization. Consequently, the mixed hyperbolic-parabolic character of the equation, together with its degeneracy, introduces extra difficulties into the numerical analysis. In this work, we propose an implicit finite difference scheme based on central differences in both time and space. A fixed point iteration method is applied to solve the implicit nonlinear system. With the help of a second order accurate constructed solution, we provide a convergence analysis for this numerical scheme in the $\ell^\infty (0, T; H_h^1)$ norm. It is shown that the proposed method is second order accurate in both time and space, with unconditional stability and a natural preservation of the magnetization length. In the hyperbolic regime, pronounced damped-wave behavior of the magnetization is observed at short timescales in numerical simulations.
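As a minimal illustration of the two ingredients named above, an implicit, length-preserving time discretization solved by fixed point iteration, here is a sketch for a toy precession equation $m' = -m \times h$ rather than the paper's full inertial LLG scheme; the constant field `h`, step size, and tolerances are illustrative assumptions:

```python
import numpy as np

def implicit_midpoint_step(m, h, dt, tol=1e-12, max_iter=100):
    """One implicit midpoint step for the toy precession ODE m' = -m x h.

    The implicit midpoint rule preserves |m| exactly at convergence, since
    (m_new - m) . (m_new + m) = 0; the nonlinear update is solved by a
    fixed point iteration, mirroring the strategy used for the full scheme.
    """
    m_new = m.copy()                      # initial guess: previous value
    for _ in range(max_iter):
        m_mid = 0.5 * (m + m_new)
        m_next = m - dt * np.cross(m_mid, h)
        if np.linalg.norm(m_next - m_new) < tol:
            return m_next
        m_new = m_next
    return m_new

h = np.array([0.0, 0.0, 1.0])             # assumed constant effective field
m = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    m = implicit_midpoint_step(m, h, dt=0.01)
print(np.linalg.norm(m))                  # stays 1 to machine precision
```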
Diffusion Models (DMs) have demonstrated state-of-the-art performance in content generation without requiring adversarial training. These models are trained using a two-step process. First, a forward (diffusion) process gradually adds noise to a datum (usually an image). Then, a backward (reverse diffusion) process gradually removes the noise to turn it into a sample of the target distribution being modelled. DMs are inspired by non-equilibrium thermodynamics and have inherently high computational complexity. Due to the frequent function evaluations and gradient calculations in high-dimensional spaces, these models incur considerable computational overhead during both training and inference. This not only precludes the democratization of diffusion-based modelling, but also hinders the adoption of diffusion models in real-life applications. Moreover, the efficiency of computational models is fast becoming a significant concern due to excessive energy consumption and environmental impact. These factors have led to multiple contributions in the literature that focus on devising computationally efficient DMs. In this review, we present the most recent advances in diffusion models for vision, specifically focusing on the important design aspects that affect their computational efficiency. In particular, we emphasize the recently proposed design choices that have led to more efficient DMs. Unlike other recent reviews, which discuss diffusion models from a broad perspective, this survey aims to push this research direction forward by highlighting the design strategies in the literature that are resulting in practicable models for the broader research community. We also provide a future outlook of diffusion models in vision from the viewpoint of their computational efficiency.
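A minimal sketch of the two-step process described above, in the standard DDPM formulation: the forward step samples $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and the reverse step inverts it given a noise prediction. The linear schedule, step count, and the oracle noise predictor standing in for a trained network are illustrative assumptions, not those of any particular model in the survey:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Forward process: q(x_t | x_0) = N(sqrt(ab_t) x0, (1 - ab_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps, eps

def reverse_step(xt, t, eps_pred, rng):
    """One DDPM reverse step given the model's noise prediction eps_pred."""
    coef = betas[t] / np.sqrt(1 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))              # stand-in "image"
xt, eps = forward_noise(x0, t=500, rng=rng)
x_prev = reverse_step(xt, t=500, eps_pred=eps, rng=rng)  # oracle noise here
```

The computational burden discussed in the review is visible directly in this loop structure: naive sampling evaluates the noise-prediction network once per step, i.e. $T$ function evaluations per generated sample.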
Standard discontinuous Galerkin methods, based on piecewise polynomials of degree $q = 0, 1$, are considered for the temporal semi-discretization of second order hyperbolic equations. The main goal of this paper is to present a simple and straightforward a priori error analysis of optimal order with minimal regularity requirements on the solution. Error estimates in the uniform-in-time norm are also proved. To this end, energy identities and stability estimates of the discrete problem are proved for a slightly more general problem, and these are used to prove the optimal order a priori error estimates. Combining with the classical continuous Galerkin finite element discretization in the space variable, we formulate a fully discrete scheme and present its a priori error analysis. Numerical experiments are performed to verify the theoretical results.
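To make the lowest-order case concrete: on each time slab, dG with piecewise-constant polynomials ($q = 0$) applied to the first-order system reformulation $U' = BU$, $U = (u, u')$, of $u'' = -Au$ reduces to an implicit Euler-type update $U^{n+1} - U^n = \Delta t\, B U^{n+1}$. The sketch below uses a 1D finite-difference Laplacian as a stand-in for the spatial Galerkin discretization; grid size, step size, and initial data are assumptions:

```python
import numpy as np

n, dt = 50, 0.01
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) * (n + 1) ** 2   # 1D Dirichlet Laplacian

# First-order system U' = B U with U = (u, v), v = u'.
B = np.block([[np.zeros((n, n)), np.eye(n)],
              [-A,               np.zeros((n, n))]])
M = np.eye(2 * n) - dt * B          # dG(0) slab equation ~ implicit Euler

U = np.concatenate([np.sin(np.pi * x), np.zeros(n)])   # initial displacement
for _ in range(100):
    U = np.linalg.solve(M, U)       # one solve per time slab
```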
Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where it can be seen as a `bona-fide' discretisation of an underlying gradient flow. Yet, many ML setups involving overparametrised models do not fall into this problem class, which has motivated research beyond the so-called ``Edge of Stability'' (EoS), where the step-size exceeds the admissibility threshold that is inversely proportional to the Lipschitz constant of the gradient. Perhaps surprisingly, GD has been empirically observed to still converge despite local instability and oscillatory behavior. The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be associated with a `Sharpness-Minimisation' implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems. Specifically, we characterize a local condition involving third-order derivatives that stabilizes oscillations of GD above the EoS, and leverage such property in a teacher-student setting, under population loss. Finally, focusing on Matrix Factorization, we establish a non-asymptotic `Local Implicit Bias' of GD above the EoS, whereby quasi-symmetric initializations converge to symmetric solutions -- where sharpness is minimum amongst all minimisers.
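A scalar matrix-factorization instance of the kind studied here makes the phenomenon easy to reproduce: GD on $f(x,y) = \tfrac12(xy-1)^2$, whose minimisers $xy = 1$ have sharpness $x^2 + y^2$, minimal at the symmetric solution $(1,1)$. The particular step-size and initialization below are illustrative assumptions, and the exact endpoint of the trajectory depends on both:

```python
import numpy as np

def gd(x, y, lr, steps=5000):
    """GD on f(x, y) = 0.5*(x*y - 1)**2; sharpness at a minimiser is x^2 + y^2."""
    for _ in range(steps):
        r = x * y - 1.0
        x, y = x - lr * r * y, y - lr * r * x
    return x, y

# Quasi-symmetric init near minimisers of sharpness ~2.3, so lr = 0.95 is
# above their EoS threshold 2/2.3 ~ 0.87: GD oscillates, yet still converges,
# drifting towards the balanced (symmetric) minimiser whose sharpness 2
# admits this step-size (0.95 < 2/2 = 1).
x, y = gd(x=1.3, y=0.8, lr=0.95)
print(x, y, x * y)   # x and y end up nearly equal, with x*y ~ 1
```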
This paper considers computing partial eigenpairs of differential eigenvalue problems (DEPs) such that the eigenvalues lie in a certain region of the complex plane. Recently, based on a "solve-then-discretize" paradigm, an operator analogue of the FEAST method has been proposed for DEPs without discretization of the coefficient operators. Compared to conventional "discretize-then-solve" approaches that discretize the operators and solve the resulting matrix problem, the operator analogue of FEAST exhibits much higher accuracy; however, it involves solving a large number of ordinary differential equations (ODEs). In this paper, to reduce the computational cost, we propose operator analogues of Sakurai-Sugiura-type complex moment-based eigensolvers for DEPs using higher-order complex moments, and analyze the error bounds of the proposed methods. We show that the number of ODEs to be solved can be reduced by a factor of the degree of the complex moments without degrading accuracy, which is verified by numerical results. Numerical results demonstrate that the proposed methods are more than five times faster than the operator analogue of FEAST for several DEPs while maintaining almost the same high accuracy. This study is expected to promote the "solve-then-discretize" paradigm for solving DEPs and contribute to faster and more accurate solutions in real-world applications.
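The cost saving comes from reusing resolvent solves across moment degrees: the moments $S_k = \frac{1}{2\pi i}\oint_\Gamma z^k (zI - A)^{-1}V\, dz$, $k = 0, \dots, M-1$, enlarge the search subspace $M$-fold using the same solves at the quadrature nodes. A "discretize-then-solve" matrix toy of this idea (not the paper's operator-level method; the matrix, contour, and sizes are assumptions) is sketched below:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, M, N = 100, 4, 4, 32              # size, block size, moments, quad. nodes
A = np.diag(np.arange(1.0, n + 1.0))    # toy operator with spectrum 1..100
V = rng.standard_normal((n, L))
gamma, rho = 3.0, 2.6                   # circle enclosing eigenvalues 1..5

theta = 2.0 * np.pi * (np.arange(N) + 0.5) / N
z = gamma + rho * np.exp(1j * theta)

# One resolvent solve per quadrature node, reused for ALL moment degrees:
Y = [np.linalg.solve(z[j] * np.eye(n) - A, V) for j in range(N)]
S = [sum(rho * np.exp(1j * theta[j]) / N * z[j] ** k * Y[j] for j in range(N))
     for k in range(M)]                 # S_k = filtered moment subspaces

U, s, _ = np.linalg.svd(np.hstack(S).real, full_matrices=False)
Q = U[:, s > 1e-8 * s[0]]               # numerical-rank truncation
ritz = np.linalg.eigvalsh(Q.T @ A @ Q)  # Rayleigh-Ritz on the moment subspace
print(ritz[np.abs(ritz - gamma) < rho]) # approximately 1, 2, 3, 4, 5
```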
In this work, we theoretically and numerically study the time fractional subdiffusion-normal transport equation, which describes a crossover from sub-diffusion (as $t\rightarrow 0$) to normal diffusion (as $t\rightarrow \infty$). First, the well-posedness and regularity of the model are studied using the bivariate Mittag-Leffler function. The theoretical results show that, after introducing the first-order derivative operator, the regularity of the solution is substantially improved. Then, a high-accuracy numerical scheme is developed, regardless of whether the initial value is smooth or non-smooth. More specifically, we use the contour integral method (CIM) with a parameterized hyperbolic contour to approximate the temporal local and non-local operators, and employ the standard Galerkin finite element method for the spatial discretization. Rigorous error estimates show that the proposed numerical scheme has spectral accuracy in time and optimal convergence order in space. In addition, we further improve the algorithm and reduce the computational cost by using barycentric Lagrange interpolation. Finally, the theoretical results as well as the acceleration algorithm are verified by several 1-D and 2-D numerical experiments, which also show that the proposed numerical scheme is effective and robust.
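The contour-integral idea in miniature: the time-stepping amounts to evaluating a Laplace-space resolvent at quadrature nodes along a deformed contour and summing. The sketch below uses a fixed-Talbot quadrature (a close cousin of the parameterized hyperbolic contour in the paper, not the paper's contour) to recover the Mittag-Leffler kernel $E_\alpha(-t^\alpha)$, whose Laplace transform is $z^{\alpha-1}/(z^\alpha+1)$; the contour parameters follow the standard Abate-Valko choice and are assumptions here:

```python
import numpy as np

def talbot_invert(F, t, M=32):
    """Fixed-Talbot quadrature for f(t) = (1/2 pi i) \\oint e^{zt} F(z) dz."""
    r = 2.0 * M / (5.0 * t)                # standard contour scale
    total = 0.5 * np.exp(r * t) * F(r)     # theta = 0 term, half weight
    for k in range(1, M):
        th = k * np.pi / M
        cot = np.cos(th) / np.sin(th)
        z = r * th * (cot + 1j)            # point on the Talbot contour
        sigma = th + (th * cot - 1.0) * cot  # from dz/dtheta = i r (1 + i sigma)
        total += (np.exp(z * t) * F(z) * (1.0 + 1j * sigma)).real
    return (r / M) * total

alpha = 0.6
F = lambda z: z ** (alpha - 1.0) / (z ** alpha + 1.0)
print(talbot_invert(F, t=1.0))                  # E_0.6(-1)
print(talbot_invert(lambda z: 1.0 / (z + 1.0), 1.0),
      np.exp(-1.0))                             # sanity check: alpha = 1 gives e^{-t}
```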
We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by the evolution of a probability measure over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, which is restricted to a local region of the so-called neural tangent kernel space around specialized initializations. Several prior works (\cite{chizat2018global, mei2018mean}) establish the asymptotic global optimality of the mean-field regime, but a quantitative convergence rate has remained challenging to obtain due to the complicated unbounded nonlinearity of the training dynamics. This work establishes the first linear convergence result for vanilla two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime. Our result relies on a novel time-dependent estimate of the logarithmic Sobolev constants for a family of measures determined by the evolving distribution of hidden neurons.
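For concreteness, here is the training dynamics being analyzed, in discrete time: a two-layer network with mean-field $1/N$ scaling whose hidden neurons form an empirical measure, trained by noisy gradient descent (an Euler-Maruyama step of the continuous-time dynamics). The teacher function, widths, step size, and noise temperature are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, lam, lr = 512, 2, 1e-3, 0.05            # neurons, input dim, temperature, step

X = rng.standard_normal((256, d))
y = np.tanh(X @ np.array([1.0, -1.0]))        # assumed teacher function
W = rng.standard_normal((N, d))               # hidden weights ~ initial measure

def forward(W, X):
    return np.tanh(X @ W.T).mean(axis=1)      # mean-field scaling: (1/N) sum_i

for _ in range(3000):
    resid = forward(W, X) - y                 # (batch,)
    act = np.tanh(X @ W.T)                    # (batch, N)
    # dL/dW_i = E_x[ resid * (1 - act_i^2) * x ] / N   (squared loss)
    grad = ((resid[:, None] * (1 - act ** 2)).T @ X) / (N * len(X))
    # Noisy GD: gradient step plus sqrt(2 * lr * lam) Gaussian noise.
    W += -lr * grad + np.sqrt(2 * lr * lam) * rng.standard_normal(W.shape)

print(np.mean((forward(W, X) - y) ** 2))      # training loss after noisy GD
```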
Federated learning allows distributed users to collaboratively train a model while keeping each user's data private. Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present FILM, a novel attack method for federated learning of language models (LMs). For the first time, we show the feasibility of recovering text from large batches of up to 128 sentences. Unlike image-recovery methods that are optimized to match gradients, we take a distinct approach that first identifies a set of words from the gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. We conduct the FILM attack on several large-scale datasets and show that it can successfully reconstruct single sentences with high fidelity for large batch sizes, and even multiple sentences if applied iteratively. We evaluate three defense methods: gradient pruning, DPSGD, and a simple approach that we propose to freeze word embeddings. We show that both gradient pruning and DPSGD lead to a significant drop in utility. However, if we fine-tune a public pre-trained LM on private text without updating word embeddings, the attack can be effectively defended against with minimal loss of data utility. We hope that our results encourage the community to rethink the privacy concerns of LM training and its standard practices.
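The word-identification stage rests on a simple observation: in the embedding-layer gradient, only the rows of tokens that actually occur in the batch are nonzero, so the bag of words leaks directly from a single gradient update. A minimal PyTorch sketch of this leakage (toy vocabulary and model, not the paper's full pipeline):

```python
import torch

vocab, dim = 100, 16
emb = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)

batch = torch.tensor([[5, 17, 42], [17, 3, 99]])    # private token ids
target = torch.tensor([17, 3])                      # toy labels

logits = head(emb(batch).mean(dim=1))               # toy LM forward pass
loss = torch.nn.functional.cross_entropy(logits, target)
loss.backward()

# The eavesdropper sees only gradients; nonzero embedding rows reveal
# exactly which words appeared somewhere in the batch.
leaked = (emb.weight.grad.norm(dim=1) > 0).nonzero().flatten()
print(sorted(leaked.tolist()))                      # [3, 5, 17, 42, 99]
```

Freezing the embedding layer, the simple defense evaluated in the paper, zeroes out precisely this gradient, which is why it blocks the first stage of the attack.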
This paper presents Holbert: a work-in-progress pedagogical proof assistant and online textbook platform aimed at educational use, specifically the teaching of programming language theory. Holbert allows proof exercises and rule definitions to be embedded directly in an online textbook, where proofs and rules can be manipulated using a graphical interface. We give an overview of the logical foundations of Holbert, show examples of its use, and report on its current implementation status.
Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped before fully supervised performance is reached. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from the observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches, which require particular architectures or the computation of distances. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty in every selection round. Meanwhile, by compactly aligning the free energy of target data around the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.
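The selection criterion in miniature: the free energy of a sample under a classifier, $F(x) = -T \log \sum_k \exp(f_k(x)/T)$, tends to be higher for target samples that are far from the source distribution, so the highest-energy samples are queried for annotation. A hedged sketch (the stand-in logits, temperature, and budget are assumptions; the full method additionally incorporates instance uncertainty and the free-energy alignment regularizer):

```python
import numpy as np

def free_energy(logits, T=1.0):
    """F(x) = -T * logsumexp(f(x) / T); higher = less source-like."""
    z = logits / T
    zmax = z.max(axis=1)
    return -T * (np.log(np.exp(z - zmax[:, None]).sum(axis=1)) + zmax)

rng = np.random.default_rng(0)
target_logits = rng.standard_normal((1000, 10)) * 2.0  # stand-in classifier outputs
budget = 50                                            # annotation budget per round

F = free_energy(target_logits)
query_idx = np.argsort(-F)[:budget]   # highest free energy -> sent for labeling
```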
Few sample learning (FSL) is significant and challenging in the field of machine learning. The ability to learn and generalize successfully from very few samples is a notable demarcation separating artificial intelligence from human intelligence, since humans can readily establish their cognition of novelty from just a single or a handful of examples, whereas machine learning algorithms typically require hundreds or thousands of supervised samples to guarantee generalization. Despite a long history dating back to the early 2000s and the widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 200+ FSL papers spanning from the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches into generative model based and discriminative model based kinds in principle, and place particular emphasis on meta-learning based FSL approaches. We also summarize several recently emerging extensional topics of FSL and review the latest advances on these topics. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends, in the hope of providing guidance and insights for follow-up research.