This paper establishes a mathematical foundation for the Adam optimizer, elucidating its connection to natural gradient descent through Riemannian and information geometry. We rigorously analyze the diagonal empirical Fisher information matrix (FIM) in Adam, clarifying all detailed approximations and advocating for the use of log probability functions as loss, which should be based on discrete distributions, due to the limitations of empirical FIM. Our analysis uncovers flaws in the original Adam algorithm, leading to proposed corrections such as enhanced momentum calculations, adjusted bias corrections, adaptive epsilon, and gradient clipping. We refine the weight decay term based on our theoretical framework. Our modified algorithm, Fisher Adam (FAdam), demonstrates superior performance across diverse domains including LLM, ASR, and VQ-VAE, achieving state-of-the-art results in ASR.
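As a rough illustration of the quantity this abstract analyzes, the sketch below implements one step of the standard Adam update in NumPy. It is not FAdam and includes none of the proposed corrections. The second-moment buffer `v` is an exponential moving average of squared gradients, which is the term read as a diagonal empirical Fisher information estimate when the loss is a negative log-likelihood. The function name `adam_step` and the default hyperparameters are assumptions made for illustration.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of standard Adam (illustrative sketch, not FAdam).

    theta, grad, m, v are arrays of the same shape; t is the 1-based step count.
    """
    # First moment: exponential moving average of gradients (momentum).
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment: exponential moving average of squared gradients.
    # For a log-likelihood loss this plays the role of a diagonal
    # empirical Fisher information estimate.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias corrections compensate for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Preconditioned update: divide by the square root of the diagonal estimate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta0 = np.zeros(2)
grad0 = np.array([2.0, -3.0])
theta1, m1, v1 = adam_step(theta0, grad0, np.zeros(2), np.zeros(2), t=1)
```

On the first step the bias-corrected ratio reduces to `grad / (|grad| + eps)`, so each coordinate moves by roughly `lr` regardless of gradient scale.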
The Lippmann--Schwinger--Lanczos (LSL) algorithm has recently been shown to provide an efficient tool for imaging and direct inversion of synthetic aperture radar data in multi-scattering environments [17], where the data set is limited to monostatic, a.k.a. single input/single output (SISO), measurements. The approach is based on constructing data-driven estimates of internal fields via a reduced-order model (ROM) framework and then plugging them into the Lippmann--Schwinger integral equation. However, the approximations of the internal solutions may be less accurate because the off-diagonal elements of the multiple input/multiple output (MIMO) matrix-valued transfer function are missing. This, in turn, may result in multiple echoes in the image. Here we present a ROM-based data completion algorithm to mitigate this problem. First, we apply the LSL algorithm to the SISO data as in [17] to obtain approximate reconstructions as well as estimates of the internal fields. Next, we use these estimates to evaluate a forward Lippmann--Schwinger integral that populates the missing off-diagonal data (the lifting step). Finally, to update the reconstructions, we solve the Lippmann--Schwinger equation using the original SISO data, where the internal fields are constructed from the lifted MIMO data. For complex models, the steps of obtaining the approximate reconstructions and internal fields and populating the missing MIMO data entries can be repeated to improve the images even further. The efficiency of the proposed approach is demonstrated on 2D and 2.5D numerical examples, where the reconstructions improve substantially.
Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new R\'enyi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of non-Lipschitz smooth loss functions, we provide a weaker bound that scales well in terms of the number of DP-SGD iterations.
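The DP-SGD variant described in this abstract (cyclic batches, gradient clipping, last iterate released) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the names `dp_sgd_step` and `cyclic_batches`, the Gaussian noise scale `noise_multiplier * clip_norm`, and the least-squares loss used in the loop are all hypothetical choices. No privacy accounting is performed here.

```python
import numpy as np


def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: clip each example's gradient, add Gaussian noise, average."""
    # Per-example L2 norms, shape (batch, 1).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale each gradient so that its norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping norm (assumed noise model).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return theta - lr * noisy_mean


def cyclic_batches(n_examples, batch_size):
    """Yield index batches in a fixed cyclic order, wrapping around the data set."""
    start = 0
    while True:
        yield (start + np.arange(batch_size)) % n_examples
        start = (start + batch_size) % n_examples


# Toy least-squares problem (hypothetical data); only the last iterate is kept.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = X @ np.array([1.0, -2.0])
theta = np.zeros(2)
batches = cyclic_batches(len(X), batch_size=4)
for _ in range(10):
    idx = next(batches)
    residual = X[idx] @ theta - y[idx]
    grads = residual[:, None] * X[idx]  # per-example gradients of 0.5 * residual**2
    theta = dp_sgd_step(theta, grads, lr=0.1, clip_norm=1.0,
                        noise_multiplier=1.0, rng=rng)
last_iterate = theta  # the only quantity released in this setting
```

Clipping bounds each example's contribution to the update, which is what allows the noise scale to be tied to `clip_norm` without assuming a Lipschitz loss.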
The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall short in two aspects: i) They suffer from the one-image-to-many-questions mapping problem, which leads to the failure of generating referential and meaningful questions from an image. ii) They fail to model complex implicit relations among the visual objects in an image and also overlook potential interactions between the side information and the image. To address these limitations, we first propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference. Concretely, we aim to ask the right visual questions with Double Hints, namely textual answers and visual regions of interest, which could effectively mitigate the existing one-to-many mapping issue. In particular, we develop a simple methodology to self-learn the visual hints without introducing any additional human annotations. Furthermore, to capture these sophisticated relationships, we propose a new double-hints guided Graph-to-Sequence learning framework, which first models them as a dynamic graph and learns the implicit topology end-to-end, and then utilizes a graph-to-sequence model to generate the questions with double hints. Experimental results demonstrate the superiority of our proposed method.
The generality and robustness of inference algorithms are critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, a fundamental problem arises in evaluating its accuracy and efficiency across a range of representative target models. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for model evaluation and comparison. To provide a wide range of realistic target densities, posteriordb currently comprises 120 representative models and has been instrumental in developing several general inference algorithms.
Hyperspectral imaging, a rapidly evolving field, has witnessed the ascendancy of deep learning techniques, supplanting classical feature extraction and classification methods in various applications. However, many researchers employ arbitrary architectures for hyperspectral image processing, often without rigorous analysis of the interplay between spectral and spatial information. This oversight neglects the implications of combining these two modalities on model performance. In this paper, we evaluate the performance of diverse deep learning architectures for hyperspectral image segmentation. Our analysis disentangles the impact of different architectures, spanning various spectral and spatial granularities. Specifically, we investigate the effects of spectral resolution (capturing spectral information) and spatial texture (conveying spatial details) on segmentation outcomes. Additionally, we explore the transferability of knowledge from large pre-trained image foundation models, originally designed for RGB images, to the hyperspectral domain. Results show that incorporating spatial information alongside spectral data leads to improved segmentation results, and that further work is essential on novel architectures combining spectral and spatial information and on the adaptation of RGB foundation models to the hyperspectral domain. Furthermore, we contribute to the field by cleaning and publicly releasing the Tecnalia WEEE Hyperspectral dataset. This dataset contains different non-ferrous fractions of Waste Electrical and Electronic Equipment (WEEE), including Copper, Brass, Aluminum, Stainless Steel, and White Copper, spanning the range of 400 to 1000 nm. We expect these conclusions can guide new researchers in the field of hyperspectral imaging.
Implicit visual knowledge in a large latent diffusion model (LLDM) pre-trained on natural images is rich and hypothetically universal to natural and medical images. To test this hypothesis from a practical perspective, we propose a novel framework for undersampled MRI Reconstruction by Prompting a large latent Diffusion model (MRPD). While the existing methods trained on MRI datasets are typically of limited generalizability toward diverse data acquisition scenarios, MRPD supports unsupervised and universally adaptive MRI reconstruction. For unsupervised reconstruction, MRSampler guides LLDM with a random-phase-modulated hard-to-soft control. With any single- or multiple-source MRI dataset, MRPD's performance is boosted universally by a lightweight MRAdapter that only finetunes the LLDM's autoencoder. Experiments on FastMRI and IXI show that MRPD is the only model that supports both MRI database-free and database-available scenarios and attains the best generalizability towards out-of-domain (OOD) samplings, contrasts, and organs among compared unsupervised, supervised, and MRI diffusion methods. To our knowledge, MRPD is the first method that empirically shows the universal prowess of an LLDM pre-trained on vast natural images for MRI. Our official implementation is at //github.com/Z7Gao/MRPD.
Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employ the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and employing query evolution on the GSM8K and MATH datasets, we generate an instruction fine-tuning dataset called DotaMathQA with 574K query-response pairs. We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B achieves 64.8% on the competitive MATH dataset and 86.7% on GSM8K. In addition, DotaMath-deepseek-7B remains strongly competitive on a series of in-domain and out-of-domain benchmarks (Avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for addressing intricate mathematical problems. Our code is publicly available at //github.com/ChengpengLi1003/DotaMath.
We propose an extremely versatile approach to address a large family of matrix nearness problems, possibly with additional linear constraints. Our method is based on splitting a matrix nearness problem into two nested optimization problems, of which the inner one can be solved either exactly or cheaply, while the outer one can be recast as an unconstrained optimization task over a smooth real Riemannian manifold. We observe that this paradigm applies to many matrix nearness problems of practical interest appearing in the literature, thus revealing that they are equivalent in this sense to a Riemannian optimization problem. We also show that the objective function to be minimized on the Riemannian manifold can be discontinuous, thus requiring regularization techniques, and we give conditions for this to happen. Finally, we demonstrate the practical applicability of our method by implementing it for a number of matrix nearness problems that are relevant for applications and are currently considered very demanding in practice. Extensive numerical experiments demonstrate that our method often greatly outperforms its predecessors, including algorithms specifically designed for those particular problems.
This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces computational complexity. Above all, it allows us to develop universal stochastic Newton methods and to investigate the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.
Due to their inherent capability for semantic alignment of aspects and their context words, attention mechanisms and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. Based on it, a novel aspect-specific sentiment classification framework is proposed. Experiments on three benchmarking collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.
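To make the idea of a graph convolution over a dependency tree concrete, the sketch below builds an adjacency matrix from a list of head indices and applies one degree-normalized GCN layer in NumPy. It is an illustrative assumption, not the framework proposed in the abstract: the helper names `dependency_adjacency` and `gcn_layer`, the use of self-loops, the symmetric (undirected) edges, and the division by node degree are choices made here for the example.

```python
import numpy as np


def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    """
    n = len(heads)
    adj = np.eye(n)  # self-loops keep each token's own features
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child, head] = 1.0
            adj[head, child] = 1.0
    return adj


def gcn_layer(adj, h, w, b):
    """One graph convolution: average neighbor features, project, apply ReLU."""
    degree = adj.sum(axis=1, keepdims=True)  # at least 1 because of self-loops
    return np.maximum(0.0, (adj @ h @ w) / degree + b)


# Toy three-token sentence whose middle token is the root (hypothetical parse).
heads = [1, -1, 1]
adj = dependency_adjacency(heads)
h0 = np.eye(3)  # one-hot token features for illustration
h1 = gcn_layer(adj, h0, w=np.eye(3), b=np.zeros(3))
```

After one layer each token mixes features only with its syntactic neighbors; stacking layers lets information travel along longer dependency paths, which is how syntactically related but distant words can influence an aspect's representation.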