In nonparametric Bayesian approaches, Gaussian stochastic processes can serve as priors on real-valued function spaces. The existing literature on posterior convergence rates under Gaussian process priors shows that optimal or near-optimal posterior contraction rates can be achieved if the smoothness of the Gaussian process matches that of the target function. Among these priors, the Gaussian process with a parametric Mat\'ern covariance function is particularly notable in that its degree of smoothness can be controlled by a dedicated smoothness parameter. Ma and Bhadra (2023) recently introduced a new family of covariance functions, the Confluent Hypergeometric (CH) class, which possesses two such parameters simultaneously: one controls the tail index of the polynomially decaying covariance function, and the other controls the degree of mean-squared smoothness, analogous to the Mat\'ern class. In this paper, we show that with a proper choice of rescaling parameters in the Mat\'ern and CH covariance functions, it is possible to obtain the minimax optimal posterior contraction rate for $\eta$-regular functions. Unlike previous results for the unrescaled case, the smoothness parameter of the covariance function need not equal $\eta$ to achieve the minimax optimal rate, for either the rescaled Mat\'ern or the rescaled CH covariance, illustrating a key benefit of rescaling. The theoretical properties of the rescaled Mat\'ern and CH classes are further verified via extensive simulations, and an illustration on a geospatial data set is presented.
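To make the role of the rescaling parameter concrete, below is a minimal sketch of the standard Mat\'ern covariance with an inverse-range (rescaling) parameter `a`; the parameterization is the common textbook one, and the sample-size-dependent choice of `a` analyzed in the paper is not reproduced here.

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def matern_cov(d, sigma2=1.0, nu=1.5, a=1.0):
    """Matern covariance at distances d.

    nu is the smoothness parameter; a rescales the input, which is the knob
    the contraction-rate results tune. With an appropriate choice of a,
    nu need not match the regularity of the target function.
    """
    d = np.atleast_1d(np.asarray(d, dtype=float))
    out = np.full_like(d, sigma2)          # K(0) = sigma^2 by continuity
    nz = d > 0
    r = np.sqrt(2.0 * nu) * a * d[nz]
    out[nz] = sigma2 * (2.0 ** (1 - nu) / gamma(nu)) * (r ** nu) * kv(nu, r)
    return out
```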
Efficient parallel computing has become a pivotal element in advancing artificial intelligence. Yet, the deployment of Spiking Neural Networks (SNNs) in this domain is hampered by their inherent sequential computational dependency. This constraint arises from the need for each time step's processing to rely on the preceding step's outcomes, significantly impeding the adaptability of SNN models to massively parallel computing environments. Addressing this challenge, our paper introduces the innovative Parallel Spiking Unit (PSU) and its two derivatives, the Input-aware PSU (IPSU) and Reset-aware PSU (RPSU). These variants skillfully decouple the leaky integration and firing mechanisms in spiking neurons while probabilistically managing the reset process. By preserving the fundamental computational attributes of the spiking neuron model, our approach enables the concurrent computation of all membrane potential instances within the SNN, facilitating parallel spike output generation and substantially enhancing computational efficiency. Comprehensive testing across various datasets, including static and sequential images, Dynamic Vision Sensor (DVS) data, and speech datasets, demonstrates that the PSU and its variants not only significantly boost performance and simulation speed but also augment the energy efficiency of SNNs through enhanced sparsity in neural activity. These advancements underscore the potential of our method in revolutionizing SNN deployment for high-performance parallel computing applications.
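The decoupling idea can be sketched as follows: without the reset, the leaky-integration recursion unrolls into a lower-triangular matrix product, so all membrane potentials are computable at once. This is a minimal illustration only; the probabilistic reset handling that distinguishes the PSU, IPSU, and RPSU is not reproduced, and all names are illustrative.

```python
import torch

def parallel_membrane(x, beta=0.9, v_th=1.0):
    """All-timestep leaky integration as one matrix multiplication.

    x: (T, N) input currents. Ignoring the reset, u[t] = beta*u[t-1] + x[t]
    unrolls to u = D @ x with D[i, j] = beta**(i - j) for j <= i, so every
    membrane potential (and spike) is produced concurrently.
    """
    T = x.shape[0]
    idx = torch.arange(T)
    expo = (idx[:, None] - idx[None, :]).clamp(min=0).float()
    D = torch.tril(beta ** expo)       # decay matrix
    u = D @ x                          # (T, N) membrane potentials, in parallel
    return (u >= v_th).float()         # spike outputs
```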
We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth, non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a previously unanalyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply to a wide range of stepsize selections and to non-Lipschitz, ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior work addresses). Even in the presence of such undesirable behavior, our theory still ensures and bounds eventual convergence.
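For intuition, here is a minimal sketch of the classic subgradient method for a mu-strongly convex objective with the standard O(1/T) stepsize and weighted averaging; the gap value it returns is the textbook certificate, not the sharper dual gap developed in the paper, and all interfaces are assumptions.

```python
import numpy as np

def subgradient_method(subgrad, x0, mu, T):
    """Subgradient method with stepsizes 2/(mu*(t+2)) and weighted averaging.

    subgrad(x) returns any subgradient of the mu-strongly convex objective
    at x. Returns the averaged iterate and the classic primal-gap bound
    2*G^2/(mu*(T+1)), where G bounds the observed subgradient norms.
    """
    x = np.asarray(x0, dtype=float)
    x_avg = np.zeros_like(x)
    g_max_sq = 0.0
    for t in range(T):
        g = subgrad(x)
        g_max_sq = max(g_max_sq, float(g @ g))
        x = x - (2.0 / (mu * (t + 2))) * g
        x_avg += (2.0 * (t + 1) / (T * (T + 1))) * x  # weights sum to 1
    return x_avg, 2.0 * g_max_sq / (mu * (T + 1))
```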
The growing computing power over the years has enabled simulations to become more complex and accurate. While immensely valuable for scientific discovery and problem-solving, high-fidelity simulations come with significant computational demands. As a result, it is common to run a low-fidelity model with a subgrid-scale model to reduce the computational cost, but selecting the appropriate subgrid-scale model and tuning it are challenging. We propose a novel method for learning the subgrid-scale model effects when simulating partial differential equations augmented by neural ordinary differential operators in the context of discontinuous Galerkin (DG) spatial discretization. Our approach learns the missing scales of the low-order DG solver at a continuous level, and hence improves the accuracy of the low-order DG approximations while accelerating the filtered high-order DG simulations to a certain degree of precision. We demonstrate the performance of our approach through multidimensional Taylor-Green vortex examples at different Reynolds numbers and times, which cover the laminar, transitional, and turbulent regimes. The proposed method not only reconstructs the subgrid-scale effects from the low-order (1st-order) approximation but also speeds up the filtered high-order DG (6th-order) simulation by two orders of magnitude.
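Schematically, the approach amounts to augmenting the coarse solver's right-hand side with a learned correction and training it through the time integrator. The sketch below conveys only that structure; `low_order_rhs`, the network sizes, and the interfaces are illustrative stand-ins, not the paper's DG implementation.

```python
import torch
import torch.nn as nn

class AugmentedRHS(nn.Module):
    """Low-order solver residual plus a learned subgrid-scale correction."""

    def __init__(self, low_order_rhs, n_dof, width=64):
        super().__init__()
        self.low_order_rhs = low_order_rhs   # coarse DG residual, du/dt = R(u)
        self.correction = nn.Sequential(     # learns the missing scales
            nn.Linear(n_dof, width), nn.Tanh(), nn.Linear(width, n_dof)
        )

    def forward(self, t, u):
        return self.low_order_rhs(t, u) + self.correction(u)
```

Such a right-hand side can be integrated with a differentiable ODE solver (e.g., torchdiffeq's `odeint`) and fit against filtered high-order solution trajectories.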
Sequential neural posterior estimation (SNPE) techniques have recently been proposed for dealing with simulation-based models with intractable likelihoods. They are devoted to learning the posterior from adaptively proposed simulations using neural-network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs notably well and scales to high-dimensional data. However, the APT method requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic APT was proposed to solve this by discretizing the normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we propose a nested APT method that instead estimates the involved nested expectation. This facilitates establishing the convergence analysis. Since the nested estimators for the loss function and its gradient are biased, we make use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To further reduce the excessive variance of the unbiased estimators, this paper also develops some truncated MLMC estimators by taking into account the trade-off between the bias and the average cost. Numerical experiments for approximating complex multimodal posteriors in moderate dimensions are provided.
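The debiasing device can be illustrated with the single-term unbiased MLMC (randomized-truncation) estimator: draw a random level and reweight that level's correction by its probability. This is the generic construction only; the paper's specific level corrections and truncated variants are not reproduced here.

```python
import numpy as np

def single_term_mlmc(delta, p=0.6, rng=None):
    """Unbiased single-term MLMC estimator.

    delta(l) returns one sample of the level-l correction
    Delta_l = Y_l - Y_{l-1}, with sum_l E[Delta_l] equal to the target
    nested expectation. Sampling L ~ Geometric(p) and dividing by P(L = l)
    removes the bias at a finite expected cost.
    """
    rng = rng or np.random.default_rng()
    L = rng.geometric(p) - 1               # L in {0, 1, 2, ...}
    prob_L = p * (1.0 - p) ** L            # P(L = l)
    return delta(L) / prob_L
```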
Large language models (LLMs) have exhibited an array of reasoning capabilities but face challenges like error propagation and hallucination, particularly in specialised areas like finance, where data is heterogeneous and precision is paramount. We explore the potential of augmenting language models with external tools to mitigate these limitations and offload certain reasoning steps to tools better suited for the task, instead of relying solely on the LLM's inherent abilities. More concretely, using financial domain question-answering datasets, we apply supervised fine-tuning (SFT) on a LLaMA-2 13B Chat model to act as both a 'task router' and a 'task solver'. The 'task router' dynamically directs a question to be answered either internally by the LLM or externally via the right tool from the tool set. Our tool-equipped SFT model, Raven, demonstrates an improvement of 35.2% and 5.06% over the base model and SFT-only baselines, respectively, and is highly competitive with strong GPT-3.5 results. To the best of our knowledge, our work is the first to investigate tool augmentation of language models for the finance domain.
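The router/solver interplay can be pictured with the toy loop below; the prompts, tool registry, and `generate` callable are hypothetical stand-ins and not Raven's actual interface.

```python
# Toy tool registry; a real system would use vetted tools, not eval.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def answer(question, generate):
    """generate() wraps the fine-tuned LLM; one call routes, another solves."""
    route = generate(f"Route to 'internal' or a tool name.\nQ: {question}").strip()
    if route in TOOLS:
        tool_input = generate(f"Write the input for tool '{route}'.\nQ: {question}")
        observation = TOOLS[route](tool_input)
        return generate(f"Q: {question}\nTool output: {observation}\nAnswer:")
    return generate(f"Q: {question}\nAnswer:")
```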
Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness results in instability and makes it challenging for users to guarantee the quality of SR results. Our work, however, takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method with the potential to benefit a series of diffusion-based SR methods. More specifically, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs), and we analyze the relationship between the choice of BC and the corresponding SR result. Our analysis shows a route to an approximately optimal BC via an efficient exploration of the whole space. SR results sampled by the proposed method with fewer steps outperform those sampled by current methods with randomness from the same pre-trained diffusion-based SR model, meaning that our sampling method "boosts" current diffusion-based SR models without any additional training.
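The search for a good BC can be sketched as follows: because the ODE sampler is deterministic given its terminal latent, a single latent selected on a small reference set can be reused for every input. `sr_ode_sampler` and `metric` are assumed interfaces, not a specific codebase.

```python
import torch

def pick_boundary_condition(sr_ode_sampler, lr_ref, hr_ref, metric, n_cand=32):
    """Search candidate boundary conditions and keep the best-scoring one.

    sr_ode_sampler(lr, z) deterministically maps a low-resolution batch and
    a terminal latent z to SR images; metric (e.g., PSNR) scores them
    against the high-resolution references.
    """
    best_z, best_score = None, -float("inf")
    for _ in range(n_cand):
        z = torch.randn_like(hr_ref)               # candidate BC
        score = metric(sr_ode_sampler(lr_ref, z), hr_ref)
        if score > best_score:
            best_z, best_score = z, score
    return best_z                                   # reuse for all inputs
```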
The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called the Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical, and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. As per the suggestions given by the Doctoral Committee, an image recognition system using ARN has been implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of serial modules can be realized within the same area as a parallel module. The advantage of using serial modules compared to parallel modules under area constraints has been discussed. One module often useful in ANNs is multi-operand addition. One problem in its implementation is estimating the number of carry bits required when the number of operands changes. A theorem to calculate the exact number of carry bits required for a multi-operand addition is presented in the thesis, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for the large numbers of additions typically seen in several DNN configurations.
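The flavor of the carry-bit question can be conveyed with the standard worst-case bound: the sum of n unsigned k-bit operands is at most n(2^k - 1), so ceil(log2(n)) extra bits always suffice. The thesis theorem gives the exact count; only this well-known bound is sketched here.

```python
from math import ceil, log2

def extra_carry_bits(n_operands: int) -> int:
    """Extra result bits sufficient for summing n_operands unsigned values
    of equal width k: the maximum sum n*(2^k - 1) fits in
    k + ceil(log2(n)) bits."""
    return ceil(log2(n_operands))

# e.g., accumulating 500 8-bit pixels needs at most 8 + 9 = 17 result bits
assert extra_carry_bits(500) == 9
```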
Although large language models (LLMs) are impressive in solving various tasks, they can quickly become outdated after deployment. Maintaining their up-to-date status is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with ever-changing world knowledge without re-training from scratch. We categorize research works systematically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at https://github.com/hyintell/awesome-refreshing-llms
Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
Aspect-level sentiment classification aims to identify the sentiment expressed towards an aspect given a context sentence. Previous neural-network-based methods largely ignore the syntactic structure of the sentence. In this paper, we propose a novel target-dependent graph attention network (TD-GAT) for aspect-level sentiment classification, which explicitly utilizes the dependency relationships among words. Using the dependency graph, it propagates sentiment features directly from the syntactic context of an aspect target. In our experiments, we show that our method outperforms multiple baselines with GloVe embeddings. We also demonstrate that using BERT representations further substantially boosts performance.
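A single-head graph attention layer over a dependency graph, of the kind TD-GAT builds on, can be sketched as below; dimensions and the masking scheme are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """One attention head that propagates features only along syntactic edges."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.a = nn.Linear(2 * d_out, 1, bias=False)

    def forward(self, h, adj):
        # h: (n, d_in) word features; adj: (n, n) dependency-graph adjacency,
        # assumed to include self-loops so every row attends somewhere.
        z = self.W(h)
        n = z.size(0)
        pair = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                          z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pair).squeeze(-1))   # (n, n) attention logits
        e = e.masked_fill(adj == 0, float("-inf"))   # keep dependency edges only
        return F.elu(torch.softmax(e, dim=-1) @ z)
```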