Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
Large Language Models (LLMs) have made extraordinary progress in the field of Artificial Intelligence and have demonstrated remarkable capabilities across a large variety of tasks and domains. However, as we venture closer to creating Artificial General Intelligence (AGI) systems, we recognize the need to supplement LLMs with long-term memory to overcome the context window limitation and more importantly, to create a foundation for sustained reasoning, cumulative learning and long-term user interaction. In this paper we propose RecallM, a novel architecture for providing LLMs with an adaptable and updatable long-term memory mechanism. Unlike previous methods, the RecallM architecture is particularly effective at belief updating and maintaining a temporal understanding of the knowledge provided to it. We demonstrate through various experiments the effectiveness of this architecture. Furthermore, through our own temporal understanding and belief updating experiments, we show that RecallM is four times more effective than using a vector database for updating knowledge previously stored in long-term memory. We also demonstrate that RecallM shows competitive performance on general question-answering and in-context learning tasks.
Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{//github.com/Wang-ML-Lab/variational-imbalanced-regression}.
Performance analysis is carried out in a near-field multiple-input multiple-output (MIMO) system for both discrete and continuous aperture antennas. The effective degrees of freedom (EDoF) is first derived. It is shown that near-field MIMO systems have a higher EDoF than free-space far-field ones. Additionally, the near-field EDoF further depends on the communication distance. Based on the derived EDoF, closed-form expressions of channel capacity with a fixed distance are obtained. As a further advance, with randomly deployed receivers, ergodic capacity is derived. Simulation results reveal that near-field MIMO has an enhanced multiplexing gain even under line-of-sight transmissions. In addition, the performance of discrete MIMO converges to that of continuous aperture MIMO.
Business optimisation is the process of finding and implementing efficient and cost-effective means of operation to bring a competitive advantage for businesses. Synthesizing problem formulations is an integral part of business optimisation which is centred around human expertise, thus with a high potential of becoming a bottleneck. With the recent advancements in Large Language Models (LLMs), human expertise needed in problem formulation can potentially be minimized using Artificial Intelligence (AI). However, developing a LLM for problem formulation is challenging, due to training data requirements, token limitations, and the lack of appropriate performance metrics in LLMs. To minimize the requirement of large training data, considerable attention has recently been directed towards fine-tuning pre-trained LLMs for downstream tasks, rather than training a LLM from scratch for a specific task. In this paper, we adopt this approach and propose an AI-Copilot for business optimisation by fine-tuning a pre-trained LLM for problem formulation. To address token limitations, we introduce modularization and prompt engineering techniques to synthesize complex problem formulations as modules that fit into the token limits of LLMs. In addition, we design performance evaluation metrics that are more suitable for assessing the accuracy and quality of problem formulations compared to existing evaluation metrics. Experiment results demonstrate that our AI-Copilot can synthesize complex and large problem formulations for a typical business optimisation problem in production scheduling.
Prediction rule ensembles (PREs) are a relatively new statistical learning method, which aim to strike a balance between predictive accuracy and interpretability. Starting from a decision tree ensemble, like a boosted tree ensemble or a random forest, PREs retain a small subset of tree nodes in the final predictive model. These nodes can be written as simple rules of the form if [condition] then [prediction]. As a result, PREs are often much less complex than full decision tree ensembles, while they have been found to provide similar predictive accuracy in many situations. The current paper introduces the methodology and shows how PREs can be fitted using the R package pre through several real-data examples from psychological research. The examples also illustrate a number of features of package \textbf{pre} that may be particularly useful for applications in psychology: support for categorical, multivariate and count responses, application of (non-)negativity constraints, inclusion of confirmatory rules and standardized variable importance measures.
Overload situations, in the presence of resource limitations, in complex event processing (CEP) systems are typically handled using load shedding to maintain a given latency bound. However, load shedding might negatively impact the quality of results (QoR). To minimize the shedding impact on QoR, CEP researchers propose shedding approaches that drop events/internal state with the lowest importances/utilities. In both black-box and white-box shedding approaches, different features are used to predict these utilities. In this work, we propose a novel black-box shedding approach that uses a new set of features to drop events from the input event stream to maintain a given latency bound. Our approach uses a probabilistic model to predict these event utilities. Moreover, our approach uses Zobrist hashing and well-known machine learning models, e.g., decision trees and random forests, to handle the predicted event utilities. Through extensive evaluations on several synthetic and two real-world datasets and a representative set of CEP queries, we show that, in the majority of cases, our load shedding approach outperforms state-of-the-art black-box load shedding approaches, w.r.t. QoR.
We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.
In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in an unconstrained scenario. For this purpose, the FIQA method should consider both the intrinsic property and the recognizability of the face image. Most previous works aim to estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which only considers the information from partial intra-class. However, these methods ignore the valuable information from the inter-class, which is for estimating to the recognizability of face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class similarity distributions and inter-class similarity distributions. With these quality pseudo-labels, we are capable of training a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses the state-of-the-arts by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.
Generative commonsense reasoning which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph augmented pre-trained language generation model KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage the graph attention to aggregate the rich concept semantics that enhances the model generalization on unseen concept sets. Experiments on benchmark CommonGen dataset verify the effectiveness of our proposed approach by comparing with several strong pre-trained language generation models, particularly KG-BART outperforms BART by 5.80, 4.60, in terms of BLEU-3, 4. Moreover, we also show that the generated context by our model can work as background scenarios to benefit downstream commonsense QA tasks.
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities leaned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at //github.com/kstant0725/SpectralNet .