Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert the infinite mixture into a finite one. All checkpoints are collected to establish a deep learning sub-model pool, and weight adjustment and diversity strategies are developed during the combination process. The main advantage of this method is its ability to generate the required base learners through a single training process, utilizing the decaying strategy to tackle the challenge posed by the stochastic nature of gradient descent in determining the optimal learning rate. To ensure the method's generalizability and competitiveness, this paper conducts an empirical analysis using the weekly dataset from the M4 competition and explores sensitivity to the number of models to be combined. The results demonstrate that the ensemble model proposed offers substantial improvements in prediction accuracy and stability compared to a single benchmark model.
High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as $n$ and/or $p$ increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
Building robust, interpretable, and secure AI system requires quantifying and representing uncertainty under a probabilistic perspective to mimic human cognitive abilities. However, probabilistic computation presents significant challenges for most conventional artificial neural network, as they are essentially implemented in a deterministic manner. In this paper, we develop an efficient probabilistic computation framework by truncating the probabilistic representation of neural activation up to its mean and covariance and construct a moment neural network that encapsulates the nonlinear coupling between the mean and covariance of the underlying stochastic network. We reveal that when only the mean but not the covariance is supervised during gradient-based learning, the unsupervised covariance spontaneously emerges from its nonlinear coupling with the mean and faithfully captures the uncertainty associated with model predictions. Our findings highlight the inherent simplicity of probabilistic computation by seamlessly incorporating uncertainty into model prediction, paving the way for integrating it into large-scale AI systems.
In recent years, the development of technologies for causal inference with privacy preservation of distributed data has gained considerable attention. Many existing methods for distributed data focus on resolving the lack of subjects (samples) and can only reduce random errors in estimating treatment effects. In this study, we propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing random errors and biases in the estimation. Our method involves constructing dimensionality-reduced intermediate representations from private data from local parties, sharing intermediate representations instead of private data for privacy preservation, estimating propensity scores from the shared intermediate representations, and finally, estimating the treatment effects from propensity scores. Through numerical experiments on both artificial and real-world data, we confirm that our method leads to better estimation results than individual analyses. While dimensionality reduction loses some information in the private data and causes performance degradation, we observe that sharing intermediate representations with many parties to resolve the lack of subjects and covariates sufficiently improves performance to overcome the degradation caused by dimensionality reduction. Although external validity is not necessarily guaranteed, our results suggest that DC-QE is a promising method. With the widespread use of our method, intermediate representations can be published as open data to help researchers find causalities and accumulate a knowledge base.
This article introduces a sliding window model for defocus deblurring that achieves the best performance to date with extremely low memory usage. Named Swintormer, the method utilizes a diffusion model to generate latent prior features that assist in restoring more detailed images. It also extends the sliding window strategy to specialized Transformer blocks for efficient inference. Additionally, we have further optimized Multiply-Accumulate operations (Macs). Compared to the currently top-performing GRL method, our Swintormer model drastically reduces computational complexity from 140.35 GMACs to 8.02 GMacs, while also improving the Signal-to-Noise Ratio (SNR) for defocus deblurring from 27.04 dB to 27.07 dB. This new method allows for the processing of higher resolution images on devices with limited memory, significantly expanding potential application scenarios. The article concludes with an ablation study that provides an in-depth analysis of the impact of each network module on final performance. The source code and model will be available at the following website: //github.com/bnm6900030/swintormer.
Knowledge-based Visual Question Answering about Named Entities is a challenging task that requires retrieving information from a multimodal Knowledge Base. Named entities have diverse visual representations and are therefore difficult to recognize. We argue that cross-modal retrieval may help bridge the semantic gap between an entity and its depictions, and is foremost complementary with mono-modal retrieval. We provide empirical evidence through experiments with a multimodal dual encoder, namely CLIP, on the recent ViQuAE, InfoSeek, and Encyclopedic-VQA datasets. Additionally, we study three different strategies to fine-tune such a model: mono-modal, cross-modal, or joint training. Our method, which combines mono-and cross-modal retrieval, is competitive with billion-parameter models on the three datasets, while being conceptually simpler and computationally cheaper.
In order to investigate the relationship between Shannon information measure of random variables, scholars such as Yeung utilized information diagrams to explore the structured representation of information measures, establishing correspondences with sets. However, this method has limitations when studying information measures of five or more random variables. In this paper, we consider employing algebraic methods to study the relationship of information measures of random variables. By introducing a semiring generated by random variables, we establish correspondences between sets and elements of the semiring. Utilizing the Grobner-Shirshov basis, we present the structure of the semiring and its standard form. Furthermore, we delve into the structure of the semiring generated under Markov chain conditions (referred to as Markov semiring), obtaining its standard form.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.