The traditional method of computing singular value decomposition (SVD) of a data matrix is based on a least squares principle, thus, is very sensitive to the presence of outliers. Hence the resulting inferences across different applications using the classical SVD are extremely degraded in the presence of data contamination (e.g., video surveillance background modelling tasks, etc.). A robust singular value decomposition method using the minimum density power divergence estimator (rSVDdpd) has been found to provide a satisfactory solution to this problem and works well in applications. For example, it provides a neat solution to the background modelling problem of video surveillance data in the presence of camera tampering. In this paper, we investigate the theoretical properties of the rSVDdpd estimator such as convergence, equivariance and consistency under reasonable assumptions. Since the dimension of the parameters, i.e., the number of singular values and the dimension of singular vectors can grow linearly with the size of the data, the usual M-estimation theory has to be suitably modified with concentration bounds to establish the asymptotic properties. We believe that we have been able to accomplish this satisfactorily in the present work. We also demonstrate the efficiency of rSVDdpd through extensive simulations.
Remote sensing object detection (RSOD), one of the most fundamental and challenging tasks in the remote sensing field, has received longstanding attention. In recent years, deep learning techniques have demonstrated robust feature representation capabilities and led to a big leap in the development of RSOD techniques. In this era of rapid technical evolution, this review aims to present a comprehensive review of the recent achievements in deep learning based RSOD methods. More than 300 papers are covered in this review. We identify five main challenges in RSOD, including multi-scale object detection, rotated object detection, weak object detection, tiny object detection, and object detection with limited supervision, and systematically review the corresponding methods developed in a hierarchical division manner. We also review the widely used benchmark datasets and evaluation metrics within the field of RSOD, as well as the application scenarios for RSOD. Future research directions are provided for further promoting the research in RSOD.
The focus of this study is to investigate the impact of different initialization strategies for the weight matrix of Successor Features (SF) on learning efficiency and convergence in Reinforcement Learning (RL) agents. Using a grid-world paradigm, we compare the performance of RL agents, whose SF weight matrix is initialized with either an identity matrix, zero matrix, or a randomly generated matrix (using Xavier, He, or uniform distribution method). Our analysis revolves around evaluating metrics such as value error, step length, PCA of Successor Representation (SR) place field, and the distance of SR matrices between different agents. The results demonstrate that RL agents initialized with random matrices reach the optimal SR place field faster and showcase a quicker reduction in value error, pointing to more efficient learning. Furthermore, these random agents also exhibit a faster decrease in step length across larger grid-world environments. The study provides insights into the neurobiological interpretations of these results, their implications for understanding intelligence, and potential future research directions. These findings could have profound implications for the field of artificial intelligence, particularly in the design of learning algorithms.
The objective of generative model inversion is to identify a size-$n$ latent vector that produces a generative model output that closely matches a given target. This operation is a core computational primitive in numerous modern applications involving computer vision and NLP. However, the problem is known to be computationally challenging and NP-hard in the worst case. This paper aims to provide a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this is a strengthening of known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vectors problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega (n)}$ via a reduction from Half-Clique and Vertex-Cover.
A significant challenge in the field of quantum machine learning (QML) is to establish applications of quantum computation to accelerate common tasks in machine learning such as those for neural networks. Ridgelet transform has been a fundamental mathematical tool in the theoretical studies of neural networks, but the practical applicability of ridgelet transform to conducting learning tasks was limited since its numerical implementation by conventional classical computation requires an exponential runtime $\exp(O(D))$ as data dimension $D$ increases. To address this problem, we develop a quantum ridgelet transform (QRT), which implements the ridgelet transform of a quantum state within a linear runtime $O(D)$ of quantum computation. As an application, we also show that one can use QRT as a fundamental subroutine for QML to efficiently find a sparse trainable subnetwork of large shallow wide neural networks without conducting large-scale optimization of the original network. This application discovers an efficient way in this regime to demonstrate the lottery ticket hypothesis on finding such a sparse trainable neural network. These results open an avenue of QML for accelerating learning tasks with commonly used classical neural networks.
We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the pollutant spread, but information regarding prevailing wind directions may be missing or unreliable. We propose to map a discrete set of wind directions to edges in a sparse directed acyclic graph (DAG), accounting for uncertainty in directional correlation patterns across a domain. The resulting Bag of DAGs processes (BAGs) lead to interpretable nonstationarity and scalability for large data due to sparsity of DAGs in the bag. We outline Bayesian hierarchical models using BAGs and illustrate inferential and performance gains of our methods compared to other state-of-the-art alternatives. We analyze fine particulate matter using high-resolution data from low-cost air quality sensors in California during the 2020 wildfire season. An R package is available on GitHub.
In this letter, the average mutual information (AMI) of generalized quadrature spatial modulation (GQSM) is first derived for continuous-input continuous-output channels. Our mathematical analysis shows that the calculation error induced by Monte Carlo integration increases exponentially with the signal-to-noise ratio. This nature of GQSM is resolved by deriving a closed-form expression. The derived AMI is compared with other related SM schemes and evaluated for different antenna activation patterns. Our results show that an equiprobable antenna selection method slightly decreases AMI of symbols, while the method significantly improves AMI in total.
Linear combination is a potent data fusion method in information retrieval tasks, thanks to its ability to adjust weights for diverse scenarios. However, achieving optimal weight training has traditionally required manual relevance judgments on a large percentage of documents, a labor-intensive and expensive process. In this study, we investigate the feasibility of obtaining near-optimal weights using a mere 20\%-50\% of relevant documents. Through experiments on four TREC datasets, we find that weights trained with multiple linear regression using this reduced set closely rival those obtained with TREC's official "qrels." Our findings unlock the potential for more efficient and affordable data fusion, empowering researchers and practitioners to reap its full benefits with significantly less effort.
In the realm of Large Language Models, the balance between instruction data quality and quantity has become a focal point. Recognizing this, we introduce a self-guided methodology for LLMs to autonomously discern and select cherry samples from vast open-source datasets, effectively minimizing manual curation and potential cost for instruction tuning an LLM. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to identify discrepancies between a model's expected responses and its autonomous generation prowess. Through the adept application of IFD, cherry samples are pinpointed, leading to a marked uptick in model training efficiency. Empirical validations on renowned datasets like Alpaca and WizardLM underpin our findings; with a mere 10% of conventional data input, our strategy showcases improved results. This synthesis of self-guided cherry-picking and the IFD metric signifies a transformative leap in the optimization of LLMs, promising both efficiency and resource-conscious advancements.
Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence. It has great fundamental importance and strong industrial needs. Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks, with the help of large amounts of training data and new powerful computation resources. Though recognition accuracy is usually the first concern for new progresses, efficiency is actually rather important and sometimes critical for both academic research and industrial applications. Moreover, insightful views on the opportunities and challenges of efficiency are also highly required for the entire community. While general surveys on the efficiency issue of DNNs have been done from various perspectives, as far as we are aware, scarcely any of them focused on visual recognition systematically, and thus it is unclear which progresses are applicable to it and what else should be concerned. In this paper, we present the review of the recent advances with our suggestions on the new possible directions towards improving the efficiency of DNN-related visual recognition approaches. We investigate not only from the model but also the data point of view (which is not the case in existing surveys), and focus on three most studied data types (images, videos and points). This paper attempts to provide a systematic summary via a comprehensive survey which can serve as a valuable reference and inspire both researchers and practitioners who work on visual recognition problems.
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.