In this research, we propose a novel technique for visualizing nonstationarity in geostatistics, particularly when confronted with a single realization of data at irregularly spaced locations. Our method hinges on formulating a statistic that tracks a stable microergodic parameter of the exponential covariance function, allowing us to address the intricate challenges of nonstationary processes that lack repeated measurements. We implement the fused lasso technique to elucidate nonstationary patterns at various resolutions. For prediction purposes, we segment the spatial domain into stationary sub-regions via Voronoi tessellations. Additionally, we devise a robust test for stationarity based on contrasting the sample means of our proposed statistics between two selected Voronoi subregions. The effectiveness of our method is demonstrated through simulation studies and its application to a precipitation dataset in Colorado.
Mediation analysis is an important statistical tool in many research fields. Its aim is to investigate the mechanism along the causal pathway between an exposure and an outcome. The joint significance test is widely utilized as a prominent statistical approach for examining mediation effects in practical applications. Nevertheless, the limitation of this mediation testing method stems from its conservative Type I error, which reduces its statistical power and imposes certain constraints on its popularity and utility. The proposed solution to address this gap is the adaptive joint significance test for one mediator, a novel data-adaptive test for mediation effect that exhibits significant advancements compared to traditional joint significance test. The proposed method is designed to be user-friendly, eliminating the need for complicated procedures. We have derived explicit expressions for size and power, ensuring the theoretical validity of our approach. Furthermore, we extend the proposed adaptive joint significance tests for small-scale mediation hypotheses with family-wise error rate (FWER) control. Additionally, a novel adaptive Sobel-type approach is proposed for the estimation of confidence intervals for the mediation effects, demonstrating significant advancements over conventional Sobel's confidence intervals in terms of achieving desirable coverage probabilities. Our mediation testing and confidence intervals procedure is evaluated through comprehensive simulations, and compared with numerous existing approaches. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system. Such systems have the potential to address a wide range of vision tasks simultaneously, without being limited to specific problems or data domains. This universality is crucial for practical, real-world computer vision applications. In this study, our focus is on a specific challenge: the large-scale, multi-domain universal object detection problem, which contributes to the broader goal of achieving a universal vision system. This problem presents several intricate challenges, including cross-dataset category label duplication, label conflicts, and the necessity to handle hierarchical taxonomies. To address these challenges, we introduce our approach to label handling, hierarchy-aware loss design, and resource-efficient model training utilizing a pre-trained large vision model. Our method has demonstrated remarkable performance, securing a prestigious second-place ranking in the object detection track of the Robust Vision Challenge 2022 (RVC 2022) on a million-scale cross-dataset object detection benchmark. We believe that our comprehensive study will serve as a valuable reference and offer an alternative approach for addressing similar challenges within the computer vision community. The source code for our work is openly available at //github.com/linfeng93/Large-UniDet.
AI in Math deals with mathematics in a constructive manner so that reasoning becomes automated, less laborious, and less error-prone. For algorithms, the question becomes how to automate analyses for specific problems. For the first time, this work provides an automatic method for approximation analysis on a well-studied problem in theoretical computer science: computing approximate Nash equilibria in two-player games. We observe that such algorithms can be reformulated into a search-and-mix paradigm, which involves a search phase followed by a mixing phase. By doing so, we are able to fully automate the procedure of designing and analyzing the mixing phase. For example, we illustrate how to perform our method with a program to analyze the approximation bounds of all the algorithms in the literature. Same approximation bounds are computed without any hand-written proof. Our automatic method heavily relies on the LP-relaxation structure in approximate Nash equilibria. Since many approximation algorithms and online algorithms adopt the LP relaxation, our approach may be extended to automate the analysis of other algorithms.
In this article, we explore two effective means to communicate the concept of walkability - 1) visualization, and 2) descriptive statistics. We introduce the concept of walkability as measuring the quality of an urban space based on the distance needed to walk from that space to a range of different social, environmental, and economic amenities. We use Dublin city as a worked example and explore quantification and visualization of walkability of various areas of the city. We utilize the Google Map API and Tableau to visualize the less walkable areas across Dublin city and using WLS regression, we assess the effects of unwalkability on house prices in Dublin, thus quantifying the importance of walkable areas from an economic perspective.
In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP results in general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.
In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. In particular, different from other highly mature tasks, e.g., 2D object detection, the community of image-based 3D object detection is still evolving, where methods often adopt different training recipes and tricks resulting in unfair evaluations and comparisons. What is worse, these tricks may overwhelm their proposed designs in performance, even leading to wrong conclusions. To address this issue, we build a module-designed codebase and formulate unified training standards for the community. Furthermore, we also design an error diagnosis toolbox to measure the detailed characterization of detection models. Using these tools, we analyze current methods in-depth under varying settings and provide discussions for some open questions, e.g., discrepancies in conclusions on KITTI-3D and nuScenes datasets, which have led to different dominant methods for these datasets. We hope that this work will facilitate future research in image-based 3D object detection. Our codes will be released at \url{//github.com/OpenGVLab/3dodi}
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to finetune on heavy transformer models. To this end, we propose to utilize prompt learning and mitigate the above two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while only requiring less than 1% learnable parameters compared to training the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modality. Extensive experiments are conducted to show the effectiveness of our prompt learning framework that improves the performance under various missing-modality cases, while alleviating the requirement of heavy model re-training. Code is available.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progresses in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics in human languages - distinctiveness for each caption and diversity for different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages, and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.