Suppose it is of interest to characterize effect heterogeneity of an intervention across levels of a baseline covariate using only pre- and post- intervention outcome measurements from those who received the intervention, i.e. with no control group. For example, a researcher concerned with equity may wish to ascertain whether a minority group benefited less from an intervention than the majority group. We introduce the `subgroup parallel trends' assumption that the counterfactual untreated outcomes in each subgroup of interest follow parallel trends pre- and post- intervention. Under the subgroup parallel trends assumption, it is straightforward to show that a simple `subgroup difference in differences' (SDiD) expression (i.e., the average pre/post outcome difference in one subgroup subtracted by the average pre/post outcome difference in the other subgroup) identifies the difference between the intervention's effects in the two subgroups. This difference in effects across subgroups is identified even though the conditional effects in each subgroup are not. The subgroup parallel trends assumption is not stronger than the standard parallel trends assumption across treatment groups when a control group is available, and there are circumstances where it is more plausible. Thus, when effect modification by a baseline covariate is of interest, researchers might consider SDiD whether or not a control group is available.
We show that the essential properties of entropy (monotonicity, additivity and subadditivity) are consequences of entropy being a monoidal natural transformation from the under category functor $-/\mathsf{LProb}_{\rho}$ (where $\mathsf{LProb}_{\rho}$ is category of $\ell_{\rho}$ discrete probability spaces) to $\Delta_{\mathbb{R}}$. Moreover, the Shannon entropy can be characterized as the universal monoidal natural transformation from $-/\mathsf{LProb}_{\rho}$ to the category of "strongly regularly ordered" vector spaces (a reflective subcategory of the lax-slice 2-category over $\mathsf{MonCat}_{\ell}$ in the 2-category of monoidal categories), providing a succinct characterization of Shannon entropy as a reflection arrow. We can likewise define entropy for every category with a monoidal structure on its under categories (e.g. the category of finite abelian groups, the category of finite inhabited sets, the category of finite dimensional vector spaces, and the augmented simplex category) via the reflection arrow to the reflective subcategory of strongly regularly ordered vector spaces. This implies that all these entropies over different categories are components of a single natural transformation (the unit of the idempotent monad), allowing us to connect these entropies in a natural manner. We also provide a universal characterization of the conditional Shannon entropy based on the chain rule which, unlike the characterization of information loss by Baez, Fritz and Leinster, does not require any continuity assumption.
Novel view synthesis and 3D modeling using implicit neural field representation are shown to be very effective for calibrated multi-view cameras. Such representations are known to benefit from additional geometric and semantic supervision. Most existing methods that exploit additional supervision require dense pixel-wise labels or localized scene priors. These methods cannot benefit from high-level vague scene priors provided in terms of scenes' descriptions. In this work, we aim to leverage the geometric prior of Manhattan scenes to improve the implicit neural radiance field representations. More precisely, we assume that only the knowledge of the indoor scene (under investigation) being Manhattan is known -- with no additional information whatsoever -- with an unknown Manhattan coordinate frame. Such high-level prior is used to self-supervise the surface normals derived explicitly in the implicit neural fields. Our modeling allows us to cluster the derived normals and exploit their orthogonality constraints for self-supervision. Our exhaustive experiments on datasets of diverse indoor scenes demonstrate the significant benefit of the proposed method over the established baselines. The source code will be available at //github.com/nikola3794/normal-clustering-nerf.
Data augmentation is widely used in text classification, especially in the low-resource regime where a few examples for each class are available during training. Despite the success, generating data augmentations as hard positive examples that may increase their effectiveness is under-explored. This paper proposes an Adversarial Word Dilution (AWD) method that can generate hard positive examples as text data augmentations to train the low-resource text classification model efficiently. Our idea of augmenting the text data is to dilute the embedding of strong positive words by weighted mixing with unknown-word embedding, making the augmented inputs hard to be recognized as positive by the classification model. We adversarially learn the dilution weights through a constrained min-max optimization process with the guidance of the labels. Empirical studies on three benchmark datasets show that AWD can generate more effective data augmentations and outperform the state-of-the-art text data augmentation methods. The additional analysis demonstrates that the data augmentations generated by AWD are interpretable and can flexibly extend to new examples without further training.
Even though the analysis of unsteady 2D flow fields is challenging, fluid mechanics experts generally have an intuition on where in the simulation domain specific features are expected. Using this intuition, showing similar regions enables the user to discover flow patterns within the simulation data. When focusing on similarity, a solid mathematical framework for a specific flow pattern is not required. We propose a technique that visualizes similar and dissimilar regions with respect to a region selected by the user. Using infinitesimal strain theory, we capture the strain and rotation progression and therefore the dynamics of fluid parcels along pathlines, which we encode as distributions. We then apply the Jensen-Shannon divergence to compute the (dis)similarity between pathline dynamics originating in a user-defined flow region and the pathline dynamics of the flow field. We validate our method by applying it to two simulation datasets of two-dimensional unsteady flows. Our results show that our approach is suitable for analyzing the similarity of time-dependent flow fields.
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries to leverage any exposed information. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a novel connection between GNN and SCM while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs.
Substantial efforts have been devoted more recently to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories are small scale, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In the paper, we provide a comprehensive review of the recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name as DIOR. The dataset contains 23463 images and 192472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale on the object categories, on the object instance number, and on the total image number; 2) has a large range of object size variations, not only in terms of spatial resolutions, but also in the aspect of inter- and intra-class size variability across objects; 3) holds big variations as the images are obtained with different imaging conditions, weathers, seasons, and image quality; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help the researchers to develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task has been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.