The Internet of Things (IoT) data and social media data are two of the fastest-growing data segments. Having high-quality data is crucial for making informed business decisions. The strategic process of leveraging insights from data is known as data-driven decision-making. To achieve this, it is necessary to collect, store, analyze, and protect data in the best ways possible. Data architecture is a complex task that involves describing the flow of data from its source to its destination and creating a blueprint for managing the data to meet business needs for information. In this paper, we utilize the Data Architecture Tool (DAT) to model data for Digital Space Management Service, which was developed as part of the VASARI project. This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.
In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.
The Multimodal Video Search by Examples (MVSE) project investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element for this process is highly rapid, flexible, search to support large archives, which in MVSE is facilitated by representing video attributes by embeddings. This work aims to mitigate any performance loss from this rapid archive search by examining reranking approaches. In particular, zero-shot reranking methods using large language models are investigated as these are applicable to any video archive audio content. Performance is evaluated for topic-based retrieval on a publicly available video archive, the BBC Rewind corpus. Results demonstrate that reranking can achieve improved retrieval ranking without the need for any task-specific training data.
This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots in order to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for multi-robot IPP. The proposed approach is demonstrated to be fast and accurate on real-world data.
Due to severe societal and environmental impacts, wildfire prediction using multi-modal sensing data has become a highly sought-after data-analytical tool by various stakeholders (such as state governments and power utility companies) to achieve a more informed understanding of wildfire activities and plan preventive measures. A desirable algorithm should precisely predict fire risk and magnitude for a location in real time. In this paper, we develop a flexible spatio-temporal wildfire prediction framework using multi-modal time series data. We first predict the wildfire risk (the chance of a wildfire event) in real-time, considering the historical events using discrete mutually exciting point process models. Then we further develop a wildfire magnitude prediction set method based on the flexible distribution-free time-series conformal prediction (CP) approach. Theoretically, we prove a risk model parameter recovery guarantee, as well as coverage and set size guarantees for the CP sets. Through extensive real-data experiments with wildfire data in California, we demonstrate the effectiveness of our methods, as well as their flexibility and scalability in large regions.
Self-supervised learning (SSL) has shown impressive results in downstream classification tasks. However, there is limited work in understanding their failure modes and interpreting their learned representations. In this paper, we study the representation space of state-of-the-art self-supervised models including SimCLR, SwaV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins. Without the use of class label information, we discover discriminative features that correspond to unique physical attributes in images, present mostly in correctly-classified representations. Using these features, we can compress the representation space by up to 40% without significantly affecting linear classification performance. We then propose Self-Supervised Representation Quality Score (or Q-Score), an unsupervised score that can reliably predict if a given sample is likely to be mis-classified during linear evaluation, achieving AUPRC of 91.45 on ImageNet-100 and 78.78 on ImageNet-1K. Q-Score can also be used as a regularization term on pre-trained encoders to remedy low-quality representations. Fine-tuning with Q-Score regularization can boost the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K compared to their baselines. Finally, using gradient heatmaps and Salient ImageNet masks, we define a metric to quantify the interpretability of each representation. We show that discriminative features are strongly correlated to core attributes and, enhancing these features through Q-score regularization makes SSL representations more interpretable.
The analysis of network data has gained considerable interest in recent years. This also includes the analysis of large, high-dimensional networks with hundreds and thousands of nodes. While exponential random graph models serve as workhorse for network data analyses, their applicability to very large networks is problematic via classical inference such as maximum likelihood or exact Bayesian estimation owing to scaling and instability issues. The latter trace from the fact that classical network statistics consider nodes as exchangeable, i.e., actors in the network are assumed to be homogeneous. This is often questionable. One way to circumvent the restrictive assumption is to include actor-specific random effects, which account for unobservable heterogeneity. However, this increases the number of unknowns considerably, thus making the model highly-parameterized. As a solution even for very large networks, we propose a scalable approach based on variational approximations, which not only leads to numerically stable estimation but is also applicable to high-dimensional directed as well as undirected networks. We furthermore demonstrate that including node-specific covariates can reduce node heterogeneity, which we facilitate through versatile prior formulations and a new measure that we call posterior explained variance. We illustrate our approach in three diverse examples, covering network data from the Italian Parliament, international arms trading, and Facebook; and conduct detailed simulation studies.
Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D) as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. Particularly, our result shows that around half of prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompt, and safety guidance. Our findings suggest that, without comprehensive testing, the evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static, but rather evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge through growth. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention of the learning on growing snapshots of a KG without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, with an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.
Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.