Outsourcing a relational database to the cloud offers several benefits, including scalability, availability, and cost-effectiveness. However, there are concerns about the security and confidentiality of the outsourced data. A common approach is to encrypt the data with a standard encryption algorithm and to store only the ciphertext in the cloud. The problem with this approach, however, is that encryption destroys properties of the data such as ordering, format, and comparability, which are essential for evaluating database queries. One solution is to use encryption algorithms that preserve these properties in the ciphertext, thereby enabling queries over encrypted data. These algorithms range from simple schemes such as the Caesar cipher to secure schemes such as mOPE. The report at hand presents a survey of common encryption techniques used for storing data in relational cloud database services. It presents the applied methods and identifies their characteristics.
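As a minimal illustration of the property-preserving idea (not any specific scheme from the survey, and deliberately insecure), the sketch below uses a fixed additive shift on integers so that ciphertext order matches plaintext order, letting a range predicate be evaluated directly on encrypted values; all names and values are made up.

```python
# Toy illustration (not secure): an additive-shift "encryption" of integers
# that preserves order, so range predicates can be evaluated on ciphertexts.

SECRET_SHIFT = 4242  # stand-in for a key; a real OPE scheme derives this differently

def encrypt(value: int) -> int:
    return value + SECRET_SHIFT

def decrypt(ciphertext: int) -> int:
    return ciphertext - SECRET_SHIFT

salaries = [51000, 73000, 62000, 48000]
encrypted = [encrypt(s) for s in salaries]

# The server can answer "salary > 60000" without seeing plaintexts,
# because the order of ciphertexts matches the order of plaintexts.
threshold_ct = encrypt(60000)
matches = [ct for ct in encrypted if ct > threshold_ct]
print([decrypt(ct) for ct in matches])  # [73000, 62000]
```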
Signed networks, characterized by edges labeled as either positive or negative, offer nuanced insights into interaction dynamics beyond the capabilities of unsigned graphs. Central to this is the task of identifying the maximum balanced subgraph, crucial for applications like polarized community detection in social networks and portfolio analysis in finance. Traditional models, however, are limited by an assumption of perfect partitioning, which fails to mirror the complexities of real-world data. Addressing this gap, we introduce an innovative generalized balanced subgraph model that incorporates tolerance for irregularities. Our proposed region-based heuristic algorithm, tailored for this NP-hard problem, strikes a balance between low time complexity and high-quality outcomes. Comparative experiments validate its superior performance against leading solutions, delivering enhanced effectiveness (notably larger subgraph sizes) and efficiency (achieving up to 100x speedup) in both traditional and generalized contexts.
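For readers unfamiliar with the balance notion that the model generalizes, the classical structural-balance criterion says a signed graph is balanced exactly when its vertices can be split into two camps with positive edges inside camps and negative edges across them. The sketch below (illustrative only, not the paper's region-based heuristic) checks this perfect-partitioning condition with a sign-aware 2-coloring:

```python
from collections import deque

def is_balanced(num_nodes, signed_edges):
    """Check structural balance: positive edges keep endpoints in the same
    camp, negative edges force opposite camps. Returns True iff consistent."""
    adj = {v: [] for v in range(num_nodes)}
    for u, v, sign in signed_edges:  # sign is +1 or -1
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    camp = {}
    for start in range(num_nodes):
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                expected = camp[u] if sign > 0 else 1 - camp[u]
                if v not in camp:
                    camp[v] = expected
                    queue.append(v)
                elif camp[v] != expected:
                    return False  # an unbalanced cycle exists
    return True

# Triangle with two positive edges and one negative edge is unbalanced.
print(is_balanced(3, [(0, 1, +1), (1, 2, +1), (0, 2, -1)]))  # False
```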
We develop a new methodology for forecasting matrix-valued time series with historical matrix data and auxiliary vector time series data. We focus on time series of matrices with observations distributed on a fixed 2-D spatial grid, i.e., spatio-temporal data, and an auxiliary time series of non-spatial vectors. The proposed model, Matrix AutoRegression with Auxiliary Covariates (MARAC), contains an autoregressive component for the historical matrix predictors and an additive component that maps the auxiliary vector predictors to a matrix response via a tensor-vector product. The autoregressive component adopts a bilinear transformation framework following Chen et al. (2021), significantly reducing the number of parameters. The auxiliary component posits that the tensor coefficient, which maps non-spatial predictors to a spatial response, contains slices of spatially smooth matrix coefficients that are discrete evaluations of smooth functions from a Reproducing Kernel Hilbert Space (RKHS). We propose to estimate the model parameters under a penalized maximum likelihood estimation framework coupled with an alternating minimization algorithm. We establish the joint asymptotics of the autoregressive and tensor parameters under fixed and high-dimensional regimes. Extensive simulations and a geophysical application for forecasting the global Total Electron Content (TEC) are conducted to validate the performance of MARAC.
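One plausible reading of the model, written out for concreteness (the notation and exact form are assumptions based on the description above, not necessarily the authors' specification):

```latex
% X_t is the m x n matrix response, z_t the auxiliary vector, E_t matrix noise.
\[
  X_t \;=\; \sum_{p=1}^{P} A_p\, X_{t-p}\, B_p^{\top}
        \;+\; \mathcal{G} \times_3 z_t \;+\; E_t .
\]
% The bilinear terms A_p X_{t-p} B_p^T follow the matrix-autoregression idea of
% Chen et al. (2021); the coefficient tensor G maps the auxiliary vector z_t to
% a matrix via the mode-3 tensor-vector product, with each frontal slice of G
% assumed spatially smooth (discretized functions from an RKHS), and estimation
% carried out by penalized maximum likelihood with alternating minimization.
```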
There is growing interest in visual data management systems that support queries with specialized operations ranging from resizing an image to running complex machine learning models. With a plethora of such operations, the basic need to receive query responses in minimal time suffers, especially when the client wants to run multiple such operations in a single query. Existing systems take an ad-hoc approach in which different solutions are stitched together to provide an end-to-end visual data management system. Unlike such solutions, the Visual Data Management System (VDMS) natively executes queries with multiple operations, thus providing an end-to-end solution. However, a fixed subset of native operations and a synchronous threading architecture limit its generality and scalability. In this paper, we develop VDMS-Async, which adds the capability to run user-defined operations with VDMS and to execute operations within a query on a remote server. VDMS-Async utilizes an event-driven architecture to create an efficient pipeline for executing the operations within a query. Our experiments show that VDMS-Async reduces query execution time by 2-3X compared to existing state-of-the-art systems. Further, remote operations coupled with the event-driven architecture enable VDMS-Async to scale nearly linearly as remote servers are added: we demonstrate a 64X reduction in query execution time with 64 remote servers.
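The following asyncio sketch (illustrative only; it is not VDMS-Async's actual implementation, and all names are hypothetical) shows the general event-driven idea: each operation in a query completes as an awaitable event, so operations dispatched to remote servers do not block other in-flight queries:

```python
import asyncio

# Minimal sketch of an event-driven pipeline: each operation in a query is an
# awaitable stage, so the server is never blocked while a remote operation
# (e.g., a user-defined ML model) is in flight. Names are illustrative.

async def run_remote_op(op_name, payload, server):
    await asyncio.sleep(0.1)  # stand-in for a network round trip to `server`
    return f"{payload}->{op_name}@{server}"

async def execute_query(image_id, operations, servers):
    result = image_id
    for i, op in enumerate(operations):
        # Each stage completes as an event; other queries interleave here.
        result = await run_remote_op(op, result, servers[i % len(servers)])
    return result

async def main():
    queries = [execute_query(f"img{i}", ["resize", "classify"], ["s1", "s2"])
               for i in range(4)]
    print(await asyncio.gather(*queries))  # queries progress concurrently

asyncio.run(main())
```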
In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNNs penalize all misdetections equally, irrespective of their criticality. Consequently, to mitigate the occurrence of critical failure cases, i.e., false negatives, a safety-aware training strategy might be required to enhance the detection performance for critical pedestrians. In this paper, we propose a novel safety-aware loss variant that leverages estimated per-pedestrian criticality scores during training. We exploit the reachability-set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat when quantifying criticality. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-aware loss function mitigates the misdetection of critical pedestrians without sacrificing performance for the general case, i.e., pedestrians outside the safety-critical zone.
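As a hedged illustration of how per-pedestrian criticality scores could enter a detection loss (the paper's exact formulation may differ; the function and parameter names here are hypothetical), one option is to scale the per-object classification loss by a criticality-dependent weight:

```python
import torch
import torch.nn.functional as F

def safety_aware_cls_loss(logits, targets, criticality, alpha=1.0):
    """Per-object cross-entropy scaled by a criticality weight.

    logits:      (N, num_classes) raw scores for N matched detections
    targets:     (N,) ground-truth class indices
    criticality: (N,) scores in [0, 1], e.g. derived from TTC and distance;
                 higher values up-weight safety-critical pedestrians so their
                 misdetections (false negatives) are penalized more.
    """
    per_obj = F.cross_entropy(logits, targets, reduction="none")
    weights = 1.0 + alpha * criticality
    return (weights * per_obj).mean()

# Toy usage
logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 1])
criticality = torch.tensor([0.9, 0.1, 0.0, 0.7])
print(safety_aware_cls_loss(logits, targets, criticality))
```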
The emergence of large language models (LLMs), and their increased use in user-facing systems, has led to substantial privacy concerns. To date, research on these privacy concerns has been model-centered: exploring how LLMs lead to privacy risks like memorization, or can be used to infer personal characteristics about people from their content. We argue that there is a need for more research focusing on the human aspect of these privacy issues: e.g., research on how design paradigms for LLMs affect users' disclosure behaviors, users' mental models and preferences for privacy controls, and the design of tools, systems, and artifacts that empower end-users to reclaim ownership over their personal data. Toward building usable, efficient, and privacy-friendly systems powered by these models, which have imperfect privacy properties, our goal is to initiate discussions that outline an agenda for human-centered research on privacy issues in LLM-powered systems. This Special Interest Group (SIG) aims to bring together researchers with backgrounds in usable security and privacy, human-AI collaboration, NLP, or any other related domains to share their perspectives and experiences on this problem, to help our community establish a collective understanding of the challenges, research opportunities, research methods, and strategies to collaborate with researchers outside of HCI.
Despite the plethora of research devoted to analyzing the impact of disability on travel behavior, few studies have investigated the varying impact of social and environmental factors on the mode choice of people whose disabilities restrict their ability to use transportation modes efficiently. This research gap can be addressed by investigating the factors influencing the mode choice behavior of people with travel-limiting disabilities, which can inform the development of accessible and sustainable transportation systems. Additionally, such studies can provide insights into the social and economic barriers faced by this population group, which can help policymakers promote social inclusion and equity. This study utilized a Random Parameters Logit model to identify the individual, trip, and environmental factors that influence mode selection among people with travel-limiting disabilities. Using the 2017 National Household Travel Survey data for New York State, which included information on respondents with travel-limiting disabilities, the analysis focused on a sample of 8,016 people. In addition, climate data from the National Oceanic and Atmospheric Administration were integrated as additional explanatory variables in the modeling process. The results revealed that people with disabilities may be inclined to walk longer distances in the absence of suitable accommodations for other transportation modes. Furthermore, people were less inclined to walk during summer and winter, indicating that weather conditions are a significant determinant of mode choice. Moreover, low-income people with disabilities were more likely to rely on public transport or walking.
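For reference, the standard random parameters (mixed) logit formulation underlying such an analysis is sketched below; the exact specification used in the study may differ:

```latex
% Individual n's utility for mode j on trip t, with taste coefficients that
% vary across individuals according to a mixing distribution f:
\[
  U_{njt} = \beta_n^{\top} x_{njt} + \varepsilon_{njt},
  \qquad \beta_n \sim f(\beta \mid \theta),
  \qquad \varepsilon_{njt} \ \text{i.i.d. Gumbel},
\]
\[
  P_{njt} = \int \frac{\exp(\beta^{\top} x_{njt})}{\sum_{k} \exp(\beta^{\top} x_{nkt})}\, f(\beta \mid \theta)\, d\beta .
\]
```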
Topological data analysis provides a set of tools to uncover low-dimensional structure in noisy point clouds. Prominent amongst these tools is persistent homology, which summarizes the birth-death times of homological features using data objects known as persistence diagrams. To better aid statistical analysis, a functional representation of the diagrams, known as persistence landscapes, enables the use of functional data analysis and machine learning tools. Topological and geometric variabilities inherent in point clouds are confounded in both persistence diagrams and landscapes, and it is important to distinguish topological signal from noise to draw reliable conclusions on the structure of the point clouds when using persistent homology. We develop a framework for decomposing variability in persistence diagrams into topological signal and topological noise through alignment of persistence landscapes using an elastic Riemannian metric. Aligned landscapes (amplitude) isolate the topological signal. Reparameterizations used for landscape alignment (phase) are linked to a resolution parameter used to generate persistence diagrams, and capture topological noise in the form of geometric, global scaling, and sampling variabilities. We illustrate the importance of decoupling topological signal and topological noise in persistence diagrams (landscapes) using several simulated examples. We also demonstrate that our approach provides novel insights in two real data studies.
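To make the landscape representation concrete, the sketch below computes the k-th persistence landscape directly from a toy persistence diagram via the usual tent-function definition (illustrative only; it is not the paper's alignment framework, and the diagram values are made up):

```python
import numpy as np

def persistence_landscape(diagram, k, grid):
    """k-th persistence landscape evaluated on `grid`.

    diagram: array of (birth, death) pairs from persistent homology
    k:       landscape level (1 = outermost)
    grid:    1-D array of evaluation points t

    lambda_k(t) is the k-th largest value of the tent functions
    min(t - b, d - t) clipped at zero, one tent per (b, d) pair.
    """
    diagram = np.asarray(diagram, dtype=float)
    births, deaths = diagram[:, 0][:, None], diagram[:, 1][:, None]
    tents = np.maximum(np.minimum(grid - births, deaths - grid), 0.0)
    if k > tents.shape[0]:
        return np.zeros_like(grid)
    # Sort tent values at each grid point in descending order, pick level k.
    tents_sorted = -np.sort(-tents, axis=0)
    return tents_sorted[k - 1]

diag = [(0.0, 1.0), (0.2, 0.6)]              # toy persistence diagram
grid = np.linspace(0.0, 1.0, 5)
print(persistence_landscape(diag, 1, grid))  # outermost landscape
```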
Existing recommender systems extract user preferences by learning correlations in data, such as behavioral correlations in collaborative filtering, or feature-feature and feature-behavior correlations in click-through rate prediction. However, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, a recommender system can recommend a battery charger to a user after the purchase of a phone, where the latter is the cause of the former, and such a causal relation cannot be reversed. Recently, to address this, researchers in recommender systems have begun to utilize causal inference to extract causality and thereby enhance the recommender system. In this survey, we comprehensively review the literature on causal inference-based recommendation. First, we present the fundamental concepts of both recommendation and causal inference as the basis for later content. We then outline the typical issues faced by non-causal recommendation. Afterward, we comprehensively review the existing work on causal inference-based recommendation, organized by a taxonomy of the kinds of problems that causal inference addresses. Last, we discuss the open problems in this important research area, along with interesting directions for future work.
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, require significant domain knowledge, and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially more cost-effective manner. In this paper we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
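A minimal sketch of the adversarial ingredient that such image-level and instance-level components typically rely on, a gradient reversal layer feeding a domain classifier, is shown below (Ganin and Lempitsky-style; illustrative PyTorch code under assumed shapes and names, not the authors' full Domain Adaptive Faster R-CNN):

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward
    pass, so the backbone learns features that fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ImageLevelDomainClassifier(nn.Module):
    def __init__(self, in_channels=256, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),  # per-location source/target logit
        )

    def forward(self, feature_map):
        reversed_feat = GradientReversal.apply(feature_map, self.lam)
        return self.net(reversed_feat)

# Toy usage: backbone features from one image, trained with BCE against the
# domain label (0 = source, 1 = target).
feat = torch.randn(1, 256, 38, 50, requires_grad=True)
logits = ImageLevelDomainClassifier()(feat)
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.zeros_like(logits))
loss.backward()
```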