Recent legislation required AI platforms to provide APIs for regulators to assess their compliance with the law. Research has nevertheless shown that platforms can manipulate their API answers through fairwashing. Facing this threat for reliable auditing, this paper studies the benefits of the joint use of platform scraping and of APIs. In this setup, we elaborate on the use of scraping to detect manipulated answers: since fairwashing only manipulates API answers, exploiting scraps may reveal a manipulation. To abstract the wide range of specific API-scrap situations, we introduce a notion of proxy that captures the consistency an auditor might expect between both data sources. If the regulator has a good proxy of the consistency, then she can easily detect manipulation and even bypass the API to conduct her audit. On the other hand, without a good proxy, relying on the API is necessary, and the auditor cannot defend against fairwashing. We then simulate practical scenarios in which the auditor may mostly rely on the API to conveniently conduct the audit task, while maintaining her chances to detect a potential manipulation. To highlight the tension between the audit task and the API fairwashing detection task, we identify Pareto-optimal strategies in a practical audit scenario. We believe this research sets the stage for reliable audits in practical and manipulation-prone setups.
Randomized experiments (REs) are the cornerstone for treatment effect evaluation. However, due to practical considerations, REs may encounter difficulty recruiting sufficient patients. External controls (ECs) can supplement REs to boost estimation efficiency. Yet, there may be incomparability between ECs and concurrent controls (CCs), resulting in misleading treatment effect evaluation. We introduce a novel bias function to measure the difference in the outcome mean functions between ECs and CCs. We show that the ANCOVA model augmented by the bias function for ECs renders a consistent estimator of the average treatment effect, regardless of whether or not the ANCOVA model is correct. To accommodate possibly different structures of the ANCOVA model and the bias function, we propose a double penalty integration estimator (DPIE) with different penalization terms for the two functions. With an appropriate choice of penalty parameters, our DPIE ensures consistency, oracle property, and asymptotic normality even in the presence of model misspecification. DPIE is more efficient than the estimator derived from REs alone, validated through theoretical and experimental results.
This paper presents a new method to describe spatio-temporal relations between objects and hands, to recognize both interactions and activities within video demonstrations of manual tasks. The approach exploits Scene Graphs to extract key interaction features from image sequences while simultaneously encoding motion patterns and context. Additionally, the method introduces event-based automatic video segmentation and clustering, which allow for the grouping of similar events and detect if a monitored activity is executed correctly. The effectiveness of the approach was demonstrated in two multi-subject experiments, showing the ability to recognize and cluster hand-object and object-object interactions without prior knowledge of the activity, as well as matching the same activity performed by different subjects.
Building interactive data interfaces is hard because the design of an interface depends on the data processing needs for the underlying analysis task, yet we do not have a good representation for analysis tasks. To fill this gap, this paper advocates for a Data Interface Grammar (DIG) as an intermediate representation of analysis tasks. We show that DIG is compatible with existing data engineering practices, compact to represent any analysis, simple to translate into an interface design, and amenable to offline analysis. We further illustrate the potential benefits of this abstraction, such as automatic interface generation, automatic interface backend optimization, tutorial generation, and workload generation.
This paper describes an auditory display of Hawaii's 2019 coral bleaching data via means of spatial audio and parameter mapping methods. Selected data fields spanning 78 days are mapped to sound surrogates of coral reefs' natural soundscapes, which are progressively altered in their constituent elements as the corresponding coral locations undergo bleaching. For some of these elements, this process outlines a trajectory from a dense to a sparser, reduced soundscape, while for others it translates moving away from harmonic tones and towards complex spectra. This experiment is accompanied by a short evaluation study to contextualize it in an established aesthetic perspective space and to probe its potential for public engagement in the discourse around climate change.
In this study, we analyze data-scarce classification scenarios, where available labeled legal data is small and imbalanced, potentially hurting the quality of the results. We focused on two finetuning objectives; SetFit (Sentence Transformer Finetuning), a contrastive learning setup, and a vanilla finetuning setup on a legal provision classification task. Additionally, we compare the features that are extracted with LIME (Local Interpretable Model-agnostic Explanations) to see which particular features contributed to the model's classification decisions. The results show that a contrastive setup with SetFit performed better than vanilla finetuning while using a fraction of the training samples. LIME results show that the contrastive learning approach helps boost both positive and negative features which are legally informative and contribute to the classification results. Thus a model finetuned with a contrastive objective seems to base its decisions more confidently on legally informative features.
Massive captured face images are stored in the database for the identification of individuals. However, the stored images can be observed intentionally or unintentionally by data managers, which is not at the will of individuals and may cause privacy violations. Existing protection works only slightly change the visual content of the face while maintaining the utility of identification, making it susceptible to the inference of the true identity by human vision. In this paper, we propose an identity hider that enables significant visual content change for human vision while preserving high identifiability for face recognizers. Firstly, the identity hider generates a virtual face with new visual content by manipulating the latent space in StyleGAN2. In particular, the virtual face has the same irrelevant attributes as the original face, e.g., pose and expression. Secondly, the visual content of the virtual face is transferred into the original face and then the background is replaced with the original one. In addition, the identity hider has strong transferability, which ensures an arbitrary face recognizer can achieve satisfactory accuracy. Adequate experiments show that the proposed identity hider achieves excellent performance on privacy protection and identifiability preservation.
Edge computing facilitates low-latency services at the network's edge by distributing computation, communication, and storage resources within the geographic proximity of mobile and Internet-of-Things (IoT) devices. The recent advancement in Unmanned Aerial Vehicles (UAVs) technologies has opened new opportunities for edge computing in military operations, disaster response, or remote areas where traditional terrestrial networks are limited or unavailable. In such environments, UAVs can be deployed as aerial edge servers or relays to facilitate edge computing services. This form of computing is also known as UAV-enabled Edge Computing (UEC), which offers several unique benefits such as mobility, line-of-sight, flexibility, computational capability, and cost-efficiency. However, the resources on UAVs, edge servers, and IoT devices are typically very limited in the context of UEC. Efficient resource management is, therefore, a critical research challenge in UEC. In this article, we present a survey on the existing research in UEC from the resource management perspective. We identify a conceptual architecture, different types of collaborations, wireless communication models, research directions, key techniques and performance indicators for resource management in UEC. We also present a taxonomy of resource management in UEC. Finally, we identify and discuss some open research challenges that can stimulate future research directions for resource management in UEC.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. In future, learning on small data is one of the ultimate purposes of AI, which requires machines to recognize objectives and scenarios relying on small data as humans. A series of machine learning models is going on this way such as active learning, few-shot learning, deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data using a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Later, some potential learning scenarios that may benefit from small data learning are then summarized, and their potential learning scenarios are also analyzed. Finally, some challenging applications such as computer vision, natural language processing that may benefit from learning on small data are also surveyed.
Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in locations close to where data is captured based on artificial intelligence. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although recently emerged, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey on the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.
Driven by the visions of Internet of Things and 5G communications, the edge computing systems integrate computing, storage and network resources at the edge of the network to provide computing infrastructure, enabling developers to quickly develop and deploy edge applications. Nowadays the edge computing systems have received widespread attention in both industry and academia. To explore new research opportunities and assist users in selecting suitable edge computing systems for specific applications, this survey paper provides a comprehensive overview of the existing edge computing systems and introduces representative projects. A comparison of open source tools is presented according to their applicability. Finally, we highlight energy efficiency and deep learning optimization of edge computing systems. Open issues for analyzing and designing an edge computing system are also studied in this survey.