We study a crowdsourcing problem in which the platform aims to incentivize distributed workers to provide high-quality, truthful solutions without being able to verify those solutions. While most prior work assumes that the platform and workers have symmetric information, we study an asymmetric-information scenario in which the platform has an informational advantage. Specifically, the platform has additional information about workers' average solution accuracy and can strategically reveal this information to workers. Workers use the announced information to estimate the likelihood of obtaining a reward if they exert effort on the task. We study two types of workers: naive workers, who fully trust the announcement, and strategic workers, who update their prior beliefs based on the announcement. For naive workers, we show that the platform should always announce a high average accuracy to maximize its payoff. This is not always optimal for strategic workers, however, as it may reduce the credibility of the platform's announcements and hence the platform's payoff. Interestingly, the platform may even have an incentive to announce an average accuracy lower than the actual value when facing strategic workers. Another counterintuitive result is that the platform's payoff may decrease as the number of high-accuracy workers grows.
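To make the strategic workers' behavior concrete, here is a minimal sketch of the belief update (our notation, not necessarily the paper's): suppose a worker holds prior $p$ that the average accuracy is high ($\theta = H$ rather than $\theta = L$), and the platform announces $m$. A strategic worker applies Bayes' rule against the platform's announcement strategy,

\[
\Pr[\theta = H \mid m] \;=\; \frac{\Pr[m \mid \theta = H]\, p}{\Pr[m \mid \theta = H]\, p \;+\; \Pr[m \mid \theta = L]\,(1 - p)},
\]

so an announcement the platform would send regardless of the true state carries no information ($\Pr[m \mid H] = \Pr[m \mid L]$ leaves the posterior at the prior). This is the sense in which always announcing high accuracy can destroy credibility, whereas a naive worker simply takes $m$ at face value.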
This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results estimate the convergence rates of GANs under a collection of integral probability metrics defined through H\"older classes, including the Wasserstein distance as a special case. We also show that, when the network architectures are chosen properly, GANs can adaptively learn data distributions that have low-dimensional structure or H\"older densities. In particular, for distributions concentrated around a low-dimensional set, we prove that the learning rates of GANs depend on the lower intrinsic dimension rather than the high ambient dimension. Our analysis is based on a new oracle inequality that decomposes the estimation error into the generator and discriminator approximation errors and a statistical error, which may be of independent interest.
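For reference, the integral probability metric (IPM) induced by a function class $\mathcal{F}$ is

\[
d_{\mathcal{F}}(\mu, \nu) \;=\; \sup_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{x \sim \nu}[f(x)],
\]

which recovers the Wasserstein-1 distance when $\mathcal{F}$ is the class of 1-Lipschitz functions (the H\"older class with smoothness $\beta = 1$). In this notation, the oracle inequality mentioned above bounds the estimation error $d_{\mathcal{F}}(\hat{\mu}_n, \mu)$ by the sum of a generator approximation error, a discriminator approximation error, and a statistical error term; the precise constants and rates are those derived in the paper.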
Recently, Unmanned Aerial Vehicle (UAV)-based communication systems have attracted increasing research and commercial interest due to their cost-effective deployment and ease of mobility. During natural disasters and emergencies, such networks are extremely useful for providing communication service. In these scenarios, the UAVs' positions and trajectories must be optimized to maintain Quality of Service at the user end. This paper focuses on the deployment of an SDN-based UAV network that provides communication service to users, considering deployments at stadiums and large events. We propose a scheme to allocate UAVs to users and a traffic congestion algorithm that reduces the number of dropped packets, thereby avoiding re-transmissions from the user end. We also propose an energy-efficient multi-hop routing mechanism that avoids the high transmission power required to reach longer distances. We assume that all back-haul links have sufficient capacity to carry the traffic from the front-haul links, and that the UAV design must account for the power requirements of both flight and transmission.
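To illustrate why multi-hop relaying saves transmission energy (an illustration, not the paper's exact mechanism): radio transmission power typically grows superlinearly with distance, roughly as $d^{\alpha}$ with path-loss exponent $\alpha \ge 2$, so energy-efficient routing can be cast as a shortest-path problem over per-hop energy costs. A minimal Python sketch, with a hypothetical adjacency structure:

```python
import heapq

def min_energy_route(adj, src, dst):
    """Dijkstra over per-hop transmission-energy costs.
    adj: {node: [(neighbor, energy_cost), ...]} -- hypothetical structure."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

# Energy ~ distance**alpha, so two short hops can beat one long hop:
adj = {"u": [("relay", 1.0), ("bs", 4.0)], "relay": [("bs", 1.0)], "bs": []}
print(min_energy_route(adj, "u", "bs"))  # (['u', 'relay', 'bs'], 2.0)
```

Here the minimum-cost path naturally prefers several short hops over one long, power-hungry hop, which is the intuition behind the multi-hop mechanism described above.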
Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used to collect them. In this paper, we compare the efficacy of interventions proposed in prior work for improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial, assigning crowdworkers to write questions under one of four data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, training crowdworkers and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data. Using crowdsourced judgments instead of expert judgments to qualify workers and send feedback, by contrast, does not prove effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human--model gap on the unanimous-agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.
Collaborative Mobile Crowdsourcing (CMCS) allows entities, e.g., local authorities or individuals, to hire a team of workers from the crowd of connected people to execute complex tasks. In this paper, we investigate two CMCS recruitment strategies that allow task requesters to form teams of socially connected and skilled workers: i) a platform-based strategy, where the platform exploits its own knowledge about the workers to form a team, and ii) a leader-based strategy, where the platform designates a group leader who recruits a suitable team based on its own knowledge of its Social Network (SN) neighbors. We first formulate recruitment as an Integer Linear Program (ILP) that optimally forms teams according to four fuzzy-logic-based criteria: level of expertise, social relationship strength, recruitment cost, and recruiter's confidence level. To cope with NP-hardness, we design a novel low-complexity CMCS recruitment approach that relies on Graph Neural Networks (GNNs), specifically graph embedding and clustering techniques, to shrink the worker search space, and then exploits a meta-heuristic genetic algorithm to select appropriate workers. Simulation results on a real-world dataset illustrate the performance of both proposed CMCS recruitment approaches. Our low-complexity GNN-based recruitment algorithm achieves performance close to that of the baseline ILP with significant savings in computation time and the ability to operate on large-scale mobile crowdsourcing platforms. We also show that, compared to the leader-based strategy, the platform-based strategy recruits a more skilled team, but with weaker SN relationships and at a higher cost.
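A toy sketch of the search-space-shrinking idea follows, using a simplified stand-in for the trained GNN: a spectral embedding of the social graph, assuming a symmetric 0/1 adjacency matrix over workers (function and parameter names are ours):

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_clusters(adj, n_clusters=3, dim=4, seed=0):
    """Embed workers via the normalized graph Laplacian, then cluster them
    so recruitment only searches within a socially cohesive cluster."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(lap)    # eigenvectors, ascending eigenvalues
    embedding = eigvecs[:, 1:dim + 1]   # skip the trivial first eigenvector
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(embedding)
```

In the pipeline described above, the cluster containing the requester (or the designated leader) would then define the reduced candidate pool over which the genetic algorithm searches for the final team.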
We study the distributed synthesis of policies for multi-agent systems performing \emph{spatial-temporal} tasks. We formalize the synthesis problem as a \emph{factored} Markov decision process subject to \emph{graph temporal logic} specifications, where the transition function and task of each agent depend on the agent itself and its neighboring agents. We develop a distributed synthesis method that improves scalability and runtime by two orders of magnitude compared to our prior work. The method decomposes the problem into a set of smaller problems, one for each agent, by leveraging the structure in the model and the specifications. We show that the running time of the method is linear in the number of agents, and that the size of each agent's problem is exponential only in the number of its neighboring agents, which is typically much smaller than the total number of agents. We demonstrate the applicability of the method in case studies on disease control, urban security, and search and rescue. The numerical examples show that the method scales to hundreds of agents with hundreds of states per agent, and can handle significantly larger state spaces than our prior work.
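In a factored MDP of this kind, the joint transition kernel decomposes across agents. Writing $N(i)$ for agent $i$ together with its neighbors, a sketch of the factorization (our notation) is

\[
P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P_i\big(s_i' \,\big|\, s_{N(i)},\, a_i\big),
\]

so each per-agent subproblem ranges only over the joint states of $N(i)$, of size $\prod_{j \in N(i)} |S_j|$. Summing over agents gives total work linear in the number of agents $n$ but exponential in $\max_i |N(i)|$, matching the complexity claims above.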
Current public transportation systems are unable to keep up with growing passenger demand as urban populations increase. Slow or absent improvements to public transportation push people toward private transportation modes such as carpooling and ridesharing; meanwhile, the occupancy rate of personal vehicles has been dropping in many cities. In this paper, we propose a centralized transit system that integrates public transit and ridesharing, matching drivers with public transit riders so that the riders achieve shorter travel times. The optimization goal of the system is to assign as many riders to drivers as possible. We describe an exact approach and approximation algorithms to achieve this goal, and we conduct an extensive computational study to show the effectiveness of the transit system under the different approximation algorithms. Our experiments are based on real-world traffic data from the city of Chicago; the datasets include both public transit and ridesharing trip information. The results show that our system is able to assign more than 60% of riders to drivers, leading to a substantial increase in the occupancy rate of personal vehicles and a reduction in riders' travel time.
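The core assignment objective can be illustrated as maximum bipartite matching (an illustration of the optimization goal, not the paper's exact algorithm; node names are hypothetical, and each driver is assumed to take one rider for simplicity):

```python
import networkx as nx

G = nx.Graph()
drivers = ["d0", "d1", "d2"]
riders = ["r0", "r1", "r2", "r3"]
G.add_nodes_from(drivers, bipartite=0)
G.add_nodes_from(riders, bipartite=1)
# An edge means matching this rider with this driver shortens the rider's trip.
G.add_edges_from([("d0", "r0"), ("d0", "r1"), ("d1", "r1"),
                  ("d1", "r2"), ("d2", "r2"), ("d2", "r3")])

matching = nx.bipartite.maximum_matching(G, top_nodes=drivers)
print({d: matching[d] for d in drivers if d in matching})
# e.g. {'d0': 'r0', 'd1': 'r1', 'd2': 'r2'} -- three of four riders served
```

Maximizing the number of matched riders is exactly the stated goal of assigning as many riders to drivers as possible; the paper's exact and approximation algorithms operate on this kind of feasibility structure at city scale.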
Bluetooth Low Energy devices have become ubiquitous and are widely used for different applications. Among these, Bluetooth trackers are becoming popular, as they allow users to track the location of their physical objects. To do so, Bluetooth trackers are often built into other commercial products and connected to a larger crowdsourced tracking system. Such a system, however, can pose a threat to the security and privacy of users, for instance by revealing the location of a user's valuable object. In this paper, we introduce a set of security properties and investigate the state of commercial crowdsourced tracking systems, which exhibit common design flaws that make them insecure. Leveraging the results of our investigation, we propose a new design for a secure crowdsourced tracking system (SECrow), which allows devices to enjoy the benefits of the crowdsourced model without sacrificing security and privacy. Our preliminary evaluation shows that SECrow is a practical, secure, and effective crowdsourced tracking solution.
A key challenge in big data analytics is collecting a large volume of (labeled) data. Crowdsourcing aims to address this challenge by aggregating and estimating high-quality data (e.g., sentiment labels for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the quality of data aggregated from unreliable or noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored. We aim to bridge this gap. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated result. We formulate the proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses that reduce the impact of malicious clients, and our empirical results show that these defenses substantially reduce the estimation errors caused by the data poisoning attacks.
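Abstractly, with benign data $X_b$, attacker-controlled data $X_m$, an aggregation method $A$, and ground truth $y^{*}$, the attack described above takes the form

\[
\max_{X_m} \; \mathcal{E}\big(A(X_b \cup X_m),\, y^{*}\big),
\]

where $\mathcal{E}$ measures the estimation error of the aggregated output. This is a generic formulation in our notation; the paper's concrete instantiation depends on the aggregation method under attack. The defenses correspondingly aim to detect or down-weight clients whose contributions push the output of $A$ far from the consensus.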
While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, these may not account for perturbations encountered in many real-world settings. In such cases, although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. with the training domain but deviates from it. While this deviation may not be known exactly, its broad characterization is specified a priori in terms of attributes. We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space, without access to data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations and the outer minimization finding model parameters by optimizing the loss on the perturbations generated by the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations: object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations, and we show the robustness gains of networks trained with our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
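The training objective is the standard min-max form, instantiated over the attribute space rather than an $\ell_p$ ball: with attribute-parameterized transformations $T(x, \delta)$ (e.g., rotation by an angle $\delta$) and admissible set $\Delta$,

\[
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\delta \in \Delta} \; \ell\big(f_{\theta}(T(x, \delta)),\, y\big) \Big],
\]

where the inner maximization generates the adversarial perturbation in attribute space and the outer minimization fits the model parameters $\theta$. The notation is ours; the paper's exact parameterization of the attribute space may differ.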
Privacy is a major concern for users of personalized services such as recommender systems. When such services are applied in health informatics, users' privacy concerns may be amplified, but the potential utility is also high. Despite the availability of technologies such as k-anonymity, differential privacy, privacy-aware recommendation, and personalized privacy trade-offs, little research has been conducted on users' willingness to share health data for use in such systems. In two conjoint-decision studies (sample size n=521), we investigate the importance and utility of k-anonymity and differential privacy as privacy-preserving techniques for sharing personal health data. Users were asked to pick a preferred sharing scenario depending on the recipient of the data, the benefit of sharing, the type of data, and the parameterized privacy level. Users objected to sharing data about mental illnesses for commercial purposes and under high de-anonymization risk, but showed little concern when data about physical illnesses was used for scientific purposes. Suggestions for health recommender system development are derived from these findings.
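For reference, the "parameterized privacy" varied across scenarios corresponds to standard guarantees: a dataset is $k$-anonymous if every record is indistinguishable from at least $k-1$ others on its quasi-identifiers, and a mechanism $M$ is $\varepsilon$-differentially private if, for all neighboring datasets $D, D'$ and all outcome sets $S$,

\[
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S],
\]

so larger $k$ and smaller $\varepsilon$ correspond to stronger protection.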