The rise of mobile apps has brought greater convenience and customization for users. However, many apps use analytics services to collect a wide range of user interaction data purportedly to improve their service, while presenting app users with vague or incomplete information about this collection in their privacy policies. Typically, such policies neglect to describe all types of user interaction data or how the data is collected. User interaction data is not directly regulated by privacy legislation such as the GDPR. However, the extent and hidden nature of its collection means both that apps are walking a legal tightrope and that users' trust is at risk. To facilitate transparency and comparison, and based on common phrases used in published privacy policies and Android documentation, we make a standardized collection claim template. Based on static analysis of actual data collection implementations, we compare the privacy policy claims of the top 10 apps to fact-checked collection claims. Our findings reveal that all the claims made by these apps are incomplete. By providing a standardized way of describing user interaction data collection in mobile apps and comparing actual collection practices to privacy policies, this work aims to increase transparency and establish trust between app developers and users.
Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations along with several applications of its methods. The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods. Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners who are interested in developing new ULTR solutions or utilizing them in real-world applications.
With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arise: should AI tools share revenue with their training data providers in additional to traditional stakeholders and shareholders? The answer is Yes. Large AI tools, such as large language models, always require more and better quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.
While underlying the many ways to build strong cooperation settings between regulators and CSOs, this report focuses on making concrete recommendations for the design of an efficient and influential expert group with the European Commission. The creation of an expert group finds its roots in article 64 and recital 137 of the DSA which require the Commission to develop Union expertise and capabilities. Once established, the experts of this group will be able to bring evidence-based information directly to the Commission and specific expertise on the protection of fundamental rights and the safety of users online. By instituting an expert group, the Commission will not only benefit from valuable expert knowledge but will also demonstrate its willingness to put in place an efficient enforcement system based on collective intelligence. Aside from the establishment of an expert group, other cumulative mechanisms will also help the DSA's enforcement to thrive. Civil society organisations should, for instance, consider organising regular crowdsourcing events to deep-dive and analyse the data published by entities covered by the transparency obligations. As it has done in the past, the Commission can sponsor these events and be a direct beneficiary of their results. Another way for civil society organisations to bring information to the Regulator is by legal action, including by making complaints to the regulators.
Separating environmental effects from those of interspecific interactions on species distributions has always been a central objective of community ecology. Despite years of effort in analysing patterns of species co-occurrences and the developments of sophisticated tools, we are still unable to address this major objective. A key reason is that the wealth of ecological knowledge is not sufficiently harnessed in current statistical models, notably the knowledge on interspecific interactions. Here, we develop ELGRIN, a statistical model that simultaneously combines knowledge on interspecific interactions (i.e., the metanetwork), environmental data and species occurrences to tease apart their relative effects on species distributions. Instead of focusing on single effects of pairwise species interactions, which have little sense in complex communities, ELGRIN contrasts the overall effect of species interactions to that of the environment. Using various simulated and empirical data, we demonstrate the suitability of ELGRIN to address the objectives for various types of interspecific interactions like mutualism, competition and trophic interactions. We then apply the model on vertebrate trophic networks in the European Alps to map the effect of biotic interactions on species distributions.Data on ecological networks are everyday increasing and we believe the time is ripe to mobilize these data to better understand biodiversity patterns. ELGRIN provides this opportunity to unravel how interspecific interactions actually influence species distributions.
Monitoring software systems at runtime is key for understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Despite the usefulness of collecting system data, it may significantly impact the system execution by delaying response times and competing with system resources. The typical approach to cope with this is to filter portions of the system to be monitored and to sample data. Although these approaches are a step towards achieving a desired trade-off between the amount of collected information and the impact on the system performance, they focus on collecting data of a particular type or may capture a sample that does not correspond to the actual system behavior. In response, we propose an adaptive runtime monitoring process to dynamically adapt the sampling rate while monitoring software systems. It includes algorithms with statistical foundations to improve the representativeness of collected samples without compromising the system performance. Our evaluation targets five applications of a widely used benchmark. It shows that the error (RMSE) of the samples collected with our approach is 9-54% lower than the main alternative strategy (sampling rate inversely proportional to the throughput), with 1-6% higher performance impact.
The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We analyzed Twitter data to identify key concerns related to the use of ChatGPT in education. We employed BERT-based topic modeling to conduct a discourse analysis and social network analysis to identify influential users in the conversation. While Twitter users generally ex-pressed a positive attitude towards the use of ChatGPT, their concerns converged to five specific categories: academic integrity, impact on learning outcomes and skill development, limitation of capabilities, policy and social concerns, and workforce challenges. We also found that users from the tech, education, and media fields were often implicated in the conversation, while education and tech individual users led the discussion of concerns. Based on these findings, the study provides several implications for policymakers, tech companies and individuals, educators, and media agencies. In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
Recommender systems exploit interaction history to estimate user preference, having been heavily used in a wide range of industry applications. However, static recommendation models are difficult to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way that static models learn user preference, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. In a CRS, users and the system can dynamically communicate through natural language interactions, which provide unprecedented opportunities to explicitly obtain the exact preference of users. Considerable efforts, spread across disparate settings and applications, have been put into developing CRSs. Existing models, technologies, and evaluation methods for CRSs are far from mature. In this paper, we provide a systematic review of the techniques used in current CRSs. We summarize the key challenges of developing CRSs into five directions: (1) Question-based user preference elicitation. (2) Multi-turn conversational recommendation strategies. (3) Dialogue understanding and generation. (4) Exploitation-exploration trade-offs. (5) Evaluation and user simulation. These research directions involve multiple research fields like information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI). Based on these research directions, we discuss some future challenges and opportunities. We provide a road map for researchers from multiple communities to get started in this area. We hope this survey helps to identify and address challenges in CRSs and inspire future research.
Deep Learning algorithms have achieved the state-of-the-art performance for Image Classification and have been used even in security-critical applications, such as biometric recognition systems and self-driving cars. However, recent works have shown those algorithms, which can even surpass the human capabilities, are vulnerable to adversarial examples. In Computer Vision, adversarial examples are images containing subtle perturbations generated by malicious optimization algorithms in order to fool classifiers. As an attempt to mitigate these vulnerabilities, numerous countermeasures have been constantly proposed in literature. Nevertheless, devising an efficient defense mechanism has proven to be a difficult task, since many approaches have already shown to be ineffective to adaptive attackers. Thus, this self-containing paper aims to provide all readerships with a review of the latest research progress on Adversarial Machine Learning in Image Classification, however with a defender's perspective. Here, novel taxonomies for categorizing adversarial attacks and defenses are introduced and discussions about the existence of adversarial examples are provided. Further, in contrast to exisiting surveys, it is also given relevant guidance that should be taken into consideration by researchers when devising and evaluating defenses. Finally, based on the reviewed literature, it is discussed some promising paths for future research.
To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.