Online communities are not safe spaces for user privacy. Although existing research focuses on creating and improving content moderation strategies and privacy-preserving technologies, platforms hosting online communities still support features that allow users to surveil one another, leading to harassment, personal data breaches, and offline harm. To tackle this problem, we introduce a new, work-in-progress framework for analyzing data privacy within vulnerable, identity-based online communities. Whereas current SOUPS papers study surveillance and longitudinal user data as two distinct challenges to user privacy, more work is needed to explore the sites where surveillance and historical user data assemble. By synthesizing over 40 years of developments in the analysis of surveillance, we derive properties of online communities that enable the abuse of user data by fellow community members and suggest key steps to improving security for vulnerable users. Deploying this new framework on new and existing platforms will ensure that online communities are privacy-conscious and designed more inclusively.
Quantum-based systems are a relatively new research area for which different modelling languages, including process calculi, are currently under development. Encodings are often used to compare process calculi, and quality criteria are then used to rule out trivial or meaningless encodings. In this new context of quantum-based systems, it is necessary to analyse the applicability of these quality criteria and, where needed, to extend or adapt them. As a first step, we test the suitability of classical criteria for encodings between quantum-based languages and discuss new criteria. Concretely, we present an encoding from a language inspired by CQP into a language inspired by qCCS. We show that this encoding satisfies compositionality, name invariance (for channel and qubit names), operational correspondence, divergence reflection, and success sensitiveness, and that it preserves the size of quantum registers. Then we show that there is no encoding from qCCS into CQP that is compositional, operationally corresponding, and success sensitive.
Similarity caching allows requests for an item to be served by a similar item. Applications include recommendation systems, multimedia retrieval, and machine learning. Many similarity caching policies have recently been proposed, such as SIM-LRU and RND-LRU, but a performance analysis of their hit rate is still lacking. In this paper, we show how to extend the popular time-to-live approximation in classic caching to similarity caching. In particular, we propose a method to estimate the hit rate of the similarity caching policy RND-LRU. Our method, the RND-TTL approximation, introduces the RND-TTL cache model and then tunes its parameters so that it mimics the behavior of RND-LRU. The parameter tuning involves solving a fixed-point system of equations, for which we provide a numerical algorithm and sufficient conditions for its convergence. Our approach for approximating the hit rate of RND-LRU is evaluated on both synthetic and real-world traces.
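For orientation, the time-to-live approximation in classic (exact-match) caching that the abstract builds on can be sketched as follows. This is a minimal illustration of the classic LRU case only, under the usual assumption of independent Poisson requests with known per-item rates. It is not the RND-TTL model or the fixed-point algorithm of the paper, whose details the abstract does not give. The function names and the bisection solver are illustrative choices.

```python
import math


def characteristic_time(rates, capacity, tol=1e-10):
    """Solve sum_i (1 - exp(-rate_i * T)) = capacity for T by bisection.

    T plays the role of the time-to-live shared by all items in the
    classic TTL approximation of an LRU cache.
    """
    if any(r <= 0 for r in rates):
        raise ValueError("all request rates must be positive")
    if not 0 < capacity < len(rates):
        raise ValueError("capacity must lie strictly between 0 and the catalog size")

    def occupancy(t):
        # Expected number of items present in the cache for TTL t.
        return sum(1.0 - math.exp(-r * t) for r in rates)

    lo, hi = 0.0, 1.0
    while occupancy(hi) < capacity:  # grow the bracket until it contains T
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if occupancy(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ttl_hit_rate(rates, capacity):
    """Approximate the overall hit rate of an LRU cache holding `capacity` items."""
    t = characteristic_time(rates, capacity)
    total = sum(rates)
    # An item is hit if it was requested within the last T time units.
    return sum(r * (1.0 - math.exp(-r * t)) for r in rates) / total


if __name__ == "__main__":
    # Hypothetical Zipf-like popularity over 100 items, cache of 10 items.
    zipf = [1.0 / (i + 1) for i in range(100)]
    print(ttl_hit_rate(zipf, 10))
```

In this classic setting a single scalar equation determines the TTL. The abstract indicates that the similarity-caching case instead requires a fixed-point system of equations, so the sketch should be read only as the starting point being extended.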
Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect the system's performance. For this reason, it is usually valuable to study the delay and account for it. Because sequential decision-making problems such as Markov decision processes (MDPs) are dynamical systems, it is no surprise that they can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for, and the understanding of the impact of delay on the MDP is limited. In this dissertation, we propose to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and that of delays.
Software often fails in the field. However, reproducing and debugging field failures is very challenging: the failure-inducing input may be missing, and the program setup can be complicated and hard for developers to reproduce. In this paper, we propose to generate fault signatures from the failure locations and the original source code to reproduce the faults in small executable programs. We say that a fault signature reproduces the fault in the original program if the two fail at the same location and trigger the same error conditions after executing the same selective sequences of failure-inducing statements. A fault signature aims to contain only the statements sufficient to reproduce the fault. That way, it provides context on how the fault develops while avoiding unnecessary complexity and setup that may block fault diagnosis. To compute fault signatures from the failures, we applied a path-sensitive static analysis tool to generate a path that leads to the fault, and then applied an existing syntactic patching tool to convert the path into an executable program. Our evaluation on real-world bugs from Corebench, BugBench, and Manybugs shows that fault signatures can reproduce the faults of the original programs. Because fault signatures are less complex, automatic test input generation tools generated failure-inducing inputs that they could not generate from the entire programs. Some failure-inducing inputs can be directly transferred to the original programs. Our experimental data are publicly available at //doi.org/10.5281/zenodo.5430155.
Quantum computing introduces unfamiliar security vulnerabilities that demand customized threat models. Hardware and software Trojans pose serious concerns that require rethinking classical paradigms. This paper develops the first structured taxonomy of Trojans tailored to quantum information systems. We enumerate potential attack vectors across the quantum stack, from hardware to software layers. We outline a categorization of quantum Trojan types and payloads, ranging from reliability degradation and functionality corruption to backdoors and denial-of-service. Adversarial motivations behind quantum Trojans are analyzed. By consolidating diverse threats into a unified perspective, this quantum Trojan taxonomy provides insights guiding threat modeling, risk analysis, detection mechanisms, and security best practices customized for this novel computing paradigm.
Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. The review also surveys current challenges and opportunities, and presents a forecast of the future of GDL for molecular sciences.
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data varies in many nuanced linguistic aspects such as topic, style, or level of formality. In addition, domain labels are often unavailable, making it challenging to build domain-specific systems. We show that massive pre-trained language models implicitly learn sentence representations that cluster by domain without supervision, suggesting a simple data-driven definition of domains in textual data. We harness this property and propose domain data selection methods based on such models, which require only a small set of in-domain monolingual data. We evaluate our data selection methods for neural machine translation across five diverse domains, where they outperform an established approach as measured both by BLEU and by precision and recall of sentence selection with respect to an oracle.
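One simple way to picture embedding-based data selection is to rank candidate sentences by how close their representations lie to a small in-domain seed set. The sketch below is only an illustration under stated assumptions: sentence embeddings are assumed to be precomputed by some pre-trained language model, and ranking by cosine similarity to the seed centroid is an illustrative choice, not necessarily any of the selection methods proposed in the paper. All function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_in_domain(seed_vectors, candidate_vectors, k):
    """Return indices of the k candidates most similar to the seed centroid."""
    center = centroid(seed_vectors)
    ranked = sorted(
        range(len(candidate_vectors)),
        key=lambda i: cosine(candidate_vectors[i], center),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real sentence embeddings have hundreds of dimensions.
    seed = [[1.0, 0.0], [0.9, 0.1]]
    candidates = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5], [-1.0, 0.0]]
    print(select_in_domain(seed, candidates, 2))
```

Only the small seed set needs to be known to be in-domain, which matches the abstract's requirement of a small amount of in-domain monolingual data.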
Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
Embedding entities and relations into a continuous multi-dimensional vector space has become the dominant method for knowledge graph embedding in representation learning. However, most existing models neglect to represent hierarchical knowledge, such as the similarities and dissimilarities of entities in one domain. We propose to learn domain representations on top of existing knowledge graph embedding models, such that entities with similar attributes are organized into the same domain. Such hierarchical knowledge of domains can provide further evidence for link prediction. Experimental results show that domain embeddings give a significant improvement over the most recent state-of-the-art baseline knowledge graph embedding models.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. In this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
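To make "a discrete probability distribution on the edges of the graph" concrete, the sketch below treats every undirected edge as an independent Bernoulli variable and draws one graph from that distribution. The independence assumption, the matrix-of-probabilities parameterization, and the function names are illustrative assumptions; the abstract does not specify the parameterization, and the bilevel optimization that learns the probabilities is not shown.

```python
import random


def sample_adjacency(edge_probs, rng):
    """Draw a symmetric 0/1 adjacency matrix without self-loops.

    edge_probs[i][j] is the (assumed) probability that edge {i, j} exists;
    only the upper triangle is read, so the result is always symmetric.
    """
    n = len(edge_probs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probs[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


def expected_degree(edge_probs, node):
    """Expected number of neighbours of `node` under the edge distribution."""
    return sum(p for j, p in enumerate(edge_probs[node]) if j != node)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    probs = [
        [0.0, 0.9, 0.1],
        [0.9, 0.0, 0.5],
        [0.1, 0.5, 0.0],
    ]
    print(sample_adjacency(probs, rng))
    print(expected_degree(probs, 1))
```

A sampled adjacency matrix of this kind could then be passed to a GCN in place of a given graph, which is how such a distribution covers the case where no graph is available.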