Prior measurement studies on browser fingerprinting have unfortunately largely excluded Web Audio API-based fingerprinting in their analysis. We address this issue by conducting the first systematic study of effectiveness of web audio fingerprinting mechanisms. We focus on studying the feasibility and diversity properties of web audio fingerprinting. Along with 3 known audio fingerprinting vectors, we designed and implemented 4 new audio fingerprint vectors that work by obtaining FFTs of waveforms generated via different methods. Our study analyzed audio fingerprints from 2093 web users and presents new insights into the nature of Web Audio fingerprints. First, we show that audio fingeprinting vectors, unlike other prior vectors, reveal an apparent fickleness with some users' browsers giving away differing fingerprints in repeated attempts. However, we show that it is possible to devise a graph-based analysis mechanism to collectively consider all the different fingerprints of users and thus craft a stable fingerprinting mechanism. Our analysis also shows that it is possible to do this in a timely fashion. Next, we investigate the diversity of audio fingerprints and compare this with prior techniques. Our results show that audio fingerprints are much less diverse than other vectors with only 95 distinct fingerprints among 2093 users. At the same time, further analysis shows that web audio fingerprinting can potentially bring considerable additive value (in terms of entropy) to existing fingerprinting mechanisms. We also show that our results contradict the current security and privacy recommendations provided by W3C regarding audio fingerprinting. Overall, our systematic study allows browser developers to gauge the degree of privacy invasion presented by audio fingerprinting thus helping them take a more informed stance when designing privacy protection features in the future.
For nearly three decades, spatial games have produced a wealth of insights to the study of behavior and its relation to population structure. However, as different rules and factors are added or altered, the dynamics of spatial models often become increasingly complicated to interpret. To tackle this problem, we introduce persistent homology as a rigorous framework that can be used to both define and compute higher-order features of data in a manner which is invariant to parameter choices, robust to noise, and independent of human observation. Our work demonstrates its relevance for spatial games by showing how topological features of simulation data that persist over different spatial scales reflect the stability of strategies in 2D lattice games. To do so, we analyze the persistent homology of scenarios from two games: a Prisoner's Dilemma and a SIRS epidemic model. The experimental results show how the method accurately detects features that correspond to real aspects of the game dynamics. Unlike other tools that study dynamics of spatial systems, persistent homology can tell us something meaningful about population structure while remaining neutral about the underlying structure itself. Regardless of game complexity, since strategies either succeed or fail to conform to shapes of a certain topology there is much potential for the method to provide novel insights for a wide variety of spatially extended systems in biology, social science, and physics.
Solo research is a result of individual authorship decisions which accumulate over time, accompanying academic careers. This research is the first to comprehensively study the gender solo research gap among all internationally visible scientists within a whole national higher education system: we examine the gap through individual publication portfolios constructed for each Polish university professor with at least a doctoral degree and internationally visible in the decade of 2009-2018. Solo research is a special case of academic publishing where scientists compete individually, sending clear signals about their research ability. Solo research has been expected to disappear for half a century, but it continues to exist. Our focus is on how male and female scientists of various biological ages, age groups, academic positions, and institutional types make use of, and benefit from, solo publishing. We tested the hypothesis that male and female scientists differ in their use of solo publishing, and we termed this difference the gender solo research gap. The highest share of solo research for both genders is noted for middle-aged scientists working as associate professors rather than for young scientists as in previous studies. The low journal prestige level of female solo publications may suggest the propensity of women scientists to choose less competitive publication outlets. In our unique biographical, administrative, publication, and citation database (Polish Science Observatory), we have metadata on all Polish scientists present in Scopus (N-25,463) and on their 158,743 Scopus-indexed articles published in 2009-2018, including 18,900 solo articles.
Peer-to-peer (p2p) content delivery is promising to provide benefits like cost-saving and scalable peak-demand handling in comparison with conventional content delivery networks (CDNs) and complement the decentralized storage networks such as Filecoin. However, reliable p2p delivery requires proper enforcement of delivery fairness, i.e., the deliverers should be rewarded according to their in-time delivery. Unfortunately, most existing studies on delivery fairness are based on non-cooperative game-theoretic assumptions that are arguably unrealistic in the ad-hoc p2p setting. We for the first time put forth the expressive yet still minimalist securities for p2p content delivery, and give two efficient solutions FairDownload and FairStream via the blockchain for p2p downloading and p2p streaming scenarios, respectively. Our designs not only guarantee delivery fairness to ensure deliverers be paid (nearly) proportional to his in-time delivery, but also ensure the content consumers and content providers to be fairly treated. The fairness of each party can be guaranteed when the other two parties collude to arbitrarily misbehave. Moreover, the systems are efficient in the sense of attaining asymptotically optimal on-chain costs and optimal deliverer communication. We implement the protocols to build the prototype systems atop the Ethereum Ropsten network. Extensive experiments done in LAN and WAN settings showcase their high practicality.
Interactive audio spatialization technology previously developed for video game authoring and rendering has evolved into an essential component of platforms enabling shared immersive virtual experiences for future co-presence, remote collaboration and entertainment applications. New wearable virtual and augmented reality displays employ real-time binaural audio computing engines rendering multiple digital objects and supporting the free navigation of networked participants or their avatars through a juxtaposition of environments, real and virtual, often referred to as the Metaverse. These applications require a parametric audio scene programming interface to facilitate the creation and deployment of shared, dynamic and realistic virtual 3D worlds on mobile computing platforms and remote servers. We propose a practical approach for designing parametric 6-degree-of-freedom object-based interactive audio engines to deliver the perceptually relevant binaural cues necessary for audio/visual and virtual/real congruence in Metaverse experiences. We address the effects of room reverberation, acoustic reflectors, and obstacles in both the virtual and real environments, and discuss how such effects may be driven by combinations of pre-computed and real-time acoustic propagation solvers. We envision an open scene description model distilled to facilitate the development of interoperable applications distributed across multiple platforms, where each audio object represents, to the user, a natural sound source having controllable distance, size, orientation, and acoustic radiation properties.
For a multilingual podcast streaming service, it is critical to be able to deliver relevant content to all users independent of language. Podcast content relevance is conventionally determined using various metadata sources. However, with the increasing quality of speech recognition in many languages, utilizing automatic transcriptions to provide better content recommendations becomes possible. In this work, we explore the robustness of a Latent Dirichlet Allocation topic model when applied to transcripts created by an automatic speech recognition engine. Specifically, we explore how increasing transcription noise influences topics obtained from transcriptions in Danish; a low resource language. First, we observe a baseline of cosine similarity scores between topic embeddings from automatic transcriptions and the descriptions of the podcasts written by the podcast creators. We then observe how the cosine similarities decrease as transcription noise increases and conclude that even when automatic speech recognition transcripts are erroneous, it is still possible to obtain high-quality topic embeddings from the transcriptions.
The study of fairness in multiwinner elections focuses on settings where candidates have attributes. However, voters may also be divided into predefined populations under one or more attributes (e.g., "California" and "Illinois" populations under the "state" attribute), which may be same or different from candidate attributes. The models that focus on candidate attributes alone may systematically under-represent smaller voter populations. Hence, we develop a model, DiRe Committee WinnerDetermination (DRCWD), which delineates candidate and voter attributes to select a committee by specifying diversity and representation constraints and a voting rule. We analyze its computational complexity, inapproximability, and parameterized complexity. We develop a heuristic-based algorithm, which finds the winning DiRe committee in under two minutes on 63% of the instances of synthetic datasets and on 100% of instances of real-world datasets. We present an empirical analysis of the running time, feasibility, and utility traded-off. Overall, DRCWD motivates that a study of multiwinner elections should consider both its actors, namely candidates and voters, as candidate-specific models can unknowingly harm voter populations, and vice versa. Additionally, even when the attributes of candidates and voters coincide, it is important to treat them separately as diversity does not imply representation and vice versa. This is to say that having a female candidate on the committee, for example, is different from having a candidate on the committee who is preferred by the female voters, and who themselves may or may not be female.
Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we will comprehensively review the development of AICA in the recent two decades, especially focusing on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and description of available datasets for performing evaluation with quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA based applications. Finally, we discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
The field of Text-to-Speech has experienced huge improvements last years benefiting from deep learning techniques. Producing realistic speech becomes possible now. As a consequence, the research on the control of the expressiveness, allowing to generate speech in different styles or manners, has attracted increasing attention lately. Systems able to control style have been developed and show impressive results. However the control parameters often consist of latent variables and remain complex to interpret. In this paper, we analyze and compare different latent spaces and obtain an interpretation of their influence on expressive speech. This will enable the possibility to build controllable speech synthesis systems with an understandable behaviour.
Using the 6,638 case descriptions of societal impact submitted for evaluation in the Research Excellence Framework (REF 2014), we replicate the topic model (Latent Dirichlet Allocation or LDA) made in this context and compare the results with factor-analytic results using a traditional word-document matrix (Principal Component Analysis or PCA). Removing a small fraction of documents from the sample, for example, has on average a much larger impact on LDA than on PCA-based models to the extent that the largest distortion in the case of PCA has less effect than the smallest distortion of LDA-based models. In terms of semantic coherence, however, LDA models outperform PCA-based models. The topic models inform us about the statistical properties of the document sets under study, but the results are statistical and should not be used for a semantic interpretation - for example, in grant selections and micro-decision making, or scholarly work-without follow-up using domain-specific semantic maps.
ESports tournaments, such as Dota 2's The International (TI), attract millions of spectators to watch broadcasts on online streaming platforms, to communicate, and to share their experience and emotions. Unlike traditional streams, tournament broadcasts lack a streamer figure to which spectators can appeal directly. Using topic modelling and cross-correlation analysis of more than three million messages from 86 games of TI7, we uncover main topical and temporal patterns of communication. First, we disentangle contextual meanings of emotes and memes, which play a salient role in communication, and show a meta-topics semantic map of streaming slang. Second, our analysis shows a prevalence of the event-driven game communication during tournament broadcasts and particular topics associated with the event peaks. Third, we show that "copypasta" cascades and other related practices, while occupying a significant share of messages, are strongly associated with periods of lower in-game activity. Based on the analysis, we propose design ideas to support different modes of spectators' communication.