News outlets are now more than ever incentivized to provide their audience with slanted news, while the intrinsic homophilic nature of online social media may exacerbate polarized opinions. Here, we propose a new dynamic latent space model for time-varying online audience-duplication networks, which exploits social media content to conduct inference on media bias and polarization of news outlets. Our model contributes to the literature in several directions: 1) we provide a model-embedded data-driven interpretation for the latent leaning of news outlets in terms of media bias; 2) we endow our model with Markov-switching dynamics to capture polarization regimes while maintaining a parsimonious specification; 3) we contribute to the literature on the statistical properties of latent space network models. The proposed model is applied to a set of data on the online activity of national and local news outlets from four European countries in the years 2015 and 2016. We find evidence of a strong positive correlation between our media slant measure and a well-grounded external source of media bias. In addition, we provide insight into the polarization regimes across the four countries considered.
The efficacy of self-supervised speech models has been validated, yet the optimal utilization of their representations remains challenging across diverse tasks. In this study, we delve into Acoustic Word Embeddings (AWEs), a fixed-length feature derived from continuous representations, to explore their advantages in specific tasks. AWEs have previously shown utility in capturing acoustic discriminability. In light of this, we propose measuring layer-wise similarity between AWEs and word embeddings, aiming to further investigate the inherent context within AWEs. Moreover, we evaluate the contribution of AWEs, in comparison to other types of speech features, in the context of Speech Emotion Recognition (SER). Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings. Our findings underscore the acoustic context conveyed by AWEs and showcase the highly competitive SER accuracies by appropriately employing AWEs.
Despite Deep Learning's (DL) empirical success, our theoretical understanding of its efficacy remains limited. One notable paradox is that while conventional wisdom discourages perfect data fitting, deep neural networks are designed to do just that, yet they generalize effectively. This study focuses on exploring this phenomenon attributed to the implicit bias at play. Various sources of implicit bias have been identified, such as step size, weight initialization, optimization algorithm, and number of parameters. In this work, we focus on investigating the implicit bias originating from weight initialization. To this end, we examine the problem of solving underdetermined linear systems in various contexts, scrutinizing the impact of initialization on the implicit regularization when using deep networks to solve such systems. Our findings elucidate the role of initialization in the optimization and generalization paradoxes, contributing to a more comprehensive understanding of DL's performance characteristics.
The collaboration of the real world and the virtual world, known as Digital Twin, has become a trend with numerous successful use cases. However, there are challenges mentioned in the literature that must be addressed. One of the most important issues is the difficulty of collaboration of Digital Twins due to the lack of standardization in their implementation. This article continues a previous work that proposed a generic architecture based on the FIWARE components to build Digital Twins in any field. Our work proposes the use of Linked Open Data as a mechanism to facilitate the communication of Digital Twins. We validate our proposal with a use case of an urban Digital Twin that collaborates with a parking Digital Twin. We conclude that Linked Open Data in combination with the FIWARE ecosystem is a real reference option to deploy Digital Twins and to enable the collaboration between Digital Twins.
This study investigates the possibility of mitigating the demographic biases that affect face recognition technologies through the use of synthetic data. Demographic biases have the potential to impact individuals from specific demographic groups, and can be identified by observing disparate performance of face recognition systems across demographic groups. They primarily arise from the unequal representations of demographic groups in the training data. In recent times, synthetic data have emerged as a solution to some problems that affect face recognition systems. In particular, during the generation process it is possible to specify the desired demographic and facial attributes of images, in order to control the demographic distribution of the synthesized dataset, and fairly represent the different demographic groups. We propose to fine-tune with synthetic data existing face recognition systems that present some demographic biases. We use synthetic datasets generated with GANDiffFace, a novel framework able to synthesize datasets for face recognition with controllable demographic distribution and realistic intra-class variations. We consider multiple datasets representing different demographic groups for training and evaluation. Also, we fine-tune different face recognition systems, and evaluate their demographic fairness with different metrics. Our results support the proposed approach and the use of synthetic data to mitigate demographic biases in face recognition.
Context. GitHub has introduced a new gamification element through personal achievements, whereby badges are unlocked and displayed on developers' personal profile pages in recognition of their development activities. Objective. In this paper, we present an exploratory analysis using mixed methods to study the diffusion of personal badges in GitHub, in addition to the effects and reactions to their introduction. Method. First, we conduct an observational study by mining longitudinal data from more than 6,000 developers and performed correlation and regression analysis. Then, we conduct a survey and analyze over 300 GitHub community discussions on the topic of personal badges to gauge how the community responded to the introduction of the new feature. Results. We find that most of the developers sampled own at least a badge, but we also observe an increasing number of users who choose to keep their profile private and opt out of displaying badges. Besides, badges are generally poorly correlated with developers' qualities and dispositions such as timeliness and desire to collaborate. We also find that, except for the Starstruck badge (reflecting the number of followers), their introduction does not have an effect. Finally, the reaction of the community has been in general mixed, as developers find them appealing in principle but without a clear purpose and hardly reflecting their abilities in the current form. Conclusions. We provide recommendations to GitHub platform designers on how to improve the current implementation of personal badges as both a gamification mechanism and as sources of reliable cues of ability for developers' assessment
Large Foundational Language Models are capable of performing many tasks at a high level but are difficult to deploy in many applications because of their size and proprietary ownership. Many will be motivated to distill specific capabilities of foundational models into smaller models that can be owned and controlled. In the development of a therapeutic chatbot, we wish to distill a capability known as reflective listening, in which a therapist produces reflections of client speech. These reflections either restate what a client has said, or connect what was said to a relevant observation, idea or guess that encourages and guides the client to continue contemplation. In this paper, we present a method for distilling the generation of reflections from a Foundational Language Model (GPT-4) into smaller models. We first show that GPT-4, using zero-shot prompting, can generate reflections at near 100% success rate, superior to all previous methods. Using reflections generated by GPT-4, we fine-tune different sizes of the GPT-2 family. The GPT-2-small model achieves 83% success on a hold-out test set and the GPT-2 XL achieves 90% success. We also show that GPT-4 can help in the labor-intensive task of evaluating the quality of the distilled models, using it as a zero-shot classifier. Using triple-human review as a guide, the classifier achieves a Cohen-Kappa of 0.66, a substantial inter-rater reliability figure.
Video diffusion models has been gaining increasing attention for its ability to produce videos that are both coherent and of high fidelity. However, the iterative denoising process makes it computationally intensive and time-consuming, thus limiting its applications. Inspired by the Consistency Model (CM) that distills pretrained image diffusion models to accelerate the sampling with minimal steps and its successful extension Latent Consistency Model (LCM) on conditional image generation, we propose AnimateLCM, allowing for high-fidelity video generation within minimal steps. Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy that decouples the distillation of image generation priors and motion generation priors, which improves the training efficiency and enhance the generation visual quality. Additionally, to enable the combination of plug-and-play adapters in stable diffusion community to achieve various functions (e.g., ControlNet for controllable generation). we propose an efficient strategy to adapt existing adapters to our distilled text-conditioned video consistency model or train adapters from scratch without harming the sampling speed. We validate the proposed strategy in image-conditioned video generation and layout-conditioned video generation, all achieving top-performing results. Experimental results validate the effectiveness of our proposed method. Code and weights will be made public. More details are available at //github.com/G-U-N/AnimateLCM.
Optimization problems frequently appear in any scientific domain. Most of the times, the corresponding decision problem turns out to be NP-hard, and in these cases genetic algorithms are often used to obtain approximated solutions. However, the difficulty to approximate different NP-hard problems can vary a lot. In this paper, we analyze the usefulness of using genetic algorithms depending on the approximation class the problem belongs to. In particular, we use the standard approximability hierarchy, showing that genetic algorithms are especially useful for the most pessimistic classes of the hierarchy
Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the public is increasingly utilizing them to improve the efficiency of daily work. Nonetheless, the vulnerabilities of MLLMs to unsafe instructions bring huge safety risks when these models are deployed in real-world scenarios. In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text. We begin with introducing the overview of MLLMs on images and text and understanding of safety, which helps researchers know the detailed scope of our survey. Then, we review the evaluation datasets and metrics for measuring the safety of MLLMs. Next, we comprehensively present attack and defense techniques related to MLLMs' safety. Finally, we analyze several unsolved issues and discuss promising research directions.
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.