The Digital Services Act (DSA) requires large social media platforms in the EU to provide clear and specific information whenever they remove or restrict access to certain content. These "Statements of Reasons" (SoRs) are collected in the DSA Transparency Database to ensure transparency and scrutiny of content moderation decisions of the providers of online platforms. In this work, we empirically analyze 156 million SoRs within an observation period of two months to provide an early look at content moderation decisions of social media platforms in the EU. Our empirical analysis yields the following main findings: (i) There are vast differences in the frequency of content moderation across platforms. For instance, TikTok performs more than 350 times more content moderation decisions per user than X/Twitter. (ii) Content moderation is most commonly applied for text and videos, whereas images and other content formats undergo moderation less frequently. (ii) The primary reasons for moderation include content falling outside the platform's scope of service, illegal/harmful speech, and pornography/sexualized content, with moderation of misinformation being relatively uncommon. (iii) The majority of rule-breaking content is detected and decided upon via automated means rather than manual intervention. However, X/Twitter reports that it relies solely on non-automated methods. (iv) There is significant variation in the content moderation actions taken across platforms. Altogether, our study implies inconsistencies in how social media platforms implement their obligations under the DSA -- resulting in a fragmented outcome that the DSA is meant to avoid. Our findings have important implications for regulators to clarify existing guidelines or lay out more specific rules that ensure common standards on how social media providers handle rule-breaking content on their platforms.
Retrieval-Augmented Generation (RAG) systems represent a significant advancement over traditional Large Language Models (LLMs). RAG systems enhance their generation ability by incorporating external data retrieved through an Information Retrieval (IR) phase, overcoming the limitations of standard LLMs, which are restricted to their pre-trained knowledge and limited context window. Most research in this area has predominantly concentrated on the generative aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and critically analyzing the influence of IR components on RAG systems. This paper analyzes which characteristics a retriever should possess for an effective RAG's prompt formulation, focusing on the type of documents that should be retrieved. We evaluate various elements, such as the relevance of the documents to the prompt, their position, and the number included in the context. Our findings reveal, among other insights, that including irrelevant documents can unexpectedly enhance performance by more than 30% in accuracy, contradicting our initial assumption of diminished quality. These results underscore the need for developing specialized strategies to integrate retrieval with language generation models, thereby laying the groundwork for future research in this field.
Recent initiatives known as Future Internet Architectures (FIAs) seek to redesign the Internet to improve performance, scalability, and security. However, some governments perceive Internet access as a threat to their political standing and engage in widespread network surveillance and censorship. In this paper, we provide an in-depth analysis into the designs of prominent FIAs, to help understand of how FIAs impact surveillance and censorship abilities. Then, we survey the applicability of privacy-enhancing technologies to FIAs. We conclude by providing guidelines for future research into novel FIA-based privacy-enhancing technologies, and recommendations to guide the evaluation of these technologies.
Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models (LMMs) in image-language understanding and reasoning, they deal with VideoQA insufficiently by simply taking uniformly sampled frames as visual inputs, which ignores question-relevant visual clues. Moreover, there are no human annotations for question-critical timestamps in existing VideoQA datasets. In light of this, we propose a novel weakly supervised framework to enforce the LMMs to reason out the answers with question-critical moments as visual inputs. Specifically, we fuse the question and answer pairs as event descriptions to find multiple keyframes as target moments, which will be pseudo-labels. With these pseudo-labels as additionally weak supervision, we devise a lightweight Gaussian-based Contrastive Grounding (GCG) module. GCG learns multiple Gaussian functions to characterize the temporal structure of the video, and sample question-critical frames as positive moments to be the visual inputs of LMMs. Extensive experiments on several VideoQA benchmarks verify the effectiveness of our framework, and we achieve substantial improvements compared to previous state-of-the-art methods.
In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showcasing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs necessitates the creation of a set of diverse programming problems, and each problem comprises the prompt (including the task description), canonical solution, and test inputs. The existing methods for constructing such a problem set can be categorized into two main types: manual methods and perturbation-based methods. However, manual methods demand high effort and lack scalability, while also risking data integrity due to LCGMs' potentially contaminated data collection, and perturbation-based approaches mainly generate semantically homogeneous problems with the same canonical solutions and introduce typos that can be easily auto-corrected by IDE, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementation of this idea, we utilize our tool on two widely-used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems, comparing to the baselines.
Explanations of AI systems rarely address the information needs of people affected by algorithmic decision-making (ADM). This gap between conveyed information and information that matters to affected stakeholders can impede understanding and adherence to regulatory frameworks such as the AI Act. To address this gap, we present the "XAI Novice Question Bank": A catalog of affected stakeholders' information needs in two ADM use cases (employment prediction and health monitoring), covering the categories data, system context, system usage, and system specifications. Information needs were gathered in an interview study where participants received explanations in response to their inquiries. Participants further reported their understanding and decision confidence, showing that while confidence tended to increase after receiving explanations, participants also met understanding challenges, such as being unable to tell why their understanding felt incomplete. Explanations further influenced participants' perceptions of the systems' risks and benefits, which they confirmed or changed depending on the use case. When risks were perceived as high, participants expressed particular interest in explanations about intention, such as why and to what end a system was put in place. With this work, we aim to support the inclusion of affected stakeholders into explainability by contributing an overview of information and challenges relevant to them when deciding on the adoption of ADM systems. We close by summarizing our findings in a list of six key implications that inform the design of future explanations for affected stakeholder audiences.
Most of the existing research on degrees-of-freedom (DoF) with imperfect channel state information at the transmitter (CSIT) assume the messages are private, which may not reflect reality as the two receivers can request the same content. To overcome this limitation, we therefore consider the hybrid unicast and multicast messages. In particular, we characterize the optimal DoF region for the two-user multiple-input multiple-output (MIMO) broadcast channel (BC) with imperfect CSIT and hybrid messages. For the converse, we establish a three-step procedure to exploit the utmost possible relaxation. For the achievability, since the DoF region is with specific three-dimensional structure regarding antenna configurations and CSIT qualities, we verify the existence or non-existence of corner point candidates via the feature of antenna configurations and CSIT qualities categorization and provide a hybrid message-aware rate-splitting scheme. Besides, we show that to achieve the strictly positive corner points, it is unnecessary to split the unicast messages into private and common parts, implying that adding a multicast message may mitigate the rate-splitting complexity.
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures, that surpass the performance of classical CNNs, by a considerable scalability and accuracy margin. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers. Through an extensive analysis of millions of trained filters, with different sizes and from various models, we employed unsupervised clustering with autoencoders, to categorize these filters. Astonishingly, the patterns converged into a few main clusters, each resembling the difference of Gaussian (DoG) functions, and their first and second-order derivatives. Notably, we were able to classify over 95\% and 90\% of the filters from state-of-the-art ConvNextV2 and ConvNeXt models, respectively. This finding is not merely a technological curiosity; it echoes the foundational models neuroscientists have long proposed for the vision systems of mammals. Our results thus deepen our understanding of the emergent properties of trained DS-CNNs and provide a bridge between artificial and biological visual processing systems. More broadly, they pave the way for more interpretable and biologically-inspired neural network designs in the future.
The Council on Environmental Quality's Climate and Economic Justice Screening Tool defines "disadvantaged communities" (DAC) in the USA, highlighting census tracts where benefits of climate and energy investments are not accruing. We use a principal component generalized linear model, which addresses the intertwined nature of economic factors, income and employment and model their relationship to DAC status. Our study 1) identifies the most significant income groups and employment industries that impact DAC status, 2) provides the probability of DAC status across census tracts and compares the predictive accuracy with widely used machine learning approaches, 3) obtains historical predictions of the probability of DAC status, 4) obtains spatial downscaling of DAC status across block groups. Our study provides valuable insights for policymakers and stakeholders to develop strategies that promote sustainable development and address inequities in climate and energy investments in the USA.
Knowledge Graph Embedding (KGE) aims to learn representations for entities and relations. Most KGE models have gained great success, especially on extrapolation scenarios. Specifically, given an unseen triple (h, r, t), a trained model can still correctly predict t from (h, r, ?), or h from (?, r, t), such extrapolation ability is impressive. However, most existing KGE works focus on the design of delicate triple modeling function, which mainly tells us how to measure the plausibility of observed triples, but offers limited explanation of why the methods can extrapolate to unseen data, and what are the important factors to help KGE extrapolate. Therefore in this work, we attempt to study the KGE extrapolation of two problems: 1. How does KGE extrapolate to unseen data? 2. How to design the KGE model with better extrapolation ability? For the problem 1, we first discuss the impact factors for extrapolation and from relation, entity and triple level respectively, propose three Semantic Evidences (SEs), which can be observed from train set and provide important semantic information for extrapolation. Then we verify the effectiveness of SEs through extensive experiments on several typical KGE methods. For the problem 2, to make better use of the three levels of SE, we propose a novel GNN-based KGE model, called Semantic Evidence aware Graph Neural Network (SE-GNN). In SE-GNN, each level of SE is modeled explicitly by the corresponding neighbor pattern, and merged sufficiently by the multi-layer aggregation, which contributes to obtaining more extrapolative knowledge representation. Finally, through extensive experiments on FB15k-237 and WN18RR datasets, we show that SE-GNN achieves state-of-the-art performance on Knowledge Graph Completion task and performs a better extrapolation ability.