Cultural evolution theory suggests that prestige bias (whereby individuals preferentially learn from prestigious figures) has played a key role in human ecological success. However, its impact within online environments remains unclear, particularly regarding whether reposts by prestigious individuals amplify diffusion more effectively than reposts by non-influential users. Here, we analyzed over 55 million posts and 520 million reposts on Twitter (currently X) to examine whether users with high influence scores (hg-index) more effectively amplified the reach of others' content. Our findings indicate that posts shared by influencers were more likely to be further shared compared to those shared by non-influencers. This effect persisted over time, especially in viral posts. Moreover, a small group of highly influential users accounted for approximately half of the information flow within repost cascades. These findings demonstrate a prestige bias in information diffusion within digital society, suggesting that cognitive biases shape content spread through reposting.
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.
Stablecoins are digital assets designed to maintain a stable value, typically pegged to traditional currencies. Despite their growing prominence, many stablecoins have struggled to consistently meet stability expectations, and their underlying mechanisms often remain opaque and challenging to analyze. This paper focuses on the DAI stablecoin, which combines crypto-collateralization and algorithmic mechanisms. We propose a formal logic-based framework for representing the policies and operations of DAI, implemented in Prolog and released as open-source software. Our framework enables detailed analysis and simulation of DAI's stability mechanisms, providing a foundation for understanding its robustness and identifying potential vulnerabilities.
The ability to engage in other activities during the ride is considered by consumers as one of the key reasons for the adoption of automated vehicles. However, engagement in non-driving activities will provoke occupants' motion sickness, deteriorating their overall comfort and thereby risking acceptance of automated driving. Therefore, it is critical to extend our understanding of motion sickness and unravel the modulating factors that affect it through experiments with participants. Currently, most experiments are conducted on public roads (realistic but not reproducible) or test tracks (feasible with prototype automated vehicles). This research study develops a method to design an optimal path and speed reference to efficiently replicate on-road motion sickness exposure on a small test track. The method uses model predictive control to replicate the longitudinal and lateral accelerations collected from on-road drives on a test track of 70 m by 175 m. A within-subject experiment (47 participants) was conducted comparing the occupants' motion sickness occurrence in test-track and on-road conditions, with the conditions being cross-randomized. The results illustrate no difference and no effect of the condition on the occurrence of the average motion sickness across the participants. Meanwhile, there is an overall correspondence of individual sickness levels between on-road and test-track. This paves the path for the employment of our method for a simpler, safer and more replicable assessment of motion sickness.
Language models have gained significant interest due to their general-purpose capabilities, which appear to emerge as models are scaled to increasingly larger parameter sizes. However, these large models impose stringent requirements on computing systems, necessitating significant memory and processing requirements for inference. This makes performing inference on mobile and edge devices challenging, often requiring invocating remotely-hosted models via network calls. Remote inference, in turn, introduces issues like latency, unreliable network connectivity, and privacy concerns. To address these challenges, we explored the possibility of deviating from the trend of increasing model size. Instead, we hypothesize that much smaller models (~30-120M parameters) can outperform their larger counterparts for specific tasks by carefully curating the data used for pre-training and fine-tuning. We investigate this within the context of deploying edge-device models to support sensing applications. We trained several foundational models through a systematic study and found that small models can run locally on edge devices, achieving high token rates and accuracy. Based on these findings, we developed a framework that allows users to train foundational models tailored to their specific applications and deploy them at the edge.
Communities and groups often need to make decisions grounded by social norms and preferences, such as when moderating content or providing judgments for aligning AI systems. Prevailing approaches to provide this grounding have primarily centered around constructing high-level guidelines and criteria, similar to legal ``constitutions''. However, it can be challenging to specify social norms and preferences consistently and accurately through constitutions alone. In this work, we take inspiration from legal systems and introduce ``case law grounding'' (CLG) -- a novel approach for grounding decision-making that uses past cases and decisions (precedents) to ground future decisions in a way that can be utilized by human-led processes or implemented through prompting large language models (LLMs). We evaluate how accurately CLG grounds decisions with five groups and communities spread across two decision task domains, comparing against a traditional constitutional grounding approach, and find that in 4 out of 5 groups, decisions produced with CLG were significantly more accurately aligned to ground truth: 16.0--23.3 %-points higher accuracy using the human-led process, and 20.8--32.9 %-points higher when prompting LLMs. We also evaluate the impact of different configurations of CLG, such as the case retrieval window size and whether to enforce binding decisions based on selected precedents, showing support for using binding decisions and preferring larger retrieval windows. Finally, we discuss the limitations of our case-based approach as well as how it may be best used to augment existing constitutional approaches when it comes to aligning human and AI decisions.
Knowledge distillation (KD) remains challenging due to the opaque nature of the knowledge transfer process from a Teacher to a Student, making it difficult to address certain issues related to KD. To address this, we proposed UniCAM, a novel gradient-based visual explanation method, which effectively interprets the knowledge learned during KD. Our experimental results demonstrate that with the guidance of the Teacher's knowledge, the Student model becomes more efficient, learning more relevant features while discarding those that are not relevant. We refer to the features learned with the Teacher's guidance as distilled features and the features irrelevant to the task and ignored by the Student as residual features. Distilled features focus on key aspects of the input, such as textures and parts of objects. In contrast, residual features demonstrate more diffused attention, often targeting irrelevant areas, including the backgrounds of the target objects. In addition, we proposed two novel metrics: the feature similarity score (FSS) and the relevance score (RS), which quantify the relevance of the distilled knowledge. Experiments on the CIFAR10, ASIRRA, and Plant Disease datasets demonstrate that UniCAM and the two metrics offer valuable insights to explain the KD process.
Large Language Models (LLMs) show impressive inductive reasoning capabilities, enabling them to generate hypotheses that could generalize effectively to new instances when guided by in-context demonstrations. However, in real-world applications, LLMs' hypothesis generation is not solely determined by these demonstrations but is significantly shaped by task-specific model priors. Despite their critical influence, the distinct contributions of model priors versus demonstrations to hypothesis generation have been underexplored. This study bridges this gap by systematically evaluating three inductive reasoning strategies across five real-world tasks with three LLMs. Our empirical findings reveal that, hypothesis generation is primarily driven by the model's inherent priors; removing demonstrations results in minimal loss of hypothesis quality and downstream usage. Further analysis shows the result is consistent across various label formats with different label configurations, and prior is hard to override, even under flipped labeling. These insights advance our understanding of the dynamics of hypothesis generation in LLMs and highlight the potential for better utilizing model priors in real-world inductive reasoning tasks.
This article presents the affordances that Generative Artificial Intelligence can have in disinformation context, one of the major threats to our digitalized society. We present a research framework to generate customized agent-based social networks for disinformation simulations that would enable understanding and evaluation of the phenomena whilst discussing open challenges.
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
Image segmentation is still an open problem especially when intensities of the interested objects are overlapped due to the presence of intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed where Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined and membership functions of the clusters described by the level set function are introduced to rewrite the energy as a data term of the proposed model. Similar to popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colourful images and images with multiple objects, respectively. It has been extensively tested on both synthetic and real images that are widely used in the literature and public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate that advantages of the proposed model in terms of bias correction and segmentation accuracy.