Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into educational assessment: the AI Assessment Scale (AIAS). The AIAS empowers educators to select the appropriate level of GenAI usage in assessments based on the learning outcomes they seek to address. The AIAS offers greater clarity and transparency for students and educators, provides a fair and equitable policy tool for institutions to work with, and offers a nuanced approach which embraces the opportunities of GenAI while recognising that there are instances where such tools may not be pedagogically appropriate or necessary. By adopting a practical, flexible approach that can be implemented quickly, the AIAS can form a much-needed starting point to address the current uncertainty and anxiety regarding GenAI in education. As a secondary objective, we engage with the current literature and advocate for a refocused discourse on GenAI tools in education, one which foregrounds how technologies can help support and enhance teaching and learning, which contrasts with the current focus on GenAI as a facilitator of academic misconduct.
Gene set analysis, a popular approach for analysing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the 'right' choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a 'trial-and-error' approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of 'cherry-picking' and cause an optimistic bias, rendering the results non-replicable on independent data. After this problem has attracted a lot of attention in the context of classical hypothesis testing, we now aim to raise awareness of such over-optimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for over-optimism is particularly high for a group of methods frequently used despite being commonly criticised. We conclude by providing practical recommendations to counter over-optimism in research findings in gene set analysis and beyond.
Machine Learning (ML) has demonstrated its great potential on medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model leveraging the private datasets available at each party without the need for direct sharing of those datasets or compromising the privacy of the datasets through collaboration. In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). It offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets; (2) it safeguards patient privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized server. We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. We demonstrate that the ML models trained with DeCaPH framework have an improved utility-privacy trade-off, showing it enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability.
Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision. Yet, their ability to provide well-grounded rationalizations for knowledge-intensive tasks remains under-explored. Such tasks, like commonsense multiple-choice questions, require rationales based on world knowledge to support predictions and refute alternate options. We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner. Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations. Although LLMs-generated rationales were preferable, further improvements in conciseness and novelty are required. In another study, we show how rationalization of incorrect model predictions erodes humans' trust in LLM-generated rationales. Motivated by these observations, we create a two-stage pipeline to review task predictions and eliminate potential incorrect decisions before rationalization, enabling trustworthy rationale generation.
In recent years, many positivity-preserving schemes for initial value problems have been constructed by modifying a Runge--Kutta (RK) method by weighting the right-hand side of the system of differential equations with solution-dependent factors. These include the classes of modified Patankar--Runge--Kutta (MPRK) and Geometric Conservative (GeCo) methods. Compared to traditional RK methods, the analysis of accuracy and stability of these methods is more complicated. In this work, we provide a comprehensive and unifying theory of order conditions for such RK-like methods, which differ from original RK schemes in that their coefficients are solution-dependent. The resulting order conditions are themselves solution-dependent and obtained using the theory of NB-series, and thus, can easily be read off from labeled N-trees. We present for the first time order conditions for MPRK and GeCo schemes of arbitrary order; For MPRK schemes, the order conditions are given implicitly in terms of the stages. From these results, we recover as particular cases all known order conditions from the literature for first- and second-order GeCo as well as first-, second- and third-order MPRK methods. Additionally, we derive sufficient and necessary conditions in an explicit form for 3rd and 4th order GeCo schemes as well as 4th order MPRK methods. We also present a new 4th order MPRK method within this framework and numerically confirm its convergence rate.
Extended Vision techniques are ubiquitous in physics. However, the data cubes steaming from such analysis often pose a challenge in their interpretation, due to the intrinsic difficulty in discerning the relevant information from the spectra composing the data cube. Furthermore, the huge dimensionality of data cube spectra poses a complex task in its statistical interpretation; nevertheless, this complexity contains a massive amount of statistical information that can be exploited in an unsupervised manner to outline some essential properties of the case study at hand, e.g.~it is possible to obtain an image segmentation via (deep) clustering of data-cube's spectra, performed in a suitably defined low-dimensional embedding space. To tackle this topic, we explore the possibility of applying unsupervised clustering methods in encoded space, i.e. perform deep clustering on the spectral properties of datacube pixels. A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, in charge of mapping spectra into lower dimensional metric spaces, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm. We apply this technique to two different use cases, of different physical origins: a set of Macro mapping X-Ray Fluorescence (MA-XRF) synthetic data on pictorial artworks, and a dataset of simulated astrophysical observations.
Large language models are meticulously aligned to be both helpful and harmless. However, recent research points to a potential overkill which means models may refuse to answer benign queries. In this paper, we investigate the factors for overkill by exploring how models handle and determine the safety of queries. Our findings reveal the presence of shortcuts within models, leading to an over-attention of harmful words like 'kill' and prompts emphasizing safety will exacerbate overkill. Based on these insights, we introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon. We first extract such over-attention by amplifying the difference in the model's output distributions when responding to system prompts that either include or omit an emphasis on safety. Then we determine the final next-token predictions by downplaying the over-attention from the model via contrastive decoding. Empirical results indicate that our method has achieved an average reduction of the refusal rate by 20\% while having almost no impact on safety.
Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with established methods. Existing methods for solving noisy inverse problems in IR, considers the pixel-wise data-fidelity. In this paper, we propose SaFaRI, a spatial-and-frequency-aware diffusion model for IR with Gaussian noise. Our model encourages images to preserve data-fidelity in both the spatial and frequency domains, resulting in enhanced reconstruction quality. We comprehensively evaluate the performance of our model on a variety of noisy inverse problems, including inpainting, denoising, and super-resolution. Our thorough evaluation demonstrates that SaFaRI achieves state-of-the-art performance on both the ImageNet datasets and FFHQ datasets, outperforming existing zero-shot IR methods in terms of LPIPS and FID metrics.
Dialogue State Tracking (DST) models often employ intricate neural network architectures, necessitating substantial training data, and their inference processes lack transparency. This paper proposes a method that extracts linguistic knowledge via an unsupervised framework and subsequently utilizes this knowledge to augment BERT's performance and interpretability in DST tasks. The knowledge extraction procedure is computationally economical and does not necessitate annotations or additional training data. The injection of the extracted knowledge necessitates the addition of only simple neural modules. We employ the Convex Polytopic Model (CPM) as a feature extraction tool for DST tasks and illustrate that the acquired features correlate with the syntactic and semantic patterns in the dialogues. This correlation facilitates a comprehensive understanding of the linguistic features influencing the DST model's decision-making process. We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
This paper is focused on the study of entropic regularization in optimal transport as a smoothing method for Wasserstein estimators, through the prism of the classical tradeoff between approximation and estimation errors in statistics. Wasserstein estimators are defined as solutions of variational problems whose objective function involves the use of an optimal transport cost between probability measures. Such estimators can be regularized by replacing the optimal transport cost by its regularized version using an entropy penalty on the transport plan. The use of such a regularization has a potentially significant smoothing effect on the resulting estimators. In this work, we investigate its potential benefits on the approximation and estimation properties of regularized Wasserstein estimators. Our main contribution is to discuss how entropic regularization may reach, at a lower computational cost, statistical performances that are comparable to those of un-regularized Wasserstein estimators in statistical learning problems involving distributional data analysis. To this end, we present new theoretical results on the convergence of regularized Wasserstein estimators. We also study their numerical performances using simulated and real data in the supervised learning problem of proportions estimation in mixture models using optimal transport.
Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability on learning good user and item embeddings by exploiting the collaborative signals from the high-order neighbors. Like other GCN models, the GCN based recommendation models also suffer from the notorious over-smoothing problem - when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulted in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent, however, we argue that they overlook an important factor for the over-smoothing problem in recommendation, that is, high-order neighboring users with no common interests of a user can be also involved in the user's embedding learning in the graph convolution operation. As a result, the multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. The subgraph consists of users with similar interests and their interacted items. To form the subgraphs, we design an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user feature and graph structure. To this end, our model can avoid propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model can gain performance improvement by stacking more layers and outperform the state-of-the-art GCN-based recommendation models significantly.