Differential Privacy (DP) relies on random numbers to preserve privacy, typically utilising Pseudorandom Number Generators (PRNGs) as a source of randomness. In order to allow for consistent reproducibility, testing and bug-fixing in DP algorithms and results, it is important to allow for the seeding of the PRNGs used therein. In this work, we examine the landscape of Random Number Generators (RNGs), and the considerations software engineers should make when choosing and seeding a PRNG for DP. We hope it serves as a suitable guide for DP practitioners, and includes many lessons learned when implementing seeding for diffprivlib.
Large Vision-Language Models (LVLMs) have recently achieved remarkable success. However, LVLMs are still plagued by the hallucination problem, which limits the practicality in many scenarios. Hallucination refers to the information of LVLMs' responses that does not exist in the visual input, which poses potential risks of substantial consequences. There has been limited work studying hallucination evaluation in LVLMs. In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework. HaELM achieves an approximate 95% performance comparable to ChatGPT and has additional advantages including low cost, reproducibility, privacy preservation and local deployment. Leveraging the HaELM, we evaluate the hallucination in current LVLMs. Furthermore, we analyze the factors contributing to hallucination in LVLMs and offer helpful suggestions to mitigate the hallucination problem. Our training data and human annotation hallucination data will be made public soon.
The Rowhammer vulnerability continues to get worse, with the Rowhammer Threshold (TRH) reducing from 139K activations to 4.8K activations over the last decade. Typical Rowhammer mitigations rely on tracking aggressor rows. The number of possible aggressors increases with lowering thresholds, making it difficult to reliably track such rows in a storage-efficient manner. At lower thresholds, academic trackers such as Graphene require prohibitive SRAM overheads (hundreds of KBs to MB). Recent in-DRAM trackers from industry, such as DSAC-TRR, perform approximate tracking, sacrificing guaranteed protection for reduced storage overheads, leaving DRAM vulnerable to Rowhammer attacks. Ideally, we seek a scalable tracker that tracks securely and precisely, and incurs negligible dedicated SRAM and performance overheads, while still being able to track arbitrarily low thresholds. To that end, we propose START - a Scalable Tracker for Any Rowhammer Threshold. Rather than relying on dedicated SRAM structures, START dynamically repurposes a small fraction the Last-Level Cache (LLC) to store tracking metadata. START is based on the observation that while the memory contains millions of rows, typical workloads touch only a small subset of rows within a refresh period of 64ms, so allocating tracking entries on demand significantly reduces storage. If the application does not access many rows in memory, START does not reserve any LLC capacity. Otherwise, START dynamically uses 1-way, 2-way, or 8-way of the cache set based on demand. START consumes, on average, 9.4% of the LLC capacity to store metadata, which is 5X lower compared to dedicating a counter in LLC for each row in memory. We also propose START-M, a memory-mapped START for large-memory systems. Our designs require only 4KB SRAM for newly added structures and perform within 1% of idealized tracking even at TRH of less than 100.
The emergence of generative Large Language Models (LLMs) emphasizes the need for accurate and efficient prompting approaches. LLMs are often applied in Few-Shot Learning (FSL) contexts, where tasks are executed with minimal training data. FSL has become popular in many Artificial Intelligence (AI) subdomains, including AI for health. Rare diseases affect a small fraction of the population. Rare disease identification from clinical notes inherently requires FSL techniques due to limited data availability. Manual data collection and annotation is both expensive and time-consuming. In this paper, we propose Models-Vote Prompting (MVP), a flexible prompting approach for improving the performance of LLM queries in FSL settings. MVP works by prompting numerous LLMs to perform the same tasks and then conducting a majority vote on the resulting outputs. This method achieves improved results to any one model in the ensemble on one-shot rare disease identification and classification tasks. We also release a novel rare disease dataset for FSL, available to those who signed the MIMIC-IV Data Use Agreement (DUA). Furthermore, in using MVP, each model is prompted multiple times, substantially increasing the time needed for manual annotation, and to address this, we assess the feasibility of using JSON for automating generative LLM evaluation.
We propose superfast (aka sublinear cost) algorithms for two fundamental problems of Matrix Computations, that is, Norm Estimation and Iterative Refinement of Low Rank Approximation (LRA). A superfast algorithm only accesses a small fraction of all entries of an input matrix and cannot accurately estimate their norms or output their close LRA for worst case inputs. Thus, we begin with some well-known algorithms solving these problems in linear or super linear time and then propose superfast modifications, which still promise to succeed for most or a large class of inputs, according to our tests with real world inputs.
The Multidepot Capacitated Vehicle Routing Problem (MCVRP) is a well-known variant of the classic Capacitated Vehicle Routing Problem (CVRP), where we need to route capacitated vehicles located in multiple depots to serve customers' demand such that each vehicle must return to the depot it starts, and the total traveling distance is minimized. There are three variants of MCVRP according to the property of the demand: unit-demand, splittable and unsplittable. We study approximation algorithms for $k$-MCVRP in metric graphs where $k$ is the capacity of each vehicle, and all three versions are APX-hard for any constant $k\geq 3$. Previously, Li and Simchi-Levi proposed a $(2\alpha+1-\alpha/k)$-approximation algorithm for splittable and unit-demand $k$-MCVRP and a $(2\alpha+2-2\alpha/k)$-approximation algorithm for unsplittable $k$-MCVRP, where $\alpha=3/2-10^{-36}$ is the current best approximation ratio for metric TSP. Harks et al. further improved the ratio to 4 for the unsplittable case. We give a $(4-1/1500)$-approximation algorithm for unit-demand and splittable $k$-MCVRP, and a $(4-1/50000)$-approximation algorithm for unsplittable $k$-MCVRP. Furthermore, we give a $(3+\ln2-\max\{\Theta(1/\sqrt{k}),1/9000\})$-approximation algorithm for splittable and unit-demand $k$-MCVRP, and a $(3+\ln2-\Theta(1/\sqrt{k}))$-approximation algorithm for unsplittable $k$-MCVRP under the assumption that the capacity $k$ is a fixed constant. Our results are based on recent progress in approximating CVRP.
It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, a challenge remains to model the deformation of cloth and skin in relation to skeleton pose. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses necessitate unique frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that models correlations among body parts locally, taking skeleton pose as input. The second branch combines these correlation features to a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in terms of preserving details and generalization capabilities.
Model-Based Anomaly Detection has been a successful approach to identify deviations from the expected behavior of Cyber-Physical Production Systems. Since manual creation of these models is a time-consuming process, it is advantageous to learn them from data and represent them in a generic formalism like timed automata. However, these models - and by extension, the detected anomalies - can be challenging to interpret due to a lack of additional information about the system. This paper aims to improve model-based anomaly detection in CPPS by combining the learned timed automaton with a formal knowledge graph about the system. Both the model and the detected anomalies are described in the knowledge graph in order to allow operators an easier interpretation of the model and the detected anomalies. The authors additionally propose an ontology of the necessary concepts. The approach was validated on a five-tank mixing CPPS and was able to formally define both automata model as well as timing anomalies in automata execution.
Multi-modal magnetic resonance imaging (MRI) plays a crucial role in comprehensive disease diagnosis in clinical medicine. However, acquiring certain modalities, such as T2-weighted images (T2WIs), is time-consuming and prone to be with motion artifacts. It negatively impacts subsequent multi-modal image analysis. To address this issue, we propose an end-to-end deep learning framework that utilizes T1-weighted images (T1WIs) as auxiliary modalities to expedite T2WIs' acquisitions. While image pre-processing is capable of mitigating misalignment, improper parameter selection leads to adverse pre-processing effects, requiring iterative experimentation and adjustment. To overcome this shortage, we employ Optimal Transport (OT) to synthesize T2WIs by aligning T1WIs and performing cross-modal synthesis, effectively mitigating spatial misalignment effects. Furthermore, we adopt an alternating iteration framework between the reconstruction task and the cross-modal synthesis task to optimize the final results. Then, we prove that the reconstructed T2WIs and the synthetic T2WIs become closer on the T2 image manifold with iterations increasing, and further illustrate that the improved reconstruction result enhances the synthesis process, whereas the enhanced synthesis result improves the reconstruction process. Finally, experimental results from FastMRI and internal datasets confirm the effectiveness of our method, demonstrating significant improvements in image reconstruction quality even at low sampling rates.
Multimodal Large Language Model (MLLM) recently has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLM, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLM. First of all, we present the formulation of MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at //github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
Reasoning with knowledge expressed in natural language and Knowledge Bases (KBs) is a major challenge for Artificial Intelligence, with applications in machine reading, dialogue, and question answering. General neural architectures that jointly learn representations and transformations of text are very data-inefficient, and it is hard to analyse their reasoning process. These issues are addressed by end-to-end differentiable reasoning systems such as Neural Theorem Provers (NTPs), although they can only be used with small-scale symbolic KBs. In this paper we first propose Greedy NTPs (GNTPs), an extension to NTPs addressing their complexity and scalability limitations, thus making them applicable to real-world datasets. This result is achieved by dynamically constructing the computation graph of NTPs and including only the most promising proof paths during inference, thus obtaining orders of magnitude more efficient models. Then, we propose a novel approach for jointly reasoning over KBs and textual mentions, by embedding logic facts and natural language sentences in a shared embedding space. We show that GNTPs perform on par with NTPs at a fraction of their cost while achieving competitive link prediction results on large datasets, providing explanations for predictions, and inducing interpretable models. Source code, datasets, and supplementary material are available online at //github.com/uclnlp/gntp.