As technology advances and we enter the era of Industry 5.0, operating systems, file systems, and web and network applications have undergone a profound paradigm shift. The widespread adoption of multiprocessing and multicore systems has made concurrent programming increasingly pervasive. This transformation, however, has introduced a class of defects known as concurrency bugs which, owing to their prevalence in concurrent programs, have caused severe failures and opened potential security exploits. Over the past two decades, numerous researchers have dedicated their efforts to unveiling, detecting, mitigating, and preventing these bugs, with the last decade witnessing a surge in research within this domain. Among concurrency bugs, data races, also known as race condition vulnerabilities, are the most prevalent, accounting for roughly 80\% of all concurrency bugs. This survey focuses on race condition bug detectors. We systematically categorize these detectors by the methodologies they employ, examine the techniques and algorithms associated with race detection, and trace the evolution of the field over time. Furthermore, we shed light on the application of fuzzing techniques to the detection of race condition vulnerabilities. By reviewing these detectors and the analyses they employ, we draw conclusions and outline potential future research directions, including improving the accuracy, performance, applicability, and comprehensiveness of race condition vulnerability detection.
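To make the target concrete, here is a minimal sketch of the classic data race these detectors hunt for, written in Python for illustration: two threads perform an unsynchronized read-modify-write on a shared counter, so increments can be lost depending on the thread interleaving.

```python
# Minimal data race: two threads do an unsynchronized read-modify-write on a
# shared counter, so some increments are lost depending on the interleaving.
import threading

counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        counter += 1  # not atomic: LOAD, ADD, STORE can interleave

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400000; a racy run may print less (the loss is intermittent).
# Guarding the increment with a threading.Lock removes the race.
print(counter)
```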
Visually conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; this adoption has fueled a wealth of new models such as LLaVa, InstructBLIP, and PaLI-3. Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization remain under-explored, making it challenging to understand which factors account for model performance, a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization from language, and targeted challenge sets that probe properties such as hallucination; these evaluations provide calibrated, fine-grained insight into a VLM's capabilities. Second, we rigorously investigate VLMs along key design axes, including pretrained visual representations and the tradeoffs of using base vs. instruct-tuned language models, among others. We couple our analysis with three resource contributions: (1) a unified framework for evaluating VLMs, (2) optimized, flexible code for VLM training, and (3) checkpoints for all models, including a family of VLMs at the 7-13B scale that strictly outperform InstructBLIP and LLaVa v1.5, the state of the art in open-source VLMs.
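As a loose illustration of what a unified evaluation framework buys (the interface below is hypothetical, not the released codebase), every model can sit behind a single prediction signature so that all benchmarks score all VLMs identically:

```python
# Hypothetical unified evaluation interface (illustrative names, not the
# released framework): every VLM sits behind one predict() signature so all
# benchmarks score all models identically.
from dataclasses import dataclass
from typing import Protocol

class VLM(Protocol):
    def predict(self, image_path: str, prompt: str) -> str: ...

@dataclass
class Example:
    image_path: str
    question: str
    answer: str

def evaluate(model: VLM, dataset: list) -> float:
    """Exact-match accuracy; real suites use task-specific metrics."""
    correct = sum(
        model.predict(ex.image_path, ex.question).strip().lower()
        == ex.answer.strip().lower()
        for ex in dataset
    )
    return correct / len(dataset)
```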
Complex reasoning is one of the most important capabilities of current Large Language Models (LLMs), and it plays an integral role in complex decision-making tasks. Investigating the reasoning capabilities of LLMs is therefore critical, and numerous benchmarks have been established to assess them. However, current benchmarks are inadequate for rigorously evaluating the full extent of the reasoning abilities LLMs can achieve. They are also prone to the risk of overfitting: being publicly accessible and static, they allow models to tailor their responses to specific benchmark metrics, thereby inflating their performance. To address these limitations, we introduce a new benchmark, NPHardEval, designed to evaluate the reasoning abilities of LLMs across a broad spectrum of 900 algorithmic questions extending up to the NP-hard complexity class. These questions are meticulously chosen to represent a wide range of complexity classes below NP-hard, offering a rigorous measure of LLMs' reasoning ability. Through this study, we shed light on the current state of reasoning in LLMs, providing an objective and rigorous perspective through the comparison of LLMs' performance across complexity classes. Moreover, the benchmark features a dynamic update mechanism in which the data points are refreshed monthly. Such regular updates mitigate the risk of LLMs overfitting to the benchmark, promoting a more accurate and reliable assessment of their reasoning capabilities. The benchmark dataset and code of NPHardEval are available at //github.com/casmlab/NPHardEval.
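As an illustration of why algorithmic questions suit a dynamically refreshed benchmark (this sketch is illustrative and not the NPHardEval code), fresh NP-hard instances can be sampled on demand while candidate answers remain checkable in polynomial time; here, vertex cover:

```python
# Illustrative, not the NPHardEval code: sample a fresh vertex-cover instance
# (supports periodic refresh) and verify a candidate answer in polynomial time.
import random

def make_instance(n=8, p=0.4, seed=None):
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]
    return n, edges

def is_vertex_cover(edges, candidate):
    # Polynomial-time check: every edge must touch the candidate set.
    return all(u in candidate or v in candidate for u, v in edges)

n, edges = make_instance(seed=42)
print(is_vertex_cover(edges, {0, 1, 2, 3}))
```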
The aim of this workshop is to bring together experts working on open-domain dialogue research. In this rapidly advancing research area, many challenges remain, such as learning information from conversations and engaging in realistic, convincing simulation of human intelligence and reasoning. SCI-CHAT follows previous workshops on open-domain dialogue, but with a focus on the simulation of intelligent conversation as judged in a live human evaluation. Models were expected to follow a challenging topic over a multi-turn conversation while positing, refuting, and reasoning over arguments. The workshop included both a research track and a shared task. The main goal of this paper is to provide an overview of the shared task, along with a link to an additional paper that provides an in-depth analysis of the shared task results following presentation at the workshop.
Network games provide a framework for studying strategic decision-making processes governed by structured interdependencies among agents. However, existing models do not account for environments in which agents interact simultaneously over multiple networks, or in which agents operate over multiple action dimensions. In this paper, we propose new models of multiplex network games to capture the different modalities of interaction among strategic agents, and multilayer network games to capture their interactions over multiple action dimensions. We explore how the properties of the constituent networks of a multiplex/multilayer network can undermine or support the existence, uniqueness, and stability of the game's Nash equilibria. Notably, we highlight that both the largest and smallest eigenvalues of the constituent networks (reflecting their connectivity and two-sidedness, respectively) are instrumental in determining the uniqueness of the multiplex/multilayer network game's equilibrium. Together, our findings shed light on the reasons for the fragility of equilibria when agents interact over networks of networks, and point out potential interventions to mitigate this fragility.
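For intuition, in standard linear-quadratic network games the uniqueness of the Nash equilibrium is typically tied to spectral bounds such as $\delta \lambda_{\max}(A) < 1$ for strategic complements and $|\delta \lambda_{\min}(A)| < 1$ for substitutes. The numeric sketch below, which merely inspects these eigenvalues layer by layer, is an assumption-laden illustration of that style of condition, not the paper's exact result:

```python
# Assumed linear-quadratic uniqueness conditions, for illustration only:
# delta * lambda_max(A) < 1 (complements) and |delta * lambda_min(A)| < 1
# (substitutes), inspected for each constituent layer of a multiplex game.
import numpy as np

def spectral_report(layers, delta):
    for i, adjacency in enumerate(layers):
        eigenvalues = np.linalg.eigvalsh(adjacency)  # symmetric adjacency assumed
        lam_min, lam_max = eigenvalues[0], eigenvalues[-1]
        ok = delta * lam_max < 1 and abs(delta * lam_min) < 1
        print(f"layer {i}: lambda_min={lam_min:.3f}, lambda_max={lam_max:.3f}, "
              f"condition {'holds' if ok else 'may fail'}")

ring = np.roll(np.eye(5), 1, axis=1)
ring = ring + ring.T  # undirected 5-cycle
complete = np.ones((5, 5)) - np.eye(5)  # complete graph on 5 nodes
spectral_report([ring, complete], delta=0.2)
```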
Advanced image fusion methods aim to generate fused results by aggregating the complementary information conveyed by the source images. However, differences in the source-specific manifestation of the imaged scene content make it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which enables fused images to be generated for different purposes, such as visualisation and downstream tasks, in a controllable way. We achieve this by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. Guided by the association maps, an affine fusion unit embedded in the transformer network fuses the text and vision modalities at the feature level. As another ingredient of this work, we propose the use of textual attention to adapt image quality assessment to the fusion task. To facilitate the implementation of the proposed text-guided fusion paradigm, and its adoption by the wider research community, we release a text-annotated image fusion dataset, IVT. Extensive experiments demonstrate that our approach (TextFusion) consistently outperforms traditional appearance-based fusion methods. Our code and dataset will be publicly available at //github.com/AWCXV/TextFusion.
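A rough sketch of the affine fusion idea (FiLM-style modulation; the module and tensor names here are illustrative assumptions, not the released TextFusion code): text features produce per-channel scale and shift terms, and the association map gates where the modulated features replace the original visual ones.

```python
# Illustrative FiLM-style affine fusion unit (not the released TextFusion
# code): text features yield per-channel scale/shift, and the association map
# gates where modulated features replace the original visual ones.
import torch
import torch.nn as nn

class AffineFusionUnit(nn.Module):
    def __init__(self, text_dim, channels):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, channels)
        self.to_shift = nn.Linear(text_dim, channels)

    def forward(self, visual, text, association):
        # visual: (B, C, H, W); text: (B, D); association: (B, 1, H, W) in [0, 1]
        scale = self.to_scale(text)[:, :, None, None]
        shift = self.to_shift(text)[:, :, None, None]
        modulated = visual * (1 + scale) + shift
        return association * modulated + (1 - association) * visual

unit = AffineFusionUnit(text_dim=512, channels=64)
fused = unit(torch.randn(2, 64, 32, 32), torch.randn(2, 512),
             torch.rand(2, 1, 32, 32))
```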
Recent advancements in synthetic speech generation have led to the creation of forged audio data that are almost indistinguishable from real speech. This phenomenon poses a new challenge for the multimedia forensics community, as the misuse of synthetic media can have adverse consequences. Several methods have been proposed in the literature to mitigate these risks and detect synthetic speech, mainly focusing on analysis of the speech signal itself. However, recent studies have revealed that the frequency bands most crucial for detection lie in the highest ranges (above 6000 Hz), which contain no speech content. In this work, we explore this aspect extensively and investigate whether synthetic speech detection can be performed by focusing only on the background component of the signal while disregarding its verbal content. Our findings indicate that the speech component is not the predominant factor in performing synthetic speech detection. These insights provide valuable guidance for the development of new synthetic speech detectors and their interpretability, together with some considerations on existing work in the audio forensics field.
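A minimal sketch of the experimental idea (the filter design choices below are our own illustrative assumptions, not the paper's): suppress the speech band and keep only content above 6 kHz before running a detector, to test whether verbal content is needed at all.

```python
# Keep only content above 6 kHz before detection; the order-8 Butterworth
# design is an illustrative choice, not the paper's.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass_above_speech(audio, sample_rate, cutoff_hz=6000.0):
    # cutoff must stay below the Nyquist frequency (sample_rate / 2)
    sos = butter(8, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)

sr = 44100
signal = np.random.randn(sr)  # 1 s of noise as a stand-in for an audio clip
background_only = highpass_above_speech(signal, sr)
```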
The inherent diversity of computation types within individual Deep Neural Network (DNN) models requires a correspondingly varied set of computation units within hardware processors, which significantly constrains computation efficiency when executing different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations. This transformation enables various DNN models to be executed on a single General-Purpose Matrix Multiplication (GEMM) accelerator. Extensive experimental results spanning different DNN models demonstrate that our approach preserves network accuracy while providing both generality and application-specific levels of computation efficiency, allowing a broad spectrum of DNN models to be executed on a single GEMM accelerator without additional special function units.
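As background for this lowering idea (the sketch shows the classic im2col route, not the NeuralMatrix framework itself), a convolution can be recast as a single matrix multiplication, which is how diverse layer types become runnable on one GEMM engine:

```python
# Classic im2col lowering: a convolution becomes one matrix multiplication.
import numpy as np

def conv2d_as_gemm(x, w):
    # x: (C_in, H, W); w: (C_out, C_in, K, K); stride 1, no padding
    c_in, h, wid = x.shape
    c_out, _, k, _ = w.shape
    out_h, out_w = h - k + 1, wid - k + 1
    # im2col: each output location becomes one column of patch values.
    cols = np.stack([
        x[:, i:i + k, j:j + k].ravel()
        for i in range(out_h) for j in range(out_w)
    ], axis=1)                           # (C_in*K*K, out_h*out_w)
    gemm = w.reshape(c_out, -1) @ cols   # the single matrix multiplication
    return gemm.reshape(c_out, out_h, out_w)

x = np.random.randn(3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
print(conv2d_as_gemm(x, w).shape)  # (4, 6, 6)
```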
Given the generational gap in available hardware between lay practitioners and the most endowed institutions, LLMs are becoming increasingly inaccessible as they grow in size. Whilst many approaches have been proposed to compress LLMs and make their resource consumption manageable, these methods themselves tend to be resource intensive, putting them out of the reach of the very user groups they target. In this work, we explore the problem of structured pruning of LLMs using only forward passes. We seek to empower practitioners to prune models so large that their available hardware has just enough memory to run inference. We develop Bonsai, a gradient-free, perturbative pruning method capable of delivering small, fast, and accurate pruned models. We observe that Bonsai outputs pruned models that (i) outperform those generated by more expensive gradient-based structured pruning methods, and (ii) are twice as fast (with comparable accuracy) as those generated by semi-structured pruning methods requiring resources comparable to Bonsai's. We also leverage Bonsai to produce a new sub-2B model, using a single A6000 GPU, that yields state-of-the-art performance on 4 of 6 tasks on the Huggingface Open LLM leaderboard.
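A hedged sketch of what gradient-free, perturbative importance scoring can look like (the names and interfaces are illustrative, not the released Bonsai implementation): temporarily ablate each prunable unit, re-measure the loss with forward passes only, and prune the units whose removal hurts least.

```python
# Illustrative gradient-free importance scoring (not the released Bonsai
# implementation): ablate one structured unit at a time, re-measure the loss
# with forward passes only, and prune the units whose removal hurts least.
import torch

@torch.no_grad()
def perturbative_scores(eval_loss, masks):
    """eval_loss: () -> float, average loss on a calibration set (forward only).
    masks: dict mapping unit name -> callable(enabled: bool) toggling the unit."""
    base = eval_loss()
    scores = {}
    for name, set_enabled in masks.items():
        set_enabled(False)                  # ablate one head / neuron group
        scores[name] = eval_loss() - base   # loss increase = importance
        set_enabled(True)                   # restore the unit
    return scores  # prune the units with the smallest scores first
```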
Driven by the rapid ascent of artificial intelligence (AI), organizations are at the epicenter of a seismic shift, facing a crucial question: how can AI be successfully integrated into existing operations? To help answer this question, manage expectations, and mitigate frustration, this article introduces Computational Management, a systematic approach to task automation that enhances the ability of organizations to harness AI's potential within existing workflows. Computational Management bridges the strategic insights of management science and the analytical rigor of computational thinking. The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow, focusing on task (re)formulation, on assessing the automation potential of tasks, and on completing task specification templates for AI selection and adaptation. The article includes manual and automated methods, with prompt suggestions for publicly available LLMs, to complete these three procedures. The first procedure, task (re)formulation, breaks work activities down into basic units so that each can be completed by one agent, involves a single well-defined action, and produces a distinct outcome. The second assesses each granular task's suitability for automation using the Task Automation Index, which ranks tasks according to whether they have standardized input, well-defined rules, repetitiveness, data dependency, and objective outputs. The third centers on a task specification template that details 16 critical components of a task and can be used as a checklist to select or adapt the most suitable AI solution for integration into existing workflows. Computational Management provides a roadmap and a toolkit for humans and AI to thrive together, while enhancing organizational efficiency and innovation.
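To make the second procedure concrete (the scoring scheme below is an assumption for illustration, not the article's exact formula), a Task Automation Index can score each granular task on the five named criteria and rank tasks by the aggregate:

```python
# Illustrative Task Automation Index: score each task on the five named
# criteria and rank by the mean. The [0, 1] scale is an assumed convention.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    standardized_input: float   # each criterion scored in [0, 1]
    well_defined_rules: float
    repetitiveness: float
    data_dependency: float
    objective_outputs: float

    def automation_index(self) -> float:
        criteria = (self.standardized_input, self.well_defined_rules,
                    self.repetitiveness, self.data_dependency,
                    self.objective_outputs)
        return sum(criteria) / len(criteria)

tasks = [Task("invoice triage", 0.9, 0.8, 1.0, 0.9, 0.8),
         Task("strategy memo", 0.2, 0.1, 0.3, 0.4, 0.2)]
for t in sorted(tasks, key=lambda t: t.automation_index(), reverse=True):
    print(f"{t.name}: {t.automation_index():.2f}")
```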
Seeking the equivalent entities among multi-source Knowledge Graphs (KGs) is the pivotal step in KG integration, also known as \emph{entity alignment} (EA). However, most existing EA methods are inefficient and scale poorly. A recent summary points out that some of them even require several days to process a dataset containing 200,000 nodes (DWY100K). We believe over-complex graph encoders and inefficient negative sampling strategies are the two main reasons. In this paper, we propose a novel KG encoder, the Dual Attention Matching Network (Dual-AMN), which not only models both intra-graph and cross-graph information effectively, but also greatly reduces computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss, which smoothly selects hard negative samples with reduced loss shift. Experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method finishes in 1,100 seconds, at least 10 times faster than previous work. Our method also outperforms previous works across all datasets, improving Hits@1 and MRR by 6% to 13%.
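For intuition, hard negative mining losses of this flavor are often built on a LogSumExp smooth maximum over negatives; the PyTorch sketch below follows that general pattern with illustrative margin and temperature values, and is not the released Dual-AMN code.

```python
# LogSumExp-style smooth hard negative mining (illustrative, not Dual-AMN):
# the smooth max focuses the loss on the hardest negatives.
import torch

def hard_sample_mining_loss(pos_sim, neg_sim, margin=0.1, temp=20.0):
    # pos_sim: (B,) similarity of each aligned pair
    # neg_sim: (B, N) similarities of each anchor to its negative candidates
    logits = temp * (neg_sim - pos_sim.unsqueeze(1) + margin)
    # logsumexp acts as a smooth max over negatives; softplus gives
    # log(1 + sum(exp(...))), so easy pairs contribute little loss.
    return torch.nn.functional.softplus(torch.logsumexp(logits, dim=1)).mean()

loss = hard_sample_mining_loss(torch.rand(16), torch.rand(16, 32))
```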