We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Proven\c{c}al variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms of Franco-Proven\c{c}al. The corpus comes from field recordings, most of which are noisy, for which only 5 hrs have matching transcriptions, and for which forced alignment is of variable quality. The corpus contains an additional 20 hrs of unlabelled speech. We report baseline results from state-of-the-art multilingual speech foundation models with a best phone error rate of 30.4%, using a pipeline that continues pre-training on the foundation model using the unlabelled set.
This paper delves into an integrated sensing and communication (ISAC) system bolstered by a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Within this system, a base station (BS) is equipped with communication and radar capabilities, enabling it to communicate with ground terminals (GTs) and concurrently probe for echo signals from a target of interest. Moreover, to manage interference and improve communication quality, the rate splitting multiple access (RSMA) scheme is incorporated into the system. The signal-to-interference-plus-noise ratio (SINR) of the received sensing echo signals is a measure of sensing performance. We formulate a joint optimization problem of common rates, transmit beamforming at the BS, and passive beamforming vectors of the STAR-RIS. The objective is to maximize sensing SINR while guaranteeing the communication rate requirements for each GT. We present an iterative algorithm to address the non-convex problem by invoking Dinkelbach's transform, semidefinite relaxation (SDR), majorization-minimization, and sequential rank-one constraint relaxation (SROCR) theories. Simulation results manifest that the performance of the studied ISAC network enhanced by the STAR-RIS and RSMA surpasses other benchmarks considerably. The results evidently indicate the superior performance improvement of the ISAC system with the proposed RSMA-based transmission strategy design and the dynamic optimization of both transmission and reflection beamforming at STAR-RIS.
As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
As Artificial Intelligence (AI) becomes integral to software development, understanding the social and cooperative dynamics that affect AI-driven organizational change is important. Yet, despite AI's rapid progress and influence, the human and cooperative facets of these shifts in software organizations remain relatively less explored. This study uses Behavioral Software Engineering (BSE) as a lens to examine these often-overlooked dimensions of AI transformation. Through a qualitative approach involving ten semi-structured interviews across four organizations that are undergoing AI transformations, we performed a thematic analysis that revealed numerous sub-themes linked to twelve BSE concepts across individual, group, and organizational levels. Since the organizations are at an early stage of transformation we found more emphasis on the individual level. Our findings further reveal six key challenges tied to these BSE aspects that the organizations face during their AI transformation. Aligned with change management literature, we emphasize that effective communication, proactive leadership, and resistance management are essential for successful AI integration. However, we also identify ethical considerations as critical in the AI context-an area largely overlooked in previous research. Furthermore, a narrative analysis illustrates how different roles within an organization experience the AI transition in unique ways. These insights underscore that AI transformation extends beyond technical solutions; it requires a thoughtful approach that balances technological and human factors.
Policy Space Response Oracle (PSRO) with policy population construction has been demonstrated as an effective method for approximating Nash Equilibrium (NE) in zero-sum games. Existing studies have attempted to improve diversity in policy space, primarily by incorporating diversity regularization into the Best Response (BR). However, these methods cause the BR to deviate from maximizing rewards, easily resulting in a population that favors diversity over performance, even when diversity is not always necessary. Consequently, exploitability is difficult to reduce until policies are fully explored, especially in complex games. In this paper, we propose Conflux-PSRO, which fully exploits the diversity of the population by adaptively selecting and training policies at state-level. Specifically, Conflux-PSRO identifies useful policies from the existing population and employs a routing policy to select the most appropriate policies at each decision point, while simultaneously training them to enhance their effectiveness. Compared to the single-policy BR of traditional PSRO and its diversity-improved variants, the BR generated by Conflux-PSRO not only leverages the specialized expertise of diverse policies but also synergistically enhances overall performance. Our experiments on various environments demonstrate that Conflux-PSRO significantly improves the utility of BRs and reduces exploitability compared to existing methods.
Tsetlin Machines (TMs) have emerged as a compelling alternative to conventional deep learning methods, offering notable advantages such as smaller memory footprint, faster inference, fault-tolerant properties, and interpretability. Although various adaptations of TMs have expanded their applicability across diverse domains, a fundamental gap remains in understanding how TMs quantify uncertainty in their predictions. In response, this paper introduces the Probabilistic Tsetlin Machine (PTM) framework, aimed at providing a robust, reliable, and interpretable approach for uncertainty quantification. Unlike the original TM, the PTM learns the probability of staying on each state of each Tsetlin Automaton (TA) across all clauses. These probabilities are updated using the feedback tables that are part of the TM framework: Type I and Type II feedback. During inference, TAs decide their actions by sampling states based on learned probability distributions, akin to Bayesian neural networks when generating weight values. In our experimental analysis, we first illustrate the spread of the probabilities across TA states for the noisy-XOR dataset. Then we evaluate the PTM alongside benchmark models using both simulated and real-world datasets. The experiments on the simulated dataset reveal the PTM's effectiveness in uncertainty quantification, particularly in delineating decision boundaries and identifying regions of high uncertainty. Moreover, when applied to multiclass classification tasks using the Iris dataset, the PTM demonstrates competitive performance in terms of predictive entropy and expected calibration error, showcasing its potential as a reliable tool for uncertainty estimation. Our findings underscore the importance of selecting appropriate models for accurate uncertainty quantification in predictive tasks, with the PTM offering a particularly interpretable and effective solution.
Ethereum, as a representative of Web3, adopts a novel framework called Proposer Builder Separation (PBS) to prevent the centralization of block profits in the hands of institutional Ethereum stakers. Introducing builders to generate blocks based on public transactions, PBS aims to ensure that block profits are distributed among all stakers. Through the auction among builders, only one will win the block in each slot. Ideally, the equilibrium strategy of builders under public information would lead them to bid all block profits. However, builders are now capable of extracting profits from private order flows. In this paper, we explore the effect of PBS with private order flows. Specifically, we propose the asymmetry auction model of MEV-Boost auction. Moreover, we conduct empirical study on Ethereum blocks from January 2023 to May 2024. Our analysis indicates that private order flows contribute to 54.59% of the block value, indicating that different builders will build blocks with different valuations. Interestingly, we find that builders with more private order flows (i.e., higher block valuations) are more likely to win the block, while retain larger proportion of profits. In return, such builders will further attract more private order flows, resulting in a monopolistic market gradually. Our findings reveal that PBS in current stage is unable to balance the profit distribution, which just transits the centralization of block profits from institutional stakers to the monopolistic builder.
Recent advances in Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers. However, LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production. This paper addresses the critical, yet underexplored, issue of license compliance in LLM-generated code by establishing a benchmark to evaluate the ability of LLMs to provide accurate license information for their generated code. To establish this benchmark, we conduct an empirical study to identify a reasonable standard for "striking similarity" that excludes the possibility of independent creation, indicating a copy relationship between the LLM output and certain open-source code. Based on this standard, we propose LiCoEval, to evaluate the license compliance capabilities of LLMs, i.e., the ability to provide accurate license or copyright information when they generate code with striking similarity to already existing copyrighted code. Using LiCoEval, we evaluate 14 popular LLMs, finding that even top-performing LLMs produce a non-negligible proportion (0.88% to 2.01%) of code strikingly similar to existing open-source implementations. Notably, most LLMs fail to provide accurate license information, particularly for code under copyleft licenses. These findings underscore the urgent need to enhance LLM compliance capabilities in code generation tasks. Our study provides a foundation for future research and development to improve license compliance in AI-assisted software development, contributing to both the protection of open-source software copyrights and the mitigation of legal risks for LLM users.
In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: //github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
With the breakthrough of AlphaGo, deep reinforcement learning becomes a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning hard to be practical in a wide range of areas. Plenty of methods have been developed for sample efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, amongst which, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming, and intelligent transportation. In this paper, we conclude the state of this exciting field, by comparing the classical distributed deep reinforcement learning methods, and studying important components to achieve efficient distributed learning, covering single player single agent distributed deep reinforcement learning to the most complex multiple players multiple agents distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing usability of the proposed toolbox for multiple players and multiple agents distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.
The problem of Multiple Object Tracking (MOT) consists in following the trajectory of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning was employed in each one of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.