A distributed system is permissionless when participants can join and leave the network without permission from a central authority. Many modern distributed systems are naturally permissionless, in the sense that a central permissioning authority would defeat their design purpose: this includes blockchains, filesharing protocols, some voting systems, and more. By their permissionless nature, such systems are heterogeneous: participants may only have a partial view of the system, and they may also have different goals and beliefs. Thus, the traditional notion of consensus -- i.e. system-wide agreement -- may not be adequate, and we may need to generalise it. This is a challenge: how should we understand what heterogeneous consensus is; what mathematical framework might this require; and how can we use this to build understanding and mathematical models of robust, effective, and secure permissionless systems in practice? We analyse heterogeneous consensus using semitopology as a framework. This is like topology, but without the restriction that intersections of opens be open. Semitopologies have a rich theory which is related to topology, but with its own distinct character and mathematics. We introduce novel well-behavedness conditions, including an anti-Hausdorff property and a new notion of `topen set', and we show how these structures relate to consensus. We give a restriction of semitopologies to witness semitopologies, which are an algorithmically tractable subclass corresponding to Horn clause theories, having particularly good mathematical properties. We introduce and study several other basic notions that are specific and novel to semitopologies, and study how known quantities in topology, such as dense subsets and closures, display interesting and useful new behaviour in this new semitopological context.
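As a concrete illustration of the definition in the abstract above, a finite semitopology can be checked mechanically. The following is a minimal sketch, assuming the usual reading of the definition (a family of opens that contains the empty set and the whole set of points and is closed under unions, with no requirement on intersections). The function names `is_semitopology` and `is_topology` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def is_semitopology(points, opens):
    """Check the assumed semitopology axioms on a finite family of opens."""
    points = frozenset(points)
    family = {frozenset(o) for o in opens}
    # The empty set and the whole set of points must be open.
    if frozenset() not in family or points not in family:
        return False
    # For a finite family, closure under pairwise unions gives
    # closure under all nonempty unions.
    return all((a | b) in family for a, b in combinations(family, 2))


def is_topology(points, opens):
    """A topology additionally requires pairwise intersections to be open."""
    family = {frozenset(o) for o in opens}
    return is_semitopology(points, opens) and all(
        (a & b) in family for a, b in combinations(family, 2)
    )


# Two overlapping opens whose intersection {1} is not itself open:
# a semitopology, but not a topology.
example_points = {0, 1, 2}
example_opens = [set(), {0, 1}, {1, 2}, {0, 1, 2}]
print(is_semitopology(example_points, example_opens))
print(is_topology(example_points, example_opens))
```

The example family shows the kind of structure the relaxed axiom admits: two open sets that overlap without their overlap being open.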
The technical landscape of clinical machine learning is shifting in ways that destabilize pervasive assumptions about the nature and causes of algorithmic bias. On one hand, the dominant paradigm in clinical machine learning is narrow in the sense that models are trained on biomedical datasets for particular clinical tasks such as diagnosis and treatment recommendation. On the other hand, the emerging paradigm is generalist in the sense that general-purpose language models such as Google's BERT and PaLM are increasingly being adapted for clinical use cases via prompting or fine-tuning on biomedical datasets. Many of these next-generation models provide substantial performance gains over prior clinical models, but at the same time introduce novel kinds of algorithmic bias and complicate the explanatory relationship between algorithmic biases and biases in training data. This paper articulates how and in what respects biases in generalist models differ from biases in prior clinical models, and draws out practical recommendations for algorithmic bias mitigation.
With the widespread attention to and application of artificial intelligence (AI) and blockchain technologies, privacy protection techniques arising from their integration are of notable significance. In addition to protecting the privacy of individuals, these techniques also guarantee the security and dependability of data. This paper first presents an overview of AI and blockchain, summarizing their combination along with the derived privacy protection technologies. It then explores specific application scenarios in data encryption, de-identification, multi-tier distributed ledgers, and k-anonymity methods. Moreover, the paper evaluates five critical aspects of AI-blockchain-integration privacy protection systems: authorization management, access control, data protection, network security, and scalability. Furthermore, it analyzes the deficiencies and their actual causes, offering corresponding suggestions. This research also classifies and summarizes privacy protection techniques based on AI-blockchain application scenarios and technical schemes. In conclusion, this paper outlines the future directions of privacy protection technologies emerging from AI and blockchain integration, including enhancing efficiency and security to achieve more comprehensive privacy protection.
Using observational data to learn causal relationships is essential when randomized experiments are not possible, such as in healthcare. Discovering causal relationships in time-series health data is even more challenging when relationships change over the course of a disease, such as medications that are most effective early on or for individuals with severe disease. Stage variables, such as weeks of pregnancy, disease stages, or biomarkers like HbA1c, can influence which causal relationships hold for a patient. However, causal inference within each stage is often not possible due to limited amounts of data, and combining all data risks incorrect or missed inferences. To address this, we propose Causal Discovery with Stage Variables (CDSV), which uses stage variables to reweight data from multiple time-series while accounting for different causal relationships in each stage. In simulated data, CDSV discovers more causes with fewer false discoveries than baselines; in eICU, it has a lower false discovery rate (FDR) than baselines; and in MIMIC-III, it discovers more clinically relevant causes of high blood pressure.
This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used \v{C}ech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persistence diagrams of \v{C}ech complexes to be privatized. As an alternative, we show that the persistence diagram obtained by the $L^1$-distance to measure (DTM) has sensitivity $O(1/n)$. Based on the sensitivity analysis, we propose using the exponential mechanism whose utility function is defined in terms of the bottleneck distance of the $L^1$-DTM persistence diagrams. We also derive upper and lower bounds of the accuracy of our privacy mechanism; the obtained bounds indicate that the privacy error of our mechanism is near-optimal. We demonstrate the performance of our privatized persistence diagrams through simulations as well as on a real dataset tracking human movement.
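The exponential mechanism named in the abstract above can be sketched generically. This is a minimal illustration of the standard mechanism, not the authors' implementation: it samples a candidate $r$ with probability proportional to $\exp(\epsilon\, u(r) / (2\Delta u))$, where $\Delta u$ is the sensitivity of the utility. The toy utility below (negative absolute distance to a scalar summary) is only an assumed stand-in for the negative bottleneck distance to the $L^1$-DTM persistence diagram, and the sensitivity value $1/n$ is a hypothetical constant chosen to mirror the $O(1/n)$ rate.

```python
import math
import random


def exponential_mechanism_probs(candidates, utility, sensitivity, epsilon):
    """Sampling probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=random):
    """Draw one candidate according to the exponential mechanism."""
    probs = exponential_mechanism_probs(candidates, utility, sensitivity, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Toy setup (all values are assumptions for illustration only).
n = 100                                  # sample size
sensitivity = 1.0 / n                    # hypothetical O(1/n) sensitivity
data_summary = 0.42                      # stand-in for the non-private output
candidates = [i / 10 for i in range(11)]


def toy_utility(c):
    # Stand-in for the negative bottleneck distance between diagrams.
    return -abs(c - data_summary)


private_output = exponential_mechanism(
    candidates, toy_utility, sensitivity, epsilon=1.0
)
print(private_output)
```

A smaller sensitivity concentrates the sampling distribution around high-utility candidates for the same privacy budget, which is why a sensitivity that shrinks with $n$ matters for accuracy.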
Localizing root causes for multi-dimensional data is critical to ensuring the reliability of online service systems. When a fault occurs, only the measure values within specific attribute combinations are abnormal. Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multi-dimensional data. This paper proposes a generic and robust root cause localization approach for multi-dimensional data, PSqueeze. We propose a generic property of root causes for multi-dimensional data, the generalized ripple effect (GRE). Based on it, we propose a novel probabilistic clustering method and a robust heuristic search method. Moreover, we identify the importance of determining external root causes and propose an effective method for doing so, for the first time in the literature. Our experiments on two real-world datasets with 5400 faults show that the F1-score of PSqueeze outperforms baselines by 32.89%, while the localization time is around 10 seconds across all cases. The F1-score of PSqueeze in determining external root causes reaches 0.90. Furthermore, case studies in several production systems demonstrate that PSqueeze is helpful for fault diagnosis in the real world.
Ecological momentary assessment (EMA) data have a broad base of application in the study of time trends and relations. In EMA studies, there are a number of design considerations which influence the analysis of the data. One general modeling framework is particularly well-suited for these analyses: state-space modeling. Here, we present the state-space modeling framework with recommendations for the considerations that go into modeling EMA data. These recommendations can account for the issues that come up in EMA data analysis such as idiographic versus nomothetic modeling, missing data, and stationary versus non-stationary data. In addition, we suggest R packages in order to implement these recommendations in practice. Overall, well-designed EMA studies offer opportunities for researchers to handle the momentary minutiae in their assessment of psychological phenomena.
In future 6G integrated sensing and communication (ISAC) cellular systems, networked sensing is a promising technique that can leverage cooperation among the base stations (BSs) to perform high-resolution localization. However, a dense deployment of BSs to fully reap the networked sensing gain is not a cost-efficient solution in practice. Motivated by advances in intelligent reflecting surface (IRS) technology for 6G communication, this paper examines the feasibility of deploying low-cost IRSs to enhance the anchor density for networked sensing. Specifically, we propose a novel heterogeneous networked sensing architecture, which consists of both active anchors, i.e., the BSs, and passive anchors, i.e., the IRSs. Under this framework, the BSs emit orthogonal frequency division multiplexing (OFDM) communication signals in the downlink and localize the targets based on their echoes, which return either directly or via the IRSs. However, there are two challenges in using passive anchors for localization. First, it is impossible to utilize the round-trip signal between a passive IRS and a passive target to estimate their distance. Second, before localizing a target, we do not know which IRS is closest to it and serves as its anchor. In this paper, we show that the distance between a target and its associated IRS can be indirectly estimated based on the lengths of the BS-target-BS path and the BS-target-IRS-BS path. Moreover, we propose an efficient data association method to match each target to its associated IRS. Numerical results are given to validate the feasibility and effectiveness of our proposed heterogeneous networked sensing architecture with both active and passive anchors.
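The indirect distance estimate described in the abstract above can be illustrated with simple geometry. The sketch below is one possible reading, not the paper's derivation, and rests on assumptions that the abstract does not state: the same BS transmits and receives, the BS and IRS positions are known, and both path lengths are measured without noise. Under these assumptions the BS-target distance is half the BS-target-BS path length, and subtracting it and the known IRS-BS distance from the BS-target-IRS-BS path length leaves the target-IRS distance. The function name `target_irs_distance` is hypothetical.

```python
import math


def target_irs_distance(len_bs_target_bs, len_bs_target_irs_bs, bs_pos, irs_pos):
    """Estimate the target-IRS distance from two path lengths (noise-free sketch).

    Assumes a single BS both transmits and receives, and that the BS and
    IRS positions are known.
    """
    d_bs_target = len_bs_target_bs / 2.0     # round trip covers BS-target twice
    d_irs_bs = math.dist(irs_pos, bs_pos)    # known from the deployment geometry
    return len_bs_target_irs_bs - d_bs_target - d_irs_bs


# Toy geometry in metres (illustrative values only).
bs = (0.0, 0.0)
irs = (10.0, 0.0)
target = (4.0, 3.0)

# Simulate the two measured path lengths from the true geometry.
path_direct = 2.0 * math.dist(bs, target)
path_via_irs = math.dist(bs, target) + math.dist(target, irs) + math.dist(irs, bs)

estimate = target_irs_distance(path_direct, path_via_irs, bs, irs)
print(estimate, math.dist(target, irs))
```

In practice the path lengths would come from delay estimates on the OFDM echoes, so the estimate inherits their measurement error.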
Federated learning (FL) has been proposed to protect data privacy and virtually assemble isolated data silos by cooperatively training models among organizations without breaching privacy and security. However, FL faces heterogeneity from various aspects, including data space, statistical, and system heterogeneity. For example, collaborative organizations without conflicts of interest often come from different areas and have heterogeneous data from different feature spaces. Participants may also want to train heterogeneous personalized local models due to non-IID and imbalanced data distributions and various resource-constrained devices. Therefore, heterogeneous FL is proposed to address the problem of heterogeneity in FL. In this survey, we comprehensively investigate the domain of heterogeneous FL in terms of data space, statistical, system, and model heterogeneity. We first give an overview of FL, including its definition and categorization. Then, we propose a precise taxonomy of heterogeneous FL settings for each type of heterogeneity according to the problem setting and learning objective. We also investigate transfer learning methodologies for tackling heterogeneity in FL. We further present the applications of heterogeneous FL. Finally, we highlight the challenges and opportunities and envision promising future research directions toward new framework design and trustworthy approaches.
Unmanned aerial vehicle (UAV) swarm-enabled edge computing is envisioned to be promising in sixth-generation wireless communication networks due to its wide application scenarios and flexible deployment. However, most existing works focus on edge computing enabled by a single UAV or a small number of UAVs, which is very different from UAV swarm-enabled edge computing. In order to facilitate the practical applications of UAV swarm-enabled edge computing, state-of-the-art research is presented in this article. The potential applications, architectures, and implementation considerations are illustrated. Moreover, the promising enabling technologies for UAV swarm-enabled edge computing are discussed. Furthermore, we outline challenges and open issues in order to shed light on future research directions.