The proliferation of radical online communities and their violent offshoots has sparked great societal concern. However, the current practice of banning such communities from mainstream platforms has unintended consequences: (i) the further radicalization of their members on the fringe platforms to which they migrate; and (ii) the spillover of harmful content from fringe back onto mainstream platforms. Here, in a large observational study on two banned subreddits, r/The\_Donald and r/fatpeoplehate, we examine how factors associated with the RECRO radicalization framework relate to users' migration decisions. Specifically, we quantify how these factors affect users' decisions to post on fringe platforms and, for those who do, whether they continue posting on the mainstream platform. Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform, whereas social-level factors, those relating to users' connection with the radical community, only affect the propensity to be coactive on both platforms. Overall, our findings pave the way for evidence-based moderation policies, as the decisions to migrate and remain coactive amplify unintended consequences of community bans.
The development of smart city transport systems, including self-driving cars, increases the threat of hostile interference in vehicle control processes. This interference may disrupt the normal functioning of the transport system, and, if it is performed covertly, the system can be negatively affected for a long period of time. This paper develops a stochastic cellular automaton simulation model of traffic on a circular two-lane road based on the Sakai-Nishinari-Fukui-Schadschneider (S-NFS) rules. In the presented model, in addition to ordinary vehicles, there are covertly counteracting vehicles; their task is to reduce the quantitative indicators (such as traffic flux) of the transport system using special rules of behavior. Three such rules are considered and compared: two lane-changing rules and one slow-down rule. It is shown that such counteracting vehicles can affect the traffic flow, mainly in the region of the maximum of the fundamental diagram, that is, at average values of the vehicle density. In free-flowing traffic or in a traffic jam, the influence of the counteracting vehicles is negligible regardless of their rules of behavior.
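The effect of a slow-down rule on traffic flux can be illustrated with a much simpler model than the one in the abstract. The sketch below is a single-lane Nagel-Schreckenberg ring, not the two-lane S-NFS model, and it treats a counteracting vehicle as one with a reduced speed limit; the function name `simulate_ring`, all parameter names, and all default values are assumptions made for illustration only.

```python
import random

def simulate_ring(length=200, n_cars=40, n_slow=0, vmax=5, vmax_slow=1,
                  p_brake=0.3, steps=500, warmup=200, seed=0):
    """Return the average flux (density * mean speed) on a circular road.

    Simplified single-lane Nagel-Schreckenberg rules; `n_slow` vehicles
    are hypothetical counteracting vehicles with speed limit `vmax_slow`.
    """
    rng = random.Random(seed)
    # Cars are indexed in cyclic order, so car i+1 is always ahead of car i.
    positions = sorted(rng.sample(range(length), n_cars))
    slow_ids = set(rng.sample(range(n_cars), n_slow))
    limits = [vmax_slow if i in slow_ids else vmax for i in range(n_cars)]
    speeds = [0] * n_cars
    total_speed = 0
    for step in range(warmup + steps):
        new_speeds = []
        for i in range(n_cars):
            ahead = positions[(i + 1) % n_cars]
            gap = (ahead - positions[i] - 1) % length  # free cells in front
            v = min(speeds[i] + 1, limits[i])          # accelerate
            v = min(v, gap)                            # avoid collision
            if v > 0 and rng.random() < p_brake:       # random braking
                v -= 1
            new_speeds.append(v)
        speeds = new_speeds
        # Parallel update: all cars move at once.
        positions = [(x + v) % length for x, v in zip(positions, speeds)]
        if step >= warmup:
            total_speed += sum(speeds)
    mean_speed = total_speed / (steps * n_cars)
    return (n_cars / length) * mean_speed

# Compare flux with and without counteracting vehicles at a medium density.
print("ordinary only:   ", simulate_ring(n_slow=0))
print("with 2 slow cars:", simulate_ring(n_slow=2))
```

Sweeping `n_cars` from low to high density and plotting the returned flux gives a rough fundamental diagram for this simplified model; whether it reproduces the paper's finding for the S-NFS model is not something this sketch can establish.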
Platform trials offer a framework to study multiple interventions in a single trial, with the opportunity of opening and closing arms. The use of a common control in platform trials can increase efficiency compared to individual control arms or separate trials per treatment. However, whether multiplicity adjustment is needed as a consequence of common controls is currently debated among researchers, pharmaceutical companies, and regulators. In a simulation study, we investigate the impact of a common control arm in platform trials on the type I error and power, in comparison to what would have been obtained with a platform trial with individual control arms. Furthermore, we evaluate the impact on power when multiplicity adjustment is required in a platform trial. In both study designs, the family-wise error rate (FWER) is inflated compared to a standard, two-armed randomized controlled trial when no multiplicity adjustment is applied. With a common control, the FWER inflation is smaller. In most circumstances, a platform trial with a common control is still beneficial in terms of sample size and power after multiplicity adjustment, whereas in some cases, the platform trial with a common control loses the efficiency gain. Therefore, we further discuss the need for adjustment in terms of a family definition or hypothesis dependencies.
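The FWER comparison can be sketched with a small Monte Carlo simulation under the global null hypothesis. The sketch assumes equal allocation, unit-variance normal outcomes, one-sided z-tests, and all arms running concurrently, so it ignores the staggered entry of arms that characterizes a real platform trial; the function name `fwer` and its parameters are hypothetical and do not reproduce the study's simulation setup.

```python
import random
from statistics import NormalDist

def fwer(n_arms=2, n_per_arm=100, alpha=0.025, common_control=True,
         n_sims=20000, seed=1):
    """Estimate P(at least one false rejection) with no multiplicity adjustment."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se_mean = (1 / n_per_arm) ** 0.5   # sd of one arm's sample mean
    se_diff = (2 / n_per_arm) ** 0.5   # sd of treatment-minus-control difference
    trials_with_rejection = 0
    for _ in range(n_sims):
        shared_control = rng.gauss(0, se_mean)
        rejected = False
        for _arm in range(n_arms):
            treatment = rng.gauss(0, se_mean)  # no true effect under the null
            if common_control:
                control = shared_control
            else:
                control = rng.gauss(0, se_mean)  # separate control per arm
            if (treatment - control) / se_diff > z_crit:
                rejected = True
        trials_with_rejection += rejected
    return trials_with_rejection / n_sims

print("common control:     ", fwer(common_control=True))
print("individual controls:", fwer(common_control=False))
```

With individual controls the tests are independent, so the unadjusted FWER for two arms is 1 - (1 - 0.025)^2, roughly 0.049; sharing a control makes the test statistics positively correlated, which is the mechanism behind the smaller inflation described above.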
Recent advancements in artificial intelligence (AI) have seen the emergence of smart video surveillance (SVS) in many practical applications, particularly for building safer and more secure communities in our urban environments. Cognitive tasks, such as identifying objects, recognizing actions, and detecting anomalous behaviors, can produce data capable of providing valuable insights to the community through statistical and analytical tools. However, the design of artificially intelligent surveillance systems requires special consideration of ethical challenges and concerns. The use and storage of personally identifiable information (PII) commonly pose an increased risk to personal privacy. To address these issues, this paper identifies the privacy concerns and requirements that must be addressed when designing AI-enabled smart video surveillance. Further, we propose the first end-to-end AI-enabled privacy-preserving smart video surveillance system that holistically combines computer vision analytics, statistical data analytics, cloud-native services, and end-user applications. Finally, we propose quantitative and qualitative metrics to evaluate intelligent video surveillance systems. The system achieves 17.8 frames-per-second (FPS) processing in extreme video scenes. However, considering privacy in designing such a system leads to preferring the pose-based algorithm over the pixel-based one. This choice reduces accuracy in both the action and anomaly detection tasks: the results drop from 97.48 to 73.72 in anomaly detection and from 96 to 83.07 in action detection. On average, the latency of the end-to-end system is 36.1 seconds.
Probabilistic predictions can be evaluated through comparisons with observed label frequencies, that is, through the lens of calibration. Recent scholarship on algorithmic fairness has started to look at a growing variety of calibration-based objectives under the name of multi-calibration but has still remained fairly restricted. In this paper, we explore and analyse forms of evaluation through calibration by making explicit the choices involved in designing calibration scores. We organise these into three grouping choices and a choice concerning the agglomeration of group errors. This provides a framework for comparing previously proposed calibration scores and helps to formulate novel ones with desirable mathematical properties. In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions and formally demonstrate advantages of such approaches. We also characterise the space of suitable agglomeration functions for group errors, generalising previously proposed calibration scores. Complementary to such population-level scores, we explore calibration scores at the individual level and analyse their relationship to choices of grouping. We draw on these insights to introduce and axiomatise fairness deviation measures for population-level scores. We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
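The two design choices named above, how datapoints are grouped and how group errors are agglomerated, can be made concrete with a small sketch. It assumes binary labels, uses the absolute gap between mean prediction and mean label as the group error, and offers two example agglomeration functions; the function names and the toy data are illustrative assumptions, not the scores defined in the paper.

```python
from collections import defaultdict

def group_errors(preds, labels, keys):
    """Return {group: (absolute calibration error, group size)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for p, y, k in zip(preds, labels, keys):
        entry = totals[k]
        entry[0] += p   # sum of predictions in the group
        entry[1] += y   # sum of observed labels in the group
        entry[2] += 1   # group size
    return {k: (abs(sp - sy) / n, n) for k, (sp, sy, n) in totals.items()}

def weighted_mean(errors):
    """Agglomerate by size-weighted average (as in a binned ECE-style score)."""
    total = sum(n for _, n in errors.values())
    return sum(err * n for err, n in errors.values()) / total

def worst_case(errors):
    """Agglomerate by the largest group error."""
    return max(err for err, _ in errors.values())

def prediction_bins(preds, n_bins=10):
    """Grouping choice 1: equal-width bins over the predicted probability."""
    return [min(int(p * n_bins), n_bins - 1) for p in preds]

# Toy data: four datapoints with one categorical input feature.
preds = [0.2, 0.2, 0.8, 0.8]
labels = [0, 1, 1, 1]
feature = ["a", "b", "a", "b"]

by_prediction = group_errors(preds, labels, prediction_bins(preds, n_bins=2))
by_feature = group_errors(preds, labels, feature)  # grouping choice 2
print(weighted_mean(by_prediction), worst_case(by_prediction))
print(weighted_mean(by_feature), worst_case(by_feature))
```

On this toy data both groupings give the same weighted-mean score, while the worst-case score is larger under feature-based grouping because all of the miscalibration falls on group "b".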
State-of-the-art reinforcement learning has enabled training agents on tasks of ever-increasing complexity. However, the current paradigm tends to favor training agents from scratch on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency while the latter is difficult when test tasks are out-of-distribution. Agents that can effectively transfer their knowledge about the world pose a potential solution to these issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand when exactly environment models have an advantage and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains which vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.
Blockchain has emerged as a technology that is changing the way everyone thinks about data security and bringing efficiency to several industries. It has already been applied to a wide range of applications, from financial services and supply chain management to voting systems and identity verification. An organization must verify its candidates before selecting them, because choosing an unqualified candidate can ruin its reputation. In this digital era, fraudulent schemes are rampant in many companies, and certificate fraud is one of them. It is possible to validate a candidate's qualifications using traditional methods, but these have drawbacks such as security issues and time consumption. In this paper, we propose a blockchain-based academic certificate authentication system that ensures authenticity and keeps assertions made through the decentralized system secure. The system generates, authenticates, and corrects academic certificates. Although some blockchain-based authentication systems already exist, they cannot correct errors that occur during generation. The proposed system helps in several ways, for example by providing a user-friendly university admission process and a smooth job hiring process. In conclusion, our proposed system can permanently eradicate certificate forgeries and create and promote trust in society.
Consider the problem of estimating the causal effect of some attribute of a text document; for example: what effect does writing a polite vs. rude email have on response time? To estimate a causal effect from observational data, we need to adjust for confounding aspects of the text that affect both the treatment and outcome -- e.g., the topic or writing level of the text. These confounding aspects are unknown a priori, so it seems natural to adjust for the entirety of the text (e.g., using a transformer). However, causal identification and estimation procedures rely on the assumption of overlap: for all levels of the adjustment variables, there is randomness left over so that every unit could have (not) received treatment. Since the treatment here is itself an attribute of the text, it is perfectly determined by the text, and overlap is apparently violated. The purpose of this paper is to show how to handle causal identification and obtain robust causal estimation in the presence of apparent overlap violations. In brief, the idea is to use supervised representation learning to produce a data representation that preserves confounding information while eliminating information that is only predictive of the treatment. This representation then suffices for adjustment and can satisfy overlap. Adapting results on non-parametric estimation, we find that this procedure is robust to conditional outcome misestimation, yielding a low-bias estimator with valid uncertainty quantification under weak conditions. Empirical results show strong improvements in bias and uncertainty quantification relative to the natural baseline.
The deployment of Internet of Things (IoT) devices and data fusion techniques has gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. As datasets do not necessarily originate from identical sensors, fusing them typically results in a complex data problem. Because the military is investigating how heterogeneous IoT devices can aid processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal-to-image encoding approach that integrates (fuses) signals from IoT wearable devices into an image which is invertible and easier to visualize, supporting decision making. Furthermore, we investigate the challenge of enabling intelligent identification and detection, and demonstrate the feasibility of the proposed Deep Learning and Anomaly Detection models, which can support future applications that utilize hand gesture data from wearable devices.
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence the estimation error can be substantial. We therefore propose another approach to constructing the estimators such that the error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical estimators and the proposed estimators in estimating the causal quantities. The comparison is carried out across a wide range of models, including linear regression models, tree-based models, and neural network-based models, on simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is substantial if the causal effects are accounted for correctly.
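Why ignoring confounding inflates estimation error can be seen in a small simulation. The sketch below is not the paper's estimator: it assumes a single binary confounder (think of a hypothetical borrower risk tier) that drives both the credit decision and the repayment, and it compares a naive difference in means with a textbook stratified (backdoor-adjusted) estimate. All names and numeric values are illustrative assumptions.

```python
import random

def simulate(n=20000, effect=1.0, seed=0):
    """Generate (confounder, decision, repayment) rows with a known effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0                   # confounder
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0   # decision depends on x
        y = effect * t + 2.0 * x + rng.gauss(0, 1)           # outcome depends on t and x
        rows.append((x, t, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated units."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [(t, y) for x, t, y in rows if x == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total

rows = simulate()
print("true effect: 1.0")
print("naive:     ", naive_estimate(rows))
print("stratified:", stratified_estimate(rows))
```

In this setup the naive estimate converges to about 2.2 rather than 1.0, because treated units are far more likely to have the confounder set; conditioning on the confounder removes that bias.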
Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget requirement compared with randomized controlled trials. Spurred by the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the best-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets and open-source code are summarized to help researchers and practitioners explore, evaluate, and apply causal inference methods.
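As a minimal illustration of the potential outcome framework, the sketch below simulates both potential outcomes for every unit, reveals only one of them according to the treatment, and recovers the average treatment effect with inverse probability weighting (IPW). It assumes the propensity score is known and bounded away from 0 and 1, and that treatment depends only on the observed covariate; these correspond to the assumptions usually listed for the framework (SUTVA, ignorability, positivity), which the abstract does not name. The data-generating process and all names are hypothetical.

```python
import random

def simulate_units(n=50000, seed=0):
    """Return (propensity, treatment, observed outcome, unit-level effect) tuples."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.random()                 # observed covariate in [0, 1)
        e = 0.2 + 0.6 * x                # propensity score, within [0.2, 0.8)
        y0 = 2.0 * x + rng.gauss(0, 1)   # potential outcome without treatment
        y1 = y0 + 1.0 + x                # potential outcome with treatment
        t = 1 if rng.random() < e else 0
        y_obs = y1 if t else y0          # only one potential outcome is observed
        units.append((e, t, y_obs, y1 - y0))
    return units

def ipw_ate(units):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    total = 0.0
    for e, t, y, _ in units:
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / len(units)

def true_ate(units):
    """Sample ATE, computable here only because both outcomes were simulated."""
    return sum(u[3] for u in units) / len(units)

units = simulate_units()
print("true ATE:", true_ate(units))
print("IPW ATE: ", ipw_ate(units))
```

In real observational data the propensity score is not known and has to be estimated, which is one place where the machine learning enhanced methods mentioned above come in.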