Effective detection of organizations is essential for fighting crime and maintaining public safety, especially considering the limited human resources and tools to deal with each group that exhibits co-movement patterns. This paper focuses on solving the Network Structure Inference (NSI) challenge. Thus, we introduce two new approaches to detect network structure inferences based on agent trajectories. The first approach is based on the evaluation of graph entropy, while the second considers the quality of clustering indices. To evaluate the effectiveness of the new approaches, we conducted experiments using four scenario simulations based on the animal kingdom, available on the NetLogo platform: Ants, Wolf Sheep Predation, Flocking, and Ant Adaptation. Furthermore, we compare the results obtained with those of an approach previously proposed in the literature, applying all methods to simulations of the NetLogo platform. The results demonstrate that our new detection approaches can more clearly identify the inferences of organizations or networks in the simulated scenarios.
Telling lies and faking emotions is quite common in human-human interactions: though there are risks, in many situations such behaviours provide social benefits. In recent years, there have been many social robots and chatbots that fake emotions or behave deceptively with their users. In this paper, I present a few examples of such robots and chatbots, and analyze their ethical aspects. Three scenarios are presented where some kind of lying or deceptive behaviour might be justified. Then five approaches to deceptive behaviours - no deception, blatant deception, tactful deception, nudging, and self deception - are discussed and their implications are analyzed. I conclude by arguing that we need to develop localized and culture-specific solutions to incorporating deception in social robots and chatbots.
Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest negative samples help models understand what is wrong and improve their inference generations.
Deep networks typically learn concepts via classifiers, which involves setting up a model and training it via gradient descent to fit the concept-labeled data. We will argue instead that learning a concept could be done by looking at its moment statistics matrix to generate a concrete representation or signature of that concept. These signatures can be used to discover structure across the set of concepts and could recursively produce higher-level concepts by learning this structure from those signatures. When the concepts are `intersected', signatures of the concepts can be used to find a common theme across a number of related `intersected' concepts. This process could be used to keep a dictionary of concepts so that inputs could correctly identify and be routed to the set of concepts involved in the (latent) generation of the input.
Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. An important example includes variants of stochastic gradient descent. These algorithms often take one sample point at a time and instantly update the parameter estimate of interest. In this work we consider model selection and hyperparameter tuning for such online algorithms. We propose a weighted rolling-validation procedure, an online variant of leave-one-out cross-validation, that costs minimal extra computation for many typical stochastic gradient descent estimators. Similar to batch cross-validation, it can boost base estimators to achieve a better, adaptive convergence rate. Our theoretical analysis is straightforward, relying mainly on some general statistical stability assumptions. The simulation study underscores the significance of diverging weights in rolling validation in practice and demonstrates its sensitivity even when there is only a slim difference between candidate estimators.
In recent years, DL has developed rapidly, and personalized services are exploring using DL algorithms to improve the performance of the recommendation system. For personalized services, a successful recommendation consists of two parts: attracting users to click the item and users being willing to consume the item. If both tasks need to be predicted at the same time, traditional recommendation systems generally train two independent models. This approach is cumbersome and does not effectively model the relationship between the two subtasks of "click-consumption". Therefore, in order to improve the success rate of recommendation and reduce computational costs, researchers are trying to model multi-task learning. At present, existing multi-task learning models generally adopt hard parameter sharing or soft parameter sharing architecture, but these two architectures each have certain problems. Therefore, in this work, we propose a novel recommendation model based on real recommendation scenarios, Deep Cross network based on RNN for partial parameter sharing (DCRNN). The model has three innovations: 1) It adopts the idea of cross network and uses RNN network to cross-process the features, thereby effectively improves the expressive ability of the model; 2) It innovatively proposes the structure of partial parameter sharing; 3) It can effectively capture the potential correlation between different tasks to optimize the efficiency and methods for learning different tasks.
Today, the security of many domains rely on the use of Machine Learning to detect threats, identify vulnerabilities, and safeguard systems from attacks. Recently, transformer architectures have improved the state-of-the-art performance on a wide range of tasks such as malware detection and network intrusion detection. But, before abandoning current approaches to transformers, it is crucial to understand their properties and implications on cybersecurity applications. In this paper, we evaluate the robustness of transformers to adversarial samples for system defenders (i.e., resiliency to adversarial perturbations generated on different types of architectures) and their adversarial strength for system attackers (i.e., transferability of adversarial samples generated by transformers to other target models). To that effect, we first fine-tune a set of pre-trained transformer, Convolutional Neural Network (CNN), and hybrid (an ensemble of transformer and CNN) models to solve different downstream image-based tasks. Then, we use an attack algorithm to craft 19,367 adversarial examples on each model for each task. The transferability of these adversarial examples is measured by evaluating each set on other models to determine which models offer more adversarial strength, and consequently, more robustness against these attacks. We find that the adversarial examples crafted on transformers offer the highest transferability rate (i.e., 25.7% higher than the average) onto other models. Similarly, adversarial examples crafted on other models have the lowest rate of transferability (i.e., 56.7% lower than the average) onto transformers. Our work emphasizes the importance of studying transformer architectures for attacking and defending models in security domains, and suggests using them as the primary architecture in transfer attack settings.
Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of such LLM simulations. Moreover, there is growing concern that these LLM simulations are flattened caricatures of the personas that they aim to simulate, failing to capture the multidimensionality of people and perpetuating stereotypes. To bridge these gaps, we present CoMPosT, a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration. We evaluate the level of caricature in scenarios from existing work on LLM simulations. We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
We explore the application of a new theory of Semantic Information to the well-motivated problem of a resource foraging agent. Semantic information is defined as the subset of correlations, measured via the transfer entropy, between agent $A$ and environment $E$ that is necessary for the agent to maintain its viability $V$. Viability, in turn, is endogenously defined as opposed to the use of exogenous quantities like utility functions. In our model, the forager's movements are determined by its ability to measure, via a sensor, the presence of an individual unit of resource, while the viability function is its expected lifetime. Through counterfactual interventions -- scrambling the correlations between agent and environment via noising the sensor -- we demonstrate the presence of a critical value of the noise parameter, $\eta_c$, above which the forager's expected lifetime is dramatically reduced. On the other hand, for $\eta < \eta_c$ there is little-to-no effect on its ability to survive. We refer to this boundary as the semantic threshold, quantifying the subset of agent-environment correlations that the agent actually needs to maintain its desired state of staying alive. Each bit of information affects the agent's ability to persist both above and below the semantic threshold. Modeling the viability curve and its semantic threshold via forager/environment parameters, we show how the correlations are instantiated. Our work provides a useful model for studies of established agents in terms of semantic information. It also shows that such semantic thresholds may prove useful for understanding the role information plays in allowing systems to become autonomous agents.
In general insurance, claims are often lower-truncated and right-censored because insurance contracts may involve deductibles and maximal covers. Most classical statistical models are not (directly) suited to model lower-truncated and right-censored claims. A surprisingly flexible family of distributions that can cope with lower-truncated and right-censored claims is the class of MBBEFD distributions that originally has been introduced by Bernegger (1997) for reinsurance pricing, but which has not gained much attention outside the reinsurance literature. We derive properties of the class of MBBEFD distributions, and we extend it to a bigger family of distribution functions suitable for modeling lower-truncated and right-censored claims. Interestingly, in general insurance, we mainly rely on unimodal skewed densities, whereas the reinsurance literature typically proposes monotonically decreasing densities within the MBBEFD class.
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.