Mathematics is a limited component of solutions to real-world problems, as it expresses only what is expected to be true if all our assumptions are correct, including implicit assumptions that are omnipresent and often incorrect. Statistical methods are rife with implicit assumptions whose violation can be life-threatening when results from them are used to set policy. Among them are that there is human equipoise or unbiasedness in data generation, management, analysis, and reporting. These assumptions correspond to levels of cooperation, competence, neutrality, and integrity that are absent more often than we would like to believe. Given this harsh reality, we should ask what meaning, if any, we can assign to the P-values, 'statistical significance' declarations, 'confidence' intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data. By themselves, P-values and CI do not test any hypothesis, nor do they measure the significance of results or the confidence we should have in them. The sense otherwise is an ongoing cultural error perpetuated by large segments of the statistical and research community via misleading terminology. So-called 'inferential' statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations.
Computational models can advance affective science by shedding light onto the interplay between cognition and emotion from an information processing point of view. We propose a computational model of emotion that integrates reinforcement learning (RL) and appraisal theory, establishing a formal relationship between reward processing, goal-directed task learning, cognitive appraisal and emotional experiences. The model achieves this by formalizing evaluative checks from the component process model (CPM) in terms of temporal difference learning updates. We formalized novelty, goal relevance, goal conduciveness, and power. The formalization is task independent and can be applied to any task that can be represented as a Markov decision problem (MDP) and solved using RL. We investigated to what extent CPM-RL enables simulation of emotional responses cased by interactive task events. We evaluate the model by predicting a range of human emotions based on a series of vignette studies, highlighting its potential in improving our understanding of the role of reward processing in affective experiences.
A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C EnDe, EnEs, and EnFr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.
When perceiving the world from multiple viewpoints, humans have the ability to reason about the complete objects in a compositional manner even when an object is completely occluded from certain viewpoints. Meanwhile, humans are able to imagine novel views after observing multiple viewpoints. Recent remarkable advances in multi-view object-centric learning still leaves some unresolved problems: 1) The shapes of partially or completely occluded objects can not be well reconstructed. 2) The novel viewpoint prediction depends on expensive viewpoint annotations rather than implicit rules in view representations. In this paper, we introduce a time-conditioned generative model for videos. To reconstruct the complete shape of an object accurately, we enhance the disentanglement between the latent representations of objects and views, where the latent representations of time-conditioned views are jointly inferred with a Transformer and then are input to a sequential extension of Slot Attention to learn object-centric representations. In addition, Gaussian processes are employed as priors of view latent variables for video generation and novel-view prediction without viewpoint annotations. Experiments on multiple datasets demonstrate that the proposed model can make object-centric video decomposition, reconstruct the complete shapes of occluded objects, and make novel-view predictions.
In this paper, we study a spline collocation method for a numerical solution to the optimal transport problem We mainly solve the \MAE with the second boundary condition numerically by proposing a center matching algorithm. We prove a pointwise convergence of our iterative algorithm under the assumption the boundedness of spline iterates. We use the \MAE with Dirichlet boundary condition and some known solutions to the \MAE with second boundary condition to demonstrate the effectiveness of our algorithm. Then we use our method to solve some real-life problems. One application problem is to use the optimal transportation for the conversion of fisheye view images into standard rectangular images.
To achieve a highly agile and flexible production, it is envisioned that industrial production systems gradually become more decentralized, interconnected, and intelligent. Within this vision, production assets collaborate with each other, exhibiting a high degree of autonomy. Furthermore, knowledge about individual production assets is readily available throughout their entire life-cycles. To realize this vision, adequate use of information technology is required. Two commonly applied software paradigms in this context are Software Agents (referred to as Agents) and Digital Twins (DTs). This work presents a systematic comparison of Agents and DTs in industrial applications. The goal of the study is to determine the differences, similarities, and potential synergies between the two paradigms. The comparison is based on the purposes for which Agents and DTs are applied, the properties and capabilities exhibited by these software paradigms, and how they can be allocated within the Reference Architecture Model Industry 4.0. The comparison reveals that Agents are commonly employed in the collaborative planning and execution of production processes, while DTs typically play a more passive role in monitoring production resources and processing information. Although these observations imply characteristic sets of capabilities and properties for both Agents and DTs, a clear and definitive distinction between the two paradigms cannot be made. Instead, the analysis indicates that production assets utilizing a combination of Agents and DTs would demonstrate high degrees of intelligence, autonomy, sociability, and fidelity. To achieve this, further standardization is required, particularly in the field of DTs.
Ambiguous questions persist in open-domain question answering, because formulating a precise question with a unique answer is often challenging. Previously, Min et al. (2020) have tackled this issue by generating disambiguated questions for all possible interpretations of the ambiguous question. This can be effective, but not ideal for providing an answer to the user. Instead, we propose to ask a clarification question, where the user's response will help identify the interpretation that best aligns with the user's intention. We first present CAMBIGNQ, a dataset consisting of 5,654 ambiguous questions, each with relevant passages, possible answers, and a clarification question. The clarification questions were efficiently created by generating them using InstructGPT and manually revising them as necessary. We then define a pipeline of tasks and design appropriate evaluation metrics. Lastly, we achieve 61.3 F1 on ambiguity detection and 40.5 F1 on clarification-based QA, providing strong baselines for future work.
We study fair multi-objective reinforcement learning in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. Motivated by the fair resource allocation literature, we model this as an expected welfare maximization problem, for some non-linear fair welfare function of the vector of long-term cumulative rewards. One canonical example of such a function is the Nash Social Welfare, or geometric mean, the log transform of which is also known as the Proportional Fairness objective. We show that even approximately optimal optimization of the expected Nash Social Welfare is computationally intractable even in the tabular case. Nevertheless, we provide a novel adaptation of Q-learning that combines non-linear scalarized learning updates and non-stationary action selection to learn effective policies for optimizing nonlinear welfare functions. We show that our algorithm is provably convergent, and we demonstrate experimentally that our approach outperforms techniques based on linear scalarization, mixtures of optimal linear scalarizations, or stationary action selection for the Nash Social Welfare Objective.
Australia is a leading AI nation with strong allies and partnerships. Australia has prioritised robotics, AI, and autonomous systems to develop sovereign capability for the military. Australia commits to Article 36 reviews of all new means and methods of warfare to ensure weapons and weapons systems are operated within acceptable systems of control. Additionally, Australia has undergone significant reviews of the risks of AI to human rights and within intelligence organisations and has committed to producing ethics guidelines and frameworks in Security and Defence. Australia is committed to OECD's values-based principles for the responsible stewardship of trustworthy AI as well as adopting a set of National AI ethics principles. While Australia has not adopted an AI governance framework specifically for Defence; Defence Science has published 'A Method for Ethical AI in Defence' (MEAID) technical report which includes a framework and pragmatic tools for managing ethical and legal risks for military applications of AI.
The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this paper, we present a comprehensive survey of the existing research in the area of MMS.
Verifiability is one of the core editing principles in Wikipedia, where editors are encouraged to provide citations for the added statements. Statements can be any arbitrary piece of text, ranging from a sentence up to a paragraph. However, in many cases, citations are either outdated, missing, or link to non-existing references (e.g. dead URL, moved content etc.). In total, 20\% of the cases such citations refer to news articles and represent the second most cited source. Even in cases where citations are provided, there are no explicit indicators for the span of a citation for a given piece of text. In addition to issues related with the verifiability principle, many Wikipedia entity pages are incomplete, with relevant information that is already available in online news sources missing. Even for the already existing citations, there is often a delay between the news publication time and the reference time. In this thesis, we address the aforementioned issues and propose automated approaches that enforce the verifiability principle in Wikipedia, and suggest relevant and missing news references for further enriching Wikipedia entity pages.