There has been debate about whether the hazard function should be used for causal inference in time-to-event studies. The main criticism is that there is selection bias because the risk sets beyond the first event time consist of subsets of survivors who are no longer balanced with respect to the risk factors, even in the absence of unmeasured confounding, measurement error, and model misspecification. In this short communication we use the potential outcomes framework and the single-world intervention graph to show that there is in fact no selection bias when estimating the average treatment effect, and that the hazard ratio over time can provide a useful interpretation in practical settings.
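To fix notation, here is a minimal potential-outcomes sketch of the quantities at issue (the symbols $T^a$, $h^a(t)$, and $\mathrm{HR}(t)$ are introduced here for illustration and are not quoted from the paper): writing $T^a$ for the potential event time under treatment level $a \in \{0,1\}$, the counterfactual hazard and the hazard ratio are
\[
h^{a}(t) \;=\; \lim_{\Delta t \to 0} \frac{\Pr\left(t \le T^{a} < t + \Delta t \,\middle|\, T^{a} \ge t\right)}{\Delta t},
\qquad
\mathrm{HR}(t) \;=\; \frac{h^{1}(t)}{h^{0}(t)}.
\]
The criticism above amounts to saying that conditioning on $T^{a} \ge t$ selects survivors; the abstract's claim is that this contrast nevertheless supports unbiased estimation of the average treatment effect.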
Software debloating techniques craft a specialized version of a program based on the user's requirements and remove irrelevant code accordingly. Debloated programs are expected to offer better performance and a reduced attack surface compared with the original programs. This work examines the effect of software debloating techniques on the robustness of machine learning systems in the malware classification domain. We empirically study how an adversary can leverage software debloating techniques to mislead machine learning malware classification models. We apply software debloating techniques to generate adversarial examples and demonstrate that these adversarial examples can reduce the detection rate of VirusTotal. Our study opens new directions for research into adversarial machine learning not only in malware detection/classification but also in other software domains.
Lawvere showed that generalised metric spaces are categories enriched over $[0, \infty]$, the quantale of the positive extended reals. Enrichment over this quantale is a quantitative analogue of being a preorder. Towards a logic for quantitative metric reasoning, we investigate three $[0,\infty]$-valued propositional logics over the Lawvere quantale. The basic logical connectives shared by all three logics are those that can be interpreted in any quantale, viz. finite conjunctions and disjunctions, tensor (addition for the Lawvere quantale), and linear implication (here a truncated subtraction); to these we add, in turn, the constant $1$ to express integer values, and scalar multiplication by a non-negative real to express general affine combinations. Quantitative equational logic can be interpreted in the third logic if we allow inference systems instead of axiomatic systems. For each of these logics we develop a natural deduction system which we prove to be decidably complete with respect to the quantale-valued semantics. The heart of the completeness proof makes use of the Motzkin transposition theorem. Consistency is also decidable; the proof makes use of Fourier-Motzkin elimination of linear inequalities. Strong completeness does not hold in general, even (as is known) for theories over finitely many propositional variables; indeed, even an approximate form of strong completeness in the sense of Pavelka or Ben Yaacov -- provability up to arbitrary precision -- does not hold. However, we can show it for theories axiomatized by a (not necessarily finite) set of judgements in normal form over a finite set of propositional variables when we restrict to models that do not map variables to $\infty$; the proof uses Hurwicz's general form of Farkas' Lemma.
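As a concrete illustration of these connectives (under the usual convention that the Lawvere quantale is $[0,\infty]$ with the order reversed, so that $0$ plays the role of truth; this reading is standard but not quoted from the paper), the valuations satisfy
\[
\llbracket \varphi \otimes \psi \rrbracket = \llbracket \varphi \rrbracket + \llbracket \psi \rrbracket,
\qquad
\llbracket \varphi \multimap \psi \rrbracket = \max\bigl(\llbracket \psi \rrbracket - \llbracket \varphi \rrbracket,\, 0\bigr),
\]
\[
\llbracket \varphi \wedge \psi \rrbracket = \max\bigl(\llbracket \varphi \rrbracket, \llbracket \psi \rrbracket\bigr),
\qquad
\llbracket \varphi \vee \psi \rrbracket = \min\bigl(\llbracket \varphi \rrbracket, \llbracket \psi \rrbracket\bigr),
\]
where conjunction and disjunction are meet and join in the reversed order.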
Clones of operations of arity $\omega$ (referred to as $\omega$-operations) were employed by Neumann to represent varieties of infinitary algebras defined by operations of arity at most $\omega$. More recently, clone algebras have been introduced to study clones of functions, including $\omega$-operations, within the framework of one-sorted universal algebra. Additionally, polymorphisms of arity $\omega$, which are $\omega$-operations preserving the relations of a given first-order structure, have recently been used to establish model-theoretic results with applications to the complexity of constraint satisfaction problems (CSPs). In this paper, we undertake a topological and algebraic study of polymorphisms of arity $\omega$ and their corresponding invariant relations. Given a Boolean ideal $X$ on the set $A^\omega$, we propose a method to endow the set of $\omega$-operations on $A$ with a topology, which we refer to as the $X$-topology. Notably, the topology of pointwise convergence can be retrieved as a special case of this approach. Polymorphisms and invariant relations are then defined parametrically, with respect to the $X$-topology. We characterise the $X$-closed clones of $\omega$-operations in terms of $Pol^\omega$-$Inv^\omega$ and present a method to relate $Inv^\omega$-$Pol^\omega$ to the classical (finitary) $Inv$-$Pol$.
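For orientation, the underlying notion of preservation can be sketched as follows (standard notation, written out here rather than taken from the paper): an $\omega$-operation $f : A^\omega \to A$ is a polymorphism of a $k$-ary relation $R \subseteq A^k$ when applying $f$ coordinatewise to any $\omega$-indexed family of tuples of $R$ again yields a tuple of $R$:
\[
(r_{i,1}, \dots, r_{i,k}) \in R \ \text{for all } i < \omega
\;\Longrightarrow\;
\bigl( f\bigl((r_{i,1})_{i<\omega}\bigr), \dots, f\bigl((r_{i,k})_{i<\omega}\bigr) \bigr) \in R.
\]
The parametric, $X$-topological notion studied in the paper refines this classical one.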
Functional encryption is a powerful paradigm for public-key encryption which allows for controlled access to encrypted data. This primitive is generally impossible in the standard setting, so we investigate possibilities in the bounded quantum storage model (BQSM) and the bounded classical storage model (BCSM). In these models, ciphertexts may effectively disappear, which nullifies the impossibility results and allows us to obtain positive outcomes. Firstly, in the BQSM, we construct information-theoretically secure functional encryption with $\texttt{q}=O(\sqrt{\texttt{s}/\texttt{r}})$, where $\texttt{r}$ can be set to any value less than $\texttt{s}$. Here $\texttt{r}$ denotes the number of times the adversary is restricted to $\texttt{s}$ qubits of quantum memory during the protocol, and $\texttt{q}$ denotes the quantum memory required to run the protocol honestly. We then show that our scheme is optimal by proving that it is impossible to attain information-theoretically secure functional encryption with $\texttt{q} < \sqrt{\texttt{s}/\texttt{r}}$. However, by assuming the existence of post-quantum one-way functions, we can do far better and achieve functional encryption with classical keys and with $\texttt{q}=0$ and $\texttt{r}=1$. Secondly, in the BCSM, we construct $(O(\texttt{n}),\texttt{n}^2)$ functional encryption assuming the existence of $(\texttt{n},\texttt{n}^2)$ virtual weak grey-box obfuscation. Here, the pair $(\texttt{n},\texttt{n}^2)$ indicates the memory required to run the scheme honestly and the memory needed to break its security, respectively. This memory gap is optimal and the assumption is minimal. In particular, we also construct $(O(\texttt{n}),\texttt{n}^2)$ virtual weak grey-box obfuscation assuming $(\texttt{n},\texttt{n}^2)$ functional encryption.
In matched observational studies, causal conclusions drawn as if matching had taken all confounding into account can be sensitive to unmeasured confounding. In such cases, a sensitivity analysis is often conducted, which investigates whether the observed association between treatment and outcome is due to effects caused by the treatment or to hidden confounding. In general, a sensitivity analysis tries to infer the minimum amount of hidden bias needed to explain away the observed association between treatment and outcome, assuming that the treatment has no effect. If the needed bias is large, then the treatment is likely to have significant effects. The Rosenbaum sensitivity analysis is a modern approach to sensitivity analysis for matched observational studies. It investigates how large the maximum of the hidden biases across all matched sets must be in order to explain away the observed association, assuming that the treatment has no effect. However, such a sensitivity analysis can be overly conservative and pessimistic, especially when the investigators believe that some matched sets may have exceptionally large hidden biases. In this paper, we generalize Rosenbaum's framework to conduct sensitivity analysis on quantiles of hidden biases across all matched sets, which are more robust than the maximum. Moreover, we demonstrate that the proposed sensitivity analysis on all quantiles of hidden biases is simultaneously valid and is thus a free lunch added to the conventional sensitivity analysis. The proposed approach works for general outcomes, general matched studies, and general test statistics. Finally, we demonstrate that the proposed sensitivity analysis also works for bounded null hypotheses as long as the test statistic satisfies certain properties. An R package implementing the proposed method is available online.
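For readers unfamiliar with the framework: in Rosenbaum's standard sensitivity model, two units $i$ and $j$ in the same matched set, with identical observed covariates, may differ in their treatment-assignment probabilities $\pi_i$ and $\pi_j$ only up to a factor $\Gamma \ge 1$ on the odds scale,
\[
\frac{1}{\Gamma} \;\le\; \frac{\pi_i \,(1 - \pi_j)}{\pi_j \,(1 - \pi_i)} \;\le\; \Gamma .
\]
The conventional analysis asks how large this single $\Gamma$, bounding the worst matched set, must be to explain away the association; the quantile-based generalization described above instead constrains quantiles of the set-specific biases.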
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and are being integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
Pre-trained Language Models (PLMs), which are trained on large text corpora via self-supervised learning, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge numbers of parameters can effectively capture rich knowledge from massive training text and benefit downstream tasks at the fine-tuning stage, they still have limitations, such as poor reasoning ability, due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG), respectively, to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions for KE-PLMs.
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding which features are learned, why deep architectures perform exceptionally well on physical problems, and which fine aspects of an architecture affect the behavior of a learning task, and in what way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
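To make the idea concrete: for a $K$-way softmax classifier, OE augments the usual cross-entropy on in-distribution data with a term that pushes the model's predictive distribution on auxiliary outliers toward the uniform distribution. Below is a minimal PyTorch sketch of this objective; the function name, the weight `lam`, and the assumption of a classification task are ours, and the exact auxiliary term in the paper can vary with the task.

```python
import torch
import torch.nn.functional as F

def outlier_exposure_loss(model, x_in, y_in, x_out, lam=0.5):
    """Supervised loss on in-distribution data plus a uniformity
    term on auxiliary outliers (cross-entropy from the model's
    softmax output to the uniform distribution over K classes)."""
    logits_in = model(x_in)    # in-distribution batch with labels
    logits_out = model(x_out)  # auxiliary outlier batch, no labels
    ce = F.cross_entropy(logits_in, y_in)
    # CE(uniform, p) = -(1/K) * sum_k log p_k: the negative
    # log-softmax averaged over classes, then over the batch.
    uniform_ce = -logits_out.log_softmax(dim=1).mean(dim=1).mean()
    return ce + lam * uniform_ce
```

At test time, an anomaly score such as the negative maximum softmax probability can then be used; OE training tends to make this score more separable between in-distribution inputs and unseen anomalies.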