We develop a denotational semantics of muLL, a version of propositional Linear Logic with least and greatest fixed points that extends David Baelde's propositional muMALL with exponentials. Our general categorical setting is based on Seely categories and on strong functors acting on them. We exhibit two simple instances of this setting. In the first one, which is based on the category of sets and relations, least and greatest fixed points are interpreted in the same way. In the second one, based on a category of sets equipped with a notion of totality (non-uniform totality spaces) and relations preserving them, least and greatest fixed points have distinct interpretations. This latter model shows that muLL enjoys a denotational form of normalization of proofs.
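For orientation, the simplest way least fixed points arise in a relational setting is by Kleene iteration of a continuous operator $\Phi$ on webs; this is a textbook construction, not necessarily the paper's strong-functor definition, and, as stated above, in the relational model the greatest fixed point receives the same interpretation:

\[
  \mu\Phi \;=\; \bigcup_{n \in \mathbb{N}} \Phi^{n}(\emptyset).
\]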
This paper is a review of a particular approach to the method of maximum entropy as a general framework for inference. The discussion emphasizes the pragmatic elements in the derivation. An epistemic notion of information is defined in terms of its relation to the Bayesian beliefs of ideally rational agents. The method of updating from a prior to a posterior probability distribution is designed through an eliminative induction process. The logarithmic relative entropy is singled out as the unique tool for updating that (a) is of universal applicability, (b) recognizes the value of prior information, and (c) recognizes the privileged role played by the notion of independence in science. The resulting framework -- the ME method -- can handle arbitrary priors and arbitrary constraints. It includes MaxEnt and Bayes' rule as special cases and therefore unifies entropic and Bayesian methods into a single general inference scheme. The ME method not only selects a single posterior but also addresses the question of how much less probable other distributions are, which provides a direct bridge to the theories of fluctuations and large deviations.
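For reference, the logarithmic relative entropy that the derivation singles out has the standard form below; the ME posterior $p$ is obtained by maximizing $S[p,q]$ relative to the prior $q$, subject to the constraints that encode the new information (the notation here is the conventional one, not necessarily the paper's):

\[
  S[p,q] \;=\; -\int dx\, p(x)\,\log \frac{p(x)}{q(x)}.
\]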
Alignments provide sophisticated diagnostics that pinpoint deviations in a trace with respect to a process model, along with their severity. However, approaches based on trace alignments use crisp process models as a reference, while recent probabilistic conformance checking approaches check the degree of conformance of an event log with respect to a stochastic process model instead of finding trace alignments. In this paper we provide, for the first time, a conformance checking approach based on trace alignments that uses stochastic Workflow nets. Conceptually, this requires handling two possibly contrasting forces: the cost of the alignment on the one hand, and the likelihood of the model trace with respect to which the alignment is computed on the other.
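The following minimal Python sketch illustrates these two contrasting forces; it is not the paper's algorithm. It assumes the candidate model traces and their probabilities have already been enumerated from a hypothetical stochastic Workflow net, uses plain edit distance as a stand-in for alignment cost, and trades cost against likelihood via a weight `alpha`.

```python
import math

def edit_distance(trace, model_trace):
    """Levenshtein distance as a simple stand-in for alignment cost."""
    n, m = len(trace), len(model_trace)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if trace[i - 1] == model_trace[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # log move
                          d[i][j - 1] + 1,        # model move
                          d[i - 1][j - 1] + sub)  # (mis)match
    return d[n][m]

def best_stochastic_alignment(trace, candidates, alpha=1.0):
    """Pick the model trace minimizing cost - alpha * log(probability)."""
    return min(candidates,
               key=lambda c: edit_distance(trace, c[0]) - alpha * math.log(c[1]))

# Hypothetical model traces of a stochastic net with their probabilities.
candidates = [(("a", "b", "c"), 0.7), (("a", "c"), 0.3)]
print(best_stochastic_alignment(("a", "x", "c"), candidates))
```

With equal alignment costs, the weight `alpha` makes the more likely model trace win, which is exactly the tension the paper has to resolve.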
We consider the problem of allocating $m$ balls into $n$ bins with incomplete information. In the classical two-choice process, a ball first queries the load of $\textit{two}$ randomly chosen bins and is then placed in the least loaded one. In our setting, each ball also samples two random bins but can only estimate a bin's load by sending $\textit{binary queries}$ of the form "Is the load at least the median?" or "Is the load at least $100$?". For the lightly loaded case $m=O(n)$, one can achieve an $O(\sqrt{\log n/\log \log n})$ maximum load with one query per chosen bin using an oblivious strategy, as shown by Feldheim and Gurel-Gurevich (2018). For the case $m=\Omega(n)$, the authors conjectured that the same strategy achieves a maximum load of $m/n+O(\sqrt{\log n/\log \log n})$. In this work, we disprove this conjecture by showing a lower bound of $m/n+\Omega( \sqrt{\log n})$ for a fixed $m=\Theta(n \sqrt{\log n})$, and a lower bound of $m/n+\Omega(\log n/\log\log n)$ for some $m$ depending on the strategy used. Surprisingly, these lower bounds hold even for $\textit{adaptive strategies}$ with one query, i.e., when queries may depend on the full history of the process. We complement this negative result by proving a positive result for multiple queries. In particular, we show that with only two binary queries per chosen bin, there is an oblivious strategy which ensures a maximum load of $m/n+O(\sqrt{\log n})$ whp for any $m \geq 1$. For any $k=O(\log \log n)$ binary queries, the upper bound on the maximum load improves to $m/n+O(k(\log n)^{1/k})$ whp for any $m \geq 1$. Hence for $k=\Theta(\log\log n)$, we recover the two-choice result up to a constant multiplicative factor, including in the heavily loaded case $m=\Omega(n)$. One novel aspect of our proof techniques is the use of multiple super-exponential potential functions, which may be of use in future work.
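As a toy illustration of the query model (a simple adaptive threshold scheme written for clarity, not one of the strategies analyzed in the paper), the following Python simulation allocates balls with two choices and $k$ binary queries per chosen bin, where each query compares a bin's load to a quantile threshold of the current load vector:

```python
import random

def allocate(n, m, k):
    """Two-choice allocation where each bin is ranked via k binary queries."""
    loads = [0] * n
    for _ in range(m):
        i, j = random.randrange(n), random.randrange(n)
        sorted_loads = sorted(loads)

        def quantile_rank(b):
            # k queries of the form "Is the load of bin b at least t?",
            # with thresholds chosen by binary search over load quantiles.
            lo, hi = 0, n
            for _ in range(k):
                if lo >= hi:
                    break
                mid = (lo + hi) // 2
                if loads[b] >= sorted_loads[mid]:
                    lo = mid + 1
                else:
                    hi = mid
            return lo

        target = i if quantile_rank(i) <= quantile_rank(j) else j
        loads[target] += 1
    return max(loads) - m / n  # gap above the average load m/n

print(allocate(n=1000, m=5000, k=2))
```

Running the simulation for growing $k$ shows the gap shrinking toward the two-choice behavior, in line with the $m/n+O(k(\log n)^{1/k})$ bound.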
The paper studies a probabilistic notion of causes in Markov chains that relies on the counterfactuality principle and the probability-raising property. This notion is motivated by the use of causes for monitoring purposes where the aim is to detect faulty or undesired behaviours before they actually occur. A cause is a set of finite executions of the system after which the probability of the effect exceeds a given threshold. We introduce multiple types of costs that capture the consumption of resources from different perspectives, and study the complexity of computing cost-minimal causes.
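As a simplified illustration of the probability-raising ingredient (state-level rather than over sets of finite executions, which is what the paper's causes actually are), the sketch below computes the probability of eventually reaching an effect state from each transient state of a hypothetical Markov chain and flags states where it exceeds a threshold:

```python
import numpy as np

# Hypothetical 5-state chain: 0 = initial state; 3 = effect and 4 = safe,
# both absorbing.
P = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
effect, transient = 3, [0, 1, 2]

# Reachability probabilities x for the transient states solve the
# linear system x = Q x + r, i.e. x = (I - Q)^(-1) r.
Q = P[np.ix_(transient, transient)]
r = P[transient, effect]
x = np.linalg.solve(np.eye(len(transient)) - Q, r)

threshold = 0.6  # probability-raising threshold
for s, prob in zip(transient, x):
    if prob > threshold:
        print(f"after reaching state {s}: P(effect) = {prob:.2f} > {threshold}")
```

Here only state 1 raises the probability of the effect above the threshold, making executions ending there candidate causes; the cost measures studied in the paper would then rank such candidates.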
We introduce infinitary action logic with exponentiation -- that is, the multiplicative-additive Lambek calculus extended with Kleene star and with a family of subexponential modalities, each of which allows some of the structural rules (contraction, weakening, permutation). The logic is presented in the form of an infinitary sequent calculus. We prove cut elimination and, in the case where at least one subexponential allows non-local contraction, establish exact complexity boundaries in two senses. First, we show that the derivability problem for this logic is $\Pi_1^1$-complete. Second, we show that the closure ordinal of its derivability operator is $\omega_1^{\mathrm{CK}}$. In the case where no subexponential allows contraction, we show that the complexity is the same as for infinitary action logic itself. Namely, the derivability problem in this case is $\Pi^0_1$-complete and the closure ordinal is not greater than $\omega^\omega$.
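For orientation, the characteristic infinitary rule is the $\omega$-rule for iteration on the left, familiar from infinitary action logic (shown here in conventional notation, which may differ from the paper's):

\[
  \frac{\Gamma, A^{n}, \Delta \vdash C \quad \text{for every } n \geq 0}
       {\Gamma, A^{*}, \Delta \vdash C}
\]

It is this rule with infinitely many premises that makes the derivability operator genuinely infinitary, so that its closure ordinal becomes a meaningful measure of complexity.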
The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances has led to the formulation of the principle of dependency distance minimization, whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely when all shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences in natural languages often satisfy. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of $Zn$, where $n$ is the number of words of the sentence and $Z$ is the number of samples; the larger $Z$, the lower the error of the estimation, but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of $n$. Furthermore, we show that star trees maximize it, and we devise a dynamic programming algorithm to retrieve the trees that minimize it.
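The $O(Zn)$ Monte Carlo baseline described above can be sketched as follows (a hedged illustration; the function names and the tree encoding are ours): a uniformly random projective linearization is obtained by recursively shuffling, at every head, the block containing the head together with the blocks of its dependents' subtrees.

```python
import random

def random_projective_positions(children, root):
    """Uniformly random projective linearization of a dependency tree."""
    def linearize(v):
        blocks = [[v]] + [linearize(c) for c in children.get(v, [])]
        random.shuffle(blocks)  # permute the head block and subtree blocks
        return [w for block in blocks for w in block]
    return {word: pos for pos, word in enumerate(linearize(root))}

def expected_distance_sum(heads, root, samples=10_000):
    """Monte Carlo estimate of E[sum of dependency distances], cost O(Zn)."""
    children = {}
    for v, h in heads.items():
        children.setdefault(h, []).append(v)
    total = 0
    for _ in range(samples):
        pos = random_projective_positions(children, root)
        total += sum(abs(pos[v] - pos[h]) for v, h in heads.items())
    return total / samples

# Toy tree on 4 words: heads maps dependent -> head, word 0 is the root.
print(expected_distance_sum({1: 0, 2: 0, 3: 2}, root=0))
```

The paper's contribution replaces exactly this estimator with an error-free computation in time of the order of $n$.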
Statistical tests for dataset shift are susceptible to false alarms: they are sensitive to minor differences where there is in fact adequate sample coverage and predictive performance. We propose instead a robust framework for tests of dataset shift based on outlier scores, D-SOS for short. D-SOS detects adverse shifts and can identify false alarms caused by benign ones. It posits that a new (test) sample is not substantively worse than an old (training) sample, and not that the two are equal. The key idea is to reduce observations to outlier scores and compare contamination rates. Beyond comparing distributions, users can define what worse means in terms of predictive performance and other relevant notions. We show how versatile and practical D-SOS is for a wide range of real and simulated datasets. Unlike tests of equal distribution and of goodness-of-fit, the D-SOS tests are uniquely tailored to serve as robust performance metrics to monitor model drift and dataset shift.
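A minimal sketch in the spirit of the method (not the authors' exact test statistic): fit an outlier scorer on the training sample, score both samples, and compare contamination rates above a training-tail threshold. The dataset, the choice of IsolationForest as scorer, and the 95% threshold below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 5))
X_test = rng.normal(0.1, 1.2, size=(1000, 5))  # mildly shifted sample

scorer = IsolationForest(random_state=0).fit(X_train)
s_train = -scorer.score_samples(X_train)  # higher = more outlying
s_test = -scorer.score_samples(X_test)

threshold = np.quantile(s_train, 0.95)  # tail of the training scores
c_train, c_test = np.mean(s_train > threshold), np.mean(s_test > threshold)
print(f"contamination: train {c_train:.3f} vs. test {c_test:.3f}")
# An adverse shift shows up as c_test substantially above c_train; the
# one-sided comparison is what lets benign (non-worsening) shifts pass.
```

Swapping the outlier scorer for one built from predictive residuals is how "worse" can be redefined in terms of predictive performance.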
Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
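The cross-sectional setting at issue can be made concrete with a small sketch (hypothetical rates and sampling-time distribution, and fully independent items for simplicity): each patient is observed once at an unknown time, so only the accumulated set of items is recorded, with no time stamps or trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.5, 1.0, 2.0])  # hypothetical per-item occurrence rates

def cross_sectional_sample(n_patients):
    observations = []
    for _ in range(n_patients):
        t_obs = rng.exponential(1.0)            # unknown observation time
        t_items = rng.exponential(1.0 / rates)  # waiting time of each item
        observations.append(tuple(t_items <= t_obs))
    return observations

# Each row is just the set of present items; the times are discarded,
# which is why the learning task can be underspecified.
print(cross_sectional_sample(5))
```

Because the observation times are integrated out, different rate parameters can explain the same cross-sectional frequencies, which is the underspecification the additional independent items help resolve.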
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), which have arguably been the most prominent machine learning models for large-scale data over the past decade. We generalize the formulation of classification margins from classical research to the latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since different methods take discrepant viewpoints, we categorize them into groups for ease of comparison and discussion. We hope that our discussion and overview inspire new research in the community aimed at improving the performance of DNNs, and we also point to directions in which the large margin principle could be verified, so as to provide theoretical evidence for why certain regularizations for DNNs work well in practice. We have kept the paper concise so that the crucial ideas of large margin learning and related methods stand out.
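The classical output-space notion that such surveys start from is the classification margin, the gap between the score of the true class and the best competing class; a minimal version in NumPy:

```python
import numpy as np

def classification_margin(logits, label):
    """margin(x, y) = f_y(x) - max_{j != y} f_j(x); positive iff correct."""
    competitors = np.delete(logits, label)
    return logits[label] - competitors.max()

print(classification_margin(np.array([2.0, 0.5, 1.2]), label=0))  # 0.8
```

Generalizations to DNNs typically normalize this gap, for example by a Lipschitz constant or a local gradient norm, to approximate a margin in input space.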
The Normalized Cut (NCut) objective function, widely used in data clustering and image segmentation, quantifies the cost of graph partitioning in a way that favors balanced clusters or segments, assigning them lower costs than unbalanced partitionings. However, this bias is so strong that it avoids singleton partitions entirely, even when vertices are only very weakly connected to the rest of the graph. Motivated by the B\"uhler-Hein family of balanced cut costs, we propose the family of Compassionately Conservative Balanced (CCB) Cut costs, which are indexed by a parameter that can be used to strike a compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced. We show that CCB-Cut minimization can be relaxed into an orthogonally constrained $\ell_{\tau}$-minimization problem that coincides with the problem of computing Piecewise Flat Embeddings (PFE) for one particular index value, and we present an algorithm for solving the relaxed problem by iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ). Using images from the BSDS500 database, we show that image segmentation based on CCB-Cut minimization provides better accuracy with respect to ground truth and greater variability in region size than NCut-based image segmentation.
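For context, the standard Normalized Cut objective for a bipartition $(A,B)$ of a weighted graph with vertex set $V$ is the one below; the CCB-Cut family, following B\"uhler and Hein, varies how the balance in the denominators is measured:

\[
  \mathrm{NCut}(A,B) \;=\; \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}.
\]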