Angular Minkowski $p$-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski $p$-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski $p$-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter $p$, the dimensionality $m$ of the dataset, the number of neighbours $k$, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski $p$-distance with suitable values for $p$ than with classical cosine dissimilarity.
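For concreteness, the sketch below shows one plausible reading of the construction above: normalise each vector by its $p$-norm and take the Minkowski $p$-distance between the normalised vectors (for $p=2$ this is a monotone transform of cosine dissimilarity, since for unit vectors $\|x-y\|_2^2 = 2 - 2\cos(x,y)$). The function names are ours and the exact definition used in the paper may differ.

    import numpy as np

    def cosine_dissimilarity(x, y):
        """Classical cosine dissimilarity: 1 - cos(x, y)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def angular_minkowski_dissimilarity(x, y, p=2.0):
        """Illustrative sketch (not necessarily the paper's exact definition):
        Minkowski p-distance between p-norm-normalised vectors."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        xn = x / np.linalg.norm(x, ord=p)
        yn = y / np.linalg.norm(y, ord=p)
        return np.linalg.norm(xn - yn, ord=p)

    # Token-frequency-like vectors:
    x, y = [3, 0, 1, 2], [1, 1, 0, 4]
    print(cosine_dissimilarity(x, y), angular_minkowski_dissimilarity(x, y, p=1.5))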
We revisit the topic of power-free morphisms, focusing on the properties of the class of complementary binary morphisms. Such morphisms map the binary letters 0 and 1 to complementary words. We prove that every prefix of the famous Thue-Morse word $\mathbf{t}$ gives a complementary morphism that is $3^+$-free, and hence $\alpha$-free for every real $\alpha>3$. We also describe, by means of a 4-state binary finite automaton, the lengths of all prefixes of $\mathbf{t}$ that give cubefree complementary morphisms. Next we show that cubefree complementary morphisms of length $k$ exist for all $k\notin\{3,6\}$. Moreover, if $k$ is not representable as $3\cdot2^n$, then the images of letters can be chosen to be factors of $\mathbf{t}$. In addition to more traditional techniques of combinatorics on words, we also rely on the Walnut theorem-prover. Its use and limitations are discussed.
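To make the construction concrete, here is a small sketch under the assumption (our reading of the abstract) that a prefix $u$ of $\mathbf{t}$ gives the complementary morphism $0\mapsto u$, $1\mapsto \bar u$, where $\bar u$ is the letterwise complement of $u$.

    def thue_morse_prefix(n):
        """First n letters of the Thue-Morse word t = 0110100110010110..."""
        return [bin(i).count("1") % 2 for i in range(n)]

    def complementary_morphism(u):
        """Complementary binary morphism built from a word u over {0, 1}:
        0 is mapped to u, 1 to the letterwise complement of u."""
        return {0: list(u), 1: [1 - a for a in u]}

    def apply(morphism, word):
        """Apply a morphism to a word, letter by letter."""
        return [b for a in word for b in morphism[a]]

    mu = complementary_morphism(thue_morse_prefix(4))   # 0 -> 0110, 1 -> 1001
    print(mu)
    print(apply(mu, thue_morse_prefix(4)) == thue_morse_prefix(16))  # True: this particular morphism fixes t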
For a given set of points in a metric space and an integer $k$, we seek to partition the given points into $k$ clusters. For each computed cluster, one typically defines one point as the center of the cluster. A natural objective is to minimize the sum of the cluster radii, where each center is assigned the smallest radius $r$ such that every point in its cluster is at distance at most $r$ from the center. The best-known polynomial-time approximation ratio for this problem is $3.389$. In the setting with outliers, i.e., when we are given an integer $m$ and may leave up to $m$ points unassigned to any cluster, the best-known approximation factor is $12.365$. In this paper, we improve both approximation ratios to $3+\epsilon$. Our algorithms are primal-dual algorithms that use fundamentally new ideas to compute solutions and to guarantee the claimed approximation ratios. For example, we replace the classical binary search for the best value of a Lagrangian multiplier $\lambda$ with a primal-dual routine in which $\lambda$ is a variable that is raised. Also, we show that for each connected component arising from almost tight dual constraints, we can find a single cluster that covers all its points, and we bound its cost via a new primal-dual analysis. We remark that our approximation factor of $3+\epsilon$ is a natural limit for the known approaches in the literature. We then extend our results to the setting with lower bounds. Algorithms are known for the case where each point $i$ has a lower bound $L_{i}$, stating that at least $L_{i}$ clients must be assigned to $i$ if $i$ is a cluster center. For this setting, there is a $3.83$-approximation if outliers are not allowed and a $12.365$-approximation with outliers. We improve both ratios to $3.5+\epsilon$ and, at the same time, generalize the type of allowed lower bounds.
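For concreteness, the objective discussed above can be written as follows; the notation ($P$ for the point set, $P_c$ for the cluster of center $c$, $r_c$ for its radius) is introduced here only for illustration.

$$
\min_{\substack{C\subseteq P,\ |C|\le k \\ \{P_c\}_{c\in C}\ \text{a partition of }P}}\ \sum_{c\in C} r_c,
\qquad r_c=\max_{x\in P_c} d(x,c),
$$

and in the outlier variant the clusters $P_c$ are only required to cover all but at most $m$ points of $P$.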
Two Latin squares of order $n$ are $r$-orthogonal if, when superimposed, exactly $r$ distinct ordered pairs of symbols occur. The spectrum of all values of $r$ for Latin squares of order $n$ is known. A Latin square $A$ of order $n$ is $r$-self-orthogonal if $A$ and its transpose are $r$-orthogonal. The spectrum of all values of $r$ is known for all orders $n\ne 14$. We develop randomized algorithms for computing pairs of $r$-orthogonal Latin squares of order $n$ and algorithms for computing $r$-self-orthogonal Latin squares of order $n$.
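The two definitions can be checked directly; the following plain verification sketch (not the randomized construction algorithms developed in the paper) counts the distinct ordered pairs in a superimposition.

    import numpy as np

    def superimposition_pairs(A, B):
        """Number r of distinct ordered pairs (A[i][j], B[i][j]) obtained by
        superimposing two Latin squares A and B of the same order."""
        A, B = np.asarray(A), np.asarray(B)
        return len({(a, b) for a, b in zip(A.ravel(), B.ravel())})

    def r_self_orthogonality(A):
        """The value r such that A is r-self-orthogonal, i.e. A and its
        transpose are r-orthogonal."""
        return superimposition_pairs(A, np.asarray(A).T)

    # Cyclic Latin square of order 4: rows are shifts of 0 1 2 3.
    A = [[(i + j) % 4 for j in range(4)] for i in range(4)]
    print(superimposition_pairs(A, A))   # 4: only the pairs (s, s) occur
    print(r_self_orthogonality(A))       # 4: this square equals its transpose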
We consider the general problem of Bayesian binary regression and we introduce a new class of distributions, the Perturbed Unified Skew Normal (pSUN, henceforth), which generalizes the Unified Skew-Normal (SUN) class. We show that the new class is conjugate to any binary regression model, provided that the link function may be expressed as a scale mixture of Gaussian densities. We discuss in detail the popular logit case, and we show that, when a logistic regression model is combined with a Gaussian prior, posterior summaries such as cumulants and normalizing constants can be easily obtained through the use of an importance sampling approach, opening the way to straightforward variable selection procedures. For more general priors, the proposed methodology is based on a simple Gibbs sampler algorithm. We also claim that, in the $p > n$ case, the proposed methodology shows better performance, both in terms of mixing and accuracy, than existing methods. We illustrate the performance through several simulation studies and two data analyses.
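For concreteness, the scale-mixture condition on the link can be written as follows (the symbols $f$, $\phi$ and $\pi$ are introduced here only for illustration): the density $f$ associated with the link satisfies

$$
f(x)=\int_0^\infty \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)\pi(\sigma)\,d\sigma ,
$$

where $\phi$ is the standard Gaussian density and $\pi$ is a mixing density on $(0,\infty)$; the logistic density is known to admit such a representation, which is what makes the logit case tractable within this framework.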
The development of cubical type theory inspired the idea of "extension types", which have been found to have applications in other type theories that are unrelated to homotopy type theory or cubical type theory. This article describes these applications, including applications to records, metaprogramming, and the control of unfolding, as well as some more exotic ones.
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements that encode prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with the latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each mixture component with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested on a synthetic and a scientific dataset, our algorithm demonstrates the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
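As background for the differentiable DAG learning step, the sketch below shows a widely used differentiable acyclicity characterization (the NOTEARS penalty); it is included only as an example of such a parametrization, and causalPIMA's actual parametrization may differ.

    import numpy as np
    from scipy.linalg import expm

    def acyclicity_penalty(W):
        """NOTEARS-style acyclicity function h(W) = tr(exp(W * W)) - d, where
        * is the elementwise product and d the number of nodes; h(W) = 0 iff
        the weighted adjacency matrix W encodes a DAG.  Example only; not
        necessarily the parametrization used by causalPIMA."""
        W = np.asarray(W, dtype=float)
        return np.trace(expm(W * W)) - W.shape[0]

    A_dag = np.array([[0., 1., 1.],
                      [0., 0., 1.],
                      [0., 0., 0.]])     # strictly upper triangular -> acyclic
    A_cyc = np.array([[0., 1., 0.],
                      [0., 0., 1.],
                      [1., 0., 0.]])     # a directed 3-cycle

    print(acyclicity_penalty(A_dag))     # ~0.0
    print(acyclicity_penalty(A_cyc))     # > 0

Penalizing such a quantity alongside the evidence lower bound is one standard way to keep the learned graph acyclic while training end to end.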
We develop a method to compute the $H^2$-conforming finite element approximation to planar fourth order elliptic problems without having to implement $C^1$ elements. The algorithm consists of replacing the original $H^2$-conforming scheme with pre-processing and post-processing steps that require only an $H^1$-conforming Poisson type solve and an inner Stokes-like problem that again only requires at most $H^1$-conformity. We then demonstrate the method applied to the Morgan-Scott elements with three numerical examples.
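As a concrete instance of the targeted class of problems (stated here only for illustration; the paper treats planar fourth order elliptic problems in general), consider the clamped biharmonic problem

$$
\Delta^2 u = f \ \text{ in } \Omega\subset\mathbb{R}^2, \qquad u=\partial_n u=0 \ \text{ on } \partial\Omega ,
$$

whose weak formulation is posed in $H_0^2(\Omega)$; a conforming Galerkin discretization therefore ordinarily requires globally $C^1$ finite elements, which is precisely the requirement the pre-processing/post-processing approach avoids.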
Complete observation of event histories is often impossible due to sampling effects such as right-censoring and left-truncation, but also due to reporting delays and incomplete event adjudication. This is for example the case during interim stages of clinical trials and for health insurance claims. In this paper, we develop a parametric method that takes the aforementioned effects into account, treating the latter two as partially exogenous. The method, which takes the form of a two-step M-estimation procedure, is applicable to multistate models in general, including competing risks and recurrent event models. The effect of reporting delays is derived via thinning, extending existing results for Poisson models. To address incomplete event adjudication, we propose an imputed likelihood approach which, compared to existing methods, has the advantage of allowing for dependencies between the event history and adjudication processes as well as allowing for unreported events and multiple event types. We establish consistency and asymptotic normality under standard identifiability, integrability, and smoothness conditions, and we demonstrate the validity of the percentile bootstrap. Finally, a simulation study shows favorable finite sample performance of our method compared to other alternatives, while an application to disability insurance data illustrates its practical potential.
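One simple way to picture the reporting-delay effect (the notation is introduced here for illustration and need not match the paper's): an event occurring at calendar time $T$ with reporting delay $D$ is visible in a dataset extracted at time $\tau$ only if

$$
T + D \le \tau ,
$$

so recently occurred events are systematically underrepresented at an interim analysis; the thinning argument mentioned above is used to account for this effect.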
We present a method for finding large fixed-size primes of the form $X^2+c$. We study the density of primes in the sets $E_c = \{\,X^2+c : X \in 2\mathbb{Z}+(c-1)\,\}$, $c \in \mathbb{N}^*$. We describe an algorithm for generating values of $c$ such that a given prime $p$ is the minimum of the union of the prime divisors of all elements of $E_c$. We also present quadratic forms generating divisors of $E_c$ and study the prime divisors of its terms. This paper uses Dirichlet's theorem on arithmetic progressions [1] and the results of [6] to restate a conjecture of Shanks [2] on the density of primes in $E_c$. Finally, based on these results, we discuss heuristics for the occurrence of large primes in the search set of our algorithm.
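For illustration, a brute-force search over $E_c$ (not the paper's algorithm for selecting $c$) looks as follows; note that the parity constraint $X\in 2\mathbb{Z}+(c-1)$ simply ensures that $X^2+c$ is odd.

    from sympy import isprime

    def primes_in_Ec(c, x_max):
        """Primes of the form X^2 + c with X in 2Z + (c - 1) and 0 <= X < x_max.
        Plain brute-force search over E_c; the paper's method for choosing c is
        not reproduced here."""
        start = 0 if c % 2 == 1 else 1   # X even when c is odd, odd when c is even
        return [x * x + c for x in range(start, x_max, 2) if isprime(x * x + c)]

    print(primes_in_Ec(1, 30))   # 5, 17, 37, 101, ...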
In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap between detection performance on small and large objects. We analyze the current state-of-the-art model, Mask R-CNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) small objects do not appear often enough even within the images that contain them. We thus propose to oversample images with small objects and to augment each of them by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7\% relative improvement on instance segmentation and a 7.1\% relative improvement on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
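A minimal sketch of the copy-paste idea is given below; the exact strategy used in the paper (number of copies, overlap handling, blending, annotation updates) may differ, and the function and parameter names are ours.

    import numpy as np

    def paste_small_object(image, mask, obj_box, n_copies=3, rng=None):
        """Copy one small object (given by its bounding box and binary mask)
        and paste it at random locations of the same image n_copies times.
        Returns the augmented image and the boxes of the pasted copies."""
        rng = rng or np.random.default_rng()
        x0, y0, x1, y1 = obj_box
        crop = image[y0:y1, x0:x1].copy()
        crop_mask = mask[y0:y1, x0:x1].copy()
        h, w = crop.shape[:2]
        H, W = image.shape[:2]
        new_boxes = []
        for _ in range(n_copies):
            px = int(rng.integers(0, W - w))
            py = int(rng.integers(0, H - h))
            region = image[py:py + h, px:px + w]
            region[crop_mask > 0] = crop[crop_mask > 0]   # paste only object pixels
            new_boxes.append((px, py, px + w, py + h))
        return image, new_boxes

The ground-truth annotations would, of course, also have to be extended with the pasted copies; that bookkeeping is omitted here.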