Dependency trees have proven to be a very successful model for representing the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in $O(n)$ time. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between the actual dependency distance and the random baseline shrinks as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.
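To make the quantity concrete, the following is a minimal Python sketch (not the paper's $O(n)$ expectation algorithm) of the sum of edge lengths of a dependency tree under a given linear arrangement, together with a naive Monte Carlo baseline over unconstrained random orderings; the head-vector encoding and the `random_baseline` helper are illustrative choices, not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): the sum of dependency
# distances for a sentence given its head vector, plus a naive
# unconstrained random baseline obtained by permuting word positions.
import random

def sum_edge_lengths(heads, order):
    """heads[i] = index of the head of word i (or None for the root);
    order[i] = position of word i in the linear arrangement."""
    return sum(abs(order[i] - order[h])
               for i, h in enumerate(heads) if h is not None)

def random_baseline(heads, samples=10000):
    """Monte Carlo estimate of the expected sum over unconstrained random
    orderings (the projective/planar baselines constrain this further)."""
    n = len(heads)
    total = 0.0
    for _ in range(samples):
        perm = list(range(n))
        random.shuffle(perm)
        total += sum_edge_lengths(heads, perm)
    return total / samples

# Example: a 4-word sentence whose words 0, 2, 3 depend on word 1.
heads = [1, None, 1, 1]
print(sum_edge_lengths(heads, order=[0, 1, 2, 3]))
print(random_baseline(heads))
```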
The joint retrieval of surface reflectances and atmospheric parameters in VSWIR imaging spectroscopy is a computationally challenging high-dimensional problem. With NASA's Surface Biology and Geology mission as the motivating context, the uncertainty associated with the retrievals is crucial for the further use of the retrieved results in environmental applications. Although Markov chain Monte Carlo (MCMC) is a Bayesian method ideally suited to uncertainty quantification, a full-dimensional implementation of MCMC for the retrieval is computationally intractable. In this work, we develop a block Metropolis MCMC algorithm for high-dimensional VSWIR surface reflectance retrieval that leverages the structure of the forward radiative transfer model to enable tractable, fully Bayesian computation. We use the posterior distribution from this MCMC algorithm to assess the limitations of optimal estimation, the state-of-the-art Bayesian algorithm in operational retrievals, which is more computationally efficient but uses a Gaussian approximation to characterize the posterior. By analyzing the differences between the posteriors computed by the two methods, we show that the MCMC algorithm gives more physically sensible results and reveals the non-Gaussian structure of the posterior, specifically in the atmospheric aerosol optical depth parameter and the low-wavelength surface reflectances.
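As an illustration of the kind of sampler described above, here is a hedged sketch of a generic block Metropolis update in which the state vector is split into blocks that are proposed and accepted separately; the toy log-posterior, block layout, and step sizes are placeholders and do not reflect the paper's radiative transfer model or its actual partition into reflectance and atmospheric parameters.

```python
# Illustrative sketch of a block Metropolis update (not the operational
# retrieval code): the state is split into blocks (e.g. surface
# reflectances vs. atmospheric parameters) that are proposed and
# accepted/rejected separately. The log-posterior here is a stand-in.
import numpy as np

def block_metropolis(log_post, x0, blocks, step_sizes, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        for block, step in zip(blocks, step_sizes):
            prop = x.copy()
            prop[block] += step * rng.standard_normal(len(block))
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance
                x, lp = prop, lp_prop
        chain[t] = x
    return chain

# Toy target: independent Gaussians; blocks mimic "reflectance" vs "atmosphere".
log_post = lambda x: -0.5 * np.sum(x ** 2)
chain = block_metropolis(log_post, x0=np.zeros(6),
                         blocks=[list(range(5)), [5]],
                         step_sizes=[0.5, 0.2])
print(chain.mean(axis=0))
```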
High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile-time) scheduling of operations towards generating dynamic circuits that schedule operations at run time. Such circuits trade off area utilisation for increased dynamism and throughput. However, existing lowering flows in dynamically scheduled HLS tools rely on conservative assumptions about their input program, due both to the intermediate representations (IRs) used and to the lack of formal specifications for the translation into hardware. These assumptions cause suboptimal hardware performance. In this work, we lift these assumptions by proposing a new and efficient abstraction for hardware mapping, namely h-GSA, an extension of the Gated Static Single Assignment (GSA) IR. Using this abstraction, we propose a lowering flow that transforms GSA into h-GSA and maps h-GSA into dynamically scheduled hardware circuits. We compare the schedules generated by our approach to those of the state-of-the-art dynamically scheduled HLS tool, Dynamatic, and illustrate the potential performance improvement from hardware mapping using the proposed abstraction.
Although compartmental dynamical systems are used in many different areas of science, model selection based on the maximum entropy principle (MaxEnt) is challenging because of the lack of methods for quantifying the entropy of this type of system. Here, we take advantage of the interpretation of compartmental systems as continuous-time Markov chains to obtain entropy measures that quantify model information content. In particular, we quantify the uncertainty of a single particle's path as it travels through the system, as described by path entropy and entropy rates. Path entropy measures the uncertainty of the entire path of a traveling particle from its entry into the system until its exit, whereas entropy rates measure the average uncertainty of the instantaneous future of a particle while it is in the system. We derive explicit formulas for these two types of entropy for compartmental systems in equilibrium based on Shannon information entropy, and show how they can be used to solve equifinality problems in model selection by means of MaxEnt.
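For readers unfamiliar with entropy rates of Markov chains, the toy sketch below computes the Shannon entropy rate per jump of the embedded discrete-time chain of a continuous-time Markov chain with generator Q; it only illustrates the general notion and is not the paper's path-entropy or entropy-rate formulas for compartmental systems, and the example generator is an arbitrary placeholder.

```python
# Toy illustration (not the paper's formulas): the Shannon entropy rate
# per jump of the embedded jump chain of a continuous-time Markov chain
# with generator Q, i.e. H = -sum_i pi_i sum_j P_ij log P_ij.
import numpy as np

def jump_chain_entropy_rate(Q):
    Q = np.asarray(Q, dtype=float)
    rates = -np.diag(Q)                      # exit rate of each compartment
    P = Q / rates[:, None]                   # embedded jump-chain probabilities
    np.fill_diagonal(P, 0.0)
    # Stationary distribution of the jump chain: left eigenvector of P
    # associated with eigenvalue 1, normalized to sum to one.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)
    return -np.sum(pi[:, None] * terms)

# Three-compartment toy system with exchange among all compartments.
Q = [[-1.0, 0.6, 0.4],
     [0.5, -1.0, 0.5],
     [0.3, 0.7, -1.0]]
print(jump_chain_entropy_rate(Q))
```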
One major problem in the study of numerical semigroups is determining the growth of the semigroup tree. In the present work, we study infinite chains of numerical semigroups in the semigroup tree, first introduced in Bras-Amor\'os and Bulygin (2009). Computational results show that these chains are rare, but without them the tree would not be infinite. It is proved that for each genus $g\geq 5$ there are more semigroups of that genus not belonging to infinite chains than semigroups belonging to them. Bras-Amor\'os and Bulygin (2009) presented a characterization of the semigroups that belong to infinite chains in terms of the coprimality of the left elements of the semigroup, as well as a result on the cardinality of the set of infinite chains to which a numerical semigroup belongs in terms of the primality of the greatest common divisor of these left elements. We revisit these results and fix an imprecision concerning the cardinality of the set of infinite chains to which a semigroup belongs in the case when the greatest common divisor of the left elements is a prime number. We then look at infinite chains in subtrees with fixed multiplicity. When the multiplicity is a prime number, there is only one infinite chain in the tree of semigroups with that multiplicity. When the multiplicity is $4$ or $6$, we prove a self-replication behavior in the subtree and prove a formula for the number of semigroups in infinite chains of a given genus and multiplicity $4$ and $6$, respectively.
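As background on the objects involved, the sketch below computes basic invariants of a numerical semigroup (multiplicity, Frobenius number, gaps, genus) from a generating set with gcd 1; it does not implement the infinite-chain or semigroup-tree machinery of the paper, and the dynamic-programming membership test is simply one straightforward choice.

```python
def semigroup_data(generators):
    """Multiplicity, Frobenius number, gaps and genus of the numerical
    semigroup generated by `generators` (assumed to have gcd 1)."""
    m = min(generators)
    in_S = [True]        # 0 is always in the semigroup
    n = 0
    run = 0              # length of the current run of consecutive elements
    while run < m:       # m consecutive elements => everything beyond is in S
        n += 1
        member = any(n >= g and in_S[n - g] for g in generators)
        in_S.append(member)
        run = run + 1 if member else 0
    gaps = [i for i, b in enumerate(in_S) if not b]
    return {"multiplicity": m, "frobenius": max(gaps) if gaps else -1,
            "genus": len(gaps), "gaps": gaps}

# Example: the semigroup generated by 4, 6, 7 has multiplicity 4 and genus 5.
print(semigroup_data([4, 6, 7]))
```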
In the context of reconstructing phylogenetic networks from a collection of phylogenetic trees, several characterisations, and subsequently algorithms, have been established to reconstruct a phylogenetic network that collectively embeds all trees in the input in some minimum way. For many instances, however, the resulting network also embeds additional phylogenetic trees that are not part of the input, and little is known about these inferred trees. In this paper, we explore the relationships among all phylogenetic trees that are embedded in a given phylogenetic network. First, we investigate some combinatorial properties of the collection P of all rooted binary phylogenetic trees that are embedded in a rooted binary phylogenetic network N. To this end, we associate a particular graph G, which we call the rSPR graph, with the elements of P and show that, if |P|=2^k, where k is the number of vertices with in-degree two in N, then G has a Hamiltonian cycle. Second, by exploiting rSPR graphs and properties of hypercubes, we turn to the well-studied class of rooted binary level-1 networks and give necessary and sufficient conditions for when a set of rooted binary phylogenetic trees can be embedded in a level-1 network without inferring any additional trees. Lastly, we show how these conditions translate into a polynomial-time algorithm to reconstruct such a network if it exists.
Quantization summarizes continuous distributions by calculating a discrete approximation. Among the widely adopted methods for data quantization is Lloyd's algorithm, which partitions the space into Vorono\"i cells, which can be seen as clusters, and constructs a discrete distribution based on their centroids and probability masses. Lloyd's algorithm estimates the optimal centroids in a minimal expected distance sense, but this approach poses significant challenges in scenarios where data evaluation is costly and related to rare events. In that case, the single cluster associated with the absence of an event takes the majority of the probability mass. In this context, a metamodel is required, and adapted sampling methods are necessary to increase the precision of the computations on the rare clusters.
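For concreteness, here is a minimal sketch of Lloyd's algorithm applied to a finite sample: it alternates between assigning points to their nearest centroid (the Vorono\"i cells) and moving each centroid to the mean of its cell, and returns the centroids together with the cell masses; the rare-event and metamodel-based refinements discussed above are not shown.

```python
# Minimal sketch of Lloyd's algorithm on a sample: alternate between the
# nearest-centroid assignment (Voronoi cells) and the centroid update;
# the cell masses give the discrete approximation of the distribution.
import numpy as np

def lloyd(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centroid, then cell assignment.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    masses = np.bincount(labels, minlength=k) / len(points)
    return centroids, masses

points = np.random.default_rng(1).normal(size=(2000, 2))
centroids, masses = lloyd(points, k=5)
print(centroids, masses)
```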
A central problem in computational statistics is to convert a procedure for sampling combinatorial objects into a procedure for counting those objects, and vice versa. We consider sampling problems coming from *Gibbs distributions*, which are probability distributions of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H(\omega) \in \{0\} \cup [1, n]$. The *partition function* is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters are the log partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the vector of counts $c_x = |H^{-1}(x)|$. Our first result is an algorithm to estimate the counts $c_x$ using roughly $\tilde O( \frac{q}{\epsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{\epsilon^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph. We develop a key subroutine for global estimation of the partition function. Specifically, we produce a data structure to estimate $Z(\beta)$ for \emph{all} values $\beta$, without further samples. Constructing the data structure requires $O(\frac{q \log n}{\epsilon^2})$ samples for general Gibbs distributions and $O(\frac{n^2 \log n}{\epsilon^2} + n \log q)$ samples for integer-valued distributions. This improves over a prior algorithm of Kolmogorov (2018), which computes the single-point estimate $Z(\beta_{\max})$ using $\tilde O(\frac{q}{\epsilon^2})$ samples. We also show that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms.
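To illustrate the kind of quantity being estimated, the sketch below uses the standard telescoping product estimator of the log partition ratio $q$ from samples at a grid of inverse temperatures; it is not the sample-optimal algorithm of the paper, and the oracle `sample_H` is a hypothetical stand-in for a Gibbs sampler over a toy distribution with known counts.

```python
# Illustrative sketch (not the paper's sample-optimal algorithm): the
# standard telescoping estimator of q = log Z(beta_max) - log Z(beta_min),
# based on the identity Z(b')/Z(b) = E_{mu_b}[ exp((b' - b) * H(omega)) ].
import numpy as np

def log_partition_ratio(sample_H, beta_grid, m=2000):
    q = 0.0
    for b, b_next in zip(beta_grid[:-1], beta_grid[1:]):
        H = sample_H(b, m)
        w = (b_next - b) * H
        # Log of the empirical mean of exp(w), computed stably.
        q += np.logaddexp.reduce(w) - np.log(m)
    return q

# Toy oracle: H takes values {0, 1, 2} with counts c = (1, 3, 2).
def sample_H(beta, m, c=np.array([1.0, 3.0, 2.0]), rng=np.random.default_rng(0)):
    p = c * np.exp(beta * np.arange(3))
    return rng.choice(3, size=m, p=p / p.sum())

beta_grid = np.linspace(0.0, 2.0, 21)
print(log_partition_ratio(sample_H, beta_grid))
# Exact value for comparison on the toy distribution:
c = np.array([1.0, 3.0, 2.0])
print(np.log(np.sum(c * np.exp(2.0 * np.arange(3)))) - np.log(c.sum()))
```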
The probe and singular sources methods are two well-known classical direct reconstruction methods in inverse obstacle problems governed by partial differential equations. The common part of both methods is the notion of indicator functions, which are defined outside an unknown obstacle and blow up on the surface of the obstacle. However, their appearances are completely different. In this paper, by considering an inverse obstacle problem governed by the Laplace equation in a bounded domain as a prototype case, an integrated version of the probe and singular sources methods that fills the gap between their indicator functions is introduced. The main result is decomposed into three parts. First, the singular sources method combined with the probe method and the notion of the Carleman function is formulated. Second, the indicator functions of both methods can be obtained by decomposing a third indicator function in two ways. The third indicator function blows up on both the outer and obstacle surfaces. Third, the probe and singular sources methods are reformulated, and it is shown that the indicator functions on which both reformulated methods are based completely coincide with each other. As a byproduct, it turns out that the reformulated singular sources method also has the Side B of the probe method, which is a characterization of the unknown obstacle by means of the blowing-up property of an indicator sequence.
Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and endeavours to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, and 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
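To make the evaluation protocol explicit, the following is a skeleton of task-incremental training on synthetic data: tasks arrive one at a time, the model is naively finetuned on each, and accuracy on all previously seen tasks is tracked to expose catastrophic forgetting; none of the 11 compared methods or 4 baselines is implemented here, and the network, tasks, and hyperparameters are placeholders.

```python
# Skeleton of the task-incremental protocol: sequential tasks with clear
# boundaries, naive finetuning, and evaluation on all tasks seen so far.
import torch
from torch import nn

def make_task(seed, n=512, dim=20):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, dim, generator=g)
    w = torch.randn(dim, generator=g)        # task-specific decision boundary
    y = (x @ w > 0).long()
    return x, y

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

tasks = [make_task(s) for s in range(3)]
for t, (x, y) in enumerate(tasks):
    for _ in range(50):                      # naive finetuning on task t
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    with torch.no_grad():                    # evaluate on all tasks seen so far
        accs = [(model(xp).argmax(1) == yp).float().mean().item()
                for xp, yp in tasks[: t + 1]]
    print(f"after task {t}: accuracies {accs}")
```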
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real-world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic across learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN that is applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
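As a rough illustration of the general idea (not the published Hyper-SAGNN architecture or code), the sketch below scores a candidate hyperedge of arbitrary size with scaled dot-product self-attention over its node embeddings, followed by permutation-invariant pooling; the weight matrices are random placeholders standing in for learned parameters.

```python
# Generic sketch: self-attention over the node embeddings of a candidate
# hyperedge of any size, pooled into a single existence probability.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hyperedge_score(node_emb, Wq, Wk, Wv, w_out):
    """node_emb: (k, d) embeddings of the k nodes in a candidate hyperedge."""
    Q, K, V = node_emb @ Wq, node_emb @ Wk, node_emb @ Wv
    att = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (k, k) attention weights
    dynamic = att @ V                              # context-aware node embeddings
    pooled = dynamic.mean(axis=0)                  # permutation-invariant pooling
    return 1.0 / (1.0 + np.exp(-pooled @ w_out))   # probability the hyperedge exists

rng = np.random.default_rng(0)
d, h = 16, 8
Wq, Wk, Wv = (rng.normal(size=(d, h)) for _ in range(3))
w_out = rng.normal(size=h)
triple = rng.normal(size=(3, d))      # a size-3 candidate hyperedge
print(hyperedge_score(triple, Wq, Wk, Wv, w_out))
```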