We construct neural network regression models to predict key metrics of complexity for Gr\"obner bases of binomial ideals. This work illustrates why predictions with neural networks from Gr\"obner computations are not a straightforward process. Using two probabilistic models for random binomial ideals, we generate and make available a large data set that is able to capture sufficient variability in Gr\"obner complexity. We use this data to train neural networks and predict the cardinality of a reduced Gr\"obner basis and the maximum total degree of its elements. While the cardinality prediction problem is unlike classical problems tackled by machine learning, our simulations show that neural networks, providing performance statistics such as $r^2 = 0.401$, outperform naive guess or multiple regression models with $r^2 = 0.180$.
Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed. To address this issue, we propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models while mutually distilling knowledge. The CTCD framework successfully achieves this based on two significant findings: 1) Distilling knowledge from the smaller model to the larger model during co-training improves the performance of the larger model. 2) The enhanced performance of the larger model further boosts the performance of the smaller model. The CTCD framework shows promise as it can be combined with existing techniques like architecture design or data augmentation, replacing one-way KD methods, to achieve further performance improvement. Extensive ablation studies demonstrate the effectiveness of CTCD, and the small model distilled by CTCD outperforms the original larger model by a significant margin of 1.66 on the GLUE benchmark.
The author introduced models of linear logic known as ''Interaction Graphs'' which generalise Girard's various geometry of interaction constructions. In this work, we establish how these models essentially rely on a deep connection between zeta functions and the execution of programs, expressed as a cocycle. This is first shown in the simple case of graphs, before begin lifted to dynamical systems. Focussing on probabilistic models, we then explain how the notion of graphings used in Interaction Graphs captures a natural class of sub-Markov processes. We then extend the realisability constructions and the notion of zeta function to provide a realisability model of second-order linear logic over the set of all (discrete-time) sub-Markov processes.
We consider twisted permutation codes, a class of frequency permutation arrays obtained from finite groups with multiple permutation representations of the same degree, introduced by Gillespie, Praeger and Spiga (and later studied by Akbari, Gillespie and Praeger), and develop a decoding algorithm for such codes based on earlier work of the first author for permutation group codes. In particular, we show how to implement this algorithm for an infinite family of groups considered by Akbari, Gillespie and Praeger.
We import the algebro-geometric notion of a complete collineation into the study of maximum likelihood estimation in directed Gaussian graphical models. A complete collineation produces a perturbation of sample data, which we call a stabilisation of the sample. While a maximum likelihood estimate (MLE) may not exist or be unique given sample data, it is always unique given a stabilisation. We relate the MLE given a stabilisation to the MLE given original sample data, when one exists, providing necessary and sufficient conditions for the MLE given a stabilisation to be one given the original sample. For linear regression models, we show that the MLE given any stabilisation is the minimal norm choice among the MLEs given an original sample. We show that the MLE has a well-defined limit as the stabilisation of a sample tends to the original sample, and that the limit is an MLE given the original sample, when one exists. Finally, we study which MLEs given a sample can arise as such limits. We reduce this to a question regarding the non-emptiness of certain algebraic varieties.
Estimating a prediction function is a fundamental component of many data analyses. The Super Learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many applications. Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms. However, the performance of a Super Learner using the lasso for dimension reduction has not been fully explored in cases where the lasso is known to perform poorly. We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen, similar to the guidance for choosing a library of prediction algorithms for the Super Learner.
Clinical neuroimaging data is naturally hierarchical. Different magnetic resonance imaging (MRI) sequences within a series, different slices covering the head, and different regions within each slice all confer different information. In this work we present a hierarchical attention network for abnormality detection using MRI scans obtained in a clinical hospital setting. The proposed network is suitable for non-volumetric data (i.e. stacks of high-resolution MRI slices), and can be trained from binary examination-level labels. We show that this hierarchical approach leads to improved classification, while providing interpretability through either coarse inter- and intra-slice abnormality localisation, or giving importance scores for different slices and sequences, making our model suitable for use as an automated triaging system in radiology departments.
This is the second in a series of articles aimed at exploring the relationship between the complexity classes of P and NP. The research in this article aims to find conditions of an algorithmic nature that are necessary and sufficient to transform any Boolean function in conjunctive normal form into a specific form that guarantees the satisfiability of this function. To find such conditions, we use the concept of a special covering of a set introduced in [13], and investigate the connection between this concept and the notion of satisfiability of Boolean functions. As shown, the problem of existence of a special covering for a set is equivalent to the Boolean satisfiability problem. Thus, an important result is the proof of the existence of necessary and sufficient conditions that make it possible to find out if there is a special covering for the set under the special decomposition. This result allows us to formulate the necessary and sufficient algorithmic conditions for Boolean satisfiability, considering the function in conjunctive normal form as a set of clauses. In parallel, as a result of the aforementioned algorithmic procedure, we obtain the values of the variables that ensure the satisfiability of this function. The terminology used related to graph theory, set theory, Boolean functions and complexity theory is consistent with the terminology in [1], [2], [3], [4]. The newly introduced terms are not found in use by other authors and do not contradict to other terms.
We define and study the model of patterned non-determinism in bipartite communication complexity, denoted by $PNP^{X\leftrightarrow Y}$. It generalises the known models $UP^{X\leftrightarrow Y}$ and $FewP^{X\leftrightarrow Y}$ through relaxing the constraints on the witnessing structure of the underlying $NP^{X\leftrightarrow Y}$-protocol. It is shown that for the case of total functions $PNP^{X\leftrightarrow Y}$ equals $P^{X\leftrightarrow Y}$ (similarly to $UP^{X\leftrightarrow Y}$ and $FewP^{X\leftrightarrow Y}$). Moreover, the corresponding exhaustive witness-searching problem -- determining the full set of witnesses that lead to the acceptance of a given input pair -- also has an efficient deterministic protocol. The possibility of efficient exhaustive $PNP^{X\leftrightarrow Y}$-search is used to analyse certain three-party communication regime (under the "number in hand" input partition): The corresponding three-party model is shown to be as strong qualitatively as the weakest among its two-party amplifications obtained by allowing free communication between a pair of players.
The contraction$^*$-depth is the matroid depth parameter analogous to tree-depth of graphs. We establish the matroid analogue of the classical graph theory result asserting that the tree-depth of a graph $G$ is the minimum height of a rooted forest whose closure contains $G$ by proving the following for every matroid $M$ (except the trivial case when $M$ consists of loops and bridges only): the contraction$^*$-depth of $M$ plus one is equal to the minimum contraction-depth of a matroid containing $M$ as a restriction.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.