We establish that constructive continued fraction dimension, originally defined using $s$-gales, is robust, but surprisingly, that the effective continued fraction dimension and effective (base-$b$) Hausdorff dimension of the same real can be unequal in general. We first provide an equivalent characterization of continued fraction dimension using Kolmogorov complexity. In the process, we construct an optimal lower semi-computable $s$-gale for continued fractions. We also prove new bounds on the Lebesgue measure of continued fraction cylinders, which may be of independent interest. We apply these bounds to reveal an unexpected behavior of continued fraction dimension. It is known that feasible dimension is invariant with respect to base conversion. It is also known that Martin-L\"of randomness and computable randomness are invariant not only with respect to base conversion, but also with respect to the continued fraction representation. In contrast, for any $0 < \varepsilon < 0.5$, we prove the existence of a real whose effective Hausdorff dimension is less than $\varepsilon$, but whose effective continued fraction dimension is greater than or equal to $0.5$. This phenomenon is related to the ``non-faithfulness'' of certain families of covers, investigated by Peres and Torbin and by Albeverio, Ivanenko, Lebid and Torbin. We also establish that for any real, the constructive Hausdorff dimension is at most its effective continued fraction dimension.
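For reference, the Kolmogorov complexity characterization alluded to above typically takes the following shape (our notation: the binary case is Mayordomo's classical theorem, while the continued fraction normalization shown here is one plausible rendering of such a characterization, not a verbatim statement from the paper):
\[
\dim(x) \;=\; \liminf_{n \to \infty} \frac{K(x \upharpoonright n)}{n},
\qquad
\dim_{\mathrm{CF}}(x) \;=\; \liminf_{n \to \infty} \frac{K(a_1, \dots, a_n)}{-\log \lambda\left(C_{a_1 \dots a_n}\right)},
\]
where $x \upharpoonright n$ denotes the first $n$ bits of the binary expansion of $x$, $a_1, a_2, \dots$ are the partial quotients of $x$, and $\lambda(C_{a_1 \dots a_n})$ is the Lebesgue measure of the continued fraction cylinder, the quantity whose bounds the paper studies.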
Large Language Models (LLMs) can solve a variety of tasks, such as text summarization and mathematical questions, out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how to expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a widely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collections of prompts) based on the target task. Additionally, MoPs are empirically agnostic to the model compression technique applied -- for efficiency reasons -- as well as to the instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training ``interference'' in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications of model approximations. As a highlight, MoPs decrease final perplexity by $\sim 20\%$ up to $\sim 70\%$, as compared to baselines, in the federated scenario, and by $\sim 3\%$ up to $\sim 30\%$ in the centralized scenario.
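To make the gating functionality concrete, the following is a minimal sketch of one plausible design (our illustrative assumption, not the paper's implementation): a gating network scores $K$ groups of learnable soft prompts from a pooled input embedding and prepends their weighted mixture to the input sequence.
\begin{verbatim}
# Sketch of a mixture-of-prompts layer (illustrative, not the paper's code).
import torch
import torch.nn as nn

class MixtureOfPrompts(nn.Module):
    def __init__(self, num_groups, prompt_len, dim):
        super().__init__()
        # K groups of learnable soft prompts, each of shape (prompt_len, dim)
        self.prompts = nn.Parameter(
            torch.randn(num_groups, prompt_len, dim) * 0.02)
        # Gating network: pooled input embedding -> scores over prompt groups
        self.gate = nn.Linear(dim, num_groups)

    def forward(self, x):                        # x: (batch, seq, dim)
        pooled = x.mean(dim=1)                   # (batch, dim)
        weights = self.gate(pooled).softmax(-1)  # (batch, K)
        # Per-example weighted combination of the prompt groups
        mixed = torch.einsum("bk,kld->bld", weights, self.prompts)
        return torch.cat([mixed, x], dim=1)      # prepend prompts to sequence
\end{verbatim}
In a multi-task setting, the hope is that the gate learns to route each input toward the prompt groups (skills) relevant to its task, which is the interference-mitigation mechanism described above.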
Solving a linear system $Ax=b$ is a fundamental scientific computing primitive for which numerous solvers and preconditioners have been developed. These come with parameters whose optimal values depend on the system being solved and are often impossible or too expensive to identify; thus in practice sub-optimal heuristics are used. We consider the common setting in which many related linear systems need to be solved, e.g. during a single numerical simulation. In this scenario, can we sequentially choose parameters that attain a near-optimal overall number of iterations, without extra matrix computations? We answer in the affirmative for Successive Over-Relaxation (SOR), a standard solver whose parameter $\omega$ has a strong impact on its runtime. For this method, we prove that a bandit online learning algorithm -- using only the number of iterations as feedback -- can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases. Furthermore, when given additional structural information, we show that a contextual bandit method asymptotically achieves the performance of the instance-optimal policy, which selects the best $\omega$ for each instance. Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing, demonstrating theoretically the potential to speed up numerical methods using well-understood learning algorithms.
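The following sketch illustrates the setup with a simple epsilon-greedy stand-in for the bandit (the paper's actual algorithm and its guarantees are more refined; stream_of_systems() is a hypothetical placeholder for the sequence of related instances):
\begin{verbatim}
# Bandit tuning of SOR's relaxation parameter omega (illustrative sketch).
import numpy as np

def sor(A, b, omega, tol=1e-8, max_iter=10_000):
    """Run SOR on Ax = b; return the number of iterations used."""
    n = len(b)
    x = np.zeros(n)
    for k in range(1, max_iter + 1):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i, i]
        if np.linalg.norm(A @ x - b) < tol * np.linalg.norm(b):
            return k
    return max_iter

grid = np.linspace(1.0, 1.95, 20)   # candidate values of omega
avg_cost = np.zeros(len(grid))      # running mean of iteration counts
pulls = np.zeros(len(grid))
rng = np.random.default_rng(0)

for A, b in stream_of_systems():    # hypothetical instance stream
    if rng.random() < 0.1 or pulls.min() == 0:     # explore
        j = int(rng.integers(len(grid)))
    else:                                          # exploit
        j = int(avg_cost.argmin())
    cost = sor(A, b, grid[j])       # feedback: iteration count only
    pulls[j] += 1
    avg_cost[j] += (cost - avg_cost[j]) / pulls[j]
\end{verbatim}
Note that the learner never inspects $A$ itself; the only feedback is the number of iterations, matching the bandit feedback model described above.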
Tsetlin Machines (TMs) have garnered increasing interest for their ability to learn concepts via propositional formulas and their proven efficiency across various application domains. Despite this, the convergence proof for TMs, particularly for the AND operator (\emph{conjunction} of literals) in the generalized case (inputs greater than two bits), remains an open problem. This paper fills this gap by presenting a comprehensive convergence analysis of Tsetlin automaton-based Machine Learning algorithms. We introduce a novel framework, referred to as Probabilistic Concept Learning (PCL), which simplifies the TM structure while incorporating dedicated feedback mechanisms and dedicated inclusion/exclusion probabilities for literals. Given $n$ features, PCL aims to learn a set of conjunction clauses $C_i$, each associated with a distinct inclusion probability $p_i$. Most importantly, we prove that, for any clause $C_k$, PCL converges to a conjunction of literals when $0.5 < p_k < 1$. This result serves as a stepping stone for future research on the convergence properties of Tsetlin automaton-based learning algorithms. Our findings not only contribute to the theoretical understanding of Tsetlin Machines but also have implications for their practical application, potentially leading to more robust and interpretable machine learning models.
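As a toy illustration of the core idea of learning a conjunction by adjusting per-literal inclusion probabilities (our simplification; the dedicated feedback rules of PCL are specified in the paper), consider the following, where the target concept is $x_0 \wedge x_1$ over four literals:
\begin{verbatim}
# Toy conjunction learner via inclusion probabilities (not PCL's exact rules).
import numpy as np

def learn_conjunction(examples, n_literals, lr=0.05, steps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n_literals, 0.5)          # per-literal inclusion probabilities
    for _ in range(steps):
        x, y = examples[rng.integers(len(examples))]  # x: boolean literals
        include = rng.random(n_literals) < p          # sample a clause
        fires = bool(np.all(x[include]))              # empty clause fires
        if y == 1:
            # Literals false on a positive example cannot be in the target
            # conjunction: decay their inclusion probabilities toward 0.
            p[~x] -= lr * p[~x]
        elif fires:
            # Clause wrongly fires on a negative example: push up literals
            # that are false here, since including one would rule it out.
            p[~x] += lr * (1 - p[~x])
    return p

rng = np.random.default_rng(1)
data = [(x, int(x[0] and x[1]))
        for x in (rng.random(4) < 0.5 for _ in range(500))]
print(np.round(learn_conjunction(data, 4), 2))  # p[0], p[1] high; rest low
\end{verbatim}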
The Geometric Bin Packing (GBP) problem is a generalization of Bin Packing where the input is a set of $d$-dimensional rectangles, and the goal is to pack them into unit $d$-dimensional cubes efficiently. It is NP-hard to obtain a PTAS for the problem, even when $d=2$. For general $d$, the best-known approximation algorithm has an approximation guarantee exponential in $d$, while the best known hardness of approximation remains the small constant inapproximability inherited from the case $d=2$. In this paper, we show that the problem cannot be approximated within a $d^{1-\epsilon}$ factor unless NP=ZPP. Recently, $d$-dimensional Vector Bin Packing, a problem closely related to GBP, was shown to be hard to approximate within $\Omega(\log d)$ when $d$ is a fixed constant, using a notion of Packing Dimension of set families. In this paper, we introduce a geometric analog, the Geometric Packing Dimension of set families. While we fall short of obtaining similar inapproximability results for Geometric Bin Packing when $d$ is fixed, we prove a couple of key properties of the Geometric Packing Dimension which highlight fundamental differences between Geometric Bin Packing and Vector Bin Packing.
Mesh deformation is a core task for 3D mesh reconstruction, but defining an efficient discrepancy between predicted and target meshes remains an open problem. A prevalent approach in current deep learning is the set-based approach, which measures the discrepancy between two surfaces by comparing two point clouds randomly sampled from the two meshes using the Chamfer pseudo-distance. Nevertheless, the set-based approach has limitations, such as the lack of a theoretical guarantee for choosing the number of sampled points, and the pseudo-metricity and quadratic complexity of the Chamfer divergence. To address these issues, we propose a novel metric for learning mesh deformation. The metric is defined by the sliced Wasserstein distance on meshes represented as probability measures, which generalizes the set-based approach. By working in the space of probability measures, we gain flexibility in encoding meshes using diverse forms of probability measures, such as continuous, empirical, and discrete measures via the \textit{varifold} representation. Once meshes are encoded as probability measures, we can compare them using the sliced Wasserstein distance, an effective optimal transport distance with linear computational complexity that provides a fast statistical rate for approximating mesh surfaces. Furthermore, we employ a neural ordinary differential equation (ODE) to deform the input surface into the target shape by modeling the trajectories of the points on the surface. Our experiments on cortical surface reconstruction demonstrate that our approach surpasses other competing methods across multiple datasets and metrics.
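For intuition, here is a minimal sketch (our illustration) of the sliced Wasserstein distance between two equal-size point clouds sampled from mesh surfaces: project onto random directions, then average the one-dimensional Wasserstein distances, which for sorted equal-weight samples reduce to coordinate-wise differences of order statistics:
\begin{verbatim}
# Sliced Wasserstein distance between sampled surface point clouds.
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, seed=0):
    """X, Y: (n, 3) arrays of points (equal sample sizes assumed)."""
    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # 1D Wasserstein-p between projected equal-weight empirical measures:
    # sort each projection and compare order statistics.
    px = np.sort(X @ theta.T, axis=0)   # (n, n_projections)
    py = np.sort(Y @ theta.T, axis=0)
    return float((np.abs(px - py) ** p).mean() ** (1 / p))
\end{verbatim}
The per-projection cost is dominated by sorting, which is what makes this distance cheap compared to quadratic-cost alternatives such as the Chamfer divergence; the paper's mesh encodings via varifold-style measures generalize this empirical-measure setting.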
Statically analyzing dynamically-typed code is challenging, as even basic tasks such as determining the targets of procedure calls are difficult without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript, which introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Limiting their real-world usefulness even more, they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Comparing our approach against recent neural type inference systems, our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open source static analysis tool, and demonstrate that the analysis benefits from the additional type information. As our model allows for fast inference even on commodity CPUs, making our system available through Joern leads to high accessibility and facilitates security research.
A unique sink orientation (USO) is an orientation of the $n$-dimensional hypercube graph such that every non-empty face contains a unique sink. Schurr showed that given any $n$-dimensional USO and any dimension $i$, the set of edges $E_i$ in that dimension can be decomposed into equivalence classes (so-called phases), such that flipping the orientation of a subset $S$ of $E_i$ yields another USO if and only if $S$ is a union of these phases. In this paper we prove various results on the structure of phases. Using these results, we show that all phases can be computed in $O(3^n)$ time, significantly improving upon the trivial $O(4^n)$ algorithm previously known. Furthermore, we show that given a boolean circuit of size $\mathrm{poly}(n)$ succinctly encoding an $n$-dimensional (acyclic) USO, it is PSPACE-complete to determine whether two given edges are in the same phase. The problem is thus as difficult as determining whether the hypercube orientation encoded by a given circuit is an acyclic USO [G\"artner and Thomas, STACS'15].
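To fix the definitions, the following brute-force checker (illustrative only, not the paper's algorithm) verifies the USO property for an orientation given as an outmap $s$, where bit $i$ of $s(v)$ indicates that the dimension-$i$ edge at vertex $v$ points away from $v$; a vertex is the sink of a face precisely when $s(v)$ shares no bit with the face's free-dimension mask:
\begin{verbatim}
# Brute-force USO checker over all 3^n faces (illustrative).
from itertools import product

def is_uso(s, n):
    # A face fixes some coordinates (0/1) and leaves the rest free (None).
    for face in product((0, 1, None), repeat=n):
        free = sum(1 << i for i, c in enumerate(face) if c is None)
        base = sum(1 << i for i, c in enumerate(face) if c == 1)
        free_dims = [i for i, c in enumerate(face) if c is None]
        sinks = 0
        # Enumerate the face's vertices by filling in the free coordinates.
        for bits in range(1 << len(free_dims)):
            v = base
            for j, i in enumerate(free_dims):
                if bits >> j & 1:
                    v |= 1 << i
            if (s(v) & free) == 0:   # no outgoing edge within the face
                sinks += 1
        if sinks != 1:
            return False
    return True

# The "uniform" orientation pointing every edge toward its 1-side is a USO.
n = 3
print(is_uso(lambda v: ~v & (2**n - 1), n))   # True
\end{verbatim}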
Generalised hypertree width ($ghw$) is a hypergraph parameter that is central to the tractability of many prominent problems with natural hypergraph structure. Computing the $ghw$ of a hypergraph is notoriously hard. The decision version of the problem, checking whether $ghw(H) \leq k$, is paraNP-hard when parameterised by $k$. Furthermore, approximation of $ghw$ is at least as hard as approximation of Set-Cover, which is known not to admit any fpt approximation algorithms. Research on the computation of $ghw$ so far has focused on identifying structural restrictions to hypergraphs -- such as bounds on the size of edge intersections -- that permit XP algorithms for $ghw$. Yet, even under these restrictions, the problem has so far evaded any kind of fpt algorithm. In this paper we make the first step towards fpt algorithms for $ghw$ by showing that the parameter can be approximated in fpt time for hypergraphs of bounded edge intersection size. In concrete terms, we show that there exists an fpt algorithm, parameterised by $k$ and $d$, that for an input hypergraph $H$ with maximal cardinality of edge intersections $d$ and integer $k$ either outputs a tree decomposition witnessing $ghw(H) \leq 4k(k+d+1)(2k-1)$, or rejects, in which case it is guaranteed that $ghw(H) > k$. Thus, in the special case of hypergraphs of bounded edge intersection, we obtain an fpt $O(k^3)$-approximation algorithm for $ghw$.
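For concreteness, the width bound expands as (our arithmetic, treating $d$ as a constant)
\[
4k(k+d+1)(2k-1) \;=\; 8k^3 + 4(2d+1)k^2 - 4(d+1)k \;=\; O(k^3),
\]
which is the source of the $O(k^3)$ approximation guarantee stated in the final sentence.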
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and are being integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMAs), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI), and recommend an MLI for various types of agents to aid their safe deployment in real-world settings.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
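Adversarial domain classifiers of this kind are commonly implemented with a gradient reversal layer; the following is a minimal sketch of the image-level component under that standard construction (our illustration, not the authors' released code):
\begin{verbatim}
# Gradient reversal + image-level domain classifier (illustrative sketch).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # flip the gradient sign

class ImageLevelDomainClassifier(nn.Module):
    """Predicts source (0) vs. target (1) from backbone feature maps."""
    def __init__(self, in_channels, lam=0.1):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 1), nn.ReLU(),
            nn.Conv2d(256, 1, 1),             # per-location domain logit
        )

    def forward(self, feats):                 # feats: (B, C, H, W)
        return self.net(GradReverse.apply(feats, self.lam))

# Training-step sketch: a BCE domain loss on features from both domains.
clf = ImageLevelDomainClassifier(in_channels=512)
feats = torch.randn(2, 512, 16, 16)           # stand-in backbone features
domain = torch.tensor([0., 1.]).view(2, 1, 1, 1).expand(2, 1, 16, 16)
loss = nn.functional.binary_cross_entropy_with_logits(clf(feats), domain)
loss.backward()
\end{verbatim}
Minimizing this loss trains the classifier to tell the domains apart, while the reversed gradient pushes the backbone toward domain-invariant features, which is how the divergence between the two feature distributions is reduced.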