Learning methods in Banach spaces are often formulated as regularization problems which minimize the sum of a data fidelity term in a Banach norm and a regularization term in another Banach norm. Due to the infinite dimensional nature of the space, solving such regularization problems is challenging. We construct a direct sum space based on the Banach spaces for the data fidelity term and the regularization term, and then recast the objective function as the norm of a suitable quotient space of the direct sum space. In this way, we express the original regularized problem as an unregularized problem on the direct sum space, which is in turn reformulated as a dual optimization problem in the dual space of the direct sum space. The dual problem is to find the maximum of a linear function on a convex polytope, which may be solved by linear programming. A solution of the original problem is then obtained by using related extremal properties of norming functionals from a solution of the dual problem. Numerical experiments are included to demonstrate that the proposed duality approach leads to an implementable numerical method for solving the regularization learning problems.
Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the `student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on manipulating student models' predictions based on the goals of teachers via changing the training data efficiently. We formulate this task as a combinatorial optimization problem and solve it by proposing an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in the scenarios where a teacher attempts at correcting erroneous predictions to improve the student's models, or maliciously manipulating the model to misclassify some specific samples to the target class aligned with his personal profits. Experimental results show that our proposed algorithm can have superior performance in effectively and efficiently manipulating the predictions of the model, surpassing conventional baselines.
Deep subspace clustering (DSC) networks based on self-expressive model learn representation matrix, often implemented in terms of fully connected network, in the embedded space. After the learning is finished, representation matrix is used by spectral clustering module to assign labels to clusters. However, such approach ignores complementary information that exist in other layers of the encoder (including the input data themselves). Herein, we apply selected linear subspace clustering algorithm to learn representation matrices from representations learned by all layers of encoder network including the input data. Afterward, we learn a multilayer graph that in a multi-view like manner integrates information from graph Laplacians of all used layers. That improves further performance of selected DSC network. Furthermore, we also provide formulation of our approach to cluster out-of-sample/test data points. We validate proposed approach on four well-known datasets with two DSC networks as baseline models. In almost all the cases, proposed approach achieved statistically significant improvement in three performance metrics. MATLAB code of proposed algorithm is posted on //github.com/lovro-sinda/MLG-DSC.
Embedding methods transform the knowledge graph into a continuous, low-dimensional space, facilitating inference and completion tasks. Existing methods are mainly divided into two types: translational distance models and semantic matching models. A key challenge in translational distance models is their inability to effectively differentiate between 'head' and 'tail' entities in graphs. To address this problem, a novel location-sensitive embedding (LSE) method has been developed. LSE innovatively modifies the head entity using relation-specific mappings, conceptualizing relations as linear transformations rather than mere translations. The theoretical foundations of LSE, including its representational capabilities and its connections to existing models, have been thoroughly examined. A more streamlined variant, LSE-d, which employs a diagonal matrix for transformations to enhance practical efficiency, is also proposed. Experiments conducted on four large-scale KG datasets for link prediction show that LSEd either outperforms or is competitive with state-of-the-art related works.
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and indulging in "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions. While the LM is a promising technology, the robustness of the current form in dealing with complex environments is fragile and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradims.
Heterogeneous Bayesian decentralized data fusion captures the set of problems in which two robots must combine two probability density functions over non-equal, but overlapping sets of random variables. In the context of multi-robot dynamic systems, this enables robots to take a "divide and conquer" approach to reason and share data over complementary tasks instead of over the full joint state space. For example, in a target tracking application, this allows robots to track different subsets of targets and share data on only common targets. This paper presents a framework by which robots can each use a local factor graph to represent relevant partitions of a complex global joint probability distribution, thus allowing them to avoid reasoning over the entirety of a more complex model and saving communication as well as computation costs. From a theoretical point of view, this paper makes contributions by casting the heterogeneous decentralized fusion problem in terms of a factor graph, analyzing the challenges that arise due to dynamic filtering, and then developing a new conservative filtering algorithm that ensures statistical correctness. From a practical point of view, we show how this framework can be used to represent different multi-robot applications and then test it with simulations and hardware experiments to validate and demonstrate its statistical conservativeness, applicability, and robustness to real-world challenges.
In hypothesis testing problems, taking samples sequentially and stopping opportunistically to make the inference greatly enhances the reliability. The design of the stopping and inference policy, however, critically relies on the knowledge of the underlying distribution of each hypothesis. When the knowledge of distributions, say, $P_0$ and $P_1$ in the binary-hypothesis case, is replaced by empirically observed statistics from the respective distributions, the gain of sequentiality is less understood when subject to universality constraints. In this work, the gap is mended by a unified study on sequentiality in the universal binary classification problem. We propose a unified framework where the universality constraints are set on the expected stopping time as well as the type-I error exponent. The type-I error exponent is required to achieve a pre-set distribution-dependent constraint $\lambda(P_0,P_1)$ for all $P_0,P_1$. The framework is employed to investigate a semi-sequential and a fully-sequential setup, so that fair comparison can be made with the fixed-length setup. The optimal type-II error exponents in different setups are characterized when the function $\lambda$ satisfies mild continuity conditions. The benefit of sequentiality is shown by comparing the semi-sequential, the fully-sequential, and the fixed-length cases in representative examples of $\lambda$. Conditions under which sequentiality eradicates the trade-off between error exponents are also derived.
Although continuous advances in theoretical modelling of Molecular Communications (MC) are observed, there is still an insuperable gap between theory and experimental testbeds, especially at the microscale. In this paper, the development of the first testbed incorporating engineered yeast cells is reported. Different from the existing literature, eukaryotic yeast cells are considered for both the sender and the receiver, with {\alpha}-factor molecules facilitating the information transfer. The use of such cells is motivated mainly by the well understood biological mechanism of yeast mating, together with their genetic amenability. In addition, recent advances in yeast biosensing establish yeast as a suitable detector and a neat interface to in-body sensor networks. The system under consideration is presented first, and the mathematical models of the underlying biological processes leading to an end-to-end (E2E) system are given. The experimental setup is then described and used to obtain experimental results which validate the developed mathematical models. Beyond that, the ability of the system to effectively generate output pulses in response to repeated stimuli is demonstrated, reporting one event per two hours. However, fast RNA fluctuations indicate cell responses in less than three minutes, demonstrating the potential for much higher rates in the future.
We use Markov categories to develop generalizations of the theory of Markov chains and hidden Markov models in an abstract setting. This comprises characterizations of hidden Markov models in terms of local and global conditional independences as well as existing algorithms for Bayesian filtering and smoothing applicable in all Markov categories with conditionals. We show that these algorithms specialize to existing ones such as the Kalman filter, forward-backward algorithm, and the Rauch-Tung-Striebel smoother when instantiated in appropriate Markov categories. Under slightly stronger assumptions, we also prove that the sequence of outputs of the Bayes filter is itself a Markov chain with a concrete formula for its transition maps. There are two main features of this categorical framework. The first is its generality, as it can be used in any Markov category with conditionals. In particular, it provides a systematic unified account of hidden Markov models and algorithms for filtering and smoothing in discrete probability, Gaussian probability, measure-theoretic probability, possibilistic nondeterminism and others at the same time. The second feature is the intuitive visual representation of information flow in these algorithms in terms of string diagrams.
Several theorems and conjectures in communication complexity state or speculate that the complexity of a matrix in a given communication model is controlled by a related analytic or algebraic matrix parameter, e.g., rank, sign-rank, discrepancy, etc. The forward direction is typically easy as the structural implications of small complexity often imply a bound on some matrix parameter. The challenge lies in establishing the reverse direction, which requires understanding the structure of Boolean matrices for which a given matrix parameter is small or large. We will discuss several research directions that align with this overarching theme.
Named entity recognition (NER) is the task to identify text spans that mention named entities, and to classify them into predefined categories such as person, location, organization etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding stat-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.