Neural-symbolic approaches to machine learning incorporate the advantages from both connectionist and symbolic methods. Typically, these models employ a first module based on a neural architecture to extract features from complex data. Then, these features are processed as symbols by a symbolic engine that provides reasoning, concept structures, composability, better generalization and out-of-distribution learning among other possibilities. However, neural approaches to the grounding of symbols in sensory data, albeit powerful, still require heavy training and tedious labeling for the most part. This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex spatial sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. Following his suggestion, the model extracts atomic features from raw data by computing elemental sequential comparisons in a stream of multivariate numerical values. Higher-level constructs are built from these features by subjecting them to further comparisons in a recursive process. At any stage in the recursion, a concept structure may be obtained from these constructs and features by means of Formal Concept Analysis. Results show that the model is able to produce fairly rich yet human-readable conceptual representations without training. Additionally, the concept structures obtained through the model (i) present high composability, which potentially enables the generation of 'unseen' concepts, (ii) allow formal reasoning, and (iii) have inherent abilities for generalization and out-of-distribution learning. Consequently, this method may offer an interesting angle to current neural-symbolic research. Future work is required to develop a training methodology so that the model can be tested against a larger dataset.
Computational tools for rigorously verifying the performance of large-scale machine learning (ML) models have progressed significantly in recent years. The most successful solvers employ highly specialized, GPU-accelerated branch and bound routines. Such tools are crucial for the successful deployment of machine learning applications in safety-critical systems, such as power systems. Despite their successes, however, barriers prevent out-of-the-box application of these routines to power system problems. This paper addresses this issue in two key ways. First, for the first time to our knowledge, we enable the simultaneous verification of multiple verification problems (e.g., checking for the violation of all line flow constraints simultaneously and not by solving individual verification problems). For that, we introduce an exact transformation that converts the "worst-case" violation across a set of potential violations to a series of ReLU-based layers that augment the original neural network. This allows verifiers to interpret them directly. Second, power system ML models often must be verified to satisfy power flow constraints. We propose a dualization procedure which encodes linear equality and inequality constraints directly into the verification problem; and in a manner which is mathematically consistent with the specialized verification tools. To demonstrate these innovations, we verify problems associated with data-driven security constrained DC-OPF solvers. We build and test our first set of innovations using the $\alpha,\beta$-CROWN solver, and we benchmark against Gurobi 10.0. Our contributions achieve a speedup that can exceed 100x and allow higher degrees of verification flexibility.
This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
We study stochastic Cubic Newton methods for solving general possibly non-convex minimization problems. We propose a new framework, which we call the helper framework, that provides a unified view of the stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees. It can also be applied to learning with auxiliary information. Our helper framework offers the algorithm designer high flexibility for constructing and analyzing the stochastic Cubic Newton methods, allowing arbitrary size batches, and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both the variance reduction and the lazy Hessian updates. We recover the best-known complexities for the stochastic and variance-reduced Cubic Newton, under weak assumptions on the noise. A direct consequence of our theory is the new lazy stochastic second-order method, which significantly improves the arithmetic complexity for large dimension problems. We also establish complexity bounds for the classes of gradient-dominated objectives, that include convex and strongly convex problems. For Auxiliary Learning, we show that using a helper (auxiliary function) can outperform training alone if a given similarity measure is small.
Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is $27\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\times$ and $3.2\times$ than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.
We consider the problem of learning observation models for robot state estimation with incremental non-differentiable optimizers in the loop. Convergence to the correct belief over the robot state is heavily dependent on a proper tuning of observation models which serve as input to the optimizer. We propose a gradient-based learning method which converges much quicker to model estimates that lead to solutions of much better quality compared to an existing state-of-the-art method as measured by the tracking accuracy over unseen robot test trajectories.
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges.
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.