Previous soft tissue manipulation studies assumed that the grasping point was known and the target deformation can be achieved. During the operation, the constraints are supposed to be constant, and there is no obstacles around the soft tissue. To go beyond these assumptions, a deep reinforcement learning framework with prior knowledge is proposed for soft tissue manipulation under unknown constraints, such as the force applied by fascia. The prior knowledge is represented through an intuitive manipulation strategy. As an action of the agent, a regulator factor is used to coordinate the intuitive approach and the deliberate network. A reward function is designed to balance the exploration and exploitation for large deformation. Successful simulation results verify that the proposed framework can manipulate the soft tissue while avoiding obstacles and adding new position constraints. Compared with the soft actor-critic (SAC) algorithm, the proposed framework can accelerate the training procedure and improve the generalization.
This study delves into the domain of dynamical systems, specifically the forecasting of dynamical time series defined through an evolution function. Traditional approaches in this area predict the future behavior of dynamical systems by inferring the evolution function. However, these methods may confront obstacles due to the presence of missing variables, which are usually attributed to challenges in measurement and a partial understanding of the system of interest. To overcome this obstacle, we introduce the autoregressive with slack time series (ARS) model, that simultaneously estimates the evolution function and imputes missing variables as a slack time series. Assuming time-invariance and linearity in the (underlying) entire dynamical time series, our experiments demonstrate the ARS model's capability to forecast future time series. From a theoretical perspective, we prove that a 2-dimensional time-invariant and linear system can be reconstructed by utilizing observations from a single, partially observed dimension of the system.
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and indulging in "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions. While the LM is a promising technology, the robustness of the current form in dealing with complex environments is fragile and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradims.
Face recognition (FR) has reached a high technical maturity. However, its use needs to be carefully assessed from an ethical perspective, especially in sensitive scenarios. This is precisely the focus of this paper: the use of FR for the identification of specific subjects in moderately to densely crowded spaces (e.g. public spaces, sports stadiums, train stations) and law enforcement scenarios. In particular, there is a need to consider the trade-off between the need to protect privacy and fundamental rights of citizens as well as their safety. Recent Artificial Intelligence (AI) policies, notably the European AI Act, propose that such FR interventions should be proportionate and deployed only when strictly necessary. Nevertheless, concrete guidelines on how to address the concept of proportional FR intervention are lacking to date. This paper proposes a framework to contribute to assessing whether an FR intervention is proportionate or not for a given context of use in the above mentioned scenarios. It also identifies the main quantitative and qualitative variables relevant to the FR intervention decision (e.g. number of people in the scene, level of harm that the person(s) in search could perpetrate, consequences to individual rights and freedoms) and propose a 2D graphical model making it possible to balance these variables in terms of ethical cost vs security gain. Finally, different FR scenarios inspired by real-world deployments validate the proposed model. The framework is conceived as a simple support tool for decision makers when confronted with the deployment of an FR system.
We consider the problem of estimating fold-changes in the expected value of a multivariate outcome that is observed subject to unknown sample-specific and category-specific perturbations. We are motivated by high-throughput sequencing studies of the abundance of microbial taxa, in which microbes are systematically over- and under-detected relative to their true abundances. Our log-linear model admits a partially identifiable estimand, and we establish full identifiability by imposing interpretable parameter constraints. To reduce bias and guarantee the existence of parameter estimates in the presence of sparse observations, we apply an asymptotically negligible and constraint-invariant penalty to our estimating function. We develop a fast coordinate descent algorithm for estimation, and an augmented Lagrangian algorithm for estimation under null hypotheses. We construct a model-robust score test, and demonstrate valid inference even for small sample sizes and violated distributional assumptions. The flexibility of the approach and comparisons to related methods are illustrated via a meta-analysis of microbial associations with colorectal cancer.
Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding more heavy computation power and failing to be deployed on edge devices. Besides, the performance boost is highly dependent on redundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called `Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which are being actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potentials and open challenges of existing methods and prospect the future directions of KD and S-T learning.
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method includes two-folds, first we extend the RotatE from 2D complex domain to high dimension space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on data set (FB15k-237) with many high in-degree connection nodes.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact. Adopting a straightforward decomposition of the problem into entity detection, entity linking, relation prediction, and evidence combination, we explore simple yet strong baselines. On the popular SimpleQuestions dataset, we find that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach the state of the art, and techniques that do not use neural networks also perform reasonably well. These results show that gains from sophisticated deep learning techniques proposed in the literature are quite modest and that some previous models exhibit unnecessary complexity.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
Recently, deep learning has achieved very promising results in visual object tracking. Deep neural networks in existing tracking methods require a lot of training data to learn a large number of parameters. However, training data is not sufficient for visual object tracking as annotations of a target object are only available in the first frame of a test sequence. In this paper, we propose to learn hierarchical features for visual object tracking by using tree structure based Recursive Neural Networks (RNN), which have fewer parameters than other deep neural networks, e.g. Convolutional Neural Networks (CNN). First, we learn RNN parameters to discriminate between the target object and background in the first frame of a test sequence. Tree structure over local patches of an exemplar region is randomly generated by using a bottom-up greedy search strategy. Given the learned RNN parameters, we create two dictionaries regarding target regions and corresponding local patches based on the learned hierarchical features from both top and leaf nodes of multiple random trees. In each of the subsequent frames, we conduct sparse dictionary coding on all candidates to select the best candidate as the new target location. In addition, we online update two dictionaries to handle appearance changes of target objects. Experimental results demonstrate that our feature learning algorithm can significantly improve tracking performance on benchmark datasets.