Human activity discovery aims to distinguish the activities performed by humans without any prior information about what defines each activity. Most methods in human activity recognition are supervised and require labeled inputs to train the system. In practice, labeling data is difficult because of its huge volume and the variety of activities humans perform. In this paper, a novel unsupervised approach is proposed for human activity discovery in 3D skeleton sequences. First, important frames are selected based on kinetic energy. Next, joint displacements and a set of statistical, angle, and orientation features are extracted to represent the activities. Since not all extracted features carry useful information, their dimensionality is reduced using PCA. Most proposed human activity discovery methods are not fully unsupervised; they rely on pre-segmented videos before categorizing activities. To deal with this, we use a fragmented sliding time window method to segment the activity time series with some overlap. Activities are then discovered with a novel hybrid particle swarm optimization algorithm with Gaussian mutation, which avoids getting stuck in local optima. Finally, k-means is applied to the resulting centroids to compensate for the slow convergence rate of PSO. Experiments on three datasets show that the proposed method outperforms state-of-the-art methods in discovering activities on all evaluation metrics, improving accuracy by at least 4% on average. The code is available here: //github.com/parhamhadikhani/Human-Activity-Discovery-HPGMK
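A minimal sketch of two of the steps described above: kinetic-energy-based key-frame selection and overlapping sliding-window segmentation. The energy definition, keep ratio, window length, and overlap are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def select_key_frames(skeleton_seq, keep_ratio=0.5):
    """Rank frames of a (T, J, 3) joint-position sequence by kinetic energy
    (sum of squared joint velocities) and keep the most energetic ones."""
    velocities = np.diff(skeleton_seq, axis=0)           # (T-1, J, 3)
    energy = 0.5 * (velocities ** 2).sum(axis=(1, 2))    # per-frame energy proxy
    n_keep = max(1, int(keep_ratio * len(energy)))
    idx = np.sort(np.argsort(energy)[-n_keep:])          # keep temporal order
    return skeleton_seq[1:][idx]

def sliding_windows(features, window=30, overlap=0.5):
    """Segment an unlabeled feature time series into overlapping windows,
    standing in for the fragmented sliding time window step."""
    step = max(1, int(window * (1 - overlap)))
    return [features[s:s + window]
            for s in range(0, len(features) - window + 1, step)]
```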
Context. Technical Debt (TD) is a metaphor for technical problems that are not visible to users and customers but hinder developers in their work, making future changes more difficult. TD is often incurred due to tight project deadlines and can make future changes more costly or impossible. Project management usually focuses on customer benefits and pays less attention to the internal quality of IT systems. TD prevention should be preferred over TD repayment because subsequent refactoring and re-engineering are expensive. Objective. This paper evaluates a framework focusing on both TD prevention and TD repayment in the context of agile-managed projects. The framework was developed and applied in an IT unit of a publishing house. The unique contribution of this framework is the integration of TD management into project management. Method. The evaluation was performed as a comparative case study based on ticket statistics and two structured surveys. The surveys were conducted in the observed IT unit using the framework and in a comparison unit not using it. The first survey targeted team members, the second targeted IT managers. Results. The evaluation shows that in this IT unit, the TAP framework raised awareness of TD incurrence. Decisions to incur TD are made intentionally, and TD is repaid in a more timely manner. Unintentional TD incurred through unconscious decisions is prevented. Furthermore, better communication and better planning of the project pipeline can be observed. Conclusions. We provide insight into practitioners' ways to identify, monitor, prevent, and repay TD. The presented framework includes a feasible method for TD prevention despite tight timelines by making TD repayment part of project management.
Per-instance algorithm selection seeks to recommend, for a given problem instance and a given performance criterion, one or several suitable algorithms that are expected to perform well for the particular setting. The selection is classically done offline, using openly available information about the problem instance or features that are extracted from the instance during a dedicated feature extraction step. This ignores valuable information that the algorithms accumulate during the optimization process. In this work, we propose an alternative, online algorithm selection scheme which we coin per-run algorithm selection. In our approach, we start the optimization with a default algorithm, and, after a certain number of iterations, extract instance features from the observed trajectory of this initial optimizer to determine whether to switch to another optimizer. We test this approach using the CMA-ES as the default solver, and a portfolio of six different optimizers as potential algorithms to switch to. In contrast to other recent work on online per-run algorithm selection, we warm-start the second optimizer using information accumulated during the first optimization phase. We show that our approach outperforms static per-instance algorithm selection. We also compare two different feature extraction principles, based on exploratory landscape analysis and time series analysis of the internal state variables of the CMA-ES, respectively. We show that a combination of both feature sets provides the most accurate recommendations for our test cases, taken from the BBOB function suite from the COCO platform and the YABBOB suite from the Nevergrad platform.
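The per-run selection loop can be summarized as below. This is a hypothetical sketch with made-up interfaces (`run`, `warm_start`, `predict`, and the feature extractor are placeholders), intended only to illustrate the switch-and-warm-start idea, not the authors' implementation.

```python
def per_run_selection(problem, budget, switch_after,
                      default_opt, portfolio, selector, extract_features):
    """Run a default solver (e.g. CMA-ES), extract trajectory features
    (e.g. ELA and internal-state time-series statistics), let a trained
    selector pick the solver to continue with, and warm-start it."""
    history = default_opt.run(problem, n_evals=switch_after)          # phase 1
    features = extract_features(history)                              # per-run features
    choice = selector.predict([features])[0]                          # recommended solver
    second_opt = portfolio[choice]
    second_opt.warm_start(history)                                    # reuse phase-1 information
    return second_opt.run(problem, n_evals=budget - switch_after)     # phase 2
```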
The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and vanish not only on the training data but also on out-of-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.
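A small illustration of how a set of (approximately vanishing) generators induces a feature map for downstream classification. The generators themselves are assumed to be given as callables; constructing them with Pairwise Frank-Wolfe, as CGAVI does, is not shown.

```python
import numpy as np

def vanishing_feature_map(X, generators):
    """Map each sample to the absolute evaluations of the generator
    polynomials; samples resembling the data the generators were fitted on
    map close to the zero vector. X: (n, d) array, generators: callables."""
    return np.abs(np.column_stack([g(X) for g in generators]))

# Toy example with a hand-made 'generator' that vanishes on the unit circle:
gens = [lambda X: X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0]
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
print(vanishing_feature_map(X, gens))   # ~0 for on-circle points, large otherwise
```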
The most common sensing modalities found in a robot perception system are vision and touch, which together can provide global and highly localized data for manipulation. However, these sensing modalities often fail to adequately capture the behavior of target objects during the critical moments as they transition out of static, controlled contact with an end-effector to dynamic and uncontrolled motion. In this work, we present a novel multimodal visuotactile sensor that provides simultaneous visuotactile and proximity depth data. The sensor integrates an RGB camera and air pressure sensor to sense touch with an infrared time-of-flight (ToF) camera to sense proximity by leveraging a selectively transmissive soft membrane to enable the dual sensing modalities. We present the mechanical design, fabrication techniques, algorithm implementations, and evaluation of the sensor's tactile and proximity modalities. The sensor is demonstrated in three open-loop robotic tasks: approaching and contacting an object, catching, and throwing. The fusion of tactile and proximity data could be used to capture key information about a target object's transition behavior for sensor-based control in dynamic manipulation.
Gaussian process regression is increasingly applied for learning unknown dynamical systems. In particular, the implicit quantification of the uncertainty of the learned model makes it a promising approach for safety-critical applications. When using Gaussian process regression to learn unknown systems, a commonly considered approach consists of learning the residual dynamics after applying some generic discretization technique, which might however disregard properties of the underlying physical system. Variational integrators are a less common yet promising approach to discretization, as they retain physical properties of the underlying system, such as energy conservation and satisfaction of explicit kinematic constraints. In this work, we present a novel structure-preserving learning-based modelling approach that combines a variational integrator for the nominal dynamics of a mechanical system with Gaussian process regression of the residual dynamics. We extend our approach to systems with known kinematic constraints and provide formal bounds on the prediction uncertainty. The simulation-based evaluation of the proposed method shows desirable energy conservation properties in accordance with general theoretical results and demonstrates exact constraint satisfaction for constrained dynamical systems.
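A minimal sketch of the nominal-plus-residual idea for a one-dimensional toy system. The nominal step is a placeholder Störmer-Verlet-like update for a known harmonic oscillator, and scikit-learn's GP regressor stands in for the learned residual; none of this reproduces the authors' integrator, kernels, or uncertainty bounds.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def nominal_step(q_prev, q, dt, omega=1.0):
    """Placeholder structure-preserving step for a known harmonic oscillator;
    stands in for the variational integrator of the nominal dynamics."""
    return 2 * q - q_prev - (dt ** 2) * (omega ** 2) * q

class GPResidualDynamics:
    """Nominal variational-integrator step plus a GP trained on the residual."""
    def __init__(self):
        self.gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True)

    def fit(self, q_seq, dt):
        X = np.column_stack([q_seq[:-2], q_seq[1:-1]])                 # (q_{k-1}, q_k)
        residual = q_seq[2:] - nominal_step(q_seq[:-2], q_seq[1:-1], dt)
        self.gp.fit(X, residual)

    def predict(self, q_prev, q, dt):
        mean, std = self.gp.predict(np.array([[q_prev, q]]), return_std=True)
        return nominal_step(q_prev, q, dt) + mean[0], std[0]           # mean + uncertainty
```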
A nanodrone swarm is formed by multiple lightweight, low-cost nanodrones that perform tasks in very challenging environments. It is therefore essential to estimate the relative position of the nanodrones in the swarm for accurate and safe platooning in inclement indoor environments. However, vision and infrared sensors are constrained by line-of-sight perception, and instrumenting extra motion sensors on a drone's body is constrained by the nanodrone's form factor and energy efficiency. This paper presents the design, implementation, and evaluation of RFDrone, a system that senses the relative position of each nanodrone in the swarm using wireless signals, which naturally identify each individual nanodrone. To do so, each lightweight nanodrone is fitted with an RF sticker (an RFID tag) that is localized by an external RFID reader in the inclement indoor environment. Instead of accurately localizing each RFID-tagged nanodrone, we propose to estimate the relative positions of all RFID-tagged nanodrones in the swarm based on spatial-temporal phase profiling. We implement an end-to-end physical prototype of RFDrone. Our experimental results show that RFDrone can accurately estimate the relative position of nanodrones in the swarm, with an average relative localization accuracy of around 0.95 across the x, y, and z axes, and an average accuracy of around 0.93 for estimating the swarm's geometry.
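As a rough illustration of how backscatter phase relates to relative range: the RFID phase advances by 4πd/λ over the reader-tag round trip, so phase differences between tags translate into range differences (up to the usual λ/2 ambiguity). The frequency value and the use of a simple unwrap are assumptions; this is not the spatial-temporal phase profiling algorithm of RFDrone.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def phase_to_range_difference(phase_a, phase_b, freq_hz=920.625e6):
    """Relative range difference between two tags from their unwrapped
    backscatter phase streams (arrays in radians), ignoring the lambda/2
    ambiguity and hardware phase offsets."""
    wavelength = C / freq_hz
    dphi = np.unwrap(phase_b) - np.unwrap(phase_a)
    return dphi * wavelength / (4 * np.pi)   # metres, per time sample
```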
With the rapid growth of surveillance cameras in many public places such as malls, streets, schools, and prisons to monitor human activities, there is a strong demand for systems that detect violent events automatically. Automatic analysis of video to detect violence is significant for law enforcement. Moreover, it helps to avoid social, economic, and environmental damage. Today, most systems require manual human supervision to detect violent scenes in video, which is inefficient and inaccurate. In this work, we are interested in physical violence involving two or more persons. We propose a novel method to detect violence using a fusion of two significantly different convolutional neural networks (CNNs), AlexNet and SqueezeNet. Each network is followed by a separate Convolutional Long Short-Term Memory (ConvLSTM) to extract robust and richer features from the video in its final hidden state. These two states are then fused and fed to a max-pooling layer. Finally, the features are classified using a series of fully connected layers and a softmax classifier. The performance of the proposed method is evaluated in terms of detection accuracy on three standard benchmark datasets: the Hockey Fight dataset, the Movie dataset, and the Violent Flow dataset. The results show accuracies of 97%, 100%, and 96%, respectively. A comparison with state-of-the-art techniques reveals the promising capability of the proposed method in recognizing violent videos.
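A sketch of the two-stream architecture in PyTorch, with two simplifications: plain LSTMs over globally pooled frame features replace the ConvLSTMs, and the max-pooling after fusion is omitted. Layer sizes and the binary output are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet, squeezenet1_0

class TwoStreamFusion(nn.Module):
    """AlexNet and SqueezeNet frame features, a recurrent layer per stream,
    fusion of the final hidden states, then a softmax classifier."""
    def __init__(self, hidden=256, n_classes=2):
        super().__init__()
        self.alex = alexnet(weights=None).features
        self.squeeze = squeezenet1_0(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm_a = nn.LSTM(256, hidden, batch_first=True)
        self.lstm_s = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, clip):                        # clip: (B, T, 3, 224, 224)
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)                 # (B*T, 3, 224, 224)
        fa = self.pool(self.alex(frames)).flatten(1).view(b, t, -1)
        fs = self.pool(self.squeeze(frames)).flatten(1).view(b, t, -1)
        _, (ha, _) = self.lstm_a(fa)                # final hidden state, stream A
        _, (hs, _) = self.lstm_s(fs)                # final hidden state, stream S
        fused = torch.cat([ha[-1], hs[-1]], dim=1)  # fusion of the two streams
        return torch.softmax(self.head(fused), dim=1)
```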
We present a method for the control of robot swarms which allows the shaping and translation of patterns of simple robots ("smart particles"), using two types of devices. These two types represent a hierarchy: a larger group of simple, oblivious robots (which we call the workers) that is governed by simple local attraction forces, and a smaller group (the guides) with sufficient mission knowledge to create and maintain a desired pattern by operating on the local forces of the former. This framework exploits the knowledge of the guides, which coordinate to shape the workers like smart particles by changing their interaction parameters. We study the approach with a large-scale simulation experiment in a physics-based simulator with up to 1000 robots forming three different patterns. Our experiments reveal that the approach scales well with increasing robot numbers and shows little pattern distortion for a set of moving target shapes. We evaluate the approach on a physical swarm of robots that use visual-inertial odometry to compute their relative positions and obtain results comparable with simulation. This work lays the foundation for designing and coordinating configurable smart particles, with applications in smart materials and nanomedicine.
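A minimal sketch of the kind of local attraction rule the workers could follow; the spring-like force law, gains, and radius are illustrative assumptions, and the guides (which would modulate these parameters) are not modeled.

```python
import numpy as np

def worker_step(positions, attract_gain=0.5, rest_dist=0.3,
                neighbor_radius=1.0, dt=0.05):
    """One update of simple, oblivious workers: each robot moves along the
    sum of spring-like attraction forces towards neighbours within a radius.
    Guides would shape the swarm by locally changing gain and rest distance."""
    forces = np.zeros_like(positions)
    for i, p in enumerate(positions):
        diff = positions - p
        dist = np.linalg.norm(diff, axis=1)
        mask = (dist > 1e-9) & (dist < neighbor_radius)
        if mask.any():
            unit = diff[mask] / dist[mask][:, None]
            forces[i] = (attract_gain * (dist[mask] - rest_dist)[:, None] * unit).sum(axis=0)
    return positions + dt * forces
```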
Deep learning depends on large amounts of labeled training data. Manual labeling is expensive and represents a bottleneck, especially for tasks such as segmentation, where labels must be assigned down to the level of individual points. That challenge is even more daunting for 3D data: 3D point clouds contain millions of points per scene, and their accurate annotation is markedly more time-consuming. The situation is further aggravated by the added complexity of user interfaces for 3D point clouds, which slows down annotation even more. For the case of 2D image segmentation, interactive techniques have become common, where user feedback in the form of a few clicks guides a segmentation algorithm -- nowadays usually a neural network -- to achieve an accurate labeling with minimal effort. Surprisingly, interactive segmentation of 3D scenes has not been explored much. Previous work has attempted to obtain accurate 3D segmentation masks using human feedback from the 2D domain, which is only possible if correctly aligned images are available together with the 3D point cloud, and it involves switching between the 2D and 3D domains. Here, we present an interactive 3D object segmentation method in which the user interacts directly with the 3D point cloud. Importantly, our model does not require training data from the target domain: when trained on ScanNet, it performs well on several other datasets with different data characteristics as well as different object classes. Moreover, our method is orthogonal to supervised (instance) segmentation methods and can be combined with them to refine automatic segmentations with minimal human effort.
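One common way to feed user clicks directly into a point-cloud segmentation network is to augment every point with its distance to the nearest positive and negative click. The sketch below shows that encoding, assuming Euclidean distances and a fixed cap; it is not necessarily the encoding used by the authors.

```python
import numpy as np

def encode_clicks(points, pos_clicks, neg_clicks, cap=2.0):
    """points: (N, 3); pos_clicks/neg_clicks: lists of 3D click coordinates.
    Returns an (N, 5) array of xyz plus capped distances to the nearest
    positive and negative click, to be consumed by a segmentation network."""
    def min_dist(clicks):
        if len(clicks) == 0:
            return np.full(len(points), cap)       # no clicks of this kind yet
        d = np.linalg.norm(points[:, None, :] - np.asarray(clicks)[None, :, :], axis=2)
        return np.minimum(d.min(axis=1), cap)
    return np.column_stack([points, min_dist(pos_clicks), min_dist(neg_clicks)])
```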
The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this paper, we present a comprehensive survey of the existing research in the area of MMS.