We present a deep-learning based tracking objects of interest in walking droplet and granular intruder experiments. In a typical walking droplet experiment, a liquid droplet, known as \textit{walker}, propels itself laterally on the free surface of a vibrating bath of the same liquid. This motion is the result of the interaction between the droplets and the surface waves generated by the droplet itself after each successive bounce. A walker can exhibit a highly irregular trajectory over the course of its motion, including rapid acceleration and complex interactions with the other walkers present in the same bath. In analogy with the hydrodynamic experiments, the granular matter experiments consist of a vibrating bath of very small solid particles and a larger solid \textit{intruder}. Like the fluid droplets, the intruder interacts with and travels the domain due to the waves of the bath but tends to move much slower and much less smoothly than the droplets. When multiple intruders are introduced, they also exhibit complex interactions with each other. We leverage the state-of-art object detection model YOLO and the Hungarian Algorithm to accurately extract the trajectory of a walker or intruder in real-time. Our proposed methodology is capable of tracking individual walker(s) or intruder(s) in digital images acquired from a broad spectrum of experimental settings and does not suffer from any identity-switch issues. Thus, the deep learning approach developed in this work could be used to automatize the efficient, fast and accurate extraction of observables of interests in walking droplet and granular flow experiments. Such extraction capabilities are critically enabling for downstream tasks such as building data-driven dynamical models for the coarse-grained dynamics and interactions of the objects of interest.
Equipping a deep model the abaility of few-shot learning, i.e., learning quickly from only few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address the challenge by learning how to learn novel tasks. Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverage it to optimize a task-specific model by using only few labeled data. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress of diffusion models, we find that the inner-loop gradient descent process can be actually viewed as a reverse process (i.e., denoising) of diffusion where the target of denoising is model weights but the origin data. Based on this fact, in this paper, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning, called MetaDiff, that effectively models the optimization process of model weights from Gaussion noises to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, our MetaDiff do not need to differentiate through the inner-loop path such that the memory burdens and the risk of vanishing gradients can be effectvely alleviated. Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.
Recently, there has been a growing interest in learning and explaining causal effects within Neural Network (NN) models. By virtue of NN architectures, previous approaches consider only direct and total causal effects assuming independence among input variables. We view an NN as a structural causal model (SCM) and extend our focus to include indirect causal effects by introducing feedforward connections among input neurons. We propose an ante-hoc method that captures and maintains direct, indirect, and total causal effects during NN model training. We also propose an algorithm for quantifying learned causal effects in an NN model and efficient approximation strategies for quantifying causal effects in high-dimensional data. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the causal effects learned by our ante-hoc method better approximate the ground truth effects compared to existing methods.
Soccer Simulation 2D (SS2D) is a simulation of a real soccer game in two dimensions. In soccer, passing behavior is an essential action for keeping the ball in possession of our team and creating goal opportunities. Similarly, for SS2D, predicting the passing behaviors of both opponents and our teammates helps manage resources and score more goals. Therefore, in this research, we have tried to address the modeling of passing behavior of soccer 2D players using Deep Neural Networks (DNN) and Random Forest (RF). We propose an embedded data extraction module that can record the decision-making of agents in an online format. Afterward, we apply four data sorting techniques for training data preparation. After, we evaluate the trained models' performance playing against 6 top teams of RoboCup 2019 that have distinctive playing strategies. Finally, we examine the importance of different feature groups on the prediction of a passing strategy. All results in each step of this work prove our suggested methodology's effectiveness and improve the performance of the pass prediction in Soccer Simulation 2D games ranging from 5\% (e.g., playing against the same team) to 10\% (e.g., playing against Robocup top teams).
Due to the expanding scope of machine learning (ML) to the fields of sensor networking, cooperative robotics and many other multi-agent systems, distributed deployment of inference algorithms has received a lot of attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. There are two competing aspects in such algorithms, namely, intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain circumstances need frugal use of communication channels as they are either unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via unadjusted Langevin algorithm (ULA) MCMC is utilized. The communication is assumed to be over a connected graph (e.g., as in decentralized learning), however, the results can be extended to coordinated communication where there is a central server (e.g., federated learning). We theoretically quantify the convergence rates in the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real world data sets to train ML models to perform classification tasks. We observe faster initial convergence and improved performance accuracy, especially in the low data range. We achieve on average 78% and over 90% classification accuracy respectively on the Gamma Telescope and mHealth data sets from the UCI ML repository.
We propose a novel set of Poisson Cluster Process (PCP) models to detect Ultra-Diffuse Galaxies (UDGs), a class of extremely faint, enigmatic galaxies of substantial interest in modern astrophysics. We model the unobserved UDG locations as parent points in a PCP, and infer their positions based on the observed spatial point patterns of their old star cluster systems. Many UDGs have somewhere from a few to hundreds of these old star clusters, which we treat as offspring points in our models. We also present a new framework to construct a marked PCP model using the marks of star clusters. The marked PCP model may enhance the detection of UDGs and offers broad applicability to problems in other disciplines. To assess the overall model performance, we design an innovative assessment tool for spatial prediction problems where only point-referenced ground truth is available, overcoming the limitation of standard ROC analyses where spatial Boolean reference maps are required. We construct a bespoke blocked Gibbs adaptive spatial birth-death-move MCMC algorithm to infer the locations of UDGs using real data from a \textit{Hubble Space Telescope} imaging survey. Based on our performance assessment tool, our novel models significantly outperform existing approaches using the Log-Gaussian Cox Process. We also obtained preliminary evidence that the marked PCP model improves UDG detection performance compared to the model without marks. Furthermore, we find evidence of a potential new ``dark galaxy'' that was not detected by previous methods.
Federated learning (FL) goes beyond traditional, centralized machine learning by distributing model training among a large collection of edge clients. These clients cooperatively train a global, e.g., cloud-hosted, model without disclosing their local, private training data. The global model is then shared among all the participants which use it for local predictions. In this paper, we put forward a novel attacker model aiming at turning FL systems into covert channels to implement a stealth communication infrastructure. The main intuition is that, during federated training, a malicious sender can poison the global model by submitting purposely crafted examples. Although the effect of the model poisoning is negligible to other participants, and does not alter the overall model performance, it can be observed by a malicious receiver and used to transmit a single bit.
Multimodal-based action recognition methods have achieved high success using pose and RGB modality. However, skeletons sequences lack appearance depiction and RGB images suffer irrelevant noise due to modality limitations. To address this, we introduce human parsing feature map as a novel modality, since it can selectively retain effective semantic features of the body parts, while filtering out most irrelevant noise. We propose a new dual-branch framework called Ensemble Human Parsing and Pose Network (EPP-Net), which is the first to leverage both skeletons and human parsing modalities for action recognition. The first human pose branch feeds robust skeletons in graph convolutional network to model pose features, while the second human parsing branch also leverages depictive parsing feature maps to model parsing festures via convolutional backbones. The two high-level features will be effectively combined through a late fusion strategy for better action recognition. Extensive experiments on NTU RGB+D and NTU RGB+D 120 benchmarks consistently verify the effectiveness of our proposed EPP-Net, which outperforms the existing action recognition methods. Our code is available at: //github.com/liujf69/EPP-Net-Action.
Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our decoder for representation learning and generative modeling of shapes, we demonstrate superior results for tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.
Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance in the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of surrounding context and the feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.