In this work, we introduce PokeRRT, a novel motion planning algorithm that demonstrates poking as an effective non-prehensile manipulation skill to enable fast manipulation of objects and increase the size of a robot's reachable workspace. We showcase poking as a failure recovery tactic used synergistically with pick-and-place for resiliency in cases where pick-and-place initially fails or is unachievable. Our experiments demonstrate the efficiency of the proposed framework in planning object trajectories using poking manipulation in uncluttered and cluttered environments. In addition to quantitatively and qualitatively demonstrating the adaptability of PokeRRT to different scenarios in both simulation and real-world settings, our results show the advantages of poking over pushing and grasping in terms of success rate and task time.
In this work, we propose a novel method to ensure that important entropy inequalities are satisfied semi-discretely when constructing reduced order models (ROMs) on nonlinear reduced manifolds. We are particularly interested in ROMs of systems of nonlinear hyperbolic conservation laws. The so-called entropy stability property endows the semi-discrete ROMs with physically admissible behaviour. The method generalizes earlier results on entropy-stable ROMs constructed on linear spaces. The ROM works by evaluating the projected system at a well-chosen approximation of the state that ensures entropy stability. To preserve the accuracy of the ROM after this approximation, we locally enrich the tangent space of the reduced manifold with important quantities. Using numerical experiments on some well-known equations (the inviscid Burgers equation, the shallow water equations, and the compressible Euler equations), we show the improved structure-preserving properties of our ROM compared to standard approaches, and that our approximations have minimal impact on its accuracy. We additionally generalize the recently proposed polynomial reduced manifolds to rational polynomial manifolds and show that this leads to an increase in accuracy in our experiments.
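For reference, the semi-discrete entropy inequality that "entropy stability" refers to can be sketched as follows. This is the standard statement for a one-dimensional system of conservation laws with a convex entropy and compatible entropy flux, written in our notation (assuming periodic boundary conditions); it is not a formula quoted from the paper.

```latex
% Sketch: conservation law with convex entropy \eta and compatible entropy flux q.
% Entropy stability requires the total entropy of the ROM state u_r to be non-increasing.
\[
  \partial_t u + \partial_x f(u) = 0, \qquad u(x,t) \in \mathbb{R}^m,
  \qquad\text{and}\qquad
  \frac{\mathrm{d}}{\mathrm{d}t} \int_{\Omega} \eta\bigl(u_r(x,t)\bigr)\,\mathrm{d}x \;\le\; 0 .
\]
```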
In this paper, we propose a simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) and energy buffer aided multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) non-orthogonal multiple access (NOMA) system, which consists of a STAR-RIS, an access point (AP), and reflection and transmission users equipped with energy buffers. In the proposed system, the multi-antenna AP can transmit information and energy to several single-antenna reflection and transmission users simultaneously in the NOMA fashion in the downlink, where the power transfer and information transmission states of the users are modeled using Markov chains. The reflection and transmission users harvest and store energy in energy buffers as additional power supplies, which are partially utilized for uplink information transmission. Closed-form expressions for the power outage probability, information outage probability, sum throughput, and joint outage probability of the proposed system are derived over Nakagami-m fading channels and validated via simulations. Results demonstrate that the proposed system outperforms its counterpart with discrete phase shifts, the buffer-less STAR-RIS aided MISO SWIPT-NOMA system, the conventional RIS and energy buffer aided MISO SWIPT-NOMA system, and the STAR-RIS and energy buffer aided MISO SWIPT time-division multiple access (TDMA) system. Furthermore, a particle swarm optimization-based power allocation (PSO-PA) algorithm is designed to maximize the uplink sum throughput under constraints on the uplink joint outage probability and Jain's fairness index (JFI). Simulation results illustrate that the proposed PSO-PA algorithm can effectively improve the sum throughput of the proposed system.
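To illustrate the optimization component only, the following is a minimal particle swarm optimization sketch for constrained power allocation. The objective and constraint functions (`sum_throughput`, `joint_outage`, `jain_fairness`), the power budget, the penalty weights, and the PSO hyperparameters are all placeholders of our own, standing in for the paper's closed-form expressions; this is not the paper's PSO-PA algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
N_USERS, N_PARTICLES, N_ITERS = 4, 30, 200
P_TOTAL = 1.0                           # total uplink power budget (placeholder)

def sum_throughput(p):                  # placeholder for the closed-form sum throughput
    return np.sum(np.log2(1.0 + 10.0 * p))

def joint_outage(p):                    # placeholder joint outage metric
    return np.mean(np.exp(-20.0 * p))

def jain_fairness(p):                   # Jain's fairness index
    return np.sum(p) ** 2 / (len(p) * np.sum(p ** 2) + 1e-12)

def fitness(p):
    # Penalty method for the (made-up) outage cap and fairness floor.
    penalty = 1e3 * max(0.0, joint_outage(p) - 0.05) \
            + 1e3 * max(0.0, 0.7 - jain_fairness(p))
    return sum_throughput(p) - penalty

def project(p):                         # keep allocations feasible: p >= 0, sum(p) <= P_TOTAL
    p = np.clip(p, 1e-6, None)
    return p * min(1.0, P_TOTAL / p.sum())

pos = rng.dirichlet(np.ones(N_USERS), N_PARTICLES) * P_TOTAL
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(N_ITERS):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.array([project(p) for p in pos + vel])
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best allocation:", gbest, "sum throughput:", sum_throughput(gbest))
```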
Image composition is a complex task that requires extensive information about the scene, such as perspective, lighting, shadows, occlusions, and object interactions, to produce an accurate and realistic result. Previous methods have predominantly relied on 2D information for image composition, neglecting the potential of 3D spatial information. In this work, we propose DepGAN, a Generative Adversarial Network that utilizes depth maps and alpha channels to rectify inaccurate occlusions and enhance transparency effects in image composition. Central to our network is a novel loss function called Depth Aware Loss, which quantifies the pixel-wise depth difference to accurately delineate occlusion boundaries when compositing objects at different depth levels. Furthermore, we enhance our network's learning process by utilizing opacity data, enabling it to effectively handle compositions involving transparent and semi-transparent objects. We tested our model against state-of-the-art image composition GANs on benchmark datasets (both real and synthetic). The results reveal that DepGAN significantly outperforms existing methods in terms of object placement semantics, transparency, and occlusion handling, both visually and quantitatively. Our code is available at //amrtsg.github.io/DepGAN/.
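To make the idea of a pixel-wise, depth-aware penalty concrete, here is a minimal PyTorch sketch of one plausible form of such a loss. The exact definition used by DepGAN is not given in the abstract, so the formulation below (penalising composited foreground pixels whose depth lies behind the background) and all tensor conventions are our assumptions.

```python
import torch

def depth_aware_loss(fg_depth, bg_depth, alpha, margin=0.0):
    """Penalise pixels where an opaque foreground is composited over a
    background surface that is actually closer to the camera.

    fg_depth, bg_depth: (B, 1, H, W) depth maps; smaller = closer (assumption).
    alpha:              (B, 1, H, W) foreground opacity in [0, 1].
    """
    # Positive only where the foreground lies behind the background yet is still drawn.
    violation = torch.relu(fg_depth - bg_depth + margin)
    return (alpha * violation).mean()

# Toy usage on random tensors.
B, H, W = 2, 64, 64
fg, bg, alpha = torch.rand(B, 1, H, W), torch.rand(B, 1, H, W), torch.rand(B, 1, H, W)
print(depth_aware_loss(fg, bg, alpha).item())
```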
In this paper, we propose and analyze an efficient preconditioning method for the elliptic problem based on the reconstructed discontinuous approximation method. We reconstruct a high-order piecewise polynomial space in which arbitrary order can be achieved with only one degree of freedom per element. This space can be directly used with the symmetric/nonsymmetric interior penalty discontinuous Galerkin (DG) method. Compared with the standard DG method, this reconstruction yields a more efficient approximation. Moreover, we establish a norm equivalence result between the reconstructed high-order space and the piecewise constant space. This property further allows us to construct an optimal preconditioner from the piecewise constant space. The upper bound on the condition number of the preconditioned symmetric/nonsymmetric system is shown to be independent of the mesh size. Numerical experiments are provided to demonstrate the validity of the theory and the efficiency of the proposed method.
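Schematically, the two key statements take the following form (our notation, a sketch of the typical shape of such results rather than formulas quoted from the paper): with R the reconstruction operator mapping the piecewise constant space U_h^0 into the reconstructed high-order space, A_h the interior penalty DG operator, and B_h the preconditioner built from U_h^0,

```latex
% Norm equivalence and resulting uniform condition number bound;
% the constants c_1, c_2, C are independent of the mesh size h.
\[
  c_1\,\|v\|_{\mathrm{DG}}^2 \;\le\; \|R v\|_{\mathrm{DG}}^2 \;\le\; c_2\,\|v\|_{\mathrm{DG}}^2
  \quad \forall\, v \in U_h^0,
  \qquad\Longrightarrow\qquad
  \kappa\bigl(B_h A_h\bigr) \;\le\; C \ \ \text{uniformly in } h .
\]
```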
We present a novel formulation for motion planning under uncertainties based on variational inference where the optimal motion plan is modeled as a posterior distribution. We propose a Gaussian variational inference-based framework, termed Gaussian Variational Inference Motion Planning (GVI-MP), to approximate this posterior by a Gaussian distribution over the trajectories. We show that the GVI-MP framework is dual to a special class of stochastic control problems and brings robustness into the decision-making in motion planning. We develop two algorithms to numerically solve this variational inference and the equivalent control formulations for motion planning. The first algorithm uses a natural gradient paradigm to iteratively update a Gaussian proposal distribution on the sparse motion planning factor graph. We propose a second algorithm, the Proximal Covariance Steering Motion Planner (PCS-MP), to solve the same inference problem in its stochastic control form with an additional terminal constraint. We leverage a proximal gradient paradigm where, at each iteration, we quadratically approximate nonlinear state costs and solve a linear covariance steering problem in closed form. The efficacy of the proposed algorithms is demonstrated through extensive experiments on various robot models. An implementation is provided in //github.com/hzyu17/VIMP.
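For orientation, the Gaussian variational inference problem underlying GVI-MP can be sketched as follows. This is the standard statement of Gaussian VI against an unnormalized trajectory posterior, written in our notation; the paper's specific cost decomposition over the factor graph is not reproduced here.

```latex
% Trajectory posterior proportional to exp(-V(X)); q restricted to Gaussians.
% Minimizing the KL divergence reduces to an expected cost plus an entropy term.
\[
  q^\star
  = \operatorname*{arg\,min}_{q=\mathcal{N}(\mu,\Sigma)}
    \mathrm{KL}\!\left( q \,\middle\|\, \tfrac{1}{Z}\, e^{-V} \right)
  = \operatorname*{arg\,min}_{\mu,\,\Sigma}\;
    \mathbb{E}_{q}\!\left[ V(X) \right] \;-\; \tfrac{1}{2}\log\det\Sigma \;+\; \mathrm{const},
\]
```

with the minimization carried out iteratively (in GVI-MP, by natural gradient steps on the Gaussian proposal over the sparse motion planning factor graph).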
In this work, we propose a novel method for Bayesian Network (BN) structure elicitation that is based on initializing several LLMs with different experiences, querying them independently to create a structure of the BN, and obtaining the final structure by majority voting. We compare the method with an alternative method on a variety of widely known and lesser-known BNs of different sizes and study the scalability of both methods on them. We also propose an approach to check whether a BN is contaminated in an LLM, which shows that some widely known BNs are unsuitable for testing LLM-based BN structure elicitation. We further show that some BNs may be unsuitable for such experiments because their node names are indistinguishable. The experiments on the remaining BNs show that our method performs better than the existing method with one of the three studied LLMs; however, the performance of both methods decreases significantly as the BN size increases.
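A minimal sketch of the aggregation step described above, i.e., majority voting over edge sets proposed by independently initialized LLM "experts". The querying itself and the exact voting rule are not specified in the abstract, so a simple strict-majority rule over directed edges is assumed here, and the example edges are hypothetical.

```python
from collections import Counter

def majority_vote_structure(expert_edge_lists, threshold=0.5):
    """Combine BN structures proposed by several LLM 'experts'.

    expert_edge_lists: list of edge lists, each edge a (parent, child) tuple.
    Returns the edges proposed by more than `threshold` of the experts.
    """
    n_experts = len(expert_edge_lists)
    votes = Counter(edge for edges in expert_edge_lists for edge in set(edges))
    return sorted(e for e, c in votes.items() if c / n_experts > threshold)

# Toy example with three hypothetical expert outputs.
experts = [
    [("Smoking", "Cancer"), ("Cancer", "Xray")],
    [("Smoking", "Cancer"), ("Cancer", "Dyspnoea")],
    [("Smoking", "Cancer"), ("Cancer", "Xray"), ("Pollution", "Cancer")],
]
print(majority_vote_structure(experts))
# [('Cancer', 'Xray'), ('Smoking', 'Cancer')]
```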
Behavioural biometric authentication systems entail an enrolment period that is burdensome for the user. In this work, we explore generating synthetic gestures from a few real user gestures with generative deep learning, with the application of training a simple (i.e. non-deep-learned) authentication model. Specifically, we show that utilising synthetic data alongside real data can reduce the number of real datapoints a user must provide to enrol into a biometric system. To validate our methods, we use the publicly available dataset of WatchAuth, a system proposed in 2022 for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. We develop a regularised autoencoder model for generating synthetic user-specific wrist motion data representing these physical gestures, and demonstrate the diversity and fidelity of our synthetic gestures. We show that using synthetic gestures in training can improve classification ability for a real-world system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system by more than 40% without negatively impacting its error rates.
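Below is a minimal PyTorch sketch of a regularised autoencoder for fixed-length wrist-motion windows. The window size and channel count, the MLP architecture, and the particular regulariser (an L2 penalty on the latent code) are our assumptions rather than the paper's specification; the perturbed-latent decoding at the end is one simple way such a model could be used to generate synthetic gestures.

```python
import torch
import torch.nn as nn

WINDOW, CHANNELS, LATENT = 100, 6, 16     # assumed: 100 IMU samples x 6 axes

class GestureAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        d = WINDOW * CHANNELS
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(d, 128), nn.ReLU(),
                                     nn.Linear(128, LATENT))
        self.decoder = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                                     nn.Linear(128, d))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z).view(-1, WINDOW, CHANNELS), z

model = GestureAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, WINDOW, CHANNELS)     # stand-in for real enrolment gestures

# One training step: reconstruction loss plus a latent-norm regulariser.
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * z.pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()

# Synthetic gestures: decode perturbed latent codes of the user's real gestures.
with torch.no_grad():
    synthetic = model.decoder(z + 0.1 * torch.randn_like(z)).view(-1, WINDOW, CHANNELS)
print(synthetic.shape)                    # torch.Size([32, 100, 6])
```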
This article presents the affordances that Generative Artificial Intelligence can have in the context of disinformation, one of the major threats to our digitalized society. We present a research framework for generating customized agent-based social networks for disinformation simulations that would enable understanding and evaluating the phenomenon, whilst discussing open challenges.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. FRN then captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for expression classification.
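To make the decompose-then-reconstruct structure concrete, here is a heavily simplified PyTorch sketch of a head in the spirit of FDN/FRN. The number of latent features, the use of sigmoid/softmax relation weights, and all dimensions are our assumptions; this is an illustrative sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DecomposeReconstructHead(nn.Module):
    """Backbone feature -> K latent features ("FDN") -> relation-weighted
    reconstruction ("FRN") -> expression logits (simplified illustration)."""
    def __init__(self, feat_dim=512, latent_dim=64, n_latent=8, n_classes=7):
        super().__init__()
        # Decompose the backbone feature into K facial-action-aware latent features.
        self.decompose = nn.ModuleList(
            [nn.Linear(feat_dim, latent_dim) for _ in range(n_latent)])
        # Intra-feature weights (importance of each latent feature) and
        # inter-feature weights (pairwise relations), both learned.
        self.intra = nn.Linear(latent_dim, 1)
        self.inter = nn.Parameter(torch.eye(n_latent))
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, feat):                                             # feat: (B, feat_dim)
        latents = torch.stack([f(feat) for f in self.decompose], dim=1)  # (B, K, D)
        w_intra = torch.sigmoid(self.intra(latents))                     # (B, K, 1)
        mixed = torch.softmax(self.inter, dim=-1) @ (w_intra * latents)  # (B, K, D)
        expression_feat = mixed.sum(dim=1)                               # reconstructed feature
        return self.classifier(expression_feat)

head = DecomposeReconstructHead()
print(head(torch.randn(4, 512)).shape)                                   # torch.Size([4, 7])
```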
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all labels, which completely ignores the complexity of and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are applied at prediction time. Experimental results on fine-grained entity typing and text classification demonstrate that the proposed method obtains more accurate multi-label classification results.
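As a small illustration of why a fixed 0.5 threshold is limiting, the snippet below tunes a separate prediction threshold per label on held-out data. This is a simple stand-in for the idea of per-label prediction policies, not the paper's meta-learning algorithm; the F1-based grid search and the toy data are our own.

```python
import numpy as np

def tune_per_label_thresholds(probs, labels, grid=np.linspace(0.1, 0.9, 17)):
    """probs, labels: (N, L) arrays of predicted probabilities and 0/1 targets.
    Returns one F1-maximising threshold per label instead of a global 0.5."""
    thresholds = np.full(probs.shape[1], 0.5)
    for j in range(probs.shape[1]):
        f1s = []
        for t in grid:
            pred = probs[:, j] >= t
            tp = np.sum(pred & (labels[:, j] == 1))
            fp = np.sum(pred & (labels[:, j] == 0))
            fn = np.sum(~pred & (labels[:, j] == 1))
            f1s.append(2 * tp / (2 * tp + fp + fn + 1e-12))
        thresholds[j] = grid[int(np.argmax(f1s))]
    return thresholds

# Toy example: three labels with different base rates.
rng = np.random.default_rng(0)
labels = (rng.random((200, 3)) < [0.5, 0.2, 0.05]).astype(int)
probs = np.clip(labels * 0.6 + rng.random((200, 3)) * 0.4, 0, 1)
print(tune_per_label_thresholds(probs, labels))
```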