Wiping behavior is a task of tracing the surface of an object while feeling the force with the palm of the hand. It is necessary to adjust the force and posture appropriately considering the various contact conditions felt by the hand. Several studies have been conducted on the wiping motion, however, these studies have only dealt with a single surface material, and have only considered the application of the amount of appropriate force, lacking intelligent movements to ensure that the force is applied either evenly to the entire surface or to a certain area. Depending on the surface material, the hand posture and pressing force should be varied appropriately, and this is highly dependent on the definition of the task. Also, most of the movements are executed by high-rigidity robots that are easy to model, and few movements are executed by robots that are low-rigidity but therefore have a small risk of damage due to excessive contact. So, in this study, we develop a method of motion generation based on the learned prediction of contact force during the wiping motion of a low-rigidity robot. We show that MyCobot, which is made of low-rigidity resin, can appropriately perform wiping behaviors on a plane with multiple surface materials based on various task definitions.
Comprehensive evaluation is one of the basis of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable by supporting common input formats over different frameworks. However, each framework creates its specific format, which may not support reading large-scale real-world graph datasets. This shows a demand for high-performance libraries capable of loading graphs to (i)~accelerate designing new graph algorithms, (ii)~to evaluate the contributions on a wide range of graph algorithms, and (iii)~to facilitate easy and fast comparison over different graph frameworks. To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared- and distributed-memory and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used for evaluation of ParaGrapher over three storage types. Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats. ParaGrapher is available online on //blogs.qub.ac.uk/DIPSA/ParaGrapher/.
Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between different types of biological entities(\textit{e.g.}, cell cluster and tissue block) across multiple scales, while such interactions are crucial for patient survival prediction. In light of this, we propose a novel multi-scale heterogeneity-aware hypergraph representation framework. Specifically, our framework first constructs a multi-scale heterogeneity-aware hypergraph and assigns each node with its biological entity type. It then mines diverse interactions between nodes on the graph structure to obtain a global representation. Experimental results demonstrate that our method outperforms state-of-the-art approaches on three benchmark datasets. Code is publicly available at \href{//github.com/Hanminghao/H2GT}{//github.com/Hanminghao/H2GT}.
Objective We model the dynamic trust of human subjects in a human-autonomy-teaming screen-based task. Background Trust is an emerging area of study in human-robot collaboration. Many studies have looked at the issue of robot performance as a sole predictor of human trust, but this could underestimate the complexity of the interaction. Method Subjects were paired with autonomous agents to search an on-screen grid to determine the number of outlier objects. In each trial, a different autonomous agent with a preassigned capability used one of three search strategies and then reported the number of outliers it found as a fraction of its capability. Then, the subject reported their total outlier estimate. Human subjects then evaluated statements about the agent's behavior, reliability, and their trust in the agent. Results 80 subjects were recruited. Self-reported trust was modeled using Ordinary Least Squares, but the group that interacted with varying capability agents on a short time order produced a better performing ARIMAX model. Models were cross-validated between groups and found a moderate improvement in the next trial trust prediction. Conclusion A time series modeling approach reveals the effects of temporal ordering of agent performance on estimated trust. Recency bias may affect how subjects weigh the contribution of strategy or capability to trust. Understanding the connections between agent behavior, agent performance, and human trust is crucial to improving human-robot collaborative tasks. Application The modeling approach in this study demonstrates the need to represent autonomous agent characteristics over time to capture changes in human trust.
Collaborative robots must effectively communicate their internal state to humans to enable a smooth interaction. Nonverbal communication is widely used to communicate information during human-robot interaction, however, such methods may also be misunderstood, leading to communication errors. In this work, we explore modulating the acoustic parameter values (pitch bend, beats per minute, beats per loop) of nonverbal auditory expressions to convey functional robot states (accomplished, progressing, stuck). We propose a reinforcement learning (RL) algorithm based on noisy human feedback to produce accurately interpreted nonverbal auditory expressions. The proposed approach was evaluated through a user study with 24 participants. The results demonstrate that: 1. Our proposed RL-based approach is able to learn suitable acoustic parameter values which improve the users' ability to correctly identify the state of the robot. 2. Algorithm initialization informed by previous user data can be used to significantly speed up the learning process. 3. The method used for algorithm initialization strongly influences whether participants converge to similar sounds for each robot state. 4. Modulation of pitch bend has the largest influence on user association between sounds and robotic states.
A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress. To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers. We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.
Intelligent reflecting surfaces (IRSs) are a promising low-cost solution for achieving high spectral and energy efficiency in future communication systems by enabling the customization of wireless propagation environments. Despite the plethora of research on resource allocation design for IRS-assisted multiuser wireless communication systems, the optimal design and the corresponding performance upper bound are still not fully understood. To bridge this gap in knowledge, in this paper, we investigate the optimal resource allocation design for IRS-assisted multiuser multiple-input single-output systems employing practical discrete IRS phase shifters. In particular, we jointly optimize the beamforming vector at the base station and the discrete IRS phase shifts to minimize the total transmit power for the cases of perfect and imperfect channel state information (CSI) knowledge. To this end, two novel algorithms based on the generalized Benders decomposition (GBD) method are developed to obtain the globally optimal solution for perfect and imperfect CSI, respectively. Moreover, to facilitate practical implementation, we propose two corresponding low-complexity suboptimal algorithms with guaranteed convergence by capitalizing on successive convex approximation (SCA). In particular, for imperfect CSI, we adopt a bounded error model to characterize the CSI uncertainty and propose a new transformation to convexify the robust quality-of-service constraints. Our numerical results confirm the optimality of the proposed GBD-based algorithms for the considered system for both perfect and imperfect CSI. Furthermore, we unveil that both proposed SCA-based algorithms can attain a locally optimal solution within a few iterations. Moreover, compared with the state-of-the-art solution based on alternating optimization, the proposed low-complexity SCA-based schemes achieve a significant performance gain.
Electromagneto-quasistatic (EMQS) field formulations are often dubbed as Darwin-type field formulations which approximate the Maxwell equations by neglecting radiation effects while modelling resistive, capacitive, and inductive effects. A common feature of EMQS field models is the Darwin-Amp\'ere equation formulated with the magnetic vector potential and the electric scalar potential. EMQS field formulations yield different approximations to the Maxwell equations by choice of additional gauge equations. These EMQS formulations are analyzed within the port-Hamiltonian system (PHS) framework. It is shown via the PHS compatibility equation that formulations based on the combination of the Darwin-Amp\'ere equation and the full Maxwell continuity equation yield port-Hamiltonian systems implying numerical stability and specific EMQS energy conservation.
Change detection (CD) is a fundamental task in remote sensing (RS) which aims to detect the semantic changes between the same geographical regions at different time stamps. Existing convolutional neural networks (CNNs) based approaches often struggle to capture long-range dependencies. Whereas recent transformer-based methods are prone to the dominant global representation and may limit their capabilities to capture the subtle change regions due to the complexity of the objects in the scene. To address these limitations, we propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images. The main focus of our design is to introduce a change encoder that leverages local and global feature representations to capture both subtle and large change feature information from multi-scale features to precisely estimate the change regions. Our experimental study on two challenging CD datasets reveals the merits of our approach and obtains state-of-the-art performance.
When faced with accomplishing a task, human experts exhibit intentional behavior. Their unique intents shape their plans and decisions, resulting in experts demonstrating diverse behaviors to accomplish the same task. Due to the uncertainties encountered in the real world and their bounded rationality, experts sometimes adjust their intents, which in turn influences their behaviors during task execution. This paper introduces IDIL, a novel imitation learning algorithm to mimic these diverse intent-driven behaviors of experts. Iteratively, our approach estimates expert intent from heterogeneous demonstrations and then uses it to learn an intent-aware model of their behavior. Unlike contemporary approaches, IDIL is capable of addressing sequential tasks with high-dimensional state representations, while sidestepping the complexities and drawbacks associated with adversarial training (a mainstay of related techniques). Our empirical results suggest that the models generated by IDIL either match or surpass those produced by recent imitation learning benchmarks in metrics of task performance. Moreover, as it creates a generative model, IDIL demonstrates superior performance in intent inference metrics, crucial for human-agent interactions, and aptly captures a broad spectrum of expert behaviors.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.