亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the problem of assigning robots with actions to track targets. The objective is to optimize the robot team's tracking quality which can be defined as the reduction in the uncertainty of the targets' states. Specifically, we consider two assignment problems given the different sensing capabilities of the robots. In the first assignment problem, a single robot is sufficient to track a target. To this end, we present a greedy algorithm (Algorithm 1) that assigns a robot with its action to each target. We prove that the greedy algorithm has a 1/2 approximation bound and runs in polynomial time. Then, we study the second assignment problem where two robots are necessary to track a target. We design another greedy algorithm (Algorithm 2) that assigns a pair of robots with their actions to each target. We prove that the greedy algorithm achieves a 1/3 approximation bound and has a polynomial running time. Moreover, we illustrate the performance of the two greedy algorithms in the ROS-Gazebo environment where the tracking patterns of one robot following one target using Algorithm 1 and two robots following one target using Algorithm 2 are clearly observed. Further, we conduct extensive comparisons to demonstrate that the two greedy algorithms perform close to their optimal counterparts and much better than their respective (1/2 and 1/3) approximation bounds.

相關內容

Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affodance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at //3DAPNet.github.io

A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.

Image denoising is a fundamental and challenging task in the field of computer vision. Most supervised denoising methods learn to reconstruct clean images from noisy inputs, which have intrinsic spectral bias and tend to produce over-smoothed and blurry images. Recently, researchers have explored diffusion models to generate high-frequency details in image restoration tasks, but these models do not guarantee that the generated texture aligns with real images, leading to undesirable artifacts. To address the trade-off between visual appeal and fidelity of high-frequency details in denoising tasks, we propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG). Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal, which serves as the initial estimation for subsequent steps to maintain fidelity. Additionally, it employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality. We further introduce a two-stage training scheme to ensure effective collaboration between the reconstructive and generative modules of RnG. To reduce undesirable texture introduced by the diffusion model, we also propose an adaptive step controller that regulates the number of inverse steps applied by the diffusion model, allowing control over the level of high-frequency details added to each patch as well as saving the inference computational cost. Through our proposed RnG, we achieve a better balance between perception and distortion. We conducted extensive experiments on both synthetic and real denoising datasets, validating the superiority of the proposed approach.

Predicting pedestrian movements remains a complex and persistent challenge in robot navigation research. We must evaluate several factors to achieve accurate predictions, such as pedestrian interactions, the environment, crowd density, and social and cultural norms. Accurate prediction of pedestrian paths is vital for ensuring safe human-robot interaction, especially in robot navigation. Furthermore, this research has potential applications in autonomous vehicles, pedestrian tracking, and human-robot collaboration. Therefore, in this paper, we introduce FlowMNO, an Optical Flow-Integrated Markov Neural Operator designed to capture pedestrian behavior across diverse scenarios. Our paper models trajectory prediction as a Markovian process, where future pedestrian coordinates depend solely on the current state. This problem formulation eliminates the need to store previous states. We conducted experiments using standard benchmark datasets like ETH, HOTEL, ZARA1, ZARA2, UCY, and RGB-D pedestrian datasets. Our study demonstrates that FlowMNO outperforms some of the state-of-the-art deep learning methods like LSTM, GAN, and CNN-based approaches, by approximately 86.46% when predicting pedestrian trajectories. Thus, we show that FlowMNO can seamlessly integrate into robot navigation systems, enhancing their ability to navigate crowded areas smoothly.

Despite significant improvements in robot capabilities, they are likely to fail in human-robot collaborative tasks due to high unpredictability in human environments and varying human expectations. In this work, we explore the role of explanation of failures by a robot in a human-robot collaborative task. We present a user study incorporating common failures in collaborative tasks with human assistance to resolve the failure. In the study, a robot and a human work together to fill a shelf with objects. Upon encountering a failure, the robot explains the failure and the resolution to overcome the failure, either through handovers or humans completing the task. The study is conducted using different levels of robotic explanation based on the failure action, failure cause, and action history, and different strategies in providing the explanation over the course of repeated interaction. Our results show that the success in resolving the failures is not only a function of the level of explanation but also the type of failures. Furthermore, while novice users rate the robot higher overall in terms of their satisfaction with the explanation, their satisfaction is not only a function of the robot's explanation level at a certain round but also the prior information they received from the robot.

Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. Toward this goal, we extend our previous work to propose the TOPS2 descriptor, and an accompanying recognition framework, THOR2, inspired by a human reasoning mechanism known as object unity. We interleave color embeddings obtained using the Mapper algorithm for topological soft clustering with the shape-based TOPS descriptor to obtain the TOPS2 descriptor. THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework and outperforms RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS Occluded dataset. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.

A typical application of upper-limb exoskeleton robots is deployment in rehabilitation training, helping patients to regain manipulative abilities. However, as the patient is not always capable of following the robot, safety issues may arise during the training. Due to the bias in different patients, an individualized scheme is also important to ensure that the robot suits the specific conditions (e.g., movement habits) of a patient, hence guaranteeing effectiveness. To fulfill this requirement, this paper proposes a new motion planning scheme for upper-limb exoskeleton robots, which drives the robot to provide customized, safe, and individualized assistance using both human demonstration and interactive learning. Specifically, the robot first learns from a group of healthy subjects to generate a reference motion trajectory via probabilistic movement primitives (ProMP). It then learns from the patient during the training process to further shape the trajectory inside a moving safe region. The interactive data is fed back into the ProMP iteratively to enhance the individualized features for as long as the training process continues. The robot tracks the individualized trajectory under a variable impedance model to realize the assistance. Finally, the experimental results are presented in this paper to validate the proposed control scheme.

Current robotic grasping methods often rely on estimating the pose of the target object, explicitly predicting grasp poses, or implicitly estimating grasp success probabilities. In this work, we propose a novel approach that directly maps gripper poses to their corresponding grasp success values, without considering objectness. Specifically, we leverage a Neural Radiance Field (NeRF) architecture to learn a scene representation and use it to train a grasp success estimator that maps each pose in the robot's task space to a grasp success value. We employ this learned estimator to tune its inputs, i.e., grasp poses, by gradient-based optimization to obtain successful grasp poses. Contrary to other NeRF-based methods which enhance existing grasp pose estimation approaches by relying on NeRF's rendering capabilities or directly estimate grasp poses in a discretized space using NeRF's scene representation capabilities, our approach uniquely sidesteps both the need for rendering and the limitation of discretization. We demonstrate the effectiveness of our approach on four simulated 3DoF (Degree of Freedom) robotic grasping tasks and show that it can generalize to novel objects. Our best model achieves an average translation error of 3mm from valid grasp poses. This work opens the door for future research to apply our approach to higher DoF grasps and real-world scenarios.

Bipedal robots promise the ability to traverse rough terrain quickly and efficiently, and indeed, humanoid robots can now use strong ankles and careful foot placement to traverse discontinuous terrain. However, more agile underactuated bipeds have small feet and weak ankles, and must constantly adjust their planned footstep position to maintain balance. We introduce a new model-predictive footstep controller which jointly optimizes over the robot's discrete choice of stepping surface, impending footstep position sequence, ankle torque in the sagittal plane, and center of mass trajectory, to track a velocity command. The controller is formulated as a single Mixed Integer Quadratic Program (MIQP) which is solved at 50-200 Hz, depending on terrain complexity. We implement a state of the art real-time elevation mapping and convex terrain decomposition framework to inform the controller of its surroundings in the form on convex polygons representing steppable terrain. We investigate the capabilities and challenges of our approach through hardware experiments on the underactuated biped Cassie.

Temporal relational modeling in video is essential for human action understanding, such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, it is still a challenge to apply graph convolution networks on long video sequences effectively. The main reason is that large number of nodes (i.e., video frames) makes GCNs hard to capture and model temporal relations in videos. To tackle this problem, in this paper, we introduce an effective GCN module, Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations via constructing multi-level dilated temporal graphs where the nodes represent frames from different moments in video. Moreover, to enhance temporal reasoning ability of the proposed model, an auxiliary self-supervised task is proposed to encourage the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at //github.com/redwang/DTGRM.

北京阿比特科技有限公司