亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Estimating robot pose from RGB images is a crucial problem in computer vision and robotics. While previous methods have achieved promising performance, most of them presume full knowledge of robot internal states, e.g. ground-truth robot joint angles, which are not always available in real-world scenarios. On the other hand, existing approaches that estimate robot pose without joint state priors suffer from heavy computation burdens and thus cannot support real-time applications. This work addresses the urgent need for efficient robot pose estimation with unknown states. We propose an end-to-end pipeline for real-time, holistic robot pose estimation from a single RGB image, even in the absence of known robot states. Our method decomposes the problem into estimating camera-to-robot rotation, robot state parameters, keypoint locations, and root depth. We further design a corresponding neural network module for each task. This approach allows for learning multi-facet representations and facilitates sim-to-real transfer through self-supervised learning. Notably, our method achieves inference with a single feedforward, eliminating the need for costly test-time iterative optimization. As a result, it delivers a 12-time speed boost with state-of-the-art accuracy, enabling real-time holistic robot pose estimation for the first time. Code is available at //oliverbansk.github.io/Holistic-Robot-Pose/.

相關內容

Communication robots have the potential to contribute to effective human-XAI interaction as an interface that goes beyond textual or graphical explanations. One of their strengths is that they can use physical and vocal expressions to add detailed nuances to explanations. However, it is not clear how a robot can apply such expressions, or in particular, how we can develop a strategy to adaptively use such expressions depending on the task and user in dynamic interactions. To address this question, this paper proposes DynEmph, a method for a communication robot to decide where to emphasize XAI-generated explanations with physical expressions. It predicts the effect of emphasizing certain points on a user and aims to minimize the expected difference between predicted user decisions and AI-suggested ones. DynEmph features a strategy for deciding where to emphasize in a data-driven manner, relieving engineers from the need to manually design a strategy. We further conducted experiments to investigate how emphasis selection strategies affect the performance of user decisions. The results suggest that, while a naive strategy (emphasizing explanations for an AI's most probable class) does not necessarily work better, DynEmph effectively guides users to better decisions under the condition that the performance of the AI suggestion is high.

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, leading to limited ability for generalization. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers, which contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach which adopts sampling-based partial listwise learning to resist the bias in teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem.

Supervised deep learning techniques show promise in medical image analysis. However, they require comprehensive annotated data sets, which poses challenges, particularly for rare diseases. Consequently, unsupervised anomaly detection (UAD) emerges as a viable alternative for pathology segmentation, as only healthy data is required for training. However, recent UAD anomaly scoring functions often focus on intensity only and neglect structural differences, which impedes the segmentation performance. This work investigates the potential of Structural Similarity (SSIM) to bridge this gap. SSIM captures both intensity and structural disparities and can be advantageous over the classical $l1$ error. However, we show that there is more than one optimal kernel size for the SSIM calculation for different pathologies. Therefore, we investigate an adaptive ensembling strategy for various kernel sizes to offer a more pathology-agnostic scoring mechanism. We demonstrate that this ensembling strategy can enhance the performance of DMs and mitigate the sensitivity to different kernel sizes across varying pathologies, highlighting its promise for brain MRI anomaly detection.

To achieve safe legged locomotion, it is important to generate motion in real-time considering various constraints in robots and environments. In this study, we propose a lightweight real-time perspective motion control system for the newly developed six-wheeled-telescopic-legged robot, Tachyon 3. In the proposed method, analytically smoothed constraints including Smooth Separating Axis Theorem (Smooth SAT) as a novel higher order differentiable collision detection for 3D shapes is applied to the Control Barrier Function (CBF). The proposed system integrating the CBF achieves online motion generation in a short control cycle of 1 ms that satisfies joint limitations, environmental collision avoidance and safe convex foothold constraints. The efficiency of Smooth SAT is shown from the collision detection time of 1 us or less and the CBF constraint computation time for Tachyon3 of several us. Furthermore, the effectiveness of the proposed system is verified through the stair-climbing motion, integrating online recognition in a simulation and a real machine.

We present a method named iComMa to address the 6D camera pose estimation problem in computer vision. Conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods have achieved promising results in mesh-free object and scene pose estimation by inverting the Neural Radiance Fields (NeRF). However, they still struggle with adverse initializations such as large rotations and translations. To address this issue, we propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS). Specifically, a gradient-based differentiable framework optimizes camera pose by minimizing the residual between the query image and the rendered image, requiring no training. An end-to-end matching module is designed to enhance the model's robustness against adverse initializations, while minimizing pixel-level comparing loss aids in precise pose estimation. Experimental results on synthetic and complex real-world data demonstrate the effectiveness of the proposed approach in challenging conditions and the accuracy of camera pose estimation.

Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigurable multi-mode grasps driven by a single motor. Firstly, inspired by the changes that occurred in the contact surface with a human finger moving, we artfully design a slider-slide rail mechanism as the phalanx to achieve retraction of each finger, allowing for better performance in the enveloping grasping mode. Secondly, a reconfigurable structure is constructed to broaden the grasping range of objects' dimensions for the proposed gripper. By adjusting the configuration and gesture of each finger, the gripper can achieve five grasping modes. Thirdly, the proposed gripper is just actuated by a single motor, yet it can be capable of grasping and reconfiguring simultaneously. Finally, various experiments on grasps of slender, thin, and large-volume objects are implemented to evaluate the performance of the proposed gripper in practical scenarios, which demonstrates the excellent grasping capabilities of the gripper.

This paper focuses on sensor fault detection and compensation for robotic manipulators. The proposed method features a new adaptive observer and a new terminal sliding mode control law established on a second-order integral sliding surface. The method enables sensor fault detection without the need to know the bounds on fault value and/or its derivative. It also enables fast and fixed-time fault-tolerant control whose performance can be prescribed beforehand by defining funnel bounds on the tracking error. The ultimate boundedness of the estimation errors for the proposed observer and the fixed-time stability of the control system are shown using Lyapunov stability analysis. The effectiveness of the proposed method is verified using numerical simulations on two different robotic manipulators, and the results are compared with existing methods. Our results demonstrate performance gains obtained by the proposed method compared to the existing results.

Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method includes two-folds, first we extend the RotatE from 2D complex domain to high dimension space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on data set (FB15k-237) with many high in-degree connection nodes.

Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.

We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.

北京阿比特科技有限公司