
In everyday collaborative tasks between human operators and robots, operators need simple ways to program new skills, while robots must adapt to environmental changes. The joint use of visual servoing and imitation learning allows us to pursue friendly robotic interfaces that (i) adapt to the environment through visual perception and (ii) avoid explicit programming by emulating previous demonstrations. This work exploits imitation learning within the visual servoing paradigm to address the specific problem of tracking moving objects. In particular, we show that the compensation term required by the tracking controller can be inferred from data, avoiding the explicit implementation of estimators or observers. The effectiveness of the proposed method has been validated through simulations with a robotic manipulator.
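To illustrate why a tracking controller needs a compensation term at all, here is a minimal sketch under strong simplifying assumptions: a single scalar feature error, a unit interaction matrix, and a target that drifts at a constant rate. The function name `simulate_tracking` and all parameter values are hypothetical, and the true drift stands in for the term that the abstract says is inferred from data.

```python
def simulate_tracking(gain, target_drift, compensate, e0=1.0, dt=0.01, steps=2000):
    """Euler-integrate the 1-D error dynamics de/dt = v + target_drift.

    Assumption: unit interaction matrix, so the commanded velocity v
    acts directly on the feature error e.
    """
    e = e0
    for _ in range(steps):
        v = -gain * e                 # proportional visual servoing law
        if compensate:
            v -= target_drift         # feedforward term cancelling the target motion
        e += dt * (v + target_drift)  # error evolves under control plus target motion
    return e


# Without compensation the error settles near target_drift / gain;
# with compensation it decays towards zero.
err_plain = simulate_tracking(gain=2.0, target_drift=0.5, compensate=False)
err_comp = simulate_tracking(gain=2.0, target_drift=0.5, compensate=True)
```

In this toy setting the uncompensated controller keeps a residual error proportional to the target speed, which is the gap a learned compensation term would be expected to close.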

Related content

CLIP, one of the pioneering foundation models that connect images and text, has enabled many recent breakthroughs in computer vision. However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration. In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. Moreover, we showcase that the strategy for reducing image/text token length plays a crucial role in determining the quality of this scaling law. As a result of this finding, we are able to successfully train CLIP even with limited computational resources. For example, using 8 A100 GPUs, our CLIP models achieve zero-shot top-1 ImageNet-1k accuracies of 63.2% in ~2 days, 67.8% in ~3 days, and 69.3% in ~4 days. Our method also works well when scaling up -- with G/14, we register a new record of 83.0% ImageNet-1k zero-shot accuracy, and meanwhile accelerate the training by ~33x compared to its OpenCLIP counterpart. By reducing the computation barrier associated with CLIP, we hope to inspire more research in this field, particularly from academics. Our code is available at //github.com/UCSC-VLAA/CLIPA.
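As a rough illustration of what shortening the image/text token sequence can look like, the sketch below shows two simple options: random subsampling of image patch tokens and truncation of text tokens. The abstract does not say which strategies are used, so both functions, their names, and the 25% keep ratio are assumptions made for illustration only.

```python
import random


def keep_random_tokens(tokens, keep_ratio, seed=None):
    """Return a random subset of `tokens`, preserving their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one token, but never more than are available.
    n_keep = min(len(tokens), max(1, round(len(tokens) * keep_ratio)))
    rng = random.Random(seed)
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx]


def truncate_tokens(tokens, max_len):
    """Keep only the first `max_len` tokens."""
    return tokens[:max_len]


# Hypothetical inputs: 196 patch indices (a 14 x 14 grid) and a short caption.
patches = list(range(196))
short_patches = keep_random_tokens(patches, keep_ratio=0.25, seed=0)
short_text = truncate_tokens(["a", "photo", "of", "a", "dog"], max_len=3)
```

Because encoder cost grows with sequence length, feeding 49 tokens instead of 196 reduces the per-sample compute, which is the kind of saving the abstract attributes to shorter sequences.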

Mediation analysis is an important tool to study causal associations in biomedical and other scientific areas and has recently gained attention in microbiome studies. Using a microbiome study of acute myeloid leukemia (AML) patients, we investigate whether the effect of induction chemotherapy intensity levels on the infection status is mediated by the microbial taxa abundance. The unique characteristics of the microbial mediators -- high-dimensionality, zero-inflation, and dependence -- call for new methodological developments in mediation analysis. The presence of an exposure-induced mediator-outcome confounder, antibiotic use, further requires a delicate treatment in the analysis. To address these unique challenges in our motivating AML microbiome study, we propose a novel nonparametric identification formula for the interventional indirect effect (IIE), a measure recently developed for studying mediation effects. We develop the corresponding estimation algorithm using the inverse probability weighting method. We also test the presence of mediation effects via constructing the standard normal bootstrap confidence intervals. Simulation studies show that the proposed method has good finite-sample performance in terms of the IIE estimation, and type-I error rate and power of the corresponding test. In the AML microbiome study, our findings suggest that the effect of induction chemotherapy intensity levels on infection is mainly mediated by patients' gut microbiome.

Emerging applications of robotics, and concerns about their impact, require the research community to put human-centric objectives front-and-center. To meet this challenge, we advocate an interdisciplinary approach, socially cognizant robotics, which synthesizes technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation (from synchronous human feedback to asynchronous societal assessment) in shaping AI-driven robot behavior at all levels, and leads to a range of novel research perspectives and problems both for improving robots' interactions with individuals and impacts on society. Drawing on these arguments, we develop best practices for socially cognizant robot design that balance traditional technology-based metrics (e.g. efficiency, precision and accuracy) with critically important, albeit challenging to measure, human and society-based metrics.

Simultaneous sequence generation is a pivotal task for real-time scenarios, such as streaming speech recognition, simultaneous machine translation and simultaneous speech translation, where the target sequence is generated while receiving the source sequence. The crux of achieving high-quality generation with low latency lies in identifying the optimal moments for generating, accomplished by learning a mapping between the source and target sequences. However, existing methods often rely on task-specific heuristics for different sequence types, limiting the model's capacity to adaptively learn the source-target mapping and hindering the exploration of multi-task learning for various simultaneous tasks. In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. During the process of simultaneous generation, the model alternates between waiting for a source segment and generating a target segment, making the segment serve as the natural bridge between the source and target. To accomplish this, Seg2Seg introduces a latent segment as the pivot between source to target and explores all potential source-target mappings via the proposed expectation training, thereby learning the optimal moments for generating. Experiments on multiple simultaneous generation tasks demonstrate that Seg2Seg achieves state-of-the-art performance and exhibits better generality across various tasks.

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video at //www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the code and demos on our project webpage at //drive-anywhere.github.io/.

A conversational recommender system (CRS) interacts with users through multi-turn natural-language dialogues, aiming to provide high-quality recommendations for users' instant information needs. Although great efforts have been made to develop effective CRSs, most still focus on the contextual information from the current dialogue and usually suffer from data scarcity. Therefore, we consider leveraging historical dialogue data to enrich the limited context of the current dialogue session. In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ hypergraphs to represent the complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a knowledge-based hypergraph that considers fine-grained, entity-level semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop an interest-aware CRS. Extensive experiments on two benchmarks, ReDial and TG-ReDial, validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: //github.com/RUCAIBox/MHIM.

Shared autonomy methods, where a human operator and a robot arm work together, have enabled robots to complete a range of complex and highly variable tasks. Existing work primarily focuses on one human sharing autonomy with a single robot. By contrast, in this paper we present an approach for multi-robot shared autonomy that enables one operator to provide real-time corrections across two coordinated robots completing the same task in parallel. Sharing autonomy with multiple robots presents fundamental challenges. The human can only correct one robot at a time, and without coordination, the human may be left idle for long periods of time. Accordingly, we develop an approach that aligns the robot's learned motions to best utilize the human's expertise. Our key idea is to leverage Learning from Demonstration (LfD) and time warping to schedule the motions of the robots based on when they may require assistance. Our method uses variability in operator demonstrations to identify the types of corrections an operator might apply during shared autonomy, leverages flexibility in how quickly the task was performed in demonstrations to aid in scheduling, and iteratively estimates the likelihood of when corrections may be needed to ensure that only one robot is completing an action requiring assistance. Through a preliminary study, we show that our method can decrease the scheduled time spent sanding by iteratively estimating the times when each robot could need assistance and generating an optimized schedule that allows the operator to provide corrections to each robot during these times.

Despite the recent progress in deep learning, most approaches still go for a silo-like solution, focusing on learning each task in isolation: training a separate neural network for each individual task. Many real-world problems, however, call for a multi-modal approach and, therefore, for multi-tasking models. Multi-task learning (MTL) aims to leverage useful information across tasks to improve the generalization capability of a model. This thesis is concerned with multi-task learning in the context of computer vision. First, we review existing approaches for MTL. Next, we propose several methods that tackle important aspects of multi-task learning. The proposed methods are evaluated on various benchmarks. The results show several advances in the state-of-the-art of multi-task learning. Finally, we discuss several possibilities for future work.

Link prediction on knowledge graphs (KGs) is a key research topic. Previous work has mainly focused on binary relations, paying less attention to higher-arity relations even though they are ubiquitous in real-world KGs. This paper considers link prediction on n-ary relational facts and proposes a graph-based approach to this task. The key to our approach is to represent the n-ary structure of a fact as a small heterogeneous graph, and to model this graph with edge-biased fully-connected attention. The fully-connected attention captures universal inter-vertex interactions, while the edge-aware attentive biases encode the graph structure and its heterogeneity. In this fashion, our approach fully models global and local dependencies in each n-ary fact, and hence can more effectively capture associations therein. Extensive evaluation verifies the effectiveness and superiority of our approach. It performs substantially and consistently better than the current state of the art across a variety of n-ary relational benchmarks. Our code is publicly available.
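The idea of fully-connected attention with edge-aware biases can be sketched as follows. This is a single-head, pure-Python illustration that assumes the biases are added to the keys and values of each vertex pair; the abstract does not state where the biases enter, and the function name and argument layout are hypothetical.

```python
import math


def edge_biased_attention(queries, keys, values, key_bias, value_bias):
    """Single-head attention over n vertices with per-edge additive biases.

    queries, keys, values: n lists of d floats (assumed already projected).
    key_bias[i][j], value_bias[i][j]: d-float bias vectors for edge (i, j).
    """
    n, d = len(queries), len(queries[0])
    outputs = []
    for i in range(n):
        # Score vertex i against every vertex j; the edge bias shifts the key.
        scores = [
            sum(q * (k + b) for q, k, b in zip(queries[i], keys[j], key_bias[i][j]))
            / math.sqrt(d)
            for j in range(n)
        ]
        # Numerically stable softmax over all vertices (fully connected).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [x / total for x in exps]
        # Weighted sum of bias-shifted values.
        outputs.append([
            sum(weights[j] * (values[j][c] + value_bias[i][j][c]) for j in range(n))
            for c in range(d)
        ])
    return outputs


# Toy example: two vertices, zero biases, identical keys, so attention is uniform.
zero = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
out = edge_biased_attention(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
    key_bias=zero,
    value_bias=zero,
)
```

With all biases set to zero this reduces to ordinary dot-product attention; non-zero biases indexed by edge type are what would let the scores depend on how two vertices are connected within a fact.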

Recently, ensembles have been applied to deep metric learning to yield state-of-the-art results. Deep metric learning aims to train deep neural networks to produce feature embeddings whose distances satisfy given constraints. In deep metric learning, an ensemble averages the distances learned by multiple learners. An important aspect of an ensemble is that the learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning, and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.
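The two ingredients named above, averaging distances across learners and penalising learners that produce similar embeddings, can be sketched in a few lines. The hinge form of the divergence term, the margin value, and all function names are assumptions for illustration; the abstract does not give the exact loss.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ensemble_distance(embs_a, embs_b):
    """Average the per-learner distances; embs_x[m] is learner m's embedding of x."""
    dists = [euclidean(ea, eb) for ea, eb in zip(embs_a, embs_b)]
    return sum(dists) / len(dists)


def divergence_loss(embs, margin=1.0):
    """Assumed hinge penalty: learner pairs whose embeddings of the same
    input are closer than `margin` (in squared distance) incur a loss."""
    loss = 0.0
    for p in range(len(embs)):
        for q in range(p + 1, len(embs)):
            loss += max(0.0, margin - euclidean(embs[p], embs[q]) ** 2)
    return loss


# Hypothetical embeddings from two learners for two images.
img_a = [[0.0, 0.0], [1.0, 0.0]]
img_b = [[3.0, 4.0], [1.0, 0.0]]
dist = ensemble_distance(img_a, img_b)             # mean of 5.0 and 0.0
collapsed = divergence_loss([[0.0, 0.0], [0.0, 0.0]])  # identical learners are penalised
```

A term of this kind is zero once the learners' embeddings of the same input are at least a margin apart, so it pushes learners to differ without rewarding unbounded separation.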
