
Robotic bag manipulation is complex and challenging due to the deformability of the bag. Based on a dynamic manipulation strategy, we propose a new framework, ShakingBot, for bagging tasks. ShakingBot uses a perception module to identify the key region of the plastic bag from arbitrary initial configurations. Guided by this segmentation, ShakingBot iteratively executes a novel set of actions, including Bag Adjustment, Dual-arm Shaking, and One-arm Holding, to open the bag. The dynamic action, Dual-arm Shaking, can effectively open the bag without the need to account for its crumpled configuration. We then insert the items and lift the bag for transport. We deploy our method on a dual-arm robot and achieve a success rate of 21/33 for inserting at least one item across various initial bag configurations. We demonstrate the advantage of dynamic shaking actions over quasi-static manipulation in the bagging task, and show that our method generalizes across bags of varying size, pattern, and color.
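As a rough illustration of the pipeline described above, the following Python sketch shows one way the perception-action loop could be organized. All function names, the openness score, and the robot interfaces are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a ShakingBot-style perception-action loop.
# Every interface below is a hypothetical stub, not the authors' API.
import random

def segment_bag_region(rgb_image):
    """Placeholder perception: return (grasp points, estimated rim openness)."""
    return [(0.1, 0.2), (0.3, 0.2)], random.random()

def bag_adjustment(handles): pass      # re-grasp / reposition the bag
def dual_arm_shaking(handles): pass    # dynamic action that opens the rim
def one_arm_holding(handles): pass     # keep the opening from collapsing

def open_bag(rgb_image, openness_threshold=0.7, max_iters=10):
    for _ in range(max_iters):
        handles, openness = segment_bag_region(rgb_image)
        if openness >= openness_threshold:
            return True                # rim judged open enough to insert items
        bag_adjustment(handles)
        dual_arm_shaking(handles)
        one_arm_holding(handles)
    return False

opened = open_bag(rgb_image=None)      # toy call; a real image would go here
```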

Related Content

The limited scale of current 3D shape datasets hinders advancements in 3D shape understanding and motivates multi-modal learning approaches that transfer knowledge from data-abundant 2D image and language modalities to 3D shapes. However, even though image and language representations have been aligned by cross-modal models like CLIP, we find that the image modality fails to contribute as much as language in existing multi-modal 3D representation learning methods. This is attributed to the domain shift in the 2D images and the distinct focus of each modality. To leverage both modalities more effectively in pre-training, we introduce TriAdapter Multi-Modal Learning (TAMM) -- a novel two-stage learning approach based on three synergetic adapters. First, our CLIP Image Adapter mitigates the domain gap between 3D-rendered images and natural images by adapting the visual representations of CLIP for synthetic image-text pairs. Subsequently, our Dual Adapters decouple the 3D shape representation space into two complementary sub-spaces: one focusing on visual attributes and the other on semantic understanding, which ensures a more comprehensive and effective multi-modal pre-training. Extensive experiments demonstrate that TAMM consistently enhances 3D representations for a wide range of 3D encoder architectures, pre-training datasets, and downstream tasks. Notably, we boost the zero-shot classification accuracy on Objaverse-LVIS from 46.8% to 50.7% and improve the 5-way 10-shot linear probing classification accuracy on ModelNet40 from 96.1% to 99.0%. Project page: \url{https://alanzhangcs.github.io/tamm-page}.
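For intuition, here is a minimal PyTorch sketch of a residual adapter operating in a CLIP-style embedding space, instantiated three times as the abstract describes. The layer sizes, blend ratio, and pooling of the two stages are illustrative assumptions, not the paper's exact architecture.

```python
# Toy residual MLP adapter in a CLIP-style embedding space (sizes assumed).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=512, ratio=0.5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(),
            nn.Linear(dim // 4, dim),
        )
        self.ratio = ratio             # blend adapted and original features

    def forward(self, x):
        return self.ratio * self.mlp(x) + (1 - self.ratio) * x

image_adapter = Adapter()              # stage 1: close the rendered/natural gap
visual_adapter = Adapter()             # stage 2: visual-attribute sub-space
semantic_adapter = Adapter()           # stage 2: semantic-understanding sub-space

z = torch.randn(8, 512)                # a batch of 3D-shape embeddings
z_vis, z_sem = visual_adapter(z), semantic_adapter(z)
```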

Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of Large Language Models (LLMs) to edit code based on user instructions. Evaluated on a novel human-written, execution-based benchmark dubbed EditEval, we find that current models often struggle to fulfill the instructions. In light of this, we contribute InstructCoder, the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing, containing high-diversity code-editing tasks such as comment insertion, code optimization, and code refactoring. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code-editing scenarios. The collection process starts with filtered commit data sourced from GitHub Python repositories as seeds. The dataset is then systematically expanded through an iterative process in which both seed and generated tasks are used to prompt ChatGPT for more data. Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits, achieving code-editing performance that matches advanced proprietary LLMs. The datasets and the source code are publicly available at https://github.com/qishenghu/CodeInstruct.
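The iterative expansion step can be pictured with the following hedged sketch, in the spirit of Self-Instruct-style bootstrapping. `ask_chatgpt`, the prompt format, and the round count are hypothetical stand-ins for the authors' actual collection pipeline.

```python
# Sketch of an iterative, self-bootstrapping dataset expansion loop.
# `ask_chatgpt` is a hypothetical wrapper around a chat-completion API.
import json
import random

def ask_chatgpt(prompt):
    """Hypothetical LLM call; should return a JSON list of new tasks."""
    return "[]"

def expand_dataset(seed_tasks, rounds=3, demos_per_prompt=4):
    pool = list(seed_tasks)  # instruction-input-output triplets from commits
    for _ in range(rounds):
        demos = random.sample(pool, min(demos_per_prompt, len(pool)))
        prompt = ("Generate more code-editing tasks like these:\n"
                  + json.dumps(demos, indent=2))
        pool.extend(json.loads(ask_chatgpt(prompt)))  # seed + generated tasks
    return pool

dataset = expand_dataset([
    {"instruction": "Add a docstring", "input": "def f(): pass", "output": "..."},
])
```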

In the field of robotics and automation, conventional object recognition and instance segmentation methods face a formidable challenge when it comes to perceiving Deformable Linear Objects (DLOs) such as wires, cables, and flexible tubes. This challenge arises primarily from the lack of distinct attributes such as shape, color, and texture, which calls for tailored solutions to achieve precise identification. In this work, we propose a foundation-model-based DLO instance segmentation technique that is text-promptable and user-friendly. Specifically, our approach combines the text-conditioned semantic segmentation capabilities of the CLIPSeg model with the zero-shot generalization capabilities of the Segment Anything Model (SAM). We show that our method exceeds state-of-the-art performance on DLO instance segmentation, achieving a mIoU of $91.21\%$. We also introduce a rich and diverse DLO-specific dataset for instance segmentation.
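A rough sketch of how a text-conditioned heatmap could seed promptable segmentation is shown below. Both model calls are stubbed (a real pipeline would load CLIPSeg and SAM checkpoints), and the top-k point-prompt heuristic is an assumption, not necessarily the paper's coupling strategy.

```python
# Numpy-only sketch of a CLIPSeg -> SAM style pipeline; models are stubbed.
import numpy as np

def clipseg_heatmap(image, text_prompt):
    """Stub: coarse per-pixel relevance for the text prompt, in [0, 1]."""
    return np.random.rand(*image.shape[:2])

def sam_segment(image, point_prompts):
    """Stub: SAM-style promptable segmentation from point prompts."""
    return np.zeros(image.shape[:2], dtype=bool)

def segment_dlo(image, prompt="a cable", topk=5):
    heat = clipseg_heatmap(image, prompt)
    ys, xs = np.unravel_index(np.argsort(heat.ravel())[-topk:], heat.shape)
    points = list(zip(xs.tolist(), ys.tolist()))  # strongest pixels as prompts
    return sam_segment(image, points)

mask = segment_dlo(np.zeros((480, 640, 3)))
```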

The availability of Large Language Models (LLMs) that can generate code has made it possible to create tools that improve developer productivity. Integrated development environments (IDEs), which developers use to write software, are often used as an interface to interact with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Domain-specific languages, such as those crucial for IT automation, have not received much attention. Ansible is one such YAML-based IT-automation-specific language. Red Hat Ansible Lightspeed with IBM Watson Code Assistant, further referred to as Ansible Lightspeed, is an LLM-based service designed explicitly for natural-language-to-Ansible code generation. In this paper, we describe the design and implementation of the Ansible Lightspeed service and analyze feedback from thousands of real users. We examine diverse performance indicators, classified according to both immediate and extended utilization patterns, along with user sentiments. The analysis shows that the user acceptance rate of Ansible Lightspeed suggestions is higher than that of comparable tools that are more general and not specific to a programming language. This remains true even after we apply much more stringent criteria for what counts as an accepted suggestion, discarding suggestions that were heavily edited after being accepted. The relatively high acceptance rate results in higher-than-expected user retention and generally positive user feedback. This paper provides insights into how a comparatively small, dedicated model performs on a domain-specific language and, more importantly, how it is received by users.
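One way to picture the stringent acceptance criterion mentioned above is the following sketch, where a suggestion only counts as accepted if the final code stays similar to it. The similarity measure and the 0.5 cutoff are illustrative assumptions, not the paper's definition.

```python
# Sketch of a "stringent acceptance" metric: an accepted suggestion is
# discarded if the user edited it heavily afterwards. Threshold is assumed.
from difflib import SequenceMatcher

def strict_acceptance_rate(events, min_similarity=0.5):
    accepted = 0
    for suggestion, final_code, was_accepted in events:
        sim = SequenceMatcher(None, suggestion, final_code).ratio()
        if was_accepted and sim >= min_similarity:
            accepted += 1              # heavily edited "acceptances" drop out
    return accepted / len(events)

events = [("- name: install nginx", "- name: install nginx", True),
          ("- name: install nginx", "- name: start apache instead", True)]
print(strict_acceptance_rate(events))  # 0.5 under this toy criterion
```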

Incompressibility is a fundamental condition in most fluid models. Accumulated simulation errors violate it and cause volume loss. Past work has proposed correction methods to counteract this, but these methods are imperfect and in some cases inadequate. We present a method for fluid simulation that strictly enforces incompressibility based on a grid-related definition of discrete incompressibility. We formulate a linear programming (LP) problem that bounds the number of particles that end up in each grid cell. A variant of the band method is offered for acceleration, which requires special constraints to ensure volume preservation. Further acceleration is achieved by simplifying the problem and adding a special band-correction step formulated as a minimum-cost flow problem (MCFP). We also address coupling with solids in our framework and demonstrate advantages over prior work.
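To make the LP idea concrete, here is a toy transportation-style program that bounds how many particles each grid cell may hold, solved with scipy. The cost model and problem sizes are invented, and the band acceleration and MCFP correction step are omitted entirely.

```python
# Toy LP in the spirit of the abstract: assign particles to grid cells
# while capping the particle count per cell. All values are illustrative.
import numpy as np
from scipy.optimize import linprog

P, C, cap = 6, 4, 2                    # particles, cells, max particles/cell
cost = np.random.rand(P, C)            # e.g. distance from particle to cell

# Variables x[p, c] >= 0, flattened row-major.
A_eq = np.zeros((P, P * C))            # each particle is fully assigned
for p in range(P):
    A_eq[p, p * C:(p + 1) * C] = 1
A_ub = np.zeros((C, P * C))            # each cell holds at most `cap`
for c in range(C):
    A_ub[c, c::C] = 1

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=np.full(C, cap),
              A_eq=A_eq, b_eq=np.ones(P), bounds=(0, 1))
assignment = res.x.reshape(P, C)       # fractional particle-to-cell assignment
```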

Product embeddings serve as a cornerstone for a wide range of applications in eCommerce. An embedding learned from multiple modalities shows significant improvement over one learned from a single modality, since different modalities provide complementary information. However, some modalities are more informatively dominant than others, and teaching a model to learn embeddings from different modalities without neglecting the less dominant ones is challenging. We present an image-text embedding model (ITEm), an unsupervised learning method designed to better attend to both image and text modalities. We extend BERT by (1) learning an embedding from text and image without knowing the regions of interest; (2) training a global representation to predict masked words and to reconstruct masked image patches without their individual representations. We evaluate the pre-trained ITEm on two tasks: the search for extremely similar products and the prediction of product categories, showing substantial gains over strong baseline models.
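A toy version of the training signal described in (2), where a single global vector must serve both masked-word prediction and masked-patch reconstruction, might look like the following. All dimensions, layer counts, and the mean-pooling choice are illustrative, not ITEm's actual design.

```python
# Toy model: one global vector predicts masked words and reconstructs
# masked image patches, so it cannot lean on only the dominant modality.
import torch
import torch.nn as nn

class ToyITEm(nn.Module):
    def __init__(self, vocab=30522, dim=256, patch_dim=768):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.patch = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.word_head = nn.Linear(dim, vocab)       # predict masked words
        self.patch_head = nn.Linear(dim, patch_dim)  # reconstruct masked patches

    def forward(self, token_ids, patches):
        seq = torch.cat([self.tok(token_ids), self.patch(patches)], dim=1)
        g = self.encoder(seq).mean(dim=1)            # global representation
        return self.word_head(g), self.patch_head(g)

logits, recon = ToyITEm()(torch.randint(0, 30522, (2, 16)),
                          torch.randn(2, 49, 768))
```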

Vehicular networks use Decentralized Congestion Control (DCC) mechanisms to operate effectively, but these mechanisms may introduce queuing delays. The freshness of Cooperative Awareness Messages (CAMs) is critical for their usefulness. In this letter, we explore how the presence of traffic other than CAMs, even at lower priorities, degrades the freshness of CAM messages due to DCC queuing. Finally, we propose Generate-on-Time (GoT), a simple mechanism that reduces DCC queuing delays for CAM messages without any downside in other performance metrics.
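The gist of Generate-on-Time can be sketched as scheduling CAM generation just before the next DCC transmit opportunity, so the message barely ages in the queue. The fixed-interval gate and latency constant below are toy assumptions, not the letter's protocol details.

```python
# Toy Generate-on-Time scheduler: generate the CAM just in time for the
# next DCC transmit opportunity instead of letting it age in the queue.
def next_tx_opportunity(now, dcc_interval=0.1):
    """Next instant DCC allows a transmission (toy fixed-interval gate)."""
    return ((now // dcc_interval) + 1) * dcc_interval

def schedule_cam(now, gen_latency=0.005):
    tx_time = next_tx_opportunity(now)
    gen_time = tx_time - gen_latency   # generate just before transmission
    return gen_time, tx_time

gen, tx = schedule_cam(now=1.234)
print(f"generate CAM at {gen:.3f}s, transmit at {tx:.3f}s")
```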

The relevant features for a machine learning task may arrive as one or more continuous streams of data. Serving machine learning models over streams of data creates a number of interesting systems challenges in managing data routing, time-synchronization, and rate control. This paper presents EdgeServe, a distributed streaming system that can serve predictions from machine learning models in real time. We evaluate EdgeServe on three streaming prediction tasks: (1) human activity recognition, (2) autonomous driving, and (3) network intrusion detection.
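A minimal sketch of the time-synchronization challenge such a system faces: joining two feature streams on time windows before invoking a model. The windowing scheme and placeholder model are assumptions for illustration, not EdgeServe's API.

```python
# Toy streaming-serving pattern: window-align two feature streams, then
# invoke a model on each time-aligned feature set.
from collections import defaultdict

def predict(features):                 # placeholder model
    return sum(features) > 1.0

def serve(stream_a, stream_b, window=0.1):
    buckets = defaultdict(dict)        # window index -> {stream name: value}
    for name, events in (("a", stream_a), ("b", stream_b)):
        for ts, value in events:
            buckets[int(ts / window)][name] = value
    for idx in sorted(buckets):
        b = buckets[idx]
        if "a" in b and "b" in b:      # only serve time-aligned feature sets
            yield idx * window, predict([b["a"], b["b"]])

for ts, y in serve([(0.01, 0.6), (0.12, 0.2)], [(0.03, 0.7), (0.11, 0.1)]):
    print(ts, y)
```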

Knowledge graphs are important resources for many artificial intelligence tasks but often suffer from incompleteness. In this work, we propose to use pre-trained language models for knowledge graph completion. We treat triples in knowledge graphs as textual sequences and propose a novel framework named Knowledge Graph Bidirectional Encoder Representations from Transformers (KG-BERT) to model these triples. Our method takes the entity and relation descriptions of a triple as input and computes a scoring function for the triple with the KG-BERT language model. Experimental results on multiple benchmark knowledge graphs show that our method achieves state-of-the-art performance on triple classification, link prediction, and relation prediction tasks.
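Since the abstract describes the method concretely (serialize a triple's descriptions into one sequence and score it with BERT), a short sketch follows. It uses stock Hugging Face checkpoints rather than the authors' weights, so the randomly initialized classification head would first need fine-tuning on labeled triples before the scores mean anything.

```python
# KG-BERT-style triple scoring: serialize (head, relation, tail) text and
# score it with a BERT sequence classifier (head untrained in this sketch).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

def score_triple(head, relation, tail):
    text = f"{head} [SEP] {relation} [SEP] {tail}"
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # plausibility score

print(score_triple("Barack Obama", "born in", "Honolulu"))
```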

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
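For reference, the permutation-based objective the abstract alludes to can be written out. With $\mathcal{Z}_T$ denoting the set of all permutations of the index sequence $[1, 2, \dots, T]$, and $z_t$ and $\mathbf{z}_{<t}$ the $t$-th element and first $t-1$ elements of a permutation $\mathbf{z}$, XLNet maximizes the expected log-likelihood over factorization orders:

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[ \sum_{t=1}^{T}
  \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right)
\right]
```

In expectation, each token is conditioned on every other position across factorization orders, which is how bidirectional context is captured without corrupting the input with masks.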
