高清一区二区三区视频在线观看,精品人妻视频一区二区三区,欧美黑白配黄色高清免费一级视频,国产精品免费在线一区二区

We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual planning into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution. Project website: //cfeng16.github.io/this-and-that/.

相關內容

控制器

關注 5

Vision · Analysis · MoDELS · 計算機視覺 · 數據分析 ·

2024 年 8 月 22 日

Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis

Zhixiang Guo,Xinming Wu,Luming Liang,Hanlin Sheng,Nuo Chen,Zhengfa Bi

We explore adapting foundation models (FMs) from the computer vision domain to geoscience. FMs, large neural networks trained on massive datasets, excel in diverse tasks with remarkable adaptability and generality. However, geoscience faces challenges like lacking curated training datasets and high computational costs for developing specialized FMs. This study considers adapting FMs from computer vision to geoscience, analyzing their scale, adaptability, and generality for geoscientific data analysis. We introduce a workflow that leverages existing computer vision FMs, fine-tuning them for geoscientific tasks, reducing development costs while enhancing accuracy. Through experiments, we demonstrate this workflow's effectiveness in broad applications to process and interpret geoscientific data of lunar images, seismic data, DAS arrays and so on. Our findings introduce advanced ML techniques to geoscience, proving the feasibility and advantages of cross-domain FMs adaptation, driving further advancements in geoscientific data analysis and offering valuable insights for FMs applications in other scientific domains.

可約的 · WiFi · 分離的 · 模型評估 · MoDELS ·

2024 年 8 月 21 日

Anteumbler: Non-Invasive Antenna Orientation Error Measurement for WiFi APs

Dawei Yan,Panlong Yang,Fei Shang,Nikolaos M. Freris,Yubo Yan

The performance of WiFi-based localization systems is affected by the spatial accuracy of WiFi AP. Compared with the imprecision of AP location and antenna separation, the imprecision of AP's or antenna's orientation is more important in real scenarios, including AP rotation and antenna irregular tilt. In this paper, we propose Anteumbler that non-invasively, accurately and efficiently measures the orientation of each antenna in physical space. Based on the fact that the received power is maximized when a Tx-Rx antenna pair is perfectly aligned, we construct a spatial angle model that can obtain the antennas' orientations without prior knowledge. However, the sampling points of traversing the spatial angle need to cover the entire space. We use the orthogonality of antenna directivity and polarization and adopt an iterative algorithm to reduce the sampling points by hundreds of times, which greatly improves the efficiency. To achieve the required antenna orientation accuracy, we eliminate the influence of propagation distance using a dual plane intersection model and filter out ambient noise. Our real-world experiments with six antenna types, two antenna layouts and two antenna separations show that Anteumbler achieves median errors below 6 degree for both elevation and azimuth angles, and is robust to NLoS and dynamic environments. Last but not least, for the reverse localization system, we deploy Anteumbler over LocAP and reduce the antenna separation error by 10 mm, while for the user localization system, we deploy Anteumbler over SpotFi and reduce the user localization error by more than 1 m.

MoDELS · 過擬合 · 損失 · 損失函數（機器學習） · Processing（編程語言） ·

2024 年 8 月 21 日

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks

Guang Yang,Yu Zhou,Xiang Chen,Xiangyu Zhang,Terry Yue Zhuo,David Lo,Taolue Chen

from arxiv, Under Review; Waiting for updates

Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. The previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality, working well in some models and scenarios but failing in others, thus fall short in consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of ``early learning" as a general occurrence during the training of CLMs. This phenomenon refers to that a model initially focuses on the main features of training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then analyze that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function DeCE (Deceptive Cross-Entropy) by blending deceptive distributions and applying label smoothing to limit the gradient to be bounded, which prevents the model from overfitting to backdoor triggers and then enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.

ReQuEST · Vision · 機器人 · Performer · Integration ·

2024 年 8 月 19 日

Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations

Koffivi Fidèle Gbagbe,Miguel Altamirano Cabrera,Ali Alabbas,Oussama Alyunes,Artem Lykov,Dzmitry Tsetserukou

from arxiv, The paper was accepted to the IEEE SMC 2024

This research introduces the Bi-VLA (Vision-Language-Action) model, a novel system designed for bimanual robotic dexterous manipulation that seamlessly integrates vision for scene understanding, language comprehension for translating human instructions into executable code, and physical action generation. We evaluated the system's functionality through a series of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to prepare the requested salad. We assessed the system's performance in terms of accuracy, efficiency, and adaptability to different salad recipes and human preferences through a series of experiments. Our results show a 100% success rate in generating the correct executable code by the Language Module, a 96.06% success rate in detecting specific ingredients by the Vision Module, and an overall success rate of 83.4% in correctly executing user-requested tasks.

Learning · contrastive · 表示 · Microsoft Surface · 對比學習 ·

2024 年 8 月 15 日

HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning

Hongyu Li,Snehal Dikhale,Jinda Cui,Soshi Iba,Nawid Jamali

from arxiv, Accepted by IROS 2024

To achieve dexterity comparable to that of humans, robots must intelligently process tactile sensor data. Taxel-based tactile signals often have low spatial-resolution, with non-standardized representations. In this paper, we propose a novel framework, HyperTaxel, for learning a geometrically-informed representation of taxel-based tactile signals to address challenges associated with their spatial resolution. We use this representation and a contrastive learning objective to encode and map sparse low-resolution taxel signals to high-resolution contact surfaces. To address the uncertainty inherent in these signals, we leverage joint probability distributions across multiple simultaneous contacts to improve taxel hyper-resolution. We evaluate our representation by comparing it with two baselines and present results that suggest our representation outperforms the baselines. Furthermore, we present qualitative results that demonstrate the learned representation captures the geometric features of the contact surface, such as flatness, curvature, and edges, and generalizes across different objects and sensor configurations. Moreover, we present results that suggest our representation improves the performance of various downstream tasks, such as surface classification, 6D in-hand pose estimation, and sim-to-real transfer.

最大平均偏差 · 均值 · 核化 · Machine Learning · Learning ·

2024 年 8 月 9 日

HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling

Piotr Keller,Muhammad Dawood,Brinder Singh Chohan,Fayyaz ul Amir Afsar Minhas

from arxiv, 28 pages, 5 figures, 1 Table. Preprint for article in review at Nature Machine Intelligence

Machine learning in computational pathology (CPath) often aggregates patch-level predictions from multi-gigapixel Whole Slide Images (WSIs) to generate WSI-level prediction scores for crucial tasks such as survival prediction and drug effect prediction. However, current methods do not explicitly characterize distributional differences between patch sets within WSIs. We introduce HistoKernel, a novel Maximum Mean Discrepancy (MMD) kernel that measures distributional similarity between WSIs for enhanced prediction performance on downstream prediction tasks. Our comprehensive analysis demonstrates HistoKernel's effectiveness across various machine learning tasks, including retrieval (n = 9,362), drug sensitivity regression (n = 551), point mutation classification (n = 3,419), and survival analysis (n = 2,291), outperforming existing deep learning methods. Additionally, HistoKernel seamlessly integrates multi-modal data and offers a novel perturbation-based method for patch-level explainability. This work pioneers the use of kernel-based methods for WSI-level predictive modeling, opening new avenues for research. Code is available at //github.com/pkeller00/HistoKernel.

可約的 · 損失 · Performer · state-of-the-art · 全 ·

2024 年 8 月 6 日

RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Danil Gusak,Gleb Mezentsev,Ivan Oseledets,Evgeny Frolov

from arxiv, 5 pages, accepted for CIKM'24

Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while allowing one to enjoy the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts training peak memory usage by up to 12 times compared to existing methods while retaining or exceeding performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.

有偏 · MoDELS · Taxonomy · INFORMS · 可辨認的 ·

2024 年 8 月 4 日

ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

Robert Wolfe,Alexis Hiniker,Bill Howe

from arxiv, Accepted at Artificial Intelligence, Ethics, and Society 2024

This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.

聯邦學習 · Learning · TransAct · 模型評估 · Analysis ·

2024 年 8 月 3 日

Fed-RD: Privacy-Preserving Federated Learning for Financial Crime Detection

Md. Saikat Islam Khan,Aparna Gupta,Oshani Seneviratne,Stacy Patterson

We introduce Federated Learning for Relational Data (Fed-RD), a novel privacy-preserving federated learning algorithm specifically developed for financial transaction datasets partitioned vertically and horizontally across parties. Fed-RD strategically employs differential privacy and secure multiparty computation to guarantee the privacy of training data. We provide theoretical analysis of the end-to-end privacy of the training algorithm and present experimental results on realistic synthetic datasets. Our results demonstrate that Fed-RD achieves high model accuracy with minimal degradation as privacy increases, while consistently surpassing benchmark results.

知識 (knowledge) · MoDELS · Processing（編程語言） · 大語言模型 · INTERACT ·

2024 年 8 月 2 日

BioRAG: A RAG-LLM Framework for Biological Question Reasoning

Chengrui Wang,Qingqing Long,Xiao Meng,Xunxin Cai,Chengjun Wu,Zhen Meng,Xuezhi Wang,Yuanchun Zhou

from arxiv, 12 pages, 7 figures

The question-answering system for Life science research, which is characterized by the rapid pace of discovery, evolving insights, and complex interactions among knowledge entities, presents unique challenges in maintaining a comprehensive knowledge warehouse and accurate information retrieval. To address these issues, we introduce BioRAG, a novel Retrieval-Augmented Generation (RAG) with the Large Language Models (LLMs) framework. Our approach starts with parsing, indexing, and segmenting an extensive collection of 22 million scientific papers as the basic knowledge, followed by training a specialized embedding model tailored to this domain. Additionally, we enhance the vector retrieval process by incorporating a domain-specific knowledge hierarchy, which aids in modeling the intricate interrelationships among each query and context. For queries requiring the most current information, BioRAG deconstructs the question and employs an iterative retrieval process incorporated with the search engine for step-by-step reasoning. Rigorous experiments have demonstrated that our model outperforms fine-tuned LLM, LLM with search engines, and other scientific RAG frameworks across multiple life science question-answering tasks.