
This work highlights a critical shortcoming of text-based Large Language Models (LLMs) used for human-robot interaction, demonstrating that text alone as a conversation modality falls short in such applications. While LLMs excel at processing the text of these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present "Beyond Text", an approach that improves LLM decision-making by integrating audio transcription with a subset of these features that focus on affect and are more relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 48.30%, but also enhances robustness against token manipulation adversarial attacks, as shown by a 22.44% smaller decrease ratio in winning rate than that of the text-only language model. "Beyond Text" marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models.

Related content

Large language models


A large language model is a deep learning model trained on massive amounts of text data. It can not only generate natural-language text but also understand the meaning of text in depth and handle a variety of natural-language tasks such as text summarization, question answering, and translation. In 2023, large language models and their applications in artificial intelligence became a focus of technology research worldwide. Their growth in scale has been especially striking: parameter counts have jumped from just over a billion at the outset to a trillion today. The larger parameter counts allow the models to capture the subtleties of human language more finely and to understand its complexity more deeply. Over the past year, large language models have improved markedly in absorbing new knowledge, decomposing complex tasks, and aligning images with text. As the technology matures, its range of applications will keep expanding, providing people with more intelligent and personalized services and further improving the way people live and work.

INTERACT · MoDELS · Language modeling · Learning · Reducible

March 19, 2024

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Yangyi Chen,Karan Sikka,Michael Cogswell,Heng Ji,Ajay Divakaran

from arxiv, CVPR 2024. The feedback datasets are released at: //huggingface.co/datasets/YangyiYY/LVLM_NLF

We present DRESS, a large vision-language model (LVLM) that exploits Natural Language Feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations of state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generating unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these limitations, we propose a novel categorization of NLF into two key types: critique and refinement. The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences. The refinement NLF offers concrete suggestions for improvement and is adopted to improve the interaction ability of the LVLMs, that is, their ability to refine responses by incorporating feedback in multi-turn interactions. To address the non-differentiable nature of NLF, we generalize conditional reinforcement learning for training. Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses, and can learn from feedback during multi-turn interactions more effectively than SOTA LVLMs.

Image segmentation · Segment Anything · Prompt · Performer · Estimation/estimator

March 18, 2024

Enhancing the Reliability of Segment Anything Model for Auto-Prompting Medical Image Segmentation with Uncertainty Rectification

Yichi Zhang,Shiyao Hu,Sijie Ren,Chen Jiang,Yuan Cheng,Yuan Qi

The Segment Anything Model (SAM) has recently emerged as a groundbreaking foundation model for prompt-driven image segmentation tasks. However, both the original SAM and its medical variants require slice-by-slice manual prompting of target structures, which directly increases the burden for applications. Despite attempts at auto-prompting to make SAM fully automatic, it still exhibits subpar performance and lacks reliability, especially in the field of medical imaging. In this paper, we propose UR-SAM, an uncertainty-rectified SAM framework that enhances the reliability of auto-prompting medical image segmentation. Building upon a localization framework for automatic prompt generation, our method incorporates a prompt augmentation module, which obtains a series of input prompts for SAM for uncertainty estimation, and an uncertainty-based rectification module, which uses the distribution of estimated uncertainty to improve the segmentation performance. Extensive experiments on two public 3D medical datasets covering the segmentation of 35 organs demonstrate that, without supplementary training or fine-tuning, our method further improves the segmentation performance by up to 10.7% and 13.8% in Dice similarity coefficient, demonstrating efficiency and broad capabilities for medical image segmentation without manual prompting.
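The two quantities this abstract relies on, the Dice similarity coefficient and an uncertainty estimate derived from several prompted predictions, can be illustrated with a minimal sketch. This is not the UR-SAM implementation: the function names `dice_coefficient` and `pixelwise_entropy` are hypothetical, masks are assumed to be flat lists of 0/1 values, and binary entropy of the mean vote is only one plausible choice of uncertainty measure.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixelwise_entropy(masks):
    """Binary entropy (in bits) of the mean foreground vote at each pixel.

    `masks` holds one binary mask per augmented prompt (an assumption about
    how the predictions are collected); a high entropy marks pixels where
    the predictions disagree.
    """
    n = len(masks)
    entropies = []
    for votes in zip(*masks):
        p = sum(votes) / n
        if p in (0.0, 1.0):
            entropies.append(0.0)  # all predictions agree
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies
```

For example, `pixelwise_entropy([[1, 1, 0], [1, 0, 0]])` flags only the middle pixel as uncertain, since that is the only position where the two masks disagree.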

Generative AI · AI · MoDELS · University · Taxonomy

March 18, 2024

Embracing the Generative AI Revolution: Advancing Tertiary Education in Cybersecurity with GPT

Raza Nowrozy,David Jam

The rapid advancement of generative Artificial Intelligence (AI) technologies, particularly Generative Pre-trained Transformer (GPT) models such as ChatGPT, has the potential to significantly impact cybersecurity. In this study, we investigated the impact of GPTs, specifically ChatGPT, on tertiary education in cybersecurity, and provided recommendations for universities to adapt their curricula to meet the evolving needs of the industry. Our research highlighted the importance of understanding the alignment between GPT's "mental model" and human cognition, as well as the enhancement of GPT capabilities to human skills based on Bloom's taxonomy. By analyzing current educational practices and the alignment of curricula with industry requirements, we concluded that universities providing practical degrees like cybersecurity should align closely with industry demand and embrace the inevitable generative AI revolution, while applying stringent ethics oversight to safeguard responsible GPT usage. We proposed a set of recommendations focused on updating university curricula, promoting agility within universities, fostering collaboration between academia, industry, and policymakers, and evaluating and assessing educational outcomes.

AVS · Automator · Reviewer · Integration · Design

March 18, 2024

Holistic HMI Design for Automated Vehicles: Bridging In-Vehicle and External Communication

Haoyu Dong,Tram Thi Minh Tran,Pavlo Bazilinskyy,Marius Hoggenmüller,Debargha Dey,Silvia Cazacu,Mervyn Franssen,Ruolin Gao

As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic approach to HMI designs, which promotes the integration of both in-vehicle user and external road user perspectives. This approach aims to create a unified and coherent experience for different stakeholders interacting with AVs. This workshop seeks to bring together designers, engineers, researchers, and other stakeholders to delve into relevant use cases, exploring the potential advantages and challenges of this approach. The insights generated from this workshop aim to inform further design and research in the development of coherent HMIs for AVs, ultimately for more seamless integration of AVs into existing traffic.

MoDELS · contrastive · Latent variable · Extensibility · INFORMS

March 17, 2024

CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

Xiaoyu Wu,Yang Hua,Chumeng Liang,Jiaru Zhang,Hao Wang,Tao Song,Haibing Guan

from arxiv, Accepted by CVPR 2024

Diffusion Models (DMs) have evolved into advanced image generation tools, especially for few-shot generation where a pretrained model is fine-tuned on a small set of images to capture a specific style or object. Despite their success, concerns exist about potential copyright violations stemming from the use of unauthorized data in this process. In response, we present Contrasting Gradient Inversion for Diffusion Models (CGI-DM), a novel method featuring vivid visual representations for digital copyright authentication. Our approach involves removing partial information of an image and recovering missing details by exploiting conceptual differences between the pretrained and fine-tuned models. We formulate the differences as KL divergence between latent variables of the two models when given the same input image, which can be maximized through Monte Carlo sampling and Projected Gradient Descent (PGD). The similarity between original and recovered images serves as a strong indicator of potential infringements. Extensive experiments on the WikiArt and Dreambooth datasets demonstrate the high accuracy of CGI-DM in digital copyright authentication, surpassing alternative validation techniques. Code implementation is available at //github.com/Nicholas0228/Revelio.
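The abstract names Projected Gradient Descent (PGD) as the optimizer used to maximize the divergence. The generic PGD loop can be sketched as follows. This is a toy illustration, not the CGI-DM code: the function `pgd_maximize` is hypothetical, the projection onto an L-infinity ball is an assumption (the abstract does not state the constraint set), and the simple quadratic objective in the usage note stands in for the Monte Carlo estimate of the KL divergence.

```python
def pgd_maximize(grad_fn, x0, epsilon, step_size, num_steps):
    """Projected gradient ascent inside an L-infinity ball of radius epsilon around x0.

    grad_fn(x) must return the gradient of the objective at x as a list.
    Both the signed-gradient step and the L-infinity ball are assumptions
    made for this sketch.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    x = list(x0)
    for _ in range(num_steps):
        grad = grad_fn(x)
        # Ascent step along the sign of the gradient.
        x = [xi + step_size * sign(gi) for xi, gi in zip(x, grad)]
        # Projection: clip each coordinate back into [x0 - epsilon, x0 + epsilon].
        x = [min(max(xi, ci - epsilon), ci + epsilon) for xi, ci in zip(x, x0)]
    return x


def toy_gradient(x, target=(1.0, -1.0)):
    """Gradient of the stand-in objective f(x) = -sum((x_i - target_i)^2)."""
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]
```

Calling `pgd_maximize(toy_gradient, [0.0, 0.0], epsilon=0.3, step_size=0.1, num_steps=10)` moves each coordinate toward the target until the projection stops it at the boundary of the ball, which is the behavior that keeps a recovered image close to its partially removed starting point in a PGD-style search.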

Prompt · Better · Transform · Attention · Inference

March 15, 2024

SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

Cong Wang,Jinshan Pan,Wanyu Lin,Jiangxin Dong,Xiao-Ming Wu

from arxiv, Accepted by AAAI24. Source codes will be made available at: //github.com/supersupercong/SelfPromer

This work presents an effective depth-consistency self-prompt Transformer for image dehazing. It is motivated by an observation that the estimated depths of an image with haze residuals and its clear counterpart vary. Enforcing the depth consistency of dehazed images with clear ones, therefore, is essential for dehazing. For this purpose, we develop a prompt based on the features of depth differences between the hazy input images and corresponding clear counterparts that can guide dehazing models for better restoration. Specifically, we first apply deep features extracted from the input images to the depth difference features for generating the prompt that contains the haze residual information in the input. Then we propose a prompt embedding module that is designed to perceive the haze residuals, by linearly adding the prompt to the deep features. Further, we develop an effective prompt attention module to pay more attention to haze residuals for better removal. By incorporating the prompt, prompt embedding, and prompt attention into an encoder-decoder network based on VQGAN, we can achieve better perception quality. As the depths of clear images are not available at inference, and the dehazed images with one-time feed-forward execution may still contain a portion of haze residuals, we propose a new continuous self-prompt inference that can iteratively correct the dehazing model towards better haze-free image generation. Extensive experiments show that our method performs favorably against the state-of-the-art approaches on both synthetic and real-world datasets in terms of perception metrics including NIQE, PI, and PIQE.

Optimizer · Controller · Stationary · Discretization · Lipschitz

March 15, 2024

Optimal Control of Stationary Doubly Diffusive Flows on Two and Three Dimensional Bounded Lipschitz Domains: Numerical Analysis

Jai Tushar,Arbaz Khan,Manil T. Mohan

In this work, we propose fully nonconforming, locally exactly divergence-free discretizations based on lowest-order Crouzeix-Raviart finite element and piecewise constant spaces to study the optimal control of the stationary double diffusion model presented in [Bürger, Méndez, Ruiz-Baier, SINUM (2019), 57:1318-1343]. The well-posedness of the discrete uncontrolled state and adjoint equations is discussed using discrete lifting and fixed point arguments, and convergence results are derived rigorously under minimal regularity. Building upon our recent work [Tushar, Khan, Mohan arXiv (2023)], we prove the local optimality of a reference control using a second-order sufficient optimality condition for the control problem, and use it along with an optimize-then-discretize approach to prove optimal order a priori error estimates for the control, state and adjoint variables up to the regularity of the solution. The optimal control is computed using a primal-dual active set strategy as a semi-smooth Newton method, and computational tests validate the predicted error decay rates and illustrate the proposed scheme's applicability to optimal control of thermohaline circulation problems.

Language modeling · MoDELS · Unstructured · Automator · INFORMS

March 15, 2024

Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

Eugene Jang,Jian Cui,Dayeon Yim,Youngjin Jin,Jin-Woo Chung,Seungwon Shin,Yongjae Lee

from arxiv, To appear in NAACL Findings 2024

Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (such as URLs and hash values) that could be unsuitable for the established pretraining methodologies. Previous work in other domains has removed or filtered such text as noise, but the effectiveness of these methods has not been investigated, especially in the cybersecurity domain. We propose different pretraining methodologies and evaluate their effectiveness through downstream tasks and probing tasks. Our proposed strategy (selective MLM and jointly training NLE token classification) outperforms the commonly taken approach of replacing non-linguistic elements (NLEs). We use our domain-customized methodology to train CyBERTuned, a cybersecurity domain language model that outperforms other cybersecurity PLMs on most tasks.
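One way to picture "selective MLM with NLE token classification" is a masking routine that leaves non-linguistic elements in place, never selects them as masked-language-modeling targets, and emits a per-token NLE label for an auxiliary classifier. The sketch below is an interpretation of the abstract, not the authors' code: the regular expressions, the whole-token granularity (real models operate on subword tokens), the rule that NLEs are never masked, and the names `is_nle` and `selective_mask` are all assumptions.

```python
import random
import re

# Illustrative patterns only: URLs, and hex strings with MD5/SHA-1/SHA-256 lengths.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)
HASH_RE = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})$", re.IGNORECASE)


def is_nle(token):
    """Return True if the token looks like a non-linguistic element (URL or hash)."""
    return bool(URL_RE.match(token) or HASH_RE.match(token))


def selective_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask only linguistic tokens; keep NLEs intact and label them.

    Returns (masked_tokens, mlm_targets, nle_labels), where mlm_targets holds
    the original token at masked positions and None elsewhere, and nle_labels
    is 1 for NLE tokens and 0 otherwise.
    """
    rng = random.Random(seed)
    nle_labels = [1 if is_nle(tok) else 0 for tok in tokens]
    masked_tokens, mlm_targets = [], []
    for tok, nle in zip(tokens, nle_labels):
        if not nle and rng.random() < mask_prob:
            masked_tokens.append(mask_token)
            mlm_targets.append(tok)
        else:
            masked_tokens.append(tok)  # NLEs are neither replaced nor masked
            mlm_targets.append(None)
    return masked_tokens, mlm_targets, nle_labels
```

In this sketch the NLE labels would feed a token classification loss trained jointly with the MLM loss, while the text itself is passed through unchanged rather than being replaced by a placeholder.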

Large language models · Language modeling · MoDELS · AI · INTERACT

December 23, 2023

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Xupeng Miao,Gabriele Oliaro,Zhihao Zhang,Xinhao Cheng,Hongyi Jin,Tianqi Chen,Zhihao Jia

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Multimodal · Learned · Knowledge · Performer · Lecture notes

May 3, 2022

Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review

Jiaxin Li,Danfeng Hong,Lianru Gao,Jing Yao,Ke Zheng,Bing Zhang,Jocelyn Chanussot

With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is now readily available, which offers researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, research on multimodal RS data fusion has made tremendous progress in recent years, yet these traditional algorithms inevitably hit a performance bottleneck because they cannot comprehensively analyse and interpret such strongly heterogeneous data. This limitation creates an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has achieved remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize some valuable resources for the sake of development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.