
The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platforms, LimSim++ addresses the need for a long-term closed-loop infrastructure supporting continuous learning and improved generalization in autonomous driving. The platform offers extended-duration, multi-scenario simulations, providing crucial information for (M)LLM-driven vehicles. Users can engage in prompt engineering, model evaluation, and framework enhancement, making LimSim++ a versatile tool for research and practice. This paper additionally introduces a baseline (M)LLM-driven framework, systematically validated through quantitative experiments across diverse scenarios. The open-source resources of LimSim++ are available at: //pjlab-adg.github.io/limsim_plus/.

Related Content

The rapid evolution of Vehicular Ad-hoc NETworks (VANETs) has ushered in a transformative era for intelligent transportation systems (ITS), significantly enhancing road safety and vehicular communication. However, the intricate and dynamic nature of VANETs presents formidable challenges, particularly in vehicle-to-infrastructure (V2I) communications. Roadside Units (RSUs), integral components of VANETs, are increasingly susceptible to cyberattacks, such as jamming and distributed denial of service (DDoS) attacks. These vulnerabilities pose grave risks to road safety, potentially leading to traffic congestion and vehicle malfunctions. Existing methods face difficulties in detecting dynamic attacks and integrating digital twin technology and artificial intelligence (AI) models to enhance VANET cybersecurity. Our study proposes a novel framework that combines digital twin technology with AI to enhance the security of RSUs in VANETs and address this gap. This framework enables real-time monitoring and efficient threat detection while also improving computational efficiency and reducing data transmission delay for increased energy efficiency and hardware durability. Our framework outperforms existing solutions in resource management and attack detection. It reduces RSU load and data transmission delay while achieving an optimal balance between resource consumption and high attack detection effectiveness. This highlights our commitment to secure and sustainable vehicular communication systems for smart cities.

In recent years, there has been a gradual increase in the performance of Complementary Metal Oxide Semiconductor (CMOS) cameras. These cameras have gained popularity as a viable alternative to charge-coupled device (CCD) cameras in a wide range of applications. One particular application is the CMOS camera installed in small space telescopes. However, the limited power and spatial resources available on satellites present challenges in maintaining ideal observation conditions, including temperature and radiation environment. Consequently, images captured by CMOS cameras are susceptible to issues such as dark current noise and defective pixels. In this paper, we introduce a data-driven framework for mitigating dark current noise and correcting defective pixels in CMOS cameras. Our approach involves two key steps: pixel clustering and function fitting. During the pixel clustering step, we identify and group pixels exhibiting similar dark current noise properties. Subsequently, in the function fitting step, we formulate functions that capture the relationship between dark current and temperature, as dictated by the Arrhenius law. Our framework leverages ground-based test data to establish distinct temperature-dark current relations for pixels within different clusters. The clustering results can then be used to estimate the dark current noise level and detect bad pixels from real observational data. To assess the effectiveness of our approach, we have conducted tests using real observation data obtained from the Yangwang-1 satellite, equipped with a near-ultraviolet telescope and an optical telescope. The results show a considerable improvement in the detection efficiency of space-based telescopes.
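The function-fitting step above rests on the Arrhenius law, under which dark current grows as D(T) = D0 · exp(-Ea / (kB·T)). A minimal sketch of such a fit, linearised to ln D = ln D0 - (Ea/kB)(1/T) and solved by ordinary least squares; the function name and the use of eV units are illustrative assumptions, not the paper's implementation:

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def fit_arrhenius(temps_k, dark_currents):
    """Fit D(T) = D0 * exp(-Ea / (kB * T)) by linearising to
    ln D = ln D0 - (Ea / kB) * (1 / T) and applying least squares.

    temps_k: temperatures in kelvin; dark_currents: positive dark-current values.
    Returns (D0, Ea) with Ea in eV."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(d) for d in dark_currents]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    e_a = -slope * K_BOLTZMANN_EV   # activation energy in eV
    d0 = math.exp(intercept)        # pre-exponential factor
    return d0, e_a
```

In the paper's setting, one such fit would be performed per pixel cluster rather than per pixel, using ground-based calibration data.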

Although farmers in Sub-Saharan Africa are accessing feature phones and smartphones at historically high rates, they face challenges finding a robust network of agricultural contacts. With collaborators, we conduct a quantitative survey of 1014 agricultural households in Kagera, Tanzania to characterize technology access, use, and comfort levels in the region. Recognizing the paucity of research on dual-platform technologies that cater to both feature phone and smartphone users, we develop and deploy eKichabi v2, a searchable directory of 9833 agriculture-related enterprises accessible via a USSD application and an Android application. To bridge the gap in affordances between the two applications, we conduct a mixed methods pilot leveraging mobile money agents as intermediators for our USSD application's users. Through our investigations, we identify the advantages, obstacles, and critical considerations in the design, implementation, and scalability of agricultural information systems tailored to both feature phone and smartphone users in Sub-Saharan Africa.

Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. Existing works show that appropriate prompt design, such as Chain-of-Thought, can unlock an LLM's powerful capacity in diverse areas. However, when handling tasks involving repetitive sub-tasks and/or deceptive content, such as arithmetic calculation and article-level fake news detection, existing prompting strategies either suffer from insufficient expressive power or intermediate errors triggered by hallucination. To make LLMs more discerning of such intermediate errors, we propose to guide the LLM with a Divide-and-Conquer program that simultaneously ensures superior expressive power and disentangles the task decomposition, sub-task resolution, and resolution assembly processes. Theoretical analysis reveals that our strategy can guide LLMs to extend the expressive power of fixed-depth Transformers. Experiments indicate that our proposed method can achieve better performance than typical prompting strategies on tasks plagued by intermediate errors and deceptive content, such as large integer multiplication, hallucination detection, and misinformation detection.
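The decomposition/resolution/assembly split the abstract describes can be made concrete on its large-integer-multiplication example. A hedged sketch, where `solve_subtask` stands in for a single easy query an LLM (or a tool) could answer reliably; all function names are hypothetical, not from the paper:

```python
def decompose(n_str, chunk=4):
    """Task decomposition: split a decimal string into `chunk`-digit
    pieces, least significant first."""
    return [n_str[max(0, i - chunk):i] for i in range(len(n_str), 0, -chunk)]

def solve_subtask(a_chunk, b_chunk):
    # Sub-task resolution: a small multiplication, the kind of step
    # an LLM can answer without intermediate errors.
    return int(a_chunk) * int(b_chunk)

def assemble(partials, chunk=4):
    """Resolution assembly: recombine the partial products with the
    correct positional weights (schoolbook multiplication)."""
    base = 10 ** chunk
    return sum(p * base ** (i + j) for (i, j), p in partials.items())

def dc_multiply(a_str, b_str, chunk=4):
    a_parts, b_parts = decompose(a_str, chunk), decompose(b_str, chunk)
    partials = {(i, j): solve_subtask(x, y)
                for i, x in enumerate(a_parts)
                for j, y in enumerate(b_parts)}
    return assemble(partials, chunk)
```

Because the three stages are disentangled, an error in any one partial product is localised to a single sub-task rather than propagating through a monolithic chain of thought.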

We introduce the Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a latent space can be a non-trivial effort. On the other hand, accelerating generation by naively increasing the sampling step size, e.g., DDIM, often leads to quality degradation as it fails to approximate the complex denoising distribution. To address these issues, we propose EMDM, which captures the complex distribution across multiple sampling steps in the diffusion model, allowing for far fewer sampling steps and significant acceleration in generation. This is achieved by a conditional denoising diffusion GAN that captures multimodal data distributions among arbitrary (and potentially larger) step sizes conditioned on control signals, enabling fewer-step motion sampling with high fidelity and diversity. To minimize undesired motion artifacts, geometric losses are imposed during network learning. As a result, EMDM achieves real-time motion generation and significantly improves the efficiency of motion diffusion models compared to existing methods while achieving high-quality motion generation. Our code will be publicly available upon publication.
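The "naively increasing the sampling step size" baseline the abstract contrasts against amounts to running the reverse process on a strided subset of the trained timesteps. A minimal sketch of that schedule selection only (the function name and rounding choice are assumptions; EMDM's contribution is learning the denoising distribution across these larger strides, which is not shown here):

```python
def strided_schedule(num_train_steps, num_sample_steps):
    """Pick an evenly spaced, descending subset of timesteps from a
    diffusion model trained with `num_train_steps` steps, always
    including the t = 0 side of the schedule."""
    if num_sample_steps >= num_train_steps:
        return list(range(num_train_steps - 1, -1, -1))
    stride = num_train_steps / num_sample_steps
    return sorted({int(round(i * stride)) for i in range(num_sample_steps)},
                  reverse=True)
```

With plain DDIM-style sampling, each large stride must be approximated by a Gaussian denoising step, which is where the quality degradation the abstract mentions comes from.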

Large Language Models (LLMs) have demonstrated strong ability in the field of machine translation (MT), yet they suffer from high computational cost and latency. Therefore, transferring translation knowledge from giant LLMs to medium-sized machine translation models is a promising research direction. However, traditional knowledge distillation methods do not take the capabilities of the student and teacher models into consideration, and therefore repeatedly teach student models knowledge they have already learned while failing to extend to novel contexts and knowledge. In this paper, we propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive, and proactive manner. Considering the current translation ability of the student MT model, we only identify and correct its translation errors, instead of distilling the whole translation from the teacher. Leveraging the strong language abilities of LLMs, we instruct LLM teachers to synthesize diverse contexts and anticipate more potential errors for the student. Experimental results on translating both specific language phenomena and general MT benchmarks demonstrate that finetuning the student MT model on about 10% of the examples can achieve results comparable to the traditional knowledge distillation method, and that synthesized potential errors and diverse contexts further improve translation performance on unseen contexts and words.
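The selective step, collecting teacher corrections only where the student actually errs, can be sketched as a simple filtering loop. All names here are hypothetical stand-ins (the paper's error identification and correction are themselves LLM-driven, not simple callables):

```python
def patch_examples(sources, student_translate, detect_error, teacher_correct):
    """Selective distillation sketch: the teacher is consulted only for
    inputs the student translates incorrectly, instead of distilling
    every teacher translation."""
    patches = []
    for src in sources:
        hyp = student_translate(src)
        if detect_error(src, hyp):
            patches.append((src, teacher_correct(src, hyp)))
    return patches
```

The resulting patch set is what the student is finetuned on, which is how training can shrink to roughly 10% of the examples used by full distillation.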

The recent breakthroughs in Large Language Models (LLMs) have mostly focused on languages with easily available and sufficient resources, such as English. However, there remains a significant gap for languages that lack sufficient linguistic resources in the public domain. Our work introduces Komodo-7B, a family of 7-billion-parameter Large Language Models designed to address this gap by seamlessly operating across Indonesian, English, and 11 regional languages of Indonesia. The family consists of Komodo-7B-Base and Komodo-7B-Instruct. Komodo-7B-Instruct stands out by achieving state-of-the-art performance across various tasks and languages, outperforming the benchmarks set by OpenAI's GPT-3.5, Cohere's Aya-101, Llama-2-Chat-13B, Mixtral-8x7B-Instruct-v0.1, Gemma-7B-it, and many more. This model not only demonstrates superior performance in both language-specific and overall assessments but also highlights its capability to excel in linguistic diversity. Our commitment to advancing language models extends beyond well-resourced languages, aiming to bridge the gap for those with limited linguistic assets. Additionally, Komodo-7B-Instruct's better cross-language understanding contributes to addressing educational disparities in Indonesia, offering direct translations from English to 11 regional languages, a significant improvement compared to existing language translation services. Komodo-7B represents a crucial step towards inclusivity and effectiveness in language models, catering to the linguistic needs of diverse communities.

Large Vision Language Models have achieved fine-grained object perception, but the limitation of image resolution remains a significant obstacle to surpassing the performance of task-specific experts in complex and dense scenarios. This limitation further restricts the model's potential to achieve nuanced visual and language referring in domains such as GUI agents, counting, etc. To address this issue, we introduce a unified high-resolution generalist model, Griffon v2, enabling flexible object referring with visual and textual prompts. To efficiently scale up image resolution, we design a simple and lightweight down-sampling projector to overcome the input token constraint in Large Language Models. This design inherently preserves the complete contexts and fine details, and significantly improves multimodal perception ability, especially for small objects. Building upon this, we further equip the model with visual-language co-referring capabilities through a plug-and-play visual tokenizer. It enables user-friendly interaction with flexible target images, free-form texts, and even coordinates. Experiments demonstrate that Griffon v2 can localize any object of interest with visual and textual referring, achieve state-of-the-art performance on REC, phrase grounding, and REG tasks, and outperform expert models in object detection and object counting. Data, code, and models will be released at //github.com/jefferyZhan/Griffon.
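The core idea of a down-sampling projector is to shrink the visual token sequence so a high-resolution image still fits the LLM's input budget. A toy sketch using plain lists and non-overlapping average pooling; the actual Griffon v2 projector is a learned module, so treat everything here as an illustrative assumption:

```python
def downsample_tokens(tokens, factor=2):
    """Reduce a sequence of feature vectors by averaging each
    non-overlapping window of `factor` consecutive tokens
    (the final window may be shorter)."""
    pooled = []
    for start in range(0, len(tokens), factor):
        window = tokens[start:start + factor]
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window)
                       for d in range(dim)])
    return pooled
```

Pooling locally, rather than cropping or resizing the image, is what lets the design keep complete context while cutting the token count by roughly `factor`.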

We propose an objective intelligibility measure (OIM), called the Gammachirp Envelope Similarity Index (GESI), which can predict the speech intelligibility (SI) of simulated hearing loss (HL) sounds for normal hearing (NH) listeners. GESI is an intrusive method that computes the SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. The unique features of GESI are that i) it reflects the hearing impaired (HI) listener's HL that appears in the audiogram and is caused by active and passive cochlear dysfunction, ii) it provides a single goodness metric, as in the widely used STOI and ESTOI, that can be used immediately to evaluate speech enhancement (SE) algorithms, and iii) it provides a simple control parameter to accept the level asymmetry of the reference and test sounds and to deal with individual listening conditions and environments. We evaluated GESI and the conventional OIMs, STOI, ESTOI, MBSTOI, and HASPI versions 1 and 2, using four SI experiments on words of male and female speech sounds in both laboratory and remote environments. GESI was shown to outperform the other OIMs in the evaluations. GESI could be used to improve SE algorithms in assistive listening devices for individual HI listeners.
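At the heart of an intrusive measure like GESI is a similarity comparison between reference and test envelope representations. A plain cosine similarity sketch; GESI's "extended" form adds weighting and level-asymmetry terms that are not modelled here, so this is the base formula only:

```python
import math

def cosine_similarity(ref, test):
    """Plain cosine similarity between two envelope vectors.
    Returns a value in [-1, 1]; 1 means identical direction.
    GESI extends this with additional weighting/level terms."""
    dot = sum(r * t for r, t in zip(ref, test))
    norm = (math.sqrt(sum(r * r for r in ref))
            * math.sqrt(sum(t * t for t in test)))
    return dot / norm if norm else 0.0
```

In an intrusive OIM pipeline, `ref` and `test` would be the filterbank/modulation outputs of the clean and processed signals, and the per-channel similarities would be pooled into the single goodness metric.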

Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are: (1) the generation of high quality images, (2) diversity of image generation, and (3) stable training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress towards addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant, and loss-variant GANs, for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress towards important computer vision application requirements. As we do this we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success along with some suggestions for future research directions. Code related to GAN-variants studied in this work is summarized on //github.com/sheqi/GAN_Review.
