亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper, we propose a novel Risk-Aware Local Trajectory Planner (RALTPER) for autonomous vehicles in complex environments characterized by Gaussian uncertainty. The proposed method integrates risk awareness and trajectory planning by leveraging probabilistic models to evaluate the likelihood of collisions with dynamic and static obstacles. The RALTPER focuses on collision avoidance constraints for both the ego vehicle region and the Gaussian-obstacle risk region. Additionally, this work enhances the generalization of both vehicle and obstacle models, making the planner adaptable to a wider range of scenarios. Our approach formulates the planning problem as a nonlinear optimization, solved using the IPOPT solver within the CasADi environment. The planner is evaluated through simulations of various challenging scenarios, including complex, static, mixed environment and narrow single-lane avoidance of pedestrians. Results demonstrate that RALTPER achieves safer and more efficient trajectory planning particularly in navigating narrow areas where a more accurate vehicle profile representation is critical for avoiding collisions.

相關內容

Navigating complex environments requires Unmanned Aerial Vehicles (UAVs) and autonomous systems to perform trajectory tracking and obstacle avoidance in real-time. While many control strategies have effectively utilized linear approximations, addressing the non-linear dynamics of UAV, especially in obstacle-dense environments, remains a key challenge that requires further research. This paper introduces a Non-linear Model Predictive Control (NMPC) framework for the DJI Matrice 100, addressing these challenges by using a dynamic model and B-spline interpolation for smooth reference trajectories, ensuring minimal deviation while respecting safety constraints. The framework supports various trajectory types and employs a penalty-based cost function for control accuracy in tight maneuvers. The framework utilizes CasADi for efficient real-time optimization, enabling the UAV to maintain robust operation even under tight computational constraints. Simulation and real-world indoor and outdoor experiments demonstrated the NMPC ability to adapt to disturbances, resulting in smooth, collision-free navigation.

We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) that utilizes Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. This span uncertainty enhances model calibration, improving robustness and mitigating semantic inconsistencies introduced by random chunking. Leveraging this insight, we propose an efficient unsupervised learning technique to train the retrieval model, alongside an effective data sampling and scaling strategy. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results while using only 4% of the training data compared to other advanced open-source retrieval models under distribution shift settings. Our method demonstrates strong calibration through span uncertainty, leading to improved generalization and robustness in long-context RAG tasks. Additionally, UncertaintyRAG provides a lightweight retrieval model that can be integrated into any large language model with varying context window lengths, without the need for fine-tuning, showcasing the flexibility of our approach.

Existing 3D facial emotion modeling have been constrained by limited emotion classes and insufficient datasets. This paper introduces "Emo3D", an extensive "Text-Image-Expression dataset" spanning a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, facilitating the capture of a broad spectrum of emotional expressions. Using this unique dataset, we conduct a comprehensive evaluation of language-based models' fine-tuning and vision-language models like Contranstive Language Image Pretraining (CLIP) for 3D facial expression synthesis. We also introduce a new evaluation metric for this task to more directly measure the conveyed emotion. Our new evaluation metric, Emo3D, demonstrates its superiority over Mean Squared Error (MSE) metrics in assessing visual-text alignment and semantic richness in 3D facial expressions associated with human emotions. "Emo3D" has great applications in animation design, virtual reality, and emotional human-computer interaction.

We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Frechet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.

Generative Engineering Design approaches driven by Deep Generative Models (DGM) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models depending on the level of detail. DGMs have been successfully employed for synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, DGM-driven generation process is typically triggered based on random noisy inputs, which outputs unpredictable samples and thus cannot perform an efficient industrial design exploration. We tackle these challenges by proposing a novel model Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.

We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and the ground agent with an obfuscated view; both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view with the target, we note a drop in performance (-13% in OSR and -13% in SPL) of a fully cooperative assistance scheme over an unassisted baseline. In contrast, a selective assistance scheme where the ground agent retains its independent exploratory behaviour shows a 10% OSR and 7.65% SPL improvement. To explain navigation performance, we analyze the GC for unique traits, quantifying the presence of hallucination and cooperation. Specifically, we identify the novel linguistic trait of preemptive hallucination in our embodied setting, where the overhead agent assumes that the ground agent has executed an action in the dialogue when it is yet to move, and note its strong correlation with navigation performance. We conduct real-world experiments and present some qualitative examples where we mitigate hallucinations via prompt finetuning to improve ObjectNav performance.

To ensure privacy protection and alleviate computational burden, we propose a fast subsmaling procedure for the Cox model with massive survival datasets from multi-centered, decentralized sources. The proposed estimator is computed based on optimal subsampling probabilities that we derived and enables transmission of subsample-based summary level statistics between different storage sites with only one round of communication. For inference, the asymptotic properties of the proposed estimator were rigorously established. An extensive simulation study demonstrated that the proposed approach is effective. The methodology was applied to analyze a large dataset from the U.S. airlines.

The impressive performance of proprietary LLMs like GPT4 in code generation has led to a trend to replicate these capabilities in open-source models through knowledge distillation (e.g. Code Evol-Instruct). However, these efforts often neglect the crucial aspect of response quality, relying heavily on teacher models for direct response distillation. This paradigm, especially for complex instructions, can degrade the quality of synthesized data, compromising the knowledge distillation process. To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation. The first stage, modular decomposition, breaks down the direct response into more manageable sub-modules. The second stage, adaptive response evolution, automatically evolves the response with the related function modules. Our experiments with three popular code benchmarks (HumanEval, MBPP, and EvalPlus) attest to the superiority of the AMR-Evol framework over baseline response distillation methods. By comparing with the open-source Code LLMs trained on a similar scale of data, we observed performance enhancements: more than +3.0 points on HumanEval-Plus and +1.0 points on MBPP-Plus, which underscores the effectiveness of our framework. Our codes are available at //github.com/ChiYeungLaw/AMR-Evol.

The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this end, we propose a Self-Challenge evaluation framework with human-in-the-loop. Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances and incorporate human feedback on them to refine these patterns for generating more challenging data, iteratively. We end up with 8 diverse patterns, such as text manipulation and questions with assumptions. We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses. The SC-G4 serves as a challenging benchmark that allows for a detailed assessment of LLMs' abilities. Our results show that only 44.96\% of instances in SC-G4 can be answered correctly by GPT-4. Interestingly, our pilot study indicates that these error patterns also challenge other LLMs, such as Claude-3 and Llama-3, and cannot be fully resolved through fine-tuning. Our work takes the first step to demonstrate that LLMs can autonomously identify their inherent flaws and provide insights for future dynamic and automatic evaluation.

We present CROSS-GAiT, a novel algorithm for quadruped robots that uses Cross Attention to fuse terrain representations derived from visual and time-series inputs, including linear accelerations, angular velocities, and joint efforts. These fused representations are used to adjust the robot's step height and hip splay, enabling adaptive gaits that respond dynamically to varying terrain conditions. We generate these terrain representations by processing visual inputs through a masked Vision Transformer (ViT) encoder and time-series data through a dilated causal convolutional encoder. The cross-attention mechanism then selects and integrates the most relevant features from each modality, combining terrain characteristics with robot dynamics for better-informed gait adjustments. CROSS-GAiT uses the combined representation to dynamically adjust gait parameters in response to varying and unpredictable terrains. We train CROSS-GAiT on data from diverse terrains, including asphalt, concrete, brick pavements, grass, dense vegetation, pebbles, gravel, and sand. Our algorithm generalizes well and adapts to unseen environmental conditions, enhancing real-time navigation performance. CROSS-GAiT was implemented on a Ghost Robotics Vision 60 robot and extensively tested in complex terrains with high vegetation density, uneven/unstable surfaces, sand banks, deformable substrates, etc. We observe at least a 7.04% reduction in IMU energy density and a 27.3% reduction in total joint effort, which directly correlates with increased stability and reduced energy usage when compared to state-of-the-art methods. Furthermore, CROSS-GAiT demonstrates at least a 64.5% increase in success rate and a 4.91% reduction in time to reach the goal in four complex scenarios. Additionally, the learned representations perform 4.48% better than the state-of-the-art on a terrain classification task.

北京阿比特科技有限公司