
Pedestrian trajectory prediction, vital for self-driving cars and socially aware robots, is complicated by the intricate interactions between pedestrians, their environment, and other Vulnerable Road Users. This paper presents GSGFormer, a generative model that predicts pedestrian trajectories by modeling these complex interactions and producing a diverse set of plausible behavioral modes. We incorporate a heterogeneous graph neural network to capture interactions between pedestrians, semantic maps, and potential destinations. A Transformer module extracts temporal features, while our novel CVAE-Residual-GMM module promotes diverse behavioral modality generation. Evaluations on multiple public datasets show that GSGFormer not only outperforms leading methods when ample data is available but also remains competitive when data is limited.
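The multimodal output stage can be pictured as a Gaussian-mixture head over future displacements. The sketch below is illustrative only, assuming a PyTorch setting with a hypothetical GMMTrajectoryHead class and made-up sizes (hidden dimension, number of modes, horizon); it is not GSGFormer's actual CVAE-Residual-GMM module.

```python
# Illustrative sketch, not GSGFormer's released code: a Gaussian-mixture
# trajectory head that maps an encoded pedestrian state to K candidate
# future trajectories (one per mixture component) and samples one of them.
import torch
import torch.nn as nn

class GMMTrajectoryHead(nn.Module):
    def __init__(self, hidden_dim=128, num_modes=20, horizon=12):
        super().__init__()
        self.num_modes, self.horizon = num_modes, horizon
        self.pi = nn.Linear(hidden_dim, num_modes)                    # mixture weights
        self.mu = nn.Linear(hidden_dim, num_modes * horizon * 2)      # per-step (x, y) means
        self.log_sigma = nn.Linear(hidden_dim, num_modes * horizon * 2)

    def forward(self, h):                        # h: [batch, hidden_dim]
        b = h.size(0)
        pi = torch.softmax(self.pi(h), dim=-1)   # [batch, K]
        mu = self.mu(h).view(b, self.num_modes, self.horizon, 2)
        sigma = self.log_sigma(h).view(b, self.num_modes, self.horizon, 2).exp()
        return pi, mu, sigma

    def sample(self, h):
        pi, mu, sigma = self.forward(h)
        k = torch.multinomial(pi, 1).squeeze(-1)                 # pick one mode per sample
        idx = k.view(-1, 1, 1, 1).expand(-1, 1, self.horizon, 2)
        mu_k = mu.gather(1, idx).squeeze(1)
        sigma_k = sigma.gather(1, idx).squeeze(1)
        return mu_k + sigma_k * torch.randn_like(mu_k)            # one stochastic trajectory
```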

Related Content

The IFIP TC13 Conference on Human-Computer Interaction is an important venue for researchers and practitioners in human-computer interaction to present their work. Over the years, these conferences have attracted researchers from several countries and cultures.
January 29, 2024

Multi-agent motion prediction is a crucial concern in autonomous driving, yet it remains challenging owing to the ambiguous intentions of dynamic agents and their intricate interactions. Existing studies have attempted to capture interactions between road entities using only the definite data from past timesteps, since future information is unavailable and involves high uncertainty. However, without sufficient guidance for capturing the future states of interacting agents, they frequently produce unrealistic trajectory overlaps. In this work, we propose Future Interaction modeling for Motion Prediction (FIMP), which captures potential future interactions in an end-to-end manner. FIMP adopts a future decoder that implicitly extracts potential future information at an intermediate feature level and identifies interacting entity pairs through future affinity learning and a top-k filtering strategy. Experiments show that our future interaction modeling improves performance remarkably, leading to superior results on the Argoverse motion forecasting benchmark.
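The pair-selection idea (future affinity learning followed by top-k filtering) can be sketched in a few lines. The function below is a hedged illustration under assumed names and an assumed k, not FIMP's implementation: pairwise affinities between intermediate agent features are computed, and each agent keeps only its k highest-scoring partners for subsequent message passing.

```python
# Hedged sketch of affinity-based top-k interaction filtering (illustrative only).
import torch

def topk_interacting_pairs(agent_feat, k=4):
    """agent_feat: [N, D] intermediate (future-level) agent features, with N > k."""
    affinity = agent_feat @ agent_feat.t()                  # [N, N] pairwise affinity scores
    affinity.fill_diagonal_(float('-inf'))                  # an agent is not its own partner
    _, idx = affinity.topk(k, dim=-1)                       # k most likely partners per agent
    src = torch.arange(agent_feat.size(0)).unsqueeze(-1).expand_as(idx)
    return torch.stack([src.reshape(-1), idx.reshape(-1)])  # [2, N*k] edge index for message passing
```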

Dialogue systems controlled by predefined or rule-based scenarios derived from counseling techniques, such as cognitive behavioral therapy (CBT), play an important role in mental health apps. Despite the need for responsible responses, it is conceivable that using newly emerging LLMs to generate contextually relevant utterances would enhance these apps. In this study, we construct dialogue modules based on a CBT scenario focused on conventional Socratic questioning using two kinds of LLMs: a Transformer-based dialogue model further trained on a social-media empathetic counseling dataset provided by Osaka Prefecture (OsakaED), and GPT-4, a state-of-the-art LLM created by OpenAI. By comparing systems that use LLM-generated responses with those that do not, we investigate the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality (e.g., empathy). No notable improvements are observed when using the OsakaED model, whereas with GPT-4 the amount of mood change, empathy, and other dialogue qualities improve significantly. These results suggest that GPT-4 possesses a high counseling ability, but they also indicate that a dialogue model trained on a human counseling dataset does not necessarily yield better outcomes than scenario-based dialogues. While presenting LLM-generated responses, including those from GPT-4, directly to users in real-life mental health care services may raise ethical issues, human professionals can still use LLMs in advance to produce example responses or response templates for systems that rely on rules, scenarios, or example responses.

Autonomous overtaking at high speeds is a challenging multi-agent robotics research problem. The high-speed, close-proximity situations that arise in multi-agent autonomous racing require algorithms that trade off aggressive overtaking maneuvers against the risk of collision with the opponent. In this paper, we study a special case of multi-agent autonomous racing, called head-to-head autonomous racing, that involves two racecars with similar performance envelopes. We present a mathematical formulation of overtaking and position defense in this head-to-head scenario, and we introduce the Automaton Referencing Guided Overtake System (ARGOS) framework, which supervises the execution of an overtake or position-defense maneuver depending on the current role of the racecar. The ARGOS framework works by decomposing complex overtake and position-defense maneuvers into sequential and temporal submaneuvers that are individually managed and supervised by a network of automata. We verify the properties of the ARGOS framework using model checking and present results from multiple simulations showing that the framework meets the desired specifications. The ARGOS framework performs similarly to what can be observed in real-world human-driven motorsport racing.

The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which learn the mapping from vulnerable code to fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural-language text, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge available in expert systems. To address this, we propose VulMaster, a Transformer-based neural network model that generates vulnerability repairs by comprehensively understanding the entire vulnerable code, irrespective of its length. The model also integrates diverse information, encompassing vulnerable code structures and expert knowledge from the CWE system. We evaluated VulMaster on a real-world C/C++ vulnerability repair dataset comprising 1,754 projects with 5,800 vulnerable functions. The experimental results demonstrate that VulMaster exhibits substantial improvements over the learning-based state-of-the-art vulnerability repair approach. Specifically, VulMaster improves the EM, BLEU, and CodeBLEU scores from 10.2% to 20.0%, from 21.3% to 29.3%, and from 32.5% to 40.9%, respectively.

Analyzing individual emotions during group conversation is crucial for developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on multiple modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions influenced by an individual's unique behavioral patterns make emotion recognition very challenging. This difficulty is compounded in group settings, where an emotion and its temporal evolution are influenced not only by the individual but also by external context, such as audience reaction and the context of the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network (MAN) that captures cross-modal interactions at various levels of spatial abstraction by jointly learning a set of interacting mode-specific Peripheral and Central networks. The proposed MAN injects cross-modal attention via its Peripheral key-value pairs within each layer of a mode-specific Central query network. The resulting cross-attended mode-specific descriptors are then combined using an Adaptive Fusion technique that enables the model to integrate discriminative and complementary mode-specific data patterns into an instance-specific multimodal descriptor. Given a dialogue represented as a sequence of utterances, the proposed AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level. This not only delivers better classification performance (a 3-5% improvement in Weighted-F1 and a 5-7% improvement in Accuracy) on large-scale public datasets but also helps users understand the reasoning behind each emotion prediction via the model's Multimodal Explainability Visualization module.
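The Peripheral/Central design can be read as cross-attention in which one modality's Central stream provides the queries and another modality's Peripheral stream provides keys and values. The layer below is a minimal sketch under assumed dimensions and class names, not the authors' implementation.

```python
# Minimal cross-modal attention layer (illustrative, not the AMuSE/MAN code):
# the Central query stream attends to Peripheral key/value features.
import torch.nn as nn

class CrossModalLayer(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, central, peripheral):
        # central: [B, Tc, D] query stream; peripheral: [B, Tp, D] key/value stream
        attended, _ = self.attn(central, peripheral, peripheral)
        central = self.norm1(central + attended)
        return self.norm2(central + self.ff(central))
```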

Narrative visualization transforms data into engaging stories, making complex information accessible to a broad audience. Large models inherently facilitate this process through their ability to handle natural-language queries and answers, generate cohesive narratives, and enhance visual communication. Inspired by previous work in narrative visualization and recent advances in large models, we synthesized potential tasks and opportunities for large models at various stages of narrative visualization. In our study, we surveyed 79 papers to explore the role of large models in automating narrative visualization creation. We propose a comprehensive pipeline that leverages large models for crafting narrative visualization, categorizing the reviewed literature into four essential phases: Data, Narration, Visualization, and Presentation. Additionally, we identify nine specific tasks where large models are applied across these stages. This study maps out the landscape of challenges and opportunities in using large models for narrative visualization (LM4NV), providing directions for future research and guidance for scholars in the field.

Leveraging sensing modalities across diverse spatial and temporal resolutions can improve the performance of robotic manipulation tasks. Multi-spatial-resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously, multi-temporal-resolution sensing enables the agent to exhibit high reactivity and real-time control. In this work, we propose MResT (Multi-Resolution Transformer), a framework for learning generalizable language-conditioned multi-task policies that use sensing at different spatial and temporal resolutions, with networks of varying capacities, to perform real-time control of precise and reactive tasks. We leverage off-the-shelf pretrained vision-language models to operate on low-frequency global features, along with small non-pretrained models that adapt to high-frequency local feedback. Through extensive experiments in three domains (coarse, precise, and dynamic manipulation tasks), we show that our approach significantly improves (2x on average) over recent multi-task baselines. Further, our approach generalizes well to visual and geometric variations in target objects and to varying interaction forces.
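The low-/high-frequency split can be pictured as a control loop in which a large pretrained model refreshes global features only every few steps while a small network produces actions at every step. The sketch below is a hedged illustration with hypothetical callables (slow_model, fast_model, get_camera, get_proprio) and an assumed refresh interval; it is not the MResT implementation.

```python
# Illustrative multi-rate control loop: slow global perception, fast local control.
def control_loop(steps, slow_model, fast_model, get_camera, get_proprio, slow_every=10):
    global_feat = None
    for t in range(steps):
        if t % slow_every == 0:                          # low-frequency global update
            global_feat = slow_model(get_camera())       # e.g., pretrained vision-language features
        yield fast_model(global_feat, get_proprio())     # high-frequency reactive action
```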

Predicting the trajectories of pedestrians in crowded scenarios is indispensable for self-driving vehicles and autonomous mobile robots, because estimating the future locations of surrounding pedestrians informs policy decisions that avoid collisions. It is a challenging problem because humans have different walking motions, and the interactions between humans and objects in the environment, especially between humans themselves, are complex. Previous researchers focused on how to model human-human interactions but neglected the relative importance of those interactions. To address this issue, a novel mechanism based on correntropy is introduced. The proposed mechanism not only measures the relative importance of human-human interactions but also builds a personal space for each pedestrian. An interaction module including this data-driven mechanism is further proposed. In the proposed module, the data-driven mechanism effectively extracts feature representations of dynamic human-human interactions in the scene and calculates the corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, an interaction-aware architecture based on a long short-term memory network is designed for trajectory prediction. Experiments are conducted on two public datasets, and the results demonstrate that our model achieves better performance than several recent high-performing methods.
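As a rough illustration of how a Gaussian-kernel (correntropy-style) measure can turn relative displacements into importance weights, consider the sketch below. The function name, the kernel bandwidth sigma, and the simple normalization are assumptions for illustration, not the paper's exact mechanism.

```python
# Hedged sketch: Gaussian-kernel weighting of neighbours around a target pedestrian.
import numpy as np

def interaction_weights(target_pos, neighbour_pos, sigma=1.0):
    """target_pos: (2,) current position; neighbour_pos: (N, 2) neighbour positions."""
    diff = neighbour_pos - target_pos                        # relative displacement per neighbour
    kernel = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))
    return kernel / (kernel.sum() + 1e-8)                    # normalised importance weights
```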

Traffic forecasting is an important factor in the success of intelligent transportation systems. Deep learning models, including convolutional neural networks and recurrent neural networks, have been applied to traffic forecasting problems to model the spatial and temporal dependencies. In recent years, to model the graph structures in transportation systems as well as the contextual information, graph neural networks (GNNs) have been introduced as new tools and have achieved state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of research using different GNNs, e.g., graph convolutional and graph attention networks, in various traffic forecasting problems, e.g., road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in ride-hailing platforms. We also present a collection of open data and source-code resources for each problem, as well as future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks to traffic forecasting problems. We have also created a public GitHub repository to keep track of the latest papers, open data, and source-code resources.
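For readers new to the area, the sketch below shows a single symmetrically normalized graph convolution over a road-sensor network, the basic building block behind many of the surveyed spatial models. It is a generic, hedged example with assumed array shapes, not code from any particular surveyed paper.

```python
# One graph-convolution layer over a road network (illustrative only).
import numpy as np

def gcn_layer(speeds, adj, weight):
    """speeds: [num_sensors, t_in] recent readings; adj: [num_sensors, num_sensors]; weight: [t_in, t_out]."""
    a_hat = adj + np.eye(adj.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))   # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt                 # symmetric normalisation
    return np.maximum(a_norm @ speeds @ weight, 0.0)         # propagate along roads, then ReLU
```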

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
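The permutation-language-modeling objective can be illustrated by the attention mask it induces: for a sampled factorization order, each position may attend only to positions that precede it in that order. The sketch below is a simplified single-stream version (XLNet itself uses two-stream attention and lets the content stream also see the current token); the function name is illustrative.

```python
# Hedged sketch of a permutation attention mask (simplified; not XLNet's code).
import torch

def permutation_mask(seq_len):
    order = torch.randperm(seq_len)               # random factorisation order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[order] = torch.arange(seq_len)           # rank[i] = position of token i in the order
    # mask[i, j] is True when token i may attend to token j
    return rank.unsqueeze(1) > rank.unsqueeze(0)
```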
