亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Large-scale driving datasets such as Waymo Open Dataset and nuScenes substantially accelerate autonomous driving research, especially for perception tasks such as 3D detection and trajectory forecasting. Since the driving logs in these datasets contain HD maps and detailed object annotations which accurately reflect the real-world complexity of traffic behaviors, we can harvest a massive number of complex traffic scenarios and recreate their digital twins in simulation. Compared to the hand-crafted scenarios often used in existing simulators, data-driven scenarios collected from the real world can facilitate many research opportunities in machine learning and autonomous driving. In this work, we present ScenarioNet, an open-source platform for large-scale traffic scenario modeling and simulation. ScenarioNet defines a unified scenario description format and collects a large-scale repository of real-world traffic scenarios from the heterogeneous data in various driving datasets including Waymo, nuScenes, Lyft L5, and nuPlan datasets. These scenarios can be further replayed and interacted with in multiple views from Bird-Eye-View layout to realistic 3D rendering in MetaDrive simulator. This provides a benchmark for evaluating the safety of autonomous driving stacks in simulation before their real-world deployment. We further demonstrate the strengths of ScenarioNet on large-scale scenario generation, imitation learning, and reinforcement learning in both single-agent and multi-agent settings. Code, demo videos, and website are available at //metadriverse.github.io/scenarionet.

相關內容

With the rapid advancements in autonomous driving and robot navigation, there is a growing demand for lifelong learning models capable of estimating metric (absolute) depth. Lifelong learning approaches potentially offer significant cost savings in terms of model training, data storage, and collection. However, the quality of RGB images and depth maps is sensor-dependent, and depth maps in the real world exhibit domain-specific characteristics, leading to variations in depth ranges. These challenges limit existing methods to lifelong learning scenarios with small domain gaps and relative depth map estimation. To facilitate lifelong metric depth learning, we identify three crucial technical challenges that require attention: i) developing a model capable of addressing the depth scale variation through scale-aware depth learning, ii) devising an effective learning strategy to handle significant domain gaps, and iii) creating an automated solution for domain-aware depth inference in practical applications. Based on the aforementioned considerations, in this paper, we present i) a lightweight multi-head framework that effectively tackles the depth scale imbalance, ii) an uncertainty-aware lifelong learning solution that adeptly handles significant domain gaps, and iii) an online domain-specific predictor selection method for real-time inference. Through extensive numerical studies, we show that the proposed method can achieve good efficiency, stability, and plasticity, leading the benchmarks by 8% to 15%.

Facial expression recognition (FER) is an important task in computer vision, having practical applications in areas such as human-computer interaction, education, healthcare, and online monitoring. In this challenging FER task, there are three key issues especially prevalent: inter-class similarity, intra-class discrepancy, and scale sensitivity. While existing works typically address some of these issues, none have fully addressed all three challenges in a unified framework. In this paper, we propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER), that aims to holistically solve all three issues. Specifically, we design a transformer-based cross-fusion method that enables effective collaboration of facial landmark features and image features to maximize proper attention to salient facial regions. Furthermore, POSTER employs a pyramid structure to promote scale invariance. Extensive experimental results demonstrate that our POSTER achieves new state-of-the-art results on RAF-DB (92.05%), FERPlus (91.62%), as well as AffectNet 7 class (67.31%) and 8 class (63.34%). The code is available at //github.com/zczcwh/POSTER.

We address the problem of efficient 3-D exploration in indoor environments for micro aerial vehicles with limited sensing capabilities and payload/power constraints. We develop an indoor exploration framework that uses learning to predict the occupancy of unseen areas, extracts semantic features, samples viewpoints to predict information gains for different exploration goals, and plans informative trajectories to enable safe and smart exploration. Extensive experimentation in simulated and real-world environments shows the proposed approach outperforms the state-of-the-art exploration framework by 24% in terms of the total path length in a structured indoor environment and with a higher success rate during exploration.

Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn significant attention in both computer vision and natural language processing communities. Existing VLN tasks are built for agents that navigate on the ground, either indoors or outdoors. However, many tasks require intelligent agents to carry out in the sky, such as UAV-based goods delivery, traffic/security patrol, and scenery tour, to name a few. Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. We develop a 3D simulator rendered by near-realistic pictures of 25 city-level scenarios. Our simulator supports continuous navigation, environment extension and configuration. We also proposed an extended baseline model based on the widely-used cross-modal-alignment (CMA) navigation methods. We find that there is still a significant gap between the baseline model and human performance, which suggests AerialVLN is a new challenging task. Dataset and code is available at //github.com/AirVLN/AirVLN.

In unmanned aerial vehicle (UAV) networks, high-capacity data transmission is of utmost importance for applications such as intelligent transportation, smart cities, and forest monitoring, which rely on the mobility of UAVs to collect and transmit large amount of data, including video and image data. Due to the short flight time of UAVs, the network capacity will be reduced when they return to the ground unit for charging. Hence, we suggest that UAVs can apply a store-carry-and-forward (SCF) transmission mode to carry packets on their way back to the ground unit for improving network throughput. In this paper, we propose a novel protocol, named UAV delay-tolerant multiple access control (UD-MAC), which can support different transmission modes in UAV networks. We set a higher priority for SCF transmission and analyze the probability of being in SCF mode to derive network throughput. The simulation results show that the network throughput of UD-MAC is improved by 57% to 83% compared to VeMAC.

Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals and ambiguous cluster boundaries. Although such perceptual variability casts doubt on the reliability of data analysis based on visual clustering, we lack a systematic way to efficiently assess this variability. In this research, we study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots. We first conduct a qualitative study to identify key factors that affect the visual separation of clusters (e.g., proximity or size difference between clusters). Based on study findings, we deploy a regression module that estimates the human-judged separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the aggregated results of all pairwise separability between clusters that are generated by the module. CLAMS outperforms widely-used clustering techniques in predicting ground truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human annotators. We conclude our work by presenting two applications for optimizing and benchmarking data mining techniques using CLAMS. The interactive demo of CLAMS is available at clusterambiguity.dev.

High-definition (HD) map provides abundant and precise static environmental information of the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving system. In this paper, we present \textbf{Map} \textbf{TR}ansformer, an end-to-end framework for online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, \ie, modeling map element as a point set with a group of equivalent permutations, which accurately describes the shape of map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. To speed up convergence, we further introduce auxiliary one-to-many matching and dense supervision. The proposed method well copes with various map elements with arbitrary shapes. It runs at real-time inference speed and achieves state-of-the-art performance on both nuScenes and Argoverse2 datasets. Abundant qualitative results show stable and robust map construction quality in complex and various driving scenes. Code and more demos are available at \url{//github.com/hustvl/MapTR} for facilitating further studies and applications.

In various service-oriented applications such as distributed autonomous delivery, healthcare, tourism, transportation, and many others, where service agents need to perform serial and time-bounded tasks to achieve their goals, quality of service must constantly be assured. In addition to safety requirements, such agents also need to fulfill performance requirements in order to satisfy their quality of service. This paper proposes the novel quality-aware time window temporal logic (QTWTL) by extending the traditional time window temporal logic (TWTL) with two operators for counting and aggregation operations. We also propose offline runtime monitoring algorithms for the performance monitoring of QTWTL specifications. To analyze the feasibility and efficiency of our proposed approach, we generate a large number of traces using the New York City Taxi and Limousine Commission Trip Record data, formalize their performance requirements using QTWTL, and monitor them using the proposed algorithms. The obtained results show that the monitoring algorithm has a linear space and time complexity with respect to the number of traces monitored.

This paper introduces FALL-E, a foley synthesis system and its training/inference strategies. The FALL-E model employs a cascaded approach comprising low-resolution spectrogram generation, spectrogram super-resolution, and a vocoder. We trained every sound-related model from scratch using our extensive datasets, and utilized a pre-trained language model. We conditioned the model with dataset-specific texts, enabling it to learn sound quality and recording environment based on text input. Moreover, we leveraged external language models to improve text descriptions of our datasets and performed prompt engineering for quality, coherence, and diversity. FALL-E was evaluated by an objective measure as well as listening tests in the DCASE 2023 challenge Task 7. The submission achieved the second place on average, while achieving the best score for diversity, second place for audio quality, and third place for class fitness.

The cross-domain recommendation technique is an effective way of alleviating the data sparsity in recommender systems by leveraging the knowledge from relevant domains. Transfer learning is a class of algorithms underlying these techniques. In this paper, we propose a novel transfer learning approach for cross-domain recommendation by using neural networks as the base model. We assume that hidden layers in two base networks are connected by cross mappings, leading to the collaborative cross networks (CoNet). CoNet enables dual knowledge transfer across domains by introducing cross connections from one base network to another and vice versa. CoNet is achieved in multi-layer feedforward networks by adding dual connections and joint loss functions, which can be trained efficiently by back-propagation. The proposed model is evaluated on two real-world datasets and it outperforms baseline models by relative improvements of 3.56\% in MRR and 8.94\% in NDCG, respectively.

北京阿比特科技有限公司