
End-to-End driving is a promising paradigm as it circumvents the drawbacks associated with modular systems, such as their overwhelming complexity and propensity for error propagation. To ensure passengers' safety and comfort, particularly in highly stochastic and variable traffic settings, an autonomous driving system must proactively recognize critical events before they unfold. This paper presents a comprehensive review of the End-to-End autonomous driving stack. It provides a taxonomy of automated driving tasks in which neural networks have been employed in an End-to-End manner, encompassing the entire driving process from perception to control, while addressing key challenges encountered in real-world applications. Recent developments in End-to-End autonomous driving are analyzed, and research is categorized according to underlying principles, methodologies, and core functionality. These categories encompass sensorial input, main and auxiliary outputs, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques. The survey includes a detailed discussion of explainability and safety. Furthermore, it assesses the state of the art, identifies challenges, and explores future possibilities. We maintain the latest advancements and their corresponding open-source implementations at //github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning.
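
To make the paradigm concrete, here is a minimal sketch of what "End-to-End from perception to control" means in practice: a single network maps raw sensor input (here, a camera image) directly to control commands, trained by imitation on expert demonstrations. The architecture, input shapes, and loss are illustrative assumptions, not any specific model covered by the survey.

```python
import torch

class EndToEndDriver(torch.nn.Module):
    """Toy end-to-end model: camera image in, control commands out."""
    def __init__(self):
        super().__init__()
        self.perception = torch.nn.Sequential(        # perception backbone
            torch.nn.Conv2d(3, 16, 5, stride=2), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 5, stride=2), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())
        self.control = torch.nn.Linear(32, 2)         # steering, acceleration

    def forward(self, image):
        return self.control(self.perception(image))

model = EndToEndDriver()
images = torch.randn(8, 3, 128, 128)   # a batch of camera frames
expert = torch.randn(8, 2)             # expert steering/acceleration labels
loss = torch.nn.functional.mse_loss(model(images), expert)  # imitation loss
loss.backward()
```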

Related Content

Large language models such as GPT-3 have demonstrated an impressive capability to adapt to new tasks without requiring task-specific training data. This capability has been particularly effective in settings such as narrative question answering, where the diversity of tasks is immense but available supervision is scarce. In this work, we investigate whether such language models can extend their zero-shot reasoning abilities to long multimodal narratives in multimedia content such as drama, movies, and animation, where the story plays an essential role. We propose Long Story Short, a framework for narrative video QA that first summarizes the narrative of the video into a short plot and then searches the parts of the video relevant to the question. We also propose to enhance visual matching with CLIPCheck. Our model outperforms state-of-the-art supervised models by a large margin, highlighting the potential of zero-shot QA for long videos.
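
A minimal sketch of the summarize-then-search flow described above. The `llm_complete` function is a hypothetical stand-in for any text-completion API, and the prompts and index-based retrieval heuristic are illustrative assumptions, not the paper's exact implementation.

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for a large language model call (e.g., GPT-3)."""
    raise NotImplementedError

def summarize_plot(clip_captions: list[str]) -> str:
    # Stage 1: compress per-clip captions into a short global plot.
    joined = "\n".join(f"[{i}] {c}" for i, c in enumerate(clip_captions))
    return llm_complete(f"Summarize this story into a short plot:\n{joined}")

def find_relevant_clips(plot: str, question: str) -> list[int]:
    # Stage 2: ask the model which indexed clips matter for the question.
    answer = llm_complete(
        f"Plot:\n{plot}\nQuestion: {question}\n"
        "List the clip indices needed to answer, comma-separated:")
    return [int(t) for t in answer.split(",") if t.strip().isdigit()]

def answer_question(clip_captions: list[str], question: str) -> str:
    # Answer from only the retrieved parts of the video, not the whole story.
    plot = summarize_plot(clip_captions)
    relevant = find_relevant_clips(plot, question)
    context = "\n".join(clip_captions[i] for i in relevant)
    return llm_complete(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
```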

Thermal spray coating is a critical process in many industries, involving the application of coatings to surfaces to enhance their functionality. This paper proposes a framework for modelling and predicting critical target variables in thermal spray coating processes, based on statistical design of experiments (DoE) and on modelling the data with generalized linear models (GLMs), specifically gamma regression. Experimental data obtained from thermal spray coating trials are used to validate the presented approach, demonstrating that it accurately models and predicts critical target variables and their intricate relationships. As such, the framework has significant potential for the optimization of thermal spray coating processes and can contribute to the development of more efficient and effective coating technologies in various industries.
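
A minimal sketch of the modelling step, using statsmodels to fit a gamma GLM. The column names ("spray_distance", "current", "coating_thickness"), the data file, and the log link are hypothetical assumptions; the paper's actual factors, responses, and link function may differ.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("coating_trials.csv")  # hypothetical DoE trial results

# Gamma regression suits strictly positive, right-skewed responses such as
# coating thickness; a log link keeps fitted values positive. The formula
# includes main effects and an interaction, as a DoE analysis typically would.
model = smf.glm(
    "coating_thickness ~ spray_distance * current",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()
print(model.summary())
```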

Low latency is essential for the Internet of Things (IoT), since connected devices generate vast amounts of data that are processed on cloud infrastructure, and the cloud alone is not optimal for latency-sensitive applications. To overcome this issue, fog computing is used to enable processing at the edge while still allowing communication with the cloud. Many applications rely on fog computing, including traffic management. In this paper, an Intelligent Traffic Congestion Mitigation System (ITCMS) is proposed to address traffic congestion in heavily populated smart cities. The proposed system is implemented using fog computing and tested in a crowded city. Its performance is evaluated on multiple metrics, such as traffic efficiency, energy savings, latency, average traffic flow rate, and waiting time, and compared with similar techniques that tackle the same issue. The results indicate that the execution time of the simulation is 4,538 seconds and the delay in the application loop is 49.67 seconds. The paper also examines CPU usage, heap memory usage, throughput, and total average delay, which are essential for evaluating the performance of the ITCMS. On two parameters, throughput and total average delay, the ITCMS is compared with IoV (Internet of Vehicles) and STL (Seasonal-Trend Decomposition Procedure based on LOESS) approaches. The results confirm that the proposed system outperforms the others, with higher accuracy, lower latency, and improved traffic efficiency.

Pre-trained Vision-Language Models (VLMs), such as CLIP, have shown enhanced performance across a range of tasks that integrate visual and linguistic modalities. When CLIP is used for depth estimation, patches divided from the input image can be compared with a series of semantic descriptions of depth to obtain similarity scores. A coarse depth estimate is then obtained by weighting and summing the depth values, called depth bins, associated with the predefined semantic descriptions. This zero-shot approach circumvents the computational and time-intensive nature of traditional fully-supervised depth estimation methods. However, with fixed depth bins, the method may not generalize well, as images from different scenes can exhibit distinct depth distributions. To address this challenge, we propose a few-shot-based method that learns to adapt VLMs for monocular depth estimation, balancing training cost against generalization. Specifically, it assigns different depth bins to different scenes, which the model can select during inference. Additionally, we incorporate learnable prompts that preprocess the input text, converting human-readable descriptions into vectors the model can exploit, further enhancing performance. With only one image per scene for training, extensive experiments on the NYU V2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
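
A minimal sketch of the depth-bin mechanism described above, in PyTorch. The number of bins, the bin centres, the temperature, and the feature shapes are illustrative assumptions, not the paper's configuration.

```python
import torch

def coarse_depth(patch_feats, text_feats, depth_bins, temperature=0.1):
    """patch_feats: (P, D) CLIP features of image patches.
    text_feats:  (K, D) CLIP features of K depth descriptions
                 (e.g., "this object is very close" ... "very far").
    depth_bins:  (K,) depth value assigned to each description."""
    patch_feats = torch.nn.functional.normalize(patch_feats, dim=-1)
    text_feats = torch.nn.functional.normalize(text_feats, dim=-1)
    sim = patch_feats @ text_feats.T                 # (P, K) cosine similarities
    weights = torch.softmax(sim / temperature, -1)   # similarities -> bin weights
    return weights @ depth_bins                      # (P,) weighted sum of bins

patch_feats = torch.randn(196, 512)        # e.g., 14x14 patches, 512-d features
text_feats = torch.randn(7, 512)           # 7 depth descriptions
depth_bins = torch.linspace(1.0, 10.0, 7)  # hypothetical bin centres in metres
print(coarse_depth(patch_feats, text_feats, depth_bins).shape)  # (196,)
```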

The Inventory Routing Problem (IRP) is a crucial challenge in supply chain management, as it involves optimizing route selection under uncertain inventory demand. IRPs are usually solved with a two-stage approach: demand is first predicted using machine learning, and an optimization algorithm then minimizes routing costs. Our experiments show that machine learning models fall short of perfect accuracy because inventory levels are influenced by the dynamic business environment; these prediction errors propagate to the optimization stage and result in sub-optimal decisions. In this paper, we formulate and propose a decision-focused learning approach to real-world IRPs that integrates inventory prediction and routing optimization within an end-to-end system, potentially ensuring a more robust supply chain strategy.
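
To illustrate the contrast with two-stage training, here is a sketch of decision-focused learning on a toy routing problem using the SPO+ surrogate loss of Elmachtoub and Grigas. This is a swapped-in illustrative technique: the paper does not specify SPO+, and real IRPs are far larger than the tiny enumerated candidate set used here.

```python
import torch

# Each row is a candidate route over 4 edges (binary edge-incidence vectors).
routes = torch.tensor([[1., 1., 0., 0.],
                       [0., 1., 1., 0.],
                       [0., 0., 1., 1.]])

def spo_plus(c_hat, c):
    # SPO+ surrogate: penalizes predicted costs c_hat whose induced route
    # decision performs poorly under the true (demand-driven) costs c.
    w_star = routes[torch.argmin(routes @ c)]   # optimal route for true costs
    inner = routes @ (c - 2.0 * c_hat)
    return inner.max() + 2.0 * (c_hat @ w_star) - c @ w_star

model = torch.nn.Linear(3, 4)   # maps 3 demand features to 4 edge costs
opt = torch.optim.Adam(model.parameters(), lr=0.01)

features = torch.randn(64, 3)                     # hypothetical demand features
true_costs = features @ torch.randn(3, 4) + 5.0   # hypothetical true edge costs

for _ in range(200):
    # Train on decision quality directly, not on demand-prediction error.
    loss = torch.stack([spo_plus(model(x), c)
                        for x, c in zip(features, true_costs)]).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```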

The moving discontinuous Galerkin method with interface condition enforcement (MDG-ICE) is a high-order, r-adaptive method that treats the grid as a variable and weakly enforces the conservation law, constitutive law, and corresponding interface conditions in order to implicitly fit high-gradient flow features. In this paper, we introduce nonlinear solver strategies to more robustly and efficiently compute high-speed viscous flows. Specifically, we incorporate an anisotropic grid regularization based on the mesh-implied metric into the nonlinear least-squares solver that inhibits grid motion in directions with small element length scales. Furthermore, we develop an adaptive elementwise regularization strategy that locally scales the regularization terms as needed to maintain grid validity. We apply the proposed MDG-ICE formulation to test cases involving viscous shocks and/or boundary layers, including Mach 17.6 hypersonic viscous flow over a circular cylinder and Mach 5 hypersonic viscous flow over a sphere, both of which are very challenging for conventional numerical schemes on simplicial grids. Even without artificial dissipation, the computed solutions are free from spurious oscillations and yield highly symmetric surface heat-flux profiles.
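
A minimal sketch of the kind of metric-regularized nonlinear least-squares step described above: a metric-weighted penalty damps updates along directions with small element length scales. The residual, Jacobian, and metric here are generic stand-ins, assuming a Gauss-Newton solver; MDG-ICE's actual discretization and solver details are not reproduced.

```python
import numpy as np

def regularized_gn_step(residual, jacobian, x, metric, lam=1.0):
    """Solve (J^T J + lam * M) dx = -J^T r for the update dx.

    The SPD metric M plays the role of the mesh-implied metric: large
    eigenvalues in directions of small element size inhibit grid motion
    there. In an adaptive elementwise scheme, lam would be scaled locally
    to maintain grid validity."""
    r = residual(x)
    J = jacobian(x)
    dx = np.linalg.solve(J.T @ J + lam * metric, -J.T @ r)
    return x + dx

# Toy usage: minimize ||x - target||^2 with anisotropic damping.
target = np.array([1.0, 2.0])
M = np.diag([100.0, 1.0])   # strongly inhibit motion in the first direction
x = np.zeros(2)
for _ in range(10):
    x = regularized_gn_step(lambda v: v - target,
                            lambda v: np.eye(2), x, M, lam=0.5)
```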

Gaze tracking devices have the potential to greatly expand interactivity, yet miscalibration remains a significant barrier to use. As devices miscalibrate, people tend to compensate by intentionally offsetting their gaze, which makes detecting miscalibration from eye signals difficult. To help address this problem, we propose a novel approach to seamless calibration based on the insight that the system's model of eye gaze can be updated during reading (when the user does not compensate) to improve calibration for typing (when the user might compensate). To explore this approach, we built an auto-calibrating gaze typing prototype called EyeO, ran a user study with 20 participants, and conducted a semi-structured interview with 6 ALS community stakeholders. Our user study results suggest that seamless autocalibration can significantly improve typing efficiency and user experience. Findings from the semi-structured interview validate the need for autocalibration and shed light on the prototype's potential usefulness as well as the algorithmic and design improvements users desire.
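
A minimal sketch of the seamless-calibration idea: while the user reads, raw gaze samples are paired with the known positions of the words being read, and a least-squares affine correction is refitted for later use during typing. The gaze-to-word pairing and the affine model are illustrative assumptions, not EyeO's actual algorithm.

```python
import numpy as np

def fit_affine_correction(raw_gaze, word_positions):
    """raw_gaze, word_positions: (N, 2) arrays of screen coordinates.
    Returns A (2x2) and b (2,) such that corrected = raw @ A.T + b."""
    X = np.hstack([raw_gaze, np.ones((len(raw_gaze), 1))])  # homogeneous coords
    params, *_ = np.linalg.lstsq(X, word_positions, rcond=None)
    return params[:2].T, params[2]

# During reading, collect (raw fixation, expected word centre) pairs and
# refit; the updated correction then applies to gaze typing.
raw = np.random.rand(50, 2) * 1000
expected = raw * 1.02 + np.array([8.0, -5.0])  # simulated miscalibration
A, b = fit_affine_correction(raw, expected)
corrected = raw @ A.T + b
```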

Feature extraction and matching are basic components of many robotic vision tasks, such as 2D or 3D object detection, recognition, and registration. 2D feature extraction and matching have already achieved great success. Unfortunately, in the 3D domain, current methods fail to support the extensive application of 3D LiDAR sensors in robotic vision tasks due to poor descriptiveness and inefficiency. To address this limitation, we propose a novel 3D feature representation method: Linear Keypoints representation for 3D LiDAR point clouds, called LinK3D. The novelty of LinK3D lies in its full consideration of the characteristics of LiDAR point clouds, such as their sparsity and the complexity of scenes, and in representing each keypoint by its robust neighbor keypoints, which makes the description highly distinctive. The proposed LinK3D has been evaluated on two public datasets (KITTI and Steven VLP16), and the experimental results show that our method greatly outperforms the state of the art in matching performance. More importantly, LinK3D shows excellent real-time performance, running faster than the typical 10 Hz frame rate of a rotating LiDAR sensor. LinK3D takes an average of only 32 milliseconds to extract features from a point cloud collected by a 64-beam LiDAR, and merely about 8 milliseconds to match two LiDAR scans, on a notebook with an Intel Core i7 @2.2 GHz processor. Moreover, our method can be extended to various 3D vision applications. In this paper, we apply LinK3D to the LiDAR odometry and place recognition tasks of LiDAR SLAM. The experimental results show that our method can improve the efficiency and accuracy of a LiDAR SLAM system.
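
A minimal sketch of the core idea stated above, describing a keypoint by its neighboring keypoints: distances to nearby keypoints are binned by bearing into angular sectors oriented by the closest neighbor, giving a rotation-invariant descriptor. The sector count, neighborhood size, and binning rule are illustrative assumptions, not LinK3D's exact descriptor.

```python
import numpy as np

def neighbor_descriptor(keypoints_xy, idx, n_sectors=8, k=10):
    """keypoints_xy: (N, 2) keypoints projected onto the horizontal plane."""
    p = keypoints_xy[idx]
    rel = np.delete(keypoints_xy, idx, axis=0) - p   # offsets to other keypoints
    dists = np.linalg.norm(rel, axis=1)
    nearest = np.argsort(dists)[:k]
    # Orient sectors by the closest neighbor so the descriptor is
    # invariant to the scan's rotation.
    main = np.arctan2(rel[nearest[0], 1], rel[nearest[0], 0])
    angles = (np.arctan2(rel[nearest, 1], rel[nearest, 0]) - main) % (2 * np.pi)
    desc = np.zeros(n_sectors)
    for a, d in zip(angles, dists[nearest]):
        s = int(a / (2 * np.pi / n_sectors)) % n_sectors
        desc[s] = d if desc[s] == 0 else min(desc[s], d)  # closest per sector
    return desc

pts = np.random.rand(100, 2) * 50   # toy keypoint set
print(neighbor_descriptor(pts, 0))
```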

Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and it has been observed that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions for future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.

Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at //github.com/kstant0725/SpectralNet.
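
A minimal sketch of the two ingredients described above: an output layer that orthogonalizes each batch's embeddings (here via a Cholesky factorization) and a spectral loss that pulls together points with high affinity. The network size, Gaussian affinities, and single training step are illustrative assumptions; the paper replaces the Gaussian affinities with Siamese-learned ones.

```python
import torch

def orthonorm(Y):
    # Batch outputs Y: (m, k). Returns Y_tilde with Y_tilde^T Y_tilde = m*I,
    # enforcing the orthogonality constraint on the network output.
    m = Y.shape[0]
    L = torch.linalg.cholesky(Y.T @ Y)
    return (m ** 0.5) * Y @ torch.linalg.inv(L).T

def spectral_loss(Y, W):
    # Sum of W_ij * ||y_i - y_j||^2 over the batch, normalized by batch size.
    return (W * torch.cdist(Y, Y) ** 2).sum() / Y.shape[0]

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

X = torch.randn(128, 2)                 # a batch of input points
W = torch.exp(-torch.cdist(X, X) ** 2)  # pairwise Gaussian affinities
Y = orthonorm(net(X))                   # constrained embedding of the batch
loss = spectral_loss(Y, W)
loss.backward(); opt.step()
```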
