亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

RGB-T semantic segmentation is a key technique for autonomous driving scenes understanding. For the existing RGB-T semantic segmentation methods, however, the effective exploration of the complementary relationship between different modalities is not implemented in the information interaction between multiple levels. To address such an issue, the Context-Aware Interaction Network (CAINet) is proposed for RGB-T semantic segmentation, which constructs interaction space to exploit auxiliary tasks and global context for explicitly guided learning. Specifically, we propose a Context-Aware Complementary Reasoning (CACR) module aimed at establishing the complementary relationship between multimodal features with the long-term context in both spatial and channel dimensions. Further, considering the importance of global contextual and detailed information, we propose the Global Context Modeling (GCM) module and Detail Aggregation (DA) module, and we introduce specific auxiliary supervision to explicitly guide the context interaction and refine the segmentation map. Extensive experiments on two benchmark datasets of MFNet and PST900 demonstrate that the proposed CAINet achieves state-of-the-art performance. The code is available at //github.com/YingLv1106/CAINet.

相關內容

IFIP TC13 Conference on Human-Computer Interaction是人機交互領域的研究者和實踐者展示其工作的重要平臺。多年來,這些會議吸引了來自幾個國家和文化的研究人員。官網鏈接: · Neural Networks · 線性的 · Performer · 變換 ·
2024 年 2 月 19 日

We use Physics-Informed Neural Networks (PINNs) to solve the discrete-time nonlinear observer state estimation problem. Integrated within a single-step exact observer linearization framework, the proposed PINN approach aims at learning a nonlinear state transformation map by solving a system of inhomogeneous functional equations. The performance of the proposed PINN approach is assessed via two illustrative case studies for which the observer linearizing transformation map can be derived analytically. We also perform an uncertainty quantification analysis for the proposed PINN scheme and we compare it with conventional power-series numerical implementations, which rely on the computation of a power series solution.

Multi-Robot Path Planning (MRPP) on graphs, equivalently known as Multi-Agent Path Finding (MAPF), is a well-established NP-hard problem with critically important applications. As serial computation in (near)-optimally solving MRPP approaches the computation efficiency limit, parallelization offers a promising route to push the limit further, especially in handling hard or large MRPP instances. In this study, we initiated a \emph{targeted} parallelization effort to boost the performance of conflict-based search for MRPP. Specifically, when instances are relatively small but robots are densely packed with strong interactions, we apply a decentralized parallel algorithm that concurrently explores multiple branches that leads to markedly enhanced solution discovery. On the other hand, when instances are large with sparse robot-robot interactions, we prioritize node expansion and conflict resolution. Our innovative multi-threaded approach to parallelizing bounded-suboptimal conflict search-based algorithms demonstrates significant improvements over baseline serial methods in success rate or runtime. Our contribution further pushes the understanding of MRPP and charts a promising path for elevating solution quality and computational efficiency through parallel algorithmic strategies.

Model-based methods are widely used for reconstruction in compressed sensing (CS) magnetic resonance imaging (MRI), using regularizers to describe the images of interest. The reconstruction process is equivalent to solving a composite optimization problem. Accelerated proximal methods (APMs) are very popular approaches for such problems. This paper proposes a complex quasi-Newton proximal method (CQNPM) for the wavelet and total variation based CS MRI reconstruction. Compared with APMs, CQNPM requires fewer iterations to converge but needs to compute a more challenging proximal mapping called weighted proximal mapping (WPM). To make CQNPM more practical, we propose efficient methods to solve the related WPM. Numerical experiments on reconstructing non-Cartesian MRI data demonstrate the effectiveness and efficiency of CQNPM.

This paper proposes an adaptive behavioral decision-making method for autonomous vehicles (AVs) focusing on complex merging scenarios. Leveraging principles from non-cooperative game theory, we develop a vehicle interaction behavior model that defines key traffic elements and integrates a multifactorial reward function. Maximum entropy inverse reinforcement learning (IRL) is employed for behavior model parameter optimization. Optimal matching parameters can be obtained using the interaction behavior feature vector and the behavior probabilities output by the vehicle interaction model. Further, a behavioral decision-making method adapted to dynamic environments is proposed. By establishing a mapping model between multiple environmental variables and model parameters, it enables parameters online learning and recognition, and achieves to output interactive behavior probabilities of AVs. Quantitative analysis employing naturalistic driving datasets (highD and exiD) and real-vehicle test data validates the model's high consistency with human decision-making. In 188 tested interaction scenarios, the average human-like similarity rate is 81.73%, with a notable 83.12% in the highD dataset. Furthermore, in 145 dynamic interactions, the method matches human decisions at 77.12%, with 6913 consistence instances. Moreover, in real-vehicle tests, a 72.73% similarity with 0% safety violations are obtained. Results demonstrate the effectiveness of our proposed method in enabling AVs to make informed adaptive behavior decisions in interactive environments.

Structured sparsity is an efficient way to prune the complexity of modern Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. In such cases, the acceleration of structured-sparse ML models is handled by sparse systolic tensor arrays. The increasing prevalence of ML in safety-critical systems requires enhancing the sparse tensor arrays with online error detection for managing random hardware failures. Algorithm-based fault tolerance has been proposed as a low-cost mechanism to check online the result of computations against random hardware failures. In this work, we address a key architectural challenge with structured-sparse tensor arrays: how to provide online error checking for a range of structured sparsity levels while maintaining high utilization of the hardware. Experimental results highlight the minimum hardware overhead incurred by the proposed checking logic and its error detection properties after injecting random hardware faults on sparse tensor arrays that execute layers of ResNet50 CNN.

We present Q-ViD, a simple approach for video question answering (video QA), that unlike prior methods, which are based on complex architectures, computationally expensive pipelines or use closed models like GPTs, Q-ViD relies on a single instruction-aware open vision-language model (InstructBLIP) to tackle videoQA using frame descriptions. Specifically, we create captioning instruction prompts that rely on the target questions about the videos and leverage InstructBLIP to obtain video frame captions that are useful to the task at hand. Subsequently, we form descriptions of the whole video using the question-dependent frame captions, and feed that information, along with a question-answering prompt, to a large language model (LLM). The LLM is our reasoning module, and performs the final step of multiple-choice QA. Our simple Q-ViD framework achieves competitive or even higher performances than current state of the art models on a diverse range of videoQA benchmarks, including NExT-QA, STAR, How2QA, TVQA and IntentQA.

Few-shot Knowledge Graph (KG) completion is a focus of current research, where each task aims at querying unseen facts of a relation given its few-shot reference entity pairs. Recent attempts solve this problem by learning static representations of entities and references, ignoring their dynamic properties, i.e., entities may exhibit diverse roles within task relations, and references may make different contributions to queries. This work proposes an adaptive attentional network for few-shot KG completion by learning adaptive entity and reference representations. Specifically, entities are modeled by an adaptive neighbor encoder to discern their task-oriented roles, while references are modeled by an adaptive query-aware aggregator to differentiate their contributions. Through the attention mechanism, both entities and references can capture their fine-grained semantic meanings, and thus render more expressive representations. This will be more predictive for knowledge acquisition in the few-shot scenario. Evaluation in link prediction on two public datasets shows that our approach achieves new state-of-the-art results with different few-shot sizes.

Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.

We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.

We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for this specialized domains. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Our proposed metric significantly improves performance in matching crime scene shoeprints to laboratory test impressions. We also show its effectiveness in other cross-domain image retrieval problems: matching facade images to segmentation labels and aerial photos to map images. Finally, we introduce a discriminatively trained variant and fine-tune our system through our proposed metric, obtaining state-of-the-art performance.

北京阿比特科技有限公司