亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

With the advancement of maritime unmanned aerial vehicles (UAVs) and deep learning technologies, the application of UAV-based object detection has become increasingly significant in the fields of maritime industry and ocean engineering. Endowed with intelligent sensing capabilities, the maritime UAVs enable effective and efficient maritime surveillance. To further promote the development of maritime UAV-based object detection, this paper provides a comprehensive review of challenges, relative methods, and UAV aerial datasets. Specifically, in this work, we first briefly summarize four challenges for object detection on maritime UAVs, i.e., object feature diversity, device limitation, maritime environment variability, and dataset scarcity. We then focus on computational methods to improve maritime UAV-based object detection performance in terms of scale-aware, small object detection, view-aware, rotated object detection, lightweight methods, and others. Next, we review the UAV aerial image/video datasets and propose a maritime UAV aerial dataset named MS2ship for ship detection. Furthermore, we conduct a series of experiments to present the performance evaluation and robustness analysis of object detection methods on maritime datasets. Eventually, we give the discussion and outlook on future works for maritime UAV-based object detection. The MS2ship dataset is available at \href{//github.com/zcj234/MS2ship}{//github.com/zcj234/MS2ship}.

相關內容

目標檢測,也叫目標提取,是一種與計算機視覺和圖像處理有關的計算機技術,用于檢測數字圖像和視頻中特定類別的語義對象(例如人,建筑物或汽車)的實例。深入研究的對象檢測領域包括面部檢測和行人檢測。 對象檢測在計算機視覺的許多領域都有應用,包括圖像檢索和視頻監視。

知識薈萃

精品入門和進階教程、論文和代碼整理等

更多

查看相關VIP內容、論文、資訊等

When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs. However, Large Multimodal Models (LMMs) encounter two issues in such scenarios: (1) a lack of fine-grained perception, and (2) a tendency to blend information across multiple images. We first extensively investigate the capability of LMMs to perceive fine-grained visual details when dealing with multiple input images. The research focuses on two aspects: first, image-to-image matching (to evaluate whether LMMs can effectively reason and pair relevant images), and second, multi-image-to-text matching (to assess whether LMMs can accurately capture and summarize detailed image information). We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL. To enhance model performance, we further develop a Contrastive Chain-of-Thought (CoCoT) prompting approach based on multi-input multimodal models. This method requires LMMs to compare the similarities and differences among multiple image inputs, and then guide the models to answer detailed questions about multi-image inputs based on the identified similarities and differences. Our experimental results showcase CoCoT's proficiency in enhancing the multi-image comprehension capabilities of large multimodal models.

We propose a data-driven approach for propagating uncertainty in stochastic power grid simulations and apply it to the estimation of transmission line failure probabilities. A reduced-order equation governing the evolution of the observed line energy probability density function is derived from the Fokker--Planck equation of the full-order continuous Markov process. Our method consists of estimates produced by numerically integrating this reduced equation. Numerical experiments for scalar- and vector-valued energy functions are conducted using the classical multimachine model under spatiotemporally correlated noise perturbation. The method demonstrates a more sample-efficient approach for computing probabilities of tail events when compared with kernel density estimation. Moreover, it produces vastly more accurate estimates of joint event occurrence when compared with independent models.

This work studies the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS) aided wireless communication systems in the presence of mutual coupling among the RIS elements. Specifically, we first derive the mutual coupling aware BD-RIS aided communication model using scattering and impedance parameter analysis. Based on the obtained communication model, we propose a general BD-RIS optimization algorithm applicable to different architectures of BD-RIS to maximize the channel gain. Numerical results validate the effectiveness of the proposed design and demonstrate that the larger the mutual coupling the larger the gain offered by BD-RIS over conventional diagonal RIS.

We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges of LLMs in adherence to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower, currently still high, entry barriers in the field.

This project explores adversarial training techniques to develop fairer Deep Neural Networks (DNNs) to mitigate the inherent bias they are known to exhibit. DNNs are susceptible to inheriting bias with respect to sensitive attributes such as race and gender, which can lead to life-altering outcomes (e.g., demographic bias in facial recognition software used to arrest a suspect). We propose a robust optimization problem, which we demonstrate can improve fairness in several datasets, both synthetic and real-world, using an affine linear model. Leveraging second order information, we are able to find a solution to our optimization problem more efficiently than a purely first order method.

Wireless communication highly depends on the cellular ground base station (GBS). A failure of the cellular GBS, fully or partially, during natural or man-made disasters creates a communication gap in the disaster-affected areas. In such situations, public safety communication (PSC) can significantly save the national infrastructure, property, and lives. Throughout emergencies, the PSC can provide mission-critical communication and video transmission services in the affected area. Unmanned aerial vehicles (UAVs) as flying base stations (UAV-BSs) are particularly suitable for PSC services as they are flexible, mobile, and easily deployable. This manuscript considers a multi-UAV-assisted PSC network with an observational UAV receiving videos from the affected area's ground users (AGUs) and transmitting them to the nearby GBS via a relay UAV. The objective of the proposed study is to maximize the average utility of the video streams generated by the AGUs upon reaching the GBS. This is achieved by optimizing the positions of the observational and relay UAVs, as well as the distribution of communication resources, such as bandwidth, and transmit power, while satisfying the system-designed constraints, such as transmission rate, rate outage probability, transmit power budget, and available bandwidth. To this end, a joint UAVs placement and resource allocation problem is mathematically formulated. The proposed problem poses a significant challenge for a solution. Considering the block coordinate descent and successive convex approximation techniques, an efficient iterative algorithm is proposed. Finally, simulation results are provided which show that our proposed approach outperforms the existing methods.

With the breakthrough of AlphaGo, deep reinforcement learning becomes a recognized technique for solving sequential decision-making problems. Despite its reputation, data inefficiency caused by its trial and error learning mechanism makes deep reinforcement learning hard to be practical in a wide range of areas. Plenty of methods have been developed for sample efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications, amongst which, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming, and intelligent transportation. In this paper, we conclude the state of this exciting field, by comparing the classical distributed deep reinforcement learning methods, and studying important components to achieve efficient distributed learning, covering single player single agent distributed deep reinforcement learning to the most complex multiple players multiple agents distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analyzing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing usability of the proposed toolbox for multiple players and multiple agents distributed deep reinforcement learning under complex games. Finally, we try to point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.

With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which renders researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these developed traditional algorithms inevitably meet the performance bottleneck due to the lack of the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview in DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in the multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, We collect and summarize some valuable resources for the sake of the development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges.

We present a large-scale study on unsupervised spatiotemporal representation learning from videos. With a unified perspective on four recent image-based frameworks, we study a simple objective that can easily generalize all these methods to space-time. Our objective encourages temporally-persistent features in the same video, and in spite of its simplicity, it works surprisingly well across: (i) different unsupervised frameworks, (ii) pre-training datasets, (iii) downstream datasets, and (iv) backbone architectures. We draw a series of intriguing observations from this study, e.g., we discover that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds. In addition to state-of-the-art results in multiple benchmarks, we report a few promising cases in which unsupervised pre-training can outperform its supervised counterpart. Code is made available at //github.com/facebookresearch/SlowFast

北京阿比特科技有限公司