亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper presents a communication and energy-aware Multi-UAV Coverage Path Planning (mCPP) method for scenarios requiring continuous inter-UAV communication, such as cooperative search and rescue and surveillance missions. Unlike existing mCPP solutions that focus on energy, time, or coverage efficiency, our approach generates coverage paths that require minimal the communication range to maintain inter-UAV connectivity while also optimizing energy consumption. The mCPP problem is formulated as a multi-objective optimization task, aiming to minimize both the communication range requirement and energy consumption. Our approach significantly reduces the communication range needed for maintaining connectivity while ensuring energy efficiency, outperforming state-of-the-art methods. Its effectiveness is validated through simulations on complex and arbitrary shaped regions of interests, including scenarios with no-fly zones. Additionally, real-world experiment demonstrate its high accuracy, achieving 99\% consistency between the estimated and actual communication range required during a multi-UAV coverage mission involving three UAVs.

相關內容

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution images with different aspect ratios. For the language component, we leverage DeepSeekMoE models with the Multi-head Latent Attention mechanism, which compresses Key-Value cache into latent vectors, to enable efficient inference and high throughput. Trained on an improved vision-language dataset, DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models. Codes and pre-trained models are publicly accessible at //github.com/deepseek-ai/DeepSeek-VL2.

We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their unique recovery given the topological ordering (TO). Based on this, we design a two-stage causal generative model that first infers in a zero-shot manner a valid TO from observations, and then learns the generative SCM on the ordered variables. To infer TOs, we propose to amortize the learning of TOs on synthetically generated datasets by sequentially predicting the leaves of graphs seen during training. To learn SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems. The code is available on \href{//github.com/microsoft/causica/tree/main/research_experiments/fip}{Github}.

This paper presents a consensus-based payload algorithm (CBPA) to deal with the condition of robots' capability decrease for multi-robot task allocation. During the execution of complex tasks, robots' capabilities could decrease with the consumption of payloads, which causes a problem that the robot coalition would not meet the tasks' requirements in real time. The proposed CBPA is an enhanced version of the consensus-based bundle algorithm (CBBA) and comprises two primary core phases: the payload bundle construction and consensus phases. In the payload bundle construction phase, CBPA introduces a payload assignment matrix to track the payloads carried by the robots and the demands of multi-robot tasks in real time. Then, robots share their respective payload assignment matrix in the consensus phase. These two phases are iterated to dynamically adjust the number of robots performing multi-robot tasks and the number of tasks each robot performs and obtain conflict-free results to ensure that the robot coalition meets the demand and completes all tasks as quickly as possible. Physical experiment shows that CBPA is appropriate in complex and dynamic scenarios where robots need to collaborate and task requirements are tightly coupled to the robots' payloads. Numerical experiments show that CBPA has higher total task gains than CBBA.

Existing Vision-Language Pretraining (VLP) methods have achieved remarkable improvements across a variety of vision-language tasks, confirming their effectiveness in capturing coarse-grained semantic correlations. However, their capability for fine-grained understanding, which is critical for many nuanced vision-language applications, remains limited. Prevailing VLP models often overlook the intricate distinctions in expressing different modal features and typically depend on the similarity of holistic features for cross-modal interactions. Moreover, these models directly align and integrate features from different modalities, focusing more on coarse-grained general representations, thus failing to capture the nuanced differences necessary for tasks demanding a more detailed perception. In response to these limitations, we introduce Negative Augmented Samples(NAS), a refined vision-language pretraining model that innovatively incorporates NAS to specifically address the challenge of fine-grained understanding. NAS utilizes a Visual Dictionary(VD) as a semantic bridge between visual and linguistic domains. Additionally, it employs a Negative Visual Augmentation(NVA) method based on the VD to generate challenging negative image samples. These samples deviate from positive samples exclusively at the token level, thereby necessitating that the model discerns the subtle disparities between positive and negative samples with greater precision. Comprehensive experiments validate the efficacy of NAS components and underscore its potential to enhance fine-grained vision-language comprehension.

This paper introduces a new stochastic geometry-based model to analyze the Request-to-Send/Clear-to-Send (RTS/CTS) handshake mechanism in wireless local area networks (WLANs). We develop an advanced hard-core point process model, termed the dual-zone hard-core process (DZHCP), which extends traditional hard-core models to capture the spatial interactions and exclusion effects introduced by the RTS/CTS mechanism. This model integrates key parameters accounting for the thinning effects imposed by RTS/CTS, enabling a refined characterization of active transmitters in the network. Analytical expressions are derived for the intensity of the DZHCP, the mean interference, and an approximation of the success probability, providing insight into how network performance depends on critical design parameters. Our results provide better estimates of interference levels and success probability, which could inform strategies for better interference management and improved performance in future WLAN designs.

This paper shows that dimensionality reduction methods such as UMAP and t-SNE, can be approximately recast as MAP inference methods corresponding to a model introduced in ProbDR, that describes the graph Laplacian (an estimate for the precision/inverse covariance) matrix using a Wishart distribution, with a mean given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insights into such algorithms, by showing that variances corresponding to these covariances are low (and misspecified), and forging a connection to Gaussian process latent variable models by showing that well-known kernels can be used to describe covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied, and pose two areas of research arising from these interpretations.

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.

In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.

北京阿比特科技有限公司