
In this work, we investigate the joint visibility region (VR) detection and channel estimation (CE) problem for extremely large-scale multiple-input-multiple-output (XL-MIMO) systems considering both the spherical wavefront effect and spatial non-stationary (SnS) property. Unlike existing SnS CE methods that rely on the statistical characteristics of channels in the spatial or delay domain, we propose an approach that simultaneously exploits the antenna-domain spatial correlation and the wavenumber-domain sparsity of SnS channels. To this end, we introduce a two-stage VR detection and CE scheme. In the first stage, the belief regarding the visibility of antennas is obtained through a VR detection-oriented message passing (VRDO-MP) scheme, which fully exploits the spatial correlation among adjacent antenna elements. In the second stage, leveraging the VR information and wavenumber-domain sparsity, we accurately estimate the SnS channel employing the belief-based orthogonal matching pursuit (BB-OMP) method. Simulations show that the proposed algorithms lead to a significant enhancement in VR detection and CE accuracy as compared to existing methods, especially in low signal-to-noise ratio (SNR) scenarios.
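As a rough illustration of the second stage, the sketch below shows how per-antenna visibility beliefs could be folded into a standard orthogonal matching pursuit recovery over a wavenumber-domain dictionary, by using the beliefs as row weights before greedy atom selection. The function name `bb_omp`, the weighting scheme, and all parameters are illustrative assumptions, not the paper's exact BB-OMP procedure.

```python
import numpy as np

def bb_omp(y, A, belief, sparsity, tol=1e-6):
    """Sketch of a belief-weighted OMP solver (illustrative, not the paper's code).

    y        : (M,) received signal across M antennas
    A        : (M, N) wavenumber-domain dictionary
    belief   : (M,) per-antenna visibility probabilities from the VR-detection stage
    sparsity : maximum number of dictionary atoms to select (assumed >= 1)
    """
    # Assumption: antenna-domain beliefs enter as row weights, emphasising
    # measurements from antennas that are likely inside the visibility region.
    w = np.sqrt(belief)
    yw = w * y
    Aw = w[:, None] * A

    residual = yw.copy()
    support = []
    x_hat = np.zeros(A.shape[1], dtype=complex)

    for _ in range(sparsity):
        # Greedy atom selection against the weighted residual.
        correlations = np.abs(Aw.conj().T @ residual)
        k = int(np.argmax(correlations))
        if k not in support:
            support.append(k)
        # Least-squares refit of the channel coefficients on the current support.
        coef, *_ = np.linalg.lstsq(Aw[:, support], yw, rcond=None)
        residual = yw - Aw[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break

    x_hat[support] = coef
    return x_hat, support
```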

Related Content

In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) on visual commonsense reasoning (VCR) problems. We find that VLM-based and LLM-based decision pipelines excel at different kinds of VCR problems. Pre-trained VLMs perform strongly on problems that require understanding the literal visual content, which we refer to as visual commonsense understanding (VCU). For problems whose goal is to infer conclusions beyond the image content, which we refer to as visual commonsense inference (VCI), VLMs struggle, while LLMs, given sufficient visual evidence, can use commonsense to infer the answer well. We empirically validate this by letting LLMs classify VCR problems into these two categories and showing the significant performance gap between the VLM pipeline and the LLM-with-image-caption pipeline on the two subproblems. Moreover, we identify a challenge with VLMs' passive perception, which may miss crucial context information and lead to incorrect reasoning by LLMs. Based on these findings, we propose a collaborative approach, named ViCor, where pre-trained LLMs serve as problem classifiers to analyze the problem category, then either use VLMs to answer the question directly or actively instruct VLMs to concentrate on and gather relevant visual elements to support potential commonsense inferences. We evaluate our framework on two VCR benchmark datasets and outperform all other methods that do not require in-domain fine-tuning.
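The routing logic described above can be summarized in a few lines. The sketch below assumes stand-in callables for the LLM and VLM; the names, the two-way VCU/VCI split, and the interfaces are illustrative assumptions, not the released ViCor implementation.

```python
from typing import Callable

def vicor_answer(question: str,
                 image_caption: str,
                 classify_with_llm: Callable[[str], str],
                 answer_with_vlm: Callable[[str], str],
                 query_visual_evidence: Callable[[str], str],
                 reason_with_llm: Callable[[str, str], str]) -> str:
    """Sketch of the two-branch routing described in the abstract.

    The four callables stand in for the pre-trained models; their names and
    signatures are illustrative, not the paper's actual API.
    """
    category = classify_with_llm(question)          # "VCU" or "VCI"
    if category == "VCU":
        # Literal visual understanding: let the VLM answer directly.
        return answer_with_vlm(question)
    # Visual commonsense inference: actively gather the visual evidence the
    # LLM asks for, then let the LLM infer the answer from caption + evidence.
    evidence = query_visual_evidence(question)
    return reason_with_llm(question, image_caption + " " + evidence)

# Toy stubs standing in for the real models:
ans = vicor_answer("Why is the man holding an umbrella?",
                   "A man stands outside holding an umbrella.",
                   classify_with_llm=lambda q: "VCI",
                   answer_with_vlm=lambda q: "yes",
                   query_visual_evidence=lambda q: "The sky is dark and cloudy.",
                   reason_with_llm=lambda q, ctx: "It looks like it is about to rain.")
```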

This work focuses on addressing two major challenges in the context of large-scale nonconvex Bi-Level Optimization (BLO) problems, which are increasingly applied in machine learning due to their ability to model nested structures. These challenges involve ensuring computational efficiency and providing theoretical guarantees. While recent advances in scalable BLO algorithms have primarily relied on lower-level convexity simplification, our work specifically tackles large-scale BLO problems involving nonconvexity in both the upper and lower levels. We simultaneously address computational and theoretical challenges by introducing an innovative single-loop gradient-based algorithm, utilizing the Moreau envelope-based reformulation, and providing non-asymptotic convergence analysis for general nonconvex BLO problems. Notably, our algorithm relies solely on first-order gradient information, enhancing its practicality and efficiency, especially for large-scale BLO learning tasks. We validate our approach's effectiveness through experiments on various synthetic problems, two typical hyper-parameter learning tasks, and a real-world neural architecture search application, collectively demonstrating its superior performance.
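To make the single-loop idea concrete, the following schematic performs one first-order step per iteration on an auxiliary Moreau-envelope variable, the lower-level variable, and the upper-level variable, using only gradients of the two objectives. It is a sketch of the general pattern, not the paper's exact update rule; step sizes, the proximal parameter, and the penalty weight are placeholder values.

```python
import numpy as np

def single_loop_blo(grad_F_x, grad_F_y, grad_f_x, grad_f_y,
                    x0, y0, steps=1000,
                    alpha=1e-2, beta=1e-2, eta=1e-2,
                    gamma=1.0, rho=10.0):
    """Schematic single-loop, first-order BLO iteration (illustrative only).

    F(x, y): upper-level objective, f(x, y): lower-level objective.
    theta tracks the proximal (Moreau-envelope) point of the lower level, and
    rho penalises the gap f(x, y) - [f(x, theta) + ||theta - y||^2 / (2 gamma)].
    """
    x, y = np.array(x0, float), np.array(y0, float)
    theta = y.copy()
    for _ in range(steps):
        # (i) One proximal-gradient step on the auxiliary variable.
        theta = theta - eta * (grad_f_y(x, theta) + (theta - y) / gamma)
        # (ii) Penalised first-order updates for the lower- and upper-level variables.
        y = y - beta * (grad_F_y(x, y) + rho * (grad_f_y(x, y) + (theta - y) / gamma))
        x = x - alpha * (grad_F_x(x, y) + rho * (grad_f_x(x, y) - grad_f_x(x, theta)))
    return x, y
```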

To scale quantum computers to useful levels, we must build networks of quantum computational nodes that can share entanglement for use in distributed forms of quantum algorithms. In one proposed architecture, node-to-node entanglement is created when nodes emit photons entangled with stationary memories, with the photons routed through a switched interconnect to a shared pool of Bell state analyzers (BSAs). Designs that optimize switching circuits will reduce loss and crosstalk, raising entanglement rates and fidelity. We present optimal designs for switched interconnects constrained to planar layouts, appropriate for silicon waveguides and Mach-Zehnder interferometer (MZI) $2 \times 2$ switch points. The architectures for the optimal designs are scalable and algorithmically structured to pair arbitrary inputs in a rearrangeable, non-blocking way. For pairing $N$ inputs, $N(N - 2)/4$ switches are required, which is less than half the number of switches required for full permutation switching networks. An efficient routing algorithm is also presented for each architecture. These designs can also be employed in reverse for entanglement generation using a shared pool of entangled photon-pair sources.
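The quoted switch count is straightforward to evaluate; the helper below simply computes $N(N-2)/4$ for a given number of inputs (assumed even here, which is an illustrative assumption for pairing).

```python
def pairing_switch_count(n: int) -> int:
    """Number of 2x2 MZI switch points for the pairing interconnect,
    per the N(N - 2)/4 formula quoted in the abstract."""
    assert n % 2 == 0, "pairing assumes an even number of inputs"
    return n * (n - 2) // 4

# For example, 8 photonic inputs routed to a shared BSA pool
# would need 8 * 6 / 4 = 12 switch points under this design.
print(pairing_switch_count(8))   # 12
print(pairing_switch_count(16))  # 56
```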

One of the challenges of human-swarm interaction (HSI) is how to manage the operator's workload. To this end, we propose a novel neurofeedback technique for real-time measurement of workload using functional near-infrared spectroscopy (fNIRS). The objective is to develop a baseline for workload measurement in human-swarm interaction using fNIRS and to develop an interface that dynamically adapts to the operator's workload. The proposed method uses an fNIRS device to measure brain activity, processes the measurements with a machine learning algorithm, and passes the result on to the HSI interface. By dynamically adapting the HSI interface, the swarm operator's workload could be reduced and performance improved.
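As a loose sketch of such a loop (calibrate a classifier on labelled fNIRS windows, then adapt the interface from its real-time predictions), the toy example below uses synthetic features and a logistic-regression classifier; the feature layout, labels, and interface modes are illustrative assumptions, not the actual system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in data: windowed fNIRS features (e.g. mean HbO/HbR per
# channel) with binary workload labels collected during a calibration phase.
rng = np.random.default_rng(0)
X_calib = rng.normal(size=(200, 16))                       # 200 windows x 16 channel features
y_calib = (X_calib[:, :8].mean(axis=1) > 0).astype(int)    # 0 = low, 1 = high workload

clf = LogisticRegression().fit(X_calib, y_calib)

def adapt_interface(window_features: np.ndarray) -> str:
    """Map the predicted workload of the latest fNIRS window to an interface mode."""
    high_workload = clf.predict(window_features.reshape(1, -1))[0] == 1
    # e.g. reduce the amount of per-robot detail shown when workload is high.
    return "aggregate-view" if high_workload else "detailed-view"

print(adapt_interface(rng.normal(size=16)))
```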

As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improves as the number of parameters increases. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance on downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrate that, across all benchmark datasets and downstream tasks, both the performance and the data efficiency of the foundation models improved as the number of parameters increased. Moreover, our models achieve state-of-the-art performance on several datasets, including DIOR-R, Potsdam, and LoveDA.

In this paper, we investigate the performance of large-scale heterogeneous low Earth orbit (LEO) satellite networks in the context of three association schemes. In contrast to existing studies, where single-tier LEO satellite-based network deployments are considered, the developed framework captures the heterogeneous nature of real-world satellite network deployments. More specifically, we propose an analytical framework to evaluate the performance of multi-tier LEO satellite-based networks, where the locations of LEO satellites are approximated as points of independent Poisson point processes with different densities, transmit powers, and altitudes. We propose three association schemes for the considered network topology based on: 1) the Euclidean distance, 2) the average received power, and 3) a random selection. By using stochastic geometry tools, analytical expressions for the association probability, the downlink coverage probability, and the spectral efficiency are derived for each association scheme, taking interference into account. Moreover, we assess the achieved network performance under several fading environments, including low, typical, and severe fading conditions, namely non-fading, shadowed-Rician, and Rayleigh fading channels, respectively. Our results reveal the impact of fading channels on the coverage probability and show that the average received power-based association scheme outperforms the other two association policies in terms of coverage and spectral efficiency. Furthermore, we highlight the impact of the proposed association schemes and the network topology on the optimal number of LEO satellites, providing guidance for the planning of multi-tier LEO satellite-based networks in order to enhance network performance.
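To illustrate how the three association rules differ, the snippet below picks a serving satellite from a small set of visible satellites under each scheme, using a simple power-law model for the average received power; the path-loss model and the numbers are assumptions for illustration only.

```python
import numpy as np

def associate(distances, tx_powers, path_loss_exp=2.0, scheme="power", rng=None):
    """Pick the serving LEO satellite under the three association rules.

    distances : (K,) Euclidean distances from the user to K visible satellites
    tx_powers : (K,) satellite transmit powers (possibly different per tier)
    scheme    : "distance", "power", or "random"
    The power-law average received power model is an illustrative assumption.
    """
    distances = np.asarray(distances, float)
    tx_powers = np.asarray(tx_powers, float)
    if scheme == "distance":
        return int(np.argmin(distances))                 # nearest satellite
    if scheme == "power":
        avg_rx_power = tx_powers / distances ** path_loss_exp
        return int(np.argmax(avg_rx_power))              # strongest average power
    rng = rng or np.random.default_rng()
    return int(rng.integers(len(distances)))             # uniform random choice

# Example: two tiers, where the farther satellite transmits more power, so the
# distance-based and power-based rules pick different serving satellites.
print(associate([600e3, 1200e3], [10.0, 60.0], scheme="distance"))  # 0
print(associate([600e3, 1200e3], [10.0, 60.0], scheme="power"))     # 1
```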

In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.
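In symbols, the bound described above can be stated roughly as follows, where $W$ and $D$ denote the network width and depth; the constant $C$ and the precise regularity conditions on $f$ are left as in the paper.

```latex
% For a bounded target f and ReLU networks \Phi_{W,D} of width W and depth D:
\[
  \inf_{\Phi_{W,D}} \, \bigl\| f - \Phi_{W,D} \bigr\|_{\infty}
  \;\le\; C \, \frac{\| f \|_{\infty}}{W \, D}.
\]
```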

Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static; they evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge as the KG grows. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention when learning on growing snapshots of a KG, without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, an embedding transfer strategy to inject the learned knowledge into the embeddings of new entities and relations, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
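As one possible embedding-transfer step (illustrative only, not the paper's actual strategy), the sketch below initialises each newly emerging entity from the already-learned embeddings of the old entities it is linked to in the new snapshot, falling back to small random vectors for new entities with no such links.

```python
import numpy as np

def init_new_entity_embeddings(old_entity_emb, new_triples, num_new_entities, rng=None):
    """Neighbour-averaging initialisation of new-entity embeddings (a sketch).

    old_entity_emb   : (n_old, d) learned embeddings of previously seen entities
    new_triples      : iterable of (head, relation, tail) id triples in the new snapshot
    num_new_entities : number of entities with id >= n_old
    """
    n_old, dim = old_entity_emb.shape
    new_emb = np.zeros((num_new_entities, dim))
    counts = np.zeros(num_new_entities)
    for h, _, t in new_triples:
        # Transfer knowledge only across edges linking a new entity to an old one.
        if h >= n_old and t < n_old:
            new_emb[h - n_old] += old_entity_emb[t]
            counts[h - n_old] += 1
        elif t >= n_old and h < n_old:
            new_emb[t - n_old] += old_entity_emb[h]
            counts[t - n_old] += 1
    rng = rng or np.random.default_rng()
    has_nb = counts > 0
    new_emb[has_nb] /= counts[has_nb][:, None]
    # Isolated new entities get small random initialisations instead.
    new_emb[~has_nb] = 0.01 * rng.standard_normal((int((~has_nb).sum()), dim))
    return np.vstack([old_entity_emb, new_emb])
```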

With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is now readily available, offering researchers an opportunity to tackle current geoscience applications in a fresh way. Through the joint use of EO data, research on multimodal RS data fusion has made tremendous progress in recent years; however, traditional algorithms inevitably hit a performance bottleneck because they lack the ability to comprehensively analyse and interpret such strongly heterogeneous data. This limitation has created a strong demand for alternative tools with powerful processing capabilities. Deep learning (DL), as a cutting-edge technology, has achieved remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to multimodal RS data fusion, yielding great improvement over traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, essential background on this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of the field. Several prevalent sub-fields of multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize valuable resources to support further development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.

In this paper, we propose joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such network architectures typically requires pre-defined label sequences. For multi-label classification, it is desirable to have a robust inference process so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
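The beam-search decoding step mentioned above might look roughly as follows, where `step_log_probs` stands in for the attention-LSTM model's next-label distribution; the interface, parameters, and single-use-per-label constraint are illustrative assumptions.

```python
from typing import Callable, List, Tuple

def beam_search_labels(step_log_probs: Callable[[List[int]], dict],
                       beam_width: int = 3,
                       max_labels: int = 5,
                       end_token: int = -1) -> List[int]:
    """Sketch of beam-search decoding of a label set from a sequential model.

    step_log_probs(prefix) returns {label_id: log_prob} for the next step given
    the labels predicted so far; in the paper this role would be played by the
    attention-LSTM model.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for _ in range(max_labels):
        candidates = []
        for score, prefix in beams:
            if prefix and prefix[-1] == end_token:
                candidates.append((score, prefix))        # finished hypothesis
                continue
            for label, logp in step_log_probs(prefix).items():
                if label in prefix:
                    continue                              # each label at most once
                candidates.append((score + logp, prefix + [label]))
        # Keep the highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    _best_score, best_prefix = beams[0]
    return [label for label in best_prefix if label != end_token]
```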
