
Location fingerprinting based on RSSI has become a mainstream indoor localization technique because it requires neither the installation of new infrastructure nor the modification of existing devices, especially given the prevalence of Wi-Fi-enabled devices and the ubiquitous Wi-Fi access in modern buildings. The use of AI/ML technologies like DNNs makes location fingerprinting more accurate and reliable, especially for large-scale multi-building and multi-floor indoor localization. The application of DNNs to indoor localization, however, depends on a large amount of preprocessed and deliberately labeled training data. Considering the difficulty of data collection in an indoor environment, especially under the current COVID-19 epidemic, we investigate three different methods of RSSI data augmentation based on Multi-Output Gaussian Process (MOGP), i.e., by a single floor, by neighboring floors, and by a single building. Unlike Single-Output Gaussian Process (SOGP), MOGP can take into account the correlation among RSSI observations from multiple Access Points (APs) deployed close to each other (e.g., APs on the same floor of a building) by handling them collectively. The feasibility of the MOGP-based RSSI data augmentation is demonstrated through experiments based on the state-of-the-art RNN indoor localization model and UJIIndoorLoc, the most popular publicly available multi-building and multi-floor indoor localization database. The RNN model trained with the UJIIndoorLoc database augmented by fitting an MOGP model to the whole RSSI data of a building (i.e., by a single building) outperforms the other two augmentation methods as well as the RNN model trained with the original UJIIndoorLoc database, achieving a mean three-dimensional positioning error of 8.42 m.
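To make the idea of handling several APs collectively more concrete, the following is a minimal sketch of multi-output GP augmentation. It assumes an intrinsic coregionalization model (one shared RBF kernel over coordinates, scaled by an AP-to-AP covariance matrix) with fixed hyperparameters; the abstract does not specify the MOGP kernel or the fitting procedure, so the function names, the kernel choice, and all parameter values here are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale):
    """Squared-exponential kernel between two sets of 2-D coordinates."""
    sq_dist = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def icm_augment(coords, rssi, new_coords, coreg,
                lengthscale=5.0, noise_var=4.0, rng=None):
    """Draw synthetic RSSI vectors at new_coords from an ICM posterior.

    coords: (n, 2) measured locations; rssi: (n, m) readings from m APs;
    new_coords: (q, 2) locations to synthesize; coreg: (m, m) positive
    semidefinite AP-to-AP covariance. Hyperparameters are assumed, not fitted.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = rssi.shape
    q = new_coords.shape[0]
    mean = rssi.mean(axis=0)                    # constant mean per AP
    y = (rssi - mean).T.reshape(-1)             # stacked AP by AP, length m*n

    # Kronecker structure couples the APs: cov = coreg[i, j] * k(x, x').
    k_tt = np.kron(coreg, rbf_kernel(coords, coords, lengthscale))
    k_tt += noise_var * np.eye(m * n)
    k_st = np.kron(coreg, rbf_kernel(new_coords, coords, lengthscale))
    k_ss = np.kron(coreg, rbf_kernel(new_coords, new_coords, lengthscale))

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    post_mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    post_cov = k_ss - v.T @ v
    post_cov = 0.5 * (post_cov + post_cov.T)    # symmetrize before eigh

    # Sample through an eigendecomposition so tiny negative eigenvalues
    # caused by round-off do not break the draw.
    eigval, eigvec = np.linalg.eigh(post_cov)
    scale = np.sqrt(np.clip(eigval, 0.0, None))
    sample = post_mean + eigvec @ (scale * rng.standard_normal(m * q))
    return sample.reshape(m, q).T + mean        # back to (q, m)

# Toy usage with made-up data: 12 reference points, 3 APs, 4 new points.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 20.0, size=(12, 2))
rssi = -60.0 + 5.0 * rng.standard_normal((12, 3))
coreg = np.cov(rssi, rowvar=False) + 1e-3 * np.eye(3)
synthetic = icm_augment(coords, rssi, rng.uniform(0.0, 20.0, size=(4, 2)),
                        coreg, rng=rng)
```

Under this sketch, the three augmentation strategies would differ only in which rows and AP columns are passed in: the measurements of one floor, of a floor and its neighbors, or of a whole building.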

Related content

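In machine learning, data augmentation mostly refers to methods (such as data distillation and balancing positive and negative samples) that improve the quality of a model's dataset and thereby augment the data.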

Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address this ASR challenge, focusing on the Tunisian dialect. First, textual and audio data are collected and, in some cases, annotated. Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state of the art on different Tunisian test sets covering different acoustic, linguistic and prosodic conditions. Finally, given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling inadequacies in our testing references. Our models, which can transcribe audio samples in a linguistic mix involving Tunisian Arabic, English and French, and all the data used during training and testing are released for public use and further improvements.

Image data has started to benefit from a simple but effective self-supervised learning scheme built on masking and a self-reconstruction objective, thanks to the introduction of tokenization and vision transformer backbones. Convolutional neural networks (ConvNets), another important and widely adopted architecture for image data, can use contrastive learning to drive self-supervised learning but still struggle to benefit significantly from such a straightforward and general masking operation. In this work, we aim to ease the inclusion of masking as an extra augmentation method in the contrastive-learning framework for ConvNets. Prior works have discussed the additional but unwanted edges (between masked and unmasked regions) and other adverse effects that masking causes for ConvNets. Beyond these, we identify a potential problem: for one view in a contrastive sample pair, the randomly sampled masking regions could be overly concentrated on important or salient objects, resulting in misleading contrastiveness with respect to the other view. To this end, we propose to explicitly take a saliency constraint into consideration, so that the masked regions are more evenly distributed between the foreground and background when realizing the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments on various datasets, contrastive learning mechanisms, and downstream tasks verify the efficacy and superior performance of our proposed method with respect to several state-of-the-art baselines.
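One way to picture a saliency constraint on masking is to split patches into foreground and background and mask the same fraction of each group. The sketch below does exactly that with a median threshold on a patch-level saliency map; the threshold rule, the function name, and the mask ratio are assumptions made for illustration, since the abstract does not describe how the constraint is actually implemented.

```python
import numpy as np

def saliency_balanced_mask(saliency, mask_ratio=0.3, rng=None):
    """Return a boolean patch mask with equal masking rates in the
    salient (foreground) and non-salient (background) patch groups.

    saliency: 2-D array of per-patch saliency scores (hypothetical input,
    e.g. produced by any off-the-shelf saliency detector).
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = saliency.reshape(-1)
    foreground = flat >= np.median(flat)        # assumed split rule
    mask = np.zeros(flat.size, dtype=bool)
    for group in (foreground, ~foreground):
        idx = np.flatnonzero(group)
        n_mask = int(round(mask_ratio * idx.size))
        if n_mask == 0:
            continue
        # Sample patches to mask inside this group only, so random masks
        # cannot pile up on the salient object.
        mask[rng.choice(idx, size=n_mask, replace=False)] = True
    return mask.reshape(saliency.shape)

# Toy usage: a 4x4 grid of patches with distinct saliency scores.
toy_saliency = np.arange(16, dtype=float).reshape(4, 4)
toy_mask = saliency_balanced_mask(toy_saliency, mask_ratio=0.5,
                                  rng=np.random.default_rng(0))
```

Raising the ratio for the foreground group alone would give a crude analogue of the hard negative samples mentioned above, again only as an illustration.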

Intelligent reflecting surface (IRS) has emerged as a promising technique to extend the wireless signal coverage of access point (AP) and improve the communication performance cost-effectively. In order to reduce the path-loss of the cascaded user-IRS-AP channels, the IRS-integrated AP architecture has been proposed to deploy the IRSs and the antenna array of the AP within the same antenna radome. To reduce the pilot overhead for estimating all IRS-involved channels, in this paper, we propose a novel codebook-based IRS reflection design for the IRS-integrated AP to enhance the coverage performance in a given area. In particular, the codebook consisting of a small number of codewords is designed offline by employing an efficient sector division strategy based on the azimuth angle. To ensure the performance of each sector, we optimize its corresponding codeword for IRS reflection pattern to maximize the sector-min-average-effective-channel-power (SMAECP) by applying the alternating optimization (AO) and semidefinite relaxation (SDR) methods. With the designed codebook, the AP performs the IRS reflection training by sequentially applying all codewords and selects the one achieving the best communication performance for data transmission. Numerical results show that our proposed codebook design can enhance the average channel power of the whole coverage area, as compared to the system without IRS. Moreover, our proposed codebook-based IRS reflection design is shown to achieve significant performance gain over other benchmark schemes in both single-user and multi-user transmissions.
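The training step described above, in which every codeword is applied in turn and the best one is kept, can be sketched as follows. In a real system the AP would measure the received power for each codeword; here the measurement is simulated from a toy channel model (a direct path plus a cascaded IRS path), which is an assumption for illustration only, as are the function and variable names.

```python
import numpy as np

def select_codeword(codebook, direct, cascaded):
    """Sweep all codewords and return the index with the highest power.

    codebook: (K, N) complex reflection coefficients, one row per codeword;
    direct: complex scalar gain of the direct path;
    cascaded: (N,) complex gains of the cascaded path per IRS element.
    """
    # Simulated per-codeword measurement of effective channel power.
    powers = np.abs(direct + codebook @ cascaded) ** 2
    return int(np.argmax(powers)), powers

# Toy usage: two IRS elements, two codewords, no direct path.
toy_codebook = np.array([[1.0, 1.0], [1.0, -1.0j]])
toy_cascaded = np.array([1.0, 1.0j])
best_index, measured = select_codeword(toy_codebook, 0.0, toy_cascaded)
```

The sweep costs one training slot per codeword, which is why a small codebook keeps the overhead low compared with estimating every IRS-involved channel.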

To alleviate the cost of human labeling, semi-supervised semantic segmentation employs a few labeled images and an abundance of unlabeled images to predict a pixel-level label map of the same size as the input. Previous methods often adopt co-training with two convolutional networks that share the same architecture but differ in initialization, which fails to capture sufficiently diverse features. This motivates us to use tri-training and to develop a triple-view encoder that uses encoders with different architectures to derive diverse features, and to exploit knowledge distillation to learn the complementary semantics among these encoders. Moreover, existing methods simply concatenate the features from both the encoder and the decoder, resulting in redundant features that incur a large memory cost. This inspires us to devise a dual-frequency decoder that selects the important features by projecting them from the spatial domain to the frequency domain, where a dual-frequency channel attention mechanism is introduced to model feature importance. We therefore propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation, comprising the triple-view encoder and the dual-frequency decoder. Extensive experiments were conducted on two benchmarks, i.e., Pascal VOC 2012 and Cityscapes, and the results verify the superiority of the proposed method, with a good tradeoff between precision and inference speed.

Despite their competitive performance on knowledge-intensive tasks, large language models (LLMs) still cannot memorize all world knowledge, especially long-tail knowledge. In this paper, we study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task, which requires rich world knowledge. Existing work has shown that retrieving KG knowledge to enhance LLM prompting can significantly improve LLM performance in KGQA. However, these approaches lack a well-formed verbalization of KG knowledge, i.e., they ignore the gap between KG representations and textual representations. To this end, we propose an answer-sensitive KG-to-Text approach that can transform KG knowledge into well-textualized statements that are most informative for KGQA. Based on this approach, we propose a KG-to-Text enhanced LLM framework for solving the KGQA task. Experiments on several KGQA benchmarks show that the proposed KG-to-Text augmented LLM approach outperforms previous KG-augmented LLM approaches in answer accuracy and in the usefulness of knowledge statements.

Beam selection for joint transmission in cell-free massive multi-input multi-output systems faces the problem of extremely high training overhead and computational complexity. Traffic-aware quality of service further complicates the beam selection problem. To address this issue, we propose a traffic-aware hierarchical beam selection scheme performed on a dual timescale. On the long timescale, the central processing unit collects wide beam responses from base stations (BSs) to predict the power profile in the narrow beam space with a convolutional neural network, based on which the cascaded multiple-BS beam space is carefully pruned. On the short timescale, we introduce a centralized reinforcement learning (RL) algorithm to maximize the satisfaction rate of delay w.r.t. beam selection within multiple consecutive time slots. Moreover, we put forward three scalable distributed algorithms, namely hierarchical distributed Lyapunov optimization, fully distributed RL, and centralized training with decentralized execution of RL, to achieve better scalability and a better tradeoff between performance and execution signal overhead. Numerical results demonstrate that, compared to existing methods, the proposed schemes significantly reduce both model training cost and beam training overhead and more easily meet the user-specific delay requirements.

Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes the methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.

The deployment of Internet of Things (IoT) devices and Data Fusion techniques has gained popularity in public and government domains. This usually requires capturing and consolidating data from multiple sources. As datasets do not necessarily originate from identical sensors, fused data typically results in a complex data problem. Because the military is investigating how heterogeneous IoT devices can aid processes and tasks, we investigate a multi-sensor approach. Moreover, we propose a signal-to-image encoding approach that fuses data (signals) from IoT wearable devices into an image that is invertible and easier to visualize, supporting decision making. Furthermore, we investigate the challenge of enabling an intelligent identification and detection operation and demonstrate the feasibility of the proposed Deep Learning and Anomaly Detection models, which can support future applications that utilize hand gesture data from wearable devices.

Graph Neural Networks (GNNs) have demonstrated superior performance in many challenging applications, including few-shot learning tasks. Despite their powerful capacity to learn and generalize from few samples, GNNs usually suffer from severe over-fitting and over-smoothing as the model becomes deep, which limits model scalability. In this work, we propose a novel Attentive GNN to tackle these challenges by incorporating a triple-attention mechanism, i.e., node self-attention, neighborhood attention, and layer memory attention. We explain why the proposed attentive modules can improve GNNs for few-shot learning with theoretical analysis and illustrations. Extensive experiments show that the proposed Attentive GNN outperforms the state-of-the-art GNN-based methods for few-shot learning on the mini-ImageNet and Tiered-ImageNet datasets, in both inductive and transductive settings.

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.
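The split between proposing regions and weighting them can be illustrated with a small sketch. It assumes the region feature vectors have already been produced by a detector and uses a generic additive attention form (a tanh projection followed by a softmax over regions); the abstract does not give the exact scoring function, so the parameter names and shapes below are hypothetical.

```python
import numpy as np

def top_down_attention(regions, query, w_v, w_q, w_a):
    """Weight bottom-up region features with a top-down query.

    regions: (k, d) one feature vector per proposed image region;
    query: (h,) task context, e.g. a caption or question state;
    w_v: (d, a), w_q: (h, a), w_a: (a,) projection parameters
    (randomly initialized here, learned in a real model).
    """
    hidden = np.tanh(regions @ w_v + query @ w_q)   # (k, a)
    scores = hidden @ w_a                           # one score per region
    scores = scores - scores.max()                  # stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    attended = weights @ regions                    # weighted sum, (d,)
    return attended, weights

# Toy usage: 5 regions with 8-dim features and a 6-dim query.
rng = np.random.default_rng(0)
toy_regions = rng.standard_normal((5, 8))
toy_attended, toy_weights = top_down_attention(
    toy_regions, rng.standard_normal(6),
    rng.standard_normal((8, 4)), rng.standard_normal((6, 4)),
    rng.standard_normal(4))
```

Because the weights are computed per region rather than per grid cell, attention in this sketch operates on whole objects and salient areas, which is the property the abstract emphasizes.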
