亚洲十八禁无码在线免费观看_亚洲综合在线观看一区二区三区_老司机国内精品久久久久久_日韩激情国产精品无码_91精品久久久久久久久无码_亚洲精品自拍偷拍_久久久国产精华液2023特点

Extracting image semantics effectively and assigning corresponding labels to multiple objects or attributes for natural images is challenging due to the complex scene contents and confusing label dependencies. Recent works have focused on modeling label relationships with graph and understanding object regions using class activation maps (CAM). However, these methods ignore the complex intra- and inter-category relationships among specific semantic features, and CAM is prone to generate noisy information. To this end, we propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning (SSCL) as well as prototype-to-sample contrastive learning (PSCL). Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features and construct category prototypes. Then based on SSCL, label-level visual representations of the same category are aggregated together, and features belonging to distinct categories are separated. Meanwhile, we construct a novel PSCL module to narrow the distance between positive samples and category prototypes and push negative samples away from the corresponding category prototypes. Finally, the discriminative label-level features related to the image content are accurately captured by the joint training of the above three parts. Experiments on five challenging large-scale public datasets demonstrate that our proposed method is effective and outperforms the state-of-the-art methods. Code and supplementary materials are released on //github.com/yu-gi-oh-leilei/SADCL.

相關內容

contrastive

關注 1

回合 · INFORMS · 圖 · 3D · 表示 ·

2023 年 9 月 19 日

Vision-based Situational Graphs Generating Optimizable 3D Scene Representations

Ali Tourani,Hriday Bavle,Jose Luis Sanchez-Lopez,Deniz Isinsu Avsar,Rafael Munoz Salinas,Holger Voos

from arxiv, 7 pages, 6 figures, 2 tables

3D scene graphs offer a more efficient representation of the environment by hierarchically organizing diverse semantic entities and the topological relationships among them. Fiducial markers, on the other hand, offer a valuable mechanism for encoding comprehensive information pertaining to environments and the objects within them. In the context of Visual SLAM (VSLAM), especially when the reconstructed maps are enriched with practical semantic information, these markers have the potential to enhance the map by augmenting valuable semantic information and fostering meaningful connections among the semantic objects. In this regard, this paper exploits the potential of fiducial markers to incorporate a VSLAM framework with hierarchical representations that generates optimizable multi-layered vision-based situational graphs. The framework comprises a conventional VSLAM system with low-level feature tracking and mapping capabilities bolstered by the incorporation of a fiducial marker map. The fiducial markers aid in identifying walls and doors in the environment, subsequently establishing meaningful associations with high-level entities, including corridors and rooms. Experimental results are conducted on a real-world dataset collected using various legged robots and benchmarked against a Light Detection And Ranging (LiDAR)-based framework (S-Graphs) as the ground truth. Consequently, our framework not only excels in crafting a richer, multi-layered hierarchical map of the environment but also shows enhancement in robot pose accuracy when contrasted with state-of-the-art methodologies.

相似度 · CLUES · INTERACT · MoDELS · INFORMS ·

2023 年 9 月 18 日

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

Ziyang Wang,Yi-Lin Sung,Feng Cheng,Gedas Bertasius,Mohit Bansal

from arxiv, ICCV 2023

The canonical approach to video-text retrieval leverages a coarse-grained or fine-grained alignment between visual and textual information. However, retrieving the correct video according to the text query is often challenging as it requires the ability to reason about both high-level (scene) and low-level (object) visual clues and how they relate to the text query. To this end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA. Specifically, our model captures the cross-modal similarity information at different granularity levels. To alleviate the effect of irrelevant visual clues, we also apply an Interactive Similarity Aggregation module (ISA) to consider the importance of different visual features while aggregating the cross-modal similarity to obtain a similarity score for each granularity. Finally, we apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them, alleviating over- and under-representation issues at different levels. By jointly considering the crossmodal similarity of different granularity, UCoFiA allows the effective unification of multi-grained alignments. Empirically, UCoFiA outperforms previous state-of-the-art CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%, 1.4% and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT, Activity-Net, and DiDeMo, respectively. Our code is publicly available at //github.com/Ziyang412/UCoFiA.

Networking · Integration · Continuity · QoS · Everything（軟件） ·

2023 年 9 月 18 日

Near-optimal Cloud-Network Integrated Resource Allocation for Latency-Sensitive B5G

Masoud Shokrnezhad,Tarik Taleb

Nowadays, while the demand for capacity continues to expand, the blossoming of Internet of Everything is bringing in a paradigm shift to new perceptions of communication networks, ushering in a plethora of totally unique services. To provide these services, Virtual Network Functions (VNFs) must be established and reachable by end-users, which will generate and consume massive volumes of data that must be processed locally for service responsiveness and scalability. For this to be realized, a solid cloud-network Integrated infrastructure is a necessity, and since cloud and network domains would be diverse in terms of characteristics but limited in terms of capability, communication and computing resources should be jointly controlled to unleash its full potential. Although several innovative methods have been proposed to allocate the resources, most of them either ignored network resources or relaxed the network as a simple graph, which are not applicable to Beyond 5G because of its dynamism and stringent QoS requirements. This paper fills in the gap by studying the joint problem of communication and computing resource allocation, dubbed CCRA, including VNF placement and assignment, traffic prioritization, and path selection considering capacity constraints as well as link and queuing delays, with the goal of minimizing overall cost. We formulate the problem as a non-linear programming model, and propose two approaches, dubbed B\&B-CCRA and WF-CCRA respectively, based on the Branch \& Bound and Water-Filling algorithms. Numerical simulations show that B\&B-CCRA can solve the problem optimally, whereas WF-CCRA can provide near-optimal solutions in significantly less time.

求逆 · Networking · Extensibility · 連結 · 再縮放 ·

2023 年 9 月 16 日

Invertible Mosaic Image Hiding Network for Very Large Capacity Image Steganography

Zihan Chen,Tianrui Liu,Jun-Jie Huang,Wentao Zhao,Xing Bi,Meng Wang

The existing image steganography methods either sequentially conceal secret images or conceal a concatenation of multiple images. In such ways, the interference of information among multiple images will become increasingly severe when the number of secret images becomes larger, thus restrict the development of very large capacity image steganography. In this paper, we propose an Invertible Mosaic Image Hiding Network (InvMIHNet) which realizes very large capacity image steganography with high quality by concealing a single mosaic secret image. InvMIHNet consists of an Invertible Image Rescaling (IIR) module and an Invertible Image Hiding (IIH) module. The IIR module works for downscaling the single mosaic secret image form by spatially splicing the multiple secret images, and the IIH module then conceal this mosaic image under the cover image. The proposed InvMIHNet successfully conceal and reveal up to 16 secret images with a small number of parameters and memory consumption. Extensive experiments on ImageNet-1K, COCO and DIV2K show InvMIHNet outperforms state-of-the-art methods in terms of both the imperceptibility of stego image and recover accuracy of secret image.

小樣本學習 · Learning · INFORMS · Networking · MoDELS ·

2023 年 9 月 15 日

Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation

Linhan Wang,Shuo Lei,Jianfeng He,Shengkun Wang,Min Zhang,Chang-Tien Lu

from arxiv, 10 pages, 6 figures. Accepted to Sigspatial 2023. arXiv admin note: text overlap with arXiv:2104.01538 by other authors

Remote sensing image semantic segmentation is an important problem for remote sensing image interpretation. Although remarkable progress has been achieved, existing deep neural network methods suffer from the reliance on massive training data. Few-shot remote sensing semantic segmentation aims at learning to segment target objects from a query image using only a few annotated support images of the target class. Most existing few-shot learning methods stem primarily from their sole focus on extracting information from support images, thereby failing to effectively address the large variance in appearance and scales of geographic objects. To tackle these challenges, we propose a Self-Correlation and Cross-Correlation Learning Network for the few-shot remote sensing image semantic segmentation. Our model enhances the generalization by considering both self-correlation and cross-correlation between support and query images to make segmentation predictions. To further explore the self-correlation with the query image, we propose to adopt a classical spectral method to produce a class-agnostic segmentation mask based on the basic visual information of the image. Extensive experiments on two remote sensing image datasets demonstrate the effectiveness and superiority of our model in few-shot remote sensing image semantic segmentation. Code and models will be accessed at //github.com/linhanwang/SCCNet.

MoDELS · binary · INFORMS · Weight · 相似度 ·

2023 年 9 月 15 日

Uncertainty-Aware Multi-View Visual Semantic Embedding

Wenzhang Wei,Zhipeng Gui,Changguang Wu,Anqi Zhao,Xingguang Wang,Huayi Wu

The key challenge in image-text retrieval is effectively leveraging semantic information to measure the similarity between vision and language data. However, using instance-level binary labels, where each image is paired with a single text, fails to capture multiple correspondences between different semantic units, leading to uncertainty in multi-modal semantic understanding. Although recent research has captured fine-grained information through more complex model structures or pre-training techniques, few studies have directly modeled uncertainty of correspondence to fully exploit binary labels. To address this issue, we propose an Uncertainty-Aware Multi-View Visual Semantic Embedding (UAMVSE)} framework that decomposes the overall image-text matching into multiple view-text matchings. Our framework introduce an uncertainty-aware loss function (UALoss) to compute the weighting of each view-text loss by adaptively modeling the uncertainty in each view-text correspondence. Different weightings guide the model to focus on different semantic information, enhancing the model's ability to comprehend the correspondence of images and texts. We also design an optimized image-text matching strategy by normalizing the similarity matrix to improve model performance. Experimental results on the Flicker30k and MS-COCO datasets demonstrate that UAMVSE outperforms state-of-the-art models.

Agent · Learning · 貝葉斯風險 · 時間步 · 樣本 ·

2023 年 9 月 14 日

Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing

Hadar Szostak,Kobi Cohen

from arxiv, A short version of this paper was presented at the annual Allerton Conference on Communication, Control, and Computing (Allerton) 2022

We consider a decentralized formulation of the active hypothesis testing (AHT) problem, where multiple agents gather noisy observations from the environment with the purpose of identifying the correct hypothesis. At each time step, agents have the option to select a sampling action. These different actions result in observations drawn from various distributions, each associated with a specific hypothesis. The agents collaborate to accomplish the task, where message exchanges between agents are allowed over a rate-limited communications channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk. This risk comprises both the cost of sampling and the joint terminal cost incurred by the agents upon making a hypothesis declaration. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even in the context of a single agent. As a result, recent efforts have turned to deep learning methodologies to address these problems, which have exhibited significant success in single-agent learning scenarios. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. This algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), operates at each time step by having each agent map its state to an action (sampling rule or stopping rule) using a trained deep neural network with the goal of minimizing the Bayes risk. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance using MARLA. Furthermore, we demonstrate the superiority of MARLA over single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework, for the benefit of researchers and developers in related domains.

INFORMS · 互信息 · Extensibility · 后向 · 可辨認的 ·

2020 年 6 月 30 日

Adversarial Mutual Information for Text Generation

Boyuan Pan,Yazheng Yang,Kaizhao Liang,Bhavya Kailkhura,Zhongming Jin,Xian-Sheng Hua,Deng Cai,Bo Li

from arxiv, Published at ICML 2020

Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations at the high-level semantic space to enhance the long term dependency in the generation process. Extensive experiments based on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.

圖 · 圖卷積神經網絡/圖卷積網絡 · 知識圖譜 · 正則化項 · 圖卷積 ·

2019 年 5 月 11 日

Knowledge Graph Convolutional Networks for Recommender Systems with Label Smoothness Regularization

Hongwei Wang,Fuzheng Zhang,Mengdi Zhang,Jure Leskovec,Miao Zhao,Wenjie Li,Zhongyuan Wang

from arxiv, KDD 2019 research track oral

Knowledge graphs capture interlinked information between entities and they represent an attractive source of structured information that can be harnessed for recommender systems. However, existing recommender engines use knowledge graphs by manually designing features, do not allow for end-to-end training, or provide poor scalability. Here we propose Knowledge Graph Convolutional Networks (KGCN), an end-to-end trainable framework that harnesses item relationships captured by the knowledge graph to provide better recommendations. Conceptually, KGCN computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relations for a given user and then transforming the knowledge graph into a user-specific weighted graph. Then, KGCN applies a graph convolutional neural network that computes an embedding of an item node by propagating and aggregating knowledge graph neighborhood information. Moreover, to provide better inductive bias KGCN uses label smoothness (LS), which provides regularization over edge weights and we prove that it is equivalent to label propagation scheme on a graph. Finally, We unify KGCN and LS regularization, and present a scalable minibatch implementation for KGCN-LS model. Experiments show that KGCN-LS outperforms strong baselines in four datasets. KGCN-LS also achieves great performance in sparse scenarios and is highly scalable with respect to the knowledge graph size.

圖片分類 · 生成式對抗網絡 · Networking · 未標記 · GANs ·

2018 年 2 月 10 日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Zilong Zhong,Jonathan Li

from arxiv, Accepted by AAAI-18

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.