Existing pedestrian attribute recognition (PAR) algorithms adopt a pre-trained CNN (e.g., ResNet) as their backbone network for visual feature learning, which might yield sub-optimal results because the relations between pedestrian images and attribute labels are insufficiently exploited. In this paper, we formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels. Specifically, the attribute phrases are first expanded into sentences, and then the pre-trained vision-language model CLIP is adopted as our backbone for feature embedding of visual images and attribute descriptions. The contrastive learning objective connects the vision and language modalities well in the CLIP-based feature space, and the Transformer layers used in CLIP can capture the long-range relations between pixels. Then, a multi-modal Transformer is adopted to fuse the dual features effectively, and a feed-forward network is used to predict attributes. To optimize our network efficiently, we propose a region-aware prompt tuning technique that adjusts very few parameters (i.e., only the prompt vectors and classification heads) and fixes both the pre-trained VL model and the multi-modal Transformer. Our proposed PAR algorithm adjusts only 0.75% of the learnable parameters compared with the fine-tuning strategy. It also achieves new state-of-the-art performance in both standard and zero-shot settings for PAR, on the RAPv1, RAPv2, WIDER, PA100K, PETA-ZS, and RAP-ZS datasets. The source code and pre-trained models will be released on //github.com/Event-AHU/OpenPAR.
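The phrase-to-sentence expansion step mentioned in the abstract can be illustrated with a minimal sketch. The template wording and the function name `expand_attributes` are assumptions made for illustration; the abstract does not state the sentences the authors actually use.

```python
# Hypothetical template: the abstract does not give the exact wording.
TEMPLATE = "A pedestrian {phrase}."

def expand_attributes(phrases, template=TEMPLATE):
    """Turn short attribute phrases into full sentences for a text encoder."""
    return [template.format(phrase=p.strip()) for p in phrases]

# Example attribute phrases (made up for this sketch).
sentences = expand_attributes(["wearing a hat", " carrying a backpack "])
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then be embedded by the text branch of the vision-language model, while the image goes through the visual branch.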

Related content


Interpretability · MoDELS · Representation · Networking · Performer

February 6, 2024

Position Paper: Toward New Frameworks for Studying Model Representations

Satvik Golechha, James Dao

Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn. Most works in MI so far have studied behaviors and capabilities that are trivial and token-aligned. However, most capabilities are not that trivial, which advocates for the study of hidden representations inside these networks as the unit of analysis. We do a literature review, formalize representations for features and behaviors, highlight their importance and evaluation, and perform some basic exploration in the mechanistic interpretability of representations. With discussion and exploratory results, we justify our position that studying representations is an important and under-studied field, and that currently established methods in MI are not sufficient to understand representations, thus pushing for the research community to work toward new frameworks for studying representations.

Continuity · Estimation/Estimator · Optimizer · Cost · MoDELS

February 6, 2024

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs

Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin

Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure, enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.
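For concreteness, the barycenter task described in this abstract is commonly written as a weighted minimization. The following is a sketch in assumed notation (distributions $P_k$, weights $\lambda_k$, cost functions $c_k$, and regularization strength $\varepsilon$ are symbols introduced here, not taken from the abstract), and the exact form of the entropic term varies between papers:

$$Q^{*} \in \arg\min_{Q} \sum_{k=1}^{K} \lambda_k \, \mathrm{EOT}_{c_k,\varepsilon}(P_k, Q), \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1,$$

$$\mathrm{EOT}_{c,\varepsilon}(P, Q) = \min_{\pi \in \Pi(P, Q)} \int c(x, y)\, d\pi(x, y) + \varepsilon\, \mathrm{KL}\left(\pi \,\|\, P \otimes Q\right),$$

where $\Pi(P, Q)$ denotes the set of couplings with marginals $P$ and $Q$.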

Signed distance · Networking · Shaping · Microsoft Surface · Neural Networks

February 5, 2024

Zero-Level-Set Encoder for Neural Distance Fields

Stefan Rhys Jeske, Jonathan Klein, Dominik L. Michels, Jan Bender

Neural shape representation generally refers to representing 3D geometry using neural networks, e.g., to compute a signed distance or occupancy value at a specific spatial position. In this paper, we present a novel encoder-decoder neural network for embedding 3D shapes in a single forward pass. Our architecture is based on a multi-scale hybrid system incorporating graph-based and voxel-based components, as well as a continuously differentiable decoder. Furthermore, the network is trained to solve the Eikonal equation and only requires knowledge of the zero-level set for training and inference. This means that, in contrast to most previous work, our network is able to output valid signed distance fields without explicit prior knowledge of non-zero distance values or shape occupancy. We further propose a modification of the loss function for cases where surface normals are not well defined, e.g., in the context of non-watertight surfaces and non-manifold geometry. Overall, this can help reduce the computational overhead of training and evaluating neural distance fields, and enable the application to difficult shapes. We finally demonstrate the efficacy, generalizability and scalability of our method on datasets consisting of deforming shapes, both based on simulated data and raw 3D scans. We further show single-class and multi-class encoding, on both fixed and variable vertex-count inputs, showcasing a wide range of possible applications.
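The Eikonal condition mentioned in this abstract states that a valid signed distance field has a gradient of unit norm and vanishes on the surface. The sketch below checks this numerically for an analytic sphere, which stands in for a trained network; the function names and sample points are illustrative assumptions, and nothing here reproduces the authors' loss.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def grad_norm(f, p, h=1e-5):
    """Norm of the central finite-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        grad.append((f(hi) - f(lo)) / (2 * h))
    return math.sqrt(sum(g * g for g in grad))

def eikonal_residual(f, p):
    """Deviation of the gradient norm from 1; zero for an exact distance field."""
    return abs(grad_norm(f, p) - 1.0)

# Sample points inside, outside, and on the surface (arbitrary choices).
points = [(0.5, 0.2, -0.1), (2.0, -1.0, 0.3), (0.0, 1.0, 0.0)]
residuals = [eikonal_residual(sphere_sdf, p) for p in points]
print(residuals)
```

In a training setting, a residual of this kind would typically be averaged over sampled points and added to a term that pins the field to zero on the known surface.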

Networking · Optimizer · Graph · INFORMS · AIM

February 5, 2024

Quantum Switches for Gottesman-Kitaev-Preskill Qubit-based All-Photonic Quantum Networks

Mohadeseh Azari, Paul Polakos, Kaushik P. Seshadreesan

from arXiv, 13 pages, 8 figures

The Gottesman-Kitaev-Preskill (GKP) code, being information theoretically near optimal for quantum communication over Gaussian thermal-loss optical channels, is likely to be the encoding of choice for advanced quantum networks of the future. Quantum repeaters based on GKP-encoded light have been shown to support high end-to-end entanglement rates across large distances despite realistic finite squeezing in GKP code preparation and homodyne detection inefficiencies. Here, we introduce a quantum switch for GKP-qubit-based quantum networks, whose architecture involves multiplexed GKP-qubit-based entanglement link generation with clients, and their all-photonic storage, together enabled by GKP-qubit graph state resources. For bipartite entanglement distribution between clients via entanglement swapping, the switch uses a multi-client generalization of a recently introduced $\textit{entanglement-ranking-based link matching}$ protocol heuristic. Since generating the GKP-qubit graph state resource is hardware intensive, given a total resource budget and an arbitrary layout of clients, we address the question of their optimal allocation towards the different client-pair connections served by the switch such that the sum throughput of the switch is maximized while also being fair in terms of the individual entanglement rates. We illustrate our results for an exemplary data center network, where the data center is a client of a switch and all of its other clients aim to connect to the data center alone -- a scenario that also captures the general case of a gateway router connecting a local area network to a global network. Together with compatible quantum repeaters, our quantum switch provides a way to realize quantum networks of arbitrary topology.

MIMO · massive MIMO · Analysis · Maximal · Wireless Networks

February 3, 2024

Finite-Precision Arithmetic Transceiver for Massive MIMO Systems

Yiming Fang, Li Chen, Yunfei Chen, Huarui Yin

from arXiv, 16 pages, 8 figures. Submitted to IEEE JSAC for possible publication

Efficient implementation of massive multiple-input-multiple-output (MIMO) transceivers is essential for the next-generation wireless networks. To reduce the high computational complexity of the massive MIMO transceiver, in this paper, we propose a new massive MIMO architecture using finite-precision arithmetic. First, we conduct the rounding error analysis and derive the lower bound of the achievable rate for single-input-multiple-output (SIMO) using maximal ratio combining (MRC) and multiple-input-single-output (MISO) systems using maximal ratio transmission (MRT) with finite-precision arithmetic. Then, considering the multi-user scenario, the rounding error analysis of zero-forcing (ZF) detection and precoding is derived by using the normal equations (NE) method. The corresponding lower bounds of the achievable sum rate are also derived and asymptotic analyses are presented. Built upon insights from these analyses and lower bounds, we propose a mixed-precision architecture for massive MIMO systems to offset performance gaps due to finite-precision arithmetic. The corresponding analysis of rounding errors and computational costs is obtained. Simulation results validate the derived bounds and underscore the superiority of the proposed mixed-precision architecture to the conventional structure.
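To make the idea of rounding error in a combiner concrete, here is a minimal sketch of an MRC-style inner product evaluated with a simulated short significand. It is real-valued and uses made-up channel values for simplicity; the helpers `round_to_bits` and `dot_fp` are hypothetical names, and the bound shown is the classical worst-case inner-product bound from numerical analysis, not a result taken from the paper.

```python
import math

def round_to_bits(x, bits):
    """Round x to the nearest value with `bits` significand bits."""
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def dot_fp(a, b, bits):
    """Inner product with a rounding step after every multiply and every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to_bits(acc + round_to_bits(x * y, bits), bits)
    return acc

# Toy real-valued SIMO link: y = h * s + noise (all numbers are made up).
h = [0.3, -1.2, 0.7, 0.9]
s = 1.0
noise = [0.01, -0.02, 0.005, 0.0]
y = [hi * s + ni for hi, ni in zip(h, noise)]

exact = sum(hi * yi for hi, yi in zip(h, y))  # double-precision reference
low = dot_fp(h, y, bits=8)                    # simulated 8-bit significand
error = abs(low - exact)

# Classical worst-case bound: n*u / (1 - n*u) * sum(|h_i * y_i|), with u = 2**-bits.
n, u = len(h), 2.0 ** -8
bound = n * u / (1 - n * u) * sum(abs(hi * yi) for hi, yi in zip(h, y))
print(error, bound)
```

Comparing `error` against `bound` for different values of `bits` gives a rough feel for how precision trades off against accuracy in the combining step.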

INTERACT · Controller · MoDELS · Reducible · HTTPS

February 2, 2024

Scalable Multi-modal Model Predictive Control via Duality-based Interaction Predictions

Hansung Kim, Siddharth H. Nair, Francesco Borrelli

from arXiv, Submitted to IEEE Intelligent Vehicles Symposium 2024

We propose a hierarchical architecture designed for scalable real-time Model Predictive Control (MPC) in complex, multi-modal traffic scenarios. This architecture comprises two key components: 1) RAID-Net, a novel attention-based Recurrent Neural Network that predicts relevant interactions along the MPC prediction horizon between the autonomous vehicle and the surrounding vehicles using Lagrangian duality, and 2) a reduced Stochastic MPC problem that eliminates irrelevant collision avoidance constraints, enhancing computational efficiency. Our approach is demonstrated in a simulated traffic intersection with interactive surrounding vehicles, showcasing a 12x speed-up in solving the motion planning problem. A video demonstrating the proposed architecture in multiple complex traffic scenarios can be found here: //youtu.be/-TcMeolCLWc

Processing (programming language) · Operation

February 1, 2024

Rule Formats for Nominal Process Calculi

Luca Aceto, Ignacio Fábregas, Álvaro García-Pérez, Anna Ingólfsdóttir, Yolanda Ortega-Mallén

from arXiv, conference version of arXiv:1807.02081

The nominal transition systems (NTSs) of Parrow et al. describe the operational semantics of nominal process calculi. We study NTSs in terms of the nominal residual transition systems (NRTSs) that we introduce. We provide rule formats for the specifications of NRTSs that ensure that the associated NRTS is an NTS and apply them to the operational specification of the early pi-calculus. Our study stems from the recent Nominal SOS of Cimini et al. and from earlier works in nominal sets and nominal logic by Gabbay, Pitts and their collaborators.

Structured learning · Graph · Sparse · GPU · Neural Networks

December 13, 2021

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Yinhua Piao, Sangseon Lee, Dohoon Lee, Sun Kim

from arXiv, accepted by AAAI 2022

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structures can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results and reveal the necessity of learning sparse structures for each document.
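The initial graph construction described in this abstract, a disjoint union of sentence-level word co-occurrence graphs, can be sketched as follows. The sliding-window size, the node representation as `(sentence_id, token)` pairs, and the function names are assumptions for illustration; the learned cross-sentence edges are not modelled here.

```python
from itertools import combinations

def sentence_edges(tokens, sent_id, window=3):
    """Co-occurrence edges between distinct tokens inside a sliding window.

    Nodes are (sentence id, token) pairs, so the same word in two
    sentences yields two separate nodes and the graphs stay disjoint.
    """
    edges = set()
    for start in range(max(1, len(tokens) - window + 1)):
        span = sorted(set(tokens[start:start + window]))
        for a, b in combinations(span, 2):
            edges.add(((sent_id, a), (sent_id, b)))
    return edges

def document_graph(sentences, window=3):
    """Disjoint union of the per-sentence co-occurrence graphs."""
    nodes, edges = set(), set()
    for sent_id, tokens in enumerate(sentences):
        nodes.update((sent_id, tok) for tok in tokens)
        edges |= sentence_edges(tokens, sent_id, window)
    return nodes, edges

# Toy document: "bank" appears in both sentences with different senses.
doc = [["the", "bank", "approved", "loan"], ["river", "bank", "flooded"]]
nodes, edges = document_graph(doc)
print(len(nodes), len(edges))
```

Keeping one node per word occurrence per sentence is what lets an ambiguous word such as "bank" carry a different context in each sentence before any cross-sentence edges are selected.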

entity · Performer · Graph · Knowledge graph · MoDELS

June 4, 2019

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Deepak Nathani, Jatin Chauhan, Charu Sharma, Manohar Kaul

from arXiv, accepted as long paper in ACL 2019

The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this end, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we encapsulate relation clusters and multi-hop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model, and we show marked performance gains in comparison to state-of-the-art methods on all datasets.

Performer · state-of-the-art · Attention mechanism · Networking · MoDELS

December 6, 2017

Distance-based Self-Attention Network for Natural Language Inference

Jinbae Im, Sungzoon Cho

from arXiv, 12 pages, 13 figures

The attention mechanism has been used as an ancillary means to help RNNs or CNNs. However, the Transformer (Vaswani et al., 2017) recently recorded state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, the Directional Self-Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance on various data by using forward and backward directional information in a sentence. However, that study did not consider the distance between words, an important feature for learning the local dependencies that help in understanding the context of the input text. We propose the Distance-based Self-Attention Network, which accounts for word distance through a simple distance mask in order to model local dependency without losing the ability to model global dependency that is inherent to attention. Our model shows good performance on NLI data and records a new state-of-the-art result on SNLI data. Additionally, we show that our model is particularly strong on long sentences or documents.
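One plausible form of the distance mask described in this abstract adds a penalty proportional to the word distance `|i - j|` to the attention logits before the softmax. The sketch below assumes that form, with a hypothetical scale parameter `alpha`; the abstract itself does not spell out the formula.

```python
import math

def distance_mask(n, alpha=1.0):
    """n x n mask with entry -alpha * |i - j| (alpha is an assumed scale)."""
    return [[-alpha * abs(i - j) for j in range(n)] for i in range(n)]

def softmax(row):
    """Numerically stable softmax over a list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(scores, alpha=1.0):
    """Add the distance mask to raw attention scores, then normalise each row."""
    mask = distance_mask(len(scores), alpha)
    return [softmax([s + m for s, m in zip(score_row, mask_row)])
            for score_row, mask_row in zip(scores, mask)]

# Uniform raw scores isolate the effect of the mask.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_attention_weights(scores, alpha=1.0)
print(weights[0])
```

Because the penalty is finite, distant words keep a non-zero weight, which matches the abstract's point that local dependency is emphasised without giving up global dependency. Setting `alpha` to 0 recovers plain attention.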