
Growing interest in conversational agents has made two-way human-computer communication, which involves asking and answering visual questions, an active area of AI research. Generating visual question-answer pairs is therefore an important and challenging task. To address it, we propose a weakly supervised visual question-answer generation method that produces relevant question-answer pairs for a given input image and its associated caption. Most prior work is supervised and depends on annotated question-answer datasets. In contrast, our method procedurally generates synthetic question-answer pairs from visual information and captions. The method first extracts a list of answer words, then performs nearest question generation, which uses the caption and an answer word to generate a synthetic question. Next, the relevant question generator converts the nearest question into a relevant natural-language question through dependency parsing and in-order tree traversal. Finally, a ViLBERT model is fine-tuned on the generated question-answer pairs. We perform an exhaustive experimental analysis on the VQA dataset and find that our model significantly outperforms state-of-the-art methods on BLEU scores. We also report results against baseline models and an ablation study.

Optimizer · Score · Agent · Knowledge · Scenario

October 26, 2023

Optimal Scoring Rule Design under Partial Knowledge

Yiling Chen, Fang-Yi Yu

This paper studies the design of optimal proper scoring rules when the principal has partial knowledge of an agent's signal distribution. Recent work characterizes the proper scoring rules that maximize the increase of an agent's payoff when the agent chooses to access a costly signal to refine a posterior belief from her prior prediction, under the assumption that the agent's signal distribution is fully known to the principal. In our setting, the principal knows only a set of distributions to which the agent's signal distribution belongs. We formulate the scoring rule design problem as a max-min optimization that maximizes the worst-case increase in payoff across the set of distributions. We propose an efficient algorithm to compute an optimal scoring rule when the set of distributions is finite, and devise a fully polynomial-time approximation scheme that accommodates various infinite sets of distributions. We further remark that widely used scoring rules, such as the quadratic and log rules, as well as previously identified optimal scoring rules under full knowledge, can be far from optimal in our partial knowledge settings.
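
For readers unfamiliar with the two rules named in the abstract, the sketch below writes out the standard quadratic and logarithmic scoring rules, checks numerically that a truthful report maximizes the expected score (the defining property of a proper rule), and computes the expected payoff increase from refining a prior into a posterior. It does not implement the paper's max-min design or its algorithms. The function names, the beliefs, and the two-posterior signal are all hypothetical values chosen for illustration.

```python
import math

def quadratic_score(report, outcome):
    """Quadratic rule: 2 * p[outcome] - sum_j p[j]^2."""
    return 2.0 * report[outcome] - sum(p * p for p in report)

def log_score(report, outcome):
    """Logarithmic rule: log p[outcome]."""
    return math.log(report[outcome])

def expected_score(rule, report, belief):
    """Expected score of `report` when the outcome is drawn from `belief`."""
    return sum(b * rule(report, i) for i, b in enumerate(belief))

def information_gain(rule, prior, posteriors, signal_probs):
    """Expected payoff increase from reporting the posterior instead of the prior."""
    with_signal = sum(
        s * expected_score(rule, post, post)
        for s, post in zip(signal_probs, posteriors)
    )
    return with_signal - expected_score(rule, prior, prior)

# Properness check on a grid: the truthful report maximizes the expected score.
belief = [0.7, 0.3]
grid = [[q / 100.0, 1.0 - q / 100.0] for q in range(1, 100)]
best_quadratic = max(grid, key=lambda r: expected_score(quadratic_score, r, belief))
best_log = max(grid, key=lambda r: expected_score(log_score, r, belief))

# Hypothetical signal: two equally likely posteriors that average to the prior.
gain = information_gain(
    quadratic_score, [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [0.5, 0.5]
)
```

Under both rules the grid search lands on the report closest to the true belief, and `gain` is positive because the posteriors are more informative than the prior. The abstract's max-min objective would take the worst case of such a gain over a set of candidate signal distributions.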

Code · MoDELS · Nuance · Notability · Language Modeling

October 25, 2023

Language Agnostic Code Embeddings

Saiteja Utpala, Alex Gu, Pin Yu Chen

Recently, code language models have achieved notable advancements in addressing a diverse array of essential code comprehension and generation tasks. Yet, the field lacks a comprehensive deep dive and understanding of the code embeddings of multilingual code models. In this paper, we present a comprehensive study on multilingual code embeddings, focusing on the cross-lingual capabilities of these embeddings across different programming languages. Through probing experiments, we demonstrate that code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details, primarily focusing on semantics. Further, we show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks, leading to an absolute increase of up to +17 in the Mean Reciprocal Rank (MRR).
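
The retrieval metric cited in the abstract, Mean Reciprocal Rank, is the average over queries of one divided by the rank of the first correct result. A minimal sketch follows; the function name and the toy rankings are hypothetical, and a query whose target never appears is assumed to contribute zero.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the target item over all queries.

    rankings: one ranked list of candidate ids per query (best first).
    targets:  the single correct id for each query.
    A target missing from its ranking contributes 0 (an assumption here).
    """
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)

# Toy example: the correct snippet "a" is ranked 1st, 2nd, and 3rd.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```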

INTERACT · Modality · Learning · Multimodal · INFORMS

October 25, 2023

Learning Unseen Modality Interaction

Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

From arXiv; published at NeurIPS 2023

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive for generalization to unseen modality combinations during inference. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved. This allows the information to be accumulated with a simple summation operation across available modalities. To reduce overfitting to less discriminative modality combinations during training, we further improve the model learning with pseudo-supervision indicating the reliability of a modality's prediction. We demonstrate that our approach is effective for diverse tasks and modalities by evaluating it for multimodal video classification, robot state regression, and multimedia retrieval. Project website: //xiaobai1217.github.io/Unseen-Modality-Interaction/.
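
The idea of projecting each modality into a common space and accumulating by summation can be sketched as follows. This is a simplification under stated assumptions: the paper's projection module is replaced by one fixed linear map per modality, and the modality names, weights, and feature values are hypothetical. The point is only that a sum over whichever modalities are present is defined for any combination, including ones not seen together in training.

```python
def project(features, weights):
    """Linear map into the common space; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def fuse(inputs, projections):
    """Sum the projected features over whichever modalities are present."""
    common_dim = len(next(iter(projections.values())))
    fused = [0.0] * common_dim
    for name, features in inputs.items():
        projected = project(features, projections[name])
        fused = [f + p for f, p in zip(fused, projected)]
    return fused

# Hypothetical per-modality projections into a 2-dimensional common space.
projections = {
    "video": [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],  # 3-d input -> 2-d
    "audio": [[2.0, 0.0], [0.0, 2.0]],            # 2-d input -> 2-d
}

video_only = fuse({"video": [1.0, 2.0, 3.0]}, projections)
audio_only = fuse({"audio": [0.5, 1.0]}, projections)
both = fuse({"video": [1.0, 2.0, 3.0], "audio": [0.5, 1.0]}, projections)
```

Because fusion is a plain sum, `both` equals the element-wise sum of `video_only` and `audio_only`, so a downstream predictor sees inputs of the same shape regardless of which modalities are available.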

Language Modeling · MoDELS · Knowledge · Performer · Optimizer

October 25, 2023

Retrieve Anything To Augment Large Language Models

Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie

Large language models (LLMs) face significant challenges stemming from their inherent limitations in knowledge, memory, alignment, and action. These challenges cannot be addressed by LLMs alone; they require assistance from the external world, such as knowledge bases, memory stores, demonstration examples, and tools. Retrieval augmentation stands as a vital mechanism for bridging the gap between LLMs and this external assistance. However, conventional methods encounter two pressing issues. On the one hand, general-purpose retrievers are not properly optimized for the retrieval augmentation of LLMs. On the other hand, task-specific retrievers lack the required versatility, hindering their performance across diverse retrieval augmentation scenarios. In this work, we present a novel approach, the LLM-Embedder, which comprehensively supports the diverse retrieval augmentation needs of LLMs with one unified embedding model. Training such a unified model is non-trivial, as various retrieval tasks aim to capture distinct semantic relationships, often subject to mutual interference. To address this challenge, we systematically optimize our training methodology. This includes reward formulation based on LLMs' feedback, the stabilization of knowledge distillation, multi-task fine-tuning with explicit instructions, and homogeneous in-batch negative sampling. These optimization strategies contribute to the outstanding empirical performance of the LLM-Embedder. Notably, it yields remarkable enhancements in retrieval augmentation for LLMs, surpassing both general-purpose and task-specific retrievers in various evaluation scenarios. Our checkpoint and source code are publicly available at //github.com/FlagOpen/FlagEmbedding.

DNN · Learning · Split Point · Networking · Federated Learning

October 24, 2023

Accelerating Split Federated Learning over Wireless Communication Networks

Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, Miaowen Wen

The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large number of parameters and high computational complexity of DNNs make them difficult to deploy on resource-constrained edge devices. An efficient way to address this challenge is model partitioning (splitting), in which a DNN is divided into two parts that are deployed on the device and the server, respectively, for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) with the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices, each with its own DNN split point. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. Using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experimental results demonstrate the superiority of our work in latency reduction and accuracy improvement.

INFORMS · Decoding · Reducible · HTTPS · Functional

October 24, 2023

Semantic Change Driven Generative Semantic Communication Framework

Wanting Yang, Zehui Xiong, Hongyang Du, Yanli Yuan, Tony Q. S. Quek

The burgeoning generative artificial intelligence technology offers novel insights into the development of semantic communication (SemCom) frameworks. These frameworks hold the potential to address the black-box nature of the end-to-end training used in existing SemCom frameworks, as well as the deterioration of user experience caused by the inevitable error floor in deep learning-based SemCom. In this paper, we focus on the widespread remote monitoring scenario and propose a semantic change driven generative SemCom framework, in which the semantic encoder and semantic decoder can be optimized independently. Specifically, we develop a modular semantic encoder with a value-of-information-based semantic sampling function. In addition, we propose a conditional denoising diffusion probabilistic model-assisted semantic decoder that relies on the semantic information received from the source, namely the semantic map, and on local static scene information to remotely regenerate scenes. Moreover, we demonstrate the effectiveness of the proposed semantic encoder and decoder, as well as their considerable potential for reducing energy consumption, through simulation based on the realistic $\mathcal{F}$ composite channel fading model. The code is available at //github.com/wty2011jl/SCDGSC.git.

prototype · Reducible · RNN · Text Classification · MoDELS

October 24, 2023

Interpretable Text Classification Via Prototype Trajectories

Dat Hong, Stephen S. Baek, Tong Wang

We propose a novel interpretable deep neural network for text classification, called ProtoryNet, based on a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each sentence to the corresponding active prototype. The RNN backbone then captures the temporal pattern of the prototypes, which we refer to as prototype trajectories. Prototype trajectories enable intuitive and fine-grained interpretation of the reasoning process of the RNN model, resembling how humans analyze texts. We also design a prototype pruning procedure to reduce the total number of prototypes used by the model for better interpretability. Experiments on multiple public data sets show that ProtoryNet is more accurate than the baseline prototype-based deep neural net and reduces the performance gap compared to state-of-the-art black-box models. In addition, after prototype pruning, the resulting ProtoryNet models need only around 20 prototypes or fewer for all datasets, which significantly benefits interpretability. Furthermore, we report a survey result indicating that human users find ProtoryNet more intuitive and easier to understand than other prototype-based methods.

Latent · Representation · MoDELS · AIM · Linear

October 23, 2023

Course Correcting Koopman Representations

Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin

Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work, we study autoencoder formulations of this problem and the different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long-term dynamics. We justify this method both analytically and empirically via experiments in low- and high-dimensional NLDS.
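
As we read the abstract, Periodic Reencoding interrupts a purely latent rollout at regular intervals, decodes the latent back to a state, and encodes that state again. The sketch below illustrates the mechanism on a toy scalar system; it is not the authors' implementation. The encoder, decoder, latent matrix `K`, and the `reencode_every` parameter are hypothetical stand-ins for learned components.

```python
def encode(x):
    # Hypothetical encoder: lift the scalar state to the features [x, x^2].
    return [x, x * x]

def decode(z):
    # Hypothetical decoder: read the state back from the first feature.
    return z[0]

def step_latent(K, z):
    # One step of linear latent dynamics: z_next = K z.
    return [sum(k * v for k, v in zip(row, z)) for row in K]

def rollout(x0, K, horizon, reencode_every=None):
    """Predict `horizon` future states from x0 using latent dynamics K.

    If `reencode_every` is set, the latent is decoded and encoded again
    every that many steps, which resets it onto the encoder's image.
    """
    z = encode(x0)
    states = []
    for t in range(1, horizon + 1):
        z = step_latent(K, z)
        states.append(decode(z))
        if reencode_every and t % reencode_every == 0:
            z = encode(decode(z))
    return states

# For x_next = 0.9 * x, the features [x, x^2] evolve linearly under this K.
K_exact = [[0.9, 0.0], [0.0, 0.81]]
plain = rollout(2.0, K_exact, 6)
reencoded = rollout(2.0, K_exact, 6, reencode_every=2)
```

With an exact `K`, as in this toy case, the two rollouts agree and both follow the true trajectory. With a learned, imperfect `K`, the latent can drift away from the set of vectors the encoder can produce, and the re-encoding step periodically removes that drift.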

Discretization · Graph · GPU · Neural Networks · Networking

March 28, 2019

Learning Discrete Structures for Graph Neural Networks

Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He

From arXiv, 18 pages

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

BLEU · MoDELS · Attention Mechanism · Transformer · Networking

December 6, 2017

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

From arXiv, 15 pages, 5 figures

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
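
The attention mechanism at the core of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head sketch in plain Python follows. It omits the learned projections, multiple heads, and masking of the full model, and the function names and toy matrices are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of query, key, and value rows."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights = softmax(scores)
        # Weighted average of the value rows.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs

# Toy check: a zero query scores all keys equally, so the output is the
# plain average of the value rows.
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
uniform = attention([[0.0, 0.0]], K, V)
```

Each output row depends only on its own query and the shared keys and values, with no step-by-step recurrence, which is what makes the computation parallel across positions.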