
In this paper, we focus on a scenario where a single image contains objects of the same category but of varying sizes, and we propose a lightweight approach that recognizes not only their category labels but also their real sizes. Our approach uses commonsense knowledge to help a coarse-grained object detector based on a deep neural network (DNN) achieve accurate size-related fine-grained detection. Specifically, we introduce a commonsense knowledge inference module (CKIM) that maps the coarse-grained labels produced by the DNN detector to size-related fine-grained labels. Experimental results demonstrate that, compared with baseline methods, our approach achieves accurate fine-grained detection with less annotated data and a smaller model. Our code is available at: //github.com/ZJLAB-AMMI/CKIM.
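The following minimal rule-based Python sketch makes concrete what it means to map coarse labels to size-related fine labels. It is not CKIM itself. Everything specific in it is a hypothetical assumption made for illustration:

- the knowledge table `KNOWLEDGE`;
- the `bottle` category, its fine labels, and the size-ratio bounds;
- the rule of comparing each box's area with the largest same-category box in the image.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    coarse_label: str  # label from the coarse-grained DNN detector
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


# Hypothetical commonsense table: for each coarse label, fine-grained size labels
# paired with the upper bound of their area relative to the largest object of the
# same category in the image. The bounds are illustrative only.
KNOWLEDGE = {
    "bottle": [(0.6, "bottle_small"), (0.85, "bottle_medium"), (1.0, "bottle_large")],
}


def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def infer_fine_labels(dets):
    """Toy inference: coarse label + relative size -> size-related fine label."""
    largest = {}
    for d in dets:
        largest[d.coarse_label] = max(largest.get(d.coarse_label, 0.0), box_area(d.box))
    fine = []
    for d in dets:
        rules = KNOWLEDGE.get(d.coarse_label)
        if rules is None or largest[d.coarse_label] == 0:
            fine.append(d.coarse_label)  # no knowledge available: keep the coarse label
            continue
        ratio = box_area(d.box) / largest[d.coarse_label]  # always in [0, 1]
        fine.append(next(label for bound, label in rules if ratio <= bound))
    return fine


dets = [
    Detection("bottle", (0, 0, 10, 10)),
    Detection("bottle", (0, 0, 5, 10)),
    Detection("bottle", (0, 0, 8, 10)),
    Detection("cup", (0, 0, 3, 3)),
]
print(infer_fine_labels(dets))
```

Comparing objects within one image avoids relying on absolute pixel area, which depends on camera distance. In a real system, a knowledge-driven module would supply the size relations instead of hand-set bounds.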

Related content

Knowledge


Understanding, judgment, or skills acquired through learning, practice, or exploration.

MoDELS · Language Modeling · Automator · Extensibility · Spanned Subspace ·

October 25, 2023

Evaluating Hallucinations in Chinese Large Language Models

Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu

From arXiv; work in progress

In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure hallucination in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions. They span multiple domains and take into account Chinese history and culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations, imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated method that uses GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, and SparkDesk. Of the 24 models, 18 achieved non-hallucination rates below 50%, which indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models.
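The headline statistic is a per-model rate over judged answers. The following minimal Python sketch assumes each GPT-4 judgment has already been reduced to one boolean "hallucinated" flag per question. The model names and verdicts are made up for illustration.

```python
def non_hallucination_rate(is_hallucinated):
    """Share of answers that the judge did NOT flag as hallucinated."""
    if not is_hallucinated:
        raise ValueError("need at least one judged answer")
    return sum(1 for flag in is_hallucinated if not flag) / len(is_hallucinated)


def count_below(rates, threshold=0.5):
    """Number of models whose non-hallucination rate is strictly below the threshold."""
    return sum(1 for rate in rates.values() if rate < threshold)


# Hypothetical per-question verdicts (True = judged hallucinated) for three made-up models.
verdicts = {
    "model_a": [True, False, False, True],
    "model_b": [False, False, False, True],
    "model_c": [True, True, False, True],
}
rates = {name: non_hallucination_rate(v) for name, v in verdicts.items()}
print(rates, count_below(rates))
```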

LDPC · INFORMS · Decoding · Exchangeable · Performer ·

October 24, 2023

A Spatially Coupled LDPC Coding Scheme with Scalable Decoders for Space Division Multiplexing

Haizheng Li, Laurent Schmalen

From arXiv; 3 pages plus comments, 3 figures; European Conference on Optical Communication (ECOC) 2023

In this paper, we study the application of spatially coupled LDPC codes with sub-block locality for space division multiplexing. We focus on the information exchange between the sub-blocks and compare decoding strategies with respect to the complexity, performance and the information flow.
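For readers unfamiliar with spatial coupling, the following Python sketch builds the terminated base (protograph) matrix of a generic spatially coupled LDPC code. It places component matrices B_0, ..., B_w along a diagonal band over L positions. This is the standard textbook construction, assumed here for illustration. It does not model the sub-block locality or the decoding strategies studied in the paper. The (3,6)-regular split in the usage lines is a hypothetical example.

```python
def couple(components, L):
    """Terminated spatially coupled base matrix from components B_0..B_w.

    Each B_j is a c x v list of lists. Column block `pos` (variable nodes at
    spatial position pos) receives B_j in row block pos + j, so the result has
    (L + w) * c rows and L * v columns.
    """
    w = len(components) - 1
    c, v = len(components[0]), len(components[0][0])
    H = [[0] * (L * v) for _ in range((L + w) * c)]
    for pos in range(L):
        for j, B in enumerate(components):
            for r in range(c):
                for col in range(v):
                    H[(pos + j) * c + r][pos * v + col] += B[r][col]
    return H


# Hypothetical split of the (3,6)-regular base matrix [[3, 3]] into three copies of [[1, 1]].
H = couple([[[1, 1]], [[1, 1]], [[1, 1]]], L=4)
for row in H:
    print(row)
```

Every variable node keeps degree 3. The check nodes near both ends of the chain have lower degree, and this termination effect is what seeds the decoding wave in spatially coupled codes.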

HTTPS · Operation · Transformation · Paper · WEB ·

October 23, 2023

Curved Space-Filling Tiles Using Voronoi Decomposition with Line, and Curve Segments Closed Under Wallpaper Symmetries

Haard Panchal, Ergun Akleman, Vinayak Krishnamurthy, Tolga Talha Yildiz, Varda Grover

From arXiv, 11

In this paper, we present a new approach to obtaining symmetric tiles with curved edges. Our approach uses higher-order Voronoi sites that are closed under wallpaper symmetries. The resulting Voronoi tessellations provide symmetric tiles with curved edges. We have developed a web application for real-time tile design, available at //voronoi.viz.tamu.edu. One of our key findings is that not all symmetry operations are useful for creating curved tiles. In particular, all symmetries that use mirror operations produce straight lines, which are useless for creating new tiles. This result is interesting because it suggests that mirror transformations should be avoided when producing unusual space-filling tiles in 2D and 3D with Voronoi tessellations.
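The key requirement is that the set of Voronoi sites is closed under the chosen symmetry group. The following Python sketch makes two assumptions: point sites, and the wallpaper group p2 on a unit square lattice (180-degree rotations plus translations, with no mirrors). It generates the orbit of one site inside a finite window and labels a query point by its nearest site. Point sites only produce straight Voronoi edges. This sketch therefore illustrates the closure property, not the paper's curved tiles built from higher-order (line and curve) sites.

```python
import math


def p2_orbit(site, window=1):
    """Images of a point site under p2: identity, 180-degree rotation, lattice translations.

    Translations are limited to |i|, |j| <= window; coordinates are rounded to
    9 decimals so that symmetric images compare equal.
    """
    x, y = site
    generators = [(x, y), (-x, -y)]  # identity and rotation by 180 degrees about the origin
    orbit = set()
    for gx, gy in generators:
        for i in range(-window, window + 1):
            for j in range(-window, window + 1):
                orbit.add((round(gx + i, 9), round(gy + j, 9)))
    return sorted(orbit)


def nearest_site(p, sites):
    """Index of the site whose Voronoi cell contains point p."""
    return min(range(len(sites)), key=lambda k: math.dist(p, sites[k]))


orbit = p2_orbit((0.2, 0.3))
print(len(orbit), orbit[nearest_site((0.21, 0.31), orbit)])
```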

THz · Estimation/Estimator · MIMO · Channel · Performer ·

October 23, 2023

Time-Domain Channel Estimation for Extremely Large MIMO THz Communications with Beam Squint

Evangelos Vlachos, Aryan Kaushik, Yonina C. Eldar, George C. Alexandropoulos

In this paper, we study extremely large (XL) multiple-input multiple-output (MIMO) channel estimation in the terahertz (THz) band. We account for propagation delays across the entire array aperture, which cause frequency selectivity, a problem known as beam squint. Multi-carrier transmission schemes, usually deployed to address this problem, suffer from a high peak-to-average power ratio. This drawback is especially severe in THz communications, where transmit power is low. Diverging from the usual approach, we devise a novel time-domain channel estimation formulation for single-carrier (SC) modulation, which favors THz transmission. We incorporate the beam-squint effect into a sparse vector recovery problem that is solved with sparse optimization tools. In particular, the beam squint and the sparse MIMO channel are jointly tracked using an alternating minimization approach that decomposes the two estimation problems. The presented performance evaluation results show that the proposed SC technique outperforms both the conventional technique and state-of-the-art multi-carrier approaches.
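The following minimal Python sketch quantifies beam squint under textbook assumptions:

- a uniform linear array with half-wavelength spacing at the carrier `fc`;
- frequency-flat phase shifters steered toward `theta`.

At frequency `f`, the main lobe then points toward `psi`, with sin(psi) = (fc / f) sin(theta). The 300 GHz carrier and band edges in the usage lines are hypothetical values. This sketch does not implement the paper's time-domain estimator.

```python
import math


def squinted_angle_deg(theta_deg, f, fc):
    """Main-lobe direction at frequency f for phase shifts designed at fc.

    Assumes a half-wavelength-spaced ULA at fc and frequency-flat phase shifters,
    so sin(psi) = (fc / f) * sin(theta). The argument is clipped to [-1, 1].
    """
    s = (fc / f) * math.sin(math.radians(theta_deg))
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


fc = 300e9  # hypothetical carrier frequency
for f in (285e9, 300e9, 315e9):
    print(f"{f / 1e9:.0f} GHz -> {squinted_angle_deg(45.0, f, fc):.2f} deg")
```

With the beam steered to 45 degrees, the band edges end up pointing at roughly 48 and 42 degrees. A pencil beam from an extremely large array can therefore miss the user at those frequencies.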

SVBRDF · Ray · Learning · Color · BRDF ·

October 23, 2023

Relit-NeuLF: Efficient Relighting and Novel View Synthesis via Neural 4D Light Field

Zhong Li, Liangchen Song, Zhang Chen, Xiangyu Du, Lele Chen, Junsong Yuan, Yi Xu

From arXiv; 10 pages

In this paper, we address simultaneous relighting and novel view synthesis of a complex scene from multi-view images with a limited number of light sources. We propose an analysis-synthesis approach called Relit-NeuLF. Following the recent neural 4D light field network (NeuLF), Relit-NeuLF first uses a two-plane light field representation to parameterize each ray in a 4D coordinate system, enabling efficient learning and inference. Then, we recover the spatially varying bidirectional reflectance distribution function (SVBRDF) of a 3D scene in a self-supervised manner. A DecomposeNet learns to map each ray to its SVBRDF components: albedo, normal, and roughness. Based on the decomposed BRDF components and conditioning light directions, a RenderNet learns to synthesize the color of the ray. To self-supervise the SVBRDF decomposition, we encourage the predicted ray color to be close to the physically based rendering result obtained with the microfacet model. Comprehensive experiments demonstrate that the proposed method is efficient and effective on both synthetic data and real-world human face data, and that it outperforms state-of-the-art methods. Our code is publicly available on GitHub: //github.com/oppo-us-research/RelitNeuLF
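The two-plane parameterization identifies a ray by where it crosses two parallel planes, which takes only a few lines to compute. This Python sketch assumes the planes are z = 0 (for (u, v)) and z = 1 (for (s, t)). The actual plane placement and normalization used by NeuLF and Relit-NeuLF may differ.

```python
def two_plane_coords(origin, direction, z_uv=0.0, z_st=1.0):
    """4D ray coordinates (u, v, s, t) from the ray's crossings of two planes z = const."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-12:
        raise ValueError("ray is parallel to the parameterization planes")
    t_uv = (z_uv - oz) / dz  # ray parameter where it hits the (u, v) plane
    t_st = (z_st - oz) / dz  # ray parameter where it hits the (s, t) plane
    return (ox + t_uv * dx, oy + t_uv * dy, ox + t_st * dx, oy + t_st * dy)


print(two_plane_coords((0.0, 0.0, 0.0), (1.0, 0.0, 1.0)))
```

Because every ray maps to a fixed-length 4-vector, a network can take (u, v, s, t) directly as input and produce a color in one forward pass, without marching samples along the ray.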

Conformer · Noise · Quantum Machine Learning · MoDELS · Machine Learning ·

October 22, 2023

Quantum Conformal Prediction for Reliable Uncertainty Quantification in Quantum Machine Learning

Sangwoo Park, Osvaldo Simeone

From arXiv; added a detailed discussion of quantum hardware noise

In this work, we aim to augment the decisions output by quantum models with "error bars" that provide finite-sample coverage guarantees. Quantum models implement implicit probabilistic predictors that produce multiple random decisions for each input through measurement shots. Randomness arises not only from the inherent stochasticity of quantum measurements, but also from quantum gate noise and quantum measurement noise caused by noisy hardware. Furthermore, quantum noise may be correlated across shots and may drift over time. This paper proposes leveraging this randomness to define prediction sets for both classification and regression that provably capture the uncertainty of the model. The approach builds on probabilistic conformal prediction (PCP), while accounting for the unique features of quantum models. Among the key technical innovations, we introduce a new general class of non-conformity scores that address the presence of quantum noise, including possible drifts. Experimental results, using both simulators and current quantum computers, confirm the theoretical calibration guarantees of the proposed framework.
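The following is a minimal Python sketch of plain probabilistic conformal prediction (PCP) for scalar regression, assuming each measurement shot yields one sample prediction. It has three parts:

- The non-conformity score is the distance from the true label to the nearest sample.
- The calibrated radius is the ceil((n + 1)(1 - alpha))-th smallest of the n calibration scores.
- The prediction set is the union of intervals of that radius around the test samples.

The paper's new noise- and drift-aware scores are not modeled here, and the calibration data in the usage lines are made up.

```python
import math


def pcp_score(y, samples):
    """PCP non-conformity score: distance from label y to the nearest sample."""
    return min(abs(y - s) for s in samples)


def calibrate(cal_pairs, alpha):
    """Conformal radius from (label, samples) calibration pairs at miscoverage alpha."""
    scores = sorted(pcp_score(y, s) for y, s in cal_pairs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_set(samples, radius):
    """Union of intervals [s - radius, s + radius], merged where they overlap."""
    intervals = sorted((s - radius, s + radius) for s in samples)
    merged = [list(intervals[0])]
    for lo, hi in intervals[1:]:
        if lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(iv) for iv in merged]


cal = [(0.0, [k / 10]) for k in range(1, 8)]  # made-up calibration data
radius = calibrate(cal, alpha=0.25)
print(radius, prediction_set([0.0, 1.0, 1.25], 0.25))
```

Under exchangeability of calibration and test data, this construction covers the true label with probability at least 1 - alpha. The paper's correlated shots and temporal drift are the complications that its new scores are designed to handle.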

Cluster · Wireless Networks · Block · Optimizer · Basis ·

October 22, 2023

Bandwidth Efficient Livestreaming in Mobile Wireless Networks: A Peer-to-Peer ACIDE Solution

Andrei Negulescu, Weijia Shang

From arXiv; 8 pages, 6 figures; conference submission

In this paper, a media distribution model, Active Control in an Intelligent and Distributed Environment (ACIDE), and associated solutions are proposed for video and audio livestreaming in mobile wireless networks. The essential components are a base station and a cluster formed by a number of users. Inside a cluster, users can establish peer-to-peer communications; the members of a cluster are considered peers. This paper addresses the problem of minimizing the bandwidth allocated to a cluster of n peers while guaranteeing continuous media playback for all peers. The basic idea is to send the livestream in packages. Each media package is divided into n blocks, and the distribution of blocks to the peers of a cluster follows a two-phase, multi-step approach. In phase 1, each peer receives one block of optimal size from the base station. In phase 2, the peers exchange their media blocks simultaneously in a few steps. Each peer can then reconstruct the media package, so the live media plays continuously. The allocated bandwidth is the amount of bandwidth the base station must allocate to the cluster to play livestreaming media without interruption. It depends on many parameters, such as the block sizes and the peers' download and upload bandwidths. The problem is formulated as an optimization problem, and a solution is proposed that finds the block sizes minimizing the allocated bandwidth. Both the theoretical model and simulations show that when the number of peers is large, the optimal allocated bandwidth approaches its lower bound, the bandwidth required for multicasting. In other words, the allocated bandwidth may be reduced by a factor of n.
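The two-phase idea can be simulated with a toy Python sketch. It assumes equal block sizes and a single all-to-all exchange round in phase 2. It counts only the bytes the base station transmits per package, not the time-constrained allocated bandwidth that the paper optimizes.

```python
def distribute(package: bytes, n: int):
    """Toy two-phase distribution of one media package to n peers.

    Phase 1: the base station unicasts block i to peer i.
    Phase 2: every peer forwards its block to all other peers.
    Returns (package reconstructed at each peer, bytes sent by the base station).
    """
    size = -(-len(package) // n)  # equal block sizes via ceil division (assumption)
    blocks = [package[i * size:(i + 1) * size] for i in range(n)]
    bs_bytes = sum(len(b) for b in blocks)  # phase 1 cost at the base station
    held = [{i: blocks[i]} for i in range(n)]  # what each peer holds after phase 1
    for sender in range(n):  # phase 2: peer-to-peer exchange
        for receiver in range(n):
            if receiver != sender:
                held[receiver][sender] = blocks[sender]
    rebuilt = [b"".join(h[i] for i in range(n)) for h in held]
    return rebuilt, bs_bytes


package = bytes(range(100))
rebuilt, sent = distribute(package, 4)
print(all(r == package for r in rebuilt), sent, 4 * len(package))
```

With 4 peers and a 100-byte package, the base station sends 100 bytes, whereas unicasting the full package to every peer would need 400 bytes. This matches the factor-of-n saving stated in the abstract.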

THz · TD · Optimizer · Microsoft Surface · Basis ·

October 21, 2023

Wideband Beamforming for STAR-RIS-assisted THz Communications with Three-Side Beam Split

Wencai Yan, Wanming Hao, Gangcan Sun, Chongwen Huang, Qingqing Wu

In this paper, we consider simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-assisted THz communications with three-side beam split. In addition to the beam split at the base station (BS), we analyze, for the first time, the double-side beam split at the STAR-RIS. To mitigate the double-side beam split effect, we propose a time delayer (TD)-based fully connected structure at the STAR-RIS. Going further, we develop a sub-connected structure with low hardware complexity and low power consumption, in which multiple STAR-RIS elements share one TD. To reflect a practical scenario, we investigate a multi-STAR-RIS, multi-user communication system and formulate a sum-rate maximization problem. The problem jointly optimizes the hybrid analog/digital beamforming and time delays at the BS, together with the double-layer phase-shift coefficients, time delays, and amplitude coefficients at the STAR-RISs. To solve it, we first allocate users to each STAR-RIS. We then derive the analog beamforming and time delays at the BS, and the double-layer phase-shift coefficients and time delays at each STAR-RIS. Next, we develop an alternating optimization algorithm to compute the digital beamforming at the BS and the amplitude coefficients at the STAR-RISs. Finally, numerical results verify the effectiveness of the proposed schemes.
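Normalized array gain shows why time delayers help. The following Python sketch rests on three assumptions:

- a single uniform linear array with half-wavelength spacing at `fc`;
- one beam steered toward `theta`;
- either frequency-flat phase shifters or ideal per-element true time delays.

A delay of n d sin(theta) / c with d = lambda_c / 2 yields a phase of pi n (f / fc) sin(theta) at frequency f, so it tracks the channel at every frequency. The sketch models neither the STAR-RIS double-layer structure nor the sub-connected design in which elements share one TD.

```python
import cmath
import math


def array_gain(N, theta_deg, f, fc, use_time_delays):
    """Normalized gain |sum_n exp(j(channel_n - weight_n))| / N for an N-element ULA."""
    s = math.sin(math.radians(theta_deg))
    total = 0
    for n in range(N):
        channel_phase = math.pi * n * (f / fc) * s  # plane wave from theta at frequency f
        if use_time_delays:
            weight_phase = math.pi * n * (f / fc) * s  # true time delay: phase scales with f
        else:
            weight_phase = math.pi * n * s  # phase shifter: fixed at design frequency fc
        total += cmath.exp(1j * (channel_phase - weight_phase))
    return abs(total) / N


fc = 300e9  # hypothetical carrier frequency
print(array_gain(256, 45.0, 1.05 * fc, fc, use_time_delays=False))
print(array_gain(256, 45.0, 1.05 * fc, fc, use_time_delays=True))
```

At `fc` both designs reach unit gain. With 256 elements, 45 degrees, and `f = 1.05 * fc`, the phase-shifter gain drops far below 1, while the time-delay gain stays at 1.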

Link Prediction · Graph · GPU · Neural Networks · Networking ·

June 16, 2021

Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, Jian Tang

Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation as the generalized product of the edge representations in the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by the generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet). NBFNet is a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. It parameterizes the algorithm with three neural components, the INDICATOR, MESSAGE, and AGGREGATE functions, which correspond to the boundary condition, the multiplication operator, and the summation operator, respectively. NBFNet is very general, covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
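The generalized Bellman-Ford recursion can be sketched with hand-picked semiring operators in place of NBFNet's learned functions. This Python sketch is an illustration, not NBFNet's code. The roles map as follows:

- `plus`/`zero` play the role of AGGREGATE;
- `times` plays the role of MESSAGE;
- the boundary vector plays the role of INDICATOR.

The three-node graph is a hypothetical example.

```python
import math


def generalized_bellman_ford(num_nodes, edges, source, plus, times, zero, one, steps):
    """h[v] = generalized sum over paths source->v of the generalized product of edge weights.

    edges: list of directed (u, v, weight) triples. (plus, times, zero, one) form a semiring.
    """
    boundary = [one if v == source else zero for v in range(num_nodes)]  # INDICATOR
    h = list(boundary)
    for _ in range(steps):
        new = list(boundary)  # aggregation also includes the boundary condition
        for u, v, w in edges:
            new[v] = plus(new[v], times(h[u], w))  # MESSAGE, then AGGREGATE
        h = new
    return h


edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
dist = generalized_bellman_ford(3, edges, 0, min, lambda a, b: a + b, math.inf, 0.0, steps=2)
paths = generalized_bellman_ford(
    3, [(u, v, 1) for u, v, _ in edges], 0, lambda a, b: a + b, lambda a, b: a * b, 0, 1, steps=2
)
print(dist, paths)
```

With (min, +), the result is the shortest-path distance from node 0. With (+, ×) and unit weights, it counts paths instead. This interchangeability of operators is the flexibility that NBFNet exploits by learning them.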

Meta-Learning · Speech Recognition · MAML · Learned · End-to-End ·

October 26, 2019

Meta Learning for End-to-End Low-Resource Speech Recognition

Jui-Yang Hsu, Yuan-Jui Chen, Hung-yi Lee

From arXiv; 5 pages; submitted to ICASSP 2020

In this paper, we propose applying meta-learning to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks. Using the recently proposed model-agnostic meta-learning (MAML) algorithm, we meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language. We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, because MAML is model-agnostic, this paper also opens a new research direction: applying meta-learning to more speech-related applications.
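The following first-order MAML sketch in Python illustrates MAML's meta-learned initialization on toy one-dimensional regression tasks y = a x. Everything here is an assumption for illustration: the tasks, the learning rates, and the first-order approximation, which drops MAML's second-derivative term. MetaASR instead meta-learns the initialization of a full ASR model across languages.

```python
def task_grad(w, a, xs):
    """Gradient with respect to w of mean((w * x - a * x) ** 2) over inputs xs."""
    return sum(2 * (w - a) * x * x for x in xs) / len(xs)


def fomaml(task_slopes, xs, inner_lr=0.1, outer_lr=0.1, steps=200, w0=0.0):
    """First-order MAML over toy tasks y = a * x, one per slope in task_slopes."""
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_slopes:
            adapted = w - inner_lr * task_grad(w, a, xs)  # one inner adaptation step
            meta_grad += task_grad(adapted, a, xs)  # first-order: gradient at adapted weights
        w -= outer_lr * meta_grad / len(task_slopes)  # outer (meta) update of the initialization
    return w


w_init = fomaml([1.0, 2.0, 3.0], [1.0, -1.0, 0.5])
print(w_init)
```

Because the toy tasks share the same inputs, the learned initialization converges to the mean slope (2.0 for slopes 1, 2, and 3).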