
Automatic few-shot font generation (AFFG), which aims to generate new fonts from only a few glyph references, reduces the labor cost of designing fonts manually. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts, so many component-based approaches have been proposed to tackle this problem. These approaches, however, usually require specially pre-defined glyph components, e.g., strokes and radicals, which is infeasible for AFFG across different languages. In this paper, we present a novel font generation approach that aggregates styles from character-similarity-guided global features and stylized component-level representations. We compute similarity scores between the target character and the reference samples by measuring the distance along the corresponding channels of the content features, and use these scores as weights for aggregating the global style features. To better capture local styles, a cross-attention-based style transfer module transfers the styles of the reference glyphs onto the components, where the components are discrete latent codes learned through vector quantization without manual definition. With these designs, our AFFG method obtains a complete set of component-level style representations while also controlling the global glyph characteristics. The experimental results demonstrate the effectiveness and generalization of the proposed method on different linguistic scripts, as well as its superiority over other state-of-the-art methods. The source code can be found at //github.com/awei669/VQ-Font.
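
As a rough illustration of the similarity-guided aggregation described above, the following sketch weights reference style features by content similarity. The tensor shapes, function name, and the choice of Euclidean distance with a softmax over negated distances are assumptions for illustration; the paper's exact distance measure and normalization may differ.

```python
import torch
import torch.nn.functional as F

def aggregate_global_style(content_feat, ref_content_feats, ref_style_feats):
    """Aggregate reference style features, weighted by content similarity.

    content_feat:      (C,)   channel-wise content feature of the target character
    ref_content_feats: (N, C) content features of the N reference glyphs
    ref_style_feats:   (N, D) global style features of the N reference glyphs
    """
    # Distance along corresponding channels -> similarity via softmax(-distance)
    dist = torch.cdist(content_feat.unsqueeze(0), ref_content_feats)  # (1, N)
    weights = F.softmax(-dist, dim=-1)                                # (1, N)
    # Similarity-weighted sum of the reference style features
    return weights @ ref_style_feats                                  # (1, D)
```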

Related content

This paper introduces the Budding Ensemble Architecture (BEA), a novel reduced ensemble architecture for anchor-based object detection models. Object detection models are crucial in vision-based tasks, particularly in autonomous systems. They should provide precise bounding box detections while also calibrating their predicted confidence scores, leading to higher-quality uncertainty estimates. However, current models may make erroneous decisions because false positives receive high confidence scores or true positives are discarded due to low scores. BEA aims to address these issues. The loss functions proposed in BEA improve confidence score calibration and lower the uncertainty error, which yields a better distinction between true and false positives and, eventually, higher accuracy of the object detection models. Both Base-YOLOv3 and SSD models were enhanced with the BEA method and its proposed loss functions. BEA on Base-YOLOv3 trained on the KITTI dataset yields increases of 6% in mAP and 3.7% in AP50. Using a well-balanced uncertainty-estimation threshold to discard samples in real time even leads to a 9.6% higher AP50 than the base model. This is attributed to a 40% increase in the area under the AP50-based retention curve used to measure the quality of confidence score calibration. Furthermore, BEA-YOLOv3 trained on KITTI provides superior out-of-distribution detection on the Citypersons, BDD100K, and COCO datasets compared to ensembles and vanilla models of YOLOv3 and Gaussian-YOLOv3.
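
To make the retention-curve idea concrete, here is a minimal sketch of computing the area under a retention curve: detections are retained in order of increasing uncertainty, and a quality metric is tracked as more are kept. The function name is hypothetical, and running precision over per-detection correctness is used as a simplified stand-in for the AP50-based curve in the paper.

```python
import numpy as np

def retention_auc(uncertainties, correct):
    """Area under a retention curve: keep the most certain detections first
    and track the running precision as more are retained. (Illustrative;
    the paper's metric is based on AP50 rather than raw precision.)"""
    order = np.argsort(uncertainties)                  # most certain first
    hits = np.asarray(correct, dtype=float)[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k.mean()                       # average over retention fractions
```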

Large language models (LLMs) are typically evaluated on task-based benchmarks such as MMLU. Such benchmarks do not examine the responsible behaviour of LLMs in specific contexts. This is particularly true in the LGBTI+ context, where social stereotypes may result in variation in LGBTI+ terminology. Therefore, domain-specific lexicons or dictionaries may be useful as a representative list of words against which the LLM's behaviour needs to be evaluated. This paper presents a methodology for evaluating LLMs using an LGBTI+ lexicon in Indian languages. The methodology consists of four steps: formulating NLP tasks relevant to the expected behaviour, creating prompts that test LLMs, using the LLMs to obtain the output and, finally, manually evaluating the results. Our qualitative analysis shows that the three LLMs we experiment with are unable to detect underlying hateful content. Similarly, we observe limitations in using machine translation as a means of evaluating natural language understanding in languages other than English. The methodology presented in this paper can be applied to LGBTI+ lexicons in other languages as well as to other domain-specific lexicons. The work done in this paper opens avenues for studying the responsible behaviour of LLMs, as demonstrated in the context of prevalent social perceptions of the LGBTI+ community.
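
A minimal sketch of steps two and three of the methodology might look as follows. The lexicon file, the prompt template, and the `query_llm` and `sentences_for_term` callables are all hypothetical placeholders, not the paper's artifacts.

```python
# Hypothetical prompt for a hateful-content detection task built around lexicon terms.
PROMPT = ("Does the following sentence contain hateful content? "
          "Answer yes or no.\n{sentence}")

def evaluate(lexicon_path, sentences_for_term, query_llm):
    """Build prompts from lexicon terms and collect LLM outputs (steps 2-3);
    the returned outputs are then evaluated manually (step 4)."""
    results = {}
    for line in open(lexicon_path, encoding="utf-8"):
        term = line.strip()
        results[term] = [query_llm(PROMPT.format(sentence=s))
                         for s in sentences_for_term(term)]
    return results
```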

Can a machine understand the meanings of natural language? Recent developments in generative large language models (LLMs) in artificial intelligence have led to the belief that traditional philosophical assumptions about machine understanding of language need to be revised. This article critically evaluates the prevailing tendency to regard machine language performance as mere syntactic manipulation and as a simulation of understanding that is only partial and very shallow, lacking sufficient referential grounding in the world. The aim is to highlight the conditions crucial for attributing natural language understanding to state-of-the-art LLMs, under which it can be legitimately argued that they not only use syntax but also semantics and that their understanding is not simulated but duplicated, and to determine how they ground the meanings of linguistic expressions.

Importance sampling is a popular technique in Bayesian inference: by reweighting samples drawn from a proposal distribution, we are able to obtain samples and moment estimates from a Bayesian posterior over some $n$ latent variables. Recent work, however, indicates that importance sampling scales poorly: in order to accurately approximate the true posterior, the required number of importance samples grows exponentially with the number of latent variables [Chatterjee and Diaconis, 2018]. Massively parallel importance sampling works around this issue by drawing $K$ samples for each of the $n$ latent variables and reasoning about all $K^n$ combinations of latent samples. In principle, we can reason efficiently over $K^n$ combinations of samples by exploiting conditional independencies in the generative model. In practice, however, this requires complex algorithms that traverse backwards through the graphical model, with a separate backward traversal for each computed quantity (posterior expectations, marginals, and samples). Our contribution is to exploit the source term trick from physics to entirely avoid the need to hand-write backward traversals. Instead, we show how to compute all the required quantities -- posterior expectations, marginals, and samples -- simply by differentiating through a slightly modified marginal likelihood estimator.
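
The source term trick can be illustrated on a toy conjugate model: add a term $\lambda f(z)$ inside the marginal likelihood estimator, and its derivative at $\lambda = 0$ is exactly the self-normalized importance-sampling estimate of the posterior expectation of $f(z)$. Below is a minimal sketch with a standard Gaussian prior and unit-variance Gaussian likelihood for a single latent variable, rather than the paper's massively parallel setting.

```python
import math
import torch

torch.manual_seed(0)
z = torch.randn(10_000)                                    # proposal = prior N(0, 1)
x = torch.tensor(1.5)                                      # one observation
log_w = torch.distributions.Normal(z, 1.0).log_prob(x)     # importance weights p(x | z)
lam = torch.zeros((), requires_grad=True)

# Marginal-likelihood estimator modified by the source term lam * f(z), with f(z) = z
log_Z = torch.logsumexp(log_w + lam * z, dim=0) - math.log(z.numel())
log_Z.backward()

# The gradient at lam = 0 is the self-normalized importance-sampling estimate of
# the posterior mean E[z | x]; analytically it is x / 2 = 0.75 for this model.
print(lam.grad)
```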

We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, in which the contribution of each spectrogram time-frame to parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
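
The link between the Gamma-weighted Gaussian and the Student's t model can be made explicit by marginalizing the weight analytically. A standard identity, stated here for a real-valued scalar $s$ with variance $\sigma^2$ and a shape-rate $\mathrm{Gamma}(a, b)$ weight (the complex-valued TF-domain case in the paper follows analogously), is

$$\int_0^\infty \mathcal{N}\big(s \mid 0,\, \sigma^2 / w\big)\,\mathrm{Gamma}(w \mid a, b)\,\mathrm{d}w \;=\; \mathcal{T}_{2a}\big(s \mid 0,\ \sqrt{b/a}\,\sigma\big),$$

i.e., a Student's t-distribution with $2a$ degrees of freedom and scale $\sqrt{b/a}\,\sigma$, which recovers the heavier-tailed generative model described above.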

Simulation-based inference has become popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, arising from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a provable asymptotic guarantee, we present a general stacking framework that makes use of all available posterior approximations. Our stacking method can combine densities, simulation draws, confidence intervals, and moments, and it addresses overall precision, calibration, coverage, and bias at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
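
As a rough sketch of one ingredient, stacking densities can be posed as finding simplex weights that maximize the log score of the mixture on held-out draws. This is illustrative only: the function name and softmax parameterization are assumptions, and the paper's framework also stacks draws, intervals, and moments.

```python
import numpy as np
from scipy.optimize import minimize

def stack_densities(log_p):
    """Find simplex weights w maximizing the log score of the mixture
    sum_k w_k p_k(theta_i) over held-out posterior draws i.

    log_p: (n_draws, n_approximations) log densities of each posterior
           approximation evaluated at the held-out draws.
    """
    K = log_p.shape[1]

    def neg_log_score(v):
        # Softmax parameterization keeps the weights on the simplex.
        w = np.exp(v - v.max()); w /= w.sum()
        mix = np.logaddexp.reduce(log_p + np.log(w), axis=1)  # log mixture density
        return -mix.sum()

    v = minimize(neg_log_score, np.zeros(K)).x
    w = np.exp(v - v.max())
    return w / w.sum()
```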

We propose novel optimal and parameter-free algorithms for computing an approximate solution to smooth optimization problems with small (projected) gradient norm. Specifically, for computing an approximate solution such that the norm of the (projected) gradient is not greater than $\varepsilon$, we obtain the following results for convex, strongly convex, and nonconvex problems: a) for the convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity constant; c) for the nonconvex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds for all three problem classes. Our analysis applies to both unconstrained problems and problems with constrained feasible sets; we demonstrate our strategy for analyzing the complexity of computing solutions with small projected gradient norm in the convex case. For the convex, strongly convex, and nonconvex cases, we also propose parameter-free algorithms that do not require knowledge of any problem parameter. To the best of our knowledge, this paper is the first to achieve the $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$ complexity for convex problems with constrained feasible sets, the $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$ complexity for nonconvex problems, and optimal complexities for convex, strongly convex, and nonconvex problems through parameter-free algorithms.
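
For intuition on the parameter-free aspect only, the sketch below runs plain gradient descent with backtracking, so no Lipschitz constant $L$ needs to be supplied, and stops once the gradient norm is at most $\varepsilon$. This is not the paper's method: the optimal algorithms described above are accelerated and substantially more sophisticated.

```python
import numpy as np

def backtracking_gd(f, grad, x0, eps=1e-6, max_iter=10_000):
    """Illustrative parameter-free descent: the step size is found by
    backtracking line search rather than from knowledge of L, and the
    method terminates when the gradient norm reaches eps."""
    x, t = np.asarray(x0, dtype=float), 1.0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:                   # small-gradient-norm criterion
            break
        t *= 2.0                                       # let the step size grow again
        while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
            t *= 0.5                                   # backtrack until sufficient decrease
        x = x - t * g
    return x
```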

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
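
A minimal PyTorch sketch of one residual block as described above, with my own naming: per-channel affine transforms stand in for normalization, a linear layer mixes patches, and a two-layer MLP mixes channels. Details of the released code, such as the small-value initialization of the residual-branch scaling, are omitted for brevity.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel affine transform used in place of normalization."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim, expansion=4):
        super().__init__()
        self.aff1, self.aff2 = Affine(dim), Affine(dim)
        self.cross_patch = nn.Linear(num_patches, num_patches)  # (i) patches interact
        self.cross_channel = nn.Sequential(                     # (ii) channels interact
            nn.Linear(dim, expansion * dim), nn.GELU(),
            nn.Linear(expansion * dim, dim))

    def forward(self, x):  # x: (batch, num_patches, dim)
        x = x + self.cross_patch(self.aff1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.cross_channel(self.aff2(x))
```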

Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic information or features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I identify three main directions for deep supervised hashing methods, with several comments made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as an attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function that encourages similar images to be projected close together. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Several experiments on the CIFAR-10 dataset show the satisfying performance of SRH.
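
Since the shadow mechanism of SRH is only sketched above, the example below shows a generic pairwise loss of the kind described (pulling codes of similar images together and pushing dissimilar ones apart), not the SRH objective itself; the function name and margin are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_hash_loss(codes, labels, margin=2.0):
    """Illustrative contrastive-style loss for deep supervised hashing:
    same-label pairs are pulled together, different-label pairs are
    pushed at least `margin` apart in code space.

    codes:  (B, nbits) relaxed binary codes in [-1, 1] (e.g., after tanh)
    labels: (B,) integer class labels
    """
    d = torch.cdist(codes, codes)                         # pairwise code distances
    same = (labels[:, None] == labels[None, :]).float()   # similarity indicator
    loss = same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)
    return loss.mean()
```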

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
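
The "learned functions of the internal states" amount to a task-specific weighted sum over the biLM layers. Below is a minimal sketch of that mixing in PyTorch, following the standard ELMo formulation with softmax-normalized layer weights and a global scale; the class name is my own.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific mixing of biLM layer states: a softmax-normalized
    weight per layer plus a global scale gamma."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_states):  # list of (batch, seq, dim) tensors
        w = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(wk * h for wk, h in zip(w, layer_states))
```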
