In this paper, we tackle structure learning of Directed Acyclic Graphs (DAGs), with the idea of exploiting available prior knowledge of the domain at hand to guide the search for the best structure. In particular, we assume that the topological ordering of the variables is known in addition to the given data. We study a new algorithm for learning the structure of DAGs, proving its theoretical consistency in the limit of infinite observations. Furthermore, we experimentally compare the proposed algorithm to a number of popular competitors, in order to study its behavior on finite samples.
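To make concrete how a known topological ordering constrains the search, note that with the ordering fixed, each variable's candidate parents are exactly its predecessors, so structure learning decomposes into independent parent-selection problems. The sparse-regression selection in the sketch below is our own illustrative choice, not the algorithm studied in the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def learn_dag_given_ordering(X, order, threshold=1e-3):
    """Estimate a DAG when the topological ordering is known.

    X: (n_samples, n_vars) data matrix; order: list of column indices,
    causes before effects.  Each variable's candidate parents are its
    predecessors in the ordering; a sparse regression selects among them.
    """
    d = X.shape[1]
    A = np.zeros((d, d), dtype=bool)   # A[i, j] = True iff edge i -> j
    for pos, j in enumerate(order):
        preds = order[:pos]
        if not preds:
            continue                   # the first variable has no parents
        coef = LassoCV(cv=5).fit(X[:, preds], X[:, j]).coef_
        for i, c in zip(preds, coef):
            A[i, j] = abs(c) > threshold
    return A
```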
We consider arbitrary bounded discrete time series. From their statistical features, and without any use of the Fourier transform, we construct an almost periodic function that suitably characterizes the corresponding time series.
Motivated by cryptographic applications, we investigate two machine learning approaches to modular multiplication: circular regression and a sequence-to-sequence transformer model. The limited success of both methods, demonstrated in our results, provides evidence for the hardness of tasks involving modular multiplication, upon which cryptosystems are based.
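To make the circular-regression setup concrete, the following toy sketch (our own illustration, not from the paper) shows the key ingredient: residues modulo $q$ are embedded as angles on the unit circle, and the loss measures angular rather than absolute error, respecting the wrap-around of modular arithmetic.

```python
import numpy as np

def angle(x, q):
    """Map a residue x mod q to an angle on the unit circle."""
    return 2.0 * np.pi * (x % q) / q

def circular_loss(theta_pred, theta_true):
    """Circular regression loss: 1 - cos of the angular error, which
    respects the wrap-around structure of residues mod q."""
    return np.mean(1.0 - np.cos(theta_pred - theta_true))

# toy data for y = (s * x) mod q; the learner sees (x, y) pairs only
q, s = 257, 101
rng = np.random.default_rng(0)
x = rng.integers(0, q, size=4096)
y = (s * x) % q

# with the circular embedding, the loss vanishes for the true multiplier
print(circular_loss(angle(s * x, q), angle(y, q)))   # 0.0
print(circular_loss(angle(7 * x, q), angle(y, q)))   # ~1.0, near random
```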
In this paper we present PeLLE, a family of large language models based on the RoBERTa architecture, for Brazilian Portuguese, trained on curated, open data from the Carolina corpus. Aiming at reproducible results, we describe details of the pretraining of the models. We also evaluate PeLLE models against a set of existing multilingual and PT-BR refined pretrained Transformer-based LLM encoders, contrasting performance of large versus smaller-but-curated pretrained models in several downstream tasks. We conclude that several tasks perform better with larger models, but some tasks benefit from smaller-but-curated data in its pretraining.
In this paper we consider a superlinear one-dimensional elliptic boundary value problem that generalizes the one studied by Moore and Nehari in [43]. Specifically, we deal with piecewise-constant weight functions in front of the nonlinearity with an arbitrary number $\kappa\geq 1$ of vanishing regions. We study, from an analytic and numerical point of view, the number of positive solutions, depending on the value of a parameter $\lambda$ and on $\kappa$. Our main results are twofold. On the one hand, we study analytically the behavior of the solutions, as $\lambda\downarrow-\infty$, in the regions where the weight vanishes. Our result leads us to conjecture the existence of $2^{\kappa+1}-1$ solutions for sufficiently negative $\lambda$. On the other hand, we support such a conjecture with the results of numerical simulations which also shed light on the structure of the global bifurcation diagrams in $\lambda$ and the profiles of positive solutions. Finally, we give additional numerical results suggesting that the same high multiplicity result holds true for a much larger class of weights, also arbitrarily close to situations where there is uniqueness of positive solutions.
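For readers who want a feel for how such solutions are computed, the sketch below sets up a finite-difference Newton iteration for a model problem of Moore-Nehari type with one vanishing region of the weight. The exact equation, weight, and parameter values here are illustrative assumptions, not the paper's setup; different initial guesses can lead Newton's method to different positive solutions, which is how multiplicity is explored numerically.

```python
import numpy as np

# Model problem (illustrative assumption):
#   -u'' = lam*u + a(x)*u^3 on (0,1),  u(0) = u(1) = 0,
# with a piecewise-constant weight a(x) vanishing on one subinterval.
n, lam = 200, -40.0
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
h = x[1] - x[0]
a = np.where((x > 0.4) & (x < 0.6), 0.0, 1.0)     # kappa = 1 vanishing region
D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
      + np.diag(np.ones(n - 1), -1)) / h**2        # discrete u''

def F(u):                                          # residual of the BVP
    return -D2 @ u - lam * u - a * u**3

u = 6.0 * np.sin(np.pi * x)                        # one of several initial guesses
for _ in range(100):
    J = -D2 - lam * np.eye(n) - np.diag(3.0 * a * u**2)
    du = np.linalg.solve(J, F(u))
    u -= du / max(1.0, np.abs(du).max())           # damped Newton step
# different starting profiles converge to different positive solutions
```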
This paper presents an innovative approach, the Adaptive Orthogonal Basis Method, tailored for computing multiple solutions of differential equations with polynomial nonlinearities. Departing from the conventional practice of predefining a pool of candidate bases, our method computes the bases adaptively, taking into account the nature of the equation and the structural characteristics of its solutions. It further leverages companion matrix techniques to generate initial guesses for subsequent computations. This approach thus not only yields numerous initial guesses for solving such equations but also adapts the orthogonal basis functions to effectively address the discretized nonlinear systems. Through a series of numerical experiments, the paper demonstrates the method's effectiveness and robustness. By reducing computational costs in various applications, this approach opens new avenues for uncovering multiple solutions of differential equations with polynomial nonlinearities.
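The companion-matrix ingredient can be illustrated in a few lines: the roots of a polynomial are the eigenvalues of its companion matrix, so the zeros of a polynomial nonlinearity (for instance, the constant steady states of an equation) can be computed without iteration and used to seed later computations. The snippet below is a generic sketch of this idea, not the paper's full method.

```python
import numpy as np

def companion_roots(coeffs):
    """Roots of a polynomial via the eigenvalues of its companion matrix.

    coeffs are [a_n, ..., a_1, a_0] for a_n x^n + ... + a_1 x + a_0.
    """
    a = np.asarray(coeffs, dtype=float)
    a = a / a[0]                       # make the polynomial monic
    n = len(a) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)         # ones on the subdiagonal
    C[:, -1] = -a[:0:-1]               # last column from the coefficients
    return np.linalg.eigvals(C)

# e.g. for u'' + u - u^3 = 0, the constant steady states solve u - u^3 = 0;
# the companion matrix recovers them, seeding Newton iterations for the
# discretized system (roots are real here, hence .real).
print(np.sort(companion_roots([-1.0, 0.0, 1.0, 0.0]).real))  # [-1, 0, 1]
```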
Machine learning techniques, in particular the so-called normalizing flows, are becoming increasingly popular in the context of Monte Carlo simulations, as they can effectively approximate target probability distributions. In the case of lattice field theories (LFT), the target distribution is given by the exponential of the action. The commonly used gradient estimator for the loss function, based on the "reparametrization trick", requires the derivative of the action with respect to the fields. This can present a significant computational cost for complicated, non-local actions, e.g. the fermionic action in QCD. In this contribution, we propose an estimator for normalizing flows based on the REINFORCE algorithm that avoids this issue. We apply it to the two-dimensional Schwinger model with Wilson fermions at criticality and show that it is up to ten times faster in terms of wall-clock time, while requiring up to $30\%$ less memory than the reparametrization-trick estimator. It is also more numerically stable, allowing for single-precision calculations and the use of half-precision tensor cores. We present an in-depth analysis of the origins of these improvements. We believe that these benefits will also appear outside the realm of LFT, in every case where the target probability distribution is computationally intensive.
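The essence of such an estimator can be sketched as follows: REINFORCE only needs $\nabla_\theta \log q_\theta$ evaluated on samples treated as constants, so the action enters the loss only through a detached "reward" signal and is never differentiated. The PyTorch sketch below is a minimal illustration under an assumed flow interface (sample_prior, transform, log_prob); it is not the paper's implementation.

```python
import torch

def reinforce_loss(flow, action, n_samples):
    """Score-function (REINFORCE) estimator of the reverse KL loss for a
    normalizing flow targeting exp(-S)/Z.  Sampling is done without
    tracking gradients, and log q is re-evaluated on detached samples,
    so neither the action nor the sampling path is differentiated."""
    with torch.no_grad():
        z = flow.sample_prior(n_samples)
        phi = flow.transform(z)               # field configurations
        signal = flow.log_prob(phi) + action(phi)
        signal = signal - signal.mean()       # baseline reduces variance
    log_q = flow.log_prob(phi)                # only this pass is differentiated
    # d/dtheta E_q[f] = E_q[(f - baseline) * d/dtheta log q]
    return (signal * log_q).mean()
```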
This paper introduces a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral, focusing on the adaptation of Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals significant insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges faced by current LLMs in generating diverse and high-quality text.
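For reference, the sketch below shows the k-NN construction of Precision and Recall for generative models (in the style of the image-generation literature) that such a framework adapts to text; here we assume real and generated texts have already been mapped to embedding vectors, and the brute-force distance computation is for illustration only.

```python
import numpy as np

def knn_radii(X, k):
    """Distance from each point to its k-th nearest neighbour within X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]   # column 0 is the point itself

def precision_recall(real, gen, k=3):
    """k-NN estimate of precision (generated samples falling inside the
    real manifold) and recall (real samples falling inside the generated
    manifold), computed on (n, dim) arrays of embedding vectors."""
    r_real, r_gen = knn_radii(real, k), knn_radii(gen, k)
    d = np.linalg.norm(gen[:, None, :] - real[None, :, :], axis=-1)
    precision = np.mean((d <= r_real[None, :]).any(axis=1))
    recall = np.mean((d.T <= r_gen[None, :]).any(axis=1))
    return precision, recall
```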
In this paper we develop a novel neural network model for predicting the implied volatility surface, taking prior financial domain knowledge into account. A new activation function that incorporates the volatility smile is proposed and used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries, and the asymptotic slope, are embedded into the loss function. This is one of the very first studies to propose a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmark models on S&P 500 index option data spanning over 20 years. More importantly, the domain knowledge is satisfied empirically, showing that the model is consistent with the existing financial theories and conditions related to the implied volatility surface.
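As an illustration of embedding financial conditions into the loss, the sketch below adds a soft calendar-spread no-arbitrage penalty (total implied variance non-decreasing in maturity) to a data-fitting term. The network interface model(k, t), the choice of penalty, and its weight are our own assumptions, not the paper's exact formulation.

```python
import torch

def iv_loss(model, k, t, iv_target, lam=1.0):
    """Data-fitting loss plus a soft no-arbitrage penalty.

    k: log-moneyness, t: maturity.  Both require gradients so the shape
    constraint can be written on derivatives of the total variance."""
    k = k.requires_grad_(True)
    t = t.requires_grad_(True)
    iv = model(k, t)
    fit = torch.mean((iv - iv_target) ** 2)
    w = iv ** 2 * t                          # total implied variance
    dw_dt, = torch.autograd.grad(w.sum(), t, create_graph=True)
    # calendar-spread no-arbitrage: total variance non-decreasing in t
    calendar = torch.mean(torch.relu(-dw_dt))
    return fit + lam * calendar
```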
This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of exploding/vanishing gradients and the more general issue of an undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
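As a concrete reference for the generic methods reviewed, the following NumPy sketch spells out single update steps of SGD with momentum and of Adam, the prototypical adaptive gradient method; the hyperparameter defaults are the customary ones.

```python
import numpy as np

def sgd_momentum(w, g, v, lr=1e-2, beta=0.9):
    """One SGD-with-momentum step on parameters w given gradient g."""
    v = beta * v + g
    return w - lr * v, v

def adam(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step (t = step count, starting at 1): per-coordinate
    learning rates from running moments of the gradient, with bias
    correction for the zero-initialized moment estimates."""
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```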