亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

<tr id='4O7rY'><strong id='oNFGa'></strong><small id='qgVPe'></small><button id='0AqZo'></button><li id='Wsf1R'><noscript id='LFl9B'><big id='WPmfl'></big><dt id='r9iPH'></dt></noscript></li></tr><ol id='c69Wl'><option id='i7uQ9'><table id='FSqTi'><blockquote id='KJgJm'><tbody id='SBHen'></tbody></blockquote></table></option></ol><u id='L4PEX'></u><kbd id='brKrm'><kbd id='1tsOJ'></kbd></kbd>

<code id='XU4PN'><strong id='HWwzl'></strong></code>

<fieldset id='adg00'></fieldset>

<span id='zWmIT'></span>

<ins id='iBGqU'></ins>

<acronym id='4zOPe'><em id='ciVgx'></em><td id='ZgSQ8'><div id='48I6v'></div></td></acronym><address id='n0GF6'><big id='sMCpY'><big id='aDdnm'></big><legend id='mzfym'></legend></big></address>

<i id='27YvY'><div id='DbbU0'><ins id='hQ1ld'></ins></div></i>

<i id='LVNuW'></i>

·

縮放 · 賭博機/老虎機 · 樣本 · 方差 · 均值 ·

2021 年 6 月 10 日

Diffusion Asymptotics for Sequential Experiments

Stefan Wager,Kuang Xu

We propose a new diffusion-asymptotic analysis for sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $ n $ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the behavior of a class of sequentially randomized Markov experiments converges to a diffusion limit, given as the solution of a stochastic differential equation. The diffusion limit thus enables us to derive refined, instance-specific characterization of the stochastic dynamics of adaptive experiments. As an application of this framework, we use the diffusion limit to obtain several new insights on the regret and belief evolution of Thompson sampling. We show that a version of Thompson sampling with an asymptotically uninformative prior variance achieves nearly-optimal instance-specific regret scaling when the reward gaps are relatively large. We also demonstrate that, in this regime, the posterior beliefs underlying Thompson sampling are highly unstable over time.

相關內容

MoDELS · 稀疏 · prototype · 稀疏激活 · 縮放 ·

2021 年 8 月 9 日

M6-T: Exploring Sparse Expert Models and Beyond

An Yang,Junyang Lin,Rui Men,Chang Zhou,Le Jiang,Xianyan Jia,Ang Wang,Jie Zhang,Jiamang Wang,Yong Li,Di Zhang,Wei Lin,Lin Qu,Jingren Zhou,Hongxia Yang

from arxiv, 16 pages, 8 figures

Mixture-of-Experts (MoE) models can achieve promising results with outrageous large amount of parameters but constant computation cost, and thus it has become a trend in model scaling. Still it is a mystery how MoE layers bring quality gains by leveraging the parameters with sparse activation. In this work, we investigate several key factors in sparse expert models. We observe that load imbalance may not be a significant problem affecting model quality, contrary to the perspectives of recent studies, while the number of sparsely activated experts $k$ and expert capacity $C$ in top-$k$ routing can significantly make a difference in this context. Furthermore, we take a step forward to propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing. This strategy improves the model quality but maintains constant computational costs, and our further exploration on extremely large-scale models reflects that it is more effective in training larger models. We push the model scale to over $1$ trillion parameters and implement it on solely $480$ NVIDIA V100-32GB GPUs, in comparison with the recent SOTAs on $2048$ TPU cores. The proposed giant model achieves substantial speedup in convergence over the same-size baseline.

損失函數（機器學習） · MoDELS · 聯合分布 · 統計量 · 泛函 ·

2021 年 8 月 9 日

Bayesian decision-theoretic design of experiments under an alternative model

Antony M. Overstall,James M. McGree

from arxiv, Supplementary material appears as an appendix

Traditionally Bayesian decision-theoretic design of experiments proceeds by choosing a design to minimise expectation of a given loss function over the space of all designs. The loss function encapsulates the aim of the experiment, and the expectation is taken with respect to the joint distribution of all unknown quantities implied by the statistical model that will be fitted to observed responses. In this paper, an extended framework is proposed whereby the expectation of the loss is taken with respect to a joint distribution implied by an alternative statistical model. Motivation for this includes promoting robustness, ensuring computational feasibility and for allowing realistic prior specification when deriving a design. To aid in exploring the new framework, an asymptotic approximation to the expected loss under an alternative model is derived, and the properties of different loss functions are established. The framework is then demonstrated on a linear regression versus full-treatment model scenario, on estimating parameters of a non-linear model under model discrepancy and a cubic spline model under an unknown number of basis functions.

估計/估計量 · Fractal · 縮放 · Performer · 相關系數 ·

2021 年 8 月 9 日

Wavelet eigenvalue regression in high dimensions

Patrice Abry,B. Cooper Boniece,Gustavo Didier,Herwig Wendt

from arxiv, Companion to //arxiv.org/abs/2102.05761

In this paper, we construct the wavelet eigenvalue regression methodology in high dimensions. We assume that possibly non-Gaussian, finite-variance $p$-variate measurements are made of a low-dimensional $r$-variate ($r \ll p$) fractional stochastic process with non-canonical scaling coordinates and in the presence of additive high-dimensional noise. The measurements are correlated both time-wise and between rows. Building upon the asymptotic and large scale properties of wavelet random matrices in high dimensions, the wavelet eigenvalue regression is shown to be consistent and, under additional assumptions, asymptotically Gaussian in the estimation of the fractal structure of the system. We further construct a consistent estimator of the effective dimension $r$ of the system that significantly increases the robustness of the methodology. The estimation performance over finite samples is studied by means of simulations.

隨機梯度下降 · SGD · 正則化項 · 飽和 · 線性的 ·

2021 年 8 月 7 日

On the Saturation Phenomenon of Stochastic Gradient Descent for Linear Inverse Problems

Bangti Jin,Zehui Zhou,Jun Zou

from arxiv, to appear at SIAM/ASA J. Uncertainty Quantification, with error corrected

Stochastic gradient descent (SGD) is a promising method for solving large-scale inverse problems, due to its excellent scalability with respect to data size. The current mathematical theory in the lens of regularization theory predicts that SGD with a polynomially decaying stepsize schedule may suffer from an undesirable saturation phenomenon, i.e., the convergence rate does not further improve with the solution regularity index when it is beyond a certain range. In this work, we present a refined convergence rate analysis of SGD, and prove that saturation actually does not occur if the initial stepsize of the schedule is sufficiently small. Several numerical experiments are provided to complement the analysis.

統計量 · 推斷 · 估計/估計量 · 自助法/自舉法 · SimPLe ·

2021 年 8 月 7 日

Central limit theorems for high dimensional dependent data

Jinyuan Chang,Xiaohui Chen,Mingcong Wu

Motivated by statistical inference problems in high-dimensional time series analysis, we derive non-asymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors on hyper-rectangles, simple convex sets and sparsely convex sets. We investigate the quantitative effect of temporal dependence on the rates of convergence to normality over three different dependency frameworks ($\alpha$-mixing, $m$-dependent, and physical dependence measure). In particular, we establish new error bounds under the $\alpha$-mixing framework and derive faster rate over existing results under the physical dependence measure. To implement the proposed results in practical statistical inference problems, we also derive a data-driven parametric bootstrap procedure based on a kernel-type estimator for the long-run covariance matrices.

Networking · 級聯 · 推斷 · 情景 · 極大 ·

2021 年 6 月 7 日

Network Inference and Influence Maximization from Samples

Wei Chen,Xiaoming Sun,Jialin Zhang,Zhijie Zhang

from arxiv, Accepted by ICML 2021

Influence maximization is the task of selecting a small number of seed nodes in a social network to maximize the spread of the influence from these seeds, and it has been widely investigated in the past two decades. In the canonical setting, the whole social network as well as its diffusion parameters is given as input. In this paper, we consider the more realistic sampling setting where the network is unknown and we only have a set of passively observed cascades that record the set of activated nodes at each diffusion step. We study the task of influence maximization from these cascade samples (IMS), and present constant approximation algorithms for this task under mild conditions on the seed set distribution. To achieve the optimization goal, we also provide a novel solution to the network inference problem, that is, learning diffusion parameters and the network structure from the cascade data. Comparing with prior solutions, our network inference algorithm requires weaker assumptions and does not rely on maximum-likelihood estimation and convex programming. Our IMS algorithms enhance the learning-and-then-optimization approach by allowing a constant approximation ratio even when the diffusion parameters are hard to learn, and we do not need any assumption related to the network structure or diffusion parameters.

變換 · Extensibility · INFORMS · Performer · MoDELS ·

2020 年 12 月 17 日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou,Shanghang Zhang,Jieqi Peng,Shuai Zhang,Jianxin Li,Hui Xiong,Wancai Zhang

from arxiv, 7 pages (main), 5 pages (appendix) and to be appeared in AAAI2021

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, such as quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ Self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.

解碼 · 次最優 · 推斷 · 圖像字幕 · Performer ·

2019 年 2 月 28 日

Insertion-based Decoding with automatically Inferred Generation Order

Jiatao Gu,Qi Liu,Kyunghyun Cho

from arxiv, New version with clearer formulations and extended pages. Work in progress

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm -- InDIGO -- which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam-search. Experiments on four real-world tasks, including word order recovery, machine translation, image caption and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared to the conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.

圖像分割 · Performer · 相似度 · 穩健性 · Processing（編程語言） ·

2018 年 3 月 17 日

Adaptive strategy for superpixel-based region-growing image segmentation

Mahaman Sani Chaibou,Pierre-Henri Conze,Karim Kalti,Basel Solaiman,Mohamed Ali Mahjoub

This work presents a region-growing image segmentation approach based on superpixel decomposition. From an initial contour-constrained over-segmentation of the input image, the image segmentation is achieved by iteratively merging similar superpixels into regions. This approach raises two key issues: (1) how to compute the similarity between superpixels in order to perform accurate merging and (2) in which order those superpixels must be merged together. In this perspective, we firstly introduce a robust adaptive multi-scale superpixel similarity in which region comparisons are made both at content and common border level. Secondly, we propose a global merging strategy to efficiently guide the region merging process. Such strategy uses an adpative merging criterion to ensure that best region aggregations are given highest priorities. This allows to reach a final segmentation into consistent regions with strong boundary adherence. We perform experiments on the BSDS500 image dataset to highlight to which extent our method compares favorably against other well-known image segmentation algorithms. The obtained results demonstrate the promising potential of the proposed approach.

Twitter · 情感分析 · AIM · 情感分類 · 流 ·

2015 年 9 月 14 日

Twitter Sentiment Analysis

Afroze Ibrahim Baqapuri

from arxiv, Bachelors Thesis Report

This project addresses the problem of sentiment analysis in twitter; that is classifying tweets according to the sentiment expressed in them: positive, negative or neutral. Twitter is an online micro-blogging and social-networking platform which allows users to write short status updates of maximum length 140 characters. It is a rapidly expanding service with over 200 million registered users - out of which 100 million are active users and half of them log on twitter on a daily basis - generating nearly 250 million tweets per day. Due to this large amount of usage we hope to achieve a reflection of public sentiment by analysing the sentiments expressed in the tweets. Analysing the public sentiment is important for many applications such as firms trying to find out the response of their products in the market, predicting political elections and predicting socioeconomic phenomena like stock exchange. The aim of this project is to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.

閱讀: 0 點贊: 0

小貼士

登錄享

相關主題

賭博機/老虎機

北京阿比特科技有限公司

注冊地址：北京市海淀區羊坊店路18號2幢3層301-191

<dir id='dc4s0'><del id='dc4s0'><del id='dc4s0'></del><pre id='dc4s0'><pre id='dc4s0'><option id='dc4s0'><address id='dc4s0'></address><bdo id='dc4s0'><tr id='dc4s0'><acronym id='dc4s0'><pre id='dc4s0'></pre></acronym><div id='dc4s0'></div></tr></bdo></option></pre><small id='dc4s0'><address id='dc4s0'><u id='dc4s0'><legend id='dc4s0'><option id='dc4s0'><abbr id='dc4s0'></abbr><li id='dc4s0'><pre id='dc4s0'></pre></li></option></legend><select id='dc4s0'></select></u></address></small></pre></del><sup id='dc4s0'></sup><blockquote id='dc4s0'><dt id='dc4s0'></dt></blockquote><blockquote id='dc4s0'></blockquote></dir><tt id='dc4s0'></tt><u id='dc4s0'><tt id='dc4s0'><form id='dc4s0'></form></tt><td id='dc4s0'><dt id='dc4s0'></dt></td></u>

<code id='dc4s0'><i id='dc4s0'><q id='dc4s0'><legend id='dc4s0'><pre id='dc4s0'><style id='dc4s0'><acronym id='dc4s0'><i id='dc4s0'><form id='dc4s0'><option id='dc4s0'><center id='dc4s0'></center></option></form></i></acronym></style><tt id='dc4s0'></tt></pre></legend></q></i></code><center id='dc4s0'></center>

<dd id='dc4s0'></dd>

<style id='dc4s0'></style><sub id='dc4s0'><dfn id='dc4s0'><abbr id='dc4s0'><big id='dc4s0'><bdo id='dc4s0'></bdo></big></abbr></dfn></sub>_{<dir id='dc4s0'></dir>}