
A new converse bound is presented for the two-user multiple-access channel under the average probability of error constraint. This bound shows that for most channels of interest, the second-order coding rate -- that is, the difference between the best achievable rates and the asymptotic capacity region as a function of blocklength $n$ with fixed probability of error -- is $O(1/\sqrt{n})$ bits per channel use. The principal tool behind this converse proof is a new measure of dependence between two random variables called wringing dependence, so named because it is inspired by Ahlswede's wringing technique. The $O(1/\sqrt{n})$ gap is shown to hold for any channel satisfying certain regularity conditions, a class that includes all discrete memoryless channels and the Gaussian multiple-access channel. Exact upper bounds on the coefficient of the $O(1/\sqrt{n})$ term are proved as a function of the probability of error, although for most channels they do not match existing achievability bounds.
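To make the claimed scaling concrete, a schematic (not the paper's exact) form of such a second-order converse for the sum rate is
$$ R_1(n,\epsilon) + R_2(n,\epsilon) \;\le\; C_{\mathrm{sum}} + \frac{A(\epsilon)}{\sqrt{n}} + o\!\left(\frac{1}{\sqrt{n}}\right), $$
where $(R_1(n,\epsilon), R_2(n,\epsilon))$ is any rate pair achievable at blocklength $n$ with average error probability $\epsilon$, $C_{\mathrm{sum}}$ is the asymptotic sum capacity, and $A(\epsilon)$ is a placeholder for the error-dependent coefficient whose exact upper bounds are the subject of the last sentence above.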

Related content

We consider Broyden's method and some accelerated schemes for nonlinear equations having a strongly regular singularity of first order with a one-dimensional nullspace. Our two main results are as follows. First, we show that the use of a preceding Newton-like step ensures convergence for starting points in a starlike domain with density 1. This extends the domain of convergence of these methods significantly. Second, we establish that the matrix updates of Broyden's method converge q-linearly with the same asymptotic factor as the iterates. This contributes to the long-standing question of whether the Broyden matrices converge by showing that this is indeed the case for the setting at hand. Furthermore, we prove that the Broyden directions violate uniform linear independence, which implies that existing results for convergence of the Broyden matrices cannot be applied. Numerical experiments of high precision confirm the enlarged domain of convergence, the q-linear convergence of the matrix updates, and the lack of uniform linear independence. In addition, they suggest that these results can be extended to singularities of higher order and that Broyden's method can converge r-linearly without converging q-linearly. The underlying code is freely available.
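For orientation, below is a minimal NumPy sketch of the classical ("good") Broyden iteration; it implements only the basic method on an arbitrary, regular toy problem, not the accelerated schemes, the preceding Newton-like step, or the singular setting studied above.

```python
import numpy as np

def fd_jacobian(F, x, h=1e-6):
    """Forward-difference Jacobian used only to initialize the Broyden matrix."""
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h
    return J

def broyden(F, x0, tol=1e-12, max_iter=100):
    """Classical (good) Broyden method for F(x) = 0 with rank-one Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    B = fd_jacobian(F, x)                      # initial Jacobian approximation
    Fx = F(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -Fx)            # quasi-Newton step: B s = -F(x)
        x_new = x + s
        Fx_new = F(x_new)
        y = Fx_new - Fx
        B += np.outer(y - B @ s, s) / (s @ s)  # good Broyden rank-one update
        x, Fx = x_new, Fx_new
        if np.linalg.norm(Fx) < tol:
            break
    return x, B

# Toy usage: intersection of the unit circle with the line x0 = x1 (a regular root).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
x_star, B_star = broyden(F, x0=[1.0, 0.5])
```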

We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the former two, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling.

An $n$-dimensional source with memory is observed via parallel channels by $K$ isolated encoders, who compress their observations and transmit them to the decoder over noiseless rate-constrained links while leveraging their memory of the past. At each time instant, the decoder receives $K$ new codewords from the observers, combines them with the past received codewords, and produces a minimum-distortion estimate of the latest block of $n$ source symbols. This scenario extends the classical one-shot CEO problem to multiple rounds of communication in which the encoders retain memory of the past. We extend the Berger-Tung inner and outer bounds to this scenario with inter-block memory, showing that the minimum asymptotically (as $n \to \infty$) achievable sum rate required to achieve a target distortion is bounded by minimal directed mutual information problems. For the Gauss-Markov source observed via $K$ parallel AWGN channels, we show that the inner bound is tight and solve the corresponding minimal directed mutual information problem, thereby establishing the minimum asymptotically achievable sum rate. Finally, we explicitly bound the rate loss due to the lack of communication among the observers; that bound is attained with equality in the case of identical observation channels. The general coding theorem is proved via a new nonasymptotic bound that uses stochastic likelihood coders and whose asymptotic analysis yields an extension of the Berger-Tung inner bound to the causal setting. The analysis of the Gaussian case is facilitated by reversing the channels of the observers.
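For reference, the directed mutual information appearing in these bounds is, in its standard (Massey) form,
$$ I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\!\left(X^i; Y_i \mid Y^{i-1}\right), $$
where $X^i = (X_1,\dots,X_i)$; roughly, the sum-rate bounds above are stated as minimizations of such quantities, normalized per source symbol, subject to the target distortion.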

In standard number-in-hand multi-party communication complexity, performance is measured as the total number of bits transmitted globally in the network. In this paper, we study a variation called local communication complexity, in which performance instead measures the maximum number of bits sent or received at any one player. We focus on a simple model where $n$ players, each with one input bit, execute a protocol by exchanging messages to compute a function on the $n$ input bits. We ask what can and cannot be solved with small local communication complexity in this setting. We begin by establishing a non-trivial lower bound on the local complexity for a specific function, proving that counting the number of $1$'s among the first $17$ input bits distributed among the participants requires a local complexity strictly greater than $1$. We further investigate whether harder counting problems of this type can yield stronger lower bounds, providing a largely negative answer by showing that constant local complexity is sufficient to count the number of $1$ bits over the entire input, and therefore to compute any symmetric function. In addition to counting, we show that both sorting and searching can be computed in constant local complexity. We then use the counting solution as a subroutine to demonstrate that constant local complexity is also sufficient to compute many standard modular arithmetic operations on two operands, including comparisons, addition, subtraction, multiplication, division, and exponentiation. Finally, we establish that the function $GCD(x,y)$, where $x$ and $y$ are in the range $[1,n]$, has local complexity $O(1)$. Our work highlights both new techniques for proving lower bounds on this metric and the power of even a small amount of local communication.
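To illustrate the metric itself (not the paper's constant-complexity protocols), the sketch below simulates a naive binary-tree counting protocol for $n$ players with one bit each and reports the maximum number of bits sent or received at any single player, which for this baseline grows logarithmically in $n$.

```python
import math

def tree_count_local_complexity(bits):
    """Naive tree-aggregation protocol for counting 1s among n players (one bit each).
    Returns (total count, max bits sent plus received at any single player).
    This is only a baseline illustrating the local-complexity metric; its per-player
    cost grows logarithmically with n, unlike the constant-cost protocols in the paper."""
    n = len(bits)
    counts = list(bits)                 # counts[i] = number of 1s in player i's current subtree
    sizes = [1] * n                     # current subtree size held by player i
    traffic = [0] * n                   # bits sent plus bits received, per player
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            j = i + step
            if j < n:
                msg_bits = math.ceil(math.log2(sizes[j] + 1))  # enough bits to encode counts[j]
                traffic[j] += msg_bits  # player j sends its subtree count ...
                traffic[i] += msg_bits  # ... and player i receives it
                counts[i] += counts[j]
                sizes[i] += sizes[j]
        step *= 2
    return counts[0], max(traffic)

total, local_cc = tree_count_local_complexity([1, 0, 1, 1, 0, 1, 0, 0, 1])
print(total, local_cc)   # 5 ones; local complexity of this baseline is O(log n) bits
```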

This paper studies a scalar Gaussian wiretap channel where, instead of an average input power constraint, we consider a peak amplitude constraint on the input. The goal is to obtain insights into the secrecy capacity and the structure of the secrecy-capacity-achieving distribution. Capitalizing on recent theoretical progress on the structure of the secrecy-capacity-achieving distribution, this paper develops a numerical procedure, based on the gradient ascent algorithm and a version of the Blahut-Arimoto algorithm, for computing the secrecy capacity and the secrecy-capacity-achieving input and output distributions.
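Concretely, the quantity being computed is the amplitude-constrained secrecy capacity (written here with generic noise variances, which are not specified in the abstract):
$$ C_s(\mathsf{A}) \;=\; \max_{P_X:\ |X| \le \mathsf{A}} \; I(X;Y) - I(X;Z), \qquad Y = X + N_1, \quad Z = X + N_2, $$
where $N_1 \sim \mathcal{N}(0,\sigma_1^2)$ and $N_2 \sim \mathcal{N}(0,\sigma_2^2)$ are independent of $X$; in the degraded regime $\sigma_1^2 < \sigma_2^2$ no auxiliary (prefix) variable is needed, and it is this maximization over amplitude-constrained input distributions that the gradient-ascent/Blahut-Arimoto procedure approximates numerically.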

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent and $\mathcal{O}(1/t)$ for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
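For intuition only, the sketch below runs a generic Nesterov-style momentum loop on the exponential loss for a linear classifier on synthetic separable data and tracks the normalized margin; it is not the paper's dual-derived method and makes no claim to its $\widetilde{\mathcal{O}}(1/t^2)$ rate (the data, step size, and momentum schedule are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical separable toy data: two Gaussian blobs with labels +/-1.
X = np.vstack([rng.normal(+2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def exp_loss_grad(w):
    """Gradient of the exponential loss (1/n) * sum_i exp(-y_i <w, x_i>)."""
    m = y * (X @ w)                                   # per-example (unnormalized) margins
    return -(X * (y * np.exp(-m))[:, None]).mean(axis=0)

# Generic Nesterov-style momentum loop (illustration only, not the paper's method).
w = np.zeros(2); w_prev = np.zeros(2); eta = 0.5
for t in range(1, 2001):
    v = w + (t - 1) / (t + 2) * (w - w_prev)          # momentum / look-ahead point
    w_prev = w
    w = v - eta * exp_loss_grad(v)

margin = np.min(y * (X @ w)) / np.linalg.norm(w)      # normalized classification margin
print(f"normalized margin after {t} steps: {margin:.4f}")
```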

Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study which normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate existing methods from other domains on GNNs and find that InstanceNorm achieves faster convergence than BatchNorm and LayerNorm. We explain this by showing that InstanceNorm serves as a preconditioner for GNNs, whereas the preconditioning effect of BatchNorm is weaker due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm degrades the expressiveness of GNNs on highly regular graphs. We address this issue by proposing GraphNorm, which uses a learnable shift. Empirically, GNNs with GraphNorm converge faster than GNNs using other normalization methods. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
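As a sketch of the idea (following the description above, with alpha, gamma, and beta standing in for the per-feature parameters that would be learned in a real implementation), GraphNorm normalizes the node features of each graph around a learnably shifted mean:

```python
import numpy as np

def graph_norm(h, alpha, gamma, beta, eps=1e-5):
    """Forward pass of a GraphNorm-style normalization for a single graph.
    h: (num_nodes, num_features) node features; alpha, gamma, beta: (num_features,)
    per-feature parameters (alpha is the learnable shift)."""
    mu = h.mean(axis=0)                                  # per-feature mean over the graph's nodes
    shifted = h - alpha * mu                             # subtract a learnable fraction of the mean
    sigma = np.sqrt((shifted ** 2).mean(axis=0) + eps)   # std computed around the shifted mean
    return gamma * shifted / sigma + beta

# Toy usage on a hypothetical 5-node graph with 8-dimensional features.
h = np.random.default_rng(0).normal(size=(5, 8))
d = h.shape[1]
out = graph_norm(h, alpha=np.ones(d), gamma=np.ones(d), beta=np.zeros(d))
```

With alpha fixed to 1 this reduces to the InstanceNorm-style per-graph normalization; making the shift learnable is what addresses the expressiveness degradation on highly regular graphs discussed above.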

We present a new clustering method in the form of a single clustering equation that directly discovers groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method requires no hyperparameters, distance thresholds, or pre-specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging from 1077 to 8.1 million samples, shows substantial performance gains compared to existing clustering techniques.
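A sketch of the first-neighbor linking step described above is given below (the full method applies such merges recursively to produce a hierarchy of partitions, which is omitted here); it assumes numerical feature vectors and Euclidean nearest neighbors.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def first_neighbor_partition(X):
    """One round of first-neighbor clustering: link i and j when j is i's first
    neighbor, i is j's first neighbor, or i and j share the same first neighbor;
    clusters are the connected components of the resulting graph."""
    n = X.shape[0]
    nn = cKDTree(X).query(X, k=2)[1][:, 1]          # index of each sample's first neighbor
    rows = np.arange(n)
    # Adjacency from i -> nn[i]; the shared-first-neighbor condition is captured
    # transitively, since i and j are then both connected to nn[i] == nn[j].
    A = coo_matrix((np.ones(n), (rows, nn)), shape=(n, n))
    n_clusters, labels = connected_components(A, directed=False)
    return n_clusters, labels

# Toy usage on two well-separated Gaussian blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
print(first_neighbor_partition(X))                  # expected: 2 clusters
```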

For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space facilitates the optimization procedure for network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
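The observation behind restricting weights to the unit sphere is the positive homogeneity of the ReLU: for any $w \neq 0$,
$$ \sigma(w^\top x + b) \;=\; \|w\|_2 \, \sigma\!\left( \frac{w^\top x}{\|w\|_2} + \frac{b}{\|w\|_2} \right), \qquad \sigma(z) = \max(z, 0), $$
so every neuron is a positive multiple of a unit-norm neuron, and that multiple can be absorbed into the next layer's weights. On a bounded input domain, thresholds outside a bounded interval leave a unit-norm neuron either identically zero or purely affine, which is the intuition for the bounded threshold range (the paper's precise equivalence statement may be formulated differently).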

We consider the task of learning the parameters of a {\em single} component of a mixture model for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with lower computational and sample complexity than solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
