
A new converse bound is presented for the two-user multiple-access channel under the average probability of error constraint. This bound shows that for most channels of interest, the second-order coding rate -- that is, the difference between the best achievable rates and the asymptotic capacity region as a function of blocklength $n$ with fixed probability of error -- is $O(1/\sqrt{n})$ bits per channel use. The principal tool behind this converse proof is a new measure of dependence between two random variables called wringing dependence, so named because it is inspired by Ahlswede's wringing technique. The $O(1/\sqrt{n})$ gap is shown to hold for any channel satisfying certain regularity conditions, a class that includes all discrete memoryless channels and the Gaussian multiple-access channel. Exact upper bounds on the coefficient of the $O(1/\sqrt{n})$ term are proved as a function of the probability of error, although for most channels they do not match existing achievability bounds.
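To make the claimed scaling concrete, a schematic (not the paper's exact) form of such a second-order converse for the sum rate is
$$ R_1(n,\epsilon) + R_2(n,\epsilon) \;\le\; C_{\mathrm{sum}} + \frac{A(\epsilon)}{\sqrt{n}} + o\!\left(\frac{1}{\sqrt{n}}\right), $$
where $(R_1(n,\epsilon), R_2(n,\epsilon))$ is any rate pair achievable at blocklength $n$ with average error probability $\epsilon$, $C_{\mathrm{sum}}$ is the asymptotic sum capacity, and $A(\epsilon)$ is a placeholder for the error-dependent coefficient whose exact upper bounds are the subject of the last sentence above.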

Related content

We consider Broyden's method and some accelerated schemes for nonlinear equations having a strongly regular singularity of first order with a one-dimensional nullspace. Our two main results are as follows. First, we show that the use of a preceding Newton-like step ensures convergence for starting points in a starlike domain with density 1. This extends the domain of convergence of these methods significantly. Second, we establish that the matrix updates of Broyden's method converge q-linearly with the same asymptotic factor as the iterates. This contributes to the long-standing question of whether the Broyden matrices converge by showing that this is indeed the case for the setting at hand. Furthermore, we prove that the Broyden directions violate uniform linear independence, which implies that existing results for convergence of the Broyden matrices cannot be applied. Numerical experiments of high precision confirm the enlarged domain of convergence, the q-linear convergence of the matrix updates, and the lack of uniform linear independence. In addition, they suggest that these results can be extended to singularities of higher order and that Broyden's method can converge r-linearly without converging q-linearly. The underlying code is freely available.
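For orientation, below is a minimal NumPy sketch of the classical ("good") Broyden iteration; it implements only the basic method on an arbitrary, regular toy problem, not the accelerated schemes, the preceding Newton-like step, or the singular setting studied above.

```python
import numpy as np

def fd_jacobian(F, x, h=1e-6):
    """Forward-difference Jacobian used only to initialize the Broyden matrix."""
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h
    return J

def broyden(F, x0, tol=1e-12, max_iter=100):
    """Classical (good) Broyden method for F(x) = 0 with rank-one Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    B = fd_jacobian(F, x)                      # initial Jacobian approximation
    Fx = F(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -Fx)            # quasi-Newton step: B s = -F(x)
        x_new = x + s
        Fx_new = F(x_new)
        y = Fx_new - Fx
        B += np.outer(y - B @ s, s) / (s @ s)  # good Broyden rank-one update
        x, Fx = x_new, Fx_new
        if np.linalg.norm(Fx) < tol:
            break
    return x, B

# Toy usage: intersection of the unit circle with the line x0 = x1 (a regular root).
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
x_star, B_star = broyden(F, x0=[1.0, 0.5])
```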

We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the former two, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive the asymptotic optimality of exploration sampling.

An $n$-dimensional source with memory is observed via parallel channels by $K$ isolated encoders, who compress their observations and transmit them to the decoder over noiseless rate-constrained links while leveraging their memory of the past. At each time instant, the decoder receives $K$ new codewords from the observers, combines them with the past received codewords, and produces a minimum-distortion estimate of the latest block of $n$ source symbols. This scenario extends the classical one-shot CEO problem to multiple rounds of communication in which the encoders retain memory of the past. We extend the Berger-Tung inner and outer bounds to this scenario with inter-block memory, showing that the minimum asymptotically (as $n \to \infty$) achievable sum rate required to achieve a target distortion is bounded by minimal directed mutual information problems. For the Gauss-Markov source observed via $K$ parallel AWGN channels, we show that the inner bound is tight and solve the corresponding minimal directed mutual information problem, thereby establishing the minimum asymptotically achievable sum rate. Finally, we explicitly bound the rate loss due to the lack of communication among the observers; that bound is attained with equality in the case of identical observation channels. The general coding theorem is proved via a new nonasymptotic bound that uses stochastic likelihood coders and whose asymptotic analysis yields an extension of the Berger-Tung inner bound to the causal setting. The analysis of the Gaussian case is facilitated by reversing the channels of the observers.
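For reference, the directed mutual information appearing in these bounds is, in its standard (Massey) form,
$$ I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\!\left(X^i; Y_i \mid Y^{i-1}\right), $$
where $X^i = (X_1,\dots,X_i)$; roughly, the sum-rate bounds above are stated as minimizations of such quantities, normalized per source symbol, subject to the target distortion.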

In standard number-in-hand multi-party communication complexity, performance is measured as the total number of bits transmitted globally in the network. In this paper, we study a variation called local communication complexity, in which performance instead measures the maximum number of bits sent or received at any one player. We focus on a simple model where $n$ players, each with one input bit, execute a protocol by exchanging messages to compute a function on the $n$ input bits. We ask what can and cannot be solved with small local communication complexity in this setting. We begin by establishing a non-trivial lower bound on the local complexity for a specific function, proving that counting the number of $1$'s among the first $17$ input bits distributed among the participants requires a local complexity strictly greater than $1$. We further investigate whether harder counting problems of this type can yield stronger lower bounds, providing a largely negative answer by showing that constant local complexity is sufficient to count the number of $1$ bits over the entire input, and therefore to compute any symmetric function. In addition to counting, we show that both sorting and searching can be computed in constant local complexity. We then use the counting solution as a subroutine to demonstrate that constant local complexity is also sufficient to compute many standard modular arithmetic operations on two operands, including comparisons, addition, subtraction, multiplication, division, and exponentiation. Finally, we establish that the function $GCD(x,y)$, where $x$ and $y$ are in the range $[1,n]$, has local complexity $O(1)$. Our work highlights both new techniques for proving lower bounds on this metric and the power of even a small amount of local communication.
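To illustrate the metric itself (not the paper's constant-complexity protocols), the sketch below simulates a naive binary-tree counting protocol for $n$ players with one bit each and reports the maximum number of bits sent or received at any single player, which for this baseline grows logarithmically in $n$.

```python
import math

def tree_count_local_complexity(bits):
    """Naive tree-aggregation protocol for counting 1s among n players (one bit each).
    Returns (total count, max bits sent plus received at any single player).
    This is only a baseline illustrating the local-complexity metric; its per-player
    cost grows logarithmically with n, unlike the constant-cost protocols in the paper."""
    n = len(bits)
    counts = list(bits)                 # counts[i] = number of 1s in player i's current subtree
    sizes = [1] * n                     # current subtree size held by player i
    traffic = [0] * n                   # bits sent plus bits received, per player
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            j = i + step
            if j < n:
                msg_bits = math.ceil(math.log2(sizes[j] + 1))  # enough bits to encode counts[j]
                traffic[j] += msg_bits  # player j sends its subtree count ...
                traffic[i] += msg_bits  # ... and player i receives it
                counts[i] += counts[j]
                sizes[i] += sizes[j]
        step *= 2
    return counts[0], max(traffic)

total, local_cc = tree_count_local_complexity([1, 0, 1, 1, 0, 1, 0, 0, 1])
print(total, local_cc)   # 5 ones; local complexity of this baseline is O(log n) bits
```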

This paper studies a scalar Gaussian wiretap channel where, instead of an average input power constraint, we consider a peak amplitude constraint on the input. The goal is to obtain insights into the secrecy capacity and the structure of the secrecy-capacity-achieving distribution. Capitalizing on recent theoretical progress on the structure of the secrecy-capacity-achieving distribution, this paper develops a numerical procedure, based on the gradient ascent algorithm and a version of the Blahut-Arimoto algorithm, for computing the secrecy capacity and the secrecy-capacity-achieving input and output distributions.
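Concretely, the quantity being computed is the amplitude-constrained secrecy capacity (written here with generic noise variances, which are not specified in the abstract):
$$ C_s(\mathsf{A}) \;=\; \max_{P_X:\ |X| \le \mathsf{A}} \; I(X;Y) - I(X;Z), \qquad Y = X + N_1, \quad Z = X + N_2, $$
where $N_1 \sim \mathcal{N}(0,\sigma_1^2)$ and $N_2 \sim \mathcal{N}(0,\sigma_2^2)$ are independent of $X$; in the degraded regime $\sigma_1^2 < \sigma_2^2$ no auxiliary (prefix) variable is needed, and it is this maximization over amplitude-constrained input distributions that the gradient-ascent/Blahut-Arimoto procedure approximates numerically.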

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$. This contrasts with a rate of $\mathcal{O}(1/\log(t))$ for standard gradient descent and $\mathcal{O}(1/t)$ for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
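For intuition only, the sketch below runs a generic Nesterov-style momentum loop on the exponential loss for a linear classifier on synthetic separable data and tracks the normalized margin; it is not the paper's dual-derived method and makes no claim to its $\widetilde{\mathcal{O}}(1/t^2)$ rate (the data, step size, and momentum schedule are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical separable toy data: two Gaussian blobs with labels +/-1.
X = np.vstack([rng.normal(+2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def exp_loss_grad(w):
    """Gradient of the exponential loss (1/n) * sum_i exp(-y_i <w, x_i>)."""
    m = y * (X @ w)                                   # per-example (unnormalized) margins
    return -(X * (y * np.exp(-m))[:, None]).mean(axis=0)

# Generic Nesterov-style momentum loop (illustration only, not the paper's method).
w = np.zeros(2); w_prev = np.zeros(2); eta = 0.5
for t in range(1, 2001):
    v = w + (t - 1) / (t + 2) * (w - w_prev)          # momentum / look-ahead point
    w_prev = w
    w = v - eta * exp_loss_grad(v)

margin = np.min(y * (X @ w)) / np.linalg.norm(w)      # normalized classification margin
print(f"normalized margin after {t} steps: {margin:.4f}")
```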

Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study which normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate existing methods from other domains on GNNs and find that InstanceNorm achieves faster convergence than BatchNorm and LayerNorm. We explain this by showing that InstanceNorm serves as a preconditioner for GNNs, whereas the preconditioning effect of BatchNorm is weaker due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm degrades the expressiveness of GNNs on highly regular graphs. We address this issue by proposing GraphNorm, which uses a learnable shift. Empirically, GNNs with GraphNorm converge faster than GNNs using other normalization methods. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
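As a sketch of the idea (following the description above, with alpha, gamma, and beta standing in for the per-feature parameters that would be learned in a real implementation), GraphNorm normalizes the node features of each graph around a learnably shifted mean:

```python
import numpy as np

def graph_norm(h, alpha, gamma, beta, eps=1e-5):
    """Forward pass of a GraphNorm-style normalization for a single graph.
    h: (num_nodes, num_features) node features; alpha, gamma, beta: (num_features,)
    per-feature parameters (alpha is the learnable shift)."""
    mu = h.mean(axis=0)                                  # per-feature mean over the graph's nodes
    shifted = h - alpha * mu                             # subtract a learnable fraction of the mean
    sigma = np.sqrt((shifted ** 2).mean(axis=0) + eps)   # std computed around the shifted mean
    return gamma * shifted / sigma + beta

# Toy usage on a hypothetical 5-node graph with 8-dimensional features.
h = np.random.default_rng(0).normal(size=(5, 8))
d = h.shape[1]
out = graph_norm(h, alpha=np.ones(d), gamma=np.ones(d), beta=np.zeros(d))
```

With alpha fixed to 1 this reduces to the InstanceNorm-style per-graph normalization; making the shift learnable is what addresses the expressiveness degradation on highly regular graphs discussed above.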

We present a new clustering method in the form of a single clustering equation that directly discovers groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method requires no hyperparameters, distance thresholds, or pre-specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging from 1077 to 8.1 million samples, shows substantial performance gains compared to existing clustering techniques.
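A sketch of the first-neighbor linking step described above is given below (the full method applies such merges recursively to produce a hierarchy of partitions, which is omitted here); it assumes numerical feature vectors and Euclidean nearest neighbors.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def first_neighbor_partition(X):
    """One round of first-neighbor clustering: link i and j when j is i's first
    neighbor, i is j's first neighbor, or i and j share the same first neighbor;
    clusters are the connected components of the resulting graph."""
    n = X.shape[0]
    nn = cKDTree(X).query(X, k=2)[1][:, 1]          # index of each sample's first neighbor
    rows = np.arange(n)
    # Adjacency from i -> nn[i]; the shared-first-neighbor condition is captured
    # transitively, since i and j are then both connected to nn[i] == nn[j].
    A = coo_matrix((np.ones(n), (rows, nn)), shape=(n, n))
    n_clusters, labels = connected_components(A, directed=False)
    return n_clusters, labels

# Toy usage on two well-separated Gaussian blobs (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
print(first_neighbor_partition(X))                  # expected: 2 clusters
```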

For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space facilitates the optimization procedure for network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.
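The observation behind restricting weights to the unit sphere is the positive homogeneity of the ReLU: for any $w \neq 0$,
$$ \sigma(w^\top x + b) \;=\; \|w\|_2 \, \sigma\!\left( \frac{w^\top x}{\|w\|_2} + \frac{b}{\|w\|_2} \right), \qquad \sigma(z) = \max(z, 0), $$
so every neuron is a positive multiple of a unit-norm neuron, and that multiple can be absorbed into the next layer's weights. On a bounded input domain, thresholds outside a bounded interval leave a unit-norm neuron either identically zero or purely affine, which is the intuition for the bounded threshold range (the paper's precise equivalence statement may be formulated differently).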

We consider the task of learning the parameters of a {\em single} component of a mixture model for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with lower computational and sample complexity than solving the overall original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and lower computational complexity than existing moment-based mixture model algorithms (e.g., tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
