亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The relationship between the number of training data points, the number of parameters in a statistical model, and the generalization capabilities of the model has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime, and believe that the standard bias-variance trade-off holds in the under-parameterized regime. In this paper, we present a simple example that provably exhibits double descent in the under-parameterized regime. For simplicity, we look at the ridge regularized least squares denoising problem with data on a line embedded in high-dimension space. By deriving an asymptotically accurate formula for the generalization error, we observe sample-wise and parameter-wise double descent with the peak in the under-parameterized regime rather than at the interpolation point or in the over-parameterized regime. Further, the peak of the sample-wise double descent curve corresponds to a peak in the curve for the norm of the estimator, and adjusting $\mu$, the strength of the ridge regularization, shifts the location of the peak. We observe that parameter-wise double descent occurs for this model for small $\mu$. For larger values of $\mu$, we observe that the curve for the norm of the estimator has a peak but that this no longer translates to a peak in the generalization error. Moreover, we study the training error for this problem. The considered problem setup allows for studying the interaction between two regularizers. We provide empirical evidence that the model implicitly favors using the ridge regularizer over the input data noise regularizer. Thus, we show that even though both regularizers regularize the same quantity, i.e., the norm of the estimator, they are not equivalent.

相關內容

Standard multiparameter eigenvalue problems (MEPs) are systems of $k\ge 2$ linear $k$-parameter square matrix pencils. Recently, a new form of multiparameter eigenvalue problems has emerged: a rectangular MEP (RMEP) with only one multivariate rectangular matrix pencil, where we are looking for combinations of the parameters for which the rank of the pencil is not full. Applications include finding the optimal least squares autoregressive moving average (ARMA) model and the optimal least squares realization of autonomous linear time-invariant (LTI) dynamical system. For linear and polynomial RMEPs, we give the number of solutions and show how these problems can be solved numerically by a transformation into a standard MEP. For the transformation we provide new linearizations for quadratic multivariate matrix polynomials with a specific structure of monomials and consider mixed systems of rectangular and square multivariate matrix polynomials. This numerical approach seems computationally considerably more attractive than the block Macaulay method, the only other currently available numerical method for polynomial RMEPs.

We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents each samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. When the knowledge of correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlation is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.

This work establishes provably faster convergence rates for gradient descent via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.

Mart{\'\i}nez-Pe{\~n}as and Kschischang (IEEE Trans.\ Inf.\ Theory, 2019) proposed lifted linearized Reed--Solomon codes as suitable codes for error control in multishot network coding. We show how to construct and decode \ac{LILRS} codes. Compared to the construction by Mart{\'\i}nez-Pe{\~n}as--Kschischang, interleaving allows to increase the decoding region significantly and decreases the overhead due to the lifting (i.e., increases the code rate), at the cost of an increased packet size. We propose two decoding schemes for \ac{LILRS} that are both capable of correcting insertions and deletions beyond half the minimum distance of the code by either allowing a list or a small decoding failure probability. We propose a probabilistic unique {\LOlike} decoder for \ac{LILRS} codes and an efficient interpolation-based decoding scheme that can be either used as a list decoder (with exponential worst-case list size) or as a probabilistic unique decoder. We derive upper bounds on the decoding failure probability of the probabilistic-unique decoders which show that the decoding failure probability is very small for most channel realizations up to the maximal decoding radius. The tightness of the bounds is verified by Monte Carlo simulations.

This paper introduces a simplified variation of the PaDiM (Pixel-Wise Anomaly Detection through Instance Modeling) method for anomaly detection in images, fitting a single multivariate Gaussian (MVG) distribution to the feature vectors extracted from a backbone convolutional neural network (CNN) and using their Mahalanobis distance as the anomaly score. We introduce an intermediate step in this framework by applying a whitening transformation to the feature vectors, which enables the generation of heatmaps capable of visually explaining the features learned by the MVG. The proposed technique is evaluated on the MVTec-AD dataset, and the results show the importance of visual model validation, providing insights into issues in this framework that were otherwise invisible. The visualizations generated for this paper are publicly available at //doi.org/10.5281/zenodo.7937978.

The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing to incorporate expert judgement in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.

In this manuscript, we propose to utilize the generative neural network-based variational autoencoder for channel estimation. The variational autoencoder models the underlying true but unknown channel distribution as a conditional Gaussian distribution in a novel way. The derived channel estimator exploits the internal structure of the variational autoencoder to parameterize an approximation of the mean squared error optimal estimator resulting from the conditional Gaussian channel models. We provide a rigorous analysis under which conditions a variational autoencoder-based estimator is mean squared error optimal. We then present considerations that make the variational autoencoder-based estimator practical and propose three different estimator variants that differ in their access to channel knowledge during the training and evaluation phase. In particular, the proposed estimator variant trained solely on noisy pilot observations is particularly noteworthy as it does not require access to noise-free, ground-truth channel data during training or evaluation. Extensive numerical simulations first analyze the internal behavior of the variational autoencoder-based estimators and then demonstrate excellent channel estimation performance compared to related classical and machine learning-based state-of-the-art channel estimators.

A Two-Stage approach is described that literally "straighten outs" any potentially nonlinear relationship between a y-outcome variable and each of p = 2 or more potential x-predictor variables. The y-outcome is then predicted from all p of these "linearized" spline-predictors using the form of Generalized Ridge Regression that is most likely to yield minimal MSE risk under Normal distribution-theory. These estimates are then compared and contrasted with those from the Generalized Additive Model that uses the same x-variables.

We consider additive Schwarz methods for boundary value problems involving the $p$-Laplacian. Although the existing theoretical estimates indicate a sublinear convergence rate for these methods, empirical evidence from numerical experiments demonstrates a linear convergence rate. In this paper, we narrow the gap between these theoretical and empirical results by presenting a novel convergence analysis. Firstly, we present an abstract convergence theory of additive Schwarz methods written in terms of a quasi-norm. This quasi-norm exhibits behavior similar to the Bregman distance of the convex energy functional associated to the problem. Secondly, we provide a quasi-norm version of the Poincar'{e}--Friedrichs inequality, which is essential for deriving a quasi-norm stable decomposition for a two-level domain decomposition setting. By utilizing these two key elements, we establish a new bound for the linear convergence rate of the methods.

We consider error-correction coding schemes for adversarial wiretap channels (AWTCs) in which the channel can a) read a fraction of the codeword bits up to a bound $r$ and b) flip a fraction of the bits up to a bound $p$. The channel can freely choose the locations of the bit reads and bit flips via a process with unbounded computational power. Codes for the AWTC are of broad interest in the area of information security, as they can provide data resiliency in settings where an attacker has limited access to a storage or transmission medium. We investigate a family of non-linear codes known as pseudolinear codes, which were first proposed by Guruswami and Indyk (FOCS 2001) for constructing list-decodable codes independent of the AWTC setting. Unlike general non-linear codes, pseudolinear codes admit efficient encoders and have succinct representations. We focus on unique decoding and show that random pseudolinear codes can achieve rates up to the binary symmetric channel (BSC) capacity $1-H_2(p)$ for any $p,r$ in the less noisy region: $p<1/2$ and $r<1-H_2(p)$ where $H_2(\cdot)$ is the binary entropy function. Thus, pseudolinear codes are the first known optimal-rate binary code family for the less noisy AWTC that admit efficient encoders. The above result can be viewed as a derandomization result of random general codes in the AWTC setting, which in turn opens new avenues for applying derandomization techniques to randomized constructions of AWTC codes. Our proof applies a novel concentration inequality for sums of random variables with limited independence which may be of interest as an analysis tool more generally.

北京阿比特科技有限公司