
During multiple testing, researchers often adjust their alpha level to control the familywise error rate for a statistical inference about a joint union alternative hypothesis (e.g., "H1 or H2"). In some cases, however, they do not make this inference. Instead, they make separate inferences about each of the individual hypotheses that comprise the joint hypothesis (e.g., H1 and H2). For example, a researcher might use a Bonferroni correction to adjust their alpha level from the conventional level of 0.050 to 0.025 when testing H1 and H2, find a significant result for H1 (p < 0.025) but not for H2 (p > 0.025), and so claim support for H1 and not for H2. However, these separate individual inferences do not require an alpha adjustment. Only a statistical inference about the union alternative hypothesis "H1 or H2" requires an alpha adjustment, because it is based on "at least one" significant result among the two tests and so depends on the familywise error rate. When a researcher corrects their alpha level during multiple testing but does not make an inference about the union alternative hypothesis, their correction is redundant. In the present article, I discuss this redundant correction problem, including its reduction in statistical power for tests of individual hypotheses and its potential causes vis-à-vis error rate confusions and the alpha adjustment ritual. I also provide three illustrations of redundant corrections from recent psychology studies. I conclude that redundant corrections represent a symptom of statisticism, and I call for a more nuanced inference-based approach to multiple testing corrections.
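The distinction between per-test and familywise error rates in the two-test example can be made concrete with a few lines of arithmetic. This is a minimal sketch, assuming two independent tests whose null hypotheses are both true; the closed form 1 - (1 - alpha)^k relies on that independence assumption (the Bonferroni bound itself does not). The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """P(at least one false rejection) among k independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level alpha / k, which bounds the familywise error rate by alpha."""
    return alpha / k


if __name__ == "__main__":
    k = 2  # two tests, H1 and H2, with both null hypotheses true
    # Each individual inference ("H1" alone, "H2" alone) errs at its per-test alpha.
    print("per-test error rate at alpha = 0.05:", 0.05)
    # The union inference "H1 or H2" errs if at least one test falsely rejects.
    print("union error rate, uncorrected:", familywise_error_rate(0.05, k))
    print("union error rate, Bonferroni:", familywise_error_rate(bonferroni_alpha(0.05, k), k))
```

Under these assumptions the uncorrected union inference errs with probability 1 - 0.95^2 = 0.0975, and the Bonferroni-corrected one with 1 - 0.975^2 ≈ 0.0494. Each individual inference already errs at only 0.05 without any correction, so lowering its level to 0.025 buys nothing for that inference and only costs power.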

Related Content

Inference


MoDELS · Block · Monte Carlo · Statistic · Frequentism

March 6, 2024

Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels

Vishesh Karwa, Debdeep Pati, Sonja Petrović, Liam Solus, Nikita Alexeev, Mateja Raič, Dane Wilburne, Robert Williams, Bowei Yan

From arXiv; substantial revision from v3, updated simulations and theoretical discussions

We construct Bayesian and frequentist finite-sample goodness-of-fit tests for three different variants of the stochastic blockmodel for network data. Since all of the stochastic blockmodel variants are log-linear in form when block assignments are known, the tests for the latent block model versions combine a block membership estimator with the algebraic statistics machinery for testing goodness-of-fit in log-linear models. We describe Markov bases and marginal polytopes of the variants of the stochastic blockmodel, and discuss how both facilitate the development of goodness-of-fit tests and understanding of model behavior. The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data, and as such is the first application of the algebraic statistics machinery for latent-variable models.
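The abstract does not spell out how a Monte Carlo goodness-of-fit p-value is formed. A generic sketch, assuming only that null samples of the test statistic can be drawn: `simulate_stat` is a hypothetical stand-in for drawing a statistic from one simulated null data set. In the paper, Markov bases play the sampling role for log-linear models; this sketch does not implement them, and the coin-flip statistic is purely illustrative.

```python
import random
from typing import Callable


def monte_carlo_p_value(
    observed_stat: float,
    simulate_stat: Callable[[random.Random], float],
    n_sims: int = 999,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(T >= observed_stat) under the null.

    The (1 + exceedances) / (1 + n_sims) form counts the observed data as one
    draw, so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    exceedances = sum(
        1 for _ in range(n_sims) if simulate_stat(rng) >= observed_stat
    )
    return (1 + exceedances) / (1 + n_sims)


def heads_in_20_fair_flips(rng: random.Random) -> float:
    """Illustrative null statistic: number of heads in 20 fair coin flips."""
    return float(sum(rng.random() < 0.5 for _ in range(20)))


if __name__ == "__main__":
    # 18 heads is extreme under a fair coin, so the p-value should be small.
    print(monte_carlo_p_value(18.0, heads_in_20_fair_flips))
```

Larger values of the statistic are treated as worse fit here; a goodness-of-fit test built on a different statistic would need to fix that direction explicitly.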

Cognition · MoDELS · Linear · Sigmoid (an activation function) · Piecewise

March 6, 2024

A comparison of mixed-models for the analysis of non-linear longitudinal data: application to late-life cognitive trajectories

Maude Wagner, Donald R. Hedeker, Tianhao Wang, Graciela Muniz-Terrera, Ana W. Capuano

From arXiv, 34 pages, 7 figures, 1 table

Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and to assess the putative influence of covariates on these trajectories. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used to answer research questions related to non-linear changes in cognitive aging: the linear mixed model (LMM) with a quadratic term, the LMM with splines, the functional mixed model, the piecewise linear mixed model, and the sigmoidal mixed model. We first describe the models theoretically. Next, using data from two prospective cohorts with annual cognitive testing, we compared the interpretation of the models by investigating associations of education with cognitive change before death. Lastly, we performed a simulation study to empirically evaluate the models and provide practical recommendations. Except for the LMM-quadratic, all models generally fit well enough to capture the non-linearity of cognitive change, and the models were relatively robust. Although spline-based models have no interpretable non-linearity parameters, they converged more easily and allow graphical interpretation. In contrast, piecewise and sigmoidal models, whose non-linear parameters are interpretable, may require more data to achieve convergence.
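To show what an "interpretable non-linear parameter" looks like, here is a sketch of the fixed-effect mean of a piecewise linear ("broken-stick") trajectory with one knot. This is an assumption about the general model form, not the authors' specification: a mixed model would add subject-level random effects on top of this mean, and the parameter names and values below are hypothetical.

```python
def broken_stick_mean(
    t: float,
    intercept: float,
    slope_before: float,
    slope_change: float,
    change_point: float,
) -> float:
    """Fixed-effect mean of a piecewise linear trajectory with one knot.

    The slope is slope_before for t < change_point and
    slope_before + slope_change afterwards; the curve is continuous at the knot.
    """
    return intercept + slope_before * t + slope_change * max(0.0, t - change_point)


if __name__ == "__main__":
    # Hypothetical values: time in years relative to death (t = 0 at death),
    # with a mild decline that accelerates 3 years before death.
    for t in [-8.0, -5.0, -3.0, -1.0, 0.0]:
        print(t, broken_stick_mean(t, intercept=0.0, slope_before=-0.05,
                                   slope_change=-0.4, change_point=-3.0))
```

Each parameter has a direct reading (pre-knot slope, change in slope, timing of the change), which is the interpretability the abstract contrasts with spline coefficients.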

Model evaluation · Generalization theory · Networking · Regularization term · Weight decay

March 4, 2024

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

From arXiv, 9+20 pages, 7+25 figures, 2 tables

Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know whether the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically and are easily distinguishable from the memorizing ones. Namely, we consider multi-layer perceptron (MLP) and Transformer architectures trained on modular arithmetic tasks, where $\xi \cdot 100\%$ of labels are corrupted (i.e., some results of the modular operations in the training set are incorrect). We show that (i) it is possible for the network to memorize the corrupted labels and achieve 100% generalization at the same time; (ii) the memorizing neurons can be identified and pruned, lowering the accuracy on corrupted data and improving the accuracy on uncorrupted data; (iii) regularization methods such as weight decay, dropout and BatchNorm force the network to ignore the corrupted data during optimization and achieve 100% accuracy on the uncorrupted dataset; and (iv) the effect of these regularization methods is ("mechanistically") interpretable: weight decay and dropout force all the neurons to learn generalizing representations, while BatchNorm de-amplifies the output of memorizing neurons and amplifies the output of the generalizing ones. Finally, we show that in the presence of regularization, the training dynamics involve two consecutive stages: first, the network undergoes grokking dynamics, reaching high train and test accuracy; second, it unlearns the memorizing representations, and the train accuracy suddenly drops from 100% to $100(1-\xi)\%$.
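Why the train accuracy settles at $100(1-\xi)\%$ can be checked directly: a model that has learned the true rule is wrong on exactly the corrupted labels. A sketch of such a dataset for modular addition follows. It assumes, for simplicity, that the whole p × p table is the training set and that a corrupted label is replaced by a uniformly random wrong residue; the abstract does not specify the corruption scheme, and the function names are hypothetical.

```python
import random


def corrupted_modular_addition(p: int, xi: float, seed: int = 0):
    """Return (inputs, label, is_corrupted) triples for all pairs (a, b) mod p.

    A fraction xi of the labels (a + b) mod p is replaced by a wrong residue.
    """
    rng = random.Random(seed)
    clean = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    n_corrupt = round(xi * len(clean))
    corrupt_idx = set(rng.sample(range(len(clean)), n_corrupt))
    data = []
    for i, (x, y) in enumerate(clean):
        if i in corrupt_idx:
            # Shift by a nonzero offset so the new label is guaranteed wrong.
            y = (y + rng.randrange(1, p)) % p
        data.append((x, y, i in corrupt_idx))
    return data


def accuracy_of_true_rule(data, p: int) -> float:
    """Training accuracy of a model that computes (a + b) mod p exactly."""
    correct = sum(1 for (a, b), y, _ in data if y == (a + b) % p)
    return correct / len(data)


if __name__ == "__main__":
    p, xi = 7, 0.2
    data = corrupted_modular_addition(p, xi)
    # A perfectly generalizing model scores about 100 * (1 - xi)% here
    # (exactly 1 - round(xi * p**2) / p**2).
    print(accuracy_of_true_rule(data, p))
```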

Estimation/Estimator · Maximum likelihood · Maximum likelihood estimation · Clusters · Separated

March 4, 2024

Nonparametric consistency for maximum likelihood estimation and clustering based on mixtures of elliptically-symmetric distributions

Pietro Coretto, Christian Hennig

The consistency of the maximum likelihood estimator for mixtures of elliptically-symmetric distributions for estimating its population version is shown, where the underlying distribution $P$ is nonparametric and does not necessarily belong to the class of mixtures on which the estimator is based. When $P$ is a mixture of sufficiently well-separated but nonparametric distributions, it is shown that the components of the population version of the estimator correspond to the well-separated components of $P$. This provides some theoretical justification for using such estimators for cluster analysis when $P$ has well-separated subpopulations, even if these subpopulations differ from what the mixture model assumes.

MoDELS · Processing (programming language) · Performer · Estimation/Estimator · Knowledge

March 4, 2024

On decision-theoretic model assessment for structural deterioration monitoring

Nicholas E. Silionis, Konstantinos N. Anyfantis

From arXiv, 25 pages, 14 figures, 1 table

As data from monitored structures become increasingly available, the demand grows for these data to be used efficiently to add value to structural operation and management. One way to achieve this is to use structural response measurements to assess the usefulness of models employed to describe deterioration processes acting on a structure, as well as the mechanical behavior of the structure itself. This work pursues that aim by first framing Structural Health Monitoring as a Bayesian model updating problem, in which the quantities of inferential interest characterize the deterioration process and/or structural state. Then, using the posterior estimates of these quantities, a decision-theoretic definition is proposed to assess the structural and/or deterioration models based on (a) their ability to explain the data and (b) their performance on downstream decision-support tasks. The proposed framework is demonstrated on strain response data obtained from a test specimen that was subjected to three-point bending while simultaneously exposed to accelerated corrosion leading to thickness loss. Results indicate that the level of a priori domain knowledge about the form of deterioration is critical.

Weight · Functional · Kernelization · Reducible · Processing (programming language)

March 2, 2024

Kpop: A kernel balancing approach for reducing specification assumptions in survey weighting

Erin Hartman, Chad Hazlett, Ciara Sterbenz

With the precipitous decline in response rates, researchers and pollsters have been left with highly non-representative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose which variables $X$ must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly used calibration weights, which make the weighted mean of $X$ in the sample equal to that of the population, only ensure correct adjustment when the portions of the outcome and the response process left unexplained by linear functions of $X$ are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix $\mathbf{X}$ with a kernel matrix $\mathbf{K}$ encoding high-order information about $\mathbf{X}$. Weights are then found to make the weighted average row of $\mathbf{K}$ among sampled units approximately equal to that of the target population. This produces good calibration on a wide range of smooth functions of $X$, without relying on the user to decide which $X$ or what functions of them to include. We describe the method and illustrate it by application to polling data from the 2016 U.S. presidential election.
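The "commonly used calibration weights" mentioned above can be sketched as a linear calibration: find the weights closest to equal weighting (in squared distance) that sum to one and reproduce the population means of $X$ exactly. This is a generic illustration, not kpop: per the abstract, kpop balances on a kernel matrix $\mathbf{K}$ instead of $\mathbf{X}$ and only approximately, and none of its details are reproduced here. The function name and data are hypothetical.

```python
import numpy as np


def linear_calibration_weights(X_sample: np.ndarray, target_means: np.ndarray) -> np.ndarray:
    """Weights closest to uniform (in squared L2 distance) that sum to 1 and
    make the weighted sample mean of each column of X_sample equal target_means.

    Requires a full-rank [1, X] matrix with more rows than columns.
    Weights can be negative; practical calibration methods often constrain them.
    """
    n = X_sample.shape[0]
    A = np.column_stack([np.ones(n), X_sample])  # constraints: total weight, means
    b = np.concatenate([[1.0], np.asarray(target_means, dtype=float)])
    w0 = np.full(n, 1.0 / n)                     # start from equal weights
    # Lagrangian solution: w = w0 + A @ lam with A.T @ w == b exactly.
    lam = np.linalg.solve(A.T @ A, b - A.T @ w0)
    return w0 + A @ lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sample whose covariate means are shifted from the population's.
    X_sample = rng.normal(loc=0.5, size=(200, 2))
    population_means = np.array([0.0, 0.0])
    w = linear_calibration_weights(X_sample, population_means)
    print(w.sum(), w @ X_sample)  # 1.0 and [0, 0] up to floating-point rounding
```

Because only the means of the supplied columns are matched, any part of the outcome that depends non-linearly on $X$ is left unbalanced, which is the functional form dependency kpop is designed to reduce.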

SGD · Linear · Networking · Generalized linear model · Linear model

March 1, 2024

Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD

Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n=O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of stochasticity may be minimal in this scenario.

Directed · Mean · Extensibility · Identifiable · Linear

March 1, 2024

Changepoint problem with angular data using a measure of variation based on the intrinsic geometry of torus

Surojit Biswas, Buddhananda Banerjee, Arnab Kumar Laha

In many temporally ordered data sets, it is observed that the parameters of the underlying distribution change abruptly at unknown times. The detection of such changepoints is important for many applications. While this problem has been studied substantially in the linear data setup, not much work has been done for angular data. In this article, we utilize the intrinsic geometry of a torus to introduce the notion of the 'square of an angle' and use it to propose a new measure of variation, called the 'curved variance', of an angular random variable. Using the above ideas, we propose new tests for the existence of changepoint(s) in the concentration, the mean direction, or both. The limiting distributions of the test statistics are derived and their powers are obtained using extensive simulation. The tests are seen to have better power than the corresponding existing tests. The proposed methods have been implemented on three real-life data sets, revealing interesting insights. In particular, our method, when used to detect simultaneous changes in mean direction and concentration for hourly wind direction measurements of the cyclonic storm 'Amphan', identified changepoints that could be associated with important meteorological events.
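Why linear methods fail on angular data is easiest to see with the mean direction. The sketch below uses the standard circular mean direction and mean resultant length (a common concentration measure). It is not the authors' 'curved variance', which the abstract does not define precisely, and the function names are hypothetical.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of a sample of angles, in degrees, reduced modulo 360."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def mean_resultant_length(angles_deg):
    """R in [0, 1]: near 1 for tightly concentrated angles, near 0 for spread-out ones."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(s, c) / len(angles_deg)


if __name__ == "__main__":
    winds = [350.0, 10.0]
    # The arithmetic mean (180 degrees) points opposite to both readings; the
    # circular mean is north, printed as a value very close to 0 or 360.
    print(sum(winds) / len(winds), circular_mean_deg(winds))
```

A changepoint in mean direction or concentration shows up as a shift in these summaries between segments; the paper's tests build on its own curved variance rather than on R.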

Generative methods · MoDELS · Paper · binary · Receiver operating characteristic

February 29, 2024

Resolving power: A general approach to compare the distinguishing ability of threshold-free evaluation metrics

Colin S. Beam

From arXiv, 20 pages, 9 figures, 2 tables

Selecting an evaluation metric is fundamental to model development, but uncertainty remains about when certain metrics are preferable and why. This paper introduces the concept of resolving power to describe the ability of an evaluation metric to distinguish between binary classifiers of similar quality. This ability depends on two attributes: 1. The metric's response to improvements in classifier quality (its signal), and 2. The metric's sampling variability (its noise). The paper defines resolving power generically as a metric's sampling uncertainty scaled by its signal. The primary application of resolving power is to assess threshold-free evaluation metrics, such as the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). A simulation study compares the AUROC and the AUPRC in a variety of contexts. It finds that the AUROC generally has greater resolving power, but that the AUPRC is better when searching among high-quality classifiers applied to low prevalence outcomes. The paper concludes by proposing an empirical method to estimate resolving power that can be applied to any dataset and any initial classification model.
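The two ingredients of resolving power, a metric's value and its sampling variability, can be illustrated for the AUROC. The sketch uses the standard pairwise interpretation (the probability that a random positive outscores a random negative, ties counting one half) and a plain bootstrap standard error as one possible noise estimate. It does not implement the paper's resolving-power definition or its estimation method, and the function names are hypothetical.

```python
import random
import statistics


def auroc(scores, labels):
    """P(random positive outscores random negative); ties count 1/2. O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_se(scores, labels, n_boot=200, seed=0):
    """Bootstrap standard error of the AUROC (one estimate of its 'noise')."""
    if not (0 < sum(labels) < len(labels)):
        raise ValueError("need both classes")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auroc([scores[i] for i in idx], ys))
    return statistics.stdev(stats)
```

The pairwise loop is quadratic and chosen for clarity; a rank-based computation gives the same value faster on large data sets.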

Neural Networks · Statistic · Networking · Generalization error · Learning

February 29, 2024

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

From arXiv, 18 pages + supplementary material, 3 figures (v2: updated to match the published version; v3: clarified the caption of Fig. 3)

Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layer neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that performs optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases and unveiling a large computational gap.
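For readers unfamiliar with the model, the committee machine is a two-layer network whose output is a majority vote of its hidden units. The sketch below assumes the classic sign-activation version (the paper's setting may allow other activations) and shows the teacher-student setup: a hidden teacher labels random inputs, and a student must recover it from those labels. It does not implement AMP, and the function name is hypothetical.

```python
import numpy as np


def committee_machine(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Labels sign(sum_k sign(w_k . x)) for each row x of X.

    W has shape (K, d): one weight vector per hidden unit. With K odd and
    continuous inputs, ties (a zero vote total) occur with probability zero.
    """
    hidden_votes = np.sign(X @ W.T)          # (n, K), entries in {-1, 0, +1}
    return np.sign(hidden_votes.sum(axis=1))  # majority vote of the hidden units


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, K = 1000, 50, 3
    X = rng.standard_normal((n, d))
    W_teacher = rng.standard_normal((K, d))  # hidden teacher weights
    y = committee_machine(X, W_teacher)      # labels the student learns from
    W_student = rng.standard_normal((K, d))  # an untrained (random) student
    print("agreement of random student:", np.mean(committee_machine(X, W_student) == y))
```

The generalization error studied in the abstract is the probability that a trained student disagrees with the teacher on a fresh input; the random student here only provides a baseline.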