
In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models -- for example, testing whether observed data follows a logistic regression model. This testing problem involves a composite null hypothesis, due to the unknown values of the model parameters. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via conditioning on a sufficient statistic -- often, the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework of Barber and Janson (2022) offers an alternative, replacing sufficiency with an approximately sufficient statistic (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied, but it requires the unconstrained MLE to be well-defined and well-behaved, which implicitly restricts it to a low-dimensional regime. In this work, we extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can now be handled within the aCSS framework, including examples such as mixtures of Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where the MLE can perform well under regularization, such as an $\ell_1$ penalty or a shape constraint).
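
To make the resampling logic concrete, below is a deliberately simplified Python sketch in the spirit of aCSS for a Gaussian-mean null: perturb the MLE with Gaussian noise, draw synthetic copies from the fitted model at the noisy estimate, and compare a test statistic against the copies. This is a schematic caricature only (the actual aCSS procedure conditions the copies on the perturbed estimator, e.g. via its optimality condition); all function names here are illustrative.

```python
import numpy as np

def noisy_mle_copies_test(x, stat, sigma=0.5, n_copies=200, seed=None):
    """Schematic GoF test for H0: x_i ~ N(theta, 1), theta unknown.

    Perturbs the MLE (the sample mean) with Gaussian noise, draws copies
    from the model at the noisy estimate, and returns a Monte Carlo
    p-value. NOTE: a caricature of the aCSS idea, not the exact algorithm.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_tilde = x.mean() + sigma * rng.standard_normal() / np.sqrt(n)
    copies = [rng.normal(theta_tilde, 1.0, size=n) for _ in range(n_copies)]
    t_obs = stat(x)
    t_cop = np.array([stat(c) for c in copies])
    return (1 + np.sum(t_cop >= t_obs)) / (n_copies + 1)

# Example: test the Gaussian null against skewed alternatives.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=100)
skew = lambda v: abs(np.mean(((v - v.mean()) / v.std()) ** 3))
print(noisy_mle_copies_test(x, skew, seed=1))  # large p-value: null holds
```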

Related content

Maximum likelihood estimation (MLE), also called maximum likelihood or maximum probability estimation, is another general method for constructing estimators. The idea was first proposed in 1821 by the German mathematician C. F. Gauss, though the method is usually credited to the British statistician R. A. Fisher. It rests on the maximum likelihood principle, whose intuition is the following: a random experiment has several possible outcomes A, B, C, ...; if outcome A occurs in a single trial, we may infer that the experimental conditions favor A, i.e., that its probability P(A) is comparatively large. A classical example: box 1 contains 99 white balls and 1 black ball, while box 2 contains 1 white ball and 99 black balls. A box is chosen at random, a ball is drawn from it, and the ball turns out to be black. Since a black ball is far more likely to come from box 2 than from box 1, we naturally believe it was drawn from box 2. More generally, the probability of an event A depends on an unknown parameter $\theta$: different values of $\theta$ yield different probabilities $P(A \mid \theta)$. When A is observed in a trial, the maximum likelihood estimate of $\theta$ is the value, among all admissible values, that maximizes $P(A \mid \theta)$; that is, MLE selects the parameter value under which the observed sample is most likely to have arisen from the assumed population.
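
As a worked version of the two-box example (illustrative code, not from any paper): the likelihood of drawing a black ball is $1/100$ under box 1 and $99/100$ under box 2, so maximum likelihood selects box 2. The second half applies the same principle to estimate a Bernoulli probability by maximizing the log-likelihood over a grid.

```python
import numpy as np

# Two-box example: likelihood of observing a black ball under each box.
likelihood = {"box 1": 1 / 100, "box 2": 99 / 100}
print(max(likelihood, key=likelihood.get))  # -> "box 2"

# Same principle for a Bernoulli parameter: pick theta maximizing the
# log-likelihood of k successes in n trials.
k, n = 7, 10
thetas = np.linspace(0.01, 0.99, 99)
loglik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)
print(thetas[np.argmax(loglik)])  # ~0.7, i.e. the sample proportion k/n
```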

In this work, we propose two criteria for linear codes obtained from the Plotkin sum construction to be symplectic self-orthogonal (SO) and linear complementary dual (LCD). As specific constructions, several classes of symplectic SO codes with good parameters, including symplectic maximum distance separable codes, are derived via $\ell$-intersection pairs of linear codes and generalized Reed-Muller codes. Symplectic LCD codes are also constructed from general linear codes. Furthermore, we obtain some binary symplectic LCD codes, which are equivalent to quaternary trace Hermitian additive complementary dual codes and outperform the best-known quaternary Hermitian LCD codes reported in the literature. In addition, we prove that the symplectic SO and LCD codes obtained in these ways are asymptotically good.
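
As a reminder of the underlying check (illustrative code, not from the paper): a code with generator matrix $G$ over $\mathbb{F}_q^{2n}$ is symplectic self-orthogonal iff the symplectic products of all pairs of rows vanish, i.e. $a \cdot d - b \cdot c = 0$ for rows $(a \mid b)$ and $(c \mid d)$. A minimal numpy verification:

```python
import numpy as np

def symplectic_gram(G, q=2):
    """Gram matrix of the rows of G under the symplectic inner product
    <(a|b),(c|d)> = a.d - b.c on F_q^{2n}; the code is symplectic
    self-orthogonal iff this matrix is identically zero mod q."""
    n = G.shape[1] // 2
    a, b = G[:, :n], G[:, n:]
    return (a @ b.T - b @ a.T) % q

# Toy example over F_2 with n = 2: rows written as (a | b).
G = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])  # b-part zero => all symplectic products vanish
print(np.all(symplectic_gram(G) == 0))  # True: symplectic self-orthogonal
```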

In this paper, we derive a kinetic description of swarming particle dynamics in an interacting multi-agent system featuring emerging leaders and followers. Agents are classically characterized by their position and velocity, plus a continuous parameter quantifying their degree of leadership. The microscopic processes ruling the change of velocity and degree of leadership are independent, non-conservative, and non-local in the physical space, so as to account for long-range interactions. From the kinetic description, we then obtain a macroscopic model under a hydrodynamic limit reminiscent of that used to tackle the hydrodynamics of weakly dissipative granular gases, relying in particular on a regime of small non-conservative and short-range interactions. Numerical simulations in one- and two-dimensional domains show that the limiting macroscopic model is consistent with the original particle dynamics and, furthermore, can reproduce classical emerging patterns typically observed in swarms.
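
A minimal 1D particle sketch of the leader-follower idea, with all interaction kernels chosen ad hoc for illustration (the paper's microscopic processes are more general and non-conservative):

```python
import numpy as np

def step(x, v, w, dt=0.05):
    """One Euler step of a toy 1D leader-follower swarm.
    x: positions, v: velocities, w in [0,1]: degree of leadership.
    Both kernels below are ad hoc illustrations, not the paper's model."""
    d = np.abs(x[:, None] - x[None, :])
    K = np.exp(-d) * w[None, :]          # long-range kernel, leader-weighted
    K /= K.sum(axis=1, keepdims=True)
    v = v + dt * (K @ v - v)             # align velocity toward weighted mean
    w = w + dt * 0.1 * (K @ w - w)       # leadership drifts toward local mean
    return x + dt * v, v, np.clip(w, 0.0, 1.0)

rng = np.random.default_rng(0)
x, v, w = rng.normal(size=50), rng.normal(size=50), rng.uniform(size=50)
for _ in range(200):
    x, v, w = step(x, v, w)
print(v.std())  # velocities concentrate: alignment-driven flocking
```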

Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible, as measured by a broad class of norms. We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. Code is available at //github.com/EleutherAI/concept-erasure.
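
For reference, here is a from-scratch numpy sketch of the closed form as described: whiten the representation, project out the column space of the whitened cross-covariance with the concept, and unwhiten. The maintained implementation lives at the linked repository; this re-derivation may differ in details such as numerical regularization.

```python
import numpy as np

def fit_leace(X, Z, tol=1e-10):
    """Closed-form least-squares concept eraser (from-scratch sketch).
    X: (n, d) representations; Z: (n, k) concept labels.
    Returns a map x -> erased x that zeroes the cross-covariance with Z
    (so no linear probe can detect the concept) while moving X minimally."""
    mu_x = X.mean(0)
    Xc, Zc = X - mu_x, Z - Z.mean(0)
    sigma_xx = Xc.T @ Xc / len(X)
    sigma_xz = Xc.T @ Zc / len(X)
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol
    V, lam = evecs[:, keep], evals[keep]
    W = (V * lam ** -0.5) @ V.T          # whitening (sigma_xx^{1/2})^+
    W_pinv = (V * lam ** 0.5) @ V.T      # its pseudo-inverse
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol]
    P = U @ U.T                          # projection onto colspace(W sigma_xz)
    A = W_pinv @ P @ W
    return lambda x: x - (x - mu_x) @ A.T

rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(1000, 1)).astype(float)
X = rng.normal(size=(1000, 8))
X[:, 0] += 3.0 * Z[:, 0]                 # the concept leaks into X
Xe = fit_leace(X, Z)(X)
# After erasure, the cross-covariance with Z vanishes (up to float error):
print(np.abs((Xe - Xe.mean(0)).T @ (Z - Z.mean(0)) / len(X)).max())
```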

Linear regression models have been extensively considered in the literature. However, in some practical applications they may not be appropriate over the whole range of the covariate. In this paper, a more flexible model is introduced by considering a regression model $Y=r(X)+\varepsilon$ where the regression function $r(\cdot)$ is assumed to be linear for large values in the domain of the predictor variable $X$. More precisely, we assume that $r(x)=\alpha_0+\beta_0 x$ for $x> u_0$, where the value $u_0$ is identified as the smallest value satisfying this property. A penalized procedure is introduced to estimate the threshold $u_0$. The proposal is semiparametric, since no parametric model is assumed for the regression function for values smaller than $u_0$. Consistency properties of both the threshold estimator and the estimators of $(\alpha_0,\beta_0)$ are derived under mild assumptions. Through a numerical study, the small-sample properties of the proposed procedure and the importance of introducing a penalization are investigated. The analysis of a real data set demonstrates the usefulness of the penalized estimators.
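
One plausible shape for such a penalized criterion (the paper's exact penalty is not reproduced here): grid-search the threshold $u$, fit ordinary least squares on $\{x_i > u\}$, and trade goodness of fit against the fraction of discarded data, so that the smallest adequate threshold wins. A hypothetical sketch:

```python
import numpy as np

def fit_threshold(x, y, lam=0.1):
    """Grid-search a threshold u: fit y ~ a + b*x on {x > u}, penalizing
    large u (which would discard most of the data). Illustrative
    criterion only; the paper's penalty may differ."""
    best = None
    for u in np.quantile(x, np.linspace(0.05, 0.8, 60)):
        m = x > u
        X = np.column_stack([np.ones(m.sum()), x[m]])
        beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
        crit = np.mean((y[m] - X @ beta) ** 2) + lam * np.mean(~m)
        if best is None or crit < best[0]:
            best = (crit, u, beta)
    return best[1], best[2]

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = np.where(x > 4, 1 + 2 * x, np.sin(2 * x) + 9) + rng.normal(0, 0.3, 500)
u_hat, (a_hat, b_hat) = fit_threshold(x, y)
print(round(u_hat, 2), round(a_hat, 2), round(b_hat, 2))  # u~4, (a,b)~(1,2)
```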

This work presents a comparative study to numerically compute impulse approximate controls for parabolic equations with various boundary conditions. Theoretical controllability results have been recently investigated using a logarithmic convexity estimate at a single time based on a Carleman commutator approach. We propose a numerical algorithm for computing the impulse controls with minimal $L^2$-norms by adapting a penalized Hilbert Uniqueness Method (HUM) combined with a Conjugate Gradient (CG) method. We consider static boundary conditions (Dirichlet and Neumann) and dynamic boundary conditions. Some numerical experiments based on our developed algorithm are given to validate and compare the theoretical impulse controllability results.
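
Schematically, the penalized HUM step selects the impulse control of (approximately) minimal $L^2$-norm by minimizing a functional of the following form, whose optimality system is then solved by CG; the notation is illustrative, $\varepsilon > 0$ is the penalization parameter, and the minimal-norm control is recovered in the limit $\varepsilon \to 0$.

```latex
\min_{u \in L^{2}} \; J_{\varepsilon}(u)
  = \frac{1}{2}\,\lVert u \rVert_{L^{2}}^{2}
  + \frac{1}{2\varepsilon}\,\lVert y(T; u) \rVert_{L^{2}(\Omega)}^{2}
```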

Non-contrastive SSL methods like BYOL and SimSiam rely on asymmetric predictor networks to avoid representational collapse without negative samples. Yet, how predictor networks facilitate stable learning is not fully understood. While previous theoretical analyses assumed Euclidean losses, most practical implementations rely on cosine similarity. To gain further theoretical insight into non-contrastive SSL, we analytically study learning dynamics in conjunction with Euclidean and cosine similarity in the eigenspace of closed-form linear predictor networks. We show that both avoid collapse through implicit variance regularization albeit through different dynamical mechanisms. Moreover, we find that the eigenvalues act as effective learning rate multipliers and propose a family of isotropic loss functions (IsoLoss) that equalize convergence rates across eigenmodes. Empirically, IsoLoss speeds up the initial learning dynamics and increases robustness, thereby allowing us to dispense with the EMA target network typically used with non-contrastive methods. Our analysis sheds light on the variance regularization mechanisms of non-contrastive SSL and lays the theoretical grounds for crafting novel loss functions that shape the learning dynamics of the predictor's spectrum.
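
To fix notation, here is a generic numpy sketch of a non-contrastive objective with a linear predictor, cosine similarity, and a stop-gradient target branch. It illustrates the setup being analyzed, not the IsoLoss family itself, whose exact form is given in the paper.

```python
import numpy as np

def cosine_predictor_loss(z1, z2, W):
    """Generic non-contrastive loss: predict one view's embedding from
    the other through a linear predictor W, compared by cosine
    similarity; z2 acts as a constant target (stop-gradient branch)."""
    p = z1 @ W.T                                           # linear predictor
    p_n = p / np.linalg.norm(p, axis=1, keepdims=True)
    z_n = z2 / np.linalg.norm(z2, axis=1, keepdims=True)   # stop-grad target
    return -np.mean(np.sum(p_n * z_n, axis=1))             # negative cosine

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
z1 = z + 0.1 * rng.normal(size=z.shape)   # two augmented "views"
z2 = z + 0.1 * rng.normal(size=z.shape)
W = np.eye(16)                            # closed-form analyses consider a
print(cosine_predictor_loss(z1, z2, W))   # linear predictor like this one
```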

Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a Pólya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test and calibration samples at training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
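
Concretely, given exchangeable nonconformity scores, the $m$ transductive conformal $p$-values are computed from the $n$ calibration scores by the standard construction below (the paper's contribution concerns the joint law of these $p$-values, not this formula):

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """p_j = (1 + #{i: s_cal_i >= s_test_j}) / (n + 1): marginally valid
    p-values for each of the m test points under exchangeability."""
    cal = np.asarray(cal_scores)
    return np.array([(1 + np.sum(cal >= s)) / (len(cal) + 1)
                     for s in np.asarray(test_scores)])

rng = np.random.default_rng(0)
cal = rng.normal(size=100)            # nonconformity scores, calibration set
test = rng.normal(size=5)             # scores for the m = 5 test points
print(conformal_pvalues(cal, test))   # ~Uniform marginals, dependent jointly
```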

Multiple testing is an important research direction that has gained major attention in recent years. Currently, most multiple testing procedures are designed with p-values or local false discovery rate (Lfdr) statistics. However, p-values obtained by applying the probability integral transform to some well-known test statistics often do not incorporate information from the alternatives, resulting in suboptimal procedures. On the other hand, Lfdr-based procedures can be asymptotically optimal, but their guarantee on false discovery rate (FDR) control relies on consistent estimation of the Lfdr, which is often difficult in practice, especially when the incorporation of side information is desirable. In this article, we propose a novel and flexibly constructed class of statistics, called rho-values, which combines the merits of both p-values and Lfdr statistics while enjoying advantages over methods based on either one alone. Specifically, it unifies the two frameworks and operates in two steps: ranking and thresholding. The ranking produced by rho-values mimics that produced by Lfdr statistics, and the strategy for choosing the threshold is similar to that of p-value-based procedures. Therefore, the proposed framework guarantees FDR control under weak assumptions; it maintains the integrity of the structural information encoded by the summary statistics and the auxiliary covariates and hence can be asymptotically optimal. We demonstrate the efficacy of the new framework through extensive simulations and two data applications.
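
The two-step logic can be sketched generically: rank by a statistic whose ordering mimics the Lfdr, then reject the largest prefix whose running mean, an Lfdr-style FDR estimate, stays below the target level. This skeleton is illustrative; the precise rho-value threshold rule is defined in the paper.

```python
import numpy as np

def rank_and_threshold(stats, alpha=0.05):
    """Generic two-step multiple testing: sort statistics ascending
    (small = more likely non-null) and reject the largest prefix whose
    running mean -- an Lfdr-style FDR estimate -- stays below alpha.
    A skeleton of the ranking/thresholding logic, not rho-values exactly."""
    order = np.argsort(stats)
    running_mean = np.cumsum(stats[order]) / np.arange(1, len(stats) + 1)
    k = np.max(np.nonzero(running_mean <= alpha)[0], initial=-1) + 1
    rejected = np.zeros(len(stats), dtype=bool)
    rejected[order[:k]] = True
    return rejected

stats = np.array([0.01, 0.02, 0.9, 0.04, 0.95, 0.03])
print(rank_and_threshold(stats))  # rejects the four small-statistic tests
```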

The main goal of this work is to improve the efficiency of training binary neural networks, which are low-latency and low-energy networks. The main contribution is a pair of solutions, comprising topology changes and training strategies, that allow the network to reach near state-of-the-art performance while being trained efficiently. Training time and the memory required during training are the two factors that determine training efficiency.

The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
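
For concreteness, Shepard's universal law says that the probability of generalizing from one stimulus to another decays exponentially with their distance in psychological similarity space; a minimal rendering, with the sensitivity $c$ as a free parameter:

```python
import numpy as np

def shepard_generalization(x, y, c=1.0):
    """Shepard's universal law: the probability that a response to x
    generalizes to y decays exponentially with their distance in
    psychological similarity space (c is a free sensitivity parameter)."""
    return np.exp(-c * np.linalg.norm(np.asarray(x) - np.asarray(y)))

# Nearby points in similarity space generalize strongly; distant ones do not.
print(shepard_generalization([0.0, 0.0], [0.2, 0.1]))  # ~0.80
print(shepard_generalization([0.0, 0.0], [3.0, 4.0]))  # ~0.007
```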
