亚洲精品无码国产爽快A片百度,一级A婬片试看28分钟,露脸公妇仑乱在线观看日本,欧洲一卡2卡三卡4卡免费视频

Datasets with extreme observations and/or heavy-tailed error distributions are commonly encountered and should be analyzed with careful consideration of these features from a statistical perspective. Small deviations from an assumed model, such as the presence of outliers, can cause classical regression procedures to break down, potentially leading to unreliable inferences. Other distributional deviations, such as heteroscedasticity, can be handled by going beyond the mean and modelling the scale parameter in terms of covariates. We propose a method that accounts for heavy tails and heteroscedasticity through the use of a generalized normal distribution (GND). The GND contains a kurtosis-characterizing shape parameter that moves the model smoothly between the normal distribution and the heavier-tailed Laplace distribution - thus covering both classical and robust regression. A key component of statistical inference is determining the set of covariates that influence the response variable. While correctly accounting for kurtosis and heteroscedasticity is crucial to this endeavour, a procedure for variable selection is still required. For this purpose, we use a novel penalized estimation procedure that avoids the typical computationally demanding grid search for tuning parameters. This is particularly valuable in the distributional regression setting where the location and scale parameters depend on covariates, since the standard approach would have multiple tuning parameters (one for each distributional parameter). We achieve this by using a "smooth information criterion" that can be optimized directly, where the tuning parameters are fixed at $\log(n)$ in the BIC case.

相關內容

穩健性

關注 3

相似度 · 哈希學習 · 無監督 · 代碼 · 貝塔分布 ·

2023 年 2 月 15 日

Unsupervised Hashing via Similarity Distribution Calibration

Kam Woh Ng,Xiatian Zhu,Jiun Tian Hoe,Chee Seng Chan,Tianyu Zhang,Yi-Zhe Song,Tao Xiang

Existing unsupervised hashing methods typically adopt a feature similarity preservation paradigm. As a result, they overlook the intrinsic similarity capacity discrepancy between the continuous feature and discrete hash code spaces. Specifically, since the feature similarity distribution is intrinsically biased (e.g., moderately positive similarity scores on negative pairs), the hash code similarities of positive and negative pairs often become inseparable (i.e., the similarity collapse problem). To solve this problem, in this paper a novel Similarity Distribution Calibration (SDC) method is introduced. Instead of matching individual pairwise similarity scores, SDC aligns the hash code similarity distribution towards a calibration distribution (e.g., beta distribution) with sufficient spread across the entire similarity capacity/range, to alleviate the similarity collapse problem. Extensive experiments show that our SDC outperforms the state-of-the-art alternatives on both coarse category-level and instance-level image retrieval tasks, often by a large margin. Code is available at //github.com/kamwoh/sdc.

線性的 · 服務器 · 閾值 · MoDELS · CASES ·

2023 年 2 月 15 日

General Framework for Linear Secure Distributed Matrix Multiplication with Byzantine Servers

Okko Makkonen,Camilla Hollanti

from arxiv, 27 pages, 2 figures. Version 2 appeared at ITW 2022. Versions 3 and 4 correspond to an extended journal submission

In this paper, a general framework for linear secure distributed matrix multiplication (SDMM) is introduced. The model allows for a neat treatment of straggling and Byzantine servers via a star product interpretation as well as simplified security proofs. Known properties of star products also immediately yield a lower bound for the recovery threshold as well as an upper bound for the number of colluding workers the system can tolerate. Another bound on the recovery threshold is given by the decodability condition, which generalizes a bound for GASP codes. The framework produces many of the known SDMM schemes as special cases, thereby providing unification for the previous literature on the topic. Furthermore, error behavior specific to SDMM is discussed and interleaved codes are proposed as a suitable means for efficient error correction in the proposed model. Analysis of the error correction capability under natural assumptions about the error distribution is also provided, largely based on well-known results on interleaved codes. Error detection and other error distributions are also discussed.

知識 (knowledge) · MoDELS · 蒸餾 · 集成 · 集成學習 ·

2023 年 2 月 14 日

Multi-teacher knowledge distillation as an effective method for compressing ensembles of neural networks

Konrad Zuchniak

from arxiv, Doctoral dissertation in the field of computer science, machine learning. Application of knowledge distillation as aggregation of ensemble models. Along with several uses. 140 pages, 67 figures, 13 tables

Deep learning has contributed greatly to many successes in artificial intelligence in recent years. Today, it is possible to train models that have thousands of layers and hundreds of billions of parameters. Large-scale deep models have achieved great success, but the enormous computational complexity and gigantic storage requirements make it extremely difficult to implement them in real-time applications. On the other hand, the size of the dataset is still a real problem in many domains. Data are often missing, too expensive, or impossible to obtain for other reasons. Ensemble learning is partially a solution to the problem of small datasets and overfitting. However, ensemble learning in its basic version is associated with a linear increase in computational complexity. We analyzed the impact of the ensemble decision-fusion mechanism and checked various methods of sharing the decisions including voting algorithms. We used the modified knowledge distillation framework as a decision-fusion mechanism which allows in addition compressing of the entire ensemble model into a weight space of a single model. We showed that knowledge distillation can aggregate knowledge from multiple teachers in only one student model and, with the same computational complexity, obtain a better-performing model compared to a model trained in the standard manner. We have developed our own method for mimicking the responses of all teachers at the same time, simultaneously. We tested these solutions on several benchmark datasets. In the end, we presented a wide application use of the efficient multi-teacher knowledge distillation framework. In the first example, we used knowledge distillation to develop models that could automate corrosion detection on aircraft fuselage. The second example describes detection of smoke on observation cameras in order to counteract wildfires in forests.

Networking · Neural Networks · Performer · 估計/估計量 · Weight ·

2023 年 2 月 14 日

Benchmarking Bayesian neural networks and evaluation metrics for regression tasks

Brian Staber,Sébastien Da Veiga

Due to the growing adoption of deep neural networks in many fields of science and engineering, modeling and estimating their uncertainties has become of primary importance. Despite the growing literature about uncertainty quantification in deep learning, the quality of the uncertainty estimates remains an open question. In this work, we assess for the first time the performance of several approximation methods for Bayesian neural networks on regression tasks by evaluating the quality of the confidence regions with several coverage metrics. The selected algorithms are also compared in terms of predictivity, kernelized Stein discrepancy and maximum mean discrepancy with respect to a reference posterior in both weight and function space. Our findings show that (i) some algorithms have excellent predictive performance but tend to largely over or underestimate uncertainties (ii) it is possible to achieve good accuracy and a given target coverage with finely tuned hyperparameters and (iii) the promising kernel Stein discrepancy cannot be exclusively relied on to assess the posterior approximation. As a by-product of this benchmark, we also compute and visualize the similarity of all algorithms and corresponding hyperparameters: interestingly we identify a few clusters of algorithms with similar behavior in weight space, giving new insights on how they explore the posterior distribution.

MoDELS · 向量化 · 稀疏 · 可辨認的 · 狀態轉移矩陣 ·

2023 年 2 月 10 日

Bayesian Sparse Vector Autoregressive Switching Models with Application to Human Gesture Phase Segmentation

Beniamino Hadj-Amar,Jack Jewson,Marina Vannucci

from arxiv, 24 pages, 9 figures

We propose a sparse vector autoregressive (VAR) hidden semi-Markov model (HSMM) for modeling temporal and contemporaneous (e.g. spatial) dependencies in multivariate nonstationary time series. The HSMM's generic state distribution is embedded in a special transition matrix structure, facilitating efficient likelihood evaluations and arbitrary approximation accuracy. To promote sparsity of the VAR coefficients, we deploy an $l_1$-ball projection prior, which combines differentiability with a positive probability of obtaining exact zeros, achieving variable selection within each switching state. This also facilitates posterior estimation via Hamiltonian Monte Carlo (HMC). We further place non-local priors on the parameters of the HSMM dwell distribution improving the ability of Bayesian model selection to distinguish whether the data is better supported by the simpler hidden Markov model (HMM), or the more flexible HSMM. Our proposed methodology is illustrated via an application to human gesture phase segmentation based on sensor data, where we successfully identify and characterize the periods of rest and active gesturing, as well as the dynamical patterns involved in the gesture movements associated with each of these states.

Performer · 近似 · 泛函 · Learning · dynamic programming ·

2023 年 2 月 10 日

The Impact of Data Distribution on Q-learning with Function Approximation

Pedro P. Santos,Diogo S. Carvalho,Alberto Sardinha,Francisco S. Melo

We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results. We start by reviewing theoretical bounds on the performance of approximate dynamic programming algorithms. We then introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution in the performance of Q-learning-based algorithms with function approximation, both online and offline. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. According to our results: (i) high entropy data distributions are well-suited for learning in an offline manner; and (ii) a certain degree of data diversity (data coverage) and data quality (closeness to optimal policy) are jointly desirable for offline learning.

估計/估計量 · 極大似然 · 最大似然估計 · 近似 · 似然 ·

2023 年 2 月 10 日

Estimation of parameters of the logistic exponential distribution under progressive type-I hybrid censored sample

Subhankar Dutta,Suchandan Kayal

The paper addresses the problem of estimation of the model parameters of the logistic exponential distribution based on progressive type-I hybrid censored sample. The maximum likelihood estimates are obtained and computed numerically using Newton-Raphson method. Further, the Bayes estimates are derived under squared error, LINEX and generalized entropy loss functions. Two types (independent and bivariate) of prior distributions are considered for the purpose of Bayesian estimation. It is seen that the Bayes estimates are not of explicit forms.Thus, Lindley's approximation technique is employed to get approximate Bayes estimates. Interval estimates of the parameters based on normal approximate of the maximum likelihood estimates and normal approximation of the log-transformed maximum likelihood estimates are constructed. The highest posterior density credible intervals are obtained by using the importance sampling method. Furthermore, numerical computations are reported to review some of the results obtained in the paper. A real life dataset is considered for the purpose of illustrations.

近似 · MoDELS · 線性的 · Extensibility · Markov ·

2023 年 2 月 10 日

Bayesian non-conjugate regression via variational belief updating

Cristian Castiglione,Mauro Bernardi

from arxiv, 51 pages, 13 figures, 3 tables

We present an efficient semiparametric variational method to approximate the Gibbs posterior distribution of Bayesian regression models, which predict the data through a linear combination of the available covariates. Remarkable cases are generalized linear mixed models, support vector machines, quantile and expectile regression. The variational optimization algorithm we propose only involves the calculation of univariate numerical integrals, when no analytic solutions are available. Neither differentiability, nor conjugacy, nor elaborate data-augmentation strategies are required. Several generalizations of the proposed approach are discussed in order to account for additive models, shrinkage priors, dynamic and spatial models, providing a unifying framework for statistical learning that cover a wide range of applications. The properties of our semiparametric variational approximation are then assessed through a theoretical analysis and an extensive simulation study, in which we compare our proposal with Markov chain Monte Carlo, conjugate mean field variational Bayes and Laplace approximation in terms of signal reconstruction, posterior approximation accuracy and execution time. A real data example is then presented through a probabilistic load forecasting application on the US power load consumption data.

Prompt · 學成 · Extensibility · 替代損失 · 講稿 ·

2022 年 5 月 6 日

Prompt Distribution Learning

Yuning Lu,Jianzhuang Liu,Yonggang Zhang,Yajing Liu,Xinmei Tian

from arxiv, Accepted by CVPR 2022

We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle the varying visual representations. In this way, we provide high-quality task-related content for facilitating recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. Thus, we can employ a Gaussian distribution to model them effectively and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with 1 sample per category, it relatively improves the average result by 9.1% compared to human-crafted prompts.

相似度 · INFORMS · 估計/估計量 · Extensibility · 無監督 ·

2021 年 3 月 10 日

SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance

Fu-Zhao Ou,Xingyu Chen,Ruixin Zhang,Yuge Huang,Shaoxin Li,Jilin Li,Yong Li,Liujuan Cao,Yuan-Gen Wang

In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in an unconstrained scenario. For this purpose, the FIQA method should consider both the intrinsic property and the recognizability of the face image. Most previous works aim to estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which only considers the information from partial intra-class. However, these methods ignore the valuable information from the inter-class, which is for estimating to the recognizability of face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class similarity distributions and inter-class similarity distributions. With these quality pseudo-labels, we are capable of training a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses the state-of-the-arts by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.