
This paper proposes a new test for the comparison of conditional quantile curves when the outcome of interest, typically a duration, is subject to right censoring. The test can be applied both in the case of two independent samples and for paired data, and can be used for the comparison of quantiles at a fixed quantile level, at a finite set of levels, or over a range of quantile levels. The asymptotic distribution of the proposed test statistics is obtained both under the null hypothesis and under local alternatives. We describe a bootstrap procedure to approximate the critical values, and present the results of a simulation study in which the performance of the tests for small and moderate sample sizes is studied and compared with the behavior of alternative tests. Finally, we apply the proposed tests to a data set concerning diabetic retinopathy.
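The abstract does not spell out the test statistic, but the two-sample, fixed-level case can be conveyed with a minimal sketch: estimate each group's quantile from the Kaplan-Meier curve and bootstrap the difference. This is a toy stand-in under an independent-censoring assumption, not the authors' procedure; all function names and the naive resampling scheme are illustrative.

```python
import numpy as np

def km_quantile(time, event, p):
    """p-th quantile from the Kaplan-Meier curve (event=1 means uncensored)."""
    order = np.argsort(time)
    t, d = time[order], event[order].astype(float)
    at_risk = len(t) - np.arange(len(t))
    surv = np.cumprod(1.0 - d / at_risk)
    hit = np.nonzero(surv <= 1.0 - p)[0]
    return t[hit[0]] if hit.size else np.nan  # undefined under heavy censoring

def boot_quantile_test(t1, e1, t2, e2, p=0.5, B=500, seed=0):
    """Naive bootstrap two-sided p-value for H0: equal p-th quantiles."""
    rng = np.random.default_rng(seed)
    obs = km_quantile(t1, e1, p) - km_quantile(t2, e2, p)
    diffs = np.empty(B)
    for b in range(B):
        i = rng.integers(0, len(t1), len(t1))  # resample group 1 with replacement
        j = rng.integers(0, len(t2), len(t2))  # resample group 2 with replacement
        diffs[b] = km_quantile(t1[i], e1[i], p) - km_quantile(t2[j], e2[j], p)
    return obs, np.mean(np.abs(diffs - diffs.mean()) >= abs(obs))
```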

Related Content

The present article is devoted to the semi-parametric estimation of multivariate expectiles for extreme levels. The considered multivariate risk measures also allow for conditioning on a functional covariate belonging to an infinite-dimensional space. Using the first-order optimality condition, we interpret these expectiles as solutions of a multidimensional nonlinear optimization problem. The inference is then based on a gradient-descent-type minimization algorithm, coupled with consistent kernel estimates of the key statistical quantities, such as conditional quantiles, the conditional tail index, and conditional tail dependence functions. The method is valid for equivalently heavy-tailed marginals and under a multivariate regular variation condition on the underlying unknown random vector with arbitrary dependence structure. Our main result establishes the consistency in probability of the approximate optimal solution vectors, together with a rate of convergence. This allows us to quantify the global computational cost of the whole procedure as a function of the sample size.
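As a small illustration of the optimality condition the estimation builds on, the sketch below computes a univariate, unconditional sample expectile by gradient descent on the asymmetric squared loss; the multivariate, covariate-conditional, and extreme-level machinery of the paper is not reproduced here, and the step size and iteration count are assumptions.

```python
import numpy as np

def expectile_gd(x, tau, lr=0.5, n_iter=200):
    """Sample tau-expectile via gradient descent on the asymmetric squared
    loss L(t) = mean(|tau - 1{x < t}| * (x - t)^2)."""
    theta = x.mean()  # tau = 0.5 recovers the mean exactly
    for _ in range(n_iter):
        w = np.where(x >= theta, tau, 1.0 - tau)   # asymmetric weights
        grad = -2.0 * np.mean(w * (x - theta))     # dL/dtheta
        theta -= lr * grad
    return theta

x = np.random.default_rng(0).standard_normal(10_000)
print(expectile_gd(x, 0.9))  # above the mean for tau > 0.5
```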

A common approach to synthetic data is to sample from a fitted model. We show that, under general assumptions, this approach results in a sample with inefficient estimators and a joint distribution that is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data that is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, and fully synthetic datasets, which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on synthetic data, our procedure can also be used to perform approximate hypothesis tests in the presence of intractable likelihood functions.
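A toy contrast of the two ideas above, under an assumed Gaussian model: naively sampling from the fitted model injects a second layer of sampling noise into every estimator, while a partially synthetic variant can be forced to reproduce chosen summary statistics exactly. This illustrates the general point only; it is not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=500)             # the "real" data

# Naive approach: sample from the fitted parametric model.
mu_hat, sd_hat = x.mean(), x.std(ddof=1)
synth = rng.normal(mu_hat, sd_hat, size=x.size)

# Partially synthetic variant: rescale so the released sample
# reproduces the observed mean and standard deviation exactly.
partial = (synth - synth.mean()) / synth.std(ddof=1) * sd_hat + mu_hat

print(x.mean(), synth.mean(), partial.mean())  # synth drifts; partial matches
```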

Factor models have been widely used in economics and finance. However, the heavy-tailed nature of macroeconomic and financial data is often neglected in the existing literature. To address this issue and achieve robustness, we propose an approach to estimate factor loadings and scores by minimizing the Huber loss function, which is motivated by the equivalence of conventional Principal Component Analysis (PCA) and the constrained least squares method in the factor model. We provide two algorithms that use different penalty forms. The first algorithm, which we refer to as Huber PCA, minimizes the $\ell_2$-norm-type Huber loss and performs PCA on the weighted sample covariance matrix. The second algorithm involves an element-wise type Huber loss minimization, which can be solved by an iterative Huber regression algorithm. Our study examines the theoretical minimizer of the element-wise Huber loss function and demonstrates that it has the same convergence rate as conventional PCA when the idiosyncratic errors have bounded second moments. We also derive their asymptotic distributions under mild conditions. Moreover, we suggest a consistent model selection criterion that relies on rank minimization to estimate the number of factors robustly. We showcase the benefits of Huber PCA through extensive numerical experiments and a real financial portfolio selection example. An R package named "HDRFA" has been developed to implement the proposed robust factor analysis.
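To make the element-wise algorithm concrete, here is a minimal alternating sketch: holding the factors fixed, each series' loadings are updated by a Huber regression (solved by iteratively reweighted least squares), and symmetrically for the factor scores. It is a bare-bones stand-in for the iterative Huber regression algorithm, not the HDRFA implementation; the tuning constant c = 1.345 and the iteration counts are illustrative.

```python
import numpy as np

def huber_reg(X, y, c=1.345, n_iter=25):
    """Huber regression of y on X via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        Xw = w[:, None] * X                       # diag(w) @ X
        beta = np.linalg.solve(X.T @ Xw + 1e-10 * np.eye(X.shape[1]), Xw.T @ y)
    return beta

def huber_factor_model(X, r, n_outer=10):
    """Alternating element-wise Huber fit of the T x N panel X ~ F @ L.T."""
    T, N = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    F = U[:, :r] * np.sqrt(T)                     # ordinary PCA initialization
    for _ in range(n_outer):
        L = np.vstack([huber_reg(F, X[:, i]) for i in range(N)])  # N x r loadings
        F = np.vstack([huber_reg(L, X[t, :]) for t in range(T)])  # T x r scores
    return F, L
```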

We consider both $\ell _{0}$-penalized and $\ell _{0}$-constrained quantile regression estimators. For the $\ell _{0}$-penalized estimator, we derive an exponential inequality on the tail probability of excess quantile prediction risk and apply it to obtain non-asymptotic upper bounds on the mean-square parameter and regression function estimation errors. We also derive analogous results for the $\ell _{0}$-constrained estimator. The resulting rates of convergence are nearly minimax-optimal and the same as those for $\ell _{1}$-penalized and non-convex penalized estimators. Further, we characterize expected Hamming loss for the $\ell _{0}$-penalized estimator. We implement the proposed procedure via mixed integer linear programming and also a more scalable first-order approximation algorithm. We illustrate the finite-sample performance of our approach in Monte Carlo experiments and its usefulness in a real data application concerning conformal prediction of infant birth weights (with $n\approx 10^{3}$ and up to $p>10^{3}$). In sum, our $\ell _{0}$-based method produces a much sparser estimator than the $\ell _{1}$-penalized and non-convex penalized approaches without compromising precision.
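The mixed integer formulation mentioned above can be sketched with a standard big-M device (the bound M below is an assumed box on the coefficients, and PuLP's bundled CBC solver stands in for whatever solver the authors use): splitting residuals into positive parts gives the check loss, and binary indicators enforce the sparsity budget.

```python
import numpy as np
import pulp

def l0_quantile_reg(X, y, tau=0.5, s=3, M=10.0):
    """l0-constrained quantile regression as a MILP:
    min sum_i tau*u_i + (1-tau)*v_i
    s.t. y_i - x_i'beta = u_i - v_i, |beta_j| <= M*z_j, sum_j z_j <= s."""
    n, p = X.shape
    prob = pulp.LpProblem("l0_qr", pulp.LpMinimize)
    beta = [pulp.LpVariable(f"b{j}", -M, M) for j in range(p)]
    z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(p)]
    u = [pulp.LpVariable(f"u{i}", lowBound=0) for i in range(n)]
    v = [pulp.LpVariable(f"v{i}", lowBound=0) for i in range(n)]
    prob += pulp.lpSum(tau * u[i] + (1 - tau) * v[i] for i in range(n))
    for i in range(n):
        fit = pulp.lpSum(X[i, j] * beta[j] for j in range(p))
        prob += y[i] - fit == u[i] - v[i]          # residual split
    for j in range(p):
        prob += beta[j] <= M * z[j]                # big-M on/off switches
        prob += beta[j] >= -M * z[j]
    prob += pulp.lpSum(z) <= s                     # sparsity budget
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([b.value() for b in beta])
```

As the abstract notes, exact MILP solves scale poorly with p; a first-order approximation is the practical route for the larger instances studied there.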

Despite its drawbacks, the complete case analysis is commonly used in regression models with missing covariates. Understanding when a complete case analysis leads to consistent parameter estimation is vital before it is used. Here, our aim is to demonstrate when a complete case analysis is appropriate for a nuanced type of missing covariate: the randomly right-censored covariate. Across the censored covariate literature, different assumptions are made to ensure that a complete case analysis produces a consistent estimator, which leads to confusion in practice. We make several contributions to dispel this confusion. First, we summarize the language surrounding the assumptions that lead to a consistent complete case estimator. Then, we show a unidirectional hierarchical relationship between these assumptions, which leads us to one sufficient assumption to consider before using a complete case analysis. Lastly, we conduct a simulation study to illustrate the performance of a complete case analysis with a right-censored covariate under different censoring mechanism assumptions, and we demonstrate its use with a Huntington disease data example.
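A tiny simulation in the spirit of the study described above, under one assumed mechanism that is sufficient here (censoring of the covariate is independent of the outcome given the covariate): the complete case estimates stay close to the truth. Distributions and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.exponential(2.0, n)                # true covariate value
c = rng.exponential(4.0, n)                # independent right-censoring time
delta = x <= c                             # covariate observed exactly?
w = np.minimum(x, c)                       # observed (possibly censored) value

y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # outcome model

# Complete case analysis: keep only subjects with an uncensored covariate.
Xcc = np.column_stack([np.ones(delta.sum()), w[delta]])
beta_cc = np.linalg.lstsq(Xcc, y[delta], rcond=None)[0]
print(beta_cc)  # close to (1.0, 0.5) under this censoring mechanism
```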

A well-known result states that empirical quantiles for finitely distributed univariate random variables can be obtained by solving a linear program. We show in this short note that multivariate empirical quantiles can be obtained in a very similar way by solving a vector linear program. This connection provides a new approach for computing Tukey depth regions and more general cone quantile sets.
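For the univariate building block, the linear program is easy to write down explicitly; the sketch below (names are illustrative) recovers the empirical tau-quantile with SciPy's LP solver by splitting residuals into positive and negative parts. The vector linear programs for the multivariate case require dedicated VLP solvers and are not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

def lp_quantile(x, tau):
    """Empirical tau-quantile as the solution of the linear program
    min_q sum_i tau*(x_i - q)_+ + (1 - tau)*(q - x_i)_+ ."""
    n = len(x)
    # decision vector: [q, u_1..u_n, v_1..v_n] with x_i - q = u_i - v_i
    c = np.concatenate([[0.0], np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([np.ones((n, 1)), np.eye(n), -np.eye(n)])
    b_eq = np.asarray(x, dtype=float)
    bounds = [(None, None)] + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[0]
```

For tau = 0.5 this returns a sample median; when the minimizer is not unique, the solver may return any point of the optimal interval, which is why it can differ from np.quantile's interpolation conventions.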

Understanding variable dependence, particularly eliciting their statistical properties given a set of covariates, provides the mathematical foundation for practical operations management tasks such as risk analysis and decision making under observed circumstances. This article presents an estimation method for modeling the conditional joint distribution of bivariate outcomes based on distribution regression and factorization methods. The method is semiparametric in that it allows for flexible modeling of both the marginal and joint distributions conditional on covariates without imposing global parametric assumptions across the entire distribution. In contrast to existing parametric approaches, our method can accommodate discrete, continuous, or mixed variables, and provides a simple yet effective way to capture distributional dependence structures between bivariate outcomes and covariates. Various simulation results confirm that our method can perform similarly to or better than alternative methods in finite samples. In an application to a motor third-party liability insurance portfolio, the proposed method effectively estimates risk measures such as the conditional Value-at-Risk and Expected Shortfall. This suggests that the semiparametric approach can serve as an alternative in insurance risk management.
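The distribution regression building block can be illustrated with a short sketch: one binary regression per threshold on a grid, with a monotone rearrangement at the end. The logistic link, grid choice, and function names are assumptions for illustration; the paper's factorization step for the bivariate joint distribution is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def distribution_regression(X, y, grid):
    """Estimate the conditional CDF F(t | x): one binary fit per threshold t."""
    fits = []
    for t in grid:
        lab = (y <= t).astype(int)
        if lab.min() == lab.max():            # degenerate threshold: constant
            fits.append(float(lab[0]))
        else:
            fits.append(LogisticRegression(max_iter=1000).fit(X, lab))
    def cdf(x_row):
        x_row = np.asarray(x_row).reshape(1, -1)
        p = np.array([f if isinstance(f, float) else f.predict_proba(x_row)[0, 1]
                      for f in fits])
        return np.maximum.accumulate(p)       # rearrange to enforce monotonicity
    return cdf
```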

This paper proposes three new approaches for additive functional regression models with functional responses. The first is a reformulation of the linear regression model, and the other two address the still scarce case of additive nonlinear functional regression models. Both proposals are based on extensions of similar models for scalar responses. One of our nonlinear models constructs a Spectral Additive Model (the word "Spectral" refers to the representation of the covariates in an $\mathcal{L}_2$ basis), which is restricted (by construction) to Hilbertian spaces. The other extends the kernel estimator and can be applied in general metric spaces, since it is based only on distances. We include our new approaches, as well as real datasets, in an R package. The performance of the new proposals is compared with that of previous ones, which we review theoretically and practically in this paper. The simulation results show the advantages of the nonlinear proposals and the small loss of efficiency when the simulation scenario is truly linear. Finally, the supplementary material provides a visualization tool for checking the linearity of the relationship between a single covariate and the response.
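The distance-based kernel proposal can be conveyed with a minimal Nadaraya-Watson sketch for curves observed on a common grid. The L2-type distance and Gaussian kernel are illustrative choices only; the authors' R package is the reference for the full estimators.

```python
import numpy as np

def nw_functional(X_curves, Y_curves, x_new, h):
    """Nadaraya-Watson prediction of a functional response, driven only by
    distances between covariate curves (rows of X_curves, common grid)."""
    d = np.sqrt(np.mean((X_curves - x_new) ** 2, axis=1))  # L2-type distance
    k = np.exp(-0.5 * (d / h) ** 2)                        # Gaussian kernel
    w = k / max(k.sum(), 1e-300)                           # guard tiny bandwidths
    return w @ Y_curves           # predicted curve: weighted mean of responses
```

Because only the distances d enter the formula, swapping in any metric on the covariate space leaves the estimator unchanged, which is exactly why this approach extends beyond Hilbertian settings.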

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps; estimation of marginal functions of the multistate model and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.
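To fix ideas on the pseudo-value half of the pipeline, here is a minimal jackknife sketch for a survival probability at a landmark time; in the clustered setting of the paper, inverse cluster size weights would then enter either this estimation step or the subsequent estimating equations (the strategies discussed above). Function names and the simple tie handling are illustrative.

```python
import numpy as np

def km_surv_at(time, event, t0):
    """Kaplan-Meier estimate of S(t0); event=1 means the event was observed."""
    order = np.argsort(time)
    t, d = time[order], event[order].astype(float)
    at_risk = len(t) - np.arange(len(t))
    s = np.cumprod(1.0 - d / at_risk)
    mask = t <= t0
    return s[mask][-1] if mask.any() else 1.0

def pseudo_values(time, event, t0):
    """Jackknife pseudo-values: theta_i = n*S(t0) - (n-1)*S_{-i}(t0)."""
    n = len(time)
    full = km_surv_at(time, event, t0)
    loo = np.array([km_surv_at(np.delete(time, i), np.delete(event, i), t0)
                    for i in range(n)])
    return n * full - (n - 1) * loo
```

The resulting pseudo-values can be treated as (approximately) complete responses in a generalized estimating equation, which is where covariates and cluster weights enter.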

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
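The reweighting itself reduces to a few lines; this sketch turns per-class counts into class-balanced weights via the effective number formula quoted above. The normalization so that the weights average to one is a common convention, assumed here rather than taken from the abstract.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to the inverse effective number of
    samples E_n = (1 - beta^n) / (1 - beta)."""
    counts = np.asarray(counts, dtype=float)
    eff_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w / w.sum() * len(counts)   # normalize so weights average to 1

# Example: a long-tailed 3-class problem; rare classes get larger weights.
print(class_balanced_weights([5000, 500, 50]))
```

These weights multiply the per-sample loss (e.g., cross-entropy) according to each sample's class, which is the class-balanced loss described above.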
