Non-parametric maximum likelihood estimation encompasses a group of classic methods to estimate distribution-associated functions from potentially censored and truncated data, with extensive applications in survival analysis. These methods, including the Kaplan-Meier estimator and Turnbull's method, often result in overfitting, especially when the sample size is small. We propose an improvement to these methods by applying kernel smoothing to their raw estimates, based on a BIC-type loss function that balances model fit against model complexity. In the context of a longitudinal study with repeated observations, we detail our proposed smoothing procedure and optimization algorithm. With extensive simulation studies over multiple realistic scenarios, we demonstrate that our smoothing-based procedure provides better overall accuracy in both survival function estimation and individual-level time-to-event prediction by reducing overfitting. Our smoothing procedure decreases the discrepancy between the estimated and true simulated survival function using interval-censored data by up to 49% compared to the raw unsmoothed estimate, with similar improvements of up to 41% and 23% in within-sample and out-of-sample prediction, respectively. Finally, we apply our method to real data on censored breast cancer diagnosis, which similarly shows improvement when compared to empirical survival estimates from uncensored data. We provide an R package, SISE, for implementing our penalized likelihood method.
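As a rough illustration of the smoothing step (not the SISE implementation), the sketch below builds a raw Kaplan-Meier estimate from simulated right-censored data, applies Gaussian-kernel smoothing, and selects the bandwidth by a BIC-style score; the simulated data, the complexity proxy, and the penalty weight are illustrative assumptions rather than the paper's exact formulation.

```python
# Illustrative only: kernel smoothing of a raw Kaplan-Meier estimate with a
# BIC-style bandwidth selection.
import numpy as np

rng = np.random.default_rng(0)
t_event = rng.exponential(scale=5.0, size=60)     # true event times
t_cens = rng.exponential(scale=8.0, size=60)      # censoring times
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

# Raw (unsmoothed) Kaplan-Meier estimate evaluated on a grid.
order = np.argsort(time)
time, event = time[order], event[order]
at_risk = np.arange(len(time), 0, -1)
km_surv = np.cumprod(np.where(event == 1, 1.0 - 1.0 / at_risk, 1.0))
grid = np.linspace(0.0, time.max(), 200)
idx = np.searchsorted(time, grid, side="right") - 1
raw = np.where(idx >= 0, km_surv[np.clip(idx, 0, None)], 1.0)

def smooth(raw, grid, h):
    """Gaussian-kernel (Nadaraya-Watson) smoothing with bandwidth h."""
    w = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / h) ** 2)
    return (w @ raw) / w.sum(axis=1)

def bic_score(raw, smoothed, h, n):
    """Illustrative BIC-type score: lack of fit plus log(n) times a roughness proxy."""
    fit = n * np.log(np.mean((raw - smoothed) ** 2) + 1e-12)
    complexity = (grid[-1] - grid[0]) / h   # smaller bandwidth = more complex fit
    return fit + np.log(n) * complexity

candidates = np.linspace(0.2, 3.0, 30)
scores = [bic_score(raw, smooth(raw, grid, h), h, len(time)) for h in candidates]
h_best = candidates[int(np.argmin(scores))]
smoothed = smooth(raw, grid, h_best)
print("selected bandwidth:", round(float(h_best), 3))
```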
Background: Survival analysis concerns the study of timeline data where the event of interest may remain unobserved (i.e., censored). Studies commonly record more than one type of event, but conventional survival techniques focus on a single event type. We set out to integrate multiple independently censored time-to-event variables as well as missing observations. Methods: An energy-based approach is taken with a bipartite structure between latent and visible states, commonly known as harmoniums (or restricted Boltzmann machines). Results: The present harmonium is shown, both theoretically and experimentally, to capture non-linear patterns between distinct time recordings. We illustrate on real-world data that, for a single time-to-event variable, our model is on par with established methods. In addition, we demonstrate that discriminative predictions improve by leveraging an extra time-to-event variable. Conclusions: Multiple time-to-event variables can be successfully captured within the harmonium paradigm.
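For readers unfamiliar with the harmonium paradigm, the following minimal sketch trains a Bernoulli harmonium (restricted Boltzmann machine) with one-step contrastive divergence on toy binary data; the paper's handling of censored time-to-event variables and missing observations requires additional machinery that is not reproduced here.

```python
# Minimal Bernoulli harmonium (RBM) trained with CD-1 on toy binary data.
import numpy as np

rng = np.random.default_rng(1)
n, n_vis, n_hid = 500, 12, 6
data = (rng.random((n, n_vis)) < 0.3).astype(float)   # toy binary visibles

W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.05
for epoch in range(50):
    # Positive phase: hidden activations given the data.
    p_hid = sigmoid(data @ W + b_hid)
    h = (rng.random(p_hid.shape) < p_hid).astype(float)
    # Negative phase: one Gibbs step back to the visibles and up again.
    p_vis_neg = sigmoid(h @ W.T + b_vis)
    v_neg = (rng.random(p_vis_neg.shape) < p_vis_neg).astype(float)
    p_hid_neg = sigmoid(v_neg @ W + b_hid)
    # CD-1 gradient estimates for weights and biases.
    W += lr * (data.T @ p_hid - v_neg.T @ p_hid_neg) / n
    b_vis += lr * (data - v_neg).mean(axis=0)
    b_hid += lr * (p_hid - p_hid_neg).mean(axis=0)

print("reconstruction error:", np.mean((data - p_vis_neg) ** 2).round(4))
```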
Given functional data from a survival process with time-dependent covariates, we derive a smooth convex representation for its nonparametric log-likelihood functional and obtain its functional gradient. From this, we devise a generic gradient boosting procedure for estimating the hazard function nonparametrically. An illustrative implementation of the procedure using regression trees is described to show how to recover the unknown hazard. The generic estimator is consistent if the model is correctly specified; alternatively, an oracle inequality can be demonstrated for tree-based models. To avoid overfitting, boosting employs several regularization devices. One of them is step-size restriction, but the rationale for this is somewhat mysterious from the viewpoint of consistency. Our work brings some clarity to this issue by revealing that step-size restriction is a mechanism for preventing the curvature of the risk from derailing convergence.
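The role of step-size restriction can be seen in a generic functional gradient boosting sketch with regression-tree base learners, where the shrinkage factor nu damps each boosting step. For brevity the toy example below boosts a squared-error loss on a synthetic regression target rather than the paper's nonparametric survival log-likelihood functional.

```python
# Generic gradient boosting with regression trees and step-size restriction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(300)

nu = 0.1                      # step-size restriction (shrinkage)
F = np.zeros_like(y)          # current fitted function values
trees = []
for m in range(200):
    residual = y - F          # negative gradient of 0.5 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2, random_state=m)
    tree.fit(X, residual)
    F += nu * tree.predict(X) # damped functional gradient step
    trees.append(tree)

def predict(X_new):
    """Sum of shrunken tree contributions."""
    return nu * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - F) ** 2).round(4))
```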
Contemporary time series analysis increasingly involves tensor-valued data from many fields. For example, stocks can be grouped according to Size, Book-to-Market ratio, and Operating Profitability, leading to a 3-way tensor observation each month. We propose an autoregressive model for tensor-valued time series, with autoregressive terms depending on multi-linear coefficient matrices. Compared with the traditional approach of vectorizing the tensor observations and then applying a vector autoregressive model, the tensor autoregressive model preserves the tensor structure and admits corresponding interpretations. We introduce three estimators based on projection, least squares, and maximum likelihood. Our analysis considers both fixed-dimensional and high-dimensional settings. For the former we establish central limit theorems for the estimators, and for the latter we focus on convergence rates and model selection. The performance of the model is demonstrated with simulated and real examples.
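A minimal sketch of the model being described, simulating the tensor autoregressive recursion X_t = X_{t-1} x_1 A_1 x_2 A_2 x_3 A_3 + E_t (x_k denoting the mode-k product) and comparing parameter counts with a vectorized VAR(1); the dimensions, noise level, and stationarity scaling are illustrative assumptions, and the paper's projection, least squares, and maximum likelihood estimators are not reproduced.

```python
# Simulate a 3-way tensor AR(1) process and compare parameter counts.
import numpy as np

rng = np.random.default_rng(3)
d1, d2, d3, T = 4, 3, 2, 200

def stable(d):
    """Random coefficient matrix scaled to have spectral radius < 1."""
    A = rng.standard_normal((d, d))
    return 0.8 * A / np.max(np.abs(np.linalg.eigvals(A)))

A1, A2, A3 = stable(d1), stable(d2), stable(d3)
X = np.zeros((T, d1, d2, d3))
for t in range(1, T):
    prev = X[t - 1]
    # Multi-linear (mode-wise) products along each of the three modes.
    step = np.einsum("ia,ajk->ijk", A1, prev)
    step = np.einsum("jb,ibk->ijk", A2, step)
    step = np.einsum("kc,ijc->ijk", A3, step)
    X[t] = step + 0.1 * rng.standard_normal((d1, d2, d3))

# Parameter counts: tensor AR vs. vectorizing and fitting a VAR(1).
p_tensor = d1**2 + d2**2 + d3**2
p_var = (d1 * d2 * d3) ** 2
print("tensor AR parameters:", p_tensor, "| vectorized VAR parameters:", p_var)
```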
Distributional data analysis, concerned with the statistical analysis and modeling of data objects consisting of random probability density functions (PDFs) in the framework of functional data analysis (FDA), has received considerable interest in recent years. However, many important aspects remain unexplored, such as outlier detection and robustness. Existing functional outlier detection methods are mainly designed for ordinary functional data and usually perform poorly when applied to PDFs. To fill this gap, this study focuses on PDF-valued outlier detection, as well as its application in robust distributional regression. As with ordinary functional data, the major challenge in PDF-outlier detection is detecting shape outliers masked by the "curve net" formed by the bulk of the PDFs. To this end, we propose a tree-structured transformation system that extracts features and converts shape outliers into easily detectable magnitude outliers, and we design outlier detectors tailored to the transformed data. A multiple-detection strategy is also proposed to account for detection uncertainties and to combine different detectors into a more reliable detection tool. Moreover, we propose a distributional-regression-based approach for detecting abnormal associations in pairs of PDF-valued observations. As a specific application, the proposed outlier detection methods are used to robustify a distribution-to-distribution regression method, and we develop a robust estimator of the regression operator by downweighting the detected outliers. The proposed methods are validated and evaluated through extensive simulation studies and real data applications. Comparative studies demonstrate the superiority of the developed outlier detection method over competing approaches to distributional outlier detection.
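To convey why a transformation helps, the toy sketch below maps distributions to empirical quantile functions, where a bimodal shape outlier hidden in the "curve net" of Gaussian samples becomes a magnitude outlier that a simple robust distance rule can flag. This is a generic stand-in for illustration only, not the paper's tree-structured transformation system or its multiple-detection strategy.

```python
# Toy shape-outlier detection after transforming distributions to quantiles.
import numpy as np
from scipy.stats import norm

grid = np.linspace(0.01, 0.99, 99)          # probability grid for quantiles
samples = [norm(loc=0.0, scale=1.0).rvs(size=400, random_state=i)
           for i in range(20)]
# A shape outlier: bimodal, but with mean and range similar to the bulk.
outlier = np.concatenate([norm(-1.2, 0.3).rvs(200, random_state=99),
                          norm(1.2, 0.3).rvs(200, random_state=100)])
samples.append(outlier)

# Transform each distribution into its empirical quantile function.
Q = np.array([np.quantile(s, grid) for s in samples])

# Magnitude rule on the transformed curves: distance to the cross-sectional
# median, flagged against a robust cutoff (median + 3 * MAD).
median_curve = np.median(Q, axis=0)
dist = np.sqrt(np.mean((Q - median_curve) ** 2, axis=1))
mad = 1.4826 * np.median(np.abs(dist - np.median(dist)))
cutoff = np.median(dist) + 3 * mad
print("flagged indices:", np.where(dist > cutoff)[0])
```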
Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments). Where previous work analyzing such estimators focused on the bias, we also examine the variance and mean squared error (MSE). In both synthetic and realistic scenarios, we evaluate the three estimators and find that the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias; the estimator with the smallest MSE strikes a balance between bias and variance, displaying a classic bias-variance tradeoff. We use expected validation performance to compare different models, and analyze how frequently each estimator leads to drawing incorrect conclusions about which of two models performs best. We find that the two biased estimators lead to the fewest incorrect conclusions, which hints at the importance of minimizing variance and MSE.
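As context for expected validation performance, the Monte Carlo sketch below targets E[max of k validation scores] and compares a high-variance estimator that uses only k of the n observed scores with an order-statistic estimator that reuses all n scores; whether these correspond to the three estimators analyzed in the paper is not assumed, and the Beta-distributed scores are purely synthetic.

```python
# Bias/variance/MSE of two estimators of expected validation performance.
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, k, reps = 50, 10, 5000

def unbiased_expected_max(scores, k):
    """Expected max over a random size-k subset, computed via order statistics."""
    v = np.sort(scores)                              # ascending
    n = len(v)
    # P(subset max = v[i]) = C(i, k-1) / C(n, k), i = 0..n-1 (0-indexed).
    weights = np.array([comb(i, k - 1) / comb(n, k) for i in range(n)])
    return float(np.dot(weights, v))

subsample, orderstat = [], []
for _ in range(reps):
    scores = rng.beta(2, 5, size=n)                  # synthetic validation accuracies
    subsample.append(scores[:k].max())               # uses only k of the n scores
    orderstat.append(unbiased_expected_max(scores, k))

truth = rng.beta(2, 5, size=(200_000, k)).max(axis=1).mean()
for name, est in [("subsample-max", np.array(subsample)),
                  ("order-statistic", np.array(orderstat))]:
    bias = est.mean() - truth
    print(f"{name:16s} bias={bias:+.4f}  var={est.var():.5f}  "
          f"mse={bias**2 + est.var():.5f}")
```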
A new family of methods involving complex coefficients for the numerical integration of differential equations is presented and analyzed. They are constructed as linear combinations of symmetric-conjugate compositions obtained from a basic time-symmetric integrator of order 2n (n $\ge$ 1). The new integrators are of order 2(n + k), k = 1, 2, ..., and preserve time-symmetry up to order 4n + 3 when applied to differential equations with real vector fields. If in addition the system is Hamiltonian and the basic scheme is symplectic, then they also preserve symplecticity up to order 4n + 3. We show that these integrators are well suited to parallel implementation, which further improves their efficiency. Methods up to order 10 based on a 4th-order integrator are built and tested against other standard procedures for raising the order of a basic scheme.
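For orientation, the sketch below shows the classical real-coefficient route to raising the order of a basic time-symmetric scheme: Yoshida's triple-jump composition applied to a Störmer-Verlet step for the harmonic oscillator. The paper's methods instead use complex coefficients and symmetric-conjugate compositions, which are not reproduced here.

```python
# Classical triple-jump composition raising an order-2 scheme by composition.
import numpy as np

def verlet(q, p, h):
    """One Stoermer-Verlet step for the harmonic oscillator H = (p^2 + q^2)/2."""
    p -= 0.5 * h * q
    q += h * p
    p -= 0.5 * h * q
    return q, p

# Triple-jump coefficients (Yoshida), real-valued.
w1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
w0 = 1.0 - 2.0 * w1

def composed(q, p, h):
    for w in (w1, w0, w1):
        q, p = verlet(q, p, w * h)
    return q, p

def global_error(stepper, h, t_end=10.0):
    q, p = 1.0, 0.0
    for _ in range(int(round(t_end / h))):
        q, p = stepper(q, p, h)
    return abs(q - np.cos(t_end))            # exact solution is q(t) = cos(t)

for h in (0.1, 0.05):
    e_base, e_comp = global_error(verlet, h), global_error(composed, h)
    print(f"h={h}: base-scheme error {e_base:.2e}, composed error {e_comp:.2e}")
```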
In collaborative learning, multiple parties contribute their datasets to jointly train global machine learning models for numerous predictive tasks. Despite its efficacy, this learning paradigm fails to encompass critical application domains that involve highly sensitive data, such as healthcare and security analytics, where privacy risks limit entities to individually training models using only their own datasets. In this work, we target privacy-preserving collaborative hierarchical clustering. We introduce a formal security definition that aims to balance utility and privacy, and present a two-party protocol that provably satisfies it. We then extend our protocol with: (i) an optimized version for single-linkage clustering, and (ii) scalable approximation variants. We implement all our schemes and experimentally evaluate their performance and accuracy on synthetic and real datasets, obtaining very encouraging results. For example, end-to-end execution of our secure approximate protocol for over 1M 10-dimensional data samples requires 35 seconds of computation and achieves 97.09% accuracy.
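The plaintext functionality that such a protocol emulates can be stated in a few lines: single-linkage hierarchical clustering over the pooled records of both parties. The sketch below runs entirely in the clear on synthetic data; the security definition, the two-party protocol, and the scalable approximations are not shown.

```python
# Plaintext single-linkage clustering over two parties' pooled records.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(5)
party_a = rng.normal(loc=0.0, scale=1.0, size=(100, 10))
party_b = rng.normal(loc=3.0, scale=1.0, size=(100, 10))
pooled = np.vstack([party_a, party_b])       # never revealed in the secure setting

dendrogram = linkage(pooled, method="single")       # single-linkage merges
labels = fcluster(dendrogram, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```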
Implicit probabilistic models are defined naturally in terms of a sampling procedure and often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.
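To make the setting concrete, the sketch below defines an implicit model purely through its sampler and recovers a parameter by matching simulated samples to observed data over a grid (average distance to the nearest simulated point). This is a generic likelihood-free illustration of the setting, not the estimator proposed in the paper, and the simulator and loss are assumptions made for the example.

```python
# Likelihood-free parameter recovery for a sampler-defined (implicit) model.
import numpy as np

def simulator(theta, size, rng):
    """Implicit model: defined through its sampling procedure only;
    no density is evaluated anywhere below."""
    return theta + rng.standard_normal(size) * np.abs(rng.standard_normal(size))

observed = simulator(2.0, 500, np.random.default_rng(7))   # true theta = 2.0

def match_loss(theta, observed, m=2000, seed=8):
    """Average distance from each observation to its nearest simulated sample."""
    sims = simulator(theta, m, np.random.default_rng(seed))
    return np.min(np.abs(observed[:, None] - sims[None, :]), axis=1).mean()

grid = np.linspace(0.0, 4.0, 41)
losses = [match_loss(th, observed) for th in grid]
print("estimated theta:", float(grid[int(np.argmin(losses))]))
```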
This paper addresses the problem of viewpoint estimation of an object in a given image. It presents five key insights that should be taken into consideration when designing a CNN that solves the problem. Based on these insights, the paper proposes a network in which (i) the architecture jointly solves detection, classification, and viewpoint estimation; (ii) new types of data are added to the training set; and (iii) a novel loss function, which takes into account both the geometry of the problem and the new types of data, is proposed. Our network improves the state-of-the-art results for this problem by 9.8%.
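A schematic version of a jointly trained network, written in PyTorch: a shared backbone feeds separate classification and viewpoint heads and the two cross-entropy losses are summed. The layer sizes, the number of viewpoint bins, and the loss are placeholders; the paper's detection branch and geometry-aware loss are not reproduced.

```python
# Schematic multi-head CNN for joint classification and viewpoint estimation.
import torch
import torch.nn as nn

class JointViewpointNet(nn.Module):
    def __init__(self, n_classes=12, n_view_bins=360):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(64, n_classes)        # object category
        self.view_head = nn.Linear(64, n_view_bins)       # discretized azimuth

    def forward(self, x):
        feat = self.backbone(x)
        return self.class_head(feat), self.view_head(feat)

model = JointViewpointNet()
images = torch.randn(8, 3, 128, 128)
class_labels = torch.randint(0, 12, (8,))
view_labels = torch.randint(0, 360, (8,))

class_logits, view_logits = model(images)
loss = nn.functional.cross_entropy(class_logits, class_labels) \
     + nn.functional.cross_entropy(view_logits, view_labels)
loss.backward()
print("joint loss:", float(loss))
```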
We propose a new estimation method for topic models that is not a variation on existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts from a computational and theoretical disadvantage: it does not know the correct number of topics K, while the competing methods are supplied with the correct value in our simulations.
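For reference, the data-generating side of a topic model: a word-topic matrix A (p x K) with columns on the probability simplex, per-document topic weights, and multinomial word counts of length N_i. Only the generative sketch is shown; the paper's estimator of A and of K is not reproduced, and the Dirichlet parameters are illustrative.

```python
# Generate a synthetic corpus from a topic model with p words and K topics.
import numpy as np

rng = np.random.default_rng(9)
p, K, n, N_i = 200, 5, 100, 300       # dictionary size, topics, documents, doc length

A = rng.dirichlet(alpha=np.full(p, 0.05), size=K).T     # p x K, columns sum to 1
W = rng.dirichlet(alpha=np.full(K, 0.3), size=n).T      # K x n topic weights per document

docs = np.stack([rng.multinomial(N_i, A @ W[:, i]) for i in range(n)])  # n x p counts
print("corpus shape:", docs.shape, "| first column sums of A:", A.sum(axis=0)[:3])
```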