
Group number selection is a key question in grouped panel data modelling. In this work, we develop a cross-validation method to tackle this problem. Specifically, we split the panel data along the time span into a training dataset and a testing dataset. We first use the training dataset to estimate the parameters and group memberships. We then apply the fitted model to the testing dataset and estimate the group number by minimizing certain loss function values there. We design the loss functions for panel data models both with and without fixed effects. The proposed method has two advantages. First, it is fully data-driven, so no additional tuning parameters are involved. Second, it can be flexibly applied to a wide range of panel data models. Theoretically, we establish estimation consistency by exploiting the optimization properties of the estimation algorithm. Experiments on a variety of synthetic and empirical datasets further illustrate the advantages of the proposed method.
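To make the procedure concrete, here is a minimal sketch of the time-split idea for a toy grouped panel model with a single regressor and group-specific slopes; the estimator, the interface, and the squared-error loss are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def fit_grouped_panel(Y, X, G, n_iter=20, seed=0):
    """Toy grouped-panel estimator: alternate between group-specific
    pooled-OLS slopes and reassigning each unit to its best-fitting group.
    Y, X: (N, T) arrays of outcomes and a single regressor."""
    rng = np.random.default_rng(seed)
    N, _ = Y.shape
    labels = rng.integers(0, G, size=N)
    beta = np.zeros(G)
    for _ in range(n_iter):
        for g in range(G):
            m = labels == g
            if m.any():
                xg, yg = X[m].ravel(), Y[m].ravel()
                beta[g] = xg @ yg / (xg @ xg + 1e-12)
        # reassign each unit to the group minimizing its in-sample SSE
        sse = np.stack([((Y - b * X) ** 2).sum(axis=1) for b in beta], axis=1)
        labels = sse.argmin(axis=1)
    return beta, labels

def cv_group_number(Y, X, candidates, train_frac=0.7):
    """Time-split cross-validation: fit on the early periods, score
    squared-error loss on the held-out later periods."""
    t0 = int(train_frac * Y.shape[1])
    losses = {}
    for G in candidates:
        beta, labels = fit_grouped_panel(Y[:, :t0], X[:, :t0], G)
        resid = Y[:, t0:] - beta[labels, None] * X[:, t0:]
        losses[G] = float((resid ** 2).mean())
    return losses  # pick the G with the smallest held-out loss
```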

Related content

We propose the Terminating-Random Experiments (T-Rex) selector, a fast variable selection method for high-dimensional data. The T-Rex selector controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables. This is achieved by fusing the solutions of multiple early-terminated random experiments. The experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors. A finite-sample proof of the FDR control property, based on martingale theory, is provided. Numerical simulations confirm that the FDR is controlled at the target level while allowing for high power. We prove under mild conditions that the dummies can be sampled from any univariate probability distribution with finite expectation and variance. The computational complexity of the proposed method is linear in the number of variables. The T-Rex selector outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its sequential computation time is more than two orders of magnitude lower than that of the strongest benchmark methods. The open-source R package TRexSelector, containing the implementation of the T-Rex selector, is available on CRAN.
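To convey the fusion mechanism, the sketch below runs simplified random experiments using scikit-learn's LARS path as the forward selection, terminating each path once `T_stop` dummies have entered; the paper's calibration of the voting threshold that actually certifies FDR control (as implemented in TRexSelector) is omitted.

```python
import numpy as np
from sklearn.linear_model import lars_path

def trex_votes(X, y, K=20, T_stop=1, seed=0):
    """Simplified T-Rex-style experiments: append random dummy predictors,
    run a forward-selection (LARS) path, terminate once T_stop dummies
    have entered, and fuse the selections across K experiments."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for _ in range(K):
        D = rng.standard_normal((n, p))   # dummies: any finite-variance law
        _, entry_order, _ = lars_path(np.hstack([X, D]), y, method="lar")
        selected, dummies_seen = [], 0
        for idx in entry_order:
            if idx >= p:                  # a dummy entered the path
                dummies_seen += 1
                if dummies_seen >= T_stop:
                    break
            else:
                selected.append(idx)
        votes[selected] += 1
    return votes / K  # relative occurrence of each original variable
```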

A fundamental task in science is to design experiments that yield valuable insights about the system under study. Mathematically, these insights can be represented as a utility or risk function that shapes the value of conducting each experiment. We present PDBAL, a targeted active learning method that adaptively designs experiments to maximize scientific utility. PDBAL takes a user-specified risk function and combines it with a probabilistic model of the experimental outcomes to choose designs that rapidly converge on a high-utility model. We prove theoretical bounds on the label complexity of PDBAL and provide fast closed-form solutions for designing experiments with common exponential family likelihoods. In simulation studies, PDBAL consistently outperforms standard untargeted approaches that focus on maximizing expected information gain over the design space. Finally, we demonstrate the scientific potential of PDBAL through a study on a large cancer drug screen dataset, in which it quickly recovers the most efficacious drugs using a small fraction of the total number of experiments.
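The following toy sketch illustrates the targeted-design principle on discretized hypothesis, design, and outcome spaces; the array interface, the quadratic risk form, and the scoring rule are illustrative assumptions rather than PDBAL's closed-form solutions.

```python
import numpy as np

def targeted_design_step(prior, likelihoods, risk):
    """Toy targeted Bayesian design on discretized spaces (an illustration
    of the principle, not PDBAL's updates).
    prior: (H,) over hypotheses; likelihoods[d][o, h] = P(o | h, design d);
    risk[h, h2]: cost of confusing hypothesis h with h2.
    Returns the design whose expected posterior risk is smallest."""
    best_d, best_score = None, np.inf
    for d, L in enumerate(likelihoods):
        p_o = L @ prior                    # outcome marginal P(o | d)
        score = 0.0
        for o in range(L.shape[0]):
            if p_o[o] <= 0:
                continue
            post = L[o] * prior / p_o[o]   # posterior over h given outcome o
            score += p_o[o] * (post @ risk @ post)
        if score < best_score:
            best_d, best_score = d, score
    return best_d
```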

In this paper, we study differentially private empirical risk minimization (DP-ERM). It has been shown that the worst-case utility of DP-ERM degrades polynomially as the dimension increases. This is a major obstacle to privately learning large machine learning models. In high dimensions, it is common for some of a model's parameters to carry more information than others. To exploit this, we propose a differentially private greedy coordinate descent (DP-GCD) algorithm. At each iteration, DP-GCD privately performs a coordinate-wise gradient step along the (approximately) greatest entry of the gradient. We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems by naturally exploiting their structural properties (such as quasi-sparse solutions). We illustrate this behavior numerically, both on synthetic and real datasets.
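A minimal sketch of the mechanism, assuming Laplace noise for both the report-noisy-max coordinate selection and the gradient step; in an actual DP implementation the noise scales would be derived from the privacy budget and the gradient sensitivity, which this illustration leaves as free parameters.

```python
import numpy as np

def dp_gcd(grad_fn, w0, n_iters, step, lam_select, lam_step, seed=0):
    """Minimal DP greedy coordinate descent sketch: pick the coordinate
    with (approximately) the largest gradient magnitude via report-noisy-max
    with Laplace noise, then take a noisily perturbed step on it.
    lam_select / lam_step stand in for noise scales that would come from
    the privacy budget and sensitivity analysis."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    for _ in range(n_iters):
        g = grad_fn(w)
        noisy_mag = np.abs(g) + rng.laplace(scale=lam_select, size=g.shape)
        j = int(noisy_mag.argmax())                          # private selection
        w[j] -= step * (g[j] + rng.laplace(scale=lam_step))  # private update
    return w

# e.g. least squares: grad_fn = lambda w: A.T @ (A @ w - b) / len(b)
```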

Simulation-based Bayesian inference (SBI) can be used to estimate the parameters of complex mechanistic models given observed model outputs, without requiring access to explicit likelihood evaluations. A prime example of the application of SBI in neuroscience is estimating the parameters governing the response dynamics of Hodgkin-Huxley (HH) models from electrophysiological measurements, by inferring a posterior over the parameters that is consistent with a set of observations. To this end, many SBI methods employ a set of summary statistics or scientifically interpretable features to estimate a surrogate likelihood or posterior. However, there is currently no way to identify how much each summary statistic or feature contributes to reducing posterior uncertainty. To address this challenge, one could simply compare the posteriors with and without a given feature included in the inference process. However, for large or nested feature sets, this would necessitate repeatedly estimating the posterior, which is computationally expensive or even prohibitive. Here, we provide a more efficient approach based on the SBI method neural likelihood estimation (NLE): we show that one can marginalize the trained surrogate likelihood post hoc, before inferring the posterior, to assess the contribution of a feature. We demonstrate the usefulness of our method by identifying the most important features for inferring the parameters of an example HH neuron model. Beyond neuroscience, our method is generally applicable to SBI workflows in other scientific fields that rely on data features for inference.
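As a concrete special case, if the surrogate likelihood over summary features were Gaussian, post-hoc marginalization would reduce to deleting entries of the mean and covariance. The sketch below assumes this Gaussian form purely for illustration; NLE typically uses normalizing flows, for which marginals would instead be approximated, e.g., by Monte Carlo.

```python
import numpy as np

def gaussian_marginal_loglik(mu, cov, x, keep):
    """Post-hoc marginalization of a Gaussian surrogate likelihood
    q(s | theta) = N(mu, cov) over summary features: dropping a feature
    amounts to deleting its entries from the mean and covariance.
    keep: indices of the retained features."""
    mu_k, cov_k = mu[keep], cov[np.ix_(keep, keep)]
    diff = x[keep] - mu_k
    _, logdet = np.linalg.slogdet(cov_k)
    k = len(keep)
    return -0.5 * (k * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov_k, diff))
```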

Variational Bayes methods are a scalable estimation approach for many complex state space models. However, existing methods exhibit a trade-off between accurate estimation and computational efficiency. This paper proposes a variational approximation that mitigates this trade-off. This approximation is based on importance densities that have been proposed in the context of efficient importance sampling. By directly conditioning on the observed data, the proposed method produces an accurate approximation to the exact posterior distribution. Because the steps required for its calibration are computationally efficient, the approach is faster than existing variational Bayes methods. The proposed method can be applied to any state space model that has a closed-form measurement density function and a state transition distribution that belongs to the exponential family of distributions. We illustrate the method in numerical experiments with stochastic volatility models and a macroeconomic empirical application using a high-dimensional state space model.
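For intuition, the sketch below writes down a Monte Carlo ELBO for a toy stochastic-volatility model under a generic Gaussian approximation; it illustrates the kind of objective such variational methods optimize, not the paper's data-conditioned importance-density calibration.

```python
import numpy as np

def elbo_sv(y, mu, L, phi, sigma_eta, n_mc=200, seed=0):
    """Monte Carlo ELBO (up to additive constants) for a toy
    stochastic-volatility model y_t ~ N(0, exp(h_t)) with AR(1)
    log-volatilities, under a Gaussian approximation q(h) = N(mu, L L').
    Maximizing this over (mu, L) is the generic variational objective."""
    rng = np.random.default_rng(seed)
    T = len(y)
    total = 0.0
    for _ in range(n_mc):
        z = rng.standard_normal(T)
        h = mu + L @ z                                   # reparameterized draw
        log_lik = -0.5 * np.sum(h + y**2 * np.exp(-h))   # measurement density
        log_prior = -0.5 * np.sum((h[1:] - phi * h[:-1])**2) / sigma_eta**2
        log_q = -0.5 * z @ z - np.sum(np.log(np.abs(np.diag(L))))
        total += log_lik + log_prior - log_q
    return total / n_mc
```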

Variable selection is crucial for sparse modeling in this age of big data. Missing values are common in data and make variable selection more complicated. Multiple imputation (MI) produces multiply imputed datasets for the missing values and has been widely applied in various variable selection procedures. However, directly performing variable selection on the whole MI data or on bootstrapped MI data may not be worthwhile in terms of computational cost. To quickly identify the active variables in the linear regression model, we propose an adaptive grafting procedure with three pooling rules on MI data. The proposed methods proceed iteratively, starting by finding the active variables on the complete-case subset and then expanding the working data matrix in both the active variables and the available observations. A comprehensive simulation study shows the selection accuracy in different respects and the computational efficiency of the proposed methods. Two real-life examples illustrate the strength of the proposed methods.
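A schematic sketch of one grafting step on multiply imputed data, using a simple mean pooling rule as a stand-in for the paper's three rules; the interface and the pooling choice are illustrative assumptions.

```python
import numpy as np

def grafting_step(X_list, y_list, active):
    """One schematic grafting step on multiply imputed data: fit OLS on
    the current active set within each imputed dataset, score inactive
    variables by their gradient magnitude |X_j' r| / n, pool the scores
    across imputations by averaging, and graft the top variable."""
    scores = []
    for X, y in zip(X_list, y_list):
        if active:
            Xa = X[:, active]
            beta = np.linalg.lstsq(Xa, y, rcond=None)[0]
            r = y - Xa @ beta
        else:
            r = y
        scores.append(np.abs(X.T @ r) / len(y))
    pooled = np.mean(scores, axis=0)   # "mean" pooling rule
    pooled[active] = -np.inf           # never re-graft active variables
    return int(pooled.argmax())
```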

High-dimensional matrix-variate time series data are becoming widely available in many scientific fields, such as economics, biology, and meteorology. To achieve significant dimension reduction while preserving the intrinsic matrix structure and temporal dynamics in such data, Wang et al. (2017) proposed a matrix factor model that is shown to provide effective analysis. In this paper, we establish a general framework for incorporating domain or prior knowledge into the matrix factor model through linear constraints. The proposed framework is shown to be useful in achieving parsimonious parameterization, facilitating interpretation of the latent matrix factors, and identifying specific factors of interest. Fully utilizing the prior-knowledge-induced constraints results in more efficient and accurate modeling, inference, and dimension reduction, as well as a clearer and better interpretation of the results. Constrained, multi-term, and partially constrained factor models for matrix-variate time series are developed, together with efficient estimation procedures and their asymptotic properties. We show that the convergence rates of the constrained factor loading matrices are much faster than those of conventional matrix factor analysis in many situations. Simulation studies demonstrate the finite-sample performance of the proposed methods and corroborate the asymptotic theory. We illustrate the proposed models with three applications, in which the constrained matrix-factor models outperform their unconstrained counterparts in the power of variance explanation under an out-of-sample 10-fold cross-validation setting.
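To sketch how a known linear constraint reduces the problem, suppose the row loading matrix satisfies $R = H\Gamma$ for a known $p_1 \times m$ constraint matrix $H$; projecting the data onto the column space of $H$ shrinks the eigen-problem from dimension $p_1$ to $m$. The code below is an illustration of this idea under those assumptions, not the paper's exact estimator.

```python
import numpy as np

def constrained_row_loadings(X, H, k):
    """Estimate row loadings R = H @ Gamma in a matrix factor model
    X_t = R F_t C' + E_t with a known constraint matrix H (p1 x m):
    project each observation onto col(H), solve the reduced m-dimensional
    eigen-problem, and map back. X has shape (T, p1, p2)."""
    Q, _ = np.linalg.qr(H)                     # orthonormal basis of col(H)
    Xp = np.einsum("mi,tij->tmj", Q.T, X)      # projected observations
    M = sum(Xt @ Xt.T for Xt in Xp) / X.shape[0]
    _, vecs = np.linalg.eigh(M)
    Gamma = vecs[:, -k:]                       # top-k reduced-space loadings
    return Q @ Gamma                           # back in the original space
```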

This paper considers the estimation and inference of the low-rank components in high-dimensional matrix-variate factor models, where each dimension of the matrix-variates ($p \times q$) is comparable to or greater than the number of observations ($T$). We propose an estimation method called $\alpha$-PCA that preserves the matrix structure and aggregates the mean and contemporary covariance through a hyper-parameter $\alpha$. We develop an inferential theory, establishing consistency, the rate of convergence, and the limiting distributions, under general conditions that allow for correlations across time, rows, or columns of the noise. We provide both theoretical and empirical guidance for choosing the best $\alpha$, depending on the use-case criteria. Simulation results demonstrate the adequacy of the asymptotic results in approximating the finite-sample properties. The $\alpha$-PCA compares favorably with existing methods. Finally, we illustrate its applications with a real numerical dataset and two real image datasets. In all applications, the proposed estimation procedure outperforms previous methods in the power of variance explanation using out-of-sample 10-fold cross-validation.
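As a sketch of the aggregation idea, the following combines the sample mean and the contemporary covariance of the matrix observations through $\alpha$ before a top-$k$ eigendecomposition; the exact weighting and normalization used by the paper's $\alpha$-PCA may differ from this illustrative choice.

```python
import numpy as np

def alpha_pca_row_loadings(X, k, alpha=0.0):
    """Alpha-weighted blend of the sample mean and the contemporary
    covariance of matrix observations, followed by a top-k
    eigendecomposition to estimate row loadings. X has shape (T, p, q)."""
    T, p, q = X.shape
    Xbar = X.mean(axis=0)
    M = (1 + alpha) * (Xbar @ Xbar.T)
    M += sum((Xt - Xbar) @ (Xt - Xbar).T for Xt in X) / T
    M /= p * q
    _, vecs = np.linalg.eigh(M)
    return np.sqrt(p) * vecs[:, -k:]   # conventional sqrt(p) normalization
```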

This paper investigates the applicability of a recently proposed nonlinear sparse Bayesian learning (NSBL) algorithm for identifying and estimating the complex aerodynamics of limit cycle oscillations. NSBL provides a semi-analytical framework for determining the data-optimal sparse model nested within a (potentially) over-parameterized model. This is particularly relevant to nonlinear dynamical systems whose modelling involves both physics-based and data-driven components. In such cases, the data-driven components, where analytical descriptions of the physical processes are not readily available, are often prone to overfitting: the empirical aspects of these models frequently involve the calibration of an unnecessarily large number of parameters. While such a model may fit the data well, this becomes an issue when it is used for predictions in regimes different from those in which the data were recorded. In view of this, it is desirable not only to calibrate the model parameters, but also to identify the optimal compromise between data fit and model complexity. In this paper, this is achieved for an aeroelastic system where the structural dynamics are well known and described by a differential equation model, coupled with a semi-empirical aerodynamic model for laminar separation flutter that produces low-amplitude limit cycle oscillations. To illustrate the benefit of the algorithm, we use synthetic data to demonstrate its ability to correctly identify the optimal model and model parameters, given a known data-generating model. The synthetic data are generated from a forward simulation of a known differential equation model, with parameters selected to mimic the dynamics observed in wind-tunnel experiments.
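For intuition about the sparsity mechanism, the sketch below implements the classic ARD / sparse Bayesian learning updates for a linear-in-parameters model; NSBL extends this evidence-maximization idea semi-analytically to nonlinear models, so this should be read purely as the linear special case, not the paper's algorithm.

```python
import numpy as np

def ard_sbl(Phi, y, noise_var, n_iter=50):
    """Classic ARD / sparse Bayesian learning for a linear model
    y = Phi @ w + noise. Each weight w_j carries its own prior precision
    alpha_j; evidence maximization drives irrelevant alpha_j to infinity,
    effectively pruning those parameters from the model."""
    p = Phi.shape[1]
    alpha = np.ones(p)
    for _ in range(n_iter):
        # posterior over weights given the current hyperparameters
        S = np.linalg.inv(np.diag(alpha) + Phi.T @ Phi / noise_var)
        m = S @ Phi.T @ y / noise_var
        gamma = 1.0 - alpha * np.diag(S)   # well-determined parameter count
        alpha = np.minimum(gamma / (m**2 + 1e-12), 1e12)  # MacKay update
    return m, alpha  # large alpha_j flags a prunable parameter
```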

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
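Because the abstract gives the effective-number formula explicitly, the re-weighting scheme is straightforward to sketch; normalizing the weights to sum to the number of classes is a common convention assumed here for illustration.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to the inverse effective number of
    samples E_n = (1 - beta**n) / (1 - beta), normalized so the weights
    sum to the number of classes."""
    counts = np.asarray(counts, dtype=float)
    effective = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / effective
    return w * len(counts) / w.sum()

# Example: a long-tailed three-class problem
print(class_balanced_weights([5000, 500, 50], beta=0.999))
```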
