
We study the dynamics of matrix-valued time series with observed network structures by proposing a matrix network autoregression model with row and column networks of the subjects. We incorporate covariate information and a low rank intercept matrix. We allow incomplete observations in the matrices and the missing mechanism can be covariate dependent. To estimate the model, a two-step estimation procedure is proposed. The first step aims to estimate the network autoregression coefficients, and the second step aims to estimate the regression parameters, which are matrices themselves. Theoretically, we first separately establish the asymptotic properties of the autoregression coefficients and the error bounds of the regression parameters. Subsequently, a bias reduction procedure is proposed to reduce the asymptotic bias and the theoretical property of the debiased estimator is studied. Lastly, we illustrate the usefulness of the proposed method through a number of numerical studies and an analysis of a Yelp data set.

Related content

Estimation / Estimator


Sample · Analysis · Consistency · Validity · Demonstration

March 28, 2023

Exploring the validity of the complete case analysis for regression models with a right-censored covariate

Marissa C. Ashner, Tanya P. Garcia

Despite its drawbacks, the complete case analysis is commonly used in regression models with missing covariates. Understanding when implementing complete cases will lead to consistent parameter estimation is vital before use. Here, our aim is to demonstrate when a complete case analysis is appropriate for a nuanced type of missing covariate, the randomly right-censored covariate. Across the censored covariate literature, different assumptions are made to ensure a complete case analysis produces a consistent estimator, which leads to confusion in practice. We make several contributions to dispel this confusion. First, we summarize the language surrounding the assumptions that lead to a consistent complete case estimator. Then, we show a unidirectional hierarchical relationship between these assumptions, which leads us to one sufficient assumption to consider before using a complete case analysis. Lastly, we conduct a simulation study to illustrate the performance of a complete case analysis with a right-censored covariate under different censoring mechanism assumptions, and we demonstrate its use with a Huntington disease data example.
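
To make the setting concrete, here is a minimal simulation sketch of a complete case analysis with a randomly right-censored covariate. It is not the authors' simulation study: the exponential distributions, the linear outcome model, and the function name `complete_case_fit` are assumptions chosen for illustration, and the censoring variable is drawn independently of both the covariate and the outcome, which is only one of the mechanisms the abstract alludes to.

```python
import numpy as np

def complete_case_fit(n=20000, beta0=1.0, beta1=2.0, seed=0):
    """Fit OLS using only the rows whose covariate is uncensored."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0, size=n)   # true covariate (assumed distribution)
    c = rng.exponential(scale=2.0, size=n)   # censoring variable, independent of (x, y)
    y = beta0 + beta1 * x + rng.normal(size=n)
    observed = x <= c                        # covariate is fully observed on these rows
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef, observed.mean()

coef, fraction_complete = complete_case_fit()
print("intercept, slope:", coef, "fraction of complete cases:", fraction_complete)
```

Under this independent censoring assumption, the estimated slope should land close to the true value of 2 even though roughly a third of the rows are discarded.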

Sparsity · Dictionary learning · Polynomial time · Overcomplete · Spectral methods

March 27, 2023

Dictionary Learning for the Almost-Linear Sparsity Regime

Alexei Novikov, Stephen White

Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which require super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on a family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an "oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the authors' knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.

Noise · Generalized · Linear regression · Heteroscedasticity · Bandits

March 27, 2023

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

from arxiv, 27 pages, 3 figures. In this updated version, we have changed the paper title, added new theoretical results on the FTRL algorithm and mainly focused on stochastic online regression. Refer to arXiv:2202.13603v1 for the previous version, which contains more results on heteroscedastic nonlinear bandits

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise. We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise. More specifically, for $\sigma$-sub-Gaussian label noise, our analysis provides a regret upper bound of $O(\sigma^2 d \log T) + o(\log T)$, where $d$ is the dimension of the input vector and $T$ is the total number of rounds. We also prove an $\Omega(\sigma^2d\log(T/d))$ lower bound for stochastic online linear regression, which indicates that our upper bound is nearly optimal. In addition, we extend our analysis to a more refined Bernstein noise condition. As an application, we study generalized linear bandits with heteroscedastic noise and propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
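
As a rough illustration of FTRL in the simplest case, the sketch below runs it for online linear regression with squared loss and an L2 regularizer, where each round's predictor is the ridge solution on all earlier rounds. The identity link, the regularization weight `lam`, the Gaussian data, and the function name are assumptions for illustration; this is not the paper's analysis or its bandit algorithm.

```python
import numpy as np

def ftrl_online_regression(xs, ys, lam=1.0):
    """Predict y_t with the minimizer of the regularized cumulative squared loss so far."""
    T, d = xs.shape
    A = lam * np.eye(d)          # lam * I + sum of x_s x_s^T over past rounds
    b = np.zeros(d)              # sum of y_s x_s over past rounds
    preds = np.empty(T)
    for t in range(T):
        theta = np.linalg.solve(A, b)    # FTRL iterate (ridge solution on rounds < t)
        preds[t] = xs[t] @ theta
        A += np.outer(xs[t], xs[t])      # reveal (x_t, y_t) and update the statistics
        b += ys[t] * xs[t]
    return preds

rng = np.random.default_rng(0)
T, d, sigma = 2000, 5, 0.5
theta_star = rng.normal(size=d) / np.sqrt(d)
xs = rng.normal(size=(T, d)) / np.sqrt(d)
ys = xs @ theta_star + sigma * rng.normal(size=T)   # Gaussian (hence sub-Gaussian) label noise

preds = ftrl_online_regression(xs, ys)
# Regret measured against the true parameter, a simplification of the usual comparator.
regret = np.sum((preds - ys) ** 2) - np.sum((xs @ theta_star - ys) ** 2)
print("regret:", regret, "reference scale sigma^2 * d * log(T):", sigma**2 * d * np.log(T))
```

The printed reference scale is only a sanity check on the order of magnitude; constants and lower-order terms in the stated bound are not reproduced here.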

Autoregressive model · High-dimensional · Statistics · High power · Inference

March 27, 2023

Discovering the Network Granger Causality in Large Vector Autoregressive Models

Yoshimasa Uematsu, Takashi Yamagata

This paper proposes novel inferential procedures for the network Granger causality in high-dimensional vector autoregressive models. In particular, we offer two multiple testing procedures designed to control discovered networks' false discovery rate (FDR). The first procedure is based on the limiting normal distribution of the $t$-statistics constructed by the debiased lasso estimator. The second procedure is based on the bootstrap distributions of the $t$-statistics made by imposing the null hypotheses. Their theoretical properties, including FDR control and power guarantee, are investigated. The finite sample evidence suggests that both procedures can successfully control the FDR while maintaining high power. Finally, the proposed methods are applied to discovering the network Granger causality in a large number of macroeconomic variables and regional house prices in the UK.
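
For readers unfamiliar with FDR control, the following sketch turns a vector of $t$-statistics into two-sided normal p-values and applies the standard Benjamini-Hochberg step-up rule. This is a stand-in, not the procedure proposed in the paper: the abstract does not specify its thresholding rule, and the example statistics and the level `q` are made up for illustration.

```python
from math import erfc, sqrt

import numpy as np

def two_sided_pvalues(t_stats):
    """Normal-approximation p-values: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return np.array([erfc(abs(t) / sqrt(2.0)) for t in t_stats])

def benjamini_hochberg(pvals, q=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = q * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank whose p-value meets its threshold
        reject[order[: k + 1]] = True      # reject everything up to that rank
    return reject

# Hypothetical t-statistics, one per candidate edge in the network.
t_stats = [5.0, 4.2, 0.3, -0.8, 3.5, 1.1, -0.2, 0.6]
reject = benjamini_hochberg(two_sided_pvalues(t_stats), q=0.1)
print(np.nonzero(reject)[0])   # indices 0, 1 and 4 are rejected
```

In the network setting each rejected hypothesis would correspond to one discovered Granger-causal edge.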

Algorithm · Non-adaptive · Adaptive · Representation · Statement

March 26, 2023

Representation with Incomplete Votes

Daniel Halpern, Gregory Kehne, Ariel D. Procaccia, Jamie Tucker-Foltz, Manuel Wüthrich

Platforms for online civic participation rely heavily on methods for condensing thousands of comments into a relevant handful, based on whether participants agree or disagree with them. These methods should guarantee fair representation of the participants, as their outcomes may affect the health of the conversation and inform impactful downstream decisions. To that end, we draw on the literature on approval-based committee elections. Our setting is novel in that the approval votes are incomplete since participants will typically not vote on all comments. We prove that this complication renders non-adaptive algorithms impractical in terms of the amount of information they must gather. Therefore, we develop an adaptive algorithm that uses information more efficiently by presenting incoming participants with statements that appear promising based on votes by previous participants. We prove that this method satisfies commonly used notions of fair representation, even when participants only vote on a small fraction of comments. Finally, an empirical evaluation using real data shows that the proposed algorithm provides representative outcomes in practice.

Normalization · Invariance · Invariant · Shift invariance · Translation invariance

March 24, 2023

Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect

Tymon Słoczyński, S. Derya Uysal, Jeffrey M. Wooldridge

In this paper we study the finite sample and asymptotic properties of various weighting estimators of the local average treatment effect (LATE), several of which are based on Abadie's (2003) kappa theorem. Our framework presumes a binary treatment and a binary instrument, which may only be valid after conditioning on additional covariates. We argue that one of the Abadie estimators, which is weight normalized, is preferable in many contexts. Several other estimators, which are unnormalized, do not generally satisfy the properties of scale invariance with respect to the natural logarithm and translation invariance, thereby exhibiting sensitivity to the units of measurement when estimating the LATE in logs and the centering of the outcome variable more generally. On the other hand, when noncompliance is one-sided, certain unnormalized estimators have the advantage of being based on a denominator that is bounded away from zero. To reconcile these findings, we demonstrate that when the instrument propensity score is estimated using an appropriate covariate balancing approach, the resulting normalized estimator also shares this advantage. We use a simulation study and three empirical applications to illustrate our findings. In two cases, the unnormalized estimates are clearly unreasonable, with "incorrect" signs, magnitudes, or both.
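
The translation-invariance point can be checked numerically with a small sketch. The estimator below is a generic ratio of inverse-probability-weighted contrasts (reduced form over first stage), computed with either normalized or unnormalized weights. It is an illustrative stand-in, not a transcription of any specific estimator from the paper, and the simulated data, the known instrument propensity score `p`, and the function name are assumptions.

```python
import numpy as np

def late_weighting(y, d, z, p, normalized):
    """Ratio of weighted outcome contrast to weighted treatment contrast."""
    w1 = z / p                  # weights for units with instrument on
    w0 = (1 - z) / (1 - p)      # weights for units with instrument off
    if normalized:
        mean1 = lambda v: np.sum(w1 * v) / np.sum(w1)   # weights rescaled to sum to one
        mean0 = lambda v: np.sum(w0 * v) / np.sum(w0)
    else:
        mean1 = lambda v: np.mean(w1 * v)               # weights only average to one in expectation
        mean0 = lambda v: np.mean(w0 * v)
    return (mean1(y) - mean0(y)) / (mean1(d) - mean0(d))

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-0.5 * x))            # instrument propensity score, assumed known
z = (rng.random(n) < p).astype(float)
complier = rng.random(n) < 0.6
d = z * complier                              # one-sided noncompliance
y = 1.0 + 2.0 * d + x + rng.normal(size=n)

shift = 100.0                                 # recentre the outcome by a constant
change_normalized = late_weighting(y + shift, d, z, p, True) - late_weighting(y, d, z, p, True)
change_unnormalized = late_weighting(y + shift, d, z, p, False) - late_weighting(y, d, z, p, False)
print("normalized change:", change_normalized)
print("unnormalized change:", change_unnormalized)
```

Adding a constant to the outcome leaves the normalized estimate unchanged up to floating-point error, while the unnormalized estimate moves because its weights do not sum exactly to one in a finite sample.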

Activation function · Optimal · Test error · Sensitivity · Saturation

March 24, 2023

Optimal Activation Functions for the Random Features Regression Model

Jianxin Wang, José Bento

The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have been recently studied. We build on this work and identify in closed-form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve, and the dependency of its optimal regularization parameter on the observation noise level.
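
A minimal sketch of random features regression may help fix ideas: a fixed random first layer, a chosen activation function, and ridge regression on the second layer. The linear teacher, the dimensions, the ridge weight `lam`, and the three activations compared are assumptions for illustration; the code makes no attempt to reproduce the closed-form optimal activation functions derived in the paper.

```python
import numpy as np

def rfr_test_error(activation, n=300, d=50, n_features=200, lam=0.1, noise=0.1, seed=0):
    """Empirical test error of ridge regression on random features."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)                # linear teacher (assumed)
    W = rng.normal(size=(n_features, d)) / np.sqrt(d)     # random first layer, never trained

    def features(X):
        return activation(X @ W.T) / np.sqrt(n_features)

    X_train = rng.normal(size=(n, d))
    y_train = X_train @ beta + noise * rng.normal(size=n)
    X_test = rng.normal(size=(2000, d))
    y_test = X_test @ beta

    Z = features(X_train)
    # Ridge solution for the second-layer coefficients.
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(n_features), Z.T @ y_train)
    return np.mean((features(X_test) @ a - y_test) ** 2)

activations = {
    "linear": lambda u: u,
    "saturated linear": lambda u: np.clip(u, -1.0, 1.0),
    "relu": lambda u: np.maximum(u, 0.0),
}
errors = {name: rfr_test_error(f) for name, f in activations.items()}
print(errors)
```

Varying `n_features` relative to `n` in this sketch is one way to explore the double descent behaviour mentioned in the abstract.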

Discretization · Discrete · Multiplicative noise · Higher order · Lipschitz

March 24, 2023

Higher order time discretization method for a class of semilinear stochastic partial differential equations with multiplicative noise

Yukun Li, Liet Vo, Guanqian Wang

from arxiv, 28 pages, 8 figures. arXiv admin note: text overlap with arXiv:1811.05028

In this paper, we consider a new approach for semi-discretization in time and spatial discretization of a class of semi-linear stochastic partial differential equations (SPDEs) with multiplicative noise. The drift term of the SPDEs is only assumed to satisfy a one-sided Lipschitz condition and the diffusion term is assumed to be globally Lipschitz continuous. Our new strategy for time discretization is based on the Milstein method from stochastic differential equations. We use the energy method for its error analysis and show a strong convergence order of nearly $1$ for the approximate solution. The proof is based on new Hölder continuity estimates of the SPDE solution and the nonlinear term. For the general polynomial-type drift term, there are difficulties in deriving even the stability of the numerical solutions. We propose an interpolation-based finite element method for spatial discretization to overcome the difficulties. Then we obtain $H^1$ stability, higher moment $H^1$ stability, $L^2$ stability, and higher moment $L^2$ stability results using numerical and stochastic techniques. The nearly optimal convergence orders in time and space are hence obtained by coupling all previous results. Numerical experiments are presented to implement the proposed numerical scheme and to validate the theoretical results.
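
For orientation, here is the classical Milstein step for a scalar SDE $dX = a(X)\,dt + b(X)\,dW$, applied to geometric Brownian motion, where an exact solution is available for comparison. This is the textbook scheme for ordinary SDEs that the abstract refers to as its starting point, not the paper's SPDE discretization; the test equation and parameter values are assumptions for illustration.

```python
import numpy as np

def milstein_gbm(x0, mu, sigma, T, n_steps, dW):
    """Milstein scheme for dX = mu*X dt + sigma*X dW driven by the increments dW."""
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        x = (x
             + mu * x * dt                                 # drift term a(x) dt
             + sigma * x * dW[k]                           # diffusion term b(x) dW
             + 0.5 * sigma**2 * x * (dW[k] ** 2 - dt))     # Milstein correction 0.5*b*b'*(dW^2 - dt)
    return x

rng = np.random.default_rng(0)
x0, mu, sigma, T, n_steps = 1.0, 0.05, 0.4, 1.0, 1000
dW = rng.normal(scale=np.sqrt(T / n_steps), size=n_steps)

approx = milstein_gbm(x0, mu, sigma, T, n_steps, dW)
# Exact solution of geometric Brownian motion along the same Brownian path.
exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum())
print("pathwise error:", abs(approx - exact))
```

Dropping the correction term gives the Euler-Maruyama scheme, whose strong order is 1/2 rather than 1 for multiplicative noise.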

Informative clustering · Multistate · State model · ICS · Rehabilitation

March 23, 2023

Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data

Samuel Anyaso-Samuel, Somnath Datta

from arxiv, 22 pages, 4 figures, 4 tables

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in the statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of marginal functions of the multistate model, and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.
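
The inverse cluster size reweighting mentioned above can be illustrated for the simplest target, a marginal mean. Each observation is weighted by the reciprocal of its cluster size, so every cluster contributes equally regardless of how many members it has. The toy data and the function name are assumptions; the pseudo-value regression steps of the paper are not reproduced.

```python
import numpy as np

def cluster_weighted_mean(values, cluster_ids):
    """Mean with each observation weighted by 1 / (size of its cluster)."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(cluster_ids, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]           # per-observation inverse cluster size
    return np.sum(weights * values) / np.sum(weights)

# Toy data: the large cluster "a" has outcome 1, the two singleton clusters have outcome 0.
values = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
clusters = ["a", "a", "a", "a", "b", "c"]

naive = float(np.mean(values))                        # 4/6, dominated by the large cluster
adjusted = cluster_weighted_mean(values, clusters)    # 1/3, the average of the three cluster means
print(naive, adjusted)
```

When outcome and cluster size are related, as in this toy example, the naive and adjusted means answer different questions: the first describes a typical observation, the second a typical cluster.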

Networking · Residual networks · Scaling · Weight · Smoothness

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
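
The neural ODE link described above can be sketched as follows: if the layer weights are samples of a smooth function of the relative depth $k/L$ and each residual update is scaled by $1/L$, the forward pass is an Euler discretization of an ODE, and outputs stabilize as $L$ grows. The weight function, the tanh activation, and the dimensions are assumptions for illustration. The sketch shows only the regime assumed in the neural ODE literature, not the scaling behaviour of trained weights that the paper reports.

```python
import numpy as np

def resnet_output(x, depth, A, B):
    """Forward pass h_{k+1} = h_k + (1/L) * tanh(W(k/L) h_k) with smoothly varying weights."""
    h = x.copy()
    for k in range(depth):
        t = k / depth
        W = np.cos(np.pi * t) * A + np.sin(np.pi * t) * B   # weights sampled from a smooth function of t
        h = h + np.tanh(W @ h) / depth                      # residual update with 1/L scaling
    return h

rng = np.random.default_rng(0)
dim = 4
A = rng.normal(size=(dim, dim))
B = rng.normal(size=(dim, dim))
x = rng.normal(size=dim)

# Doubling the depth changes the output less and less as the network gets deeper.
gap_shallow = np.linalg.norm(resnet_output(x, 20, A, B) - resnet_output(x, 10, A, B))
gap_deep = np.linalg.norm(resnet_output(x, 400, A, B) - resnet_output(x, 200, A, B))
print("gap at depth 10 vs 20:", gap_shallow, "gap at depth 200 vs 400:", gap_deep)
```

Replacing the smooth weight function with independent random weights per layer, or changing the $1/L$ factor, is a simple way to experiment with regimes where this ODE limit need not hold.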