
We propose a distributional outcome regression (DOR) with scalar and distributional predictors. Distributional observations are represented via quantile functions, and the dependence on predictors is modelled via functional regression coefficients. DOR expands the existing literature with three key contributions: handling both scalar and distributional predictors, ensuring a jointly monotone regression structure without enforcing monotonicity on individual functional regression coefficients, and providing statistical inference for the estimated functional coefficients. Bernstein polynomial bases are employed to construct a jointly monotone regression structure without over-restricting individual functional regression coefficients to be monotone. Asymptotic projection-based joint confidence bands and a statistical test of global significance are developed to quantify uncertainty for the estimated functional regression coefficients. Simulation studies illustrate the good performance of the DOR model in accurately estimating the distributional effects. The method is applied to continuously monitored heart rate and physical activity data of 890 participants of the Baltimore Longitudinal Study of Aging. Daily heart rate reserve, quantified via a subject-specific distribution of minute-level heart rate, is modelled additively as a function of age, gender, and BMI with an adjustment for the daily distribution of minute-level physical activity counts. The findings provide novel scientific insights into the epidemiology of heart rate reserve.
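
The role of the Bernstein basis can be illustrated with a minimal sketch. It relies only on a standard property: a Bernstein polynomial whose coefficients are nondecreasing is itself nondecreasing on [0, 1], so it is a valid quantile function. This is not the DOR estimator, and it does not reproduce the joint (rather than per-coefficient) monotonicity constraint described above; the function names and the coefficient values are hypothetical.

```python
from math import comb


def bernstein_basis(p: float, degree: int) -> list[float]:
    """Values of the degree-`degree` Bernstein basis polynomials at p in [0, 1]."""
    return [comb(degree, k) * p**k * (1 - p) ** (degree - k) for k in range(degree + 1)]


def bernstein_quantile(p: float, coefs: list[float]) -> float:
    """Evaluate Q(p) = sum_k coefs[k] * b_k(p) for a Bernstein expansion."""
    basis = bernstein_basis(p, len(coefs) - 1)
    return sum(c * b for c, b in zip(coefs, basis))


# Hypothetical nondecreasing coefficients (e.g. heart rate in beats per minute).
coefs = [40.0, 55.0, 60.0, 90.0, 120.0]

# Evaluate the quantile function on a grid of probability levels.
grid = [i / 20 for i in range(21)]
values = [bernstein_quantile(p, coefs) for p in grid]

# Nondecreasing coefficients imply a nondecreasing quantile function.
is_monotone = all(a <= b for a, b in zip(values, values[1:]))
```

The endpoints of the expansion equal the first and last coefficients, which makes the coefficients easy to interpret as control points of the quantile curve.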

Related content

While power systems research relies on the availability of real-world network datasets, data owners (e.g., system operators) are hesitant to share data due to security and privacy risks. To control these risks, we develop privacy-preserving algorithms for the synthetic generation of optimization and machine learning datasets. Taking a real-world dataset as input, the algorithms output its noisy, synthetic version, which preserves the accuracy of the real data on a specific downstream model or even a large population of those. We control the privacy loss using Laplace and Exponential mechanisms of differential privacy and preserve data accuracy using a post-processing convex optimization. We apply the algorithms to generate synthetic network parameters and wind power data.
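
As a rough illustration of the Laplace mechanism mentioned above, the sketch below adds Laplace noise with scale sensitivity/epsilon to each entry of a vector. It is not the authors' algorithm: the post-processing convex optimization is omitted, the function names are hypothetical, and `sensitivity` is assumed to be the L1 sensitivity of the released vector.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(
    values: list[float], sensitivity: float, epsilon: float, rng: random.Random
) -> list[float]:
    """Return a noisy copy of `values` with Laplace noise of scale sensitivity/epsilon.

    Assumes `sensitivity` is the L1 sensitivity of the whole vector.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical wind power readings (MW); smaller epsilon means more noise.
rng = random.Random(0)
wind_power = [12.0, 15.5, 9.8]
synthetic = laplace_mechanism(wind_power, sensitivity=1.0, epsilon=0.5, rng=rng)
```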

The Sorted L-One Estimator (SLOPE) is a popular regularization method in regression, which induces clustering of the estimated coefficients. That is, the estimator can have coefficients of identical magnitude. In this paper, we derive an asymptotic distribution of SLOPE for the ordinary least squares, Huber, and Quantile loss functions, and use it to study the clustering behavior in the limit. This requires a stronger type of convergence since clustering properties do not follow merely from the classical weak convergence. We establish asymptotic control of the false discovery rate for the asymptotic orthogonal design of the regressor. We also show how to extend the framework to a broader class of regularizers other than SLOPE.
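
For reference, the SLOPE penalty is the sorted L1 norm: the largest coefficient magnitude is paired with the largest weight, the second largest with the second largest weight, and so on. The sketch below only evaluates this penalty; it is not a solver, and the function name and example values are hypothetical.

```python
def sorted_l1_norm(beta: list[float], lambdas: list[float]) -> float:
    """Evaluate J(beta) = sum_i lambdas[i] * |beta|_(i), with |beta|_(1) >= |beta|_(2) >= ...

    `lambdas` must be nonnegative and nonincreasing.
    """
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    if any(l < 0 for l in lambdas) or any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be nonnegative and nonincreasing")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lambdas, magnitudes))


# Magnitudes sort to [2.0, 1.0, 0.5], so the penalty is 3*2.0 + 2*1.0 + 1*0.5.
penalty = sorted_l1_norm([0.5, -2.0, 1.0], [3.0, 2.0, 1.0])
```

With all weights equal, the penalty reduces to a multiple of the ordinary L1 norm (the lasso penalty).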

Switch-like responses arising from bistability have been linked to cell signaling processes and memory. Revealing the shape and properties of the set of parameters that lead to bistability is necessary to understand the underlying biological mechanisms, but is a complex mathematical problem. We present an efficient approach to determine a basic topological property of the parameter region of multistationarity, namely whether it is connected or not. The connectivity of this region can be interpreted in terms of the biological mechanisms underlying bistability and the switch-like patterns that the system can create. We provide an algorithm to assert that the parameter region of multistationarity is connected, targeting reaction networks with mass-action kinetics. We show that this is the case for numerous relevant cell signaling motifs previously described to exhibit bistability. However, we show that for a motif displaying a phosphorylation cycle with allosteric enzyme regulation, the region of multistationarity has two distinct connected components, corresponding to two different, but symmetric, biological mechanisms. The method relies on linear programming and bypasses the expensive computational cost of direct and generic approaches to study parametric polynomial systems. This characteristic makes it suitable for mass-screening of reaction networks.

The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample. Such an approach has been used pervasively in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome the limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation in classical linear regression. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an $O(\mbox{nnz}(\mathbf{X})+rp^2)$ computational cost, where $\mathbf{X}$ is an $n\times p$ predictor matrix, $r$ is the number of elements selected from each column of $\mathbf{X}$, and $\mbox{nnz}(\cdot)$ denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and open-source datasets demonstrate the proposed method's superior performance compared to mainstream competitors.
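
The median-of-means procedure referred to above can be sketched in its generic form for a scalar mean: split the data into blocks, average each block, and take the median of the block averages. This is not the core-elements estimator or its robust regression variant; the function name is hypothetical, and the strided blocking assumes the data order carries no structure (a random partition is the more common choice).

```python
import statistics


def median_of_means(data: list[float], n_blocks: int) -> float:
    """Median of block means, using a simple strided partition into `n_blocks` blocks."""
    if not 1 <= n_blocks <= len(data):
        raise ValueError("n_blocks must be between 1 and len(data)")
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


# One gross outlier (1000.0) dominates the plain mean but affects only one block.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
robust = median_of_means(data, n_blocks=3)  # block means: 2.5, 3.5, 501.5
plain = statistics.fmean(data)
```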

Out-of-distribution (OOD) detection aims at enhancing standard deep neural networks to distinguish anomalous inputs from original training data. Previous progress has introduced various approaches where the in-distribution training data and even several OOD examples are prerequisites. However, due to privacy and security concerns, access to such auxiliary data tends to be impractical in real-world scenarios. In this paper, we propose a data-free method without training on natural data, called Class-Conditional Impressions Reappearing (C2IR), which utilizes image impressions from the fixed model to recover class-conditional feature statistics. Based on that, we introduce Integral Probability Metrics to estimate layer-wise class-conditional deviations and obtain layer weights by Measuring Gradient-based Importance (MGI). The experiments verify the effectiveness of our method and indicate that C2IR outperforms other post-hoc methods and reaches comparable performance to the full-access (ID and OOD) detection method, especially on the far-OOD dataset (SVHN).

Functional quantile regression (FQR) is a useful alternative to mean regression for functional data as it provides a comprehensive understanding of how scalar predictors influence the conditional distribution of functional responses. In this article, we study the FQR model for densely sampled, high-dimensional functional data without relying on parametric or independence assumptions on the residual process, with a focus on statistical inference and scalable implementation. This is achieved by a simple but powerful distributed strategy, in which we first perform separate quantile regression to compute $M$-estimators at each sampling location, and then carry out estimation and inference for the entire coefficient functions by properly exploiting the uncertainty quantification and dependence structure of the $M$-estimators. We derive a uniform Bahadur representation and a strong Gaussian approximation result for the $M$-estimators on the discrete sampling grid, serving as the basis for inference. An interpolation-based estimator with minimax optimality is proposed, and large sample properties for point and simultaneous interval estimators are established. The obtained minimax optimal rate under the FQR model shows an interesting phase transition phenomenon that has been previously observed in functional mean regression. The proposed methods are illustrated via simulations and an application to a mass spectrometry proteomics dataset.

Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both present significant practical obstacles. In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.

We consider the problem of defining and fitting models of autoregressive time series of probability distributions on a compact interval of $\mathbb{R}$. An order-$1$ autoregressive model in this context is to be understood as a Markov chain, where one specifies a certain structure (regression) for the one-step conditional Fr\'echet mean with respect to a natural probability metric. We construct and explore different models based on iterated random function systems of optimal transport maps. While the properties and interpretation of these models depend on how they relate to the iterated transport system, they can all be analyzed theoretically in a unified way. We present such a theoretical analysis, including convergence rates, and illustrate our methodology using real and simulated data. Our approach generalises or extends certain existing models of transportation-based regression and autoregression, and in doing so also provides some additional insights on existing models.
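
As background for transport-based models on an interval, the sketch below computes the 2-Wasserstein distance between two empirical distributions on the real line. It assumes that the "natural probability metric" is the Wasserstein distance, which the abstract does not state explicitly, and it assumes equal sample sizes, in which case the optimal transport map simply pairs sorted observations. The function name and data are hypothetical.

```python
from math import sqrt


def wasserstein2_empirical(x: list[float], y: list[float]) -> float:
    """2-Wasserstein distance between two equal-size empirical distributions on the line.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the root mean squared difference of the sorted samples.
    """
    if not x or len(x) != len(y):
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    return sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


# The second sample is the first shifted by 0.3, so the distance is 0.3.
x = [0.1, 0.3, 0.5]
y = [0.6, 0.4, 0.8]
distance = wasserstein2_empirical(x, y)
```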

Applying simple linear regression models, an economist analysed a published dataset from an influential annual ranking in 2016 and 2017 of consumer outlets for Dutch New Herring and concluded that the ranking was manipulated. His finding was promoted by his university in national and international media, and this led to public outrage and ensuing discontinuation of the survey. We reconstitute the dataset, correcting errors and exposing features already important in a descriptive analysis of the data. The economist has continued his investigations, and in a follow-up publication repeats the same accusations. We point out errors in his reasoning and show that alleged evidence for deliberate manipulation of the ranking could easily be an artefact of specification errors. Temporal and spatial factors are both important and complex, and their effects cannot be captured using simple models, given the small sample sizes and many factors determining perceived taste of a food product.

Gaussian graphical models typically assume a homogeneous structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach for graphical modeling which allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution by a product of the conditional distributions of each variable. We cast the conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, to enable information borrowing directly from the data instead of a hierarchical framework. This allows independent graphical modeling for each subject, while retaining the benefits of a hierarchical Bayes model and being computationally tractable. An efficient embarrassingly parallel variational algorithm is developed to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $\alpha$-R\'{e}nyi divergence. We theoretically demonstrate the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure in protein expression levels on breast cancer patients using CNV information as covariates.
