
We consider the problem of evaluating designs for a two-arm randomized experiment with general response types under both the Neyman randomization model and the population model, thereby generalizing the work of Kapelner et al. (2021). In the Neyman model, the only source of randomness is the treatment manipulation. Under this assumption, there is no free lunch: balanced complete randomization is minimax for estimator mean squared error. In the population model, under the criterion where unmeasured subject covariates are averaged over, we show that optimal perfect-balance designs are generally not achievable. However, within the class of all block designs, the pairwise matching design of Greevy et al. (2004) is optimal in mean squared error. Under a tail criterion on the effect of unmeasured subject covariates on mean squared error, we prove that the common design of blocking with a few blocks, i.e., on the order of $n^{\alpha}$ blocks for $\alpha \in (1/4,1)$, is asymptotically optimal for all common experimental responses. Theoretical results are supported by simulations.
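A minimal simulation sketch, under an illustrative data-generating process rather than the paper's setup, comparing the mean squared error of the difference-in-means estimator under balanced complete randomization and under a pairwise matching design in the spirit of Greevy et al. (2004):

```python
# Illustrative comparison of two designs under a population model with a
# single covariate driving the response; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, tau, n_sims = 100, 1.0, 2000


def simulate(design):
    x = np.sort(rng.normal(size=n))          # covariate driving the response
    y0 = x + rng.normal(scale=0.3, size=n)   # control potential outcomes
    y1 = y0 + tau                            # constant treatment effect
    if design == "complete":                 # balanced complete randomization
        t = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)])
    else:                                    # pairwise matching on x
        t = np.empty(n)
        for i in range(0, n, 2):             # adjacent (matched) pairs
            t[i:i + 2] = rng.permutation([1.0, 0.0])
    y = np.where(t == 1, y1, y0)
    return y[t == 1].mean() - y[t == 0].mean()


for design in ("complete", "matched"):
    est = np.array([simulate(design) for _ in range(n_sims)])
    print(design, "MSE:", np.mean((est - tau) ** 2))
```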

Related content

This paper studies covariate adjusted estimation of the average treatment effect (ATE) in stratified experiments. We work in the stratified randomization framework of Cytrynbaum (2021), which includes matched tuples designs (e.g. matched pairs), coarse stratification, and complete randomization as special cases. Interestingly, we show that the Lin (2013) interacted regression is generically asymptotically inefficient, with efficiency only in the edge case of complete randomization. Motivated by this finding, we derive the optimal linear covariate adjustment for a given stratified design, constructing several new estimators that achieve the minimal variance. Conceptually, we show that optimal linear adjustment of a stratified design is equivalent in large samples to doubly-robust semiparametric adjustment of an independent design. We also develop novel asymptotically exact inference for the ATE over a general family of adjusted estimators, showing in simulations that the usual Eicker-Huber-White confidence intervals can significantly overcover. Our inference methods produce shorter confidence intervals by fully accounting for the precision gains from both covariate adjustment and stratified randomization. Simulation experiments and an empirical application to the Oregon Health Insurance Experiment data (Finkelstein et al. (2012)) demonstrate the value of our proposed methods.
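As a point of reference for the inefficiency result above, a minimal sketch of the Lin (2013) fully interacted regression on simulated data; the covariates, sample size, and data-generating process are illustrative assumptions, and the paper's stratification-aware estimators are not implemented here:

```python
# The ATE estimate is the coefficient on the treatment indicator when the
# covariates are centered and interacted with treatment.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))                     # observed covariates
t = rng.binomial(1, 0.5, size=n)                # completely randomized treatment
y = 1.0 * t + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

xc = x - x.mean(axis=0)                         # center covariates
design = np.column_stack([np.ones(n), t, xc, t[:, None] * xc])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("interacted-regression ATE estimate:", coef[1])
```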

We analyze to what extent end users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called "black-box" scenario). In particular, we investigate two notions of local differential privacy (LDP), namely ε-LDP and Rényi LDP. On the one hand, we prove that, without any assumption on the underlying distributions, no algorithm can infer the level of data protection with provable guarantees; this result also holds for the central versions of the two notions of DP considered. On the other hand, we demonstrate that, under reasonable assumptions (namely, Lipschitzness of the involved densities on a closed interval), such guarantees exist and can be achieved by a simple histogram-based estimator. We validate our results experimentally and observe that, for a particularly well-behaved distribution (namely, Laplace noise), our method performs even better than expected, in the sense that in practice the number of samples needed to achieve the desired confidence is smaller than the theoretical bound, and the estimate of ε is more precise than predicted.
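A minimal sketch of a black-box, histogram-based estimate of the ε-LDP level in the Laplace-noise setting mentioned above; the bin grid, truncation interval, and handling of empty bins are illustrative simplifications rather than the paper's estimator:

```python
# Estimate epsilon as the largest absolute log-ratio of the histogram
# densities of the mechanism's outputs on two adjacent inputs.
import numpy as np

rng = np.random.default_rng(2)
b = 0.5                                    # Laplace scale; true epsilon = 1/b = 2
x0, x1 = 0.0, 1.0                          # two adjacent inputs, sensitivity 1
m = 200_000                                # samples per input from the mechanism
y0 = x0 + rng.laplace(scale=b, size=m)
y1 = x1 + rng.laplace(scale=b, size=m)

bins = np.linspace(-4.0, 5.0, 60)          # common grid on a closed interval
p0, _ = np.histogram(y0, bins=bins, density=True)
p1, _ = np.histogram(y1, bins=bins, density=True)
ok = (p0 > 0) & (p1 > 0)                   # ignore empty bins (simplification)
eps_hat = np.max(np.abs(np.log(p0[ok] / p1[ok])))
print("estimated epsilon:", eps_hat, " true epsilon:", 1.0 / b)
```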

Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem in many scientific disciplines. Posterior inference with generative models is an alternative to methods such as Markov chain Monte Carlo, both for likelihood-based and simulation-based inference. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce "distance to random point" (DRP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. Our method differs from existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is optimal. We demonstrate the method on a variety of synthetic examples and show that DRP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect non-optimal inferences in cases where existing methods fail.
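A minimal sketch of a DRP-style coverage check on a toy Gaussian problem where the exact posterior is known; the test statistic, problem, and uniformity check below are illustrative choices rather than the paper's experiments:

```python
# For each test case, draw a random reference point and record the fraction of
# posterior samples that lie closer to it than the true parameter does; if the
# samples come from the true posterior, these fractions should be uniform.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_cases, n_post, d = 500, 200, 3

f = np.empty(n_cases)
for i in range(n_cases):
    # toy problem: theta ~ N(0, I), x = theta + N(0, 0.5^2 I); the "estimator"
    # samples from the exact conjugate posterior
    theta_true = rng.normal(size=d)
    x = theta_true + rng.normal(scale=0.5, size=d)
    post_mean = x / 1.25                           # conjugate posterior mean
    post_sd = np.sqrt(0.25 / 1.25)                 # conjugate posterior sd
    samples = post_mean + post_sd * rng.normal(size=(n_post, d))
    theta_ref = rng.normal(size=d)                 # random reference point
    f[i] = np.mean(np.linalg.norm(samples - theta_ref, axis=1)
                   < np.linalg.norm(theta_true - theta_ref))

# a large p-value means no evidence of miscalibration
print(stats.kstest(f, "uniform"))
```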

Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual PC algorithm is a novel scheme for carrying out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can simultaneously perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies to Gaussian copula models and demonstrate its performance in that setting.
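A small numerical illustration of the covariance-precision duality that the dual PC algorithm exploits; the random covariance matrix and index choices are purely illustrative:

```python
# The partial correlation of (i, j) given all remaining variables can be read
# off the precision matrix, while conditioning on a smaller set uses the
# inverse of the corresponding covariance submatrix.
import numpy as np

rng = np.random.default_rng(4)
p = 5
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)            # a well-conditioned covariance
omega = np.linalg.inv(sigma)               # precision matrix

i, j = 0, 1
# full-order partial correlation (conditioning on all other variables)
rho_full = -omega[i, j] / np.sqrt(omega[i, i] * omega[j, j])

# partial correlation given a chosen subset S, via the covariance submatrix
S = [2, 3]
idx = [i, j] + S
sub_prec = np.linalg.inv(sigma[np.ix_(idx, idx)])
rho_S = -sub_prec[0, 1] / np.sqrt(sub_prec[0, 0] * sub_prec[1, 1])

print("rho_{01 | rest} =", rho_full, " rho_{01 | S} =", rho_S)
```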

Generalized correlation analysis (GCA) is concerned with uncovering linear relationships across multiple datasets. It generalizes canonical correlation analysis (CCA), which is designed for two datasets. We study sparse GCA when there are potentially multiple generalized correlation tuples in the data and the loading matrix has a small number of nonzero rows. This setting includes sparse CCA and sparse PCA of correlation matrices as special cases. We first formulate sparse GCA as generalized eigenvalue problems at both the population and sample levels via a careful choice of normalization constraints. Based on a Lagrangian form of the sample optimization problem, we propose a thresholded gradient descent algorithm for estimating GCA loading vectors and matrices in high dimensions. We derive tight estimation error bounds for estimators generated by the algorithm with proper initialization. We also demonstrate the effectiveness of the algorithm on a number of synthetic datasets.
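A schematic thresholded-gradient update for a single sparse generalized eigenvector, in the spirit of the algorithm described above but not the authors' implementation; the synthetic matrices, step size, and sparsity level are illustrative assumptions (with B equal to the identity this reduces to the sparse PCA special case):

```python
# Maximize u'Au subject to u'Bu = 1 with at most k nonzero entries, using a
# gradient step on the Rayleigh quotient followed by hard thresholding.
import numpy as np

rng = np.random.default_rng(5)
p, k, eta, n_iter = 50, 5, 0.05, 200

u_star = np.zeros(p); u_star[:k] = 1.0 / np.sqrt(k)        # sparse signal
E = rng.normal(scale=0.1, size=(p, p)); E = (E + E.T) / 2
A = 5.0 * np.outer(u_star, u_star) + E                     # noisy "sample" matrix
B = np.eye(p)

# proper (dense) initialization: leading eigenvector of the unconstrained problem
u = np.linalg.eigh(A)[1][:, -1]
for _ in range(n_iter):
    u = u + eta * (A @ u - (u @ A @ u) * (B @ u))          # Rayleigh-quotient gradient step
    keep = np.argsort(np.abs(u))[-k:]                      # hard-threshold to top-k entries
    v = np.zeros(p); v[keep] = u[keep]
    u = v / np.sqrt(v @ B @ v)                             # renormalize so that u'Bu = 1

print("recovered support:", sorted(np.flatnonzero(u)))     # ideally the first k indices
```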

Serverless computing is a popular cloud computing paradigm that frees developers from server management. Function-as-a-Service (FaaS) is the most popular implementation of serverless computing, representing applications as event-driven and stateless functions. However, existing studies report that functions of FaaS applications suffer severely from cold-start latency. In this paper, we propose FaaSLight, an approach that accelerates cold starts for FaaS applications through application-level optimization. We first conduct a measurement study to investigate the possible root causes of the FaaS cold-start problem. The results show that application code loading latency is a significant overhead. Therefore, loading only indispensable code from FaaS applications can be an effective solution. Based on this insight, we identify code related to application functionalities by constructing a function-level call graph and separate the remaining code (i.e., optional code) from FaaS applications. The separated optional code can be loaded on demand so that inaccurate identification of indispensable code does not cause application failures. A key principle guiding the design of FaaSLight is that it is inherently general, i.e., platform- and language-agnostic. Evaluation results on real-world FaaS applications show that FaaSLight significantly reduces code loading latency (by up to 78.95%, 28.78% on average), thereby reducing cold-start latency. As a result, the total response latency of functions can be decreased by up to 42.05% (19.21% on average). Compared with the state of the art, FaaSLight achieves a 21.25x improvement in reducing the average total response latency.
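A minimal sketch of the call-graph idea on a toy Python module, using the standard ast module; real FaaS applications span many files and languages, and FaaSLight's actual analysis is not reproduced here:

```python
# Build a function-level call graph for one module and mark every function
# unreachable from the handler as optional (a candidate for on-demand loading).
# Only direct calls by name are handled in this toy.
import ast
from collections import defaultdict

source = """
def handler(event, context):
    return transform(event)

def transform(event):
    return normalize(event)

def normalize(event):
    return event

def rarely_used_report(event):
    return transform(event)
"""

tree = ast.parse(source)
funcs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]

# edges of the function-level call graph
calls = defaultdict(set)
for func in funcs:
    for node in ast.walk(func):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls[func.name].add(node.func.id)

# functions reachable from the entry point are treated as indispensable
reachable, stack = set(), ["handler"]
while stack:
    f = stack.pop()
    if f not in reachable:
        reachable.add(f)
        stack.extend(calls[f])

optional = {f.name for f in funcs} - reachable
print("indispensable:", sorted(reachable), "optional:", sorted(optional))
```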

Randomized controlled trials (RCTs) are increasingly prevalent in education research and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of randomization largely justifies the statistical assumptions made. However, RCT sample sizes are often small, leading to low precision; in many cases RCT estimates may be too imprecise to guide policy or inform science. Observational studies, by contrast, have strengths and weaknesses complementary to those of RCTs. Observational studies typically offer much larger sample sizes but may suffer from confounding. In many contexts, experimental and observational data exist side by side, allowing the possibility of integrating "big observational data" with "small but high-quality experimental data" to get the best of both. Such approaches hold particular promise in the field of education, where RCT sample sizes are often small due to cost constraints, but automatic collection of observational data, such as in computerized educational technology applications or in state longitudinal data systems (SLDS) with administrative data on hundreds of thousands of students, has made rich, high-dimensional observational data widely available. We outline an approach that employs machine learning algorithms to learn from the observational data and uses the resulting models to improve precision in randomized experiments. Importantly, there is no requirement that the machine learning models be "correct" in any sense, and the final experimental results are guaranteed to be exactly unbiased. Thus, there is no danger of confounding biases in the observational data leaking into the experiment.
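A minimal sketch of the general idea: a model fit only on external observational data supplies predictions, and the randomized experiment is analyzed on the residuals, which remains design-unbiased regardless of model quality while a good model shrinks the variance. The model choice, data, and sample sizes below are illustrative assumptions, not the authors' estimator:

```python
# Fit a predictor on (possibly confounded) observational data, then compare
# the plain and residual-based difference-in-means within the RCT.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

# "big" observational data, used only to learn yhat(x)
n_obs = 5000
x_obs = rng.normal(size=(n_obs, 3))
y_obs = x_obs @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n_obs)
model = GradientBoostingRegressor().fit(x_obs, y_obs)

# small randomized experiment with true ATE = 1
n, tau = 200, 1.0
x = rng.normal(size=(n, 3))
t = rng.binomial(1, 0.5, size=n)
y = tau * t + x @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

resid = y - model.predict(x)                      # remove predictable variation
ate_plain = y[t == 1].mean() - y[t == 0].mean()
ate_adj = resid[t == 1].mean() - resid[t == 0].mean()
print("unadjusted:", ate_plain, " observational-model adjusted:", ate_adj)
```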

Physics-informed neural networks (PINNs) have proven to be an efficient tool for representing problems in which measured data are available and the dynamics are expected to follow physical laws. In this paper, we suggest a multiobjective perspective on the training of PINNs by treating the data loss and the residual loss as two individual objective functions in a truly biobjective optimization approach. As a showcase example, we consider COVID-19 predictions in Germany and build an extended susceptible-infected-recovered (SIR) model with additional compartments for leaky-vaccinated and hospitalized populations (an SVIHR model) to model the transition rates and to predict future infections. SIR-type models are expressed as systems of ordinary differential equations (ODEs). We investigate the suitability of the generated PINN for COVID-19 predictions and compare the resulting predicted curves with those obtained by applying the method of non-standard finite differences to the system of ODEs and initial data. The approach is applicable to various systems of ODEs that define dynamical regimes; these regimes need not be SIR-type models, and the corresponding underlying data sets need not be associated with COVID-19.
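A compact PINN sketch for a plain SIR system rather than the full SVIHR model, with the two objectives combined by a simple weighted sum, i.e. one point on the Pareto front rather than the truly biobjective treatment of the paper; the network size, transition rates, and placeholder "measurements" are illustrative assumptions:

```python
# A small network maps t to (S, I, R); the loss combines the data mismatch and
# the SIR ODE residual evaluated at collocation points via autograd.
import torch

torch.manual_seed(0)
beta, gamma = 0.3, 0.1                      # illustrative transition rates

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 3), torch.nn.Softplus())                # S, I, R >= 0

t_data = torch.linspace(0, 50, 20).reshape(-1, 1)
sir_data = torch.tensor([[0.9, 0.1, 0.0]]).repeat(20, 1)        # placeholder "measurements"
t_col = torch.linspace(0, 50, 200).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    data_loss = ((net(t_data) - sir_data) ** 2).mean()          # objective 1: fit data

    sir = net(t_col)
    S, I, R = sir[:, 0:1], sir[:, 1:2], sir[:, 2:3]
    dS = torch.autograd.grad(S.sum(), t_col, create_graph=True)[0]
    dI = torch.autograd.grad(I.sum(), t_col, create_graph=True)[0]
    dR = torch.autograd.grad(R.sum(), t_col, create_graph=True)[0]
    res = ((dS + beta * S * I) ** 2 + (dI - beta * S * I + gamma * I) ** 2
           + (dR - gamma * I) ** 2).mean()                      # objective 2: ODE residual

    loss = data_loss + 0.1 * res                                # weighted-sum scalarization
    loss.backward()
    opt.step()
```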

The reconfigurable intelligent surface (RIS) is an emerging technology that changes how wireless networks are perceived; its potential benefits and applications are therefore under intense research and investigation. In this letter, we focus on electromagnetically consistent models for RISs, starting from a recently proposed model based on mutually coupled loaded wire dipoles. While existing related research focuses on free-space wireless channels or ignores the interactions between the RIS and the scattering objects present in the propagation environment, we introduce an RIS-aided channel model that is applicable to more realistic scenarios, in which the scattering objects are themselves modeled as loaded wire dipoles. By adjusting the parameters of the wire dipoles and the loads, the properties of general natural and engineered material objects can be modeled. We introduce a provably convergent and efficient iterative algorithm that jointly optimizes the RIS and base station configurations to maximize the system sum-rate. Extensive numerical results show the net performance improvement provided by the proposed method compared with existing optimization algorithms.
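A toy sketch of the kind of joint RIS and base-station optimization described above, using a simplified cascaded Rayleigh-fading channel (not the mutually coupled wire-dipole model of the paper), zero-forcing precoding at the base station, and coordinate descent over a small discrete phase codebook at the RIS; all dimensions and constants are illustrative assumptions:

```python
# Sum-rate of a K-user MISO downlink aided by an N-element RIS; the precoder
# is recomputed for each candidate RIS configuration.
import numpy as np

rng = np.random.default_rng(8)
M, N, K, sigma2, P = 8, 32, 4, 1e-3, 1.0      # BS antennas, RIS elements, users

H_d = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2) * 0.1
G = (rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))) / np.sqrt(2)
H_r = (rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))) / np.sqrt(2)


def sum_rate(phases):
    H = H_d + (H_r * np.exp(1j * phases)) @ G          # effective K x M channel
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)     # zero-forcing precoder
    W *= np.sqrt(P / K) / np.linalg.norm(W, axis=0)    # per-user power budget
    sig = np.abs(np.diag(H @ W)) ** 2
    interf = np.sum(np.abs(H @ W) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (interf + sigma2)))


codebook = np.linspace(0, 2 * np.pi, 8, endpoint=False)
phases = rng.choice(codebook, size=N)
for sweep in range(3):                                  # coordinate-descent sweeps
    for n in range(N):
        trials = [phases.copy() for _ in codebook]
        for trial, c in zip(trials, codebook):
            trial[n] = c
        phases = max(trials, key=sum_rate)              # best phase for element n
    print("sweep", sweep, "sum-rate:", sum_rate(phases))
```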

Driver stress is a major cause of car accidents and deaths worldwide. Moreover, persistent stress is a health problem, contributing to hypertension and other diseases of the cardiovascular system. Stress has a measurable impact on heart and breathing rates, and stress levels can be inferred from such measurements. Galvanic skin response (GSR) is a common test that measures the perspiration caused by both physiological and psychological stress, as well as extreme emotions. In this paper, galvanic skin response is used to estimate ground-truth stress levels. A feature selection technique based on the minimal-redundancy maximal-relevance (mRMR) method is then applied to multiple heart rate variability and breathing rate metrics to identify a novel and optimal combination for use in detecting stress. A support vector machine with a radial basis function kernel is used with these features to reliably predict stress. The proposed method achieves a high level of accuracy on the target dataset.
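A minimal sketch of the pipeline: a greedy mRMR-style selection (relevance via mutual information with the label, redundancy via mean absolute correlation with already selected features) followed by an RBF-kernel SVM; the synthetic data and feature counts stand in for the HRV and breathing-rate features and GSR-derived labels described above:

```python
# Greedy mRMR-style feature selection followed by an RBF-kernel SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))

selected = [int(np.argmax(relevance))]
while len(selected) < 6:                       # pick 6 features greedily
    candidates = [j for j in range(X.shape[1]) if j not in selected]
    scores = [relevance[j] - corr[j, selected].mean() for j in candidates]
    selected.append(candidates[int(np.argmax(scores))])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
print("selected features:", selected)
print("CV accuracy:", cross_val_score(clf, X[:, selected], y, cv=5).mean())
```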
