亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Multi-competitor races often feature complicated within-race strategies that are difficult to capture when training data on race outcome level data. Further, models which do not account for such strategic effects may suffer from confounded inferences and predictions. In this work we develop a general generative model for multi-competitor races which allows analysts to explicitly model certain strategic effects such as changing lanes or drafting and separate these impacts from competitor ability. The generative model allows one to simulate full races from any real or created starting position which opens new avenues for attributing value to within-race actions and to perform counter-factual analyses. This methodology is sufficiently general to apply to any track based multi-competitor races where both tracking data is available and competitor movement is well described by simultaneous forward and lateral movements. We apply this methodology to one-mile horse races using data provided by the New York Racing Association (NYRA) and the New York Thoroughbred Horsemen's Association (NYTHA) for the Big Data Derby 2022 Kaggle Competition. This data features granular tracking data for all horses at the frame-level (occurring at approximately 4hz). We demonstrate how this model can yield new inferences, such as the estimation of horse-specific speed profiles which vary over phases of the race, and examples of posterior predictive counterfactual simulations to answer questions of interest such as starting lane impacts on race outcomes.

相關內容

ACM/IEEE第23屆模型驅動工程語言和系統國際會議,是模型驅動軟件和系統工程的首要會議系列,由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來,模型涵蓋了建模的各個方面,從語言和方法到工具和應用程序。模特的參加者來自不同的背景,包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇,參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會,并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。 官網鏈接: · 未標記 · 穩健性 · 估計/估計量 · 線性回歸 ·
2023 年 11 月 29 日

In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. This paper challenges this notion, demonstrating its inaccuracy in high dimensions. Initially focusing on a dense scenario, we introduce robust semi-supervised estimators for the regression coefficient without relying on sparse structures in the population slope. Even when the true underlying model is linear, we show that leveraging information from large-scale unlabeled data improves both estimation accuracy and inference robustness. Moreover, we propose semi-supervised methods with further enhanced efficiency in scenarios with a sparse linear slope. Diverging from the standard semi-supervised literature, we also allow for covariate shift. The performance of the proposed methods is illustrated through extensive numerical studies, including simulations and a real-data application to the AIDS Clinical Trials Group Protocol 175 (ACTG175).

Treatment-covariate interaction tests are commonly applied by researchers to examine whether the treatment effect varies across patient subgroups defined by baseline characteristics. The objective of this study is to explore treatment-covariate interaction tests involving covariate-adaptive randomization. Without assuming a parametric data generation model, we investigate usual interaction tests and observe that they tend to be conservative: specifically, their limiting rejection probabilities under the null hypothesis do not exceed the nominal level and are typically strictly lower than it. To address this problem, we propose modifications to the usual tests to obtain corresponding exact tests. Moreover, we introduce a novel class of stratified-adjusted interaction tests that are simple, broadly applicable, and more powerful than the usual and modified tests. Our findings are relevant to two types of interaction tests: one involving stratification covariates and the other involving additional covariates that are not used for randomization.

Missing data is a common problem in practical settings. Various imputation methods have been developed to deal with missing data. However, even though the label is usually available in the training data, the common practice of imputation usually only relies on the input and ignores the label. In this work, we illustrate how stacking the label into the input can significantly improve the imputation of the input. In addition, we propose a classification strategy that initializes the predicted test label with missing values and stacks the label with the input for imputation. This allows imputing the label and the input at the same time. Also, the technique is capable of handling data training with missing labels without any prior imputation and is applicable to continuous, categorical, or mixed-type data. Experiments show promising results in terms of accuracy.

Modern high-throughput sequencing assays efficiently capture not only gene expression and different levels of gene regulation but also a multitude of genome variants. Focused analysis of alternative alleles of variable sites at homologous chromosomes of the human genome reveals allele-specific gene expression and allele-specific gene regulation by assessing allelic imbalance of read counts at individual sites. Here we formally describe an advanced statistical framework for detecting the allelic imbalance in allelic read counts at single-nucleotide variants detected in diverse omics studies (ChIP-Seq, ATAC-Seq, DNase-Seq, CAGE-Seq, and others). MIXALIME accounts for copy-number variants and aneuploidy, reference read mapping bias, and provides several scoring models to balance between sensitivity and specificity when scoring data with varying levels of experimental noise-caused overdispersion.

Linear regression and classification methods with repeated functional data are considered. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions. Two regression methods based on fusion penalties are presented. The first one is a generalization of the variable fusion methodology based on the 1-nearest neighbor. The second one, called group fusion lasso, assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. A finite sample numerical simulation and an application on EEG data are presented.

We construct an efficient class of increasingly high-order (up to 17th-order) essentially non-oscillatory schemes with multi-resolution (ENO-MR) for solving hyperbolic conservation laws. The candidate stencils for constructing ENO-MR schemes range from first-order one-point stencil increasingly up to the designed very high-order stencil. The proposed ENO-MR schemes adopt a very simple and efficient strategy that only requires the computation of the highest-order derivatives of a part of candidate stencils. Besides simplicity and high efficiency, ENO-MR schemes are completely parameter-free and essentially scale-invariant. Theoretical analysis and numerical computations show that ENO-MR schemes achieve designed high-order convergence in smooth regions which may contain high-order critical points (local extrema) and retain ENO property for strong shocks. In addition, ENO-MR schemes could capture complex flow structures very well.

Adjustment of statistical significance levels for repeated analysis in group sequential trials has been understood for some time. Similarly, methods for adjustment accounting for testing multiple hypotheses are common. There is limited research on simultaneously adjusting for both multiple hypothesis testing and multiple analyses of one or more hypotheses. We address this gap by proposing adjusted-sequential p-values that reject an elementary hypothesis when its adjusted-sequential p-values are less than or equal to the family-wise Type I error rate (FWER) in a group sequential design. We also propose sequential p-values for intersection hypotheses as a tool to compute adjusted sequential p-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests, comparing adjusted sequential p-values to a desired FWER for inference on each elementary hypothesis tested.

Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In this paper, we propose a semi-supervised context whereby a large amount of \textit{unconstrained} data is available alongside a smaller set of constraints, and propose \textit{ConstraintMatch} to leverage such unconstrained data. While a great deal of progress has been made in semi-supervised learning using full labels, there are a number of challenges that prevent a naive application of the resulting methods in the constraint-based label setting. Therefore, we reason about and analyze these challenges, specifically 1) proposing a \textit{pseudo-constraining} mechanism to overcome the confirmation bias, a major weakness of pseudo-labeling, 2) developing new methods for pseudo-labeling towards the selection of \textit{informative} unconstrained samples, 3) showing that this also allows the use of pairwise loss functions for the initial and auxiliary losses which facilitates semi-constrained model training. In extensive experiments, we demonstrate the effectiveness of ConstraintMatch over relevant baselines in both the regular clustering and overclustering scenarios on five challenging benchmarks and provide analyses of its several components.

Federated Learning (FL) is a collaborative training paradigm that allows for privacy-preserving learning of cross-institutional models by eliminating the exchange of sensitive data and instead relying on the exchange of model parameters between the clients and a server. Despite individual studies on how client models are aggregated, and, more recently, on the benefits of ImageNet pre-training, there is a lack of understanding of the effect the architecture chosen for the federation has, and of how the aforementioned elements interconnect. To this end, we conduct the first joint ARchitecture-Initialization-Aggregation study and benchmark ARIAs across a range of medical image classification tasks. We find that, contrary to current practices, ARIA elements have to be chosen together to achieve the best possible performance. Our results also shed light on good choices for each element depending on the task, the effect of normalisation layers, and the utility of SSL pre-training, pointing to potential directions for designing FL-specific architectures and training pipelines.

Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels have been proposed as a solution. However, fitting models independently does not make efficient use of all available data. Conversely, fitting a single shared model to the full data set relies on imputation which often leads to biased results when missingness depends on unobserved factors. We propose an alternative approach, called sharing pattern submodels, which i) makes predictions that are robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability. Parameter sharing is enforced through sparsity-inducing regularization which we prove leads to consistent estimation. Finally, we give conditions for when a sharing model is optimal, even when both missingness and the target outcome depend on unobserved variables. Classification and regression experiments on synthetic and real-world data sets demonstrate that our models achieve a favorable tradeoff between pattern specialization and information sharing.

北京阿比特科技有限公司