A surrogate endpoint (SE) for overall survival (OS) in cancer patients is essential to improving the efficiency of oncology drug development. In practice, we may discover a new patient-level association with OS in a discovery cohort and then measure the trial-level association across studies in a meta-analysis to validate the SE. In this work, we simulated pairs of metrics quantifying surrogacy at the patient level and the trial level and evaluated their association, in order to understand how well various patient-level metrics from the initial discovery indicate the eventual utility of a SE. Across all simulation scenarios, we found tight correlation among all the patient-level metrics, including the C-index, the integrated Brier score, and the log hazard ratio between SE values and OS, and similar correlation between any of them and the trial-level association metric. Even as the true biological link between SE and OS continued to strengthen, both patient-level and trial-level metrics often plateaued at the same time in many scenarios, and their association always decreased quickly. Under the SE development framework and data-generation models considered here, all patient-level metrics are similar in ranking a candidate SE according to its eventual trial-level association; incorporating additional biological factors into a SE is likely to yield diminishing returns in improving both patient-level and trial-level association.
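To make the patient-level metrics concrete, here is a minimal sketch (not the paper's simulation design) of Harrell's C-index on synthetic data, where a hypothetical latent factor drives both the candidate SE and OS:

```python
import numpy as np

def c_index(time, event, score):
    """Harrell's C-index: fraction of comparable pairs in which the
    higher-risk score belongs to the patient with the earlier event."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1
                elif score[i] == score[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: one latent factor drives both the candidate SE and OS.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
se = latent + rng.normal(scale=0.5, size=500)   # candidate surrogate value
os_time = rng.exponential(np.exp(-latent))      # higher latent risk -> shorter OS
event = rng.random(500) < 0.8                   # ~20% censoring (labels only, toy)
print(c_index(os_time, event, se))
```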
With the rapid development of new anti-cancer agents that are cytostatic, new endpoints are needed to better measure treatment efficacy in phase II trials. For this purpose, Von Hoff (1998) proposed the growth modulation index (GMI), i.e., the ratio between the times to progression or progression-free survival times in two successive treatment lines. An essential task in studies using GMI as an endpoint is to estimate its distribution. Traditional methods for survival data have been used to estimate the GMI distribution because censoring is common for GMI data. However, we point out that the independent censoring assumption required by traditional survival methods is always violated for GMI, which may lead to severely biased results. In this paper, we construct nonparametric estimators for the distribution of GMI that account for its dependent censoring. We prove that the proposed estimators are consistent and converge weakly to zero-mean Gaussian processes upon proper normalization. Extensive simulation studies show that our estimators perform well in practical situations and outperform traditional methods. A phase II clinical trial using GMI as the primary endpoint is provided for illustration.
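To see why the censoring of GMI is dependent, consider this toy sketch (hypothetical data, not from the trial analyzed in the paper): the observed GMI is censored at C/T1, a quantity correlated with GMI itself through T1:

```python
import numpy as np

# Hypothetical paired data: time to progression on line 1 (T1, fully
# observed by the start of line 2) and on line 2 (T2, censored at C).
rng = np.random.default_rng(1)
t1 = rng.exponential(6.0, size=200)
t2 = 1.3 * t1 * rng.exponential(1.0, size=200)   # T2 correlated with T1
c = rng.exponential(10.0, size=200)              # follow-up censoring time

t2_obs = np.minimum(t2, c)
delta = t2 <= c                                   # event indicator for line 2
gmi_obs = t2_obs / t1                             # observed GMI, censored at C/T1

# The censoring time for GMI is C/T1, which depends on T1 and hence on GMI
# itself -- this is the dependent censoring the estimators must address.
# A naive complete-case estimate of P(GMI > 1.33), the classic cutoff,
# is exactly the kind of biased quantity the proposed estimators correct:
print(np.mean(gmi_obs[delta] > 1.33))
```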
Population adjustment methods such as matching-adjusted indirect comparison (MAIC) are increasingly used to compare marginal treatment effects when there are cross-trial differences in effect modifiers and limited patient-level data. MAIC is based on propensity score weighting, which is sensitive to poor covariate overlap and cannot extrapolate beyond the observed covariate space. Current outcome regression-based alternatives can extrapolate but target a conditional treatment effect that is incompatible in the indirect comparison. When adjusting for covariates, one must integrate or average the conditional estimate over the relevant population to recover a compatible marginal treatment effect. We propose a marginalization method based on parametric G-computation that can be easily applied when the outcome regression is a generalized linear model or a Cox model. The approach views the covariate-adjustment regression as a nuisance model and separates its estimation from the evaluation of the marginal treatment effect of interest. The method can be implemented in a Bayesian statistical framework, which naturally integrates the analysis into a probabilistic setting. A simulation study provides proof of principle and benchmarks the method's performance against MAIC and the conventional outcome regression. Parametric G-computation achieves more precise and more accurate estimates than MAIC, particularly when covariate overlap is poor, and yields unbiased marginal treatment effect estimates when its assumptions hold. Furthermore, the marginalized covariate-adjusted estimates provide greater precision and accuracy than the conditional estimates produced by the conventional outcome regression, which are systematically biased because the measure of effect is non-collapsible.
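As a rough sketch of the marginalization step (a minimal illustration on assumed simulated data, not the paper's implementation), parametric G-computation with a logistic outcome model fits the nuisance regression, predicts under each treatment for the target population, and contrasts the averaged predictions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy trial data (hypothetical): binary outcome, treatment, one covariate.
rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({"trt": rng.integers(0, 2, n), "x": rng.normal(size=n)})
logit = -0.5 + 1.0 * df.trt + 0.8 * df.x
df["y"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Step 1: fit the outcome regression (the nuisance model).
fit = smf.glm("y ~ trt + x", data=df, family=sm.families.Binomial()).fit()

# Step 2: predict each subject's outcome under trt=1 and trt=0 in the
# target population (here: the trial's own covariate distribution).
p1 = fit.predict(df.assign(trt=1))
p0 = fit.predict(df.assign(trt=0))

# Step 3: average and contrast on the marginal scale (log odds ratio).
lor = np.log(p1.mean() / (1 - p1.mean())) - np.log(p0.mean() / (1 - p0.mean()))
print(lor)  # differs from fit.params["trt"]: the odds ratio is non-collapsible
```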
Well-structured and readable source code is a prerequisite for maintainable software and successful collaboration among developers. Static analysis enables the automated extraction of code complexity and readability metrics, which can be leveraged to highlight potential improvements in code, both to attain software of high quality and, as an educational tool, to reinforce good practices for developers. This assumes reliable readability metrics, which are not trivial to obtain since code readability is somewhat subjective. Recent research has produced increasingly sophisticated models for predicting readability as perceived by humans, primarily with a procedural and object-oriented focus, while functional and declarative languages and language extensions continue to advance and are often said to lead to more concise and readable code. In this paper, we investigate whether existing complexity and readability metrics reflect that wisdom or whether the notion of readability and its constituents requires an overhaul in light of changes in programming languages. We therefore compare traditional object-oriented and reactive programming in terms of code complexity and readability in a case study. Reactive programming is claimed to increase code quality, but few studies have substantiated these claims empirically. We refactored an object-oriented open-source project into a reactive candidate and compared its readability with the original using cyclomatic complexity and two state-of-the-art readability metrics. More elaborate investigations are required, but our findings suggest that cyclomatic complexity and readability both decrease significantly in the reactive candidate, which seems counter-intuitive. We exemplify and substantiate why readability metrics may require adjustment to better suit popular programming styles other than imperative and object-oriented, and to better match human expectations.
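As a minimal illustration of the kind of static metric at issue (a toy McCabe-style counter over Python's `ast`, not the readability models used in the case study), note how branching expressed declaratively can slip past a naive count:

```python
import ast

# Node types treated as decision points in a McCabe-style count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                  ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """McCabe-style count: 1 + number of branching constructs."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

imperative = """
total = 0
for x in xs:
    if x > 0:
        total += x
"""
declarative = "total = sum(x for x in xs if x > 0)"

print(cyclomatic_complexity(imperative))   # 3: base + for + if
print(cyclomatic_complexity(declarative))  # 1: the same branching logic,
                                           # but this counter misses it
```

The gap between the two counts is exactly the kind of mismatch that motivates revisiting such metrics for declarative styles.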
Measuring the quality of cancer care delivered by US health providers is challenging. Patients receiving oncology care vary greatly in disease presentation, among other key characteristics. In this paper, we discuss a framework for institutional quality measurement that addresses the heterogeneity of patient populations. To this end, we follow recent statistical developments in health outcomes research and conceptualize the task of quality measurement as a causal inference problem, helping to target flexible covariate profiles that can represent specific populations of interest. To our knowledge, such covariate profiles have not been used in the quality measurement literature. We use different clinically relevant covariate profiles and evaluate methods for layered case-mix adjustments that combine weighting and regression modeling approaches in a sequential manner, in order to reduce model extrapolation and allow for provider effect modification. We appraise these methods in an extensive simulation study and highlight the practical utility of weighting methods that warn the investigator when case-mix adjustments are infeasible without some form of extrapolation beyond the support of the data. In a study of cancer-care outcomes, we assess the performance of oncology practices for different profiles that correspond to the types of patients who may receive cancer care. We describe how the methods examined may be particularly important for high-stakes quality measurement, such as public reporting or performance-based payments. These methods may also be applied to support the health care decisions of individual patients and provide a path to personalized quality measurement.
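A schematic of the layered idea (a simple kernel weight standing in for the paper's weighting methods; all data and constants here are hypothetical): weight patients toward a covariate profile, check whether each provider retains meaningful weight mass, then fit a weighted outcome regression with provider effects:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: patients nested in providers, one case-mix covariate.
rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({"provider": rng.integers(0, 5, n),
                   "age": rng.normal(65, 10, n)})
df["y"] = 0.02 * df.age + 0.3 * (df.provider == 0) + rng.normal(size=n)

profile = {"age": 75.0}  # target covariate profile of interest

# Layer 1: weight each patient toward the profile (a normal-kernel weight;
# the paper studies more principled weighting schemes).
df["w"] = np.exp(-0.5 * ((df.age - profile["age"]) / 5.0) ** 2)

# Diagnostic in the spirit of the paper's warning: a provider with almost
# no weight mass near the profile needs extrapolation to be compared.
print(df.groupby("provider")["w"].sum())

# Layer 2: weighted outcome regression with provider effects.
fit = smf.wls("y ~ C(provider) + age", data=df, weights=df.w).fit()
print(fit.params.filter(like="provider"))
```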
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distances between the depth-based quantile regions w.r.t. each distribution. Its good behavior w.r.t. major transformation groups, as well as its ability to factor out translations, is demonstrated. Robustness, an appealing feature of this pseudo-metric, is studied through the finite-sample breakdown point. Moreover, we propose an efficient approximation method with time complexity linear in the size of the data set and its dimension. The quality of this approximation, as well as the performance of the proposed approach, is illustrated in numerical experiments.
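A simplified sketch of the construction (a random-direction approximation of halfspace depth and sample-based depth regions; unlike the paper's construction, this plain version does not factor out translations):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def halfspace_depth(points, sample, n_dir=200, seed=0):
    """Approximate Tukey halfspace depth of `points` w.r.t. `sample`
    by minimizing over random projection directions."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dir, sample.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj_s = sample @ dirs.T                # (n, n_dir)
    proj_p = points @ dirs.T                # (m, n_dir)
    # depth along one direction = fraction of sample at or beyond the point
    depth = (proj_s[None, :, :] >= proj_p[:, None, :]).mean(axis=1)
    return depth.min(axis=1)

def depth_pseudo_metric(X, Y, levels=(0.05, 0.1, 0.2, 0.3)):
    """Average Hausdorff distance between depth-trimmed regions,
    each region approximated by the sample points above the level."""
    total = 0.0
    for a in levels:
        rx = X[halfspace_depth(X, X) >= a]
        ry = Y[halfspace_depth(Y, Y) >= a]
        total += max(directed_hausdorff(rx, ry)[0],
                     directed_hausdorff(ry, rx)[0])
    return total / len(levels)

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 2))
Y = rng.normal(size=(400, 2)) + [3.0, 0.0]  # translated copy
print(depth_pseudo_metric(X, Y))            # ~3 here; the paper's version can
                                            # additionally factor translations out
```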
Crystal Structure Prediction (csp) is one of the central and most challenging problems in materials science and computational chemistry. In csp, the goal is to find a configuration of ions in 3D space that yields the lowest potential energy. Finding an efficient procedure to solve this complex optimisation problem is a well-known open question in computational chemistry. Due to the exponentially large search space, the problem has been referred to in several materials-science papers as "NP-Hard and very challenging", though without any formal proof. This paper fills a gap in the literature by providing the first set of formally proven NP-Hardness results for a variant of csp with various realistic constraints. In particular, we focus on the removal problem: the goal is to find a substructure with minimal potential energy by removing a subset of the ions from a given initial structure. Our main contributions are NP-Hardness results for the csp removal problem, new embeddings of combinatorial graph problems into geometric settings, and a more systematic exploration of the energy function to reveal the complexity of csp. In a wider context, our results contribute to the analysis of computational problems for weighted graphs embedded into three-dimensional Euclidean space.
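As a toy illustration of the removal problem's search space (a simplified Coulomb potential and brute-force enumeration, not from the paper), the exponential blow-up in candidate substructures is immediate:

```python
import numpy as np
from itertools import combinations

def coulomb_energy(positions, charges):
    """Pairwise Coulomb potential energy (unit constants, toy model)."""
    e = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r = np.linalg.norm(positions[i] - positions[j])
        e += charges[i] * charges[j] / r
    return e

def best_removal(positions, charges, k):
    """Exhaustive search over which k ions to remove -- exponential in
    general, which is what the hardness results formalize."""
    n = len(charges)
    best = None
    for keep in combinations(range(n), n - k):
        idx = list(keep)
        e = coulomb_energy(positions[idx], charges[idx])
        if best is None or e < best[0]:
            best = (e, keep)
    return best

rng = np.random.default_rng(5)
pos = rng.random((8, 3)) * 4.0               # hypothetical ion positions
q = rng.choice([-1.0, 1.0], size=8)          # toy cation/anion charges
print(best_removal(pos, q, k=2))
```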
The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and the presence of partial information in the game, extensive-form correlation has significantly different properties than its normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy for the player from the local solutions at each decision point.
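For intuition on the normal-form precursor, here is a sketch of external regret matching in a toy matching-pennies game. This is not the paper's algorithm: the internal-regret refinement of this procedure is what yields the correlated-equilibrium result cited above, and trigger regret generalizes that to the tree-form setting.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])      # payoff to player 0 (zero-sum)
rng = np.random.default_rng(6)
T, regrets = 20000, [np.zeros(2), np.zeros(2)]
empirical = np.zeros((2, 2))                   # joint empirical play frequency

for t in range(T):
    acts = []
    for p in range(2):
        # play each action with probability proportional to positive regret
        pos = np.maximum(regrets[p], 0.0)
        probs = pos / pos.sum() if pos.sum() > 0 else np.full(2, 0.5)
        acts.append(rng.choice(2, p=probs))
    i, j = acts
    u0 = A[:, j]                               # counterfactual payoffs, player 0
    u1 = -A[i, :]                              # counterfactual payoffs, player 1
    regrets[0] += u0 - u0[i]                   # accumulate external regret
    regrets[1] += u1 - u1[j]
    empirical[i, j] += 1

print(empirical / T)   # approaches the uniform (unique equilibrium) distribution
```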
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress over the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves that generalizes both the discrete Fr\'echet and Dynamic Time Warping distances, the two most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increased space usage and preprocessing time compared with existing methods. Query time complexity is comparable to or significantly improved over existing methods; our algorithm is especially efficient when the length of the curves is bounded.
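A minimal illustration of the two ingredients (not the paper's data structures): a textbook DTW recursion and a Gaussian random projection, which approximately preserves the squared distances DTW is built on:

```python
import numpy as np

def dtw(P, Q):
    """Dynamic Time Warping between two discretized curves (arrays of
    points), with squared-Euclidean ground cost."""
    m, n = len(P), len(Q)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = np.sum((P[i - 1] - Q[j - 1]) ** 2)
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]

rng = np.random.default_rng(7)
d, k = 50, 8                                    # ambient and projected dimensions
P = rng.normal(size=(30, d))
Q = P + 0.1 * rng.normal(size=(30, d))          # a nearby curve
G = rng.normal(size=(d, k)) / np.sqrt(k)        # Gaussian random projection

print(dtw(P, Q), dtw(P @ G, Q @ G))             # distances roughly preserved
```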
Although recommender systems have been comprehensively studied in the past decade both in industry and academia, most current recommender systems suffer from the following issues: 1) The sparsity of the user-item matrix seriously affects recommendation quality. As a result, most traditional recommender system approaches cannot deal with users who have rated few items, which is known as the cold start problem. 2) Traditional recommender systems assume that users are independently and identically distributed and ignore the social relations between users. However, in real-life scenarios, due to the exponential growth of social networking services such as Facebook and Twitter, social connections between different users play a significant role in the recommendation task. In this work, aiming to provide better recommendations by incorporating user social network information, we propose a matrix factorization framework with user social connection constraints. Experimental results on a real-life dataset show that the proposed method performs significantly better than state-of-the-art approaches in terms of MAE and RMSE, especially for cold start users.
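One common way to encode such social connection constraints (a sketch of a standard formulation; the paper's exact objective may differ) is to add a term pulling each user's latent factors toward the average of their friends' factors:

```python
import numpy as np

def social_mf(R, friends, k=8, lam=0.1, beta=0.2, lr=0.01, epochs=50, seed=8):
    """SGD matrix factorization with a social regularization step that
    pulls each user's factors toward the mean of their friends' factors."""
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    U = 0.1 * rng.normal(size=(n_u, k))
    V = 0.1 * rng.normal(size=(n_i, k))
    obs = np.argwhere(R > 0)                    # observed (user, item) ratings
    for _ in range(epochs):
        for u, i in obs:
            e = R[u, i] - U[u] @ V[i]           # rating residual
            Uu = U[u].copy()
            U[u] += lr * (e * V[i] - lam * U[u])
            V[i] += lr * (e * Uu - lam * V[i])
        for u, fs in friends.items():           # social regularization step
            if fs:
                U[u] -= lr * beta * (U[u] - np.mean(U[list(fs)], axis=0))
    return U, V

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [0, 1, 5, 4]], dtype=float)       # 0 = unrated (sparse users)
friends = {0: [1], 1: [0], 2: []}               # toy social graph
U, V = social_mf(R, friends)
print(np.round(U @ V.T, 1))                     # predictions incl. unrated cells
```

The social term lets a cold start user borrow signal from connected users, which is exactly where the improvement over plain factorization is expected.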
Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR, and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, while the CIDEr score ensures our captions are syntactically fluent. The PG method we propose improves on the prior MIXER approach by using Monte Carlo rollouts instead of mixing MLE training with PG. We show empirically that our algorithm leads to easier optimization and improved results compared to MIXER. Finally, we show that using our PG method we can optimize any of the metrics, including the proposed SPIDEr metric, which results in image captions that are strongly preferred by human raters over captions generated by the same model trained to optimize MLE or the COCO metrics.
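A schematic of the rollout-based policy gradient update (a toy softmax policy and a placeholder reward standing in for a caption metric; the equal SPICE/CIDEr weighting mentioned in the comment is an assumption for illustration, not taken from the paper):

```python
import numpy as np

# Toy stand-in for a captioning policy: independent per-step token logits.
# `reward` is a placeholder for a metric such as a SPICE/CIDEr combination
# (e.g. 0.5 * SPICE + 0.5 * CIDEr); here it simply prefers token 3.
rng = np.random.default_rng(9)
V, T, lr = 10, 5, 0.5                     # vocab size, caption length, step size
theta = np.zeros((T, V))

def reward(tokens):
    return float(np.mean(tokens == 3))

for step in range(2000):
    probs = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    tokens = np.array([rng.choice(V, p=probs[t]) for t in range(T)])
    # Baseline from a few Monte Carlo rollouts, for variance reduction,
    # in the spirit of the rollout-based PG described above.
    base = np.mean([reward(np.array([rng.choice(V, p=probs[t])
                                     for t in range(T)])) for _ in range(3)])
    adv = reward(tokens) - base
    for t in range(T):                    # REINFORCE: grad log pi * advantage
        grad = -probs[t]
        grad[tokens[t]] += 1.0
        theta[t] += lr * adv * grad

print(np.argmax(theta, axis=1))           # converges toward the rewarded token
```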