Time-to-event data are often recorded on a discrete scale with multiple, competing risks as potential causes for the event. In this context, application of continuous survival analysis methods with a single risk suffers from biased estimation. Therefore, we propose the Multivariate Bernoulli detector for competing risks with discrete times, involving a multivariate change point model on the cause-specific baseline hazards. Through the prior on the number of change points and their location, we impose dependence between change points across risks, while allowing for data-driven learning of their number. Then, conditionally on these change points, a Multivariate Bernoulli prior is used to infer which risks are involved. The focus of posterior inference is on cause-specific hazard rates and dependence across risks. Such dependence is often present due to subject-specific changes across time that affect all risks. Full posterior inference is performed through a tailored local-global Markov chain Monte Carlo (MCMC) algorithm, which exploits a data augmentation trick and MCMC updates from non-conjugate Bayesian nonparametric methods. We illustrate our model in simulations and on prostate cancer data, comparing its performance with existing approaches.
While well-established methods for time-to-event data are available when the proportional hazards assumption holds, there is no consensus on the best inferential approach under non-proportional hazards (NPH). However, a wide range of parametric and non-parametric methods for testing and estimation in this scenario have been proposed. To provide recommendations on the statistical analysis of clinical trials where non-proportional hazards are expected, we conducted a comprehensive simulation study under different scenarios of non-proportional hazards, including delayed onset of treatment effect, crossing hazard curves, subgroups with different treatment effect and changing hazards after disease progression. We assessed type I error rate control, power and confidence interval coverage, where applicable, for a wide range of methods including weighted log-rank tests, the MaxCombo test, summary measures such as the restricted mean survival time (RMST), average hazard ratios, and milestone survival probabilities, as well as accelerated failure time regression models. We found a trade-off between interpretability and power when choosing an analysis strategy under NPH scenarios. While analysis methods based on weighted log-rank tests were typically favorable in terms of power, they do not provide an easily interpretable treatment effect estimate. Also, depending on the weight function, they test a narrow null hypothesis of equal hazard functions, and rejection of this null hypothesis may not allow for a direct conclusion of treatment benefit in terms of the survival function. In contrast, non-parametric procedures based on easily interpretable measures such as the RMST difference had lower power in most scenarios. Model-based methods relying on specific survival distributions had larger power, but often gave biased estimates and lower than nominal confidence interval coverage.
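To make the RMST summary measure concrete, the following is a minimal sketch that computes it as the area under a Kaplan-Meier step function up to a truncation time tau. The function names `kaplan_meier` and `rmst` and the toy inputs are illustrative assumptions, not the implementation used in the simulation study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last rectangle, from the final step before tau up to tau itself.
    area += prev_s * (tau - prev_t)
    return area
```

Under this sketch, an RMST difference between two trial arms would be `rmst(times_a, events_a, tau) - rmst(times_b, events_b, tau)`, which reads directly as a gain in mean event-free time up to tau.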
The moderate deviation regime is concerned with the finite block length trade-off between communication cost and error for information processing tasks in the asymptotic regime, where the communication cost approaches a capacity-like quantity and the error vanishes at the same time. We find exact characterisations of these trade-offs for a variety of fully quantum communication tasks, including quantum source coding, quantum state splitting, entanglement-assisted quantum channel coding, and entanglement-assisted quantum channel simulation. The main technical tool we derive is a tight relation between the partially smoothed max-information and the hypothesis testing relative entropy. This allows us to obtain the expansion of the partially smoothed max-information for i.i.d. states in the moderate deviation regime.
M-estimators including the Welsch and Cauchy have been widely adopted for robustness against outliers, but they also down-weight the uncontaminated data. To address this issue, we devise a framework to generate a class of nonconvex functions which only down-weight outlier-corrupted observations. Our framework is then applied to the Welsch, Cauchy and $\ell_p$-norm functions to produce the corresponding robust loss functions. Targeting the application of robust matrix completion, efficient algorithms based on these functions are developed and their convergence is analyzed. Finally, extensive numerical results demonstrate that the proposed methods are superior to the competitors in terms of recovery accuracy and runtime.
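The motivating issue can be seen in the standard reweighting functions of the Welsch and Cauchy M-estimators. The sketch below uses their textbook forms with a scale parameter `c` (an assumption here, since the abstract does not fix a parameterization), and it does not implement the proposed framework.

```python
import math


def welsch_weight(r, c=1.0):
    """Textbook Welsch weight exp(-(r/c)^2) for a residual r."""
    return math.exp(-((r / c) ** 2))


def cauchy_weight(r, c=1.0):
    """Textbook Cauchy weight 1 / (1 + (r/c)^2) for a residual r."""
    return 1.0 / (1.0 + (r / c) ** 2)


# Even a small residual, typical of clean data, receives a weight below 1.
small_residual = 0.5
clean_weights = (welsch_weight(small_residual), cauchy_weight(small_residual))
```

Both weights equal 1 only at a zero residual and decay for every nonzero residual, so observations affected by ordinary noise are down-weighted alongside true outliers.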
Accessibility measures how well a location is connected to surrounding opportunities. We focus on accessibility provided by Public Transit (PT). There is an evident inequality in the distribution of accessibility between city centers or areas close to main transportation corridors, on the one hand, and suburbs, on the other. In the latter, poor PT service leads to chronic car dependency. Demand-Responsive Transit (DRT) is better suited for low-density areas than conventional fixed-route PT. However, its potential to tackle accessibility inequality has not yet been exploited. On the contrary, planning DRT without regard to inequality (as in the methods proposed so far) can further widen the accessibility gap in urban areas. To the best of our knowledge, this paper is the first to propose a DRT planning strategy, which we call AccEq-DRT, aimed at reducing accessibility inequality while ensuring overall efficiency. To this aim, we combine a graph representation of conventional PT and a Continuous Approximation (CA) model of DRT. The two are combined in the same multi-layer graph, on which we compute accessibility. We then devise a scoring function to estimate the need of each area for an improvement, appropriately weighting population density and accessibility. Finally, we provide a bilevel optimization method, where the upper level is a heuristic to allocate DRT buses, guided by the scoring function, and the lower level performs traffic assignment. Numerical results in a simplified model of Montreal show that inequality, measured with the Atkinson index, is reduced by up to 34\%. Keywords: DRT, Public Transportation, Accessibility, Continuous Approximation, Network Design
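For readers unfamiliar with the inequality measure, here is a minimal sketch of the Atkinson index in its standard form, applied to a list of strictly positive per-area accessibility scores. The function name, the default inequality-aversion parameter `epsilon`, and the equal weighting of areas are assumptions; the abstract does not state which variant or weighting is used.

```python
import math


def atkinson_index(values, epsilon=0.5):
    """Standard Atkinson index for strictly positive values.

    Returns 0 for a perfectly equal distribution and approaches 1
    as the distribution becomes more unequal.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1:
        # Limit case: equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(v) for v in values) / n)
    else:
        ede = (sum(v ** (1 - epsilon) for v in values) / n) ** (1 / (1 - epsilon))
    return 1.0 - ede / mean
```

Given indices computed before and after a planning intervention, a relative reduction would be `1 - after / before`.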
A central challenge in the verification of quantum computers is benchmarking their performance as a whole and demonstrating their computational capabilities. In this work, we find a universal model of quantum computation, Bell sampling, that can be used for both of those tasks and thus provides an ideal stepping stone towards fault-tolerance. In Bell sampling, we measure two copies of a state prepared by a quantum circuit in the transversal Bell basis. We show that the Bell samples are classically intractable to produce and at the same time constitute what we call a circuit shadow: from the Bell samples we can efficiently extract information about the quantum circuit preparing the state, as well as diagnose circuit errors. In addition to known properties that can be efficiently extracted from Bell samples, we give two new and efficient protocols: a test for the depth of the circuit and an algorithm to estimate a lower bound on the number of T gates in the circuit. With some additional measurements, our algorithm learns a full description of states prepared by circuits with low T-count.
The investigation of mixture models is key to understanding and visualizing the distribution of multivariate data. Most mixture model approaches are based on likelihoods, and are not adapted to distributions with finite support or without a well-defined density function. This study proposes the Augmented Quantization method, which is a reformulation of the classical quantization problem that uses the p-Wasserstein distance. This metric can be computed in very general distribution spaces, in particular with varying supports. The clustering interpretation of quantization is revisited in a more general framework. The performance of Augmented Quantization is first demonstrated through analytical toy problems. Subsequently, it is applied to a practical case study involving river flooding, wherein mixtures of Dirac and Uniform distributions are built in the input space, enabling the identification of the most influential variables.
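To illustrate why the p-Wasserstein distance suits distributions without a density, the sketch below computes it between two one-dimensional empirical samples of equal size, where it reduces to matching sorted values. The function name and the restriction to equal-size 1D samples are simplifying assumptions; this is not the Augmented Quantization algorithm itself.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension the optimal coupling pairs the sorted values,
    so no density or likelihood is needed.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / n) ** (1.0 / p)


# A Dirac mass at 0 (repeated value) against a spread-out sample on [0, 2].
dirac_sample = [0.0, 0.0, 0.0]
spread_sample = [0.0, 1.0, 2.0]
distance = wasserstein_1d(dirac_sample, spread_sample, p=1)
```

The Dirac sample has no density, yet the distance to the spread-out sample is well defined, which is the property the method relies on.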
The elliptic curve discrete logarithm problem is of fundamental importance in public-key cryptography. It has been in use for a long time. Moreover, it is an interesting challenge in computational mathematics, and its solution is expected to open interesting research directions. In this paper, we explore ways to solve the elliptic curve discrete logarithm problem. Our results are mostly computational. However, it seems that the methods we develop and the directions we pursue can provide a potent attack on this problem. This work follows our earlier work, where we tried to solve this problem by finding a zero minor in a matrix over the same finite field on which the elliptic curve is defined. This paper is self-contained.
Keystroke dynamics is a behavioural biometric utilised for user identification and authentication. We propose a new set of features based on the distance between keys on the keyboard, a concept that has not been considered before in keystroke dynamics. We combine flight times, a popular metric, with the distance between keys on the keyboard and call the result Distance Enhanced Flight Time (DEFT) features. This novel approach provides comprehensive insights into a person's typing behaviour, going beyond typing velocity alone. We build a DEFT model by combining DEFT features with other previously used keystroke dynamic features. The DEFT model is designed to be device-agnostic, allowing us to evaluate its effectiveness across three commonly used devices: desktop, mobile, and tablet. The DEFT model outperforms the existing state-of-the-art methods when we evaluate its effectiveness across two datasets. We obtain accuracy rates exceeding 99% and equal error rates below 10% on all three devices.
A deep generative model yields an implicit estimator for the unknown distribution or density function of the observation. This paper investigates some statistical properties of the implicit density estimator pursued by VAE-type methods from a nonparametric density estimation framework. More specifically, we obtain convergence rates of the VAE-type density estimator under the assumption that the underlying true density function belongs to a locally H\"{o}lder class. Remarkably, a near minimax optimal rate with respect to the Hellinger metric can be achieved by the simplest network architecture, a shallow generative model with a one-dimensional latent variable.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
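The implicit regularization described above can be illustrated in the simplest linear setting: plain gradient descent on an underdetermined least-squares problem, started from zero, converges to the interpolating solution of minimum Euclidean norm. The sketch below is a toy illustration under those assumptions (zero initialization, fixed step size, one training point in two dimensions), not a result taken from the survey.

```python
def gradient_descent_least_squares(X, y, lr=0.1, steps=200):
    """Gradient descent on 0.5 * sum of squared residuals, started at w = 0."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - yi
            for row, yi in zip(X, y)
        ]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# One equation, two unknowns: w1 + 2 * w2 = 5 has infinitely many solutions.
X = [[1.0, 2.0]]
y = [5.0]
w = gradient_descent_least_squares(X, y)
# The iterates stay in the span of the data row [1, 2], so the limit is the
# minimum-norm interpolant x * y / ||x||^2 = [1, 2], not e.g. [5, 0].
```

No explicit penalty appears in the objective; the preference for the minimum-norm solution comes only from the initialization and the update rule.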