The analysis of left-truncated and right-censored data is very common in survival and reliability analysis. In lifetime studies, patients are often subject to left truncation in addition to right censoring. For example, in bone marrow transplant studies based on the International Bone Marrow Transplant Registry (IBMTR), patients who die while waiting for transplants are not reported to the IBMTR. In this paper, we develop novel U-statistics under left truncation and right censoring. We prove the $\sqrt{n}$-consistency of the proposed U-statistics and derive their asymptotic distribution using counting process techniques. As an application of the U-statistics, we develop a simple non-parametric test for the independence between time to failure and cause of failure in competing risks when the observations are subject to left truncation and right censoring. The finite-sample performance of the proposed test is evaluated through a Monte Carlo simulation study. Finally, we illustrate our test procedure using lifetime data of transformers.
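As background (a standard definition, not taken from the abstract itself), the classical complete-data U-statistic of degree two with a symmetric kernel $h$ has the form
$$ U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h(X_i, X_j); $$
the construction described above adapts this form to observations subject to left truncation and right censoring.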
We give a formulation of the Nielsen-Schreier theorem (subgroups of free groups are free) in homotopy type theory, using the presentation of groups as pointed, connected, 1-truncated types. We show that the special case of finite-index subgroups holds constructively and that the full theorem follows from the axiom of choice. We give an example of a boolean $\infty$-topos in which our formulation of the theorem does not hold, and show that a stronger "untruncated" version of the theorem is provably false in homotopy type theory.
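For context (standard background, not quoted from the abstract): classically, the Nielsen-Schreier theorem states that every subgroup of a free group is itself free. In the presentation used here, a group $G$ is identified with its delooping, a pointed, connected, 1-truncated type $BG$ whose loop space recovers the group:
$$ G \;\simeq\; \Omega BG, \qquad BG \text{ pointed, connected, and 1-truncated}. $$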
This work is motivated by personalized digital twins, based on observations and physical models, for the treatment and prevention of hypertension. The models commonly used are simplifications of the real process, and the aim is to make inference about physically interpretable parameters. To account for model discrepancy, we propose to set up the estimation problem in a Bayesian calibration framework. This naturally solves the inverse problem while accounting for and quantifying the uncertainty in the model formulation, the parameter estimates, and the predictions. We focus on the inverse problem, i.e. estimating the physical parameters given observations. The models we consider are the two- and three-parameter Windkessel models (WK2 and WK3). These models simulate the blood pressure waveform given the blood inflow and a set of physically interpretable calibration parameters. The third parameter in WK3 functions as a tuning parameter. The WK2 model offers physically interpretable parameters, and we therefore adopt it as the computer model in the Bayesian calibration formulation. In a synthetic simulation study, we simulate noisy data from the WK3 model and estimate the model parameters both with conventional methods, i.e. least-squares optimization, and through the Bayesian calibration framework. We demonstrate that our formulation can reconstruct the blood pressure waveform of the more complex model and, most importantly, can learn the parameters in accordance with the known mathematical connections between the two models. We also successfully apply this formulation to a real case study, where data were obtained from a pilot randomized controlled trial. Our approach is successful in both the simulation study and the real case study.
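To make the computer model concrete, here is a minimal sketch (not the authors' code) of simulating the two-element Windkessel model $C\,dP/dt + P/R = Q(t)$; the inflow pulse shape, its amplitude, and the parameter values are illustrative assumptions.

```python
# Minimal sketch: two-element Windkessel model C dP/dt + P/R = Q(t),
# driven by an illustrative half-sine inflow pulse (all numbers are assumptions).
import numpy as np
from scipy.integrate import solve_ivp

R, C = 1.0, 1.1          # assumed peripheral resistance and arterial compliance (calibration parameters)
T, T_sys = 0.8, 0.3      # assumed cardiac cycle length and systolic duration

def inflow(t):
    """Illustrative blood inflow Q(t): half-sine ejection during systole, zero in diastole."""
    t = t % T
    return 400.0 * np.sin(np.pi * t / T_sys) if t < T_sys else 0.0

def wk2(t, p):
    # dP/dt = (Q(t) - P/R) / C
    return [(inflow(t) - p[0] / R) / C]

sol = solve_ivp(wk2, (0.0, 10 * T), [80.0], max_step=1e-3)  # run several cycles toward a periodic regime
pressure = sol.y[0]      # simulated blood pressure waveform
```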
We analyze the extreme value dependence of independent, not necessarily identically distributed multivariate regularly varying random vectors. More specifically, we propose estimators of the spectral measure locally at some time point and of the spectral measures integrated over time. The uniform asymptotic normality of these estimators is proved under suitable nonparametric smoothness and regularity assumptions. We then use the process convergence of the integrated spectral measure to devise consistent tests for the null hypothesis that the spectral measure does not change over time.
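As background (the standard definition, not quoted from the abstract), a $d$-dimensional random vector $X$ is multivariate regularly varying with index $\alpha > 0$ and spectral measure $\Phi$ on the unit sphere $\mathbb{S}^{d-1}$ if, for every $x > 0$ and every $\Phi$-continuity set $B \subseteq \mathbb{S}^{d-1}$,
$$ \frac{P\big(\lVert X \rVert > tx,\; X/\lVert X \rVert \in B\big)}{P\big(\lVert X \rVert > t\big)} \;\longrightarrow\; x^{-\alpha}\,\Phi(B) \qquad \text{as } t \to \infty. $$
The estimators above target $\Phi$ when it is allowed to vary over time.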
Dynamic techniques are a scalable and effective way to analyze concurrent programs. Instead of analyzing all behaviors of a program, these techniques detect errors by focusing on a single program execution. Often a crucial step in these techniques is to define a causal ordering between the events of the execution, which is then computed using vector clocks, a simple data structure that stores logical times of threads. The two basic operations on vector clocks, namely join and copy, require $\Theta(k)$ time, where $k$ is the number of threads, and thus become a computational bottleneck when $k$ is large. In this work, we introduce tree clocks, a new data structure that replaces vector clocks for computing causal orderings in program executions. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being modified, and hence the two operations do not suffer the a priori $\Theta(k)$ cost per application. We show that when used to compute the classic happens-before (HB) partial order, tree clocks are optimal, in the sense that no other data structure can lead to smaller asymptotic running time. Moreover, we demonstrate that tree clocks can be used to compute other partial orders, such as schedulable-happens-before (SHB) and the standard Mazurkiewicz (MAZ) partial order, and thus are a versatile data structure. Our experiments show that just by replacing vector clocks with tree clocks, the computation becomes, on average per benchmark, $2.02\times$ faster for MAZ, $2.66\times$ faster for SHB, and $2.97\times$ faster for HB. These results illustrate that tree clocks have the potential to become a standard data structure with wide applications in concurrent analyses.
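To make the baseline concrete, here is a minimal vector-clock sketch (an illustration of the $\Theta(k)$ bottleneck, not the paper's tree-clock implementation); the class and method names are ours.

```python
# Minimal vector-clock sketch (illustrative baseline, not the paper's tree-clock code).
# Both join and copy touch all k entries, which is the Theta(k) cost tree clocks avoid.
class VectorClock:
    def __init__(self, k):
        self.clk = [0] * k            # one logical time per thread

    def copy(self, other):
        self.clk = other.clk[:]       # Theta(k): copies every entry

    def join(self, other):
        # Theta(k): pointwise maximum over all entries, even the unchanged ones
        self.clk = [max(a, b) for a, b in zip(self.clk, other.clk)]

    def increment(self, tid):
        self.clk[tid] += 1            # local step of thread tid
```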
This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experience) while treating the tunable elements inside the given structure as the point of parametrisation with neural networks. To optimise a cost represented as a function of the neural network weights, the approach uses the continuous-time policy gradient method based on adjoint sensitivity analysis for correct and efficient computation of the cost gradient. This makes it possible to combine the stability, robustness, and physical interpretability of an analytically derived controller structure with the representational flexibility and optimised performance provided by machine learning techniques. Such a hybrid paradigm for fixed-structure control synthesis is particularly useful for optimising adaptive nonlinear controllers to achieve improved performance in online operation, an area where existing theory guides the design of the structure but offers little analytical guidance on tuning the gains and the uncertainty-model basis functions that govern the performance characteristics. Numerical experiments on aerospace applications illustrate the utility of the structured nonlinear controller optimisation framework.
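For illustration only (a toy example under our own assumptions, not the paper's framework or systems): a continuous-time policy gradient computed with an adjoint sensitivity pass for the scalar system $\dot{x} = -\theta x$, i.e. a structured linear feedback $u = -\theta x$, with running cost $L(x) = x^2$. The dynamics, cost, discretisation, and parameter value are all toy choices.

```python
# Toy adjoint-sensitivity policy gradient: x' = -theta*x, J = int_0^T x^2 dt.
import numpy as np

def rollout(theta, x0=1.0, T=2.0, n=2000):
    """Forward Euler simulation of x' = -theta*x; returns the step size and trajectory."""
    dt = T / n
    x = np.empty(n + 1); x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + dt * (-theta * x[i])
    return dt, x

def cost_and_gradient(theta):
    dt, x = rollout(theta)
    J = np.sum(x[:-1] ** 2) * dt                     # J ~ int_0^T x^2 dt
    # Adjoint ODE: lam' = -df/dx * lam - dL/dx = theta*lam - 2x, with lam(T) = 0,
    # integrated backward in time; then dJ/dtheta = int lam * df/dtheta dt with df/dtheta = -x.
    lam, dJ = 0.0, 0.0
    for i in reversed(range(len(x) - 1)):
        dJ += lam * (-x[i]) * dt                     # accumulate gradient integrand
        lam = lam - dt * (theta * lam - 2.0 * x[i])  # explicit Euler step in reverse time
    return J, dJ

J, dJ = cost_and_gradient(0.8)
eps = 1e-5
fd = (cost_and_gradient(0.8 + eps)[0] - cost_and_gradient(0.8 - eps)[0]) / (2 * eps)
print(J, dJ, fd)   # dJ and fd should agree up to the O(dt) discretisation error
```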
Monte Carlo simulation (MCS) is a statistical methodology used in a large number of applications. It uses repeated random sampling to solve problems that have a probability interpretation and to obtain high-quality numerical results. MCS is simple and easy to develop, implement, and apply. However, its computational cost and total runtime can be quite high, as it requires many samples to obtain an accurate approximation with low variance. In this paper, a novel MCS, called the self-adaptive BAT-MCS, based on the binary-addition-tree algorithm (BAT) and our proposed self-adaptive simulation-number algorithm, is proposed to simply and effectively reduce the runtime and variance of the MCS. The proposed self-adaptive BAT-MCS is applied to a simple benchmark problem to demonstrate its application to network reliability. Its statistical characteristics, including the expectation, variance, and simulation number, and its time complexity are discussed. Furthermore, its performance is compared extensively with that of the traditional MCS on a large-scale problem.
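For reference, here is a sketch of the plain MCS baseline that such methods aim to accelerate (an illustrative two-terminal network reliability estimate, not the proposed self-adaptive BAT-MCS); the network, component reliability, and sample size are assumptions.

```python
# Plain Monte Carlo estimate of two-terminal network reliability (illustrative baseline only).
import random

EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # small bridge network (assumed example)
P_WORK = 0.9                                       # assumed component reliability
S, T = 0, 3                                        # source and terminal nodes

def connected(up_edges):
    """Graph search from S over working edges; returns True if T is reachable."""
    adj = {}
    for u, v in up_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [S], {S}
    while stack:
        u = stack.pop()
        if u == T:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w); stack.append(w)
    return False

def mcs_reliability(n_samples=100_000):
    hits = 0
    for _ in range(n_samples):
        up = [e for e in EDGES if random.random() < P_WORK]   # sample component states
        hits += connected(up)
    return hits / n_samples                                   # unbiased estimate, variance ~ R(1-R)/n

print(mcs_reliability())
```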
Approximations of optimization problems arise in computational procedures and sensitivity analysis. The resulting effect on solutions can be significant, with even small approximations of components of a problem translating into large errors in the solutions. We specify conditions under which approximations are well behaved in the sense of minimizers, stationary points, and level-sets, and this leads to a framework of consistent approximations. The framework is developed for a broad class of composite problems, which are neither convex nor smooth. We demonstrate the framework with examples from stochastic optimization, neural-network-based machine learning, distributionally robust optimization, penalty and augmented Lagrangian methods, interior-point methods, homotopy methods, smoothing methods, extended nonlinear programming, difference-of-convex programming, and multi-objective optimization. An enhanced proximal method illustrates the algorithmic possibilities. A quantitative analysis supplements the development by furnishing rates of convergence.
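As background (a standard notion closely tied to the consistency of approximations, not quoted from the abstract), a sequence of functions $f^\nu : \mathbb{R}^n \to \overline{\mathbb{R}}$ epi-converges to $f$ when, for every $x$,
$$ \liminf_{\nu\to\infty} f^\nu(x^\nu) \ge f(x) \ \text{ for every } x^\nu \to x, \qquad \text{and} \qquad \limsup_{\nu\to\infty} f^\nu(x^\nu) \le f(x) \ \text{ for some } x^\nu \to x. $$
In particular, if $x^\nu$ minimizes $f^\nu$ and $x^\nu \to \bar{x}$, then $\bar{x}$ minimizes $f$, which is the kind of guarantee a framework of consistent approximations seeks to provide.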
Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most existing image retrieval methods focus only on single-domain retrieval, which assumes that the distributions of the retrieval database and the queries are similar. In practical applications, however, the discrepancy between retrieval databases, often captured under ideal illumination/pose/background/camera conditions, and queries, usually obtained under uncontrolled conditions, can be very large. In this paper, with practical application in mind, we focus on the more challenging cross-domain retrieval. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to promote cross-domain retrieval accuracy and learns a series of compact binary codes to improve retrieval speed. First, we derive our loss functions through Maximum A Posteriori (MAP) estimation from a Bayesian perspective (BP): a BP-induced focal-triplet loss, a BP-induced quantization loss, and a BP-induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Because the original feature representation is biased by the inter-domain discrepancy, this manifold structure is difficult to construct. We therefore propose a new feature, named Histogram Feature of Neighbors (HFON), from the sample-statistics perspective. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at //github.com/fuxianghuang1/PWCF
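For readers unfamiliar with these loss families, here is a generic sketch of a plain triplet loss combined with a quantization penalty that pushes continuous codes toward binary values; it illustrates the type of objectives involved and is not the paper's BP-induced losses.

```python
# Generic triplet + quantization losses (illustrative only, not PWCF's BP-induced losses).
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # anchor-negative distances
    return np.maximum(0.0, d_ap - d_an + margin).mean()

def quantization_loss(codes):
    # Distance of each continuous code from its nearest binary code in {-1, +1}.
    return np.mean((codes - np.sign(codes)) ** 2)

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(8, 16)) for _ in range(3))   # toy anchor/positive/negative codes
loss = triplet_loss(a, p, n) + 0.1 * quantization_loss(a)
print(loss)
```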
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
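To give a feel for the additive construction, here is a crude finite-dimensional sketch (our own illustration, not the paper's construction or sampler): group-specific random probability measures built by normalising the sum of a shared and a group-specific gamma random measure, each approximated by a fixed number of atoms.

```python
# Finite-atom illustration of "add common and group-specific random measures, then normalise".
import numpy as np

rng = np.random.default_rng(1)
K, n_groups = 50, 2
locations = rng.uniform(size=K)                       # shared atom locations (illustrative)

shared = rng.gamma(shape=0.5, size=K)                 # common random-measure weights
group = rng.gamma(shape=0.5, size=(n_groups, K))      # group-specific random-measure weights

weights = shared + group                              # add common and idiosyncratic parts
probs = weights / weights.sum(axis=1, keepdims=True)  # normalise -> dependent random probability measures

# probs[j] puts mass probs[j, k] on locations[k]; the shared component induces dependence across groups.
```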
In this paper we introduce a covariance framework for the analysis of EEG and MEG data that takes into account observed temporal stationarity on small time scales and trial-to-trial variations. We formulate a model for the covariance matrix, which is a Kronecker product of three components corresponding to space, time, and epochs/trials, and consider maximum likelihood estimation of the unknown parameter values. An iterative algorithm that finds approximations of the maximum likelihood estimates is proposed. We perform a simulation study to assess the performance of the estimator and investigate the influence of different assumptions about the covariance factors on the estimated covariance matrix and its components. In addition, we illustrate our method on real EEG and MEG data sets. The proposed covariance model is applicable in a variety of cases where spontaneous EEG or MEG acts as a source of noise and realistic noise covariance estimates are needed for accurate dipole localization, such as in evoked activity studies, or where the properties of spontaneous EEG or MEG are themselves the topic of interest, such as in combined EEG/fMRI experiments in which the correlation between EEG and fMRI signals is investigated.
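To illustrate the structure of such a model, here is a minimal sketch of a covariance matrix formed as the Kronecker product of space, time, and trial components; the component matrices, their sizes, and the AR(1)-style correlation used for them are assumptions made for the example, not the estimated factors from the paper.

```python
# Kronecker-structured covariance: Sigma = Sigma_trial (x) Sigma_time (x) Sigma_space.
import numpy as np

def ar1_cov(n, rho):
    """Simple AR(1)-style correlation matrix, used here as a stand-in component."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

sigma_space = ar1_cov(4, 0.3)     # assumed sensor-space component
sigma_time = ar1_cov(5, 0.8)      # assumed temporal component
sigma_trial = np.eye(3)           # assumed epoch/trial component

sigma = np.kron(sigma_trial, np.kron(sigma_time, sigma_space))   # full covariance over all dimensions
print(sigma.shape)   # (60, 60) = (3*5*4, 3*5*4)
```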