Given functional data from a survival process with time-dependent covariates, we derive a smooth convex representation for its nonparametric log-likelihood functional and obtain its functional gradient. From this, we devise a generic gradient boosting procedure for estimating the hazard function nonparametrically. An illustrative implementation of the procedure using regression trees is described to show how to recover the unknown hazard. The generic estimator is consistent if the model is correctly specified; alternatively, an oracle inequality can be demonstrated for tree-based models. To avoid overfitting, boosting employs several regularization devices. One of them is step-size restriction, but the rationale for this is somewhat mysterious from the viewpoint of consistency. Our work brings some clarity to this issue by revealing that step-size restriction is a mechanism for preventing the curvature of the risk from derailing convergence.
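To make the role of the step size concrete, here is a minimal sketch of generic functional gradient boosting with regression trees and a restricted step size $\nu$. The squared-error risk below is only a placeholder, not the survival log-likelihood functional derived above, whose negative functional gradient would take its place for hazard estimation; the function and variable names are illustrative.

# Minimal sketch of generic functional gradient boosting with regression trees
# and step-size restriction (learning rate nu). The loss here is a placeholder
# squared-error risk; for hazard estimation one would substitute the negative
# gradient of the nonparametric survival log-likelihood.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=200, nu=0.1, max_depth=2):
    F = np.zeros(len(y))           # current fit, F_0 = 0
    trees = []
    for _ in range(n_rounds):
        neg_grad = y - F           # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, neg_grad)
        F += nu * tree.predict(X)  # restricted step: damp each update by nu
        trees.append(tree)
    return trees

def predict(trees, X, nu=0.1):
    return nu * sum(t.predict(X) for t in trees)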
We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made either to mine locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the subgroup that is most significant according to the MDL criterion is added in each iteration; this is shown to be equivalent to a Bayesian one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions, plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size.
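As a rough illustration of such a greedy loop (a toy sketch, not the authors' SSD++ implementation): candidates are single-attribute equality conditions on a binary target, and the gain is a weighted Kullback-Leibler divergence between the subgroup's target distribution and the marginal of the not-yet-covered data, minus a fixed penalty standing in for the MDL model cost and multiple-testing correction.

# Toy greedy subgroup-list loop in the spirit of SSD++ (illustrative only).
import numpy as np

def kl_bernoulli(p, q, eps=1e-9):
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def greedy_subgroup_list(X, y, penalty=5.0):
    """X: 2-D integer array of categorical codes, y: binary (0/1) target."""
    subgroups, covered = [], np.zeros(len(y), dtype=bool)
    while True:
        rest = ~covered                    # rows not covered by earlier subgroups
        if rest.sum() == 0:
            break
        marginal = y[rest].mean()
        best = None
        for j in range(X.shape[1]):
            for v in np.unique(X[rest, j]):
                mask = rest & (X[:, j] == v)
                gain = mask.sum() * kl_bernoulli(y[mask].mean(), marginal) - penalty
                if best is None or gain > best[0]:
                    best = (gain, j, v, mask)
        if best is None or best[0] <= 0:
            break                          # no candidate compresses further
        subgroups.append((best[1], best[2]))   # condition: attribute j == value v
        covered |= best[3]                 # list semantics: remove covered rows
    return subgroups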
Determinantal point processes (DPPs) are statistical models for repulsive point patterns. Both sampling and inference are tractable for DPPs, a rare feature among models with negative dependence that explains their popularity in machine learning and spatial statistics. Parametric and nonparametric inference methods have been proposed in the finite case, i.e., when the point patterns live in a finite ground set. In the continuous case, only parametric methods have been investigated, while nonparametric maximum likelihood for DPPs -- an optimization problem over trace-class operators -- has remained an open question. In this paper, we show that a restricted version of this maximum likelihood estimation (MLE) problem falls within the scope of a recent representer theorem for nonnegative functions in an RKHS. This leads to a finite-dimensional problem, with strong statistical ties to the original MLE. Moreover, we propose, analyze, and demonstrate a fixed-point algorithm to solve this finite-dimensional problem. Finally, we also provide a controlled estimate of the correlation kernel of the DPP, thus providing more interpretability.
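For orientation, recall the finite-case L-ensemble likelihood: for a kernel matrix $L \succeq 0$ on a finite ground set and observed samples $A_1, \dots, A_n$,
\[
\ell(L) \;=\; \sum_{i=1}^{n} \log\det\big(L_{A_i}\big) \;-\; n\,\log\det\big(L + I\big),
\]
where $L_{A}$ is the principal submatrix indexed by $A$. In the continuous case the second term becomes a Fredholm determinant of a trace-class operator, which is the optimization problem over trace-class operators referred to above; the representer theorem is what reduces a restricted version of it to a finite-dimensional problem.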
The beta regression model is useful in the analysis of bounded continuous outcomes such as proportions. It is well known that, for any regression model, the presence of multicollinearity leads to poor performance of the maximum likelihood estimators. Ridge-type estimators have been proposed to alleviate the adverse effects of multicollinearity. Furthermore, when some of the predictors have insignificant or weak effects on the outcome, it is desirable to recover as much information as possible from these predictors instead of discarding them altogether. In this paper we propose ridge-type shrinkage estimators for the low- and high-dimensional beta regression model, which address the above two issues simultaneously. We compute the biases and variances of the proposed estimators in closed form and use Monte Carlo simulations to evaluate their performance. The results show that, in both low- and high-dimensional data, the performance of the proposed estimators is superior to that of ridge estimators that discard weak or insignificant predictors. We conclude by applying the proposed methods to two real data sets from econometrics and medicine.
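As a point of reference (not necessarily the exact estimators proposed here), a common ridge-type estimator in the GLM/beta-regression literature and the generic form of a Stein-type shrinkage estimator are
\[
\hat\beta_{\mathrm{ridge}}(k) \;=\; \big(X^\top \hat W X + k I_p\big)^{-1} X^\top \hat W X \,\hat\beta_{\mathrm{ML}}, \qquad k > 0,
\]
\[
\hat\beta^{\mathrm{S}} \;=\; \hat\beta_{\mathrm{sub}} \;+\; \Big(1 - \frac{c}{T_n}\Big)\big(\hat\beta_{\mathrm{full}} - \hat\beta_{\mathrm{sub}}\big),
\]
where $\hat W$ is the estimated GLM weight matrix, $\hat\beta_{\mathrm{sub}}$ is the estimator under the restriction that the weak coefficients are zero, $T_n$ is a test statistic for that restriction, and $c$ is a shrinkage constant. Rather than discarding the weak predictors, the second expression shrinks the full-model estimator towards the submodel by an amount governed by the evidence against the restriction.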
We propose and analyze volume-preserving parametric finite element methods for surface diffusion, conserved mean curvature flow, and an intermediate evolution law in an axisymmetric setting. The weak formulations are presented in terms of the generating curves of the axisymmetric surfaces. The proposed numerical methods are based on piecewise linear parametric finite elements. The constructed schemes are fully practical and preserve the enclosed volume. In addition, we prove unconditional stability and study the distribution of vertices for the discretized schemes. The introduced methods are implicit, and the resulting nonlinear systems of equations can be solved very efficiently and accurately via Newton's iterative method. Numerical results are presented to show the accuracy and efficiency of the introduced schemes for computing the considered axisymmetric geometric flows.
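For reference, with $V$ the normal velocity, $H$ the mean curvature, $\Delta_s$ the Laplace-Beltrami operator and $\langle H\rangle$ the surface average of $H$ (signs depend on the chosen convention), surface diffusion and conserved mean curvature flow read
\[
V = \Delta_s H, \qquad V = H - \langle H\rangle,
\]
and both conserve the enclosed volume on a closed surface, since $\tfrac{d}{dt}\,\mathrm{vol} = \int_\Gamma V \, \mathrm{d}A = 0$ for either law; the schemes above are designed so that this property carries over to the discrete level.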
We study timed systems in which some timing features are unknown parameters. Parametric timed automata (PTAs) are a classical formalism for such systems, but most interesting problems for PTAs are undecidable. Notably, the parametric reachability emptiness problem, i.e., the emptiness of the set of parameter valuations for which some given discrete state is reachable, is undecidable. Lower-bound/upper-bound parametric timed automata (L/U-PTAs) achieve decidability for reachability properties by enforcing a separation between the parameters used as upper bounds in the automaton constraints and those used as lower bounds. In this paper, we first study reachability. We exhibit a subclass of PTAs (namely integer-points PTAs) with bounded rational-valued parameters for which the parametric reachability emptiness problem is decidable. Using this class, we present further results that refine the boundary between decidability and undecidability for PTAs and their subclasses such as L/U-PTAs. We then study liveness. We prove that: (1) the existence of at least one parameter valuation for which there exists an infinite run in an L/U-PTA is PSPACE-complete; (2) the existence of a parameter valuation such that the system has a deadlock is, however, undecidable; (3) the problem of the existence of a valuation for which a run remains in a given set of locations exhibits a very thin border between decidability and undecidability.
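To illustrate the L/U restriction: each parameter may occur in clock constraints only as a lower bound or only as an upper bound, so a guard such as
\[
x \ge p_{l} \;\wedge\; x \le p_{u}
\]
is allowed (with $p_l$ a lower-bound and $p_u$ an upper-bound parameter), whereas reusing a single parameter $p$ both as a lower and as an upper bound anywhere in the automaton would fall outside the L/U-PTA class.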
This paper studies optimal motion planning subject to motion and environment uncertainties. By modeling the system as a probabilistic labeled Markov decision process (PL-MDP), the control objective is to synthesize a finite-memory policy under which the agent satisfies high-level complex tasks, expressed in linear temporal logic (LTL), with a desired satisfaction probability. In particular, the cost optimization of trajectories that satisfy infinite-horizon tasks is considered, and the trade-off between reducing the expected mean cost and maximizing the probability of task satisfaction is analyzed. Instead of using traditional Rabin automata, the LTL formulas are converted to limit-deterministic B\"uchi automata (LDBA), which have a more straightforward acceptance condition and a more compact graph structure. The novelty of this work lies in considering cases where the LTL specifications can be potentially infeasible, and in developing a relaxed product MDP between the PL-MDP and the LDBA. The relaxed product MDP allows the agent to revise its motion plan whenever the task is not fully feasible and to quantify the degree of violation of the revised plan. A multi-objective optimization problem is then formulated to jointly consider the probability of task satisfaction, the violation of the original task constraints, and the implementation cost of the policy execution; it is solved via coupled linear programs. To the best of our knowledge, this is the first work that bridges the gap between planning revision and optimal control synthesis of both the plan prefix and the plan suffix of the agent trajectory over the infinite horizon. Experimental results are provided to demonstrate the effectiveness of the proposed framework.
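One illustrative way to write such a trade-off (a schematic form, not necessarily the exact program solved in the paper) is
\[
\min_{\pi} \;\; \mathbb{E}^{\pi}\big[\text{mean cost}\big] \;+\; \beta \,\mathbb{E}^{\pi}\big[\text{violation cost}\big] \quad \text{subject to} \quad \Pr^{\pi}\big(\varphi \text{ is satisfied}\big) \;\ge\; \gamma,
\]
where $\pi$ ranges over finite-memory policies on the relaxed product MDP, $\varphi$ is the LTL task, $\beta \ge 0$ weights the violation term, and $\gamma$ is the desired satisfaction probability; linear-programming formulations of such constrained MDP objectives operate on the occupation measures of $\pi$.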
The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data, where the heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the "Nonparametric Empirical Bayes Structural Tweedie" (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: the selection bias in the mean and the selection bias in the variance; the former cannot be effectively corrected without also correcting for the latter. Our theoretical results show that NEST has strong asymptotic properties without requiring explicit assumptions about the prior. Extensions to other two-parameter members of the exponential family are discussed. Simulation studies show that NEST outperforms competing methods, with substantial efficiency gains in many settings. The proposed method is demonstrated on estimating the batting averages of baseball players and the Sharpe ratios of mutual fund returns.
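For context, the classical Tweedie formula in the heteroscedastic normal means problem reads: if $X_i \mid \mu_i \sim N(\mu_i, \sigma_i^2)$ and $f(\cdot \mid \sigma_i)$ denotes the marginal density of $X_i$ given $\sigma_i$, then
\[
\mathbb{E}\big[\mu_i \mid X_i = x, \sigma_i\big] \;=\; x + \sigma_i^2 \,\frac{\partial}{\partial x} \log f(x \mid \sigma_i),
\]
so the posterior mean depends on the unknown prior only through the marginal score; the generalized version used by NEST additionally accounts for the variances being unknown nuisance parameters that must themselves be estimated.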
In large epidemiologic studies, self-reported outcomes are often used to record disease status more frequently than gold standard diagnostic tests alone would allow. While self-reported disease outcomes are easier to obtain than diagnostic test results, they are often prone to error. There has recently been interest in using error-prone, auxiliary outcomes to improve the efficiency of inference in discrete time-to-event analyses. We have developed a new augmented likelihood approach that incorporates auxiliary data into the analysis of a gold standard time-to-event outcome, and which can be used whenever self-reported outcomes are available in addition to a gold standard endpoint. We conduct a numerical study to show how the proposed method improves statistical efficiency relative to standard approaches for interval-censored survival data that do not leverage auxiliary data. We also extend this method to the complex survey design setting so that it can be applied to our motivating data example. We apply the method to data from the Hispanic Community Health Study/Study of Latinos to assess the association between energy and protein intake and the risk of incident diabetes. In our application, we demonstrate how our method can be combined with regression calibration to additionally address covariate measurement error in the self-reported diet.
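For reference, in the simplest discrete-time setting with hazards $\lambda_j = \Pr(T = j \mid T \ge j)$, a subject whose gold standard event is detected in interval $j$ contributes
\[
\lambda_j \prod_{k<j} (1-\lambda_k),
\]
while a subject last known to be event-free after interval $j$ contributes $\prod_{k\le j}(1-\lambda_k)$; the augmented likelihood proposed here incorporates the error-prone self-reported outcomes in addition to these gold standard contributions.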
Few-shot learning aims to learn classifiers for new classes with only a few training examples per class. Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with varying numbers of labels. The meta-learning approaches train a meta learner to predict the weights of homogeneously structured task-specific networks, requiring a uniform number of classes across tasks. The metric-learning approaches learn one task-invariant metric for all the tasks, and they fail if the tasks diverge. We propose to address these limitations with meta metric learning. Our meta metric learning approach consists of task-specific learners, which exploit metric learning to handle flexible labels, and a meta learner, which discovers good parameters and gradient descent updates to specify the metrics in the task-specific learners. Thus the proposed model is able to handle unbalanced classes as well as to generate task-specific metrics. We test our approach in the `$k$-shot $N$-way' few-shot learning setting used in previous work and in a new, realistic few-shot setting with diverse multi-domain tasks and flexible label numbers. Experiments show that our approach attains superior performance in both settings.
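A minimal numpy sketch of metric-based episodic classification (not the authors' architecture, whose meta learner would supply task-specific embeddings and metrics; the names below are illustrative): class prototypes are support-set means in an embedding space and queries are assigned to the nearest prototype, which is why the number of classes per task is free to vary.

# Metric-based episode: nearest-prototype classification in an embedding space.
import numpy as np

def embed(x):
    # placeholder identity embedding; in practice a (task-specific) learned network
    return x

def episode_predict(support_x, support_y, query_x):
    classes = np.unique(support_y)                        # N may differ per task
    prototypes = np.stack([embed(support_x[support_y == c]).mean(axis=0)
                           for c in classes])             # (N, d) class means
    q = embed(query_x)                                    # (m, d) query embeddings
    d2 = ((q[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (m, N) distances
    return classes[np.argmin(d2, axis=1)]                 # nearest prototype label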
Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
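Schematically, with $\mu_0$ a common and $\mu_1,\dots,\mu_d$ group-specific completely random measures on the sample space $\mathbb{X}$, the dependent random probability measures are obtained by normalising the sums,
\[
\tilde p_{\ell} \;=\; \frac{\mu_0 + \mu_{\ell}}{\mu_0(\mathbb{X}) + \mu_{\ell}(\mathbb{X})}, \qquad \ell = 1, \dots, d,
\]
so the shared component $\mu_0$ induces dependence (and possible ties) across groups, while the group-specific components preserve heterogeneity.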