The adaptive lasso refers to a class of methods that use weighted versions of the $L_1$-norm penalty, with weights derived from an initial estimate of the parameter vector to be estimated. Irrespective of the method chosen to compute this initial estimate, the performance of the adaptive lasso critically depends on the value of a hyperparameter, which controls the magnitude of the weighted $L_1$-norm in the penalized criterion. As with other machine learning methods, cross-validation is very popular for the calibration of the adaptive lasso, that is, for the data-driven selection of an optimal value of this hyperparameter. However, the simplest cross-validation scheme is not valid in this context, and a more elaborate one has to be employed to guarantee an optimal calibration. The inadequacy of the simple cross-validation scheme has been well documented in other contexts, but less so when it comes to the calibration of the adaptive lasso, and, therefore, many statistical analysts still overlook it. In this work, we recall appropriate cross-validation schemes for the calibration of the adaptive lasso, and illustrate the inadequacy of the simple scheme using both synthetic and real-world examples. Our results clearly establish the suboptimality of the simple scheme, in terms of support recovery and prediction error, for several versions of the adaptive lasso, including the popular one-step lasso.
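To make the distinction concrete, below is a minimal Python sketch (not the authors' code) contrasting the naive scheme, in which the initial estimate is computed once on the full data and reused in every fold, with a proper scheme that recomputes it inside each training fold. Ridge is used as a hypothetical choice of initial estimator, and the usual column-rescaling trick reduces the weighted fit to a plain lasso.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import KFold

def adaptive_lasso_cv(X, y, alphas, gamma=1.0, n_splits=5, proper=True):
    """Pick the adaptive-lasso penalty level by K-fold CV.

    proper=True  : recompute the initial (ridge) estimate inside each
                   training fold -- the valid scheme.
    proper=False : naive scheme, initial estimate computed once on the
                   full data and reused in every fold.
    """
    def weights_from(X_, y_):
        # Initial estimate; ridge is one hypothetical choice.
        b0 = Ridge(alpha=1.0).fit(X_, y_).coef_
        return (np.abs(b0) + 1e-8) ** gamma

    w_full = weights_from(X, y)
    cv_err = np.zeros(len(alphas))
    for tr, va in KFold(n_splits, shuffle=True, random_state=0).split(X):
        w = weights_from(X[tr], y[tr]) if proper else w_full
        Xw_tr, Xw_va = X[tr] * w, X[va] * w   # column-rescaling trick
        for i, a in enumerate(alphas):
            fit = Lasso(alpha=a, max_iter=10000).fit(Xw_tr, y[tr])
            cv_err[i] += np.mean((y[va] - fit.predict(Xw_va)) ** 2)
    best = alphas[int(np.argmin(cv_err))]
    # Refit on all data with weights from the full sample.
    fit = Lasso(alpha=best, max_iter=10000).fit(X * w_full, y)
    return best, fit.coef_ * w_full
```

Calling `adaptive_lasso_cv(X, y, alphas, proper=False)` reproduces the naive scheme for comparison.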
This paper proposes the estimation of a smooth graphon model for network data analysis using principles of the EM algorithm. The approach accounts both for variability with respect to the ordering of the nodes of a network and for smooth estimation of the graphon by nonparametric regression. To do so, (linear) B-splines are used, which allow for smooth estimation of the graphon conditional on the node ordering. This provides the M-step. The true ordering of the nodes arising from the graphon model remains unobserved, and Bayesian ideas are employed to obtain posterior samples given the network data. This yields the E-step. Combining both steps gives an EM-based approach for smooth graphon estimation. Unlike other common methods, this procedure does not require restricting the marginal function to be monotonic. The proposed graphon estimate makes it possible to explore node-ordering strategies and therefore to compare the common degree-based node ranking with the ordering conditional on the network. Variability and uncertainty are taken into account using MCMC techniques. Examples and simulation studies support the applicability of the approach.
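As a rough illustration of the M-step alone, the following sketch fits a smooth graphon conditional on a given node ordering using a tensor product of linear B-splines (hat functions). It uses a least-squares fit of the adjacency entries rather than a Bernoulli likelihood and omits the E-step (the MCMC sampling of node orderings), so it is not the authors' full procedure; all function names are hypothetical.

```python
import numpy as np

def hat_basis(u, n_knots=8):
    """Linear B-spline (hat function) basis on [0, 1]."""
    knots = np.linspace(0.0, 1.0, n_knots)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(u[:, None] - knots[None, :]) / h)

def graphon_mstep(A, order, n_knots=8):
    """Smooth graphon estimate conditional on a node ordering (M-step sketch).

    A     : (n, n) symmetric 0/1 adjacency matrix
    order : permutation of 0..n-1, nodes listed by latent position
    """
    n = A.shape[0]
    u = (np.argsort(order) + 0.5) / n        # latent positions implied by the ordering
    B = hat_basis(u, n_knots)                # node-level basis, shape (n, n_knots)
    iu, ju = np.triu_indices(n, k=1)         # node pairs i < j
    # tensor-product design over pairs; least squares for brevity
    D = np.einsum('ik,jl->ijkl', B, B)[iu, ju].reshape(len(iu), -1)
    coef, *_ = np.linalg.lstsq(D, A[iu, ju].astype(float), rcond=None)
    W = B @ coef.reshape(n_knots, n_knots) @ B.T   # fitted graphon at node positions
    return np.clip((W + W.T) / 2.0, 0.0, 1.0)

# toy usage: stochastic block model adjacency, nodes already ordered by block
rng = np.random.default_rng(0)
n = 100
blocks = np.repeat([0, 1], n // 2)
P = np.where(blocks[:, None] == blocks[None, :], 0.6, 0.1)
A = np.triu((rng.uniform(size=(n, n)) < P).astype(int), 1)
A = A + A.T
print(np.round(graphon_mstep(A, np.arange(n)), 2)[:3, :3])
```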
Choosing models from a hypothesis space is a frequent task in approximation theory and inverse problems. Cross-validation is a classical tool in the learner's repertoire for comparing the goodness of fit of different reconstruction models. Much work has been dedicated to computing this quantity quickly, but establishing its theoretical properties turns out to be difficult. So far, most optimality results are stated in an asymptotic fashion. In this paper we propose a concentration inequality for the difference between the cross-validation score and the risk functional with respect to the squared error. This gives a pre-asymptotic bound which holds with high probability. Our assumptions rely on bounds on the uniform error of the model, which allow for a broadly applicable framework. We support our claims by applying this machinery to Shepard's model, where we are able to determine precise constants of the concentration inequality. Numerical experiments in combination with fast algorithms indicate the applicability of our results.
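For concreteness, here is a small sketch of the leave-one-out cross-validation score under the squared error for Shepard's model (inverse-distance weighting); the exponent grid and toy data are illustrative only.

```python
import numpy as np

def shepard_predict(x_query, X, y, mu=2.0):
    """Shepard's inverse-distance-weighted interpolant."""
    d = np.linalg.norm(X - x_query, axis=1)
    if np.any(d == 0):                       # query coincides with a node
        return y[d == 0][0]
    w = d ** (-mu)
    return np.dot(w, y) / np.sum(w)

def loocv_score(X, y, mu=2.0):
    """Leave-one-out cross-validation score under the squared error."""
    errs = []
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False
        pred = shepard_predict(X[i], X[mask], y[mask], mu)
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)

# toy usage: compare exponents mu by their LOOCV score
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(200)
print({mu: round(loocv_score(X, y, mu), 4) for mu in (1.0, 2.0, 4.0)})
```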
The general adwords problem has remained largely unresolved. We define a subcase called {\em $k$-TYPICAL}, $k \in \mathbb{Z}^{+}$, as follows: the total budget of all the bidders is sufficient to buy $k$ bids for each bidder. This seems a reasonable assumption for a "typical" instance, at least for moderate values of $k$. We give a randomized online algorithm achieving a competitive ratio of $\left(1 - {1 \over e} - {1 \over k}\right)$ for this problem. We also give randomized online algorithms for other special cases of adwords. Another subcase, in which bids are small compared to budgets, has been of considerable practical significance in ad auctions \cite{MSVV}. For this case, we give an optimal randomized online algorithm achieving a competitive ratio of $\left(1 - {1 \over e}\right)$. Previous algorithms for this case were based on LP-duality; the impact of our new approach remains to be seen. The key to these results is a simplification of the proof for RANKING, the optimal algorithm for online bipartite matching given in \cite{KVV}. Our algorithms for adwords can be seen as natural extensions of RANKING.
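For reference, the following is a short sketch of the classical RANKING algorithm for online bipartite matching from \cite{KVV}, on which the adwords algorithms above build; it is not the adwords extension itself, and the data structures are illustrative.

```python
import random

def ranking(offline, online_neighbors, seed=0):
    """RANKING for online bipartite matching.

    offline          : list of offline vertices
    online_neighbors : iterable of neighbor sets, one per arriving online vertex
    Each offline vertex gets a random rank up front; every arriving vertex is
    matched to its unmatched neighbor of smallest rank (if any).
    """
    rng = random.Random(seed)
    rank = {v: r for r, v in enumerate(rng.sample(list(offline), len(offline)))}
    matched = {}
    for t, nbrs in enumerate(online_neighbors):
        free = [v for v in nbrs if v not in matched]
        if free:
            matched[min(free, key=rank.__getitem__)] = t
    return matched   # offline vertex -> index of the online vertex it serves

# toy usage
print(ranking(offline=['a', 'b', 'c'],
              online_neighbors=[{'a', 'b'}, {'a'}, {'b', 'c'}]))
```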
In many applications, we have access to the complete dataset but are only interested in prediction over a particular region of the predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that TCV is consistent in selecting the best-performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over global CV or the approach of using only local data to model a local region. Previous investigations of CV have relied on the condition that, when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications involving changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performance of modeling procedures is dynamic.
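A minimal sketch of the idea, assuming scikit-learn-style candidates with fit/predict and a user-supplied weight function encoding the region of interest (the candidates, weight function, and data below are hypothetical stand-ins, not the paper's procedure in full):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

def targeted_cv(candidates, X, y, weight_fn, n_splits=5, seed=0):
    """Targeted cross-validation: compare candidates under a weighted L2 loss.

    candidates : dict name -> estimator with fit/predict (scikit-learn style)
    weight_fn  : maps X to nonnegative weights encoding the region of interest
    """
    scores = {name: 0.0 for name in candidates}
    for tr, va in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        w = weight_fn(X[va])
        for name, est in candidates.items():
            pred = est.fit(X[tr], y[tr]).predict(X[va])
            scores[name] += np.sum(w * (y[va] - pred) ** 2) / np.sum(w)
    return min(scores, key=scores.get), scores

# toy usage: only care about predictions where the first predictor is near 1.5
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 3))
y = np.where(X[:, 0] > 0, X[:, 0] ** 2, X[:, 0]) + 0.2 * rng.standard_normal(400)
best, _ = targeted_cv({"linear": LinearRegression(),
                       "forest": RandomForestRegressor(random_state=0)},
                      X, y,
                      weight_fn=lambda X_: np.exp(-(X_[:, 0] - 1.5) ** 2))
print(best)
```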
Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS): resampling the data conditional on a sufficient statistic for the null model guarantees valid GoF testing using any test statistic the analyst chooses. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this paper, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We apply our proposed procedure both theoretically and in simulation to a number of models of interest to demonstrate its finite-sample Type I error and power.
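To illustrate the baseline that aCSS generalizes, here is a sketch of classical CSS testing for the simplest case of an i.i.d. Gaussian null with unknown mean and variance: conditional on the sufficient statistic (sample mean and variance), exchangeable copies are obtained by drawing a Gaussian vector and affinely transforming it to match the observed mean and dispersion. The test statistic (sample skewness below) is an arbitrary, hypothetical choice.

```python
import numpy as np

def css_gaussian_pvalue(x, stat, n_copies=999, seed=0):
    """Co-sufficient sampling GoF p-value for the N(mu, sigma^2) null.

    Resamples are exchangeable copies of x conditional on the sufficient
    statistic (sample mean, sample variance), so any test statistic `stat`
    yields a valid finite-sample p-value.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    xbar = x.mean()
    scale = np.linalg.norm(x - xbar)
    t_obs = stat(x)
    t_copies = np.empty(n_copies)
    for m in range(n_copies):
        z = rng.standard_normal(n)
        z = z - z.mean()                      # project onto {sum = 0}
        z = z / np.linalg.norm(z) * scale     # match observed dispersion
        t_copies[m] = stat(xbar + z)
    return (1 + np.sum(t_copies >= t_obs)) / (n_copies + 1)

# toy usage with sample skewness as a (hypothetical) test statistic
skew = lambda v: np.mean(((v - v.mean()) / v.std()) ** 3)
rng = np.random.default_rng(1)
print(css_gaussian_pvalue(rng.standard_normal(100), skew))    # null holds
print(css_gaussian_pvalue(rng.exponential(size=100), skew))   # null violated
```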
Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that aim to find effective treatments for individual patients according to patient information history. DTRs can be estimated from models that include the interaction between treatment and a small number of covariates that are often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant in the treatment rule. Therefore, a more data-driven approach to selecting these covariates might improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property; that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show that our method has both the double robustness property and the oracle property, and that it compares favorably with other variable selection approaches.
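For background, the following is a minimal single-stage sketch of (unpenalized) dynamic weighted least squares with the standard weight $|A - \hat{P}(A=1 \mid X)|$; it includes neither the proposed penalization nor the heredity constraint, and all variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dwols_single_stage(X, A, Y):
    """One-stage dWOLS sketch: returns the blip coefficients (A and A*X terms).

    The weight |A - P(A=1|X)| is one standard choice that makes the blip
    estimate robust to misspecifying either the treatment model or the
    treatment-free model.
    """
    pi = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
    w = np.abs(A - pi)
    n, p = X.shape
    # design: intercept, X (treatment-free part), A, A*X (blip part)
    D = np.column_stack([np.ones(n), X, A, A[:, None] * X])
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * D, sw * Y, rcond=None)[0]
    psi = beta[p + 1:]          # coefficients of A and A*X
    return psi                  # recommend treatment when psi[0] + x @ psi[1:] > 0

# toy usage
rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X.sum(axis=1) + A * (1.0 + 2.0 * X[:, 1]) + rng.standard_normal(500)
print(dwols_single_stage(X, A, Y))   # roughly [1, 0, 2, 0]
```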
This paper analyzes the correlation matrix between the a priori error and measurement noise vectors for affine projection algorithms (APA). This correlation stems from the dependence between the filter tap estimates and the noise samples, and has a strong influence on the mean-square behavior of the algorithm. We show that the correlation matrix is upper triangular, and we compute the diagonal elements in closed form, showing that they are independent of the input process statistics. Moreover, for white inputs we show that the matrix is fully diagonal. These results are valid in both the transient and steady states of the algorithm, allowing for a possibly variable step size. Our only assumption is that the filter order is large compared to the projection order of the APA; we make no assumptions on the input signal other than stationarity. Using these results, we perform a steady-state analysis of the algorithm for small step size and provide a new, simple closed-form expression for the mean-square error, which has accuracy comparable to or better than that of many preexisting expressions and is much simpler to compute. Finally, we also obtain expressions for the steady-state energy of the other components of the error vector.
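For reference, here is a compact sketch of the APA recursion itself (fixed step size, standard regularization), which the analysis above concerns; parameter values in the toy example are arbitrary.

```python
import numpy as np

def apa_filter(x, d, order=16, proj=4, mu=0.5, delta=1e-3):
    """Affine projection algorithm (APA) adaptive filter.

    x     : input signal
    d     : desired signal
    order : number of filter taps (M)
    proj  : projection order (P), number of past regressors reused per update
    """
    M, P = order, proj
    w = np.zeros(M)
    err = np.zeros(len(x))
    for n in range(M + P - 1, len(x)):
        # X_n: M x P matrix whose columns are the last P regressor vectors
        Xn = np.column_stack([x[n - p - M + 1:n - p + 1][::-1] for p in range(P)])
        dn = d[n - np.arange(P)]
        e = dn - Xn.T @ w                      # a priori error vector
        w = w + mu * Xn @ np.linalg.solve(Xn.T @ Xn + delta * np.eye(P), e)
        err[n] = e[0]
    return w, err

# toy system identification
rng = np.random.default_rng(3)
h = rng.standard_normal(16)
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:5000] + 0.01 * rng.standard_normal(5000)
w_hat, err = apa_filter(x, d, order=16, proj=4, mu=0.5)
print(np.round(np.max(np.abs(w_hat - h)), 3))
```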
Plausibility is a formalization of exact tests for parametric models and generalizes procedures such as Fisher's exact test. The resulting tests are based on cumulative probabilities of the probability density function and evaluate consistency with a parametric family while providing exact control of the $\alpha$ level for finite sample sizes. Model comparisons are inefficient in this approach. We generalize plausibility by incorporating weighting, which makes model comparisons possible. We show that one weighting scheme is asymptotically equivalent to the likelihood ratio test (LRT) and, unlike the LRT, has finite-sample guarantees for the test size under the null hypothesis. We confirm the theoretical properties in simulations that mimic the data set of our data application. We apply the method to a retinoblastoma data set and demonstrate a parent-of-origin effect. Weighted plausibility also has applications in high-dimensional data analysis, and P-values for penalized regression models can be derived. We demonstrate superior performance compared to a data-splitting procedure in a simulation study. We apply weighted plausibility to a high-dimensional gene expression, case-control prostate cancer data set. We discuss the flexibility of the approach by relating weighted plausibility to targeted learning, the bootstrap, and sparsity selection.
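To fix ideas, here is a sketch of the basic, unweighted plausibility test for a simple (point) null, using a binomial model: the p-value is the total null probability of outcomes no more probable than the observed one. The weighted extension that enables model comparison is not shown.

```python
import numpy as np
from scipy.stats import binom

def plausibility_binomial(x_obs, n, p0):
    """Exact (unweighted) plausibility p-value for H0: X ~ Binomial(n, p0).

    Sums the null probability of all outcomes whose density is at most that
    of the observed outcome, in the spirit of Fisher's exact test.
    """
    pmf = binom.pmf(np.arange(n + 1), n, p0)
    return float(pmf[pmf <= pmf[x_obs] * (1 + 1e-12)].sum())

# toy usage: observe 40 successes out of 100 under H0: p = 0.5
print(plausibility_binomial(40, 100, 0.5))
```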
Event cameras trigger events asynchronously and independently upon a sufficient change of the logarithmic brightness level. This neuromorphic sensor has several advantages over standard cameras, including low latency, absence of motion blur, and high dynamic range. Event cameras are particularly well suited to sensing motion dynamics in agile scenarios. We propose the continuous event-line constraint, which relies on a constant-velocity motion assumption as well as trifocal tensor geometry in order to express a relationship between line observations given by event clusters and first-order camera dynamics. Our core result is a closed-form solver for up-to-scale linear camera velocity with known angular velocity. Nonlinear optimization is adopted to improve the performance of the algorithm. The feasibility of the approach is demonstrated through a careful analysis on both simulated and real data.
Zero-inflated explanatory variables are common in fields such as ecology and finance. In this paper we address the problem of an excess of zero values in some explanatory variables that are subject to multioutcome lasso-regularized variable selection. Briefly, the problem arises because lasso-type shrinkage methods cannot distinguish between a zero occurring in a regression coefficient and a zero occurring in the corresponding value of the explanatory variable. This kind of confounding inflates the number of false positives: not all non-zero regression coefficients necessarily represent real outcome effects. We present the adaptive LAD-lasso for multiple outcomes, which extends earlier work on the multivariate LAD-lasso with adaptive penalization. In addition to the well-known property of yielding less biased regression coefficients, we show how adaptivity also improves the method's ability to recover from the influence of excess zero values in continuous covariates.
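As a rough illustration, the sketch below implements a single-outcome adaptive LAD-lasso (not the paper's multi-outcome version) via the usual augmented-data trick, solving each LAD fit as a linear program; the penalty level and toy data are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def lad_regression(X, y):
    """Least-absolute-deviations regression via linear programming."""
    n, p = X.shape
    # variables: [beta (free, p), u (n), v (n)];  minimize sum(u + v)
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])     # X beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

def adaptive_lad_lasso(X, y, lam=1.0, gamma=1.0):
    """Adaptive LAD-lasso (single outcome) via the augmented-data trick.

    Pseudo-rows lam_j * e_j with response 0 turn the penalized problem into
    a plain LAD regression on the augmented data.
    """
    b_init = lad_regression(X, y)                    # initial estimate
    lam_j = lam / (np.abs(b_init) + 1e-8) ** gamma   # adaptive penalties
    X_aug = np.vstack([X, np.diag(lam_j)])
    y_aug = np.concatenate([y, np.zeros(X.shape[1])])
    return lad_regression(X_aug, y_aug)

# toy usage: sparse truth with heavy-tailed noise
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 8))
beta = np.array([2.0, 0, 0, -1.5, 0, 0, 0, 0])
y = X @ beta + rng.standard_t(df=2, size=200)
print(np.round(adaptive_lad_lasso(X, y, lam=5.0), 2))
```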