ML-augmented algorithms utilize predictions to achieve performance beyond their worst-case bounds. Producing these predictions can be costly, which motivated Im et al. '22 to introduce the study of algorithms that use predictions parsimoniously. We design parsimonious algorithms for caching and MTS with action predictions, proposed by Antoniadis et al. '20, focusing on the parameters of consistency (performance with perfect predictions) and smoothness (dependence of performance on the prediction error). Our algorithm for caching is 1-consistent, robust, and its smoothness deteriorates as the number of available predictions decreases. We propose an algorithm for general MTS whose consistency and smoothness both degrade linearly as the number of available predictions decreases. Without a restriction on the number of available predictions, both algorithms match the earlier guarantees achieved by Antoniadis et al. '20.
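The following is a minimal sketch, not the paper's algorithm, of how a cache can act on action-style predictions (predicted next-request times, in the spirit of Belady's rule) while querying the predictor only sparingly. The capacity, the refresh period `query_every`, and the oracle `predict_next_use(page, t)` are all hypothetical names introduced for illustration.

```python
# Illustrative sketch only: prediction-guided eviction with parsimonious queries.
def run_cache(requests, capacity, query_every, predict_next_use):
    cache, next_use, faults = set(), {}, 0
    for t, page in enumerate(requests):
        if t % query_every == 0 and cache:
            # parsimonious use of the (costly) predictor: refresh only occasionally
            next_use = {p: predict_next_use(p, t) for p in cache}
        if page in cache:
            continue
        faults += 1
        if len(cache) >= capacity:
            # evict the page predicted to be requested furthest in the future;
            # pages with no (possibly stale) prediction are treated as furthest
            victim = max(cache, key=lambda p: next_use.get(p, float("inf")))
            cache.discard(victim)
        cache.add(page)
    return faults
```

Lowering `query_every` recovers the fully prediction-driven regime; raising it illustrates the trade-off between prediction cost and smoothness discussed in the abstract.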
This paper studies linear reconstruction of partially observed functional data that are recorded on a discrete grid. We propose a novel estimation approach based on approximate factor models with increasing rank, taking potential covariate information into account. Whereas alternative reconstruction procedures commonly involve some preliminary smoothing, our method separates the signal from the noise and reconstructs the missing fragments in one step. We establish uniform convergence rates for our estimator and introduce a new method for constructing simultaneous prediction bands for the missing trajectories. A simulation study examines the performance of the proposed methods in finite samples. Finally, a real-data application to temperature curves demonstrates that our theory provides a simple and effective method to recover missing fragments.
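As a point of reference only, the sketch below shows generic linear reconstruction of a missing fragment: the best linear prediction of the unobserved grid points given the observed ones, with mean and covariance estimated from fully observed curves. It stands in for, and does not reproduce, the paper's factor-model estimator; all names and the ridge term are assumptions.

```python
import numpy as np

def reconstruct(x_obs, obs_idx, mis_idx, complete_curves):
    """Best linear prediction of the missing fragment of one curve.

    complete_curves: (n, p) array of fully observed curves on the same grid,
    used here to estimate the mean and covariance (a generic stand-in for the
    paper's factor-model-based estimates).
    """
    mu = complete_curves.mean(axis=0)
    cov = np.cov(complete_curves, rowvar=False)
    c_oo = cov[np.ix_(obs_idx, obs_idx)]          # observed-observed block
    c_mo = cov[np.ix_(mis_idx, obs_idx)]          # missing-observed block
    # small ridge term stabilises the solve on a fine grid (assumption)
    rhs = np.linalg.solve(c_oo + 1e-8 * np.eye(len(obs_idx)), x_obs - mu[obs_idx])
    return mu[mis_idx] + c_mo @ rhs
```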
Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X), with observed serum creatinine labs (Z2) and time to acute kidney injury as the outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of the Z2 measurements as a function of Z2 and U, and created different HDMI candidate AC sets using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all candidate AC sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and covariates associated with U for inclusion in the propensity score models. The treatment effect was estimated following propensity score matching in the MI datasets, and the HDMI approaches were benchmarked against a baseline imputation and a complete-case analysis using Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims data and sentence embeddings improved efficiency, yielding the lowest root-mean-squared error (0.173) and a coverage of 94%. NLP-derived AC alone did not perform better than the baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders in which missingness depends on unobserved factors.
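A minimal sketch of the data-adaptive selection step is shown below, assuming only what the abstract states: LASSO-type models select candidate AC associated with the partially observed confounder Z2 and with its missingness indicator MZ2, and the union of selected features is passed to the imputation model. Tuning choices, variable names, and the exact modelling pipeline are assumptions, not the study's actual code.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

def select_hdmi_covariates(AC, z2, m_z2):
    """Select high-dimensional auxiliary covariates (AC) for imputation of z2.

    AC   : (n, p) matrix of candidate structured / NLP-derived features
    z2   : partially observed confounder (NaN where missing)
    m_z2 : binary missingness indicator for z2
    Illustrative only; regularisation paths and CV settings are assumptions.
    """
    obs = ~np.isnan(z2)
    # features predictive of the confounder itself (complete cases only)
    lasso_z2 = LassoCV(cv=5).fit(AC[obs], z2[obs])
    # features predictive of the missingness mechanism
    lasso_m = LogisticRegressionCV(penalty="l1", solver="saga", cv=5).fit(AC, m_z2)
    keep = (lasso_z2.coef_ != 0) | (lasso_m.coef_.ravel() != 0)
    return np.where(keep)[0]   # column indices to include in the MI model
```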
We examine a mean-reverting Ornstein-Uhlenbeck process that perturbs an unknown Lipschitz-continuous drift and aim to estimate the drift's value at a predetermined time horizon by sampling the path of the process. Due to the time-varying nature of the drift, we propose an estimation procedure based on an online, time-varying optimization scheme implemented with a stochastic gradient ascent algorithm that maximizes the log-likelihood of our observations. The objective of the paper is to investigate the optimal sample size/rate for achieving the minimum mean squared distance between our estimator and the true value of the drift. In this setting we uncover a trade-off between the correlation of the observations, which increases with the sample size, and the dynamic nature of the unknown drift, which is weakened by increasing the frequency of observation. The mean squared error is shown to be non-monotonic in the sample size, attaining a global minimum whose precise description depends on the parameters that govern the model. In the static case, i.e., when the unknown drift is constant, our method outperforms the arithmetic mean of the observations in highly correlated regimes, despite the latter being a natural candidate estimator. We then compare our online estimator with the global maximum likelihood estimator.
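To make the online scheme concrete, here is a minimal sketch under a simplified reading of the model: observations are sampled every dt time units from an OU process dX_t = theta (mu - X_t) dt + sigma dW_t, and the (possibly slowly varying) level mu is tracked by gradient ascent on the Gaussian transition log-likelihood. The step size lr, the initial guess mu0, and the constant-parameter assumption are illustrative choices, not the paper's exact procedure.

```python
import math

def online_drift_estimate(xs, dt, theta, sigma, lr, mu0=0.0):
    """Stochastic gradient ascent on the OU one-step transition log-likelihood."""
    a = math.exp(-theta * dt)                       # autoregressive coefficient
    v = sigma**2 * (1.0 - a**2) / (2.0 * theta)     # conditional variance of one step
    mu = mu0
    for x_prev, x_next in zip(xs[:-1], xs[1:]):
        mean = mu * (1.0 - a) + a * x_prev          # conditional mean of x_next
        grad = (x_next - mean) * (1.0 - a) / v      # d(log-likelihood)/d(mu)
        mu += lr * grad                             # online ascent step
    return mu
```

Increasing the sampling rate shrinks dt, which makes consecutive observations more correlated but lets the estimate follow a moving drift more closely, mirroring the trade-off described above.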
A Peskun ordering between two samplers, implying the dominance of one over the other, is known in the Markov chain Monte Carlo community for being a remarkably strong result. It is, however, also known for being notably difficult to establish. Indeed, one has to prove that the probability of reaching a state $\mathbf{y}$ from a state $\mathbf{x}$ under one sampler is greater than or equal to the probability under the other sampler, and this must hold for all pairs $(\mathbf{x}, \mathbf{y})$ such that $\mathbf{x} \neq \mathbf{y}$. In this paper we provide a weaker version that does not require the inequality to hold for all such pairs: essentially, the dominance holds asymptotically, as a varying parameter grows without bound, as long as the states for which the inequality holds belong to a set that concentrates the probability mass. The weak ordering turns out to be useful for comparing lifted samplers on partially ordered discrete state spaces with their Metropolis--Hastings counterparts. An analysis in great generality yields a qualitative conclusion: the lifted samplers asymptotically perform better in certain situations, which we identify, but not necessarily in others, and the reasons why are made clear. A quantitative study in a specific context of graphical-model simulation is also conducted.
We analyze a bilinear optimal control problem for the Stokes--Brinkman equations: the control variable enters the state equations as a coefficient. In two- and three-dimensional Lipschitz domains, we perform a complete continuous analysis that includes the existence of solutions and first- and second-order optimality conditions. We also develop two finite element methods that differ fundamentally in whether the admissible control set is discretized or not. For each of the proposed methods, we perform a convergence analysis and derive a priori error estimates; the latter under the assumption that the domain is convex. Finally, assuming that the domain is Lipschitz, we develop an a posteriori error estimator for each discretization scheme and obtain a global reliability bound.
Rough set theory is an important method for rule acquisition and attribute reduction. Current rough set attribute reduction methods focus mainly on minimizing the number of retained attributes but ignore the spatial similarity between the reduct and the decision attribute, which may lead to problems such as an increased number of rules and limited generality. In this paper, a rough set attribute reduction algorithm based on spatial optimization is proposed. By introducing the concept of spatial similarity, the algorithm seeks the reduct with the highest spatial similarity to the decision attribute, so that more concise and more widely applicable rules are obtained. In addition, a comparative experiment with traditional rough set attribute reduction algorithms is designed to demonstrate the effectiveness of the proposed algorithm, which achieves significant improvements on many datasets.
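The abstract does not spell out the spatial-similarity score, so the sketch below uses the classical positive-region dependency degree as a stand-in for whatever criterion scores candidate attribute subsets, together with a simple greedy reduct search. It illustrates the general shape of such an algorithm only; all function names are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes induced by a set of (conditional) attribute indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Classical positive-region dependency of the decision attribute on `attrs`
    (used here as a stand-in for the paper's spatial-similarity score)."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({rows[i][decision] for i in b}) == 1)
    return pos / len(rows)

def greedy_reduct(rows, cond_attrs, decision):
    """Greedily add the attribute that most improves the score."""
    reduct, best = [], 0.0
    while len(reduct) < len(cond_attrs):
        score, attr = max((dependency(rows, reduct + [a], decision), a)
                          for a in cond_attrs if a not in reduct)
        if score <= best:
            break
        reduct.append(attr)
        best = score
    return reduct
```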
In the present paper, we introduce new tensor Krylov subspace methods for solving large Sylvester tensor equations. The proposed methods rely on the well-known T-product for tensors and on tensor subspaces. We introduce some new tensor products and study their algebraic properties. These new products enable us to develop third-order tensor versions of FOM (tFOM) and GMRES (tGMRES), as well as the tubal block Arnoldi and tensor tubal block Arnoldi methods, for solving large Sylvester tensor equations. We establish some properties of these methods and present numerical experiments.
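For readers unfamiliar with the T-product, the sketch below shows its standard definition (FFT along the third mode, facewise matrix products, inverse FFT), which is the basic operation underlying tFOM/tGMRES-type Krylov methods. This is a generic illustration, not the paper's implementation.

```python
import numpy as np

def t_product(A, B):
    """T-product of third-order tensors A (n1 x n2 x n3) and B (n2 x m x n3)."""
    n3 = A.shape[2]
    Ah = np.fft.fft(A, axis=2)                      # transform along the third mode
    Bh = np.fft.fft(B, axis=2)
    Ch = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):                             # one matrix product per frontal slice
        Ch[:, :, k] = Ah[:, :, k] @ Bh[:, :, k]
    C = np.fft.ifft(Ch, axis=2)                     # back-transform
    return C.real if np.isrealobj(A) and np.isrealobj(B) else C
```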
We build a valid p-value based on a concentration inequality for bounded random variables introduced by Pelekis, Ramon and Wang. The motivation behind this work is the calibration of predictive algorithms in a distribution-free setting. The resulting super-uniform p-value is tighter than the Hoeffding and Bentkus alternatives in certain regions. Even though we are motivated by a calibration setting in a machine learning context, the ideas presented in this work are also relevant in classical statistical inference. Furthermore, we compare the power of a collection of valid p-values for bounded losses presented in the previous literature.
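The Pelekis-Ramon-Wang bound itself is not reproduced here; for context, the sketch below gives the two standard baselines the abstract compares against, in the form commonly used in distribution-free risk control: super-uniform p-values for the null hypothesis that the true mean of [0, 1]-bounded losses is at least mu0, one from Hoeffding's inequality and one from Bentkus's inequality. Exact conventions (e.g., the factor e and the ceiling) follow the usual formulation and are stated here as assumptions.

```python
import math
from scipy.stats import binom

def hoeffding_pvalue(emp_mean, n, mu0):
    """Super-uniform p-value for H0: true mean >= mu0, losses bounded in [0, 1]."""
    return math.exp(-2.0 * n * max(0.0, mu0 - emp_mean) ** 2)

def bentkus_pvalue(emp_mean, n, mu0):
    """Bentkus-based p-value for the same null (standard form with the factor e)."""
    return min(1.0, math.e * binom.cdf(math.ceil(n * emp_mean), n, mu0))
```

A tighter concentration inequality yields smaller p-values in the corresponding regions, which is precisely the comparison the abstract describes.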
When modeling a vector of risk variables, extreme scenarios are often of special interest. The peaks-over-thresholds method hinges on the notion that, asymptotically, the excesses over a vector of high thresholds follow a multivariate generalized Pareto distribution. However, the existing literature has primarily concentrated on the setting where all risk variables are always large simultaneously. In reality, this assumption is often not met, especially in high dimensions. In response to this limitation, we study scenarios where distinct groups of risk variables may exhibit joint extremes while others do not. These discernible groups are derived from the angular measure inherent in the corresponding max-stable distribution, whence the term extreme direction. We explore such extreme directions within the framework of multivariate generalized Pareto distributions, with a focus on their probability density functions in relation to an appropriate dominating measure. Furthermore, we provide a stochastic construction that allows any prespecified set of risk groups to constitute the distribution's extreme directions. This construction takes the form of a smoothed max-linear model and accommodates the full spectrum of conceivable max-stable dependence structures. Additionally, we introduce a generic simulation algorithm tailored for multivariate generalized Pareto distributions, offering specific implementations for extensions of the logistic and H\"usler-Reiss families capable of carrying arbitrary extreme directions.
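The paper's tailored simulation algorithm is not reproduced here; as background, the sketch below uses the standard stochastic representation of a multivariate generalized Pareto vector in standardized (exponential-scale) form, X = E + T - max_j T_j with E ~ Exp(1) independent of an arbitrary generator vector T, whose law determines the dependence structure. The generator passed in, and the Gumbel example, are assumptions for illustration only.

```python
import numpy as np

def rmgpd(n, sample_T, rng=None):
    """Simulate n draws of a d-variate generalized Pareto vector (standardized form).

    sample_T(rng) must return one d-dimensional draw of the generator vector T.
    Generic sketch; the paper's algorithm for prescribed extreme directions differs.
    """
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n):
        T = np.asarray(sample_T(rng), dtype=float)
        E = rng.exponential()
        out.append(E + T - T.max())
    return np.vstack(out)

# Example generator (one of many possible choices, purely illustrative):
# draws = rmgpd(1000, lambda rng: rng.gumbel(size=3))
```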
We present a generalization of the discrete Lehmann representation (DLR) to three-point correlation and vertex functions in imaginary time and Matsubara frequency. The representation takes the form of a linear combination of judiciously chosen exponentials in imaginary time, and products of simple poles in Matsubara frequency, which are universal for a given temperature and energy cutoff. We present a systematic algorithm to generate compact sampling grids, from which the coefficients of such an expansion can be obtained by solving a linear system. We show that the explicit form of the representation can be used to evaluate diagrammatic expressions involving infinite Matsubara sums, such as polarization functions or self-energies, with controllable, high-order accuracy. This collection of techniques establishes a framework through which methods involving three-point objects can be implemented robustly, with a substantially reduced computational cost and memory footprint.
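To indicate the flavor of the construction, the sketch below illustrates the single-variable DLR idea that the abstract generalizes: an imaginary-time function is expanded on a fixed set of exponentials via the fermionic kernel K(tau, omega) = exp(-tau*omega) / (1 + exp(-beta*omega)), and the coefficients are obtained from values on a sampling grid by solving a linear least-squares system. In the actual DLR the frequency nodes and sampling grid are selected systematically for a target accuracy; here they are simply taken as given, and the naive kernel evaluation assumes a moderate energy cutoff.

```python
import numpy as np

def fit_dlr_like(tau_grid, values, omegas, beta):
    """Fit G(tau) ~ sum_k c_k * K(tau, omega_k) from samples on tau_grid.

    Illustrative sketch: omegas and tau_grid are assumed given, whereas the DLR
    chooses them via pivoted decompositions of the kernel.
    """
    tau = np.asarray(tau_grid, dtype=float)[:, None]
    w = np.asarray(omegas, dtype=float)[None, :]
    K = np.exp(-tau * w) / (1.0 + np.exp(-beta * w))   # naive evaluation; moderate cutoff
    coeffs, *_ = np.linalg.lstsq(K, np.asarray(values, dtype=float), rcond=None)
    return coeffs

def evaluate_dlr_like(tau, coeffs, omegas, beta):
    """Evaluate the fitted expansion at new imaginary-time points."""
    t = np.asarray(tau, dtype=float)[:, None]
    w = np.asarray(omegas, dtype=float)[None, :]
    K = np.exp(-t * w) / (1.0 + np.exp(-beta * w))
    return K @ coeffs
```

The three-point generalization described above replaces the single kernel by products of simple poles in two Matsubara frequencies, but the fit-on-a-compact-grid workflow is analogous.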