Cosmological parameters encoding our understanding of the expansion history of the Universe can be constrained by the accurate estimation of time delays arising in gravitationally lensed systems. We propose TD-CARMA, a Bayesian method to estimate cosmological time delays by modeling the observed and irregularly sampled light curves as realizations of a Continuous Auto-Regressive Moving Average (CARMA) process. Our model accounts for heteroskedastic measurement errors and microlensing, an additional source of independent extrinsic long-term variability in the source brightness. The semi-separable structure of the CARMA covariance matrix allows for fast and scalable likelihood computation using Gaussian Process modeling. We obtain a sample from the joint posterior distribution of the model parameters using a nested sampling approach. This allows for ``painless'' Bayesian computation: the expected multi-modality of the posterior distribution is handled in a straightforward manner and, unlike existing methods, no starting values or initial guess for the time delay are required. In addition, the proposed sampling procedure automatically evaluates the Bayesian evidence, allowing us to perform principled Bayesian model selection. TD-CARMA is parsimonious, typically involving no more than a dozen unknown parameters. We apply TD-CARMA to six doubly lensed quasars, HS 2209+1914, SDSS J1001+5027, SDSS J1206+4332, SDSS J1515+1511, SDSS J1455+1447 and SDSS J1349+1227, estimating their time delays as $-21.96 \pm 1.448$, $120.93 \pm 1.015$, $111.51 \pm 1.452$, $210.80 \pm 2.18$, $45.36 \pm 1.93$ and $432.05 \pm 1.950$, respectively. These estimates are consistent with those derived in the relevant literature, but are typically two to four times more precise.
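The TD-CARMA implementation is not reproduced here; the following is only a minimal sketch of the underlying idea, assuming a CARMA(1,0) (damped random walk) kernel, toy synthetic light curves, heteroskedastic noise on the diagonal, and a simple grid profile over trial delays instead of nested sampling. It omits microlensing and the semi-separable fast solver; all names and data are hypothetical.

```python
import numpy as np

def drw_cov(t1, t2, sigma2, tau):
    """CARMA(1,0) / damped-random-walk covariance k(s,t) = sigma2 * exp(-|s-t| / tau)."""
    return sigma2 * np.exp(-np.abs(t1[:, None] - t2[None, :]) / tau)

def log_likelihood(delta, tA, yA, errA, tB, yB, errB, sigma2, tau, dmag):
    """Joint Gaussian log-likelihood of two light curves assumed to trace the same
    latent process, with image B delayed by `delta` and offset in magnitude by `dmag`."""
    t = np.concatenate([tA, tB - delta])        # shift image B back by the trial delay
    y = np.concatenate([yA, yB - dmag])         # remove the magnitude offset
    err = np.concatenate([errA, errB])
    C = drw_cov(t, t, sigma2, tau) + np.diag(err**2)   # process + heteroskedastic noise
    _, logdet = np.linalg.slogdet(C)
    resid = y - y.mean()
    return -0.5 * (resid @ np.linalg.solve(C, resid) + logdet + len(y) * np.log(2 * np.pi))

# Profile the likelihood over a grid of trial delays on toy data (true delay = 120).
rng = np.random.default_rng(0)
tA = np.sort(rng.uniform(0, 500, 80)); tB = np.sort(rng.uniform(0, 500, 80))
yA = np.sin(tA / 40.0) + 0.05 * rng.normal(size=80)
yB = np.sin((tB - 120.0) / 40.0) + 0.3 + 0.05 * rng.normal(size=80)
errA = errB = np.full(80, 0.05)
grid = np.arange(60.0, 180.0, 2.0)
ll = [log_likelihood(d, tA, yA, errA, tB, yB, errB, 1.0, 40.0, 0.3) for d in grid]
print("best trial delay:", grid[int(np.argmax(ll))])
```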
We present Modular Polynomial (MP) Codes for Secure Distributed Matrix Multiplication (SDMM). The construction is based on the observation that certain proper subsets of the coefficients of a polynomial can be decoded from fewer evaluations than are necessary to interpolate the entire polynomial. These codes are proven to outperform, in terms of recovery threshold, the currently best-known polynomial codes for the inner product partition. We also present Generalized Gap Additive Secure Polynomial (GGASP) codes for the grid partition. Both families of codes are shown experimentally to perform favorably in terms of recovery threshold when compared to other comparable polynomial codes for SDMM. Both MP and GGASP codes achieve the recovery threshold of Entangled Polynomial Codes for robustness against stragglers, but MP codes can decode below this threshold depending on which worker nodes fail. The decoding complexity of MP codes is shown to be lower than that of other approaches in the literature, because the user is not required to interpolate an entire polynomial.
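As a toy illustration of the underlying observation (not the MP construction itself, and with no security randomness added), the following uses the two-block inner product partition: the desired product is the single interior coefficient of a degree-2 matrix polynomial, and it can be decoded from two evaluations at $\pm x_0$, whereas interpolating the whole polynomial would require three.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
# Inner product partition: A = [A1 | A2], B = [B1 ; B2], so A @ B = A1 @ B1 + A2 @ B2.
A1, A2 = rng.integers(0, 10, (m, m)), rng.integers(0, 10, (m, m))
B1, B2 = rng.integers(0, 10, (m, m)), rng.integers(0, 10, (m, m))

def f(x):  # encoding of the column blocks of A
    return A1 + A2 * x

def g(x):  # encoding of the row blocks of B
    return B2 + B1 * x

# h(x) = f(x) @ g(x) = A1@B2 + (A1@B1 + A2@B2) x + A2@B1 x^2.
# The desired coefficient (of x) is decoded from just two evaluations at +/- x0,
# even though interpolating all three matrix coefficients would need three.
x0 = 3.0
h_plus, h_minus = f(x0) @ g(x0), f(-x0) @ g(-x0)
decoded = (h_plus - h_minus) / (2 * x0)

assert np.allclose(decoded, A1 @ B1 + A2 @ B2)
print("recovered the product block sum from 2 evaluations instead of 3")
```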
The question of whether $Y$ can be predicted from $X$ arises frequently, and while a well-fitted model may perform well on observed data, the risk of overfitting always exists, leading to poor generalization on unseen data. This paper proposes a rigorous permutation test to assess the credibility of high $R^2$ values in regression models; the test applies to any measure of goodness of fit, requires no sample splitting, and works by generating new pairings $(X_i, Y_j)$ to provide an overall assessment of the model's accuracy. It introduces a new formulation of the null hypothesis and a justification for the test, which distinguishes it from previous literature. The theoretical findings are applied to both simulated data and sensor data of tennis serves collected in an experimental context. The simulation study underscores how the available information affects the test, showing that the less informative the predictors, the lower the probability of rejecting the null hypothesis, and emphasizing that detecting weaker dependence between variables requires a sufficient sample size.
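A minimal sketch of a generic permutation test of this kind is given below, assuming an ordinary least squares fit and $R^2$ as the goodness-of-fit measure: the responses are permuted to break the pairing with the predictors, the model is refitted, and the observed $R^2$ is compared against the permutation distribution. The exact null formulation and justification proposed in the paper are not reproduced here.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def permutation_pvalue(X, y, n_perm=999, seed=0):
    """Fraction of permuted pairings whose R^2 is at least the observed one."""
    rng = np.random.default_rng(seed)
    observed = r_squared(X, y)
    count = sum(r_squared(X, rng.permutation(y)) >= observed for _ in range(n_perm))
    return observed, (1 + count) / (1 + n_perm)

# Toy example: weak signal plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 0.3 * X[:, 0] + rng.normal(size=100)
r2, pval = permutation_pvalue(X, y)
print(f"R^2 = {r2:.3f}, permutation p-value = {pval:.3f}")
```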
We consider structural equation modeling (SEM) with latent variables for diffusion processes based on high-frequency data. Quasi-likelihood estimators for the parameters of the SEM are proposed, and a goodness-of-fit test is derived from the quasi-likelihood ratio. We also treat sparse estimation in the SEM and develop a goodness-of-fit test for this setting. Furthermore, the asymptotic properties of the proposed estimators are examined.
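The abstract gives no implementation details, so the following is only a generic illustration of quasi-likelihood estimation for a discretely observed diffusion (a one-dimensional Ornstein-Uhlenbeck process with an Euler-type Gaussian quasi-likelihood); the latent-variable SEM machinery of the paper is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

# Simulate a one-dimensional Ornstein-Uhlenbeck path observed at high frequency:
# dX_t = -theta * X_t dt + sigma dW_t,  observed at t_i = i * h.
rng = np.random.default_rng(0)
theta_true, sigma_true, h, n = 2.0, 0.5, 0.01, 5000
x = np.zeros(n + 1)
for i in range(n):
    x[i + 1] = x[i] - theta_true * x[i] * h + sigma_true * np.sqrt(h) * rng.normal()

def neg_quasi_loglik(params):
    """Euler-type Gaussian quasi-likelihood: each increment is approximated as
    N(-theta * X_i * h, sigma^2 * h)."""
    theta, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    incr = np.diff(x)
    mean = -theta * x[:-1] * h
    return 0.5 * np.sum((incr - mean) ** 2 / (sigma2 * h) + np.log(2 * np.pi * sigma2 * h))

res = minimize(neg_quasi_loglik, x0=[1.0, 0.0])
print("theta-hat:", res.x[0], "sigma-hat:", np.exp(res.x[1]))
```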
Most work in the auction design literature assumes that bidders behave rationally based on the information available for each individual auction. However, in today's online advertising markets, one of the most important real-life applications of auction design, the data and computational power required to bid optimally are only available to the auction designer, and an advertiser can only participate by setting performance objectives (clicks, conversions, etc.) for the campaign. In this paper, we focus on value-maximizing campaigns with return-on-investment (ROI) constraints, which are widely adopted by many global-scale auto-bidding platforms. Through theoretical analysis and empirical experiments on both synthetic and realistic data, we find that the second price auction exhibits many undesirable properties and loses the dominant theoretical advantages it enjoys in single-item scenarios. In particular, the second price auction brings equilibrium multiplicity, non-monotonicity, vulnerability to exploitation by bidders and even auctioneers, and PPAD-hardness of reaching a steady state. We also explore the broader impacts of the auto-bidding mechanism beyond efficiency and strategyproofness. In particular, the multiplicity of equilibria and the input sensitivity make advertisers' utilities unstable. In addition, the interference among bidders and across advertising slots introduces bias into A/B testing, which hinders the development of even the non-bidding components of the platform. The aforementioned phenomena have been widely observed in practice, and our results indicate that one reason may be intrinsic to the underlying auto-bidding mechanism. To deal with these challenges, we provide suggestions and candidate solutions for practitioners.
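As a toy sketch of the setting studied here (not of the paper's analysis or results), the following models a value-maximizing auto-bidder in second price auctions that bids via a single pacing multiplier and selects the largest multiplier whose wins satisfy the campaign's ROI constraint; the grid search, data, and constraint form are illustrative assumptions.

```python
import numpy as np

def best_pacing_multiplier(values, prices, target_roi, grid=np.linspace(0.1, 10.0, 200)):
    """Value maximizer in second price auctions bidding alpha * value:
    choose the multiplier maximizing total won value subject to
    sum(values won) >= target_roi * sum(prices paid)."""
    best_alpha, best_value = None, -1.0
    for alpha in grid:
        won = alpha * values >= prices            # second price: win if the bid clears the price
        spend = prices[won].sum()                 # pay the competing (second) price
        value = values[won].sum()
        # Higher alpha wins more queries but can violate the ROI constraint.
        if (spend == 0 or value >= target_roi * spend) and value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value

rng = np.random.default_rng(7)
values = rng.lognormal(mean=0.0, sigma=0.5, size=1000)   # advertiser's predicted values
prices = rng.lognormal(mean=0.0, sigma=0.5, size=1000)   # highest competing bids
alpha, total_value = best_pacing_multiplier(values, prices, target_roi=1.2)
print(f"chosen multiplier: {alpha:.2f}, value won: {total_value:.1f}")
```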
We develop and analyze a general technique for learning with an unknown distribution drift. Given a sequence of independent observations from the last $T$ steps of a drifting distribution, our algorithm agnostically learns a family of functions with respect to the current distribution at time $T$. Unlike previous work, our technique does not require prior knowledge about the magnitude of the drift. Instead, the algorithm adapts to the sample data. Without explicitly estimating the drift, the algorithm learns a family of functions with almost the same error as a learning algorithm that knows the magnitude of the drift in advance. Furthermore, since our algorithm adapts to the data, it can guarantee a better learning error than an algorithm that relies on loose bounds on the drift.
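A toy sketch of the adapt-to-data idea for the simplest possible statistic is given below: to estimate the current mean of a drifting stream, the look-back window is doubled only while the estimates over nested windows agree within a heuristic sub-Gaussian deviation width, so no prior bound on the drift is needed. This is an illustration of the general principle, not the paper's algorithm or its guarantees.

```python
import numpy as np

def adaptive_window_mean(x, delta=0.05):
    """Estimate the mean of the most recent distribution from x[-1], x[-2], ...
    The window doubles only while the new estimate stays within a heuristic
    sub-Gaussian deviation width of the previous one (no drift bound required)."""
    x = np.asarray(x)[::-1]                 # most recent observation first
    r, est = 1, float(x[0])
    while 2 * r <= len(x):
        wider = x[:2 * r].mean()
        width = np.sqrt(2 * np.log(2 / delta) / r) + np.sqrt(2 * np.log(2 / delta) / (2 * r))
        if abs(wider - est) > width:        # disagreement: drift dominates, stop growing
            break
        r, est = 2 * r, wider
    return est, r

# Toy drifting stream: the mean moves linearly from 0 to 2 with unit-variance noise.
rng = np.random.default_rng(3)
T = 2000
stream = np.linspace(0.0, 2.0, T) + rng.normal(size=T)
est, window = adaptive_window_mean(stream)
print(f"estimate of the current mean: {est:.2f} using the last {window} observations")
```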
This article introduces new multiplicative updates for nonnegative matrix factorization with the $\beta$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation. Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach leverages a reparametrization of the original problem into the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they can be applied to any $\beta$-divergence (i.e., any value of $\beta$) and that they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods using various datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods obtain solutions of similar quality at convergence (similar objective values) but with significantly reduced CPU times.
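For reference, a sketch of the textbook baseline (not the paper's scale-invariant algorithm) is shown below: heuristic multiplicative updates for $\beta$-divergence NMF with an $\ell_1$ penalty on the activations, followed by the standard rescaling that forces unit-norm dictionary columns. This is the kind of heuristic the proposed methods are compared against; hyperparameters and data are illustrative.

```python
import numpy as np

def beta_nmf_l1(V, K, beta=1.0, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Heuristic multiplicative updates for min D_beta(V | W H) + lam * ||H||_1,
    with the columns of W renormalized to unit l2 norm after each sweep."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = np.abs(rng.normal(size=(F, K))) + eps
    H = np.abs(rng.normal(size=(K, N))) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + lam)
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T)
        # Heuristic rescaling: unit-norm dictionary columns, scale moved into H.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy nonnegative data (e.g., a small power "spectrogram").
rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(40, 100))) ** 2
W, H = beta_nmf_l1(V, K=5, beta=1.0, lam=0.1)
print("approximation error:", float(np.linalg.norm(V - W @ H)))
```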
In many fields, including environmental epidemiology, researchers strive to understand the joint impact of a mixture of exposures. This involves analyzing a vector of exposures rather than a single exposure, where the most significant exposure sets are unknown. Examining every possible interaction or effect modification in a high-dimensional vector of candidates can be challenging or even impossible. To address this challenge, we propose a method for the automatic identification and estimation of exposure sets in a mixture with explanatory power, baseline covariates that modify the impact of an exposure, and sets of exposures with synergistic, non-additive relationships. We define these parameters in a realistic nonparametric statistical model and use machine learning methods to identify variable sets and to estimate nuisance parameters, thereby avoiding model misspecification. We apply a prespecified target parameter to the identified variable sets and use cross-validation to train efficient estimators based on targeted maximum likelihood estimation. Our approach applies a shift intervention targeting individual variable importance, interaction, and effect modification based on the data-adaptively determined sets of variables. Our methodology is implemented in the open-source SuperNOVA package in R. We demonstrate the utility of our method through simulations, showing that our estimator is efficient and asymptotically linear under conditions requiring fast convergence of certain regression functions. We apply our method to the National Institute of Environmental Health Sciences mixtures workshop data, correctly identifying the antagonistic and agonistic interactions built into the data. Additionally, we investigate the association between exposure to persistent organic pollutants and longer leukocyte telomere length.
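As a highly simplified illustration of a stochastic shift intervention target parameter, the sketch below estimates $E[Y(A+\delta)] - E[Y(A)]$ with a plug-in (g-computation) substitution estimator built on a generic regression fit. It does not reproduce the SuperNOVA methodology (data-adaptive variable-set discovery, cross-validation, or targeted maximum likelihood estimation), and all names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: outcome depends on an exposure A and baseline covariates W.
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 3))
A = 0.5 * W[:, 0] + rng.normal(size=n)
Y = np.sin(A) + 0.3 * W[:, 1] + 0.1 * rng.normal(size=n)

def shift_effect_plugin(A, W, Y, delta):
    """Plug-in estimate of E[Y(A + delta)] - E[Y(A)]: fit E[Y | A, W], then
    average predictions with the observed exposure shifted by delta."""
    X = np.column_stack([A, W])
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, Y)
    X_shift = np.column_stack([A + delta, W])
    return model.predict(X_shift).mean() - model.predict(X).mean()

print("estimated effect of shifting the exposure by +0.5:",
      round(shift_effect_plugin(A, W, Y, delta=0.5), 3))
```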
Mechanistic models are important tools for describing and understanding biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. We present pyPESTO, a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. Besides its own implementations, it provides a unified interface to various popular simulation and inference methods. pyPESTO is implemented in Python and is open source under the 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).
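A minimal usage sketch following the objective/problem/multi-start pattern described in the pyPESTO documentation is shown below, using the Rosenbrock function as a black-box objective; exact class and function signatures may differ between package versions, so this should be read as illustrative rather than as the definitive API.

```python
import numpy as np
import scipy.optimize as so
import pypesto
import pypesto.optimize as optimize

# Black-box objective: the Rosenbrock function and its gradient from SciPy.
objective = pypesto.Objective(fun=so.rosen, grad=so.rosen_der)

# Box-constrained estimation problem in two dimensions.
problem = pypesto.Problem(objective=objective,
                          lb=-5 * np.ones(2), ub=5 * np.ones(2))

# Multi-start local optimization; the result object collects all optimizer runs.
result = optimize.minimize(problem=problem, n_starts=10)
print(result.optimize_result)
```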
Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While much progress has been made for deterministic interacting systems, modeling is far more challenging for stochastic systems, where one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow, since they rely on Monte Carlo sampling, or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
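To make the moment-matching idea concrete, the following minimal sketch propagates a Gaussian mixture through a linear-Gaussian transition $x' = Ax + b + \epsilon$, where each component's predictive moments are available in closed form and no sampling is required. The paper's model replaces the linear map with GNN dynamics and uses structured covariance approximations, none of which is reproduced here.

```python
import numpy as np

def propagate_gmm_linear(weights, means, covs, A, b, Q):
    """Closed-form moment propagation of each Gaussian mixture component through
    the linear-Gaussian transition x' = A x + b + eps, eps ~ N(0, Q).
    Component weights are unchanged; each mean/covariance is mapped exactly."""
    new_means = [A @ m + b for m in means]
    new_covs = [A @ S @ A.T + Q for S in covs]
    return weights, new_means, new_covs

def mixture_moments(weights, means, covs):
    """Overall mean and covariance of the mixture (law of total covariance)."""
    mu = sum(w * m for w, m in zip(weights, means))
    cov = sum(w * (S + np.outer(m - mu, m - mu)) for w, m, S in zip(weights, means, covs))
    return mu, cov

# Two-component mixture in 2D, propagated one step deterministically (no sampling).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [0.1 * np.eye(2), 0.2 * np.eye(2)]
A, b, Q = np.array([[1.0, 0.1], [0.0, 1.0]]), np.zeros(2), 0.05 * np.eye(2)
w, m, S = propagate_gmm_linear(weights, means, covs, A, b, Q)
print("predictive mixture mean:", mixture_moments(w, m, S)[0])
```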
Current deep learning research is dominated by benchmark evaluation: a method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurging area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to iterative parameter updates. However, the comparison of individual methods is treated in isolation from real-world application and is typically judged by monitoring accumulated test set performance. The closed-world assumption remains predominant: it is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge, as neural networks are well known to provide overconfident false predictions on unknown instances and to break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but also highlights their natural synergies in a common framework. We empirically demonstrate improvements in alleviating catastrophic forgetting, querying data in active learning, and selecting task orders, while exhibiting robust open-world application where previously proposed methods fail.
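A toy illustration of the shared mechanism alluded to above is given below: a single predictive-uncertainty score (here, the entropy of model-predicted class probabilities) is used both to reject statistically deviating inputs at deployment time and to rank unlabeled data for active-learning queries. This is purely illustrative of the connection between the two paradigms, not the consolidated framework proposed in the paper; the probabilities and threshold calibration are hypothetical.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy of per-example class probabilities, shape (n_examples, n_classes)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def open_set_reject(probs, threshold):
    """Deployment-time rejection: flag inputs whose predictive entropy exceeds
    a threshold calibrated on known (in-distribution) data."""
    return predictive_entropy(probs) > threshold

def active_query(probs, budget):
    """Active learning: query the `budget` unlabeled examples the model is most
    uncertain about (highest predictive entropy)."""
    return np.argsort(-predictive_entropy(probs))[:budget]

# Toy probabilities: confident in-distribution rows vs. near-uniform "unknown" rows.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33]])
threshold = np.quantile(predictive_entropy(probs[:2]), 0.95)  # calibrated on known data
print("rejected as unknown:", np.where(open_set_reject(probs, threshold))[0])
print("query order for labels:", active_query(probs, budget=2))
```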