The problem of distributed optimization requires a group of networked agents to compute a parameter that minimizes the average of their local cost functions. While there are a variety of distributed optimization algorithms that can solve this problem, they are typically vulnerable to "Byzantine" agents that do not follow the algorithm. Recent attempts to address this issue focus on single dimensional functions, or assume certain statistical properties of the functions at the agents. In this paper, we provide two resilient, scalable, distributed optimization algorithms for multi-dimensional functions. Our schemes involve two filters, (1) a distance-based filter and (2) a min-max filter, which each remove neighborhood states that are extreme (defined precisely in our algorithms) at each iteration. We show that these algorithms can mitigate the impact of up to F (unknown) Byzantine agents in the neighborhood of each regular agent. In particular, we show that if the network topology satisfies certain conditions, all of the regular agents' states are guaranteed to converge to a bounded region that contains the minimizer of the average of the regular agents' functions.
Causal discovery from observational and interventional data is challenging due to limited data and non-identifiability: factors that introduce uncertainty in estimating the underlying structural causal model (SCM). Selecting experiments (interventions) based on the uncertainty arising from both factors can expedite the identification of the SCM. Existing methods in experimental design for causal discovery from limited data either rely on linear assumptions for the SCM or select only the intervention target. This work incorporates recent advances in Bayesian causal discovery into the Bayesian optimal experimental design framework, allowing for active causal discovery of large, nonlinear SCMs while selecting both the interventional target and the value. We demonstrate the performance of the proposed method on synthetic graphs (Erdos-R\`enyi, Scale Free) for both linear and nonlinear SCMs as well as on the \emph{in-silico} single-cell gene regulatory network dataset, DREAM.
Aiming at ontology-based data access to temporal data, we design two-dimensional temporal ontology and query languages by combining logics from the (extended) DL-Lite family with linear temporal logic LTL over discrete time (Z,<). Our main concern is first-order rewritability of ontology-mediated queries (OMQs) that consist of a 2D ontology and a positive temporal instance query. Our target languages for FO-rewritings are two-sorted FO(<) - first-order logic with sorts for time instants ordered by the built-in precedence relation < and for the domain of individuals - its extension FOE with the standard congruence predicates t \equiv 0 mod n, for any fixed n > 1, and FO(RPR) that admits relational primitive recursion. In terms of circuit complexity, FOE- and FO(RPR)-rewritability guarantee answering OMQs in uniform AC0 and NC1, respectively. We proceed in three steps. First, we define a hierarchy of 2D DL-Lite/LTL ontology languages and investigate the FO-rewritability of OMQs with atomic queries by constructing projections onto 1D LTL OMQs and employing recent results on the FO-rewritability of propositional LTL OMQs. As the projections involve deciding consistency of ontologies and data, we also consider the consistency problem for our languages. While the undecidability of consistency for 2D ontology languages with expressive Boolean role inclusions might be expected, we also show that, rather surprisingly, the restriction to Krom and Horn role inclusions leads to decidability (and ExpSpace-completeness), even if one admits full Booleans on concepts. As a final step, we lift some of the rewritability results for atomic OMQs to OMQs with expressive positive temporal instance queries. The lifting results are based on an in-depth study of the canonical models and only concern Horn ontologies.
Modified Patankar-Runge-Kutta (MPRK) methods preserve the positivity as well as conservativity of a production-destruction system (PDS) of ordinary differential equations for all time step sizes. As a result, higher order MPRK schemes do not belong to the class of general linear methods, i.e. the iterates are generated by a nonlinear map $\mathbf g$ even when the PDS is linear. Moreover, due to the conservativity of the method, the map $\mathbf g$ possesses non-hyperbolic fixed points. Recently, a new theorem for the investigation of stability properties of non-hyperbolic fixed points of a nonlinear iteration map was developed. We apply this theorem to understand the stability properties of a family of second order MPRK methods when applied to a nonlinear PDS of ordinary differential equations. It is shown that the fixed points are stable for all time step sizes and members of the MPRK family. Finally, experiments are presented to numerically support the theoretical claims.
Derivatives are a key nonparametric functional in wide-ranging applications where the rate of change of an unknown function is of interest. In the Bayesian paradigm, Gaussian processes (GPs) are routinely used as a flexible prior for unknown functions, and are arguably one of the most popular tools in many areas. However, little is known about the optimal modelling strategy and theoretical properties when using GPs for derivatives. In this article, we study a plug-in strategy by differentiating the posterior distribution with GP priors for derivatives of any order. This practically appealing plug-in GP method has been previously perceived as suboptimal and degraded, but this is not necessarily the case. We provide posterior contraction rates for plug-in GPs and establish that they remarkably adapt to derivative orders. We show that the posterior measure of the regression function and its derivatives, with the same choice of hyperparameter that does not depend on the order of derivatives, converges at the minimax optimal rate up to a logarithmic factor for functions in certain classes. This to the best of our knowledge provides the first positive result for plug-in GPs in the context of inferring derivative functionals, and leads to a practically simple nonparametric Bayesian method with guided hyperparameter tuning for simultaneously estimating the regression function and its derivatives. Simulations show competitive finite sample performance of the plug-in GP method. A climate change application on analyzing the global sea-level rise is discussed.
Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods provide too few tools to provide insight into the exact cause. In this paper, we show how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains similar performance to SAC, while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward influence metric, which measures each reward component's effect on agent decision-making. Using these tools, we provide several demonstrations of decomposition's use in identifying and addressing problems in the design of both environments and agents. Value decomposition is broadly applicable and easy to incorporate into existing algorithms and workflows, making it a powerful tool in an RL practitioner's toolbox.
We consider inverse problems in Hilbert spaces under correlated Gaussian noise and use a Bayesian approach to find their regularised solution. We focus on mildly ill-posed inverse problems with the noise being generalised derivative of fractional Brownian motion, using a novel wavelet - based approach we call vaguelette-vaguelette. It allows us to apply sequence space methods without assuming that all operators are simultaneously diagonalisable. The results are proved for more general bases and covariance operators. Our primary aim is to study the posterior contraction rate in such inverse problems over Sobolev classes of true functions, comparing it to the derived minimax rate. Secondly, we study the effect of plugging in a consistent estimator of variances in sequence space on the posterior contraction rate, for instance where there are repeated observations. This result is also applied to the problem where the forward operator is observed with error. Thirdly, we show that an adaptive empirical Bayes posterior distribution contracts at the optimal rate, in the minimax sense, under a condition on prior smoothness, with a plugged in maximum marginal likelihood estimator of the prior scale. These theoretical results are illustrated on simulated data.
The mean field variational inference (MFVI) formulation restricts the general Bayesian inference problem to the subspace of product measures. We present a framework to analyze MFVI algorithms, which is inspired by a similar development for general variational Bayesian formulations. Our approach enables the MFVI problem to be represented in three different manners: a gradient flow on Wasserstein space, a system of Fokker-Planck-like equations and a diffusion process. Rigorous guarantees are established to show that a time-discretized implementation of the coordinate ascent variational inference algorithm in the product Wasserstein space of measures yields a gradient flow in the limit. A similar result is obtained for their associated densities, with the limit being given by a quasi-linear partial differential equation. A popular class of practical algorithms falls in this framework, which provides tools to establish convergence. We hope this framework could be used to guarantee convergence of algorithms in a variety of approaches, old and new, to solve variational inference problems.
Two-stage randomized experiments are becoming an increasingly popular experimental design for causal inference when the outcome of one unit may be affected by the treatment assignments of other units in the same cluster. In this paper, we provide a methodological framework for general tools of statistical inference and power analysis for two-stage randomized experiments. Under the randomization-based framework, we consider the estimation of a new direct effect of interest as well as the average direct and spillover effects studied in the literature. We provide unbiased estimators of these causal quantities and their conservative variance estimators in a general setting. Using these results, we then develop hypothesis testing procedures and derive sample size formulas. We theoretically compare the two-stage randomized design with the completely randomized and cluster randomized designs, which represent two limiting designs. Finally, we conduct simulation studies to evaluate the empirical performance of our sample size formulas. For empirical illustration, the proposed methodology is applied to the randomized evaluation of the Indian national health insurance program. An open-source software package is available for implementing the proposed methodology.
In this article, we present a new Nested Cross Approximation (NCA) for $\mathcal{H}^{2}$ matrices. It differs from the existing NCAs in the technique of choosing pivots, a key part of the approximation. Our technique of choosing pivots is purely algebraic and involves only a single tree traversal. We demonstrate its applicability by developing a fast $\mathcal{H}^{2}$ matrix-vector product, that uses the new NCA for the appropriate low-rank approximations. We perform various numerical experiments to illustrate the timing profiles and the accuracy of our method. We also provide a comparison of the proposed NCA with the existing NCAs. A key observation is that the proposed NCA performs better than the existing NCAs. In the spirit of reproducible computational science, the implementation of the algorithm developed in this article is made available at //github.com/vaishna77/nNCA2D.
Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.