Conventional methods for extreme event estimation rely on well-chosen parametric models asymptotically justified from extreme value theory (EVT). These methods, while powerful and theoretically grounded, could however encounter a difficult bias-variance tradeoff that exacerbates especially when data size is too small, deteriorating the reliability of the tail estimation. In this paper, we study a framework based on the recently surging literature of distributionally robust optimization. This approach can be viewed as a nonparametric alternative to conventional EVT, by imposing general shape belief on the tail instead of parametric assumption and using worst-case optimization as a resolution to handle the nonparametric uncertainty. We explain how this approach bypasses the bias-variance tradeoff in EVT. On the other hand, we face a conservativeness-variance tradeoff which we describe how to tackle. We also demonstrate computational tools for the involved optimization problems and compare our performance with conventional EVT across a range of numerical examples.
This paper is dedicated to achieving scalable relative state estimation using inter-robot Euclidean distance measurements. We consider equipping robots with distance sensors and focus on the optimization problem underlying relative state estimation in this setup. We reveal the commonality between this problem and the coordinates realization problem of a sensor network. Based on this insight, we propose an effective unconstrained optimization model to infer the relative states among robots. To work on this model in a distributed manner, we propose an efficient and scalable optimization algorithm with the classical block coordinate descent method as its backbone. This algorithm exactly solves each block update subproblem with a closed-form solution while ensuring convergence. Our results pave the way for distance measurements-based relative state estimation in large-scale multi-robot systems.
We present a methodology to automatically compute worst-case performance bounds for a large class of first-order decentralized optimization algorithms. These algorithms aim at minimizing the average of local functions that are distributed across a network of agents. They typically combine local computations and consensus steps. Our methodology is based on the approach of Performance Estimation Problem (PEP), which allows computing the worst-case performance and a worst-case instance of first-order optimization algorithms by solving an SDP. We propose two ways of representing consensus steps in PEPs, which allow writing and solving PEPs for decentralized optimization. The first formulation is exact but specific to a given averaging matrix. The second formulation is a relaxation but provides guarantees valid over an entire class of averaging matrices, characterized by their spectral range. This formulation often allows recovering a posteriori the worst possible averaging matrix for the given algorithm. We apply our methodology to three different decentralized methods. For each of them, we obtain numerically tight worst-case performance bounds that significantly improve on the existing ones, as well as insights about the parameters tuning and the worst communication networks.
We introduce a new algorithm and software for solving linear equations in symmetric diagonally dominant matrices with non-positive off-diagonal entries (SDDM matrices), including Laplacian matrices. We use pre-conditioned conjugate gradient (PCG) to solve the system of linear equations. Our preconditioner is a variant of the Approximate Cholesky factorization of Kyng and Sachdeva (FOCS 2016). Our factorization approach is simple: we eliminate matrix rows/columns one at a time and update the remaining matrix using sampling to approximate the outcome of complete Cholesky factorization. Unlike earlier approaches, our sampling always maintains a connectivity in the remaining non-zero structure. Our algorithm comes with a tuning parameter that upper bounds the number of samples made per original entry. We implement our algorithm in Julia, providing two versions, AC and AC2, that respectively use 1 and 2 samples per original entry. We compare their single-threaded performance to that of current state-of-the-art solvers Combinatorial Multigrid (CMG), BoomerAMG-preconditioned Krylov solvers from HyPre and PETSc, Lean Algebraic Multigrid (LAMG), and MATLAB's with Incomplete Cholesky Factorization (ICC). Our evaluation uses a broad class of problems, including all large SDDM matrices from the SuiteSparse collection and diverse programmatically generated instances. Our experiments suggest that our algorithm attains a level of robustness and reliability not seen before in SDDM solvers, while retaining good performance across all instances. Our code and data are public, and we provide a tutorial on how to replicate our tests. We hope that others will adopt this suite of tests as a benchmark, which we refer to as SDDM2023. Our solver code is available at: //github.com/danspielman/Laplacians.jl/ Our benchmarking data and tutorial are available at: //rjkyng.github.io/SDDM2023/
Data-driven state estimation (SE) is becoming increasingly important in modern power systems, as it allows for more efficient analysis of system behaviour using real-time measurement data. This paper thoroughly evaluates a PMU-only state estimator based on graph neural networks (GNNs) applied over factor graphs. To assess the sample efficiency of the GNN model, we perform multiple training experiments on various training set sizes. Additionally, to evaluate the scalability of the GNN model, we conduct experiments on power systems of various sizes. Our results show that the GNN-based state estimator exhibits high accuracy and efficient use of data. Additionally, it demonstrated scalability in terms of both memory usage and inference time, making it a promising solution for data-driven SE in modern power systems.
We introduce the Weak-form Estimation of Nonlinear Dynamics (WENDy) method for estimating model parameters for non-linear systems of ODEs. The core mathematical idea involves an efficient conversion of the strong form representation of a model to its weak form, and then solving a regression problem to perform parameter inference. The core statistical idea rests on the Errors-In-Variables framework, which necessitates the use of the iteratively reweighted least squares algorithm. Further improvements are obtained by using orthonormal test functions, created from a set of $C^{\infty}$ bump functions of varying support sizes. We demonstrate that WENDy is a highly robust and efficient method for parameter inference in differential equations. Without relying on any numerical differential equation solvers, WENDy computes accurate estimates and is robust to large (biologically relevant) levels of measurement noise. For low dimensional systems with modest amounts of data, WENDy is competitive with conventional forward solver-based nonlinear least squares methods in terms of speed and accuracy. For both higher dimensional systems and stiff systems, WENDy is typically both faster (often by orders of magnitude) and more accurate than forward solver-based approaches. We illustrate the method and its performance in some common population and neuroscience models, including logistic growth, Lotka-Volterra, FitzHugh-Nagumo, Hindmarsh-Rose, and a Protein Transduction Benchmark model. Software and code for reproducing the examples is available at (//github.com/MathBioCU/WENDy).
Federated learning is one of the most appealing alternatives to the standard centralized learning paradigm, allowing a heterogeneous set of devices to train a machine learning model without sharing their raw data. However, it requires a central server to coordinate the learning process, thus introducing potential scalability and security issues. In the literature, server-less federated learning approaches like gossip federated learning and blockchain-enabled federated learning have been proposed to mitigate these issues. In this work, we propose a complete overview of these three techniques proposing a comparison according to an integral set of performance indicators, including model accuracy, time complexity, communication overhead, convergence time, and energy consumption. An extensive simulation campaign permits to draw a quantitative analysis considering both feedforward and convolutional neural network models. Results show that gossip federated learning and standard federated solution are able to reach a similar level of accuracy, and their energy consumption is influenced by the machine learning model adopted, the software library, and the hardware used. Differently, blockchain-enabled federated learning represents a viable solution for implementing decentralized learning with a higher level of security, at the cost of an extra energy usage and data sharing. Finally, we identify open issues on the two decentralized federated learning implementations and provide insights on potential extensions and possible research directions in this new research field.
In Statistics, log-concave density estimation is a central problem within the field of nonparametric inference under shape constraints. Despite great progress in recent years on the statistical theory of the canonical estimator, namely the log-concave maximum likelihood estimator, adoption of this method has been hampered by the complexities of the non-smooth convex optimization problem that underpins its computation. We provide enhanced understanding of the structural properties of this optimization problem, which motivates the proposal of new algorithms, based on both randomized and Nesterov smoothing, combined with an appropriate integral discretization of increasing accuracy. We prove that these methods enjoy, both with high probability and in expectation, a convergence rate of order $1/T$ up to logarithmic factors on the objective function scale, where $T$ denotes the number of iterations. The benefits of our new computational framework are demonstrated on both synthetic and real data, and our implementation is available in a github repository \texttt{LogConcComp} (Log-Concave Computation).
In recent years, empirical Bayesian (EB) inference has become an attractive approach for estimation in parametric models arising in a variety of real-life problems, especially in complex and high-dimensional scientific applications. However, compared to the relative abundance of available general methods for computing point estimators in the EB framework, the construction of confidence sets and hypothesis tests with good theoretical properties remains difficult and problem specific. Motivated by the universal inference framework of Wasserman et al. (2020), we propose a general and universal method, based on holdout likelihood ratios, and utilizing the hierarchical structure of the specified Bayesian model for constructing confidence sets and hypothesis tests that are finite sample valid. We illustrate our method through a range of numerical studies and real data applications, which demonstrate that the approach is able to generate useful and meaningful inferential statements in the relevant contexts.
With apparently all research on estimation-of-distribution algorithms (EDAs) concentrated on pseudo-Boolean optimization and permutation problems, we undertake the first steps towards using EDAs for problems in which the decision variables can take more than two values, but which are not permutation problems. To this aim, we propose a natural way to extend the known univariate EDAs to such variables. Different from a naive reduction to the binary case, it avoids additional constraints. Since understanding genetic drift is crucial for an optimal parameter choice, we extend the known quantitative analysis of genetic drift to EDAs for multi-valued variables. Roughly speaking, when the variables take $r$ different values, the time for genetic drift to become significant is $r$ times shorter than in the binary case. Consequently, the update strength of the probabilistic model has to be chosen $r$ times lower now. To investigate how desired model updates take place in this framework, we undertake a mathematical runtime analysis on the $r$-valued LeadingOnes problem. We prove that with the right parameters, the multi-valued UMDA solves this problem efficiently in $O(r\log(r)^2 n^2 \log(n))$ function evaluations. Overall, our work shows that EDAs can be adjusted to multi-valued problems, and it gives advice on how to set the main parameters.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.