The evaluation of noisy binary classifiers on unlabeled data is treated as a streaming task: given a data sketch of the decisions by an ensemble, estimate the true prevalence of the labels as well as each classifier's accuracy on them. Two fully algebraic evaluators are constructed to do this. Both are based on the assumption that the classifiers make independent errors. The first is based on majority voting. The second, the main contribution of the paper, is guaranteed to return the correct estimates whenever the independence assumption actually holds. But how do we know the classifiers are independent on any given test? This principal/agent monitoring paradox is ameliorated by exploiting the failures of the independent evaluator to return sensible estimates. A search for nearly error-independent trios is carried out empirically on the \texttt{adult}, \texttt{mushroom}, and \texttt{two-norm} datasets by using the algebraic failure modes to reject evaluation ensembles as too correlated. The searches are refined by constructing a surface in evaluation space that contains the true value point. The algebra of arbitrarily correlated classifiers permits the selection of a polynomial subset that is free of any correlation variables. Candidate evaluation ensembles are rejected if their data sketches produce independent-evaluator estimates too far from the constructed surface. The estimates produced by the surviving ensembles can sometimes be accurate to within 1\%. But handling even small amounts of correlation remains a challenge. A Taylor expansion of the estimates produced when independence is assumed but the classifiers are, in fact, slightly correlated helps clarify how the independent evaluator has algebraic `blind spots'.
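As a concrete illustration of the first, majority-voting evaluator only, the sketch below treats the majority vote of a trio as a proxy label and reads off prevalence and accuracy estimates. It is ours, works from the raw decisions rather than a compressed data sketch, and the function name and toy votes are hypothetical.

\begin{verbatim}
import numpy as np

def majority_vote_evaluation(decisions):
    """Estimate label prevalence and per-classifier accuracies for a trio of
    binary classifiers by treating their majority vote as a proxy ground truth.
    decisions: (n_items, 3) array of 0/1 votes."""
    decisions = np.asarray(decisions)
    majority = (decisions.sum(axis=1) >= 2).astype(int)    # majority label per item
    prevalence = majority.mean()                           # estimated P(label = 1)
    accuracies = (decisions == majority[:, None]).mean(axis=0)
    return prevalence, accuracies

# Toy usage: three noisy classifiers voting on six items.
votes = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 1],
                  [0, 1, 0], [1, 0, 1], [0, 0, 1]])
print(majority_vote_evaluation(votes))
\end{verbatim}

The fully algebraic independent evaluator described in the abstract goes further than this proxy-label shortcut by solving the polynomial relations implied by error independence, which is also what produces the failure modes exploited later.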
We develop the no-propagate algorithm for sampling the linear response of random dynamical systems, that is, non-uniformly hyperbolic deterministic systems perturbed by noise with a smooth density. We first derive a Monte Carlo-type formula and then the algorithm, which differs from ensemble (stochastic gradient) algorithms, finite-element algorithms, and fast-response algorithms: it does not involve propagating vectors or covectors, and only the density of the noise is differentiated, so the formula does not suffer from gradient explosion, the curse of dimensionality, or non-hyperbolicity. We demonstrate our algorithm on a tent map perturbed by noise and on a chaotic neural network with 51 layers $\times$ 9 neurons. By itself, this algorithm approximates the linear response of non-hyperbolic deterministic systems with an additional error proportional to the noise. We also discuss the potential of using this algorithm as part of a larger algorithm with smaller error.
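To make the ``only the noise density is differentiated'' point concrete, here is a minimal likelihood-ratio (score-function) style Monte Carlo sketch for a noisy tent map, written by us under the assumption of additive Gaussian noise. It is not the paper's no-propagate estimator (in particular, it makes no use of decay of correlations and its variance grows with the horizon), but it shares the key feature that no vectors or covectors are propagated through the chaotic map.

\begin{verbatim}
import numpy as np

def tent(x, gamma):
    """Tent map of height gamma; iterates are clipped to [0, 1] so the noisy
    trajectory stays bounded (a modelling choice for this sketch)."""
    x = np.clip(x, 0.0, 1.0)
    return gamma * np.minimum(x, 1.0 - x)

def dtent_dgamma(x):
    """Derivative of the tent map with respect to the parameter gamma."""
    x = np.clip(x, 0.0, 1.0)
    return np.minimum(x, 1.0 - x)

def score_function_response(gamma, sigma=0.1, n_steps=8, n_samples=1_000_000, seed=0):
    """Estimate d/dgamma E[x_N] for x_{n+1} = f_gamma(x_n) + xi_n, xi_n ~ N(0, sigma^2),
    via E[x_N * sum_n xi_n * (df/dgamma)(x_n) / sigma^2]: only the Gaussian noise
    density is differentiated, never the chaotic map itself."""
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, 0.3)          # fixed initial condition
    score = np.zeros(n_samples)          # d/dgamma of the log-density of the path
    for _ in range(n_steps):
        xi = sigma * rng.standard_normal(n_samples)
        score += xi * dtent_dgamma(x) / sigma**2
        x = tent(x, gamma) + xi
    return np.mean(x * score)

def mean_final_state(gamma, sigma=0.1, n_steps=8, n_samples=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, 0.3)
    for _ in range(n_steps):
        x = tent(x, gamma) + sigma * rng.standard_normal(n_samples)
    return x.mean()

# Both lines are Monte Carlo estimates of the same response and should roughly agree.
gamma, eps = 1.8, 0.05
print("score-function estimate :", score_function_response(gamma))
print("finite-difference check :",
      (mean_final_state(gamma + eps) - mean_final_state(gamma - eps)) / (2 * eps))
\end{verbatim}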
An increasingly common viewpoint is that protein dynamics data sets reside in a non-linear subspace of low conformational energy. Ideal data analysis tools for such data sets should therefore account for this non-linear geometry. The Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich structure that can account for a wide range of geometries modelled after an energy landscape. Second, many standard data analysis tools initially developed for data in Euclidean space can be generalised to data on a Riemannian manifold. In the context of protein dynamics, a conceptual challenge comes from the lack of a suitable smooth manifold and of guidelines for constructing a smooth Riemannian structure based on an energy landscape. In addition, computational feasibility in computing geodesics and related mappings poses a major challenge. This work addresses these challenges. The first part of the paper develops a novel local approximation technique for computing geodesics and related mappings on Riemannian manifolds in a computationally feasible manner. The second part constructs a smooth manifold of point clouds modulo rigid body group actions, together with a Riemannian structure based on an energy landscape for protein conformations. The resulting Riemannian geometry is tested on several data analysis tasks relevant for protein dynamics data. It performs exceptionally well on data from coarse-grained molecular dynamics simulations. In particular, geodesics with given start- and end-points approximately recover corresponding molecular dynamics trajectories for proteins that undergo relatively ordered transitions with medium-sized deformations. The Riemannian protein geometry also gives physically realistic summary statistics and retrieves the underlying dimension, even for large-sized deformations, within seconds on a laptop.
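The following toy sketch is ours and is far simpler than the paper's construction on point clouds modulo rigid motions; it only illustrates the basic recipe of computing a geodesic numerically on an energy-weighted Riemannian manifold: discretize the path and minimize the Riemannian path energy for a conformal metric $g = e^{E(x)} I$ built from a hypothetical two-dimensional landscape $E$.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def landscape(x):
    """Hypothetical energy landscape: a Gaussian bump in the plane.  The conformal
    metric g = exp(E(x)) * I makes paths through high-energy regions expensive."""
    return 3.0 * np.exp(-np.sum((x - 0.5)**2, axis=-1) / 0.05)

def path_energy(interior_flat, start, end, n_pts):
    """Discrete Riemannian path energy sum_i exp(E(mid_i)) * |x_{i+1} - x_i|^2;
    its minimizers approximate constant-speed geodesics."""
    pts = np.vstack([start, interior_flat.reshape(n_pts - 2, 2), end])
    seg = np.diff(pts, axis=0)
    mid = 0.5 * (pts[:-1] + pts[1:])
    return np.sum(np.exp(landscape(mid)) * np.sum(seg**2, axis=1))

start, end, n_pts = np.array([0.0, 0.5]), np.array([1.0, 0.5]), 30
init = np.linspace(start, end, n_pts)[1:-1]
# A tiny perturbation breaks the symmetry of the straight-line initial guess.
init = (init + 0.01 * np.random.default_rng(0).standard_normal(init.shape)).ravel()
res = minimize(path_energy, init, args=(start, end, n_pts), method="L-BFGS-B")
geodesic = np.vstack([start, res.x.reshape(-1, 2), end])
print(geodesic[n_pts // 2])    # the path detours around the bump at (0.5, 0.5)
\end{verbatim}

The paper's local approximation technique targets precisely the cost of such geodesic computations when the manifold is high-dimensional and the metric comes from a molecular energy model.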
A finite element discretization is developed for the Cai-Hu model, describing the formation of biological networks. The model consists of a nonlinear elliptic equation for the pressure $p$ and a nonlinear reaction-diffusion equation for the conductivity tensor $\mathbb{C}$. The problem requires high resolution due to the presence of multiple scales, the stiffness in all of its components, and the nonlinearities. We propose a low-order finite element discretization in space coupled with a semi-implicit time-advancing scheme. The code is verified with several numerical tests performed with various choices of the parameters involved in the system. In the absence of an exact solution, we apply the Richardson extrapolation technique to estimate the order of the method.
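For readers unfamiliar with the last step, here is a small self-contained sketch (ours, not tied to the Cai-Hu solver) of how Richardson extrapolation estimates an observed order of accuracy from three successively refined approximations when no exact solution is available.

\begin{verbatim}
import numpy as np

def observed_order(u_h, u_h2, u_h4, refinement=2.0):
    """Observed order of accuracy from three successively refined approximations
    of the same quantity (mesh sizes h, h/2, h/4), without the exact solution:
    p ~ log(|u_h - u_{h/2}| / |u_{h/2} - u_{h/4}|) / log(refinement)."""
    return np.log(abs(u_h - u_h2) / abs(u_h2 - u_h4)) / np.log(refinement)

# Toy check: central differences of sin at x = 1 are second-order accurate.
f, x = np.sin, 1.0
approx = [(f(x + h) - f(x - h)) / (2 * h) for h in (0.1, 0.05, 0.025)]
print(observed_order(*approx))   # close to 2
\end{verbatim}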
Making inference with spatial extremal dependence models can be computationally burdensome, since they involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that encode censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators to popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture process models. We also show that a single neural Bayes estimator can be trained for a general censoring level, precluding the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme concentrations of particulate matter 2.5 microns or less in diameter (PM2.5) over the whole of Saudi Arabia.
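The following toy sketch, entirely ours, illustrates only the amortized idea behind a neural Bayes estimator with censoring information fed to the network: simulate (parameter, data) pairs from the prior and model, censor values below a threshold while passing the censoring indicators as inputs, and train under the absolute-error loss so the network approximates the posterior median. The exponential toy model, the DeepSets-style architecture, and all names are stand-ins, not the spatial extremal dependence models treated in the paper.

\begin{verbatim}
import torch
import torch.nn as nn

torch.manual_seed(0)
n_obs, tau = 50, 0.5                 # replicates per data set, censoring threshold

def simulate(n_datasets):
    """theta ~ U(0.5, 2) (prior); each data set holds n_obs Exp(rate=theta) draws.
    Values below tau are censored: the network sees the censored value and a flag."""
    theta = 0.5 + 1.5 * torch.rand(n_datasets, 1)
    y = -torch.log(torch.rand(n_datasets, n_obs)) / theta
    indicator = (y < tau).float()
    y_cens = torch.where(y < tau, torch.full_like(y, tau), y)
    return theta, y_cens, indicator

class NeuralBayesEstimator(nn.Module):
    """DeepSets-style network: a shared encoder per (censored value, flag) pair,
    mean pooling over replicates, then a decoder that outputs the estimate."""
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Softplus())
    def forward(self, y_cens, indicator):
        z = torch.stack([y_cens, indicator], dim=-1)     # (batch, n_obs, 2)
        return self.rho(self.phi(z).mean(dim=1))

net = NeuralBayesEstimator()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):                                 # train on fresh simulations
    theta, y_cens, ind = simulate(256)
    loss = (net(y_cens, ind) - theta).abs().mean()       # L1 loss -> posterior median
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                                    # held-out assessment
    theta, y_cens, ind = simulate(1000)
    print("mean absolute error:", (net(y_cens, ind) - theta).abs().mean().item())
\end{verbatim}

In the same amortized spirit, one natural way to handle a general censoring level is to feed the level itself as an additional network input, so that retraining is not needed when it changes; we present this as a plausible mechanism, not as the paper's exact construction.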
The problem of finding a solution to the linear system $Ax = b$ with certain minimization properties arises in numerous scientific and engineering areas. In the era of big data, stochastic optimization algorithms have become increasingly significant due to their scalability to problems of unprecedented size. This paper focuses on the problem of minimizing a strongly convex function subject to linear constraints. We consider the dual formulation of this problem and adopt stochastic coordinate descent to solve it. The proposed algorithmic framework, called fast stochastic dual coordinate descent, uses sampling matrices drawn from user-defined distributions to extract gradient information. Moreover, it employs Polyak's heavy ball momentum acceleration with adaptive parameters learned through the iterations, overcoming the heavy ball momentum method's need for prior knowledge of certain parameters, such as the singular values of a matrix. With these extensions, the framework recovers many well-known methods in this context, including the randomized sparse Kaczmarz method, the randomized regularized Kaczmarz method, the linearized Bregman iteration, and a variant of the conjugate gradient (CG) method. We prove that, for a strongly admissible objective function, the proposed method converges linearly in expectation. Numerical experiments are provided to confirm our results.
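As one concrete member of the recovered family, the sketch below (ours) runs the randomized Kaczmarz method with a fixed heavy ball momentum term on a consistent system $Ax = b$; the paper's framework goes further by drawing general sampling matrices and learning the momentum parameters adaptively.

\begin{verbatim}
import numpy as np

def kaczmarz_momentum(A, b, n_iters=5000, beta=0.4, seed=0):
    """Randomized Kaczmarz with a fixed heavy ball momentum term for a consistent
    linear system Ax = b.  Rows are sampled with probability proportional to
    their squared norms; beta is kept fixed here for simplicity."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.sum(A**2, axis=1)
    probs = row_norms / row_norms.sum()
    x_prev = x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)
        kaczmarz_step = (A[i] @ x - b[i]) / row_norms[i] * A[i]
        x, x_prev = x - kaczmarz_step + beta * (x - x_prev), x
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
print(np.linalg.norm(kaczmarz_momentum(A, b) - x_true))   # should be near zero
\end{verbatim}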
The approach to analysing compositional data has been dominated by the use of logratio transformations, which ensure exact subcompositional coherence and, in some situations, exact isometry as well. A problem with this approach is that data zeros, found in most applications, have to be replaced to allow the logarithmic transformation. An alternative new approach, called the `chiPower' transformation, which allows data zeros, is to combine the standardization inherent in the chi-square distance of correspondence analysis with the essential elements of the Box-Cox power transformation. The chiPower transformation is justified because it defines between-sample distances that tend to logratio distances for strictly positive data as the power parameter tends to zero, and are then equivalent to transforming to logratios. For data with zeros, a value of the power can be identified that brings the chiPower transformation as close as possible to a logratio transformation, without having to substitute the zeros. Especially for high-dimensional data, this alternative approach can achieve such a high level of coherence and isometry as to be a valid approach to the analysis of compositional data. Furthermore, in a supervised learning context, if the compositional variables serve as predictors of a response in a modelling framework, for example generalized linear models, then the power can be used as a tuning parameter to optimize prediction accuracy through cross-validation. The chiPower-transformed variables have a straightforward interpretation, since each is identified with a single compositional part rather than a ratio.
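The convergence claim rests on the Box-Cox limit $(x^\alpha - 1)/\alpha \to \log x$ as $\alpha \to 0$, which the following tiny sketch (ours; it omits the chi-square standardization part of the chiPower transformation) makes explicit: differences of powered parts approach logratios for strictly positive parts, while zero parts stay finite for any positive power.

\begin{verbatim}
import numpy as np

x = np.array([0.1, 0.3, 0.6])                 # a strictly positive composition
logratio = np.log(x[0] / x[1])
for alpha in (1.0, 0.5, 0.1, 0.01):
    box_cox = (x**alpha - 1.0) / alpha        # Box-Cox power transform
    print(alpha, box_cox[0] - box_cox[1], "vs logratio", logratio)

# A zero part remains finite under a positive power, unlike log(0):
print((0.0**0.5 - 1.0) / 0.5)
\end{verbatim}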
We present the full approximation scheme constraint decomposition (FASCD) multilevel method for solving variational inequalities (VIs). FASCD is a common extension of both the full approximation scheme (FAS) multigrid technique for nonlinear partial differential equations, due to A.~Brandt, and the constraint decomposition (CD) method introduced by X.-C.~Tai for VIs arising in optimization. We extend the CD idea by exploiting the telescoping nature of certain function space subset decompositions arising from multilevel mesh hierarchies. When a reduced-space (active set) Newton method is applied as a smoother, with work proportional to the number of unknowns on a given mesh level, FASCD V-cycles exhibit nearly mesh-independent convergence rates, and full multigrid cycles are optimal solvers. The example problems include differential operators which are symmetric linear, nonsymmetric linear, and nonlinear, in unilateral and bilateral VI problems.
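For orientation, the sketch below (ours) shows the simplest single-level ingredient in this problem class: a projected Gauss-Seidel sweep for a one-dimensional obstacle problem. FASCD replaces such global single-level iterations with reduced-space Newton smoothers inside a multilevel constraint decomposition, which is what delivers the mesh-independent convergence reported above.

\begin{verbatim}
import numpy as np

def projected_gauss_seidel(psi, f, n_sweeps=2000):
    """Projected Gauss-Seidel for the 1D obstacle problem
        -u'' >= f,   u >= psi,   (-u'' - f)(u - psi) = 0   on (0, 1),
    with u(0) = u(1) = 0, discretized by second-order finite differences."""
    n = len(f)                            # number of interior grid points
    h = 1.0 / (n + 1)
    u = np.maximum(np.zeros(n), psi)      # feasible initial iterate
    for _ in range(n_sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            unconstrained = 0.5 * (left + right + h**2 * f[i])
            u[i] = max(unconstrained, psi[i])     # project onto the constraint
    return u

x = np.linspace(0.0, 1.0, 102)[1:-1]      # interior points
psi = 0.25 - 4.0 * (x - 0.5)**2           # parabolic obstacle
u = projected_gauss_seidel(psi, np.zeros_like(x))
print(u.max(), bool(np.all(u >= psi - 1e-12)))   # touches, never dips below, psi
\end{verbatim}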
A Hadamard-Hitchcock decomposition of a multidimensional array expresses it as a Hadamard product of several tensor rank decompositions. Such decompositions can encode probability distributions arising from statistical graphical models associated with complete bipartite graphs with one layer of observed random variables and one layer of hidden ones, usually called restricted Boltzmann machines. We establish generic identifiability of Hadamard-Hitchcock decompositions by exploiting the reshaped Kruskal criterion for tensor rank decompositions. A flexible algorithm for computing a Hadamard-Hitchcock decomposition, leveraging existing algorithms for tensor rank decomposition, is introduced. Numerical experiments illustrate its computational performance and numerical accuracy.
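To fix ideas, the short sketch below (ours; it does not reproduce the paper's decomposition algorithm) builds a Hadamard-Hitchcock-structured tensor as the elementwise product of two tensor rank (CP) decompositions and verifies numerically that the product again admits an explicit CP decomposition, of rank at most the product of the two ranks, with factor matrices given by row-wise Khatri-Rao products.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def cp_tensor(A, B, C):
    """Third-order tensor with CP (tensor rank) factors A, B, C."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def face_split(U, V):
    """Row-wise Khatri-Rao (face-splitting) product: row i is kron(U[i], V[i])."""
    return np.einsum("ip,iq->ipq", U, V).reshape(U.shape[0], -1)

dims, r1, r2 = (4, 5, 6), 2, 3
A1, B1, C1 = (rng.standard_normal((d, r1)) for d in dims)
A2, B2, C2 = (rng.standard_normal((d, r2)) for d in dims)

# A Hadamard-Hitchcock-structured tensor: elementwise product of two CP tensors.
T = cp_tensor(A1, B1, C1) * cp_tensor(A2, B2, C2)

# The product admits a CP decomposition of rank at most r1 * r2 whose factor
# matrices are face-splitting products of the original factors.
T_check = cp_tensor(face_split(A1, A2), face_split(B1, B2), face_split(C1, C2))
print(np.max(np.abs(T - T_check)))   # ~ machine precision
\end{verbatim}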
We study the problem of reconstructing the Faber--Schauder coefficients of a continuous function $f$ from discrete observations of its antiderivative $F$. Our approach starts with formulating this problem through piecewise quadratic spline interpolation. We then provide a closed-form solution and an in-depth error analysis. These results lead to some surprising observations, which also throw new light on the classical topic of quadratic spline interpolation itself: They show that the well-known instabilities of this method can be located exclusively within the final generation of estimated Faber--Schauder coefficients, which suffer from non-locality and strong dependence on the initial value and the given data. By contrast, all other Faber--Schauder coefficients depend only locally on the data, are independent of the initial value, and admit uniform error bounds. We thus conclude that a robust and well-behaved estimator for our problem can be obtained by simply dropping the final-generation coefficients from the estimated Faber--Schauder coefficients.
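For notation, the sketch below (ours) shows a naive baseline rather than the paper's closed-form spline-based estimator: it recovers $f$ on the dyadic grid by central differences of the observed antiderivative $F$ and then applies the standard midpoint formula $\theta_{m,k} = f(t_{\mathrm{mid}}) - \tfrac12\big(f(t_{\mathrm{left}}) + f(t_{\mathrm{right}})\big)$ for the Faber--Schauder coefficients.

\begin{verbatim}
import numpy as np

def faber_schauder_coeffs(f_vals):
    """Faber-Schauder coefficients of a function from its values on the dyadic
    grid t_k = k / 2**n_gen of [0, 1]:
    theta_{m, k} = f(mid) - (f(left) + f(right)) / 2."""
    n_gen = int(np.log2(len(f_vals) - 1))
    coeffs = {}
    for m in range(n_gen):
        step = 2 ** (n_gen - m)                  # grid points per level-m interval
        for k in range(2 ** m):
            left, right = k * step, (k + 1) * step
            mid = (left + right) // 2
            coeffs[(m, k)] = f_vals[mid] - 0.5 * (f_vals[left] + f_vals[right])
    return coeffs

n_gen = 8
t = np.linspace(0.0, 1.0, 2 ** n_gen + 1)
F = 1.0 - np.cos(t)                              # observed antiderivative of f = sin

# Naive recovery of f from F by central differences (one-sided at the endpoints).
h = t[1] - t[0]
f_est = np.empty_like(t)
f_est[1:-1] = (F[2:] - F[:-2]) / (2.0 * h)
f_est[0], f_est[-1] = (F[1] - F[0]) / h, (F[-1] - F[-2]) / h

est, exact = faber_schauder_coeffs(f_est), faber_schauder_coeffs(np.sin(t))
print(max(abs(est[key] - exact[key]) for key in exact))  # max discrepancy vs exact
\end{verbatim}

The paper's analysis explains why any such reconstruction is delicate precisely at the finest generation of coefficients, and why dropping that generation yields a robust estimator.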
Hashing has been widely used in approximate nearest neighbor search for large-scale database retrieval because of its computational and storage efficiency. Deep hashing, which devises convolutional neural network architectures to exploit and extract the semantic features of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated, and I identify three main directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of existing hashing methods, I propose a Shadow Recurrent Hashing (SRH) method as an attempt. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images to be projected close to one another. To this end, I propose a concept: the shadow of the CNN output. During the optimization process, the CNN output and its shadow guide each other so as to approach the optimal solution as closely as possible. Several experiments on the CIFAR-10 dataset show the satisfying performance of SRH.
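As background for the pairwise losses surveyed above, here is a generic deep supervised hashing sketch in PyTorch (ours; it is a plain pairwise baseline and does not implement the proposed shadow mechanism of SRH): a small CNN maps images to continuous codes in $(-1, 1)$, a pairwise loss pulls inner products of same-class codes towards $+1$ and different-class codes towards $-1$, a quantization penalty pushes codes towards $\pm 1$, and binary codes are obtained with the sign function at retrieval time.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_bits = 16

class HashNet(nn.Module):
    """Tiny CNN that maps 3x32x32 images to n_bits continuous codes in (-1, 1);
    binary codes are obtained at retrieval time with sign()."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(32 * 8 * 8, n_bits)
    def forward(self, x):
        return torch.tanh(self.head(self.features(x).flatten(1)))

def pairwise_hash_loss(codes, labels, quant_weight=0.1):
    """Pairwise loss: code inner products should be large for same-label pairs
    and small otherwise, plus a quantization penalty pushing codes to +-1."""
    sim = (labels[:, None] == labels[None, :]).float()       # 1 if same class
    inner = codes @ codes.t() / n_bits                       # scaled to [-1, 1]
    similarity_loss = F.mse_loss(inner, 2 * sim - 1)
    quantization_loss = (codes.abs() - 1).pow(2).mean()
    return similarity_loss + quant_weight * quantization_loss

# One optimization step on a random stand-in batch (CIFAR-10-sized images).
model = HashNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images, labels = torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))
loss = pairwise_hash_loss(model(images), labels)
opt.zero_grad()
loss.backward()
opt.step()
binary_codes = model(images).detach().sign()   # codes for Hamming-distance retrieval
print(loss.item(), binary_codes.shape)
\end{verbatim}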