In recent years, a new class of symmetric-key primitives over GF(p), essential to multi-party computation (MPC) and zero-knowledge (ZK) proof based protocols, has emerged. A number of these constructions show that following design strategies alternative to the classical SPN and Feistel networks leads to more efficient cipher and hash function designs. In view of these efforts, in this work we build an \emph{algebraic framework} that allows the systematic exploration of viable and efficient design strategies for constructing symmetric-key (iterative) permutations over GF(p). We propose a generalized triangular polynomial dynamical system (GTDS) and, based on the GTDS, provide a generic definition of an iterative (keyed) permutation. This GTDS-based definition is able to describe the three most well-known design strategies, namely SPNs, Feistel networks and Lai--Massey, and instantiations of them. Most notably, it also allows for instantiations of novel and efficient (keyed) permutations. As an example, we show that partial-SPN-based permutations and the recently proposed Griffin design, which explicitly follows neither the Feistel nor the SPN strategy, can be described using the generic GTDS-based definition. We show that GTDS-based permutations are able to achieve exponential degree growth, a property desirable for constructing efficient cryptographic permutations. In addition, we prove a general upper bound on the differential uniformity of the GTDS. When this upper bound is small enough, we also prove that a two-round GTDS-based block cipher is $\epsilon$-close to being pairwise independent. Finally, we provide a discrepancy analysis, a technique used to measure the (pseudo-)randomness of a sequence, for the sequences generated by the generic GTDS-based permutation.
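To make the triangular structure concrete, one possible shape of such a system over GF(p) is sketched below; this is an illustrative form, not necessarily the paper's exact definition. The $i$-th branch is updated by a power map in $x_i$ whose coefficients are polynomials in the later branches,
\[
  F_i(x_1,\dots,x_n) \;=\; x_i^{d_i}\, g_i(x_{i+1},\dots,x_n) \;+\; h_i(x_{i+1},\dots,x_n), \quad 1 \le i < n,
  \qquad
  F_n(x_1,\dots,x_n) \;=\; x_n^{d_n},
\]
with $\gcd(d_i, p-1) = 1$ and $g_i$ nowhere vanishing, so that each component, and hence the whole system, is invertible; iterating such a map interleaved with round keys and constants yields an iterative keyed permutation.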
Estimating sample size and statistical power is an essential part of good study design. This R package allows users to conduct power analysis based on Monte Carlo simulations in settings where correlations between predictors must be taken into account. It runs power analyses given a data generative model and an inference model. It can set up a data generative model that preserves dependence structures among variables, given either existing data (continuous, binary, or ordinal) or high-level descriptions of the associations. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples focusing on applications to environmental mixture studies, where predictors tend to be moderately to highly correlated. The package easily interfaces with several existing and newly developed analysis strategies for assessing associations between exposures and health outcomes. However, it is sufficiently general to facilitate power simulations in a wide variety of settings.
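The package itself is written in R; the following language-agnostic sketch only illustrates the underlying Monte Carlo idea, with all function names, effect sizes, and parameter values being illustrative assumptions rather than the package's API: simulate correlated predictors from a data generative model, fit the inference model, and report the rejection rate as the estimated power.

\begin{verbatim}
import numpy as np
from scipy import stats

def simulate_power(n, beta1, rho, n_sims=1000, alpha=0.05, seed=0):
    """Monte Carlo power for the coefficient of x1 in y ~ x1 + x2,
    where the predictors x1 and x2 have correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rejections = 0
    for _ in range(n_sims):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)   # data generative model
        y = beta1 * X[:, 0] + 0.3 * X[:, 1] + rng.standard_normal(n)
        D = np.column_stack([np.ones(n), X])                    # inference model: OLS
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ coef
        dof = n - D.shape[1]
        se = np.sqrt(resid @ resid / dof * np.linalg.inv(D.T @ D)[1, 1])
        pval = 2 * stats.t.sf(abs(coef[1] / se), dof)
        rejections += pval < alpha
    return rejections / n_sims

# Power curve over sample sizes for a fixed effect size and correlation.
for n in (50, 100, 200, 400):
    print(n, simulate_power(n, beta1=0.2, rho=0.7))
\end{verbatim}

Repeating such simulations over a grid of sample sizes and effect sizes directly yields the power curves described above.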
We present a method for generating possible proofs of a query with respect to a given Answer Set Programming (ASP) rule set, using an abductive process in which the space of abducibles is constructed automatically from the input rules alone. Given a (possibly empty) set of user-provided facts, our method infers any additional facts that may be needed for the entailment of a query and then outputs these extra facts, without the user needing to explicitly specify the space of all abducibles. We also present a method to generate a set of directed edges corresponding to the justification graph for the query. Furthermore, through different forms of implicit term substitution, our method can take user-provided facts into account and suitably modify the abductive solutions. Past work on abduction has been based primarily on goal-directed methods; however, these methods can result in solvers that are not truly declarative. Much less work has been done on realizing abduction in a bottom-up solver such as the Clingo ASP solver. We describe novel ASP programs that can be run directly in Clingo to yield the abductive solutions and directed edge sets without needing to modify the underlying solving engine.
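A minimal sketch of how such an encoding can be run through Clingo's Python API is given below; the toy rules, the predicate names (abducible/1, assume/1, goal), and the minimization are illustrative assumptions, not the paper's actual programs. The key idea is a choice rule over automatically generated abducibles together with a constraint that forces the query to be entailed.

\begin{verbatim}
import clingo  # requires the clingo Python package

PROGRAM = """
% toy rule set and query (goal); abducibles would be derived from the rules
q :- assume(p), assume(r).
goal :- q.

abducible(p). abducible(r).
{ assume(F) : abducible(F) }.   % guess a set of abduced facts
:- not goal.                    % the query must be entailed
#minimize { 1,F : assume(F) }.  % prefer minimal abductive solutions
#show assume/1.
"""

ctl = clingo.Control(["0"])      # "0": enumerate all models
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("abductive solution:", m.symbols(shown=True)))
\end{verbatim}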
The Cahn--Hilliard equations are a versatile model for describing the evolution of complex morphologies. In this paper we present a computational pipeline for the numerical solution of a ternary phase-field model describing the nanomorphology of donor--acceptor semiconductor blends used in organic photovoltaic devices. The model consists of two coupled fourth-order partial differential equations that are discretized using a finite element approach. In order to solve the resulting large-scale linear systems efficiently, we propose a preconditioning strategy based on efficient approximations of the Schur complement of a saddle-point system. We show that this approach performs robustly with respect to variations in the discretization parameters. Finally, we outline how the computed morphologies can be used for the computation of charge generation, recombination, and transport in organic solar cells.
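As a generic illustration of the preconditioning idea (a standard sketch; the exact blocks produced by the ternary phase-field discretization differ), a mixed finite element discretization of a Cahn--Hilliard-type system at each time step leads to a saddle-point system
\[
  \begin{pmatrix} A & B^{\top} \\ B & -C \end{pmatrix}
  \begin{pmatrix} u \\ w \end{pmatrix}
  =
  \begin{pmatrix} f \\ g \end{pmatrix},
\]
which can be preconditioned by a block-diagonal or block-triangular matrix built from $A$ and an inexpensive approximation $\widehat{S} \approx C + B A^{-1} B^{\top}$ of the Schur complement; the quality of $\widehat{S}$ governs the robustness with respect to mesh size and time step.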
In this paper, we develop a novel class of linear energy-preserving integrating factor methods for the 2D nonlinear Schr\"odinger equation with wave operator (NLSW), combining the scalar auxiliary variable approach with the integrating factor methods. A second-order scheme is first proposed, which is rigorously proved to be energy-preserving. Using energy methods, we analyze its optimal convergence in the $H^1$ norm without any restrictions on the grid ratio, where a novel technique and an improved induction argument are proposed to overcome the difficulty posed by the unavailability of a priori $L^\infty$ estimates of numerical solutions. Based on the integrating factor Runge--Kutta methods, we extend the proposed scheme to arbitrarily high order; the resulting schemes are also linear and conservative. Numerical experiments are presented to confirm the theoretical analysis and demonstrate the advantages of the proposed methods.
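For context, the scalar auxiliary variable idea can be sketched in generic form (the notation is illustrative, written for a real-valued unknown, and not the paper's exact formulation): with nonlinear energy $E_1(u) = \int F(u)\,dx$ one introduces
\[
  r(t) \;=\; \sqrt{E_1(u(\cdot,t)) + C_0},
  \qquad
  r_t \;=\; \frac{1}{2\sqrt{E_1(u) + C_0}} \int F'(u)\, u_t \, dx,
\]
so that the nonlinearity enters the scheme only through the scalar $r$; discretizing this extended system linearly and combining it with an integrating factor treatment of the stiff linear part is what allows the resulting schemes to be both linear and energy-preserving.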
This thesis investigates the quality of randomly collected data by employing a framework built on information-based complexity, a field related to the numerical analysis of abstract problems. The quality, or power, of gathered information is measured by its radius, which is the uniform error obtainable by the best possible algorithm using it. The main aim is to present progress towards understanding the power of random information for approximation and integration problems.
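In the standard information-based complexity setting (generic notation, assuming a solution operator $S$ on a class $F$ and an information mapping $N$), the radius of the information $N$ is
\[
  r(N) \;=\; \inf_{\varphi}\, \sup_{f \in F} \big\| S(f) - \varphi\big(N(f)\big) \big\|,
\]
i.e. the worst-case error of the best algorithm $\varphi$ that only has access to $N(f)$; random information then corresponds to $N$ being drawn at random, for instance function evaluations at randomly chosen points.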
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
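The gap in question can be read off from the standard decomposition of the log marginal likelihood (generic notation, not specific to the models considered here):
\[
  \log p(\mathcal{D}) \;=\; \underbrace{\mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D},\theta) - \log q(\theta)\big]}_{\mathrm{ELBO}(q)} \;+\; \mathrm{KL}\big(q(\theta)\,\|\,p(\theta\mid\mathcal{D})\big),
\]
so a variational family that cannot represent the invariance-induced structure of $p(\theta\mid\mathcal{D})$, such as a Gaussian mean-field $q$, incurs an irreducible KL term in addition to the usual approximation error.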
Clustering is an important exploratory data analysis technique for grouping objects based on their similarity. The widely used $K$-means clustering method relies on some notion of distance to partition data into a smaller number of groups. In the Euclidean space, centroid-based and distance-based formulations of $K$-means are equivalent. In modern machine learning applications, data often arise as probability distributions, and a natural generalization to handle measure-valued data is to use the optimal transport metric. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. These peculiar behaviors of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based $K$-means approach and its semidefinite program (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP-relaxed Wasserstein $K$-means achieves exact recovery provided the clusters are well separated under the $2$-Wasserstein metric. Our simulation and real data examples also demonstrate that distance-based $K$-means can achieve better classification performance than the standard centroid-based $K$-means for clustering probability distributions and images.
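One standard way to write the distance-based formulation and its SDP relaxation (a generic Peng--Wei-type sketch; the exact constraints used in the paper may differ) is, with $D_{ij} = W_2^2(\mu_i, \mu_j)$ the matrix of pairwise squared $2$-Wasserstein distances,
\[
  \min_{Z \in \mathbb{R}^{n \times n}} \;\langle D, Z \rangle
  \quad \text{s.t.} \quad
  Z \succeq 0,\; Z \ge 0,\; Z\mathbf{1} = \mathbf{1},\; \operatorname{tr}(Z) = K,
\]
where the optimal $Z$ recovers the normalized block-diagonal cluster-membership matrix when the clusters are sufficiently well separated.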
With their powerful capability to handle the graph data widely found in practical applications, graph neural networks (GNNs) have received significant research attention. However, as societies become increasingly concerned with data privacy, GNNs face the need to adapt to this new normal. This has led to the rapid development of federated graph neural networks (FedGNNs) research in recent years. Although promising, this interdisciplinary field is highly challenging for interested researchers to enter. The lack of an insightful survey on this topic only exacerbates this problem. In this paper, we bridge this gap by offering a comprehensive survey of this emerging field. We propose a unique 3-tiered taxonomy of the FedGNN literature to provide a clear view of how GNNs work in the context of Federated Learning (FL). It puts existing works into perspective by analyzing how graph data manifest themselves in FL settings, how GNN training is performed under different FL system architectures and degrees of graph data overlap across data silos, and how GNN aggregation is performed under various FL settings. Through discussions of the advantages and limitations of existing works, we envision future research directions that can help build more robust, dynamic, efficient, and interpretable FedGNNs.
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
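As a representative instance of the two-stage approach surveyed (generic notation; the precise assumptions vary from problem to problem), low-rank matrix sensing can be attacked by gradient descent on the factorized least-squares loss
\[
  f(L, R) \;=\; \tfrac{1}{4}\,\big\| \mathcal{A}(L R^{\top}) - y \big\|_2^2,
  \qquad
  L_{t+1} = L_t - \eta\, \nabla_L f(L_t, R_t), \quad
  R_{t+1} = R_t - \eta\, \nabla_R f(L_t, R_t),
\]
initialized spectrally, e.g. from the top-$r$ factors of $\mathcal{A}^{*}(y)$; under suitable statistical models on $\mathcal{A}$ the iterates converge linearly to the ground truth despite the nonconvexity of $f$.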