Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates the structural vulnerabilities of graph neural models through which sensitive topological information can be inferred by edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of cosine-similarity-based edge reconstruction attacks (COSERA), providing theoretical and empirical evidence that such attacks can perfectly reconstruct sparse Erd\H{o}s--R\'enyi graphs with independent random features as graph size increases. Conversely, we establish that sparsity is a critical factor for COSERA's effectiveness, as demonstrated through analysis and experiments on stochastic block models. Finally, we explore the resilience of (provably) private graph representations produced via the noisy aggregation (NAG) mechanism against COSERA. We empirically delineate instances in which COSERA succeeds, and instances in which it falls short, as an instrument for probing the trade-off between privacy and utility.
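To make the attack concrete, below is a minimal sketch of cosine-similarity-based edge reconstruction, assuming only that the adversary observes node embeddings and knows (or guesses) the edge count; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def cosine_edge_reconstruction(embeddings: np.ndarray, num_edges: int) -> set:
    """Predict the num_edges node pairs whose embeddings have the
    highest cosine similarity as the reconstructed edge set."""
    # Row-normalize so that inner products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    # Score each unordered pair once, ignoring self-similarity.
    i, j = np.triu_indices(sim.shape[0], k=1)
    top = np.argsort(sim[i, j])[::-1][:num_edges]
    return {(int(a), int(b)) for a, b in zip(i[top], j[top])}
```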
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test and calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
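For concreteness, here is a minimal sketch of how the $m$ conformal $p$-values arise from exchangeable scores (this is the standard construction; the variable names are ours):

```python
import numpy as np

def conformal_p_values(calib_scores: np.ndarray, test_scores: np.ndarray) -> np.ndarray:
    """For each of the m test scores, compute the conformal p-value
    p_j = (1 + #{i : S_i >= S_{n+j}}) / (n + 1)
    against the n calibration scores."""
    n = calib_scores.shape[0]
    counts = (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1.0 + counts) / (n + 1.0)
```

Marginally, each $p_j$ is super-uniform under exchangeability; the paper's contribution concerns the joint law of all $m$ values, which it shows follows a P\'olya urn scheme.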
The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.
The direct parametrisation method for invariant manifolds is a model-order reduction technique that can be applied to nonlinear systems described by PDEs and discretised, e.g., with a finite element procedure, in order to derive efficient reduced-order models (ROMs). In nonlinear vibrations, it has already been applied to autonomous and non-autonomous problems to propose ROMs that can compute backbone and frequency-response curves of structures with geometric nonlinearity. While previous developments used a first-order expansion to cope with the non-autonomous term, this assumption is here relaxed by proposing a different treatment. The key idea is to enlarge the dimension of the parametrising coordinates with additional entries related to the forcing. A new algorithm is derived with this starting assumption and, as a key consequence, the resonance relationships appearing through the homological equations involve multiple occurrences of the forcing frequency, showing that with this new development, ROMs can be derived for systems exhibiting superharmonic resonance. The method is implemented and validated on academic test cases involving beams and arches. It is numerically demonstrated that the method generates efficient ROMs for problems involving 3:1 and 2:1 superharmonic resonances, as well as converged results for systems where the first-order truncation of the non-autonomous term showed a clear limitation.
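Schematically, and in our own notation rather than the paper's, the enlargement amounts to appending the two complex forcing phases to the parametrising coordinates so that the forced system becomes autonomous:

\[
\dot{\mathbf{z}} = \mathbf{A}\mathbf{z} + \mathbf{f}(\mathbf{z}) + \mathbf{E}\cos(\Omega t)
\;\longrightarrow\;
\begin{cases}
\dot{\mathbf{z}} = \mathbf{A}\mathbf{z} + \mathbf{f}(\mathbf{z}) + \tfrac{1}{2}\mathbf{E}\,(z_{+} + z_{-}),\\
\dot{z}_{\pm} = \pm\,\mathrm{i}\,\Omega\, z_{\pm},
\end{cases}
\qquad z_{\pm}(t) = \mathrm{e}^{\pm \mathrm{i}\Omega t}.
\]

Because $z_{\pm}$ enter the parametrisation on the same footing as the modal coordinates, the small divisors in the homological equations take the form $\lambda_j - \sum_k m_k \lambda_k - \ell\,\mathrm{i}\,\Omega$ with integer $\ell$, so near-resonances such as $3\Omega \approx \omega_j$ (a 3:1 superharmonic resonance) can be detected and retained in the ROM.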
Adaptiveness is a key principle in information processing including statistics and machine learning. We investigate the usefulness of adaptive methods in the framework of asymptotic binary hypothesis testing, when each hypothesis represents asymptotically many independent instances of a quantum channel, and the tests are based on applying the unknown channel and observing its outputs. Unlike the familiar setting of quantum states as hypotheses, there is a fundamental distinction between adaptive and non-adaptive strategies with respect to the channel uses, and we introduce a number of further variants of the discrimination tasks by imposing different restrictions on the test strategies. The following results are obtained: (1) We prove that for classical-quantum channels, adaptive and non-adaptive strategies lead to the same error exponents both in the symmetric (Chernoff) and asymmetric (Hoeffding, Stein) settings. (2) The first separation between adaptive and non-adaptive symmetric hypothesis testing exponents for quantum channels, which we derive from a general lower bound on the error probability for non-adaptive strategies; the concrete example we analyze is a pair of entanglement-breaking channels. (3) We prove, in some sense generalizing the previous statement, that for general channels adaptive strategies restricted to classical feed-forward and product state channel inputs are not superior in the asymptotic limit to non-adaptive product state strategies. (4) As an application of our findings, we address the discrimination power of an arbitrary quantum channel and show that adaptive strategies with classical feedback and no quantum memory at the input do not increase the discrimination power of the channel beyond non-adaptive tensor product input strategies.
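For orientation (a standard benchmark from state discrimination, not a result of this paper): when discriminating $n$ copies of two states $\rho$ and $\sigma$, the optimal symmetric error exponent is given by the quantum Chernoff bound,

\[
\lim_{n\to\infty} -\frac{1}{n}\log P_{e}(n) \;=\; -\log \min_{0 \le s \le 1} \operatorname{Tr}\!\left[\rho^{s}\sigma^{1-s}\right].
\]

For channels, the analogous exponent may depend on how inputs are chosen across the $n$ uses, which is precisely the adaptive-versus-non-adaptive distinction the paper studies.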
Evaluating the expected information gain (EIG) is a critical task in many areas of computational science and statistics, necessitating the approximation of nested integrals. Available techniques for this problem based on quasi-Monte Carlo (QMC) methods have focused on enhancing the efficiency of either the inner or outer integral approximation. In this work, we introduce a novel approach that extends the scope of these efforts to address inner and outer expectations simultaneously. Leveraging the principles of Owen's scrambling of digital nets, we develop a randomized QMC (rQMC) method that improves the convergence behavior of the approximation of nested integrals. We also indicate how to combine this methodology with importance sampling to address the measure concentration arising in the inner integral. Our method capitalizes on the unique structure of nested expectations to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we handle integrands exhibiting infinite variation in the Hardy--Krause sense, paving the way for theoretically sound error estimates. As the main contribution of this work, we derive asymptotic error bounds for the bias and variance of our estimator, along with regularity conditions under which these error bounds can be attained. In addition, we provide nearly optimal sample sizes for the rQMC approximations, which are helpful in actual numerical implementations. Moreover, we verify the quality of our estimator through numerical experiments in the context of EIG estimation. Specifically, we compare the computational efficiency of our rQMC method against standard nested MC integration across two case studies: one in thermo-mechanics and the other in pharmacokinetics. These examples highlight our approach's computational savings and enhanced applicability.
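As a baseline for comparison, here is a minimal sketch of the standard nested (double-loop) MC estimator of the EIG on a toy linear-Gaussian model; the model is our illustrative assumption, and the paper's rQMC method would replace the pseudo-random draws below with randomly scrambled digital-net points (e.g. scipy.stats.qmc.Sobol with scramble=True).

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(y, theta, sigma=0.5):
    """Toy Gaussian likelihood: y ~ N(theta, sigma^2)."""
    return -0.5 * ((y - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def eig_nested_mc(n_outer=2000, n_inner=2000):
    """Nested MC estimate of EIG = E_y[log p(y|theta) - log p(y)],
    with each evidence term p(y_n) approximated by an inner prior average."""
    theta = rng.standard_normal(n_outer)             # theta_n ~ N(0,1) prior
    y = theta + 0.5 * rng.standard_normal(n_outer)   # y_n ~ p(y | theta_n)
    inner = rng.standard_normal((n_outer, n_inner))  # fresh prior draws
    log_evidence = np.log(np.exp(log_lik(y[:, None], inner)).mean(axis=1))
    return float(np.mean(log_lik(y, theta) - log_evidence))

print(eig_nested_mc())  # analytic value for this model: 0.5*log(1 + 1/0.25) ~ 0.80
```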
In recent years, power analysis has become widely used in the applied sciences, driven by the growing importance of replicability. When distribution-free methods, such as Partial Least Squares (PLS)-based approaches, are considered, formulating power analysis turns out to be challenging. In this study, we introduce the methodological framework of a new procedure for performing power analysis when PLS-based methods are used. Data are simulated by the Monte Carlo method, assuming the null hypothesis of no effect is false and exploiting the latent structure estimated by PLS in the pilot data. In this way, the complex correlation structure of the data is explicitly considered in power analysis and sample size estimation. The paper offers insights into selecting statistical tests for the power analysis procedure, comparing accuracy-based tests and those based on continuous parameters estimated by PLS. Simulated and real datasets are investigated to show how the method works in practice.
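The core simulation loop of a Monte Carlo power analysis can be sketched as follows; we use a toy two-sample $t$-test purely for illustration, whereas the paper's procedure instead simulates data from the latent structure PLS estimates on the pilot data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def mc_power(n, effect=0.5, n_sim=2000, alpha=0.05):
    """Monte Carlo power: the fraction of datasets, simulated under the
    alternative hypothesis (effect present), in which the test rejects
    at level alpha."""
    hits = 0
    for _ in range(n_sim):
        x = rng.standard_normal(n)            # control group
        y = effect + rng.standard_normal(n)   # treated group, shifted mean
        if stats.ttest_ind(x, y).pvalue < alpha:
            hits += 1
    return hits / n_sim

# Increase n until mc_power(n) reaches the target (e.g. 0.80) to size the study.
```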
In this thesis, we study problems at the interface of analysis and discrete mathematics. We discuss analogues of well-known Hardy-type inequalities and rearrangement inequalities on the lattice graphs $\mathbb{Z}^d$, with a particular focus on the behaviour of sharp constants and optimizers. In the first half of the thesis, we analyse Hardy inequalities on $\mathbb{Z}^d$, first for $d=1$ and then for $d \geq 3$. We prove a sharp weighted Hardy inequality on the integers with power weights of the form $n^\alpha$. This is done via two different methods, namely the super-solution method and the Fourier method. We also use the Fourier method to prove a weighted Hardy-type inequality for higher-order operators. After discussing the one-dimensional case, we study the Hardy inequality in higher dimensions ($d \geq 3$). In particular, we compute the asymptotic behaviour of the sharp constant in the discrete Hardy inequality as $d \rightarrow \infty$. This is done by converting the inequality into a continuous Hardy-type inequality on a torus for functions having zero average. These continuous inequalities are new and interesting in themselves. In the second half, we focus our attention on analogues of rearrangement inequalities on lattice graphs. We begin by analysing the situation in dimension one. We define various notions of rearrangement and prove the corresponding P\'olya--Szeg\H{o} inequality. These inequalities are also applied to prove some weighted Hardy inequalities on the integers. Finally, we study rearrangement (P\'olya--Szeg\H{o}) inequalities on general graphs, with a particular focus on the lattice graphs $\mathbb{Z}^d$ for $d \geq 2$. We develop a framework to study these inequalities, using which we derive concrete results in dimension two. In particular, these results develop connections between the P\'olya--Szeg\H{o} inequality and various isoperimetric inequalities on graphs.
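For orientation, the classical (unweighted) discrete Hardy inequality on the integers, whose weighted and higher-dimensional analogues the thesis studies, reads: for $p > 1$ and non-negative sequences $(a_n)$,

\[
\sum_{n=1}^{\infty} \left( \frac{1}{n} \sum_{k=1}^{n} a_k \right)^{p} \;\le\; \left( \frac{p}{p-1} \right)^{p} \sum_{n=1}^{\infty} a_n^{p},
\]

with sharp constant $\left(\frac{p}{p-1}\right)^{p}$; equivalently, in the difference form common in the spectral-theoretic literature (for $u$ with $u(0)=0$),

\[
\sum_{n=1}^{\infty} |u(n) - u(n-1)|^{2} \;\ge\; \frac{1}{4} \sum_{n=1}^{\infty} \frac{|u(n)|^{2}}{n^{2}}.
\]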
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
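As a minimal numerical illustration of the implicit-regularization principle (toy data of ours, not an experiment from the survey): gradient descent initialized at zero on an overparametrized least-squares problem both interpolates the training data and converges to the minimum-norm interpolant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparametrized linear regression: more parameters (d) than samples (n).
n, d = 20, 200
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the squared loss, started at zero.
w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y) / n

# Minimum-norm interpolant, computed directly via the pseudoinverse.
w_min = np.linalg.pinv(X) @ y

print(np.abs(X @ w - y).max())    # ~0: perfect fit to the training data
print(np.linalg.norm(w - w_min))  # ~0: implicit bias toward the min-norm solution
```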
Deep learning is usually described as an experiment-driven field and is under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intense concerns about ethics and security and their relationship with generalizability.
Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real-world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention-based graph neural network called Hyper-SAGNN that is applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms state-of-the-art methods on traditional tasks while also achieving strong performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.
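To illustrate why self-attention accommodates variable hyperedge sizes, here is a minimal sketch in the spirit of the approach; it is our simplification (Hyper-SAGNN itself additionally contrasts per-node static and dynamic embeddings), and all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hyperedge_score(node_feats, Wq, Wk, Wv, w_out):
    """Score a candidate hyperedge of any size k: self-attention over its
    k node features yields context-dependent ("dynamic") embeddings,
    which are pooled and passed through a linear read-out."""
    Q, K, V = node_feats @ Wq, node_feats @ Wk, node_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    dynamic = attn @ V                  # one dynamic embedding per node
    pooled = dynamic.mean(axis=0)       # permutation-invariant pooling
    return 1.0 / (1.0 + np.exp(-pooled @ w_out))  # probability-like score

d, h = 16, 8
Wq, Wk, Wv = (rng.standard_normal((d, h)) for _ in range(3))
w_out = rng.standard_normal(h)
print(hyperedge_score(rng.standard_normal((3, d)), Wq, Wk, Wv, w_out))  # size-3 hyperedge
print(hyperedge_score(rng.standard_normal((5, d)), Wq, Wk, Wv, w_out))  # size-5 hyperedge
```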