Graph is an important data representation which occurs naturally in the real world applications \cite{goyal2018graph}. Therefore, analyzing graphs provides users with better insights in different areas such as anomaly detection \cite{ma2021comprehensive}, decision making \cite{fan2023graph}, clustering \cite{tsitsulin2023graph}, classification \cite{wang2021mixup} and etc. However, most of these methods require high levels of computational time and space. We can use other ways like embedding to reduce these costs. Knowledge graph (KG) embedding is a technique that aims to achieve the vector representation of a KG. It represents entities and relations of a KG in a low-dimensional space while maintaining the semantic meanings of them. There are different methods for embedding graphs including random walk-based methods such as node2vec, metapath2vec and regpattern2vec. However, most of these methods bias the walks based on a rigid pattern usually hard-coded in the algorithm. In this work, we introduce \textit{subgraph2vec} for embedding KGs where walks are run inside a user-defined subgraph. We use this embedding for link prediction and prove our method has better performance in most cases in comparison with the previous ones.
Prior work in computational bioacoustics has mostly focused on the detection of animal presence in a particular habitat. However, animal sounds contain much richer information than mere presence; among others, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is often lacking. The use of animals in captivity instead offers a viable alternative pathway. However, most prior works follow a traditional, statistics-based approach to analysing interactions. In the present work, we go beyond this standard framework by attempting to predict the underlying context in interactions between captive \emph{Rousettus Aegyptiacus} using deep neural networks. We reach an unweighted average recall of over 30\% -- more than thrice the chance level -- and show error patterns that differ from our statistical analysis. This work thus represents an important step towards the automatic analysis of states in animals from sound.
Latent variable models serve as powerful tools to infer underlying dynamics from observed neural activity. However, due to the absence of ground truth data, prediction benchmarks are often employed as proxies. In this study, we reveal the limitations of the widely-used 'co-smoothing' prediction framework and propose an improved few-shot prediction approach that encourages more accurate latent dynamics. Utilizing a student-teacher setup with Hidden Markov Models, we demonstrate that the high co-smoothing model space can encompass models with arbitrary extraneous dynamics within their latent representations. To address this, we introduce a secondary metric -- a few-shot version of co-smoothing. This involves performing regression from the latent variables to held-out channels in the data using fewer trials. Our results indicate that among models with near-optimal co-smoothing, those with extraneous dynamics underperform in the few-shot co-smoothing compared to 'minimal' models devoid of such dynamics. We also provide analytical insights into the origin of this phenomenon. We further validate our findings on real neural data using two state-of-the-art methods: LFADS and STNDT. In the absence of ground truth, we suggest a proxy measure to quantify extraneous dynamics. By cross-decoding the latent variables of all model pairs with high co-smoothing, we identify models with minimal extraneous dynamics. We find a correlation between few-shot co-smoothing performance and this new measure. In summary, we present a novel prediction metric designed to yield latent variables that more accurately reflect the ground truth, offering a significant improvement for latent dynamics inference.
Quantum error correction is crucial for scalable quantum information processing applications. Traditional discrete-variable quantum codes that use multiple two-level systems to encode logical information can be hardware-intensive. An alternative approach is provided by bosonic codes, which use the infinite-dimensional Hilbert space of harmonic oscillators to encode quantum information. Two promising features of bosonic codes are that syndrome measurements are natively analog and that they can be concatenated with discrete-variable codes. In this work, we propose novel decoding methods that explicitly exploit the analog syndrome information obtained from the bosonic qubit readout in a concatenated architecture. Our methods are versatile and can be generally applied to any bosonic code concatenated with a quantum low-density parity-check (QLDPC) code. Furthermore, we introduce the concept of quasi-single-shot protocols as a novel approach that significantly reduces the number of repeated syndrome measurements required when decoding under phenomenological noise. To realize the protocol, we present a first implementation of time-domain decoding with the overlapping window method for general QLDPC codes, and a novel analog single-shot decoding method. Our results lay the foundation for general decoding algorithms using analog information and demonstrate promising results in the direction of fault-tolerant quantum computation with concatenated bosonic-QLDPC codes.
Reliably measuring the collinearity of bivariate data is crucial in statistics, particularly for time-series analysis or ongoing studies in which incoming observations can significantly impact current collinearity estimates. Leveraging identities from Welford's online algorithm for sample variance, we develop a rigorous theoretical framework for analyzing the maximal change to the Pearson correlation coefficient and its p-value that can be induced by additional data. Further, we show that the resulting optimization problems yield elegant closed-form solutions that can be accurately computed by linear- and constant-time algorithms. Our work not only creates new theoretical avenues for robust correlation measures, but also has broad practical implications for disciplines that span econometrics, operations research, clinical trials, climatology, differential privacy, and bioinformatics. Software implementations of our algorithms in Cython-wrapped C are made available at //github.com/marc-harary/sensitivity for reproducibility, practical deployment, and future theoretical development.
Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute the source string efficiently, having a tradeoff between scalability and avoiding information loss. The scalable de Bruijn graphs come at the price of losing crucial overlap information. The complete overlap information is stored in overlap graphs using quadratic space. Hierarchical overlap graphs [IPL20] (HOG) overcome these limitations, avoiding information loss despite using linear space. After a series of suboptimal improvements, Khan and Park et al. simultaneously presented two optimal algorithms [CPM2021], where only the former was seemingly practical. We empirically analyze all the practical algorithms for computing HOG on real and random datasets, where the optimal algorithm [CPM2021] outperforms the previous algorithms as expected, though at the expense of extra memory. However, it uses non-intuitive approach and non-trivial data structures. We present arguably the most intuitive algorithm, using only elementary arrays, which is also optimal. Our algorithm empirically proves even better for both time and memory over all the algorithms, highlighting its significance in both theory and practice. We further explore the applications of hierarchical overlap graphs to solve various forms of suffix-prefix queries on a set of strings. Loukides et al. [CPM2023] recently presented state-of-the-art algorithms for these queries. However, these algorithms require complex black-box data structures and are seemingly impractical. Our algorithms, despite failing to match the state-of-the-art algorithms theoretically, answer different queries ranging from 0.01-100 milliseconds for a data set having around a billion characters.
In various technical applications, assessing the impact of non-Gaussian processes on responses of dynamic systems is crucial. While simulating time-domain realizations offers an efficient solution for linear dynamic systems, this method proves time-consuming for finite element (FE) models, which may contain thousands to millions of degrees-of-freedom (DOF). Given the central role of kurtosis in describing non-Gaussianity - owing to its concise, parametric-free and easily interpretable nature - this paper introduces a highly efficient approach for deriving response kurtosis and other related statistical descriptions. This approach makes use of the modal solution of dynamic systems, which allows to reduce DOFs and responses analysis to a minimum number in the modal domain. This computational advantage enables fast assessments of non-Gaussian effects for entire FE models. Our approach is illustrated using a simple FE model that has found regular use in the field of random vibration fatigue.
Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function (CDF). Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits desirable invariance properties of the CDF, and an efficient implementation with copulas allows it to scale to many objectives. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions while being computationally efficient for many objectives.
Computationally efficient surrogates for parametrized physical models play a crucial role in science and engineering. Operator learning provides data-driven surrogates that map between function spaces. However, instead of full-field measurements, often the available data are only finite-dimensional parametrizations of model inputs or finite observables of model outputs. Building on Fourier Neural Operators, this paper introduces the Fourier Neural Mappings (FNMs) framework that is able to accommodate such finite-dimensional vector inputs or outputs. The paper develops universal approximation theorems for the method. Moreover, in many applications the underlying parameter-to-observable (PtO) map is defined implicitly through an infinite-dimensional operator, such as the solution operator of a partial differential equation. A natural question is whether it is more data-efficient to learn the PtO map end-to-end or first learn the solution operator and subsequently compute the observable from the full-field solution. A theoretical analysis of Bayesian nonparametric regression of linear functionals, which is of independent interest, suggests that the end-to-end approach can actually have worse sample complexity. Extending beyond the theory, numerical results for the FNM approximation of three nonlinear PtO maps demonstrate the benefits of the operator learning perspective that this paper adopts.
We consider the application of the generalized Convolution Quadrature (gCQ) to approximate the solution of an important class of sectorial problems. The gCQ is a generalization of Lubich's Convolution Quadrature (CQ) that allows for variable steps. The available stability and convergence theory for the gCQ requires non realistic regularity assumptions on the data, which do not hold in many applications of interest, such as the approximation of subdiffusion equations. It is well known that for non smooth enough data the original CQ, with uniform steps, presents an order reduction close to the singularity. We generalize the analysis of the gCQ to data satisfying realistic regularity assumptions and provide sufficient conditions for stability and convergence on arbitrary sequences of time points. We consider the particular case of graded meshes and show how to choose them optimally, according to the behaviour of the data. An important advantage of the gCQ method is that it allows for a fast and memory reduced implementation. We describe how the fast and oblivious gCQ can be implemented and illustrate our theoretical results with several numerical experiments.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.