We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner. To this end, a Contrastive Copula VAE (C$^2$VAE) is introduced without relying on prior knowledge about data in the probabilistic principle and involving strong modeling assumptions on the posterior in the neural architecture. C$^2$VAE simultaneously factorizes the posterior (evidence lower bound, ELBO) with total correlation (TC)-driven decomposition for learning factorized disentangled representations and extracts the dependencies between hidden features by a neural Gaussian copula for copula coupled representations. Then, a self-supervised contrastive classifier differentiates the disentangled representations from the coupled representations, where a contrastive loss regularizes this contrastive classification together with the TC loss for eliminating entangled factors and strengthening disentangled representations. C$^2$VAE demonstrates a strong effect in enhancing disentangled representation learning. C$^2$VAE further contributes to improved optimization addressing the TC-based VAE instability and the trade-off between reconstruction and representation.
We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead. More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$? To test this, we measure linear approximation and causal intervention methods in GPT-J-6B to evaluate the degree to which individual hidden states in the network contain signal rich enough to predict future hidden states and, ultimately, token outputs. We find that, at some layers, we can approximate a model's output with more than 48% accuracy with respect to its prediction of subsequent tokens through a single hidden state. Finally we present a "Future Lens" visualization that uses these methods to create a new view of transformer states.
We provide a deterministic algorithm for computing the $5$-edge-connected components of an undirected multigraph in linear time. There were probably good indications that this computation can be performed in linear time, but no such algorithm was actually known prior to this work. Thus, our paper answers a theoretical question, and sheds light on the possibility that a solution may exist for general $k$. A key component in our algorithm is an oracle for answering connectivity queries for pairs of vertices in the presence of at most four edge-failures. Specifically, the oracle has size $O(n)$, it can be constructed in linear time, and it answers connectivity queries in the presence of at most four edge-failures in worst-case constant time, where $n$ denotes the number of vertices of the graph. We note that this is a result of independent interest. Our paper can be considered as a follow-up of recent work on computing the $4$-edge-connected components in linear time. However, in dealing with the computation of the $5$-edge-connected components, we are faced with unique challenges that do not appear when dealing with lower connectivity. The problem is that the $4$-edge cuts in $3$-edge-connected graphs are entangled in various complicated ways, that make it difficult to organize them in a compact way. Here we provide a novel analysis of those cuts, that reveals the existence of various interesting structures. These can be exploited so that we can disentangle and collect only those cuts that are essential in computing the $5$-edge-connected components. This analysis may provide a clue for a general solution for the $k$-edge-connected components, or other related graph connectivity problems.
Social intelligence manifests the capability, often referred to as the Theory of Mind (ToM), to discern others' behavioral intentions, beliefs, and other mental states. ToM is especially important in multi-agent and human-machine interaction environments because each agent needs to understand the mental states of other agents in order to better respond, interact, and collaborate. Recent research indicates that the ToM model possesses the capability to infer beliefs, intentions, and anticipate future observations and actions; nonetheless, its deployment in tackling intricate tasks remains notably limited. The challenges arise when the number of agents increases, the environment becomes more complex, and interacting with the environment and predicting the mental state of each other becomes difficult and time consuming. To overcome such limits, we take inspiration from the Theory of Collective Mind (ToCM) mechanism, predicting observations of all other agents into a unified but plural representation and discerning how our own actions affect this mental state representation. Based on this foundation, we construct an imaginative space to simulate the multi-agent interaction process, thus improving the efficiency of cooperation among multiple agents in complex decision-making environments. In various cooperative tasks with different numbers of agents, the experimental results highlight the superior cooperative efficiency and performance of our approach compared to the Multi-Agent Reinforcement Learning (MARL) baselines. We achieve consistent boost on SNN- and DNN-based decision networks, and demonstrate that ToCM's inferences about others' mental states can be transferred to new tasks for quickly and flexible adaptation.
Kernel methods are learning algorithms that enjoy solid theoretical foundations while suffering from important computational limitations. Sketching, which consists in looking for solutions among a subspace of reduced dimension, is a well studied approach to alleviate these computational burdens. However, statistically-accurate sketches, such as the Gaussian one, usually contain few null entries, such that their application to kernel methods and their non-sparse Gram matrices remains slow in practice. In this paper, we show that sparsified Gaussian (and Rademacher) sketches still produce theoretically-valid approximations while allowing for important time and space savings thanks to an efficient \emph{decomposition trick}. To support our method, we derive excess risk bounds for both single and multiple output kernel problems, with generic Lipschitz losses, hereby providing new guarantees for a wide range of applications, from robust regression to multiple quantile regression. Our theoretical results are complemented with experiments showing the empirical superiority of our approach over SOTA sketching methods.
Student attention is an indispensable input for uncovering their goals, intentions, and interests, which prove to be invaluable for a multitude of research areas, ranging from psychology to interactive systems. However, most existing methods to classify attention fail to model its complex nature. To bridge this gap, we propose AttentioNet, a novel Convolutional Neural Network-based approach that utilizes Electroencephalography (EEG) data to classify attention into five states: Selective, Sustained, Divided, Alternating, and relaxed state. We collected a dataset of 20 subjects through standard neuropsychological tasks to elicit different attentional states. The average across-student accuracy of our proposed model at this configuration is 92.3% (SD=3.04), which is well-suited for end-user applications. Our transfer learning-based approach for personalizing the model to individual subjects effectively addresses the issue of individual variability in EEG signals, resulting in improved performance and adaptability of the model for real-world applications. This represents a significant advancement in the field of EEG-based classification. Experimental results demonstrate that AttentioNet outperforms a popular EEGnet baseline (p-value < 0.05) in both subject-independent and subject-dependent settings, confirming the effectiveness of our proposed approach despite the limitations of our dataset. These results highlight the promising potential of AttentioNet for attention classification using EEG data.
Standard probabilistic sparse coding assumes a Laplace prior, a linear mapping from latents to observables, and Gaussian observable distributions. We here derive a solely entropy-based learning objective for the parameters of standard sparse coding. The novel variational objective has the following features: (A) unlike MAP approximations, it uses non-trivial posterior approximations for probabilistic inference; (B) unlike for previous non-trivial approximations, the novel objective is fully analytical; and (C) the objective allows for a novel principled form of annealing. The objective is derived by first showing that the standard ELBO objective converges to a sum of entropies, which matches similar recent results for generative models with Gaussian priors. The conditions under which the ELBO becomes equal to entropies are then shown to have analytical solutions, which leads to the fully analytical objective. Numerical experiments are used to demonstrate the feasibility of learning with such entropy-based ELBOs. We investigate different posterior approximations including Gaussians with correlated latents and deep amortized approximations. Furthermore, we numerically investigate entropy-based annealing which results in improved learning. Our main contributions are theoretical, however, and they are twofold: (1) for non-trivial posterior approximations, we provide the (to the knowledge of the authors) first analytical ELBO objective for standard probabilistic sparse coding; and (2) we provide the first demonstration on how a recently shown convergence of the ELBO to entropy sums can be used for learning.
While current NL2SQL tasks constructed using Foundation Models have achieved commendable results, their direct application to Natural Language to Graph Query Language (NL2GQL) tasks poses challenges due to the significant differences between GQL and SQL expressions, as well as the numerous types of GQL. Our extensive experiments reveal that in NL2GQL tasks, larger Foundation Models demonstrate superior cross-schema generalization abilities, while smaller Foundation Models struggle to improve their GQL generation capabilities through fine-tuning. However, after fine-tuning, smaller models exhibit better intent comprehension and higher grammatical accuracy. Diverging from rule-based and slot-filling techniques, we introduce R3-NL2GQL, which employs both smaller and larger Foundation Models as reranker, rewriter and refiner. The approach harnesses the comprehension ability of smaller models for information reranker and rewriter, and the exceptional generalization and generation capabilities of larger models to transform input natural language queries and code structure schema into any form of GQLs. Recognizing the lack of established datasets in this nascent domain, we have created a bilingual dataset derived from graph database documentation and some open-source Knowledge Graphs (KGs). We tested our approach on this dataset and the experimental results showed that delivers promising performance and robustness.Our code and dataset is available at //github.com/zhiqix/NL2GQL
Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC$^3$) that expands on the principle of self-consistency checking. Our SAC$^3$ approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC$^3$ outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.
Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted, such as enterprise settings. CFT algorithms offer high throughput and low latency, making them an attractive option for centralized operations that require fault tolerance. However, CFT consensus is vulnerable to Byzantine faults, which can be introduced by a single corrupt component. Such faults can break consensus in the system. Byzantine fault tolerant (BFT) consensus algorithms withstand Byzantine faults, but they are not as competitive with CFT algorithms in terms of performance. In this work, we explore a middle ground between BFT and CFT consensus by exploring the role of accountability in CFT protocols. That is, if a CFT protocol node breaks protocol and affects consensus safety, we aim to identify which node was the culprit. Based on Raft, one of the most popular CFT algorithms, we present Raft-Forensics, which provides accountability over Byzantine faults. We theoretically prove that if two honest components fail to reach consensus, the Raft-Forensics auditing algorithm finds the adversarial component that caused the inconsistency. In an empirical evaluation, we demonstrate that Raft-Forensics performs similarly to Raft and significantly better than state-of-the-art BFT algorithms. With 256 byte messages, Raft-Forensics achieves peak throughput 87.8% of vanilla Raft at 46% higher latency, while state-of-the-art BFT protocol Dumbo-NG only achieves 18.9% peak throughput at nearly $6\times$ higher latency.
Image segmentation is still an open problem especially when intensities of the interested objects are overlapped due to the presence of intensity inhomogeneity (also known as bias field). To segment images with intensity inhomogeneities, a bias correction embedded level set model is proposed where Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined and membership functions of the clusters described by the level set function are introduced to rewrite the energy as a data term of the proposed model. Similar to popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colourful images and images with multiple objects, respectively. It has been extensively tested on both synthetic and real images that are widely used in the literature and public BrainWeb and IBSR datasets. Experimental results and comparison with state-of-the-art methods demonstrate that advantages of the proposed model in terms of bias correction and segmentation accuracy.