This paper presents a novel FPGA-based neuromorphic cochlea, leveraging the general-purpose spike-coding algorithm, Spiketrum. The focus of this study is on the development and characterization of this cochlea model, which excels in transforming audio vibrations into biologically realistic auditory spike trains. These spike trains are designed to withstand neural fluctuations and spike losses while accurately encapsulating the spatial and precise temporal characteristics of audio, along with the intensity of incoming vibrations. Noteworthy features include the ability to generate real-time spike trains with minimal information loss and the capacity to reconstruct original signals. This fine-tuning capability allows users to optimize spike rates, achieving an optimal balance between output quality and power consumption. Furthermore, the integration of a feedback system into Spiketrum enables selective amplification of specific features while attenuating others, facilitating adaptive power consumption based on application requirements. The hardware implementation supports both spike-based and non-spike-based processors, making it versatile for various computing systems. The cochlea's ability to encode diverse sensory information, extending beyond sound waveforms, positions it as a promising sensory input for current and future spike-based intelligent computing systems, offering compact and real-time spike train generation.
This paper explores the impact of different back-translation approaches on machine translation for Ladin, specifically the Val Badia variant. Given the limited amount of parallel data available for this language (only 18k Ladin-Italian sentence pairs), we investigate the performance of a multilingual neural machine translation model fine-tuned for Ladin-Italian. In addition to the available authentic data, we synthesise further translations by using three different models: a fine-tuned neural model, a rule-based system developed specifically for this language pair, and a large language model. Our experiments show that all approaches achieve comparable translation quality in this low-resource scenario, yet round-trip translations highlight differences in model performance.
This paper presents a character-based approach for enhancing writer retrieval performance in the context of Greek papyri. Our contribution lies in introducing character-level annotations for frequently used characters, in our case the trigram kai and four additional letters (epsilon, kappa, mu, omega), in Greek texts. We use a state-of-the-art writer retrieval approach based on NetVLAD and compare a character-level-based feature aggregation method against the current default baseline of using small patches located at SIFT keypoint locations for building the page descriptors. We demonstrate that by using only about 15 characters per page, we are able to boost the performance up to 4% mAP (a relative improvement of 11%) on the GRK-120 dataset. Additionally, our qualitative analysis offers insights into the similarity scores of SIFT patches and specific characters. We publish the dataset with character-level annotations, including a quality label and our binarized images for further research.
Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes, leveraging the class concepts learned from labeled samples. Current GCD methods only use a single visual modality of information, resulting in poor classification of visually similar classes. As a different modality, text information can provide complementary discriminative information, which motivates us to introduce it into the GCD task. However, the lack of class names for unlabelled data makes it impractical to utilize text information. To tackle this challenging problem, in this paper, we propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples. Specifically, our TES leverages the property that CLIP can generate aligned vision-language features, converting visual embeddings into tokens of the CLIP's text encoder to generate pseudo text embeddings. Besides, we employ a dual-branch framework, through the joint learning and instance consistency of different modality branches, visual and semantic information mutually enhance each other, promoting the interaction and fusion of visual and text knowledge. Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks, achieving new state-of-the-art. The code will be released at //github.com/enguangW/GET .
The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advancements in this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning exhibit distinct natures compared to their fine-tuned counterparts, leading to inadequacies in current evaluation metrics to accurately convey their performance. Thus, we analyze the two primary metrics, Test Suite Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM), to examine their robustness for this task and address shortcomings. We compare the performance of 9 LLM-based models using EXE, the original ESM, and our improved ESM (called ESM+). Our results show that EXE and ESM have high false positive and negative rates of 11.3% and 13.9%, while ESM+ gives those of 0.1% and 2.6% respectively, providing a significantly more stable evaluation. We release the ESM+ script as open-source for the community to contribute, while enjoying a more reliable assessment of Text-to-SQL.
We propose efficient minimum-distance decoding and list-decoding algorithms for a certain class of analog subspace codes, referred to as character-polynomial (CP) codes, recently introduced by Soleymani and the second author. In particular, a CP code without its character can be viewed as a subcode of a Reed-Solomon (RS) code, where a certain subset of the coefficients of the message polynomial is set to zeros. We then demonstrate how classical decoding methods, including list decoders, for RS codes can be leveraged for decoding CP codes. For instance, it is shown that, in almost all cases, the list decoder behaves as a unique decoder. We also present a probabilistic analysis of the improvements in list decoding of CP codes when leveraging their certain structure as subcodes of RS codes.
This paper presents a novel clustering algorithm from the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) algorithmic family. The newly proposed clustering variant leverages the concept of similarity and higher-order interactions across multiple subspaces to group data into clusters. To showcase the merit of SPINEX, a thorough set of benchmarking experiments was carried out against 13 algorithms, namely, Affinity Propagation, Agglomerative, Birch, DBSCAN, Gaussian Mixture, HDBSCAN, K-Means, KMedoids, Mean Shift, MiniBatch K-Means, OPTICS, Spectral Clustering, and Ward Hierarchical. Then, the performance of all algorithms was examined across 51 synthetic and real datasets from various domains, dimensions, and complexities. Furthermore, we present a companion complexity analysis to compare the complexity of SPINEX to that of the aforementioned algorithms. Our results demonstrate that SPINEX can outperform commonly adopted clustering algorithms by ranking within the top-5 best performing algorithms and has moderate complexity. Finally, a demonstration of the explainability capabilities of SPINEX, along with future research needs, is presented.
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on the top of transformers, self-supervised learning and transfer learning. Transformed-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks which avoids training of downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give brief overview of various benchmarks including both intrinsic and extrinsic. We present a summary of various useful libraries to work with T-PTLMs. Finally, we highlight some of the future research directions which will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with the recent happenings in T-PTLMs.
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs. Therefore, multi-relational link prediction and entity matching communities can benefit from this resource. We believe this data set has the potential to facilitate the development of novel multi-modal learning approaches for knowledge graphs.We validate the utility ofMMKG in the sameAs link prediction task with an extensive set of experiments. These experiments show that the task at hand benefits from learning of multiple feature types.
Attention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, Directional Self Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance with various data by using forward and backward directional information in a sentence. But in their study, not considered at all was the distance between words, an important feature when learning the local dependency to help understand the context of input text. We propose Distance-based Self-Attention Network, which considers the word distance by using a simple distance mask in order to model the local dependency without losing the ability of modeling global dependency which attention has inherent. Our model shows good performance with NLI data, and it records the new state-of-the-art result with SNLI data. Additionally, we show that our model has a strength in long sentences or documents.