Accurate transcription of Bengali text to the International Phonetic Alphabet (IPA) is a challenging task due to the complex phonology of the language and context-dependent sound changes. This challenge is even more for regional Bengali dialects due to unavailability of standardized spelling conventions for these dialects, presence of local and foreign words popular in those regions and phonological diversity across different regions. This paper presents an approach to this sequence-to-sequence problem by introducing the District Guided Tokens (DGT) technique on a new dataset spanning six districts of Bangladesh. The key idea is to provide the model with explicit information about the regional dialect or "district" of the input text before generating the IPA transcription. This is achieved by prepending a district token to the input sequence, effectively guiding the model to understand the unique phonetic patterns associated with each district. The DGT technique is applied to fine-tune several transformer-based models, on this new dataset. Experimental results demonstrate the effectiveness of DGT, with the ByT5 model achieving superior performance over word-based models like mT5, BanglaT5, and umT5. This is attributed to ByT5's ability to handle a high percentage of out-of-vocabulary words in the test set. The proposed approach highlights the importance of incorporating regional dialect information into ubiquitous natural language processing systems for languages with diverse phonological variations. The following work was a result of the "Bhashamul" challenge, which is dedicated to solving the problem of Bengali text with regional dialects to IPA transcription //www.kaggle.com/competitions/regipa/. The training and inference notebooks are available through the competition link.
The Parallel Meaning Bank (PMB) serves as a corpus for semantic processing with a focus on semantic parsing and text generation. Currently, we witness an excellent performance of neural parsers and generators on the PMB. This might suggest that such semantic processing tasks have by and large been solved. We argue that this is not the case and that performance scores from the past on the PMB are inflated by non-optimal data splits and test sets that are too easy. In response, we introduce several changes. First, instead of the prior random split, we propose a more systematic splitting approach to improve the reliability of the standard test data. Second, except for the standard test set, we also propose two challenge sets: one with longer texts including discourse structure, and one that addresses compositional generalization. We evaluate five neural models for semantic parsing and meaning-to-text generation. Our results show that model performance declines (in some cases dramatically) on the challenge sets, revealing the limitations of neural models when confronting such challenges.
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the phase of alignment without the need for additional efforts beyond training with original data length. SkipAlign is developed on the premise that long-range dependencies are fundamental to enhancing an LLM's capacity of long context. Departing from merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the aspect of positions indices. This is achieved by the strategic insertion of skipped positions within instruction-following samples, which utilizes the semantic structure of the data to effectively expand the context. Through extensive experiments on base models with a variety of context window sizes, SkipAlign demonstrates its effectiveness across a spectrum of long-context tasks. Particularly noteworthy is that with a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves it's best performance and comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
This paper addresses a critical flaw in MediaPipe Holistic's hand Region of Interest (ROI) prediction, which struggles with non-ideal hand orientations, affecting sign language recognition accuracy. We propose a data-driven approach to enhance ROI estimation, leveraging an enriched feature set including additional hand keypoints and the z-dimension. Our results demonstrate better estimates, with higher Intersection-over-Union compared to the current method. Our code and optimizations are available at //github.com/sign-language-processing/mediapipe-hand-crop-fix.
Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a cutting-edge concept for the sixth-generation (6G) wireless networks. In this paper, we propose a novel system that incorporates STAR-RIS with simultaneous wireless information and power transfer (SWIPT) using rate splitting multiple access (RSMA). The proposed system facilitates communication from a multi-antenna base station (BS) to single-antenna users in a downlink transmission. The BS concurrently sends energy and information signals to multiple energy harvesting receivers (EHRs) and information data receivers (IDRs) with the support of a deployed STAR-RIS. Furthermore, an optimization is introduced to strike a balance between users' sum rate and the total harvested energy. To achieve this, an optimization problem is formulated to optimize the energy/information beamforming vectors at the BS, the phase shifts at the STAR-RIS, and the common message rate. Subsequently, we employ a meta deep deterministic policy gradient (Meta-DDPG) approach to solve the complex problem. Simulation results validate that the proposed algorithm significantly enhances both data rate and harvested energy in comparison to conventional DDPG.
With the rapid surge in the prevalence of Large Language Models (LLMs), individuals are increasingly turning to conversational AI for initial insights across various domains, including health-related inquiries such as disease diagnosis. Many users seek potential causes on platforms like ChatGPT or Bard before consulting a medical professional for their ailment. These platforms offer valuable benefits by streamlining the diagnosis process, alleviating the significant workload of healthcare practitioners, and saving users both time and money by avoiding unnecessary doctor visits. However, Despite the convenience of such platforms, sharing personal medical data online poses risks, including the presence of malicious platforms or potential eavesdropping by attackers. To address privacy concerns, we propose a novel framework combining FHE and Deep Learning for a secure and private diagnosis system. Operating on a question-and-answer-based model akin to an interaction with a medical practitioner, this end-to-end secure system employs Fully Homomorphic Encryption (FHE) to handle encrypted input data. Given FHE's computational constraints, we adapt deep neural networks and activation functions to the encryted domain. Further, we also propose a faster algorithm to compute summation of ciphertext elements. Through rigorous experiments, we demonstrate the efficacy of our approach. The proposed framework achieves strict security and privacy with minimal loss in performance.
Context: Bug bisection is a common technique used to identify a revision that introduces a bug or indirectly fixes a bug, and often involves executing multiple revisions of a project to determine whether the bug is present within the revision. However, many legacy revisions often cannot be successfully compiled due to changes in the programming language or tools used in the compilation process, adding complexity and preventing automation in the bisection process. Objective: In this paper, we introduce an approach to repair test cases of Java projects by performing dependency minimization. Our approach aims to remove classes and methods that are not required for the execution of one or more test cases. Unlike existing state-of-the-art techniques, our approach performs minimization at source-level, which allows compile-time errors to be fixed. Method: A standalone Java tool implementing our technique was developed, and we evaluated our technique using subjects from Defects4J retargeted against Java 8 and 17. Results: Our evaluation showed that a majority of subjects can be repaired solely by performing minimization, including replicating the test results of the original version. Furthermore, our technique is also shown to achieve accurate minimized results, while only adding a small overhead to the bisection process. Conclusion: Our proposed technique is shown to be effective for repairing build failures with minimal overhead, making it suitable for use in automated bug bisection. Our tool can also be adapted for use cases such as bug corpus creation and refactoring.
We present a translation from linear temporal logic with past to deterministic Rabin automata. The translation is direct in the sense that it does not rely on intermediate non-deterministic automata, and asymptotically optimal, resulting in Rabin automata of doubly exponential size. It is based on two main notions. One is that it is possible to encode the history contained in the prefix of a word, as relevant for the formula under consideration, by performing simple rewrites of the formula itself. As a consequence, a formula involving past operators can (through such rewrites, which involve alternating between weak and strong versions of past operators in the formula's syntax tree) be correctly evaluated at an arbitrary point in the future without requiring backtracking through the word. The other is that this allows us to generalize to linear temporal logic with past the result that the language of a pure-future formula can be decomposed into a Boolean combination of simpler languages, for which deterministic automata with simple acceptance conditions are easily constructed.
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions still remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method includes two-folds, first we extend the RotatE from 2D complex domain to high dimension space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on data set (FB15k-237) with many high in-degree connection nodes.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
Automatically creating the description of an image using any natural languages sentence like English is a very challenging task. It requires expertise of both image processing as well as natural language processing. This paper discuss about different available models for image captioning task. We have also discussed about how the advancement in the task of object recognition and machine translation has greatly improved the performance of image captioning model in recent years. In addition to that we have discussed how this model can be implemented. In the end, we have also evaluated the performance of model using standard evaluation matrices.