In Racket, LLVM IR, Rust, and other modern languages, programmers and static analyses can hint, with special annotations, that certain parts of a program are unreachable. As with other assumptions about undefined behavior, the compiler takes these hints to be correct and transforms the program aggressively. While compile-time transformations due to undefined behavior often perplex compiler writers and developers, we show that the essence of transformations due to unreachable code can be distilled into a surprisingly small set of simple formal rules. Specifically, following the well-established tradition of understanding linguistic phenomena through calculi, we introduce the first calculus for unreachable. Its term-rewriting rules, which take advantage of unreachable, fall into two groups. The first group allows the compiler to delete any code downstream of unreachable, and any effect-free code upstream of unreachable. The second group consists of rules that eliminate conditional expressions when one of their branches is unreachable. We show the correctness of the rules with a novel logical relation, and we examine how they correspond to transformations due to unreachable in Racket and LLVM.
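To make the two rule groups concrete, here is a minimal sketch of how such rewrites might operate on a toy statement-list AST; the `Stmt` representation and `pure` flag are illustrative assumptions of ours, not the paper's formal calculus.

```python
from dataclasses import dataclass, field

@dataclass
class Stmt:
    kind: str                                 # "call", "unreachable", or "if"
    pure: bool = False                        # True if free of side effects
    then: list = field(default_factory=list)  # "if" branches
    els:  list = field(default_factory=list)

def rewrite(block):
    """Rewrite a statement list, assuming `unreachable` is never executed."""
    out = []
    for s in block:
        repl = [s]
        if s.kind == "if":
            t, e = rewrite(s.then), rewrite(s.els)
            # Group 2: a conditional with an unreachable branch collapses
            # to its other branch.
            if t and t[0].kind == "unreachable":
                repl = e
            elif e and e[0].kind == "unreachable":
                repl = t
            else:
                repl = [Stmt("if", then=t, els=e)]
        for r in repl:
            if r.kind == "unreachable":
                # Group 1: delete effect-free code upstream of unreachable
                # and all code downstream of it.
                return [x for x in out if not x.pure] + [r]
            out.append(r)
    return out
```

With this shape, a conditional whose then-branch is just unreachable collapses to its else-branch, matching the second rule group.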
Recently, Bessa et al. (PODS 2023) showed that sketches based on coordinated weighted sampling theoretically and empirically outperform popular linear sketching methods like Johnson-Lindenstrauss projection and CountSketch for the ubiquitous problem of inner product estimation. We further develop this finding by introducing and analyzing two alternative sampling-based methods. In contrast to the computationally expensive algorithm in Bessa et al., our methods run in linear time (to compute the sketch) and perform better in practice, significantly beating linear sketching on a variety of tasks. For example, they provide state-of-the-art results for estimating the correlation between columns in unjoined tables, a problem that we show how to reduce to inner product estimation in a black-box way. While our methods are based on known sampling techniques (threshold and priority sampling), we develop significant new theoretical analysis to prove approximation guarantees for them.
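As a rough illustration of the coordinated-sampling idea (a simplified sketch under our own naming, not the paper's exact estimator), the following keeps each coordinate with probability proportional to its squared weight, coordinates the two sketches through shared uniforms, and rescales surviving products by the joint inclusion probability:

```python
import numpy as np

def threshold_sketch(x, k, u):
    """Keep coordinate i with probability p_i = min(1, k * x_i^2 / ||x||^2),
    using a shared uniform vector u so that sketches are coordinated."""
    p = np.minimum(1.0, k * x**2 / np.sum(x**2))
    idx = np.nonzero(u < p)[0]
    return {i: (x[i], p[i]) for i in idx}

def estimate_inner_product(sa, sb):
    """Coordinate i lands in both sketches iff u_i < min(p_i^a, p_i^b), so
    dividing each surviving product by that probability gives an unbiased sum."""
    est = 0.0
    for i, (ai, pa) in sa.items():
        if i in sb:
            bi, pb = sb[i]
            est += ai * bi / min(pa, pb)
    return est

rng = np.random.default_rng(0)
n, k = 10_000, 200
a, b = rng.standard_normal(n), rng.standard_normal(n)
u = rng.random(n)                       # shared randomness (stand-in for a hash)
est = estimate_inner_product(threshold_sketch(a, k, u), threshold_sketch(b, k, u))
print(est, a @ b)                       # estimate vs. exact inner product
```

Both sketches are computed in a single linear pass over each vector, matching the linear-time claim above.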
Page Stream Segmentation (PSS) is an essential prerequisite for automated document processing at scale. However, research progress has been limited by the absence of realistic public benchmarks. This paper works towards addressing this gap by introducing TABME++, an enhanced benchmark featuring commercial Optical Character Recognition (OCR) annotations. We evaluate the performance of large language models (LLMs) on PSS, focusing on decoder-based models fine-tuned with parameter-efficient methods. Our results show that decoder-based LLMs outperform smaller multimodal encoders. Through a review of existing PSS research and datasets, we identify key challenges and advancements in the field. Our findings highlight the importance of robust OCR, providing valuable insights for the development of more effective document processing systems.
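For concreteness, here is a hedged sketch of the parameter-efficient setup described above, with PSS cast as per-page binary classification ("does this page start a new document?"); the model name, target modules, and hyperparameters are illustrative placeholders, not the paper's configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder decoder LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# LoRA adapters: only a small fraction of the weights is trained.
lora = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                  lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# One training example: OCR text of a page (with its predecessor for context),
# labeled 1 if the page opens a new document, else 0.
batch = tokenizer("<prev page OCR> [SEP] <curr page OCR>",
                  return_tensors="pt", truncation=True)
loss = model(**batch, labels=torch.tensor([1])).loss
loss.backward()
```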
In the era of large language models (LLMs), ``System I'' tasks~-~the fast, unconscious, and intuitive tasks, e.g., sentiment analysis, text classification, etc.~-~have been argued to be successfully solved. However, sarcasm, as a subtle linguistic phenomenon, often employs rhetorical devices like hyperbole and figuration to convey true sentiments and intentions, involving a higher level of abstraction than sentiment analysis. There is growing concern that the argument about LLMs' success may not be fully tenable when considering sarcasm understanding. To address this concern, we select eleven SOTA LLMs and eight SOTA pre-trained language models (PLMs) and present comprehensive evaluations on six widely used benchmark datasets through different prompting approaches, i.e., zero-shot input/output (IO) prompting, few-shot IO prompting, and chain-of-thought (CoT) prompting. Our results highlight three key findings: (1) current LLMs underperform supervised PLM-based sarcasm detection baselines across the six sarcasm benchmarks. This suggests that significant efforts are still required to improve LLMs' understanding of human sarcasm. (2) GPT-4 consistently and significantly outperforms other LLMs across various prompting methods, with an average improvement of 14.0\%$\uparrow$. Claude 3 and ChatGPT demonstrate the next best performance after GPT-4. (3) Few-shot IO prompting outperforms the other two methods, zero-shot IO and few-shot CoT. The reason is that sarcasm detection, being a holistic, intuitive, and non-rational cognitive process, is argued not to adhere to step-by-step logical reasoning, making CoT less effective in understanding sarcasm than it is in mathematical reasoning tasks.
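To illustrate the three prompting regimes compared above, here are minimal template sketches (our own wording, not the paper's exact prompts):

```python
def zero_shot_io(text):
    # Zero-shot IO: ask for the label directly, with no examples.
    return (f"Is the following text sarcastic? Answer Yes or No.\n"
            f"Text: {text}\nAnswer:")

def few_shot_io(text, examples):
    # Few-shot IO: prepend labeled (text, label) demonstrations.
    shots = "\n".join(f"Text: {t}\nAnswer: {y}" for t, y in examples)
    return (f"Is each text sarcastic? Answer Yes or No.\n{shots}\n"
            f"Text: {text}\nAnswer:")

def few_shot_cot(text, examples):
    # Few-shot CoT: demonstrations carry a written-out rationale before the label.
    shots = "\n".join(f"Text: {t}\nReasoning: {r}\nAnswer: {y}"
                      for t, r, y in examples)
    return (f"Decide whether each text is sarcastic. Think step by step.\n"
            f"{shots}\nText: {text}\nReasoning:")
```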
We robustify PCTL and PCTL*, the most important specification languages for probabilistic systems, and show that robustness does not increase the complexity of their model-checking problems.
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, reinforcement learning, etc. These high-performing DNNs, however, rely on intensive resource consumption. For example, training a DNN requires high dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on cloud servers with a large number of supercomputers, a high-bandwidth communication bus, a shared storage infrastructure, and a high power supply. Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, have required us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices often have a rather small amount of resources. To deploy DNNs on edge devices, we need to reduce their size, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we study four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and develop different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of DNNs in each scenario.
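As one concrete example of reducing over-parameterization (a generic illustration, not necessarily one of the dissertation's specific methods), magnitude pruning zeroes out the smallest weights to shrink a model's effective size:

```python
import torch
from torch import nn
from torch.nn.utils import prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 70% smallest-magnitude weights in every Linear layer,
# trading a little accuracy for a much sparser model.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.7)
        prune.remove(module, "weight")   # make the sparsity permanent

zeroed = sum(int((m.weight == 0).sum()) for m in model.modules()
             if isinstance(m, nn.Linear))
print(f"zeroed weights: {zeroed}")
```

In practice, pruning is typically followed by a short fine-tuning phase to recover accuracy.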
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in BM applications across many fields. At present, however, there is a lack of research that sorts out the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies, and Application. We introduce 16 specific BM-related topics across those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue, and Protein Research. For each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we conclude with the further development of BMs from a more general perspective.
Since the 1950s, machine translation (MT) has been one of the important tasks of AI research and development, and it has passed through several distinct periods and stages, including rule-based methods, statistical methods, and the recently proposed neural network-based learning methods. Accompanying these staged leaps is the research and development of MT evaluation, which has played an especially important role in statistical translation and neural translation research. The evaluation task of MT is not only to assess the quality of machine translation, but also to give timely feedback to machine translation researchers on the problems existing in machine translation itself and on how to improve and optimise it. In some practical application fields, such as when reference translations are absent, the quality estimation of machine translation plays an important role as an indicator of the credibility of automatically translated target-language text. This report mainly covers the following: a brief history of machine translation evaluation (MTE), a classification of research methods on MTE, and the cutting-edge progress, including human evaluation, automatic evaluation, and the evaluation of evaluation methods (meta-evaluation). Human evaluation and automatic evaluation each include reference-translation-based and reference-translation-independent approaches; automatic evaluation methods include traditional n-gram string matching, models applying syntax and semantics, and deep learning models; the evaluation of evaluation methods covers estimating the credibility of human evaluations, the reliability of automatic evaluations, the reliability of the test set, etc. Advances in cutting-edge evaluation methods include task-based evaluation, the use of pre-trained language models based on big data, and lightweight optimisation models using distillation techniques.
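As a concrete instance of the traditional n-gram string matching mentioned above, here is a minimal sketch of clipped n-gram precision, the core quantity behind BLEU-style automatic metrics:

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision: each hypothesis n-gram counts only as often
    as it appears in the reference."""
    hyp, ref = ngrams(hypothesis.split(), n), ngrams(reference.split(), n)
    overlap = sum(min(c, ref[g]) for g, c in hyp.items())
    return overlap / max(1, sum(hyp.values()))

print(modified_precision("the cat sat on the mat",
                         "the cat is on the mat", 2))  # 3 of 5 bigrams match
```

Full BLEU combines these precisions over n = 1..4 with a brevity penalty; this sketch keeps only the matching core.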
Data in Knowledge Graphs often represents part of the current state of the real world. Thus, to stay up to date, the graph data needs to be updated frequently. To utilize information from Knowledge Graphs, many state-of-the-art machine learning approaches use embedding techniques. These techniques typically compute an embedding, i.e., vector representations of the nodes, as input for the main machine learning algorithm. If a graph update occurs later on -- specifically when nodes are added or removed -- the training has to be done all over again. This is undesirable, both because of the time it takes and because downstream models trained on these embeddings have to be retrained if the embeddings change significantly. In this paper, we investigate embedding updates that do not require full retraining and evaluate them in combination with various embedding models on real dynamic Knowledge Graphs covering multiple use cases. We study approaches that place newly appearing nodes optimally according to local information, but find that this does not work well. However, we find that if we continue the training of the old embedding, interleaved with epochs during which we only optimize for the added and removed parts, we obtain good results in terms of the typical metrics used in link prediction. This performance is obtained much faster than with a complete retraining and hence makes it possible to maintain embeddings for dynamic Knowledge Graphs.
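Here is a hedged sketch of that interleaved schedule, using a minimal TransE-style model as a stand-in for the various embedding models evaluated (all names and hyperparameters are illustrative):

```python
import torch

# Triples are (head, relation, tail) integer-id tuples; the entity table is
# assumed large enough to cover newly added nodes.

class TransE(torch.nn.Module):
    def __init__(self, n_ent, n_rel, dim=50):
        super().__init__()
        self.ent = torch.nn.Embedding(n_ent, dim)
        self.rel = torch.nn.Embedding(n_rel, dim)

    def score(self, h, r, t):   # lower = more plausible
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(dim=-1)

def train_step(model, opt, triples):
    h, r, t = (torch.tensor(c) for c in zip(*triples))
    t_neg = t[torch.randperm(len(t))]        # corrupted tails as negatives
    loss = torch.relu(1.0 + model.score(h, r, t)
                      - model.score(h, r, t_neg)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def update(model, all_triples, changed_triples, epochs=10):
    """Continue training the old embedding, interleaved with epochs that
    only optimize the added/removed parts -- no retraining from scratch."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        batch = changed_triples if epoch % 2 == 0 else all_triples
        train_step(model, opt, batch)
    return model
```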
Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
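To make "a representation based on its context" concrete, here is a small sketch using BERT via Hugging Face transformers; the example sentences are our own:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The same surface form gets different vectors in different contexts.
sentences = ["The bank raised interest rates.",
             "We picnicked on the river bank."]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # (2, seq_len, 768)

# Locate the token "bank" in each sentence and compare its two embeddings.
ids = batch["input_ids"]
bank = tokenizer.convert_tokens_to_ids("bank")
v1 = hidden[0, (ids[0] == bank).nonzero().item()]
v2 = hidden[1, (ids[1] == bank).nonzero().item()]
print(torch.cosine_similarity(v1, v2, dim=0))      # well below 1.0
```

A global embedding like Word2Vec would assign "bank" a single vector in both sentences; the contextual model does not.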
Text classification is an important and classical problem in natural language processing. A number of studies have applied convolutional neural networks (convolution on regular grids, e.g., sequences) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid structures, e.g., arbitrary graphs) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings of both words and documents, supervised by the known class labels of documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. At the same time, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting that Text GCN is robust to limited training data in text classification.
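A minimal numpy sketch of the two-layer propagation this describes (shapes and normalization follow the standard GCN recipe; building the adjacency from word co-occurrence and document-word weights is assumed done elsewhere):

```python
import numpy as np

# A is the (n_docs + n_words) x (n_docs + n_words) adjacency built from
# document-word edges (e.g., TF-IDF) and word-word edges (e.g., PMI);
# X = I encodes the one-hot initialization, so X can be dropped below.

def normalize(A):
    A = A + np.eye(A.shape[0])                  # add self-loops
    D = np.diag(A.sum(axis=1) ** -0.5)
    return D @ A @ D                            # symmetric normalization

def text_gcn_forward(A, W0, W1):
    """W0: (n, hidden), W1: (hidden, n_classes); returns per-node log-probs."""
    A_hat = normalize(A)
    H = np.maximum(A_hat @ W0, 0)               # layer 1: ReLU(A_hat · X · W0), X = I
    Z = A_hat @ H @ W1                          # layer 2: class logits per node
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))  # log-softmax
```

Training would then minimize cross-entropy on the rows of the output that correspond to labeled documents, which is how both word and document embeddings end up supervised by document labels.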