In this perspective paper, we first comprehensively review existing evaluations of Large Language Models (LLMs) using both standardized tests and ability-oriented benchmarks. We pinpoint several problems with current evaluation methods that tend to overstate the capabilities of LLMs. We then articulate what artificial general intelligence should encompass beyond the capabilities of LLMs. We propose four characteristics of generally intelligent agents: 1) they can perform unlimited tasks; 2) they can generate new tasks within a context; 3) they operate based on a value system that underpins task generation; and 4) they have a world model reflecting reality, which shapes their interaction with the world. Building on this viewpoint, we highlight the missing pieces in artificial general intelligence, that is, the unity of knowing and acting. We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations. Additionally, knowledge acquisition isn't solely reliant on passive input but requires repeated trials and errors. We conclude by outlining promising future research directions in the field of artificial general intelligence.
Purpose: In this paper, we present an automated method for article classification, leveraging the power of Large Language Models (LLM). The primary focus is on the field of ophthalmology, but the model is extendable to other fields. Methods: We have developed a model based on Natural Language Processing (NLP) techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we have employed zero-shot learning (ZSL) LLM models and compared against Bidirectional and Auto-Regressive Transformers (BART) and its variants, and Bidirectional Encoder Representations from Transformers (BERT), and its variant such as distilBERT, SciBERT, PubmedBERT, BioBERT. Results: The classification results demonstrate the effectiveness of LLMs in categorizing large number of ophthalmology papers without human intervention. Results: To evalute the LLMs, we compiled a dataset (RenD) of 1000 ocular disease-related articles, which were expertly annotated by a panel of six specialists into 15 distinct categories. The model achieved mean accuracy of 0.86 and mean F1 of 0.85 based on the RenD dataset. Conclusion: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval in other domains too. We performed trend analysis that enables the researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection that aims to assess the performance of machine learning models in more realistic settings. We observed that the real-world requirements for testing OOD detection methods are not satisfied by the current testing protocols. They usually encourage methods to have a strong bias towards a low level of diversity in normal data. To address this limitation, we propose new OOD test datasets (CIFAR-10-R, CIFAR-100-R, and ImageNet-30-R) that can allow researchers to benchmark OOD detection performance under realistic distribution shifts. Additionally, we introduce a Generalizability Score (GS) to measure the generalization ability of a model during OOD detection. Our experiments demonstrate that improving the performance on existing benchmark datasets does not necessarily improve the usability of OOD detection models in real-world scenarios. While leveraging deep pre-trained features has been identified as a promising avenue for OOD detection research, our experiments show that state-of-the-art pre-trained models tested on our proposed datasets suffer a significant drop in performance. To address this issue, we propose a post-processing stage for adapting pre-trained features under these distribution shifts before calculating the OOD scores, which significantly enhances the performance of state-of-the-art pre-trained models on our benchmarks.
Link prediction task is vital to automatically understanding the structure of large knowledge bases. In this paper, we present our system to solve this task at the Data Science and Advanced Analytics 2023 Competition "Efficient and Effective Link Prediction" (DSAA-2023 Competition) with a corpus containing 948,233 training and 238,265 for public testing. This paper introduces an approach to link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task. Drawing inspiration from recent advancements in natural language processing and understanding, we cast link prediction as an NLI task, wherein the presence of a link between two articles is treated as a premise, and the task is to determine whether this premise holds based on the information presented in the articles. We implemented our system based on the Sentence Pair Classification for Link Prediction for the Wikipedia Articles task. Our system achieved 0.99996 Macro F1-score and 1.00000 Macro F1-score for the public and private test sets, respectively. Our team UIT-NLP ranked 3rd in performance on the private test set, equal to the scores of the first and second places. Our code is publicly for research purposes.
In this paper, we investigate resource allocation problem in the context of multiple virtual reality (VR) video flows sharing a certain link, considering specific deadline of each video frame and the impact of different frames on video quality. Firstly, we establish a queuing delay bound estimation model, enabling link node to proactively discard frames that will exceed the deadline. Secondly, we model the importance of different frames based on viewport feature of VR video and encoding method. Accordingly, the frames of each flow are sorted. Then we formulate a problem of minimizing long-term quality loss caused by frame dropping subject to per-flow quality guarantee and bandwidth constraints. Since the frequency of frame dropping and network fluctuation are not on the same time scale, we propose a two-timescale resource allocation scheme. On the long timescale, a queuing theory based resource allocation method is proposed to satisfy quality requirement, utilizing frame queuing delay bound to obtain minimum resource demand for each flow. On the short timescale, in order to quickly fine-tune allocation results to cope with the unstable network state, we propose a low-complexity heuristic algorithm, scheduling available resources based on the importance of frames in each flow. Extensive experimental results demonstrate that the proposed scheme can efficiently improve quality and fairness of VR video flows under various network conditions.
Driven by expressiveness commonalities of Python and our Python-based embedded logic-based language Natlog, we design high-level interaction patterns between equivalent language constructs and data types on the two sides. By directly connecting generators and backtracking, nested tuples and terms, coroutines and first-class logic engines, reflection and meta-interpretation, we enable logic-based language constructs to access the full power of the Python ecosystem. We show the effectiveness of our design via Natlog apps working as orchestrators for JAX and Pytorch pipelines and as DCG-driven GPT3 and DALL.E prompt generators. Keyphrases: embedding of logic programming in the Python ecosystem, high-level inter-paradigm data exchanges, coroutining with logic engines, logic-based neuro-symbolic computing, logic grammars as prompt-generators for Large Language Models, logic-based neural network configuration and training.
The popularity of Deep Learning (DL) makes the privacy of sensitive data more imperative than ever. As a result, various privacy-preserving techniques have been implemented to preserve user data privacy in DL. Among various privacy-preserving techniques, collaborative learning techniques, such as Split Learning (SL) have been utilized to accelerate the learning and prediction process. Initially, SL was considered a promising approach to data privacy. However, subsequent research has demonstrated that SL is susceptible to many types of attacks and, therefore, it cannot serve as a privacy-preserving technique. Meanwhile, countermeasures using a combination of SL and encryption have also been introduced to achieve privacy-preserving deep learning. In this work, we propose a hybrid approach using SL and Homomorphic Encryption (HE). The idea behind it is that the client encrypts the activation map (the output of the split layer between the client and the server) before sending it to the server. Hence, during both forward and backward propagation, the server cannot reconstruct the client's input data from the intermediate activation map. This improvement is important as it reduces privacy leakage compared to other SL-based works, where the server can gain valuable information about the client's input. In addition, on the MIT-BIH dataset, our proposed hybrid approach using SL and HE yields faster training time (about 6 times) and significantly reduced communication overhead (almost 160 times) compared to other HE-based approaches, thereby offering improved privacy protection for sensitive data in DL.
In this paper, we consider sampling an Ornstein-Uhlenbeck (OU) process through a channel for remote estimation. The goal is to minimize the mean square error (MSE) at the estimator under a sampling frequency constraint when the channel delay statistics is unknown. Sampling for MSE minimization is reformulated into an optimal stopping problem. By revisiting the threshold structure of the optimal stopping policy when the delay statistics is known, we propose an online sampling algorithm to learn the optimum threshold using stochastic approximation algorithm and the virtual queue method. We prove that with probability 1, the MSE of the proposed online algorithm converges to the minimum MSE that is achieved when the channel delay statistics is known. The cumulative MSE gap of our proposed algorithm compared with the minimum MSE up to the $(k+1)$-th sample grows with rate at most $\mathcal{O}(\ln k)$. Our proposed online algorithm can satisfy the sampling frequency constraint theoretically. Finally, simulation results are provided to demonstrate the performance of the proposed algorithm.
In this paper, we describe the research on how perceptual load can affect programming performance in people with symptoms of Attention Deficit / Hyperactivity Disorder (ADHD). We asked developers to complete the Barkley Deficits in Executive Functioning Scale, which indicates the presence and severity levels of ADHD symptoms. After that, participants solved mentally active programming tasks (coding) and monotonous ones (debugging) in the integrated development environment in high perceptual load modes (visually noisy) and low perceptual load modes (visually clear). The development environment was augmented with the plugin we wrote to track efficiency metrics, i.e. time, speed, and activity. We found that the perceptual load does affect programmers' efficiency. For mentally active tasks, the time of inserting the first character was shorter and the overall speed was higher in the low perceptual load mode. For monotonous tasks, the total time for the solution was less for the low perceptual load mode. Also, we found that the effect of perceptual load on programmers' efficiency differs between those with and without ADHD symptoms. This effect has a specificity: depending on efficiency measures and ADHD symptoms, one or another level of perceptual load might be beneficial. Our findings support the idea of behavioral assessment of users for providing appropriate accommodation for the workforce with special needs.
We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation Polynomial Band (PB). The representation has four polynomial curves to fit a text's top, bottom, left, and right sides, which can capture a text with a complex shape by varying polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need to utilize a different number of points. 2) It can distinguish adjacent or overlapping texts as they have apparent different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from predicted curves without interpolation. A parameter-free cross-scale pixel attention (CPA) module is employed to highlight the feature map of a suitable scale while suppressing the other feature maps. The simple operation can help detect small-scale texts and is compatible with the one-stage DETR framework, where no postprocessing exists for NMS. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces the piecewise alignment between the ground truth and the predicted curves but also makes curves' positions and shapes consistent with each other. Without bells and whistles about text pre-training, our method is superior to the previous state-of-the-art text detectors on the arbitrary-shaped text datasets.
In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: //github.com/kemaloksuz/ObjectDetectionImbalance .