Computer vision is a broad field of study that encompasses different tasks (e.g., object detection, semantic segmentation, 3D reconstruction). Although computer vision is relevant to the African communities in various applications, yet computer vision research is under-explored in the continent and constructs only 0.06% of top-tier publications in the last 10 years. In this paper, our goal is to have a better understanding of the computer vision research conducted in Africa and provide pointers on whether there is equity in research or not. We do this through an empirical analysis of the African computer vision publications that are Scopus indexed. We first study the opportunities available for African institutions to publish in top-tier computer vision venues. We show that African publishing trends in top-tier venues over the years do not exhibit consistent growth. We also devise a novel way to retrieve African authors through their affiliation history to have a better understanding of their contributions in top-tier venues. Moreover, we study all computer vision publications beyond top-tier venues in different African regions to find that mainly Northern and Southern Africa are publishing in computer vision with more than 85% of African publications. Finally, we present the most recurring keywords in computer vision publications. In summary, our analysis reveals that African researchers are key contributors to African research, yet there exists multiple barriers to publish in top-tier venues and the current trend of topics published in the continent might not necessarily reflect the communities' needs. This work is part of a community based effort that is focused on improving computer vision research in Africa.
Understanding the relationship between the composition of a research team and the potential impact of their research papers is crucial as it can steer the development of new science policies for improving the research enterprise. Numerous studies assess how the characteristics and diversity of research teams can influence their performance across several dimensions: ethnicity, internationality, size, and others. In this paper, we explore the impact of diversity in terms of the authors' expertise. To this purpose, we retrieved 114K papers in the field of Computer Science and analysed how the diversity of research fields within a research team relates to the number of citations their papers received in the upcoming 5 years. The results show that two different metrics we defined, reflecting the diversity of expertise, are significantly associated with the number of citations. This suggests that, at least in Computer Science, diversity of expertise is key to scientific impact.
One core assumption typically adopted for valid causal inference is that of no interference between experimental units, i.e., the outcome of an experimental unit is unaffected by the treatments assigned to other experimental units. This assumption can be violated in real-life experiments, which significantly complicates the task of causal inference as one must disentangle direct treatment effects from ``spillover'' effects. Current methodologies are lacking, as they cannot handle arbitrary, unknown interference structures to permit inference on causal estimands. We present a general framework to address the limitations of existing approaches. Our framework is based on the new concept of the ``degree of interference'' (DoI). The DoI is a unit-level latent variable that captures the latent structure of interference. We also develop a data augmentation algorithm that adopts a blocked Gibbs sampler and Bayesian nonparametric methodology to perform inferences on the estimands under our framework. We illustrate the DoI concept and properties of our Bayesian methodology via extensive simulation studies and an analysis of a randomized experiment investigating the impact of a cash transfer program for which interference is a critical concern. Ultimately, our framework enables us to infer causal effects without strong structural assumptions on interference.
Recently, computer scientists have developed large language models (LLMs) by training prediction models with large-scale language corpora and human reinforcements. The LLMs have become one promising way to implement artificial intelligence with accuracy in various fields. Interestingly, recent LLMs possess emergent functional features that emulate sophisticated human cognition, especially in-context learning and the chain of thought, which were unavailable in previous prediction models. In this paper, I will examine how LLMs might contribute to moral education and development research. To achieve this goal, I will review the most recently published conference papers and ArXiv preprints to overview the novel functional features implemented in LLMs. I also intend to conduct brief experiments with ChatGPT to investigate how LLMs behave while addressing ethical dilemmas and external feedback. The results suggest that LLMs might be capable of solving dilemmas based on reasoning and revising their reasoning process with external input. I will discuss the potential implications of LLMs on research on moral education and development with the results.
The advent of AI driven large language models (LLMs) have stirred discussions about their role in qualitative research. Some view these as tools to enrich human understanding, while others perceive them as threats to the core values of the discipline. This study aimed to compare and contrast the comprehension capabilities of humans and LLMs. We conducted an experiment with small sample of Alexa app reviews, initially classified by a human analyst. LLMs were then asked to classify these reviews and provide the reasoning behind each classification. We compared the results with human classification and reasoning. The research indicated a significant alignment between human and ChatGPT 3.5 classifications in one third of cases, and a slightly lower alignment with GPT4 in over a quarter of cases. The two AI models showed a higher alignment, observed in more than half of the instances. However, a consensus across all three methods was seen only in about one fifth of the classifications. In the comparison of human and LLMs reasoning, it appears that human analysts lean heavily on their individual experiences. As expected, LLMs, on the other hand, base their reasoning on the specific word choices found in app reviews and the functional components of the app itself. Our results highlight the potential for effective human LLM collaboration, suggesting a synergistic rather than competitive relationship. Researchers must continuously evaluate LLMs role in their work, thereby fostering a future where AI and humans jointly enrich qualitative research.
Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), soft labels provide information about the object's relationship with multiple classes (e.g., "this is most likely a dog, but it could also be a wolf or a coyote"). We use information theory to compare how a number of commonly-used supervision signals contribute to representation-learning performance, as well as how their capacity is affected by factors such as the number of labels, classes, dimensions, and noise. Our framework provides theoretical justification for using hard labels in the big-data regime, but richer supervision signals for few-shot learning and out-of-distribution generalization. We validate these results empirically in a series of experiments with over 1 million crowdsourced image annotations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
Knowledge graph embedding (KGE) is a increasingly popular technique that aims to represent entities and relations of knowledge graphs into low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. Particularly, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.
Deep Learning (DL) is the most widely used tool in the contemporary field of computer vision. Its ability to accurately solve complex problems is employed in vision research to learn deep neural models for a variety of tasks, including security critical applications. However, it is now known that DL is vulnerable to adversarial attacks that can manipulate its predictions by introducing visually imperceptible perturbations in images and videos. Since the discovery of this phenomenon in 2013~[1], it has attracted significant attention of researchers from multiple sub-fields of machine intelligence. In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of year 2018. Many of those contributions have inspired new directions in this area, which has matured significantly since witnessing the first generation methods. Hence, as a legacy sequel of [2], this literature review focuses on the advances in this area since 2018. To ensure authenticity, we mainly consider peer-reviewed contributions published in the prestigious sources of computer vision and machine learning research. Besides a comprehensive literature review, the article also provides concise definitions of technical terminologies for non-experts in this domain. Finally, this article discusses challenges and future outlook of this direction based on the literature reviewed herein and [2].
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs) that are probably the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and introduce recent efforts in enlarging the margins for DNNs comprehensively. Since the viewpoint of different methods is discrepant, we categorize them into groups for ease of comparison and discussion in the paper. Hopefully, our discussions and overview inspire new research work in the community that aim to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence why certain regularizations for DNNs function well in practice. We managed to shorten the paper such that the crucial spirit of large margin learning and related methods are better emphasized.
It has been a long time that computer architecture and systems are optimized to enable efficient execution of machine learning (ML) algorithms or models. Now, it is time to reconsider the relationship between ML and systems, and let ML transform the way that computer architecture and systems are designed. This embraces a twofold meaning: the improvement of designers' productivity, and the completion of the virtuous cycle. In this paper, we present a comprehensive review of work that applies ML for system design, which can be grouped into two major categories, ML-based modelling that involves predictions of performance metrics or some other criteria of interest, and ML-based design methodology that directly leverages ML as the design tool. For ML-based modelling, we discuss existing studies based on their target level of system, ranging from the circuit level to the architecture/system level. For ML-based design methodology, we follow a bottom-up path to review current work, with a scope of (micro-)architecture design (memory, branch prediction, NoC), coordination between architecture/system and workload (resource allocation and management, data center management, and security), compiler, and design automation. We further provide a future vision of opportunities and potential directions, and envision that applying ML for computer architecture and systems would thrive in the community.