This paper argues for treating artificial intelligence as a key industry within Pakistan's broader industrial policy framework, underscoring the importance of aligning it with national goals such as economic resilience and the preservation of autonomy. The paper begins by defining industrial policy as a set of targeted government interventions that shape specific sectors toward strategic outcomes, and argues for its application to AI in Pakistan given the technology's transformative potential, the risks of unregulated adoption, and prevailing market inefficiencies. The paper conceptualizes AI as a layered ecosystem comprising foundational infrastructure, core computing, development platforms, and service and product layers, supported by education, government policy, and research and development. The analysis highlights that Pakistan's AI sector is predominantly service-oriented, with limited product innovation and dependence on foreign technologies, posing risks to economic independence, national security, and employment. To address these challenges, the paper recommends educational reforms, support for local AI product development, initiatives for indigenous cloud and hardware capabilities, and public-private collaboration on foundational models. Additionally, it advocates public procurement policies and infrastructure incentives that foster local solutions and reduce reliance on foreign providers. This strategy aims to position Pakistan as a competitive, autonomous player in the global AI ecosystem.
The expansion of artificial intelligence (AI) applications has driven substantial investment in computational infrastructure, especially by cloud computing providers. Quantifying the energy footprint of this infrastructure requires models parameterized by the power demand of AI hardware during training. We empirically measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node during the training of an open-source image classifier (ResNet) and a large language model (Llama2-13b). The maximum observed power draw was approximately 8.4 kW, 18% lower than the manufacturer-rated 10.2 kW, even with the GPUs near full utilization. Holding model architecture constant, increasing the batch size from 512 to 4096 images for ResNet reduced total training energy consumption by a factor of 4. These findings can inform capacity planning by data center operators and energy-use estimates by researchers. Future work will investigate the impact of cooling technology and carbon-aware scheduling on AI workload energy consumption.
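For context, instantaneous GPU power draw of the kind reported above is typically sampled through NVIDIA's NVML interface. The sketch below, using the pynvml Python bindings with an assumed one-second sampling interval and 60-second window, illustrates one way such measurements could be collected on a multi-GPU node; it is a minimal illustration, not the authors' measurement pipeline.

# Minimal sketch: sample per-GPU power draw via NVML (pynvml bindings).
# Illustrative only -- not the measurement pipeline used in the paper.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    for _ in range(60):  # assumed 60-second sampling window
        # nvmlDeviceGetPowerUsage returns milliwatts; convert to watts.
        watts = [pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0 for h in handles]
        print(f"node total: {sum(watts):7.1f} W  per-GPU: {watts}")
        time.sleep(1.0)  # assumed 1 Hz sampling interval
finally:
    pynvml.nvmlShutdown()

Summing the per-GPU readings approximates the GPU-side node power; host CPUs, memory, and fans would need separate instrumentation.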
This paper explores the potential of recurrent neural networks (RNNs) and other subquadratic architectures as competitive alternatives to transformer-based models in low-resource language modeling scenarios. We utilize HGRN2 (Qin et al., 2024), a recently proposed RNN-based architecture, and comparatively evaluate its effectiveness against transformer-based baselines and other subquadratic architectures (LSTM, xLSTM, Mamba). Our experimental results show that BABYHGRN, our HGRN2 language model, outperforms transformer-based models in both the 10M- and 100M-word tracks of the BabyLM Challenge, as measured by performance on the BLiMP, EWoK, GLUE, and BEAR benchmarks. Further, we show the positive impact of knowledge distillation. Our findings challenge the prevailing focus on transformer architectures and indicate the viability of RNN-based models, particularly in resource-constrained environments.
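As a point of reference for the knowledge distillation mentioned above, a common formulation blends the standard cross-entropy loss with a temperature-softened KL divergence between teacher and student logits. The sketch below is a generic PyTorch implementation of that loss, not the specific distillation setup used for BABYHGRN; the temperature and mixing weight are illustrative assumptions.

# Generic knowledge-distillation loss (Hinton-style) -- a sketch,
# not the exact setup used for BABYHGRN.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):  # assumed hyperparameters
    """Blend hard-label cross-entropy with soft-label KL divergence."""
    # Soft targets: KL between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # standard T^2 rescaling of the gradient
    # Hard targets: the usual next-token cross-entropy.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard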
This paper studies identification and estimation of average causal effects, such as average marginal or treatment effects, in fixed effects logit models with short panels. Relating the identified set of these effects to an extremal moment problem, we first show how to obtain sharp bounds on such effects simply, without any optimization. We also consider even simpler outer bounds, which, contrary to the sharp bounds, do not require any first-step nonparametric estimators. We build confidence intervals based on these two approaches and show their asymptotic validity. Monte Carlo simulations suggest that both approaches work well in practice, the second being typically competitive in terms of interval length. Finally, we show that our method is also useful for measuring treatment effect heterogeneity.
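For readers unfamiliar with the setting, the baseline model and a typical target parameter can be sketched as follows; the notation is illustrative, not necessarily the paper's.

% Illustrative sketch of the fixed effects logit setting (notation assumed).
\[
  \Pr(Y_{it} = 1 \mid X_i, \alpha_i) = \Lambda(X_{it}'\beta + \alpha_i),
  \qquad \Lambda(u) = \frac{e^u}{1 + e^u},
\]
% where \alpha_i is an individual fixed effect. A typical target is the
% average marginal effect of the k-th covariate,
\[
  \Delta_k = \mathbb{E}\!\left[\Lambda'(X_{it}'\beta + \alpha_i)\,\beta_k\right],
\]
% which depends on the unidentified conditional distribution of \alpha_i
% given X_i; \Delta_k is therefore only partially identified in general,
% leading to the extremal moment problem mentioned above.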
This paper presents a novel approach to range-based cooperative localization for robot swarms in GPS-denied environments, addressing the limitations of current methods in noisy and sparse settings. We propose a robust multi-layered localization framework that combines shadow edge localization techniques with the strategic deployment of UAVs. This approach not only addresses the challenges associated with nonrigid and poorly connected graphs but also improves the convergence rate of the localization process. We introduce two key ideas: the S1-Edge approach in our distributed protocol, which addresses the rigidity problem of sparse graphs, and a powerful UAV node that increases the sensing and localization capability of the multi-robot system. Our approach leverages the advantages of distributed localization methods, enhancing scalability and adaptability in large robot networks. We establish theoretical conditions under which the new S1-Edge yields solutions even in the presence of noise, thereby validating the effectiveness of shadow edge localization. Extensive simulation experiments confirm the superior performance of our method compared to state-of-the-art techniques, with up to a 95\% reduction in localization error, demonstrating substantial improvements in localization accuracy and robustness to sparse graphs. This work provides a decisive advance in the field of multi-robot localization, offering a powerful tool for high-performance and reliable operation in challenging environments.
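As background for the range-based formulation, and distinct from the paper-specific S1-Edge protocol, a node's position in such systems is commonly estimated by minimizing range residuals to neighbors of known position. The sketch below illustrates that generic building block with assumed anchor positions and synthetic noisy ranges; it is not the authors' method.

# Generic range-based localization building block: estimate one node's
# 2-D position from noisy ranges to known anchors via Gauss-Newton.
# Illustrative only -- not the paper's S1-Edge protocol.
import numpy as np

def localize(anchors, ranges, x0, iters=20):
    """anchors: (m, 2) known positions; ranges: (m,) noisy distances."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diffs = x - anchors                    # (m, 2)
        dists = np.linalg.norm(diffs, axis=1)  # predicted ranges
        J = diffs / dists[:, None]             # Jacobian of ||x - a_j||
        r = dists - ranges                     # range residuals
        # Gauss-Newton step: least-squares solve of J dx = -r.
        dx, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + dx
    return x

# Toy usage with assumed anchors and noisy range measurements.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_x = np.array([3.0, 4.0])
ranges = np.linalg.norm(anchors - true_x, axis=1) + np.random.normal(0, 0.05, 3)
print(localize(anchors, ranges, x0=[5.0, 5.0]))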
As artificial intelligence (AI) continues advancing, ensuring positive societal impacts becomes critical, especially as AI systems become increasingly ubiquitous in many aspects of life. However, developing "AI for good" poses substantial challenges around aligning systems with complex human values. Presently, we lack mature methods for addressing these challenges. This article presents and evaluates the Positive AI design method, aimed at addressing this gap. The method provides a human-centered process for translating wellbeing aspirations into concrete practices. First, we explain the method's four key steps: contextualizing, operationalizing, optimizing, and implementing wellbeing, supported by continuous measurement for feedback cycles. We then present a multiple case study in which novice designers applied the method, revealing strengths and weaknesses related to efficacy and usability. Next, an expert evaluation study assessed the quality of the resulting concepts, rating them moderately high for feasibility, desirability, and plausibility of achieving the intended wellbeing benefits. Together, these studies provide preliminary validation of the method's ability to improve AI design, while surfacing areas needing refinement, such as support for the more complex steps. Proposed adaptations, such as examples and evaluation heuristics, could address these weaknesses. Further research should examine sustained application over multiple projects. This human-centered approach shows promise for realizing the vision of 'AI for Wellbeing': AI that does not just avoid harm, but actively benefits humanity.
Understanding relations arising from interactions among entities can be difficult, and predicting them is even more challenging. This problem has many applications in fields such as financial networks and e-commerce. These relations can be far more complex than pairwise interactions, often involving more than two entities at once. One such scenario is evolving recursive relations among multiple entities, which remains an open problem. This work addresses the problem of forecasting higher-order interaction events that can be multi-relational and recursive. We pose the problem in the framework of representation learning over temporal hypergraphs, which can capture complex relationships involving multiple entities. The proposed model, \textit{Relational Recursive Hyperedge Temporal Point Process} (RRHyperTPP), uses an encoder that learns dynamic node representations from historical interaction patterns and a hyperedge link prediction-based decoder that models the occurrence of interaction events. The learned representations are then used for downstream tasks of forecasting the type and time of interactions. The main challenge in learning from hyperedge events is that the number of possible hyperedges grows exponentially with the number of nodes in the network, making the computation of the negative log-likelihood of the temporal point process expensive, as calculating the survival function requires a summation over all possible hyperedges. We therefore develop a noise contrastive estimation method to learn the parameters of our model, and we show experimentally that it outperforms previous state-of-the-art methods for interaction forecasting.
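To make the estimation strategy concrete: noise contrastive estimation sidesteps the intractable survival-function sum by training the model to discriminate observed events from samples drawn from a tractable noise distribution. The sketch below gives the generic binary-classification NCE loss over event scores; it illustrates the general technique, not the RRHyperTPP objective, and all tensor names are assumed placeholders.

# Generic noise-contrastive estimation loss: discriminate observed events
# from noise samples, avoiding a sum over all possible hyperedges.
# Sketch of the general technique, not the RRHyperTPP objective.
import math
import torch
import torch.nn.functional as F

def nce_loss(pos_scores, noise_scores, log_noise_pos, log_noise_neg):
    """pos_scores: model log-intensities of observed events, shape (B,).
    noise_scores: log-intensities of K noise samples per event, shape (B, K).
    log_noise_*: log-probabilities of those samples under the noise dist.
    """
    k = noise_scores.shape[1]
    # NCE posterior logit that a sample is real:
    # log p_model(x) - log p_noise(x) - log K.
    pos_logits = pos_scores - log_noise_pos - math.log(k)
    neg_logits = noise_scores - log_noise_neg - math.log(k)
    return (F.binary_cross_entropy_with_logits(pos_logits,
                                               torch.ones_like(pos_logits))
            + F.binary_cross_entropy_with_logits(neg_logits,
                                                 torch.zeros_like(neg_logits)))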
Governments typically collect and steward a vast amount of high-quality data on their citizens and institutions, and the UK government is exploring how it can better publish and provision this data to benefit the AI landscape. However, the compositions of generative AI training corpora remain closely guarded secrets, making the planning of data sharing initiatives difficult. To address this, we devise two methods to assess UK government data usage in the training of Large Language Models (LLMs) and 'peek behind the curtain' to observe the UK government's current contributions as a data provider for AI. The first method, an ablation study that utilises LLM 'unlearning', examines the importance of the information held on UK government websites for LLMs' performance on citizen query tasks. The second method, an information leakage study, seeks to ascertain whether LLMs are aware of the information held in the datasets published on the UK government's open data initiative, data.gov.uk. Our findings indicate that UK government websites are important data sources for AI (heterogeneously across subject matters), while data.gov.uk is not. This paper serves as a technical report, explaining in depth the designs, mechanics, and limitations of the above experiments. It is accompanied by a complementary non-technical report on the ODI website, in which we summarise the experiments and key findings, interpret them, and build a set of actionable recommendations for the UK government to take forward as it designs AI policy. While we focus on UK open government data, we believe the methods introduced here present a reproducible approach to tackling the opaqueness of AI training corpora and provide organisations with a framework to evaluate and maximise their contributions to AI development.
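One common, simple proxy for whether an LLM has been exposed to a given text, adjacent to but not identical with the leakage methodology described above, is to compare the model's perplexity on that text against a baseline: unusually low perplexity can hint at training exposure. The sketch below, using Hugging Face transformers, illustrates such a perplexity probe; the model name and example text are placeholders, and this is not the report's exact method.

# Sketch of a perplexity probe: low perplexity on a dataset description
# can hint that a model has seen similar text. Illustrative proxy only --
# not the exact leakage methodology of this report.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids returns the mean token-level NLL.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Placeholder probe text styled after a data.gov.uk dataset description.
print(perplexity("Road traffic statistics for Great Britain, annual release."))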
Interacting with the legal system and the government requires the assembly and analysis of various pieces of information that can be spread across different (paper) documents, such as forms, certificates, and contracts (e.g., leases). This information is needed to understand one's legal rights, as well as to fill out forms to file claims in court or obtain government benefits. However, finding the right information, locating the correct forms, and filling them out can be challenging for laypeople. Large language models (LLMs) have emerged as a powerful technology with the potential to address this gap, but they still rely on the user to provide the correct information, which may be challenging and error-prone if the information is available only in complex paper documents. We present an investigation into using multi-modal LLMs to analyze images of handwritten paper forms in order to automatically extract relevant information in a structured format. Our initial results are promising but reveal some limitations (e.g., when image quality is low). Our work demonstrates the potential of multi-modal LLMs to support laypeople and self-represented litigants in finding and assembling relevant information.
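To illustrate the kind of pipeline investigated above, the sketch below sends a form image to a multi-modal chat model and asks for a structured JSON result, using the OpenAI Python client's chat-completions interface with a base64-encoded image. The model name, file path, prompt, and field schema are illustrative assumptions, not the paper's setup.

# Sketch: extract structured fields from a photographed form with a
# multi-modal LLM. Model name, image path, and schema are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("lease_form.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract tenant_name, landlord_name, monthly_rent and "
                     "lease_start_date from this form. Reply with JSON only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # JSON string, model permitting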
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven particularly relevant for natural language processing (NLP), experiencing rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
This paper revisits Graph Convolutional Neural Networks by bridging the gap between the spectral and spatial design of graph convolutions. We theoretically demonstrate the equivalence of the graph convolution process regardless of whether it is designed in the spatial or the spectral domain. The resulting general framework enables a spectral analysis of the most popular ConvGNNs, explaining their performance and revealing their limits. Moreover, the proposed framework is used to design new convolutions with custom frequency profiles in the spectral domain while applying them in the spatial domain. We also propose a generalization of the depthwise separable convolution framework to graph convolutional networks, which decreases the total number of trainable parameters while preserving model capacity. To the best of our knowledge, such a framework has never been used in the GNN literature. Our proposals are evaluated on both transductive and inductive graph learning problems. The results show the relevance of the proposed method and provide some of the first experimental evidence of the transferability of spectral filter coefficients from one graph to another. Our source code is publicly available at: https://github.com/balcilar/Spectral-Designed-Graph-Convolutions
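To illustrate the core idea of designing a convolution in the spectral domain and applying it spatially, the sketch below builds a convolution support C = U diag(f(lambda)) U^T from a desired frequency response f over the normalized Laplacian's eigenvalues, then applies it as an ordinary matrix product. The toy graph and low-pass profile are assumptions for illustration; the paper's designed filters differ.

# Sketch: design a graph convolution support in the spectral domain
# (custom frequency profile over Laplacian eigenvalues) and apply it
# spatially as C @ X. Toy graph and profile, not the paper's filters.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # toy 4-cycle graph
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
lam, U = np.linalg.eigh(L)                   # eigenvalues lie in [0, 2]

f = np.exp(-2.0 * lam)                       # assumed low-pass frequency profile
C = U @ np.diag(f) @ U.T                     # spatial convolution support

X = np.random.randn(4, 3)                    # node feature matrix
H = C @ X                                    # one spectrally designed convolution
print(H.shape)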