Generative Artificial Intelligence (AI) stands as a transformative force that presents a paradox: it offers unprecedented opportunities for productivity growth while potentially posing significant threats to economic stability and societal wellbeing. Many consider generative AI akin to previous technological advancements, using historical precedent to argue that fears of widespread job displacement are unfounded, while others contend that generative AI's unique capacity to undertake non-routine cognitive tasks sets it apart from other forms of automation capital and presents a threat to the quality and availability of work that underpin stable societies. This paper explores the conditions under which both may be true. We posit the existence of an AI-capital-to-labour ratio threshold beyond which a self-reinforcing cycle of recessionary pressures could be triggered, exacerbating social disparities, reducing social cohesion, heightening tensions, and requiring sustained government intervention to maintain stability. To prevent this, the paper underscores the urgent need for proactive policy responses, making recommendations to reduce these risks through robust regulatory frameworks and a new social contract characterised by progressive social and economic policies. This approach aims to ensure a sustainable, inclusive, and resilient economic future where human contribution to the economy is retained and integrated with generative AI to enhance the Mental Wealth of nations.
This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we develop an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call this process self score distillation (SSD). In detail, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add a particular level of noise to the image. The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction. Two different classifier-free guidance (CFG) weights are applied during these two predictions, and the prediction difference offers a direction on how the rendered image can better match the text of interest. Experimental results suggest that our approach delivers high-quality 3D head sculptures with adequate geometry and photorealistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline supports editing the generated heads well, including both geometry deformation and appearance change.
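To make the SSD step concrete, below is a minimal PyTorch-style sketch of one optimization step as the abstract describes it. The renderer (`head_model.render`), the ControlNet wrapper (`controlnet_eps`), the noise schedule, and the CFG weights are all placeholder assumptions, not the authors' actual API.

```python
# Hypothetical sketch of one self score distillation (SSD) step as described
# in the abstract. head_model.render and controlnet_eps are placeholders.
import torch

def add_noise(x0, noise, t, T=1000):
    # Simple linear-schedule stand-in for forward diffusion; not the paper's schedule.
    a = 1.0 - t.float() / T
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

def ssd_step(head_model, controlnet_eps, camera, text_emb, cfg_a=7.5, cfg_b=1.0):
    image, landmarks = head_model.render(camera)   # differentiable render
    t = torch.randint(50, 950, (1,))               # random diffusion timestep
    noise = torch.randn_like(image)
    noisy = add_noise(image, noise, t)

    with torch.no_grad():
        # Two predictions from the same frozen ControlNet, differing only
        # in the classifier-free guidance weight.
        eps_a = controlnet_eps(noisy, landmarks, text_emb, t, cfg=cfg_a)
        eps_b = controlnet_eps(noisy, landmarks, text_emb, t, cfg=cfg_b)

    # The prediction difference is the update direction; applying it through
    # a stop-gradient inner product backpropagates it into the head model,
    # as in score-distillation-style objectives.
    grad = eps_a - eps_b
    loss = (image * grad.detach()).sum()
    return loss
```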
Dynamic Neural Radiance Fields (Dynamic NeRF) extend NeRF to model moving scenes. However, they are resource intensive and challenging to compress. To address these issues, this paper presents WavePlanes, a fast and more compact explicit model. We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients. The inverse discrete wavelet transform reconstructs feature signals at varying detail, which are linearly decoded to approximate the color and density of volumes in a 4-D grid. Exploiting the sparsity of wavelet coefficients, we compress the model using a hash map containing only the non-zero coefficients and their locations on each plane. Compared to state-of-the-art (SotA) plane-based models, WavePlanes is up to 15x smaller while being less resource demanding and competitive in performance and training time. Compared to other small SotA models, WavePlanes preserves details better without requiring custom CUDA code or high-performance computing resources. Our code is available at: https://github.com/azzarelli/waveplanes/
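The compression idea is easy to illustrate: decompose a feature plane with a 2-D wavelet transform, threshold small coefficients, and store only the survivors keyed by their plane locations. The sketch below uses PyWavelets with an illustrative threshold and a random stand-in plane; it is not the paper's implementation.

```python
# Minimal sketch of hash-map compression of a sparse wavelet-coefficient plane.
import numpy as np
import pywt

plane = np.random.rand(256, 256).astype(np.float32)   # stand-in feature plane

# N-level (here 2-level) 2-D wavelet decomposition, flattened to one array
coeffs = pywt.wavedec2(plane, wavelet="haar", level=2)
arr, slices = pywt.coeffs_to_array(coeffs)

# Sparsify, then store only non-zero coefficients with their locations
arr[np.abs(arr) < 0.1] = 0.0
hash_map = {tuple(idx): arr[tuple(idx)] for idx in np.argwhere(arr != 0.0)}

# Decompression: scatter stored coefficients back and invert the transform
dense = np.zeros_like(arr)
for idx, val in hash_map.items():
    dense[idx] = val
recon = pywt.waverec2(
    pywt.array_to_coeffs(dense, slices, output_format="wavedec2"),
    wavelet="haar")
print(len(hash_map), "of", arr.size, "coefficients stored")
```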
The adoption of the Industrial Internet of Things (IIoT) as a complementary technology to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. This convergence of Information Technology (IT), OT, and IIoT has also created new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool to monitor OT/IIoT networks for malicious activity and is a highly active area of research. AI researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to the detection of anomalous or malicious activity in network traffic, typically measuring the performance of their proposed approaches on datasets derived from IoT/IIoT/OT network traffic captures. There is therefore a widespread need for datasets suitable for algorithm testing. This work systematically reviews publicly available network traffic capture-based datasets, including categorisation of the contained attack types, review of metadata, and statistical as well as complexity analysis. Each dataset is analysed to provide metadata that helps researchers more easily select the dataset best suited to their research question and their specific Machine Learning goals.
We investigate how generative Artificial Intelligence (AI) can be used to optimize resources in Unmanned Aerial Vehicle (UAV)-assisted Internet of Things (IoT) networks, with a particular focus on real-time decision-making in public safety scenarios. This work describes how generative AI models can improve resource management within UAV-assisted networks, and surveys their practical applications and broader capabilities. We present a real-life public safety case study showing how generative AI can enhance real-time decision-making and improve training datasets. By leveraging generative AI in UAV-assisted networks, we can design more intelligent, adaptive, and efficient ecosystems that meet the evolving demands of wireless networks and diverse applications. Finally, we discuss challenges and future research directions associated with generative AI for resource optimization in UAV-assisted networks.
Despite the superior reasoning prowess demonstrated by Large Language Models (LLMs) with Chain-of-Thought (CoT) prompting, a lack of understanding prevails around the internal mechanisms of the models that facilitate CoT generation. This work investigates the neural sub-structures within LLMs that manifest CoT reasoning from a mechanistic point of view. From an analysis of Llama-2 7B applied to multi-step reasoning over fictional ontologies, we demonstrate that LLMs deploy multiple parallel pathways of answer generation for step-by-step reasoning. These parallel pathways provide sequential answers from the input question context as well as the generated CoT. We observe a functional rift in the middle layers of the LLM: token representations in the initial half remain strongly biased towards the pretraining prior, with the in-context prior taking over in the later half. This internal phase shift manifests in different functional components: attention heads that write the answer token appear in the later half, attention heads that move information along ontological relationships appear in the initial half, and so on. To the best of our knowledge, this is the first attempt at a mechanistic investigation of CoT reasoning in LLMs.
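A standard way to see this kind of depth-wise phase shift is a logit-lens-style probe: project each layer's residual stream through the model's unembedding and track where the in-context answer token starts to dominate. The sketch below is a generic interpretability tool, not necessarily the paper's exact method, and uses GPT-2 as a stand-in for Llama-2 7B so it runs without gated weights.

```python
# Logit-lens-style probe: rank of the in-context answer token at each layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# A toy fictional-ontology prompt; " blick" is the in-context answer.
prompt = "Every wug is a blick. Tom is a wug. Therefore Tom is a"
answer_id = tok(" blick", add_special_tokens=False).input_ids[0]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    # Apply the final layer norm and unembedding to intermediate states.
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    rank = (logits > logits[answer_id]).sum().item()
    print(f"layer {layer:2d}: answer-token rank {rank}")
```

If the abstract's observation transfers, the answer token's rank should drop sharply somewhere past the model's midpoint.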
The problems that we consider in this paper are as follows. Let A and B be 2x2 matrices over the reals, and let w(A, B) be a word of length n. Evaluating w(A, B) as a product of matrices yields a 2x2 matrix, call it W. What is the largest possible entry of W (in absolute value), over all words w(A, B) of length n, as a function of n? What is the expected absolute value of the largest entry in a random product of n matrices, where each factor is A or B with probability 0.5? What is the Lyapunov exponent of such a random matrix product? We give a partial answer to the first question and an essentially complete answer to the second. For the third question (the most difficult of the three), we offer a very simple method for producing an upper bound on the Lyapunov exponent in the case where all entries of A and B are nonnegative.
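These quantities are easy to estimate numerically. The Monte Carlo sketch below approximates the expected growth rate of the largest entry and the Lyapunov exponent, defined as the limit of (1/n) E[log ||W||], for random products of two illustrative matrices A and B (not matrices from the paper); products are rescaled along the way to avoid overflow.

```python
# Monte Carlo estimates for random products of two 2x2 matrices.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # illustrative choices
B = np.array([[1.0, 0.0], [1.0, 1.0]])

n, trials = 100, 2000
log_max_entries = np.empty(trials)
log_norms = np.empty(trials)

for i in range(trials):
    W = np.eye(2)
    log_scale = 0.0
    for _ in range(n):
        W = W @ (A if rng.random() < 0.5 else B)
        s = np.abs(W).max()        # rescale to keep entries bounded
        log_scale += np.log(s)
        W /= s
    log_max_entries[i] = log_scale + np.log(np.abs(W).max())
    log_norms[i] = log_scale + np.log(np.linalg.norm(W, 2))

print("E[log max|entry|]/n  :", log_max_entries.mean() / n)
print("Lyapunov exponent est:", log_norms.mean() / n)
```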
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With large quantities of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge for downstream tasks but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in the pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. It is designed as a general framework, compatible with both structured and unstructured pruning. Unified in contrastive learning, CAP enables the pruned model to learn from the pre-trained model for task-agnostic knowledge and from the fine-tuned model for task-specific knowledge. Besides, to better retain the performance of the pruned model, the snapshots (i.e., the intermediate models at each pruning iteration) also serve as effective supervision for pruning. Our extensive experiments show that adopting CAP consistently yields significant improvements, especially in extremely high sparsity scenarios. With only 3% of model parameters reserved (i.e., 97% sparsity), CAP successfully achieves 99.2% and 96.3% of the original BERT performance on the QQP and MNLI tasks. In addition, our probing experiments demonstrate that the model pruned by CAP tends to achieve better generalization ability.
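The core idea, learning from two teachers at once, can be sketched with an InfoNCE-style loss: the pruned model's representation is pulled toward the pre-trained model (task-agnostic) and the fine-tuned model (task-specific), with in-batch negatives. This is a minimal illustration under those assumptions, not the authors' exact objective; the weighting `lam` is hypothetical.

```python
# InfoNCE-style sketch of the two-teacher contrastive idea in CAP.
import torch
import torch.nn.functional as F

def contrastive_loss(student, teacher, temperature=0.05):
    # student, teacher: (batch, dim) sentence representations
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher, dim=-1)
    logits = s @ t.T / temperature        # (batch, batch) similarities
    labels = torch.arange(s.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, labels)

def cap_loss(pruned_repr, pretrained_repr, finetuned_repr, task_loss, lam=0.1):
    # Task-agnostic term (vs. pre-trained teacher) plus task-specific term
    # (vs. fine-tuned teacher); teachers are frozen via detach().
    l_agnostic = contrastive_loss(pruned_repr, pretrained_repr.detach())
    l_specific = contrastive_loss(pruned_repr, finetuned_repr.detach())
    return task_loss + lam * (l_agnostic + l_specific)
```

In the paper, pruning snapshots additionally serve as teachers; they would slot into the same pattern as extra contrastive terms.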
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with the higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP achieves strong performance under both the zero- and low-resource IR settings. The code and pre-trained models are available at https://github.com/Albert-Ma/PROP.
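Constructing one ROP training pair is straightforward to sketch: sample two word sets from a unigram document language model and label the higher-likelihood set as the more representative one. The version below uses plain maximum-likelihood estimation and a fixed set size for illustration; the paper's smoothing and set-size sampling details are omitted.

```python
# Simplified construction of one ROP pre-training pair.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def rop_pair(doc_tokens, set_size=5):
    counts = Counter(doc_tokens)
    words = list(counts)
    probs = np.array([counts[w] for w in words], dtype=float)
    probs /= probs.sum()                  # unigram document LM (MLE)

    def sample_set():
        idx = rng.choice(len(words), size=set_size, replace=False, p=probs)
        loglik = float(np.log(probs[idx]).sum())
        return [words[i] for i in idx], loglik

    (s1, ll1), (s2, ll2) = sample_set(), sample_set()
    # Label 1 if the first set has the higher likelihood (more representative).
    return (s1, s2, 1) if ll1 > ll2 else (s1, s2, 0)

doc = "the quick brown fox jumps over the lazy dog the fox".split()
print(rop_pair(doc))
```

The Transformer is then trained to reproduce this preference label from the two word sets, jointly with MLM.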
Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS: important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate that it achieves state-of-the-art performance on all 12 downstream datasets as measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model summaries achieve human performance on multiple datasets.
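The gap-sentence objective can be sketched in a few lines: score each sentence by its overlap with the rest of the document, mask the top-scoring ones in the source, and use them, concatenated, as the generation target. The sketch below uses a crude unigram-overlap score as a ROUGE-1 stand-in; the mask ratio, token, and scoring details are illustrative simplifications of the paper's procedure.

```python
# Simplified PEGASUS-style gap-sentence masking for one document.
def gap_sentence_masking(sentences, mask_ratio=0.3, mask_token="[MASK1]"):
    def overlap(s, rest):
        # unigram-overlap stand-in for ROUGE-1 against the rest of the doc
        a = set(s.lower().split())
        b = set(" ".join(rest).lower().split())
        return len(a & b) / max(len(a), 1)

    scores = [overlap(s, sentences[:i] + sentences[i + 1:])
              for i, s in enumerate(sentences)]
    k = max(1, int(len(sentences) * mask_ratio))
    masked = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]

    source = " ".join(mask_token if i in masked else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(masked))
    return source, target

doc = ["Pegasus was a winged horse.", "It sprang from Medusa.",
       "Greek myths feature many such creatures."]
print(gap_sentence_masking(doc))
```

The encoder-decoder model is then trained to generate `target` from `source`, which resembles producing an extractive summary of the missing content.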
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement), and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.
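The "one additional output layer" claim is easy to see in code: a classification head on top of the pre-trained encoder is all that fine-tuning adds. The sketch below uses the Hugging Face transformers interface (a common modern wrapper, not the original paper's codebase); the sentence pair and label are illustrative.

```python
# Fine-tuning BERT with a single added classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # e.g. 3-way natural language inference

# A premise/hypothesis pair, as in MultiNLI-style language inference.
batch = tok(["A man is eating."], ["Someone is having food."],
            return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([0])               # e.g. "entailment", for illustration

out = model(**batch, labels=labels)      # forward pass returns loss and logits
out.loss.backward()                      # gradients flow into all BERT layers
print(out.logits.shape)                  # (1, 3)
```

Everything below the head is the pre-trained bidirectional encoder; fine-tuning simply continues training the whole stack on the downstream objective.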