We consider the problem of developing a clickstream modeling framework for real-time customer event prediction problems in SaaS products like QBO. We develop a low-latency, cost-effective, and robust ensemble architecture (BBE-LSWCM), which combines both aggregated user behavior data from a longer historical window (e.g., over the last few weeks) as well as user activities over a short window in recent-past (e.g., in the current session). As compared to other baseline approaches, we demonstrate the superior performance of the proposed method for two important real-time event prediction problems: subscription cancellation and intended task detection for QBO subscribers. Finally, we present details of the live deployment and results from online experiments in QBO.
Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs optimal policies are not stationary, policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite- time problems which is reflected in improved convergence bounds.
Process Mining is moving beyond mining traditional event logs and nowadays includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of data generated by such sensors makes it increasingly challenging for traditional process discovery algorithms to store and mine such data in traditional event logs. Further, privacy considerations often prevent data collection at a central location in the first place. To address this challenge, this paper introduces EdgeAlpha, a distributed algorithm for process discovery operating directly on sensor nodes and edge devices on a stream of real-time event data. Based on the Alpha Miner, EdgeAlpha tracks each event and its predecessor and successor events directly on the sensor node where the event is sensed and recorded. From this local view, each node in EdgeAlpha derives a partial footprint matrix, which we then merge at a central location, whenever we query the system to compute a process model. EdgeAlpha enables (a) scalable mining, as a node, for each event, only interacts with its predecessors and, when queried, only exchanges aggregates, i.e., partial footprint matrices, with the central location and (b) privacy preserving process mining, as nodes only store their own as well as predecessor and successor events. On the Sepsis Cases event log, for example, a node queries on average 18.7% of all nodes. For the Hospital Log, we can even reduce the overall querying to 3.87% of the nodes.
In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of examples to efficiently generate optimized parallel code, streamlining the development process. This method's strength lies in its ability to swiftly produce parallelized code that runs efficiently. However, this reliance on NLP models, without direct code analysis, can introduce inaccuracies, as these models might not fully grasp the nuanced semantics of the code they parallelize. We build OMP-Engineer, which balances the efficiency and scalability of NLP models with the accuracy and reliability of traditional methods, aiming to enhance the performance of automating parallelization while navigating its inherent challenges.
Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the input examples' quality and quantity, missing variations with different syntax or flow yet semantically similar. Large Language Models (LLMs), trained on vast code datasets, can overcome these limitations by generating semantically equivalent, unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability. Implementing these in PyCraft, combining static and dynamic analysis with LLMs, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects like microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.
Modular reconfigurable manipulators enable quick adaptation and versatility to address different application environments and tailor to the specific requirements of the tasks. Task performance significantly depends on the manipulator's mounted pose and morphology design, therefore posing the need of methodologies for selecting suitable modular robot configurations and mounted pose that can address the specific task requirements and required performance. Morphological changes in modular robots can be derived through a discrete optimization process involving the selective addition or removal of modules. In contrast, the adjustment of the mounted pose operates within a continuous space, allowing for smooth and precise alterations in both orientation and position. This work introduces a computational framework that simultaneously optimizes modular manipulators' mounted pose and morphology. The core of the work is that we design a mapping function that \textit{implicitly} captures the morphological state of manipulators in the continuous space. This transformation function unifies the optimization of mounted pose and morphology within a continuous space. Furthermore, our optimization framework incorporates a array of performance metrics, such as minimum joint effort and maximum manipulability, and considerations for trajectory execution error and physical and safety constraints. To highlight our method's benefits, we compare it with previous methods that framed such problem as a combinatorial optimization problem and demonstrate its practicality in selecting the modular robot configuration for executing a drilling task with the CONCERT modular robotic platform.
The development of model compression is continuously motivated by the evolution of various neural network accelerators with ASIC or FPGA. On the algorithm side, the ultimate goal of quantization or pruning is accelerating the expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design community. First, although the state-of-the-art quantization algorithm can achieve low precision with negligible degradation of accuracy, the latest deep learning framework (e.g., PyTorch) can only support non-customizable 8-bit precision, data format, and parameter extraction. Secondly, the objective of quantization is to enable the computation with low-precision data. However, the current SoTA algorithm treats the quantized integer as an intermediate result, while the final output of the quantizer is the "discretized" floating-point values, ignoring the practical needs and adding additional workload to hardware designers for integer parameter extraction and layer fusion. Finally, the compression toolkits designed by the industry are constrained to their in-house product or a handful of algorithms. The limited degree of freedom in the current toolkit and the under-explored customization hinder the prototype ASIC or FPGA-based accelerator design. To resolve these challenges, we propose Torch2Chip, an open-sourced, fully customizable, and high-performance toolkit that supports user-designed compression followed by automatic model fusion and parameter extraction. Torch2Chip incorporates the hierarchical design workflow, and the user-customized compression algorithm will be directly packed into the deployment-ready format for prototype chip verification with either CNN or vision transformer (ViT). The code is available at //github.com/SeoLabCornell/torch2chip.
Unmanned Aerial Vehicles (UAVs) have emerged as a transformative technology across diverse sectors, offering adaptable solutions to complex challenges in both military and civilian domains. Their expanding capabilities present a platform for further advancement by integrating cutting-edge computational tools like Artificial Intelligence (AI) and Machine Learning (ML) algorithms. These advancements have significantly impacted various facets of human life, fostering an era of unparalleled efficiency and convenience. Large Language Models (LLMs), a key component of AI, exhibit remarkable learning and adaptation capabilities within deployed environments, demonstrating an evolving form of intelligence with the potential to approach human-level proficiency. This work explores the significant potential of integrating UAVs and LLMs to propel the development of autonomous systems. We comprehensively review LLM architectures, evaluating their suitability for UAV integration. Additionally, we summarize the state-of-the-art LLM-based UAV architectures and identify novel opportunities for LLM embedding within UAV frameworks. Notably, we focus on leveraging LLMs to refine data analysis and decision-making processes, specifically for enhanced spectral sensing and sharing in UAV applications. Furthermore, we investigate how LLM integration expands the scope of existing UAV applications, enabling autonomous data processing, improved decision-making, and faster response times in emergency scenarios like disaster response and network restoration. Finally, we highlight crucial areas for future research that are critical for facilitating the effective integration of LLMs and UAVs.
We introduce a new regression framework designed to deal with large-scale, complex data that lies around a low-dimensional manifold with noises. Our approach first constructs a graph representation, referred to as the skeleton, to capture the underlying geometric structure. We then define metrics on the skeleton graph and apply nonparametric regression techniques, along with feature transformations based on the graph, to estimate the regression function. We also discuss the limitations of some nonparametric regressors with respect to the general metric space such as the skeleton graph. The proposed regression framework suggests a novel way to deal with data with underlying geometric structures and provides additional advantages in handling the union of multiple manifolds, additive noises, and noisy observations. We provide statistical guarantees for the proposed method and demonstrate its effectiveness through simulations and real data examples.
Existing recommender systems extract the user preference based on learning the correlation in data, such as behavioral correlation in collaborative filtering, feature-feature, or feature-behavior correlation in click-through rate prediction. However, regretfully, the real world is driven by causality rather than correlation, and correlation does not imply causation. For example, the recommender systems can recommend a battery charger to a user after buying a phone, in which the latter can serve as the cause of the former, and such a causal relation cannot be reversed. Recently, to address it, researchers in recommender systems have begun to utilize causal inference to extract causality, enhancing the recommender system. In this survey, we comprehensively review the literature on causal inference-based recommendation. At first, we present the fundamental concepts of both recommendation and causal inference as the basis of later content. We raise the typical issues that the non-causality recommendation is faced. Afterward, we comprehensively review the existing work of causal inference-based recommendation, based on a taxonomy of what kind of problem causal inference addresses. Last, we discuss the open problems in this important research area, along with interesting future works.