This research study delves into the conceptualization, development, and deployment of an innovative learning analytics tool, leveraging the capabilities of OpenAI's GPT-4 model. This tool is designed to quantify student engagement, map learning progression, and evaluate the efficacy of diverse instructional strategies within an educational context. Through the analysis of various critical data points such as students' stress levels, curiosity, confusion, agitation, topic preferences, and study methods, the tool offers a rich, multi-dimensional view of the learning environment. Furthermore, it employs Bloom's taxonomy as a framework to gauge the cognitive levels addressed by students' questions, thereby elucidating their learning progression. The information gathered from these measurements can empower educators by providing valuable insights to enhance teaching methodologies, pinpoint potential areas for improvement, and craft personalized interventions for individual students. The study articulates the design intricacies, implementation strategy, and thorough evaluation of the learning analytics tool, underscoring its prospective contributions to enhancing educational outcomes and bolstering student success. Moreover, the practicalities of integrating the tool within existing educational platforms and the requisite robust, secure, and scalable technical infrastructure are addressed. This research opens avenues for harnessing AI's potential in shaping the future of education, facilitating data-driven pedagogical decisions, and ultimately fostering a more conducive, personalized learning environment.
The study of reinforcement learning from human feedback (RLHF) has gained prominence in recent years due to its role in the development of LLMs. Neuroscience research shows that human responses to stimuli are known to depend on partially-observed "internal states." Unfortunately current models of RLHF do not take take this into consideration. Moreover most RLHF models do not account for intermediate feedback, which is gaining importance in empirical work and can help improve both sample complexity and alignment. To address these limitations, we model RLHF as reinforcement learning with partially observed reward-states (PORRL). We show reductions from the the two dominant forms of human feedback in RLHF - cardinal and dueling feedback to PORRL. For cardinal feedback, we develop generic statistically efficient algorithms and instantiate them to present POR-UCRL and POR-UCBVI. For dueling feedback, we show that a naive reduction to cardinal feedback fails to achieve sublinear dueling regret. We then present the first explicit reduction that converts guarantees for cardinal regret to dueling regret. We show that our models and guarantees in both settings generalize and extend existing ones. Finally, we identify a recursive structure on our model that could improve the statistical and computational tractability of PORRL, giving examples from past work on RLHF as well as learning perfect reward machines, which PORRL subsumes.
We study constrained comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems, and their generalization to comonotone inclusion. In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods. Additionally, we prove that the algorithm's iterations converge to a point in the solution set. In our second contribution, we extend the Fast Extra Gradient (FEG) algorithm, as developed by Lee and Kim (2021), to constrained comonotone min-max optimization and comonotone inclusion, achieving the same $O\left(\frac{1}{T}\right)$ convergence rate. This rate is applicable to the broadest set of comonotone inclusion problems yet studied in the literature. Our analyses are based on simple potential function arguments, which might be useful for analyzing other accelerated algorithms.
Motivated by understanding and analysis of large-scale machine learning under heavy-tailed gradient noise, we study distributed optimization with gradient clipping, i.e., in which certain clipping operators are applied to the gradients or gradient estimates computed from local clients prior to further processing. While vanilla gradient clipping has proven effective in mitigating the impact of heavy-tailed gradient noises in non-distributed setups, it incurs bias that causes convergence issues in heterogeneous distributed settings. To address the inherent bias introduced by gradient clipping, we develop a smoothed clipping operator, and propose a distributed gradient method equipped with an error feedback mechanism, i.e., the clipping operator is applied on the difference between some local gradient estimator and local stochastic gradient. We establish that, for the first time in the strongly convex setting with heavy-tailed gradient noises that may not have finite moments of order greater than one, the proposed distributed gradient method's mean square error (MSE) converges to zero at a rate $O(1/t^\iota)$, $\iota \in (0, 1/2)$, where the exponent $\iota$ stays bounded away from zero as a function of the problem condition number and the first absolute moment of the noise and, in particular, is shown to be independent of the existence of higher order gradient noise moments $\alpha > 1$. Numerical experiments validate our theoretical findings.
Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become a hot research spot in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting the performance and scalability. To this end, we propose a novel intent learning method termed \underline{ELCRec}, by unifying behavior representation learning into an \underline{E}nd-to-end \underline{L}earnable \underline{C}lustering framework, for effective and efficient \underline{Rec}ommendation. Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons. Then, we design a novel learnable clustering module to separate different cluster centers, thus decoupling users' complex intents. Meanwhile, it guides the network to learn intents from behaviors by forcing behavior embeddings close to cluster centers. This allows simultaneous optimization of recommendation and clustering via mini-batch data. Moreover, we propose intent-assisted contrastive learning by using cluster centers as self-supervision signals, further enhancing mutual promotion. Both experimental results and theoretical analyses demonstrate the superiority of ELCRec from six perspectives. Compared to the runner-up, ELCRec improves NDCG@5 by 8.9\% and reduces computational costs by 22.5\% on Beauty dataset. Furthermore, due to the scalability and universal applicability, we deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized. For two-player zero-sum Markov Games, we design a (policy) elimination based algorithm that achieves a regret of $\widetilde{O}(\sqrt{H^3 S^2 ABK})$, while the batch complexity is only $O(H+\log\log K)$. In the above, $S$ denotes the number of states, $A,B$ are the number of actions for the two players respectively, $H$ is the horizon and $K$ is the number of episodes. Furthermore, we prove a batch complexity lower bound $\Omega(\frac{H}{\log_{A}K}+\log\log K)$ for all algorithms with $\widetilde{O}(\sqrt{K})$ regret bound, which matches our upper bound up to logarithmic factors. As a byproduct, our techniques naturally extend to learning bandit games and reward-free MARL within near optimal batch complexity. To the best of our knowledge, these are the first line of results towards understanding MARL with low adaptivity.
The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD). This survey presents a comprehensive review of more than 40 research papers, demonstrating the role of foundation models in enhancing AD. Large language models contribute to planning and simulation in AD, particularly through their proficiency in reasoning, code generation and translation. In parallel, vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking, as well as creating realistic driving scenarios for simulation and testing. Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning, crucial for end-to-end AD. This survey not only provides a structured taxonomy, categorizing foundation models based on their modalities and functionalities within the AD domain but also delves into the methods employed in current research. It identifies the gaps between existing foundation models and cutting-edge AD approaches, thereby charting future research directions and proposing a roadmap for bridging these gaps.
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning, where only part of the category labels are assigned to certain training samples. Previous methods generally employ naive contrastive learning or unsupervised clustering scheme for all the samples. Nevertheless, they usually ignore the inherent critical information within the historical predictions of the model being trained. Specifically, we empirically reveal that a significant number of salient unlabeled samples yield consistent historical predictions corresponding to their ground truth category. From this observation, we propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL). In this framework, we introduce two memory banks to record historical prediction of unlabeled data, which are exploited to measure the credibility of each sample in terms of its prediction consistency. With the guidance of credibility, we can design a divide-and-conquer learning strategy to fully utilize the discriminative information of unlabeled data while alleviating the negative influence of noisy labels. Extensive experimental results on multiple benchmarks demonstrate the generality and superiority of our method, where our method outperforms state-of-the-art models by a large margin on both seen and unseen classes of the generic image recognition and challenging semantic shift settings (i.e.,with +8.4% gain on CUB and +8.1% on Standford Cars).
The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data over regularizing the objective to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to achieve a concise and comprehensive overview for researchers and practitioners (C3). Derived from the taxonomy, we divided more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review on neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize existing methods based on the taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which could be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data they used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content enriched models, which additionally utilize the side information associated with users and items, like user profile and item knowledge graph; and 3) context enriched models, which account for the contextual information associated with an interaction, such as time, location, and the past interactions. After reviewing representative works for each type, we finally discuss some promising directions in this field, including benchmarking recommender systems, graph reasoning based recommendation models, and explainable and fair recommendations for social good.