This paper explores the integration of Automated Guided Vehicles (AGVs) in warehouse order picking, a crucial and cost-intensive aspect of warehouse operations. The booming AGV industry, accelerated by the COVID-19 pandemic, is witnessing widespread adoption due to its efficiency, reliability, and cost-effectiveness in automating warehouse tasks. This paper focuses on enhancing the picker-to-parts system, prevalent in small to medium-sized warehouses, through the strategic use of AGVs. We discuss the benefits and applications of AGVs in various warehouse tasks, highlighting their transformative potential in improving operational efficiency. We examine the deployment of AGVs by leading companies in the industry, showcasing their varied functionalities in warehouse management. Addressing the gap in research on optimizing operational performance in hybrid environments where humans and AGVs coexist, our study examines a dynamic picker-to-parts warehouse scenario. We propose a novel Neural Approximate Dynamic Programming approach for coordinating a mixed team of human and AGV workers, aiming to maximize order throughput and operational efficiency. This involves innovative solutions for non-myopic decision making, order batching, and battery management. We also discuss the integration of advanced robotics technology in automating the complete order-picking process. Through a comprehensive numerical study, our work offers valuable insights for managing a heterogeneous workforce in a hybrid warehouse setting, contributing significantly to the field of warehouse automation and logistics.
Our work addresses a fundamental problem in the context of counterfactual inference for Markov Decision Processes (MDPs). Given an MDP path $\tau$, this kind of inference allows us to derive counterfactual paths $\tau'$ describing what-if versions of $\tau$ obtained under different action sequences than those observed in $\tau$. However, as the counterfactual states and actions deviate from the observed ones over time, the observation $\tau$ may no longer influence the counterfactual world, meaning that the analysis is no longer tailored to the individual observation, resulting in interventional outcomes rather than counterfactual ones. Even though this issue specifically affects the popular Gumbel-max structural causal model used for MDP counterfactuals, it has remained overlooked until now. In this work, we introduce a formal characterisation of influence based on comparing counterfactual and interventional distributions. We devise an algorithm to construct counterfactual models that automatically satisfy influence constraints. Leveraging such models, we derive counterfactual policies that are not just optimal for a given reward structure but also remain tailored to the observed path. Even though there is an unavoidable trade-off between policy optimality and strength of influence constraints, our experiments demonstrate that it is possible to derive (near-)optimal policies while remaining under the influence of the observation.
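For concreteness, the Gumbel-max mechanism mentioned above generates a counterfactual transition by first inferring Gumbel noise consistent with the observed transition and then replaying that noise under the alternative action. The sketch below illustrates this with the standard top-down construction; the transition tensor `P` and the state/action indices are hypothetical placeholders, and this is not the paper's own code.

```python
import numpy as np
from scipy.special import logsumexp

def truncated_gumbel(logit, upper):
    # Sample from Gumbel(logit) truncated to be at most `upper`.
    g = np.random.gumbel() + logit
    return -np.log(np.exp(-g) + np.exp(-upper))

def posterior_gumbels(logits, observed):
    # Top-down construction: noise g such that argmax(logits + g) == observed.
    top = np.random.gumbel() + logsumexp(logits)
    out = np.empty(len(logits))
    for i, logit in enumerate(logits):
        val = top if i == observed else truncated_gumbel(logit, top)
        out[i] = val - logit
    return out

def counterfactual_step(P, s, a_obs, s_next_obs, a_cf):
    """Given that action a_obs in state s led to s_next_obs, sample the
    next state that would have occurred under a_cf (Gumbel-max SCM).
    Assumes strictly positive transition probabilities P[s, a, s']."""
    g = posterior_gumbels(np.log(P[s, a_obs]), s_next_obs)
    return int(np.argmax(np.log(P[s, a_cf]) + g))

# Example: a 2-state, 2-action MDP with strictly positive transitions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
print(counterfactual_step(P, s=0, a_obs=0, s_next_obs=1, a_cf=1))
```

Because the posterior noise is anchored to the observed transition, counterfactual next states stay coupled to the observation; the influence issue arises as this coupling weakens along the path.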
This paper designs a simple, efficient, and truthful mechanism to elicit self-evaluations about items jointly owned by multiple owners. A key application of this mechanism is to improve the peer review of large scientific conferences, where a paper often has multiple authors and many authors have multiple papers. Our mechanism is designed to generate an entirely new source of review data, truthfully elicited from paper owners, which can be used to augment the traditional approach of eliciting review data only from peer reviewers. Our approach starts by partitioning all submissions of a conference into disjoint blocks, each of which shares a common set of co-authors. We then elicit the ranking of the submissions from each author and employ isotonic regression to produce adjusted review scores that align with both the reported ranking and the raw review scores. Under certain conditions, truth-telling by all authors is a Nash equilibrium for any valid partition of the overlapping ownership sets. We prove that to ensure truthfulness for such isotonic-regression-based mechanisms, it is necessary to partition the authors into blocks and to elicit only ranking information, independently from each block. This leaves the optimization of the block partition as the only room for maximizing the estimation efficiency of our mechanism, a problem that is computationally intractable in general. Fortunately, we develop a nearly linear-time greedy algorithm that provably finds a performant partition with appealing robust approximation guarantees. Extensive experiments on both synthetic data and real-world conference review data demonstrate the effectiveness of this owner-assisted calibration mechanism.
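As a concrete illustration of the calibration step, the snippet below is a minimal sketch for a single block with hypothetical scores: isotonic regression projects the raw review scores onto the monotone order implied by the author's reported ranking, yielding the closest (in least squares) adjusted scores that respect that ranking.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

# Raw review scores for one block of submissions, listed in the order
# reported by the author: index 0 is ranked worst, index -1 best.
raw_scores = np.array([6.0, 4.5, 7.0, 5.5])

# Minimize sum (adjusted - raw)^2 subject to adjusted being non-decreasing
# in the reported ranking (pool-adjacent-violators under the hood).
adjusted = isotonic_regression(raw_scores, increasing=True)
print(adjusted)  # [5.25 5.25 6.25 6.25]
```

Pairs of scores that contradict the ranking are pooled to a common value, so the adjustment only moves scores as much as the ranking constraint requires.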
The paper introduces a flexible model for the analysis of multivariate nonlinear time series data. The proposed Functional Coefficients Network Autoregressive (FCNAR) model allows the response of each node in the network to depend in a nonlinear fashion on its own past values (autoregressive component), as well as on past values of each neighbor (network component). Key issues of model stability/stationarity, together with model parameter identifiability, estimation, and inference, are addressed for error processes that can be heavier-tailed than Gaussian, for both a fixed and a growing number of network nodes. The performance of the estimators for the FCNAR model is assessed on synthetic data, and the applicability of the model is illustrated on air pollution data comprising multiple indicators.
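To fix ideas, one illustrative first-order specification consistent with this description (a sketch, not necessarily the paper's exact formulation) lets each coefficient be an unknown smooth function of the node's own lagged value. With $w_{ij}$ denoting row-normalized network weights,

$$X_{i,t} \;=\; g_1\!\left(X_{i,t-1}\right) X_{i,t-1} \;+\; g_2\!\left(X_{i,t-1}\right) \sum_{j \neq i} w_{ij}\, X_{j,t-1} \;+\; \epsilon_{i,t}, \qquad i = 1, \dots, N,$$

so that the autoregressive and network components enter through the functional coefficients $g_1, g_2$, and the errors $\epsilon_{i,t}$ may be heavier-tailed than Gaussian.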
Diffusion models suffer from slow sample generation at inference time. Despite recent efforts, improving the sampling efficiency of stochastic samplers for diffusion models remains a promising direction. We propose Splitting Integrators for fast stochastic sampling from pre-trained diffusion models in augmented spaces. Commonly used in molecular dynamics, splitting-based integrators attempt to improve sampling efficiency by cleverly alternating between numerical updates involving the data, auxiliary, or noise variables. However, we show that a naive application of splitting integrators is sub-optimal for fast sampling. Consequently, we propose several principled modifications to naive splitting samplers that improve sampling efficiency, and denote the resulting samplers Reduced Splitting Integrators. In the context of the Phase Space Langevin Diffusion (PSLD) [Pandey \& Mandt, 2023] on CIFAR-10, our stochastic sampler achieves an FID score of 2.36 in only 100 network function evaluations (NFE), compared to 2.63 for the best baselines.
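To illustrate the splitting idea in general (this is a generic sketch using kinetic Langevin dynamics, not the PSLD sampler itself), the step below decomposes the SDE into three exactly solvable pieces applied in sequence: an Ornstein-Uhlenbeck noise step on the momentum (O), a momentum kick from the score (B), and a data drift (A). The score function, step size, and friction coefficient are placeholders.

```python
import numpy as np

def oba_step(x, m, score, dt, gamma=2.0, rng=None):
    """One 'OBA' splitting step of kinetic Langevin dynamics in an
    augmented (data x, momentum m) space. `score(x)` returns the
    learned or analytic gradient of the log-density in x."""
    rng = np.random.default_rng() if rng is None else rng
    # O: Ornstein-Uhlenbeck noise step on the momentum, solved exactly.
    c = np.exp(-gamma * dt)
    m = c * m + np.sqrt(1.0 - c * c) * rng.standard_normal(np.shape(m))
    # B: momentum kick driven by the score of the data variable.
    m = m + dt * score(x)
    # A: data drift driven by the momentum.
    x = x + dt * m
    return x, m
```

Reordering or re-weighting these sub-steps yields different splitting schemes, which is exactly the design space the abstract's "naive vs. reduced" distinction explores.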
The success of Reinforcement Learning (RL) relies heavily on the ability to learn robust representations from observations of the environment. In most cases, representations learned purely through the RL loss can differ vastly across states depending on how the value functions change, even though the representations need not be this specific to the task at hand. Relying only on the RL objective may also yield representations that vary greatly across successive time steps. In addition, since the RL loss has a changing target, the learned representations depend on how good the current values/policies are. Thus, partially disentangling the representations from the main task would allow them to capture not only task-specific features but also the environment dynamics. To this end, we propose locally constrained representations, where an auxiliary loss forces the state representations to be predictable from the representations of neighboring states. This encourages the representations to be driven not only by value/policy learning but also by an additional loss that prevents them from over-fitting to the value loss. We evaluate the proposed method on several known benchmarks and observe strong performance; in continuous control tasks especially, our experiments show a significant performance improvement.
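A minimal sketch of such an auxiliary loss is shown below (in PyTorch, with a hypothetical encoder and predictor head; stopping gradients through the target is one plausible design choice, not necessarily the authors'): the representation of a state is pushed to be predictable from the representation of its temporal neighbor.

```python
import torch
import torch.nn as nn

class LocallyConstrainedLoss(nn.Module):
    """Auxiliary loss: phi(s_t) should predict phi(s_{t+1})."""
    def __init__(self, encoder: nn.Module, rep_dim: int):
        super().__init__()
        self.encoder = encoder
        # Small head mapping a representation to its neighbor's.
        self.predictor = nn.Sequential(
            nn.Linear(rep_dim, rep_dim), nn.ReLU(),
            nn.Linear(rep_dim, rep_dim),
        )

    def forward(self, obs, next_obs):
        z = self.encoder(obs)
        with torch.no_grad():  # treat the neighbor as a fixed target
            z_next = self.encoder(next_obs)
        return ((self.predictor(z) - z_next) ** 2).mean()

# Total objective would then combine the two terms, e.g.
# loss = rl_loss + aux_weight * aux_loss(obs, next_obs)
```

The weight on the auxiliary term controls how strongly the representation is tied to local dynamics rather than to the current value estimates.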
While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces key challenges of data inefficiency and lack of interpretability. Interestingly, many researchers have recently leveraged insights from the causality literature, producing a flourishing body of work that exploits the merits of causality to address these challenges in RL. It is therefore both necessary and timely to collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL methods, and investigate the potential contributions of causality to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, including the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic Treatment Regime (DTR). Moreover, we summarize the evaluation metrics and open-source resources, discuss emerging applications, and outline promising prospects for the future development of CRL.
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in BM applications across many fields. At present, there is a lack of research that sorts out the overall progress of BMs and guides follow-up work. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications built on BMs, dividing the BM review into four parts: Resource, Models, Key Technologies, and Application. We introduce 16 specific BM-related topics within these four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue, and Protein Research. For each topic, we summarize current studies and propose future research directions. At the end of the paper, we discuss the further development of BMs from a more general perspective.
Emotion recognition in conversation (ERC) aims to detect the emotion label of each utterance. Motivated by recent studies showing that feeding training examples in a meaningful order, rather than randomly, can boost model performance, we propose an ERC-oriented hybrid curriculum learning framework. Our framework consists of two curricula: (1) a conversation-level curriculum (CC) and (2) an utterance-level curriculum (UC). In CC, we construct a difficulty measurer based on the frequency of "emotion shifts" within a conversation, and the conversations are then scheduled in an "easy to hard" schema according to the difficulty scores it returns. UC is implemented from an emotion-similarity perspective, progressively strengthening the model's ability to identify confusing emotions. With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models, and we achieve new state-of-the-art results on four public ERC datasets.
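For the conversation-level curriculum, a plausible form of the difficulty measurer is sketched below (data layout and label names are hypothetical): difficulty is the frequency of emotion shifts between consecutive utterances, and conversations are scheduled from easy to hard.

```python
def emotion_shift_difficulty(labels):
    """Fraction of consecutive utterance pairs whose emotion changes."""
    if len(labels) < 2:
        return 0.0
    shifts = sum(a != b for a, b in zip(labels, labels[1:]))
    return shifts / (len(labels) - 1)

# Each conversation is a list of per-utterance emotion labels.
conversations = [["joy", "joy", "sad"], ["anger"],
                 ["joy", "sad", "joy", "fear"]]

# Schedule conversations from easy (few shifts) to hard (many shifts).
curriculum = sorted(conversations, key=emotion_shift_difficulty)
```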
Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer, thereby addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
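One way to read the dynamic-critical idea is as a self-critical policy-gradient update in which the baseline is chosen dynamically. The sketch below is a hypothetical rendering of that pattern, not the paper's exact formulation: whichever of the sampled or greedy answers earns the higher reward (e.g. F1 against the gold span) is reinforced, with the other serving as the baseline, so the model is always pushed toward the more acceptable answer.

```python
def dynamic_critical_loss(log_p_sampled, log_p_greedy,
                          r_sampled, r_greedy):
    """Self-critical-style loss with a dynamically chosen baseline.
    log_p_* are autograd tensors (log-likelihoods of the two answers);
    r_* are scalar rewards such as span F1."""
    if r_sampled >= r_greedy:
        # Reinforce the sampled answer against the greedy baseline.
        return -(r_sampled - r_greedy) * log_p_sampled
    # Otherwise reinforce the greedy answer against the sampled baseline.
    return -(r_greedy - r_sampled) * log_p_greedy
```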