In this paper, we introduce EconLogicQA, a rigorous benchmark designed to assess the sequential reasoning capabilities of large language models (LLMs) within the intricate realms of economics, business, and supply chain management. Diverging from traditional benchmarks that predict subsequent events individually, EconLogicQA poses a more challenging task: it requires models to discern and sequence multiple interconnected events, capturing the complexity of economic logic. EconLogicQA comprises an array of multi-event scenarios derived from economic articles, which necessitate an insightful understanding of both temporal and logical event relationships. Through comprehensive evaluations, we show that EconLogicQA effectively gauges an LLM's proficiency in navigating the sequential complexities inherent in economic contexts. We provide a detailed description of the EconLogicQA dataset and present the outcomes from evaluating the benchmark across various leading-edge LLMs, thereby offering a thorough perspective on their sequential reasoning potential in economic contexts. Our benchmark dataset is available at //huggingface.co/datasets/yinzhu-quan/econ_logic_qa.
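As a concrete illustration, the sketch below loads the released dataset with the Hugging Face `datasets` library and formats one item as an event-ordering prompt. The split name and field names (`question`, `events`, the answer key) are assumptions made for illustration, not the published schema; consult the dataset card for the actual fields.

```python
# Minimal sketch: load EconLogicQA and build an event-ordering prompt.
# Field names ("question", "events") and the split are assumed for illustration;
# check the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("yinzhu-quan/econ_logic_qa", split="test")  # split name assumed

def build_prompt(example):
    events = "\n".join(f"{i + 1}. {e}" for i, e in enumerate(example["events"]))
    return (
        f"{example['question']}\n"
        f"Events:\n{events}\n"
        "Arrange the events in the correct logical and temporal order, "
        "answering with the event numbers separated by commas."
    )

print(build_prompt(ds[0]))
```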
In this paper, we explore advanced modifications to the Tweedie regression model to address its limitations in modeling aggregate claims for various types of insurance, such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the high incidence of zero claims. Our approach refines the modeling of the zero-claim process and integrates boosting methods, leveraging an iterative procedure to enhance predictive accuracy. Although this iteration slows learning, several efficient implementations that also support precise parameter tuning, such as XGBoost, LightGBM, and CatBoost, have emerged. Among these, we chose CatBoost, an efficient boosting approach that effectively handles categorical and other special types of data. The core contribution of our paper is the combination of separate modeling for zero claims with tree-based boosting ensemble methods within a CatBoost framework, assuming that the inflated probability of zero is a function of the mean parameter. The efficacy of our enhanced Tweedie model is demonstrated through an application to an insurance telematics dataset, which presents the additional complexity of compositional feature variables. Our modeling results reveal a marked improvement in model performance, showcasing its potential to deliver more accurate predictions suitable for insurance claim analytics.
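To make the modeling recipe concrete, the following sketch pairs a CatBoost classifier for the zero-claim indicator with a CatBoost regressor using a Tweedie loss on the positive claims. The column split, variance power, and hyperparameters are illustrative assumptions, not the paper's exact specification.

```python
# Minimal sketch of a two-part zero-inflated Tweedie model with CatBoost.
# Variance power and hyperparameters below are illustrative only.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, CatBoostRegressor

def fit_zero_inflated_tweedie(X: pd.DataFrame, y: np.ndarray, cat_features):
    """X: features (may include categorical and compositional telematics variables);
    y: aggregate claim amounts with many exact zeros."""
    is_positive = (y > 0).astype(int)

    # Part 1: probability that a policy produces any claim at all.
    clf = CatBoostClassifier(iterations=500, depth=6, verbose=False,
                             cat_features=cat_features)
    clf.fit(X, is_positive)

    # Part 2: Tweedie severity model fitted on policies with positive claims.
    reg = CatBoostRegressor(iterations=500, depth=6, verbose=False,
                            loss_function="Tweedie:variance_power=1.5",
                            cat_features=cat_features)
    reg.fit(X[y > 0], y[y > 0])
    return clf, reg

def predict_expected_claim(clf, reg, X_new):
    p_claim = clf.predict_proba(X_new)[:, 1]                    # P(claim > 0)
    # For the Tweedie loss the raw score is on the log scale; "Exponent" returns the mean.
    severity = reg.predict(X_new, prediction_type="Exponent")   # E[claim | claim > 0]
    return p_claim * severity                                    # expected aggregate claim
```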
In this paper, we introduce Dreamweaver, which belongs to a new class of auto-regressive decision-making models known as large reasoning models (LRMs). Dreamweaver is designed to improve 3D floorplanning in electronic design automation (EDA) via an architecture that builds on recent advances in sequence-to-sequence reinforcement learning algorithms. A significant advantage of our approach is its ability to reason effectively over large discrete action spaces, which is essential for handling the numerous potential positions of the various functional blocks in floorplanning. Additionally, Dreamweaver demonstrates strong performance even when trained on entirely random trajectories, showcasing its capacity to leverage sub-optimal or non-expert trajectories to enhance its results. This approach contributes to streamlining the integrated circuit (IC) design flow and reducing the high computational costs typically associated with floorplanning. We evaluate its performance against a current state-of-the-art method, highlighting notable improvements.
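As a schematic of what reasoning over a large discrete action space can look like in floorplanning, the sketch below scores every candidate grid position for the next block, masks positions that would overlap previously placed blocks, and selects placements autoregressively. The grid abstraction, the random scoring function standing in for a learned policy, and the sampling scheme are illustrative assumptions, not Dreamweaver's actual architecture.

```python
# Illustrative sketch: autoregressive placement over a masked discrete action space.
# The random "policy_scores" is a placeholder for a learned sequence model.
import numpy as np

def policy_scores(occupancy: np.ndarray, block: tuple) -> np.ndarray:
    # Placeholder for a learned policy; returns one logit per grid cell.
    return np.random.randn(*occupancy.shape)

def valid_mask(occupancy: np.ndarray, block: tuple) -> np.ndarray:
    h, w = block
    H, W = occupancy.shape
    mask = np.zeros_like(occupancy, dtype=bool)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            mask[r, c] = not occupancy[r:r + h, c:c + w].any()
    return mask

def place_blocks(grid_size: int, blocks: list) -> list:
    occupancy = np.zeros((grid_size, grid_size), dtype=bool)
    placements = []
    for block in blocks:                          # autoregressive: one block at a time
        logits = policy_scores(occupancy, block)
        mask = valid_mask(occupancy, block)
        logits[~mask] = -np.inf                   # forbid overlapping placements
        probs = np.exp(logits - logits.max())
        probs[~mask] = 0.0
        probs /= probs.sum()
        idx = np.random.choice(probs.size, p=probs.ravel())
        r, c = divmod(idx, grid_size)
        occupancy[r:r + block[0], c:c + block[1]] = True
        placements.append((r, c))
    return placements

print(place_blocks(16, [(4, 4), (3, 5), (6, 2)]))
```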
This paper introduces EmpathyEar, a pioneering open-source, avatar-based multimodal empathetic chatbot, to fill the gap in traditional text-only empathetic response generation (ERG) systems. Leveraging the advancements of a large language model combined with multimodal encoders and generators, EmpathyEar supports user inputs in any combination of text, sound, and vision, and produces multimodal empathetic responses, offering users not just textual replies but also digital avatars with talking faces and synchronized speech. A series of emotion-aware instruction-tuning steps is performed to achieve comprehensive emotional understanding and generation capabilities. In this way, EmpathyEar provides users with responses that achieve a deeper emotional resonance, closely emulating human-like empathy. The system paves the way for the next generation of emotional intelligence, for which we open-source the code for public access.
In this paper, we introduce QuST-LLM, an innovative extension of QuPath that utilizes the capabilities of large language models (LLMs) to analyze and interpret spatial transcriptomics (ST) data. This tool effectively simplifies the intricate and high-dimensional nature of ST data by offering a comprehensive workflow that includes data loading, region selection, gene expression analysis, and functional annotation. QuST-LLM employs LLMs to transform complex ST data into understandable and detailed biological narratives based on gene ontology annotations, thereby significantly improving the interpretability of ST data. Consequently, users can interact with their own ST data using natural language. In this way, QuST-LLM provides researchers with powerful functionality to unravel the spatial and functional complexities of tissues, fostering novel insights and advancements in biomedical research.
In this paper, we generalize the Pearl and Neyman-Rubin methodologies in causal inference by introducing an approach that incorporates fuzzy logic. Specifically, we introduce a fuzzy causal inference approach that considers both the vagueness and imprecision inherent in data, as well as the subjective human perspective characterized by fuzzy terms such as 'high', 'medium', and 'low'. To do so, we introduce two fuzzy causal effect formulas: the Fuzzy Average Treatment Effect (FATE) and the Generalized Fuzzy Average Treatment Effect (GFATE), together with their normalized versions, NFATE and NGFATE. When dealing with a binary treatment variable, our fuzzy causal effect formulas coincide with the classical Average Treatment Effect (ATE) formula, a well-established and popular metric in causal inference. In FATE, all values of the treatment variable are considered equally important. In contrast, GFATE takes into account the rarity and frequency of these values. We show that for linear Structural Equation Models (SEMs), the normalized versions of our formulas, NFATE and NGFATE, are equivalent to ATE. Further, we provide identifiability criteria for these formulas and show their stability with respect to minor variations in the fuzzy subsets and the probability distributions involved. This ensures the robustness of our approach in handling small perturbations in the data. Finally, we provide several experimental examples to empirically validate and demonstrate the practical application of our proposed fuzzy causal inference methods.
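For reference, the binary-treatment quantity that the fuzzy formulas reduce to is the standard average treatment effect, written below in potential-outcome and do-notation form; these are the textbook definitions, not the paper's fuzzy generalizations.

```latex
% Classical average treatment effect for a binary treatment T and outcome Y.
\[
  \mathrm{ATE}
  = \mathbb{E}\!\left[Y(1) - Y(0)\right]
  = \mathbb{E}\!\left[Y \mid do(T=1)\right] - \mathbb{E}\!\left[Y \mid do(T=0)\right].
\]
```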
In this paper, we propose the "Kinetics Observer", a novel estimator addressing the challenge of state estimation for legged robots using proprioceptive sensors (encoders, IMU and force/torque sensors). Based on a Multiplicative Extended Kalman Filter, the Kinetics Observer allows the real-time simultaneous estimation of contact and perturbation forces, and of the robot's kinematics, which are accurate enough to perform proprioceptive odometry. Thanks to a visco-elastic model of the contacts linking their kinematics to those of the robot's centroid, the Kinetics Observer ensures a tight coupling between the whole-body kinematics and dynamics of the robot. This coupling entails a redundancy of the measurements that enhances the robustness and the accuracy of the estimation. The estimator was tested on two humanoid robots performing long-distance walking on even terrain and non-coplanar multi-contact locomotion.
In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed graphs. In this problem, the goal is to build a data structure that $(1 \pm \epsilon)$-approximates cut values in graphs with $n$ vertices. For arbitrary directed graphs, such a data structure requires $\Omega(n^2)$ bits even for constant $\epsilon$. To circumvent this, recent works study $\beta$-balanced graphs, meaning that for every directed cut, the total weight of edges in one direction is at most $\beta$ times that in the other direction. We consider two models: the {\em for-each} model, where the goal is to approximate each cut with constant probability, and the {\em for-all} model, where all cuts must be preserved simultaneously. We improve the previous $\Omega(n \sqrt{\beta/\epsilon})$ lower bound to $\tilde{\Omega}(n \sqrt{\beta}/\epsilon)$ in the for-each model, and we improve the previous $\Omega(n \beta/\epsilon)$ lower bound to $\Omega(n \beta/\epsilon^2)$ in the for-all model. This resolves the main open questions of (Cen et al., ICALP, 2021). The second problem is to approximate the global minimum cut in a local query model, where we can only access the graph via degree, edge, and adjacency queries. We improve the previous $\Omega\bigl(\frac{m}{k}\bigr)$ query complexity lower bound to $\Omega\bigl(\min\{m, \frac{m}{\epsilon^2 k}\}\bigr)$ for this problem, where $m$ is the number of edges, $k$ is the size of the minimum cut, and we seek a $(1+\epsilon)$-approximation. In addition, we show that existing upper bounds with slight modifications match our lower bound up to logarithmic factors.
In this paper, we study the use of LLMs for decision making that requires complex data analysis. We define Decision QA as the task of determining the best decision, $d_{best}$, for a decision-making question $Q$, given business rules $R$ and a database $D$. Since there is no benchmark that can examine Decision QA, we propose a Decision QA benchmark, DQA. It has two scenarios, Locating and Building, constructed from two video games (Europa Universalis IV and Victoria 3) that have almost the same goal as Decision QA. To address Decision QA effectively, we also propose a new RAG technique called iterative plan-then-retrieval augmented generation (PlanRAG). Our PlanRAG-based LM generates the plan for decision making as the first step, and the retriever generates the queries for data analysis as the second step. The proposed method outperforms the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario. We release our code and benchmark at //github.com/myeon9h/PlanRAG.
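A minimal sketch of an iterative plan-then-retrieval loop in the spirit of PlanRAG is shown below; the `llm` and `retriever` callables, the prompt wording, and the stopping condition are placeholders, not the released implementation (see the linked repository for the actual code).

```python
# Minimal sketch of an iterative plan-then-retrieval loop (PlanRAG-style).
# `llm` and `retriever` are placeholder callables supplied by the caller.
def plan_rag(question, rules, llm, retriever, max_steps=5):
    plan = llm(f"Question: {question}\nRules: {rules}\n"
               "Write a step-by-step data-analysis plan.")
    evidence = []
    for _ in range(max_steps):
        query = llm(f"Plan: {plan}\nEvidence so far: {evidence}\n"
                    "Write the next database query, or say DONE.")
        if "DONE" in query:
            break
        evidence.append(retriever(query))          # execute the query on the database
        # Re-planning step: revise the plan in light of the retrieved results.
        plan = llm(f"Question: {question}\nPlan: {plan}\nEvidence: {evidence}\n"
                   "Revise the plan if needed, otherwise repeat it.")
    return llm(f"Question: {question}\nRules: {rules}\nEvidence: {evidence}\n"
               "State the best decision.")
```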
In this paper, we introduce UniBridge (Cross-Lingual Transfer Learning with Optimized Embeddings and Vocabulary), a comprehensive approach developed to improve the effectiveness of Cross-Lingual Transfer Learning, particularly in languages with limited resources. Our approach tackles two essential elements of a language model: the initialization of embeddings and the optimal vocabulary size. Specifically, we propose a novel embedding initialization method that leverages both lexical and semantic alignment for a language. In addition, we present a method for systematically searching for the optimal vocabulary size, ensuring a balance between model complexity and linguistic coverage. Our experiments across multilingual datasets show that our approach greatly improves the F1-Score in several languages. UniBridge is a robust and adaptable solution for cross-lingual systems in various languages, highlighting the significance of initializing embeddings and choosing the right vocabulary size in cross-lingual environments.
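One common way to realize lexically and semantically informed embedding initialization, which the sketch below illustrates, is to copy embeddings for tokens shared between the source and target vocabularies and to initialize the remaining tokens from semantically similar source embeddings. This is a generic illustration of the idea under assumed inputs (an auxiliary token-vector space standing in for semantic alignment), not UniBridge's exact procedure.

```python
# Generic sketch of lexical + semantic embedding initialization for a new vocabulary.
# Not UniBridge's exact algorithm; "aux_vec" is an assumed auxiliary semantic space
# (e.g., pretrained word vectors) used to find similar source tokens.
import numpy as np

def init_target_embeddings(src_vocab, src_emb, tgt_vocab, aux_vec, k=5):
    """src_vocab: {token: row}, src_emb: [V_src, d] source embeddings,
    tgt_vocab: {token: row}, aux_vec: {token: vector} auxiliary semantic space."""
    d = src_emb.shape[1]
    rng = np.random.default_rng(0)
    tgt_emb = rng.normal(scale=0.02, size=(len(tgt_vocab), d))

    src_tokens = [t for t in src_vocab if t in aux_vec]
    src_aux = np.stack([aux_vec[t] for t in src_tokens]) if src_tokens else None

    for tok, i in tgt_vocab.items():
        if tok in src_vocab:                          # lexical overlap: copy directly
            tgt_emb[i] = src_emb[src_vocab[tok]]
        elif src_aux is not None and tok in aux_vec:  # semantic neighbours: average
            sims = src_aux @ aux_vec[tok]
            top = np.argsort(sims)[-k:]
            rows = [src_vocab[src_tokens[j]] for j in top]
            tgt_emb[i] = src_emb[rows].mean(axis=0)
        # otherwise keep the random initialization
    return tgt_emb
```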
Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation being the generalized product of the edge representations along the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by a generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. NBFNet parameterizes the generalized Bellman-Ford algorithm with three neural components, namely the INDICATOR, MESSAGE, and AGGREGATE functions, which correspond to the boundary condition, multiplication operator, and summation operator, respectively. NBFNet is very general: it covers many traditional path-based methods and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that the proposed NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
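To make the generalized Bellman-Ford view concrete, the sketch below runs the iteration with the boundary condition supplied by an indicator function, instantiated with the (min, +) semiring so that it recovers single-source shortest paths, a classical special case of the path formulation. NBFNet replaces these handcrafted operators with learned neural functions; the code here is only an illustration of the recursion, not the network itself.

```python
# Sketch of the generalized Bellman-Ford iteration, instantiated with the
# (min, +) semiring so it reduces to single-source shortest paths.
import math

def generalized_bellman_ford(n, edges, source,
                             indicator=lambda u, v: 0.0 if u == v else math.inf,
                             message=lambda h, w: h + w,   # generalized multiplication
                             aggregate=min):               # generalized summation
    boundary = [indicator(source, v) for v in range(n)]    # boundary condition
    h = list(boundary)
    for _ in range(n - 1):                                 # Bellman-Ford rounds
        h = [aggregate([boundary[v]] +
                       [message(h[x], w) for x, y, w in edges if y == v])
             for v in range(n)]
    return h  # h[v] is the representation of the node pair (source, v)

edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
print(generalized_bellman_ford(3, edges, source=0))        # -> [0.0, 1.0, 3.0]
```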