This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey of their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algorithms using the binary cross-entropy loss achieve first-order bounds scaling with the optimal policy's cost and are much more efficient than those using the commonly adopted squared loss. Moreover, we prove that distributional algorithms using the maximum likelihood loss achieve second-order bounds scaling with the policy variance, which are even sharper than first-order bounds. In particular, this proves the benefits of distributional RL. We hope that this paper serves as a guide for analyzing decision making algorithms with varying loss functions, and can inspire the reader to seek out better loss functions to improve any decision making algorithm.
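As a minimal illustration of the comparison in this abstract (a sketch only, not the paper's analysis), the snippet below contrasts the gradients of the squared loss and the binary cross-entropy (BCE) loss when regressing a value in (0, 1). The BCE gradient is amplified by 1/(p(1-p)) where the prediction p is small, which is one intuition for bounds that scale with the optimal policy's cost rather than a worst-case range; the specific numbers are illustrative.

```python
import numpy as np

def grad_squared(p, y):
    # d/dp of (p - y)^2
    return 2.0 * (p - y)

def grad_bce(p, y):
    # d/dp of -(y * log(p) + (1 - y) * log(1 - p))
    return (p - y) / (p * (1.0 - p))

p, y = 0.02, 0.01           # small predicted value, small target cost
print(grad_squared(p, y))   # ~0.02: tiny learning signal
print(grad_bce(p, y))       # ~0.51: much larger relative signal
```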
This paper studies identification and estimation of average causal effects, such as average marginal or treatment effects, in fixed effects logit models with short panels. Relating the identified set of these effects to an extremal moment problem, we first show how to obtain sharp bounds on such effects simply, without any optimization. We also consider even simpler outer bounds, which, contrary to the sharp bounds, do not require any first-step nonparametric estimators. We build confidence intervals based on these two approaches and show their asymptotic validity. Monte Carlo simulations suggest that both approaches work well in practice, the second being typically competitive in terms of interval length. Finally, we show that our method is also useful to measure treatment effect heterogeneity.
Numerous studies have focused on learning and understanding the dynamics of physical systems from video data, a core task of spatial intelligence. Reliable artificial intelligence requires quantitative assessment of model uncertainty. However, systematic assessment of these uncertainties, particularly the uncertainties in the physical data, remains relatively lacking. Our motivation is to introduce conformal prediction into the uncertainty assessment of dynamical systems, providing a method supported by theoretical guarantees. This paper applies conformal prediction to assess the uncertainties of benchmark operator learning methods. We also compare Monte Carlo Dropout and Ensemble methods on a partial differential equation dataset, evaluating uncertainty effectively through straight roll-outs, which makes the approach well suited to time-series tasks.
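As a hedged illustration of the core recipe (standard split conformal prediction, not necessarily the paper's exact procedure for operator roll-outs), the sketch below calibrates residual quantiles on held-out data and wraps a predictor in finite-sample-valid intervals; the `predict` callable is an assumed stand-in for a pretrained operator learning model.

```python
import numpy as np

def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity scores: absolute residuals on a held-out calibration set.
    scores = np.abs(y_cal - predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile gives (1 - alpha) marginal coverage.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = predict(X_test)
    return pred - q, pred + q

# Toy usage with a trivial "model" standing in for an operator network.
rng = np.random.default_rng(0)
X_cal, X_test = rng.normal(size=200), rng.normal(size=5)
y_cal = X_cal + 0.1 * rng.normal(size=200)
lo, hi = conformal_interval(lambda x: x, X_cal, y_cal, X_test)
```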
This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under exponential family distributions.
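To make the model class concrete, here is a hedged simulation sketch: the additive decomposition into shared vectors Z and network-specific components W_k, the dimensions, and the logistic inner-product link are all illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 100, 2, 3
Z = rng.normal(size=(n, d))            # shared latent vectors across networks
W = 0.5 * rng.normal(size=(K, n, d))   # network-specific components

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

adjacency = []
for k in range(K):
    U = Z + W[k]                        # decomposed latent position, network k
    P = sigmoid(U @ U.T)                # inner-product (logistic) edge model
    A = (rng.uniform(size=(n, n)) < P).astype(int)
    adjacency.append(np.triu(A, 1) + np.triu(A, 1).T)  # symmetrize, drop self-loops
```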
This paper presents a mathematical formulation to perform temporal parallelisation of continuous-time optimal control problems, which can be solved via the Hamilton--Jacobi--Bellman (HJB) equation. We divide the time interval of the control problem into sub-intervals, and define a control problem in each sub-interval, conditioned on the start and end states, leading to conditional value functions for the sub-intervals. By defining an associative operator as the minimisation of the sum of conditional value functions, we obtain the elements and associative operators for a parallel associative scan operation. This allows for solving the optimal control problem on the whole time interval in parallel in logarithmic time complexity in the number of sub-intervals. We derive HJB-type backward and forward equations for the conditional value functions and solve them in closed form for linear quadratic problems. We also discuss numerical methods for computing the conditional value functions. The computational advantages of the proposed parallel methods are demonstrated via simulations run on a multi-core central processing unit and a graphics processing unit.
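A discretized sketch of the associativity at the heart of this construction (an illustration under a finite-state-grid assumption; the paper itself works in continuous time with closed-form LQ solutions): each sub-interval is represented by a matrix V with V[a, b] the conditional optimal cost from start state a to end state b, and the combination below is the associative min-plus product that a parallel scan evaluates in logarithmic depth.

```python
import numpy as np

def combine(V1, V2):
    # (V1 . V2)[a, c] = min_b V1[a, b] + V2[b, c]: associative min-plus product.
    return np.min(V1[:, :, None] + V2[None, :, :], axis=1)

def prefix_scan(Vs):
    # Sequential reference implementation; because `combine` is associative,
    # a parallel scan computes the same prefixes in O(log N) depth.
    out = [Vs[0]]
    for V in Vs[1:]:
        out.append(combine(out[-1], V))
    return out

rng = np.random.default_rng(0)
Vs = [rng.uniform(size=(4, 4)) for _ in range(8)]  # 8 sub-intervals, 4 grid states
prefixes = prefix_scan(Vs)  # prefixes[k][a, b]: cost from global start a to end b of interval k
```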
We use real-world data to evaluate the performance of electrocardiographic markers of global electrical heterogeneity (GEH) as features in a machine learning model, alongside standard ECG features and risk factors, for predicting outcomes of patients referred to a tertiary cardiology hospital. Patients referred for specialized evaluation at the hospital underwent an ECG and a risk-factor anamnesis. Follow-up attendances at 6, 12, and 15 months checked for cardiovascular-related events (mortality or new nonfatal cardiovascular events: stroke, MI, PCI, CS), as identified during phone follow-ups. The first-attendance ECG was measured by a specialist and processed to obtain the GEH parameters using the Kors matrix. The ECG measurements, GEH parameters, and risk factors were combined to train multiple instances of XGBoost decision-tree models. Each instance was optimized for AUCPR, and the instance with the highest score was chosen as representative of the model. The feature importances of the winning model were compared to better understand the improvement from using GEH parameters. The GEH parameters turned out to be statistically significant for this population, especially the QRST angle and the SVG. The combined model with all three parameter classes had the best performance. The findings suggest that using VCG features can facilitate more accurate identification of patients who require tertiary care, thereby optimizing resource allocation and improving patient outcomes. Moreover, the decision tree model's transparency and ability to pinpoint critical features make it a valuable tool for clinical decision-making and align well with existing clinical practices.
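A hedged sketch of the training-and-selection loop described above; the synthetic features, hyperparameter sweep, and data split are illustrative assumptions, and only the AUCPR-based selection mirrors the abstract.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the combined ECG + GEH + risk-factor feature table.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

best_model, best_aucpr = None, -1.0
for depth in (3, 4, 5):  # illustrative hyperparameter sweep over instances
    model = xgb.XGBClassifier(max_depth=depth, eval_metric="aucpr")
    model.fit(X_train, y_train)
    aucpr = average_precision_score(y_val, model.predict_proba(X_val)[:, 1])
    if aucpr > best_aucpr:
        best_model, best_aucpr = model, aucpr

# Feature importances of the winning instance indicate each parameter's contribution.
print(best_aucpr, best_model.feature_importances_)
```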
Historical documents encompass a wealth of cultural treasures but suffer severe damage over time, including missing characters, paper damage, and ink erosion. However, existing document processing methods primarily focus on binarization, enhancement, etc., neglecting the repair of these damages. To this end, we present a new task, termed Historical Document Repair (HDR), which aims to predict the original appearance of damaged historical documents. To fill the gap in this field, we propose a large-scale dataset, HDR28K, and a diffusion-based network, DiffHDR, for historical document repair. Specifically, HDR28K contains 28,552 damaged-repaired image pairs with character-level annotations and multi-style degradations. Moreover, DiffHDR augments the vanilla diffusion framework with semantic and spatial information and a meticulously designed character perceptual loss for contextual and visual coherence. Experimental results demonstrate that the proposed DiffHDR trained using HDR28K significantly surpasses existing approaches and exhibits remarkable performance in handling real damaged documents. Notably, DiffHDR can also be extended to document editing and text block generation, showcasing its high flexibility and generalization capacity. We believe this study could pioneer a new direction in document processing and contribute to the preservation of invaluable cultures and civilizations. The dataset and code are available at //github.com/yeungchenwa/HDR.
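One plausible form of a character perceptual loss (a hedged sketch under the assumption that a frozen character recognizer supplies features; the paper's actual design is not specified here): penalize the distance between intermediate features of the repaired image and the ground truth. The toy CNN below stands in for a pretrained recognizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen feature extractor standing in for a pretrained character recognizer.
feature_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
).eval()
for p in feature_net.parameters():
    p.requires_grad_(False)

def character_perceptual_loss(repaired, target):
    # L1 distance in recognizer feature space, encouraging legible characters.
    return F.l1_loss(feature_net(repaired), feature_net(target))

loss = character_perceptual_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```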
With the increasing availability of high-dimensional data, analysts often rely on exploratory data analysis to understand complex data sets. A key approach to exploring such data is dimensionality reduction, which embeds high-dimensional data in two dimensions to enable visual exploration. However, popular embedding techniques, such as t-SNE and UMAP, typically assume that data points are independent. When this assumption is violated, as in time-series data, the resulting visualizations may fail to reveal important temporal patterns and trends. To address this, we propose a formal extension to existing dimensionality reduction methods that incorporates two temporal loss terms that explicitly highlight temporal progression in the embedded visualizations. Through a series of experiments on both synthetic and real-world datasets, we demonstrate that our approach effectively uncovers temporal patterns and improves the interpretability of the visualizations. Furthermore, the method improves temporal coherence while preserving the fidelity of the embeddings, providing a robust tool for dynamic data analysis.
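One plausible form such a temporal loss term could take (an illustrative assumption; the paper's two terms are not specified here) is a smoothness penalty on consecutive embedded points, whose gradient can be added to a t-SNE or UMAP layout update.

```python
import numpy as np

def temporal_smoothness(Y, weight=1.0):
    # Y: (T, 2) array of embedded points ordered by time; penalize large jumps.
    diffs = np.diff(Y, axis=0)
    return weight * np.sum(diffs ** 2)

def temporal_gradient(Y, weight=1.0):
    # Gradient of the smoothness term, ready to add to the base embedding gradient.
    g = np.zeros_like(Y)
    d = np.diff(Y, axis=0)
    g[:-1] -= 2 * weight * d
    g[1:] += 2 * weight * d
    return g

Y = np.random.default_rng(0).normal(size=(50, 2))  # toy time-ordered embedding
Y -= 0.01 * temporal_gradient(Y, weight=0.5)       # one descent step on the penalty
```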
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
The fusion of causal models with deep learning, which introduces increasingly intricate data such as causal associations within images or between textual components, has emerged as a focal research area. Nonetheless, extending original causal concepts and theories to such complex, non-statistical data has met with serious challenges. In response, our study proposes redefining causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to the statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats relevant to deep learning, including time series, images, text, and others. Indefinite data is an emergent research area that we infer from the progression of data forms. To present these three data paradigms comprehensively, we elaborate on their formal definitions, the differences manifested in datasets, resolution pathways, and the development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data across numerous research undertakings, and present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.
Deep neural networks have revolutionized many machine learning tasks in power systems, ranging from pattern recognition to signal processing. The data in these tasks are typically represented in Euclidean domains. Nevertheless, a growing number of power system applications collect data from non-Euclidean domains and represent them as graph-structured data with high-dimensional features and interdependencies among nodes. The complexity of graph-structured data poses significant challenges to existing deep neural networks defined in Euclidean domains. Recently, many studies extending deep neural networks to graph-structured data in power systems have emerged. In this paper, we provide a comprehensive overview of graph neural networks (GNNs) in power systems. Specifically, several classical GNN architectures (e.g., graph convolutional networks, graph recurrent neural networks, graph attention networks, graph generative networks, spatial-temporal graph convolutional networks, and hybrid forms of GNNs) are summarized, and key applications in power systems, such as fault diagnosis, power prediction, power flow calculation, and data generation, are reviewed in detail. Furthermore, main issues and research trends concerning the application of GNNs in power systems are discussed.
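As a concrete anchor for the first architecture listed, here is a generic sketch of a Kipf-and-Welling-style graph convolutional layer (an illustration only, not tied to any specific power system application in the survey): H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W).

```python
import numpy as np

def gcn_layer(A, H, W):
    # A: (n, n) adjacency, H: (n, f) node features, W: (f, f') weights.
    A_hat = A + np.eye(A.shape[0])       # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)      # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)

rng = np.random.default_rng(0)
A = (rng.uniform(size=(5, 5)) < 0.4).astype(float)
A = np.triu(A, 1) + np.triu(A, 1).T      # symmetric toy grid topology
H = rng.normal(size=(5, 8))              # e.g., bus-level measurements
W = rng.normal(size=(8, 4))              # learnable weights (fixed here)
H_next = gcn_layer(A, H, W)
```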