Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system's causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.
Optical Music Recognition is a field that has progressed significantly, bringing accurate systems that transcribe effectively music scores into digital formats. Despite this, there are still several limitations that hinder OMR from achieving its full potential. Specifically, state of the art OMR still depends on multi-stage pipelines for performing full-page transcription, as well as it has only been demonstrated in monophonic cases, leaving behind very relevant engravings. In this work, we present the Sheet Music Transformer++, an end-to-end model that is able to transcribe full-page polyphonic music scores without the need of a previous Layout Analysis step. This is done thanks to an extensive curriculum learning-based pretraining with synthetic data generation. We conduct several experiments on a full-page extension of a public polyphonic transcription dataset. The experimental outcomes confirm that the model is competent at transcribing full-page pianoform scores, marking a noteworthy milestone in end-to-end OMR transcription.
In deep learning, classification tasks are formalized as optimization problems often solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the usage of the $f$-divergence to generalize the formulation of the optimization problem for classification. We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence. Furthermore, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of an objective function corresponding to a novel $f$-divergence referred to as shifted log (SL). We theoretically analyze the objective functions proposed and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The analyzed scenarios demonstrate the effectiveness of the proposed approach and that the SL divergence achieves the highest classification accuracy in almost all the considered cases.
Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of decision-making process. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can possibly lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the necessary actions to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. Summarizing, this paper highlights the importance of considering the real-world context when evaluating fairness.
Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To address these issues, we introduce a simple yet practical objective function for time series explainable learning. The design of the objective function builds upon the principle of information bottleneck (IB), and modifies the IB objective function to avoid trivial solutions and distributional shift issues. We further present TimeX++, a novel explanation framework that leverages a parametric network to produce explanation-embedded instances that are both in-distributed and label-preserving. We evaluate TimeX++ on both synthetic and real-world datasets comparing its performance against leading baselines, and validate its practical efficacy through case studies in a real-world environmental application. Quantitative and qualitative evaluations show that TimeX++ outperforms baselines across all datasets, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at \url{//github.com/zichuan-liu/TimeXplusplus}.
Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy (MHE) to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to concentrate on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 metric from 2.28 to 0.92 on CIFAR-10 and from 11.76 to 7.94 on CIFAR-100.
Tensors serve as a crucial tool in the representation and analysis of complex, multi-dimensional data. As data volumes continue to expand, there is an increasing demand for developing optimization algorithms that can directly operate on tensors to deliver fast and effective computations. Many problems in real-world applications can be formulated as the task of recovering high-order tensors characterized by sparse and/or low-rank structures. In this work, we propose novel Kaczmarz algorithms with a power of the $\ell_1$-norm regularization for reconstructing high-order tensors by exploiting sparsity and/or low-rankness of tensor data. In addition, we develop both a block and an accelerated variant, along with a thorough convergence analysis of these algorithms. A variety of numerical experiments on both synthetic and real-world datasets demonstrate the effectiveness and significant potential of the proposed methods in image and video processing tasks, such as image sequence destriping and video deconvolution.
We show that computing the total variation distance between two product distributions is $\#\mathsf{P}$-complete. This is in stark contrast with other distance measures such as Kullback-Leibler, Chi-square, and Hellinger, which tensorize over the marginals leading to efficient algorithms.
Two typical fixed-length random number generation problems in information theory are considered for general sources. One is the source resolvability problem and the other is the intrinsic randomness problem. In each of these problems, the optimum achievable rate with respect to the given approximation measure is one of our main concerns and has been characterized using two different information quantities: the information spectrum and the smooth R\'enyi entropy. Recently, optimum achievable rates with respect to $f$-divergences have been characterized using the information spectrum quantity. The $f$-divergence is a general non-negative measure between two probability distributions on the basis of a convex function $f$. The class of f-divergences includes several important measures such as the variational distance, the KL divergence, the Hellinger distance and so on. Hence, it is meaningful to consider the random number generation problems with respect to $f$-divergences. However, optimum achievable rates with respect to $f$-divergences using the smooth R\'enyi entropy have not been clarified yet in both of two problems. In this paper we try to analyze the optimum achievable rates using the smooth R\'enyi entropy and to extend the class of $f$-divergence. To do so, we first derive general formulas of the first-order optimum achievable rates with respect to $f$-divergences in both problems under the same conditions as imposed by previous studies. Next, we relax the conditions on $f$-divergence and generalize the obtained general formulas. Then, we particularize our general formulas to several specified functions $f$. As a result, we reveal that it is easy to derive optimum achievable rates for several important measures from our general formulas. Furthermore, a kind of duality between the resolvability and the intrinsic randomness is revealed in terms of the smooth R\'enyi entropy.
The problem of computing $\alpha$-capacity for $\alpha>1$ is equivalent to that of computing the correct decoding exponent. Various algorithms for computing them have been proposed, such as Arimoto and Jitsumatsu--Oohama algorithm. In this study, we propose a novel alternating optimization algorithm for computing the $\alpha$-capacity for $\alpha>1$ based on a variational characterization of the Augustin--Csisz{\'a}r mutual information. A comparison of the convergence performance of these algorithms is demonstrated through numerical examples.
Synthetic data generation is increasingly recognized as a crucial solution to address data related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.