In this paper we define a class of polynomial functors suited for constructing coalgebras representing processes in which uncertainty plays an important role. In these polynomial functors we include upper and lower probability measures, finitely additive probability measures, plausibilty measures (and their duals, belief functions), and possibility measures. We give axioms and inference rules for the associated system of coalgebraic modal logic, and construct the canonical coalgebras to prove a completeness result.
We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis remains bounded and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under mild assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.
In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one dataset. This simplified processing methods overlook the unique contributions of each feature, and thus may miss important feature information. As a result, it leads to suboptimal performance in complex datasets with rich features. To address this problem, we introduce Tabular Feature Weighting with Transformer, a novel feature weighting approach for tabular data. Our method adopts Transformer to capture complex feature dependencies and contextually assign appropriate weights to discrete and continuous features. Besides, we employ a reinforcement learning strategy to further fine-tune the weighting process. Our extensive experimental results across various real-world datasets and diverse downstream tasks show the effectiveness of TFWT and highlight the potential for enhancing feature weighting in tabular data analysis.
In this paper, we address the segmentation of tumor subtypes in whole slide images (WSI) by utilizing incomplete label proportions. Specifically, we utilize `partial' label proportions, which give the proportions among tumor subtypes but do not give the proportion between tumor and non-tumor. Partial label proportions are recorded as the standard diagnostic information by pathologists, and we, therefore, want to use them for realizing the segmentation model that can classify each WSI patch into one of the tumor subtypes or non-tumor. We call this problem ``learning from partial label proportions (LPLP)'' and formulate the problem as a weakly supervised learning problem. Then, we propose an efficient algorithm for this challenging problem by decomposing it into two weakly supervised learning subproblems: multiple instance learning (MIL) and learning from label proportions (LLP). These subproblems are optimized efficiently in the end-to-end manner. The effectiveness of our algorithm is demonstrated through experiments conducted on two WSI datasets.
In this paper, we propose an optimal sequential procedure for the early detection of potential side effects resulting from the administration of some treatment (e.g. a vaccine, say). The results presented here extend previous results obtained in Wang and Boukai (2024) who study the single side effect case to the case of two (or more) side effects. While the sequential procedure we employ, simultaneously monitors several of the treatment's side effects, the $(\alpha, \beta)$-optimal test we propose does not require any information about the inter-correlation between these potential side effects. However, in all of the subsequent analyses, including the derivations of the exact expressions of the Average Sample Number (ASN), the Power function, and the properties of the post-test (or post-detection) estimators, we accounted specifically, for the correlation between the potential side effects. In the real-life application (such as post-marketing surveillance), the number of available observations is large enough to justify asymptotic analyses of the sequential procedure (testing and post-detection estimation) properties. Accordingly, we also derive the consistency and asymptotic normality of our post-test estimators; results which enable us to also provide (asymptotic, post-detection) confidence intervals for the probabilities of various side-effects. Moreover, to compare two specific side effects, their relative risk plays an important role. We derive the distribution of the estimated relative risk in the asymptotic framework to provide appropriate inference. To illustrate the theoretical results presented, we provide two detailed examples based on the data of side effects on COVID-19 vaccine collected in Nigeria (see Nigeria (see Ilori et al. (2022)).
In this paper we present the probabilistic typed natural deduction calculus TPTND, designed to reason about and derive trustworthiness properties of probabilistic computational processes, like those underlying current AI applications. Derivability in TPTND is interpreted as the process of extracting $n$ samples of possibly complex outputs with a certain frequency from a given categorical distribution. We formalize trust for such outputs as a form of hypothesis testing on the distance between such frequency and the intended probability. The main advantage of the calculus is to render such notion of trustworthiness checkable. We present a computational semantics for the terms over which we reason and then the semantics of TPTND, where logical operators as well as a Trust operator are defined through introduction and elimination rules. We illustrate structural and metatheoretical properties, with particular focus on the ability to establish under which term evolutions and logical rules applications the notion of trustworhtiness can be preserved.
Given the ubiquity of non-separable optimization problems in real worlds, in this paper we analyze and extend the large-scale version of the well-known cooperative coevolution (CC), a divide-and-conquer black-box optimization framework, on non-separable functions. First, we reveal empirical reasons of when decomposition-based methods are preferred or not in practice on some non-separable large-scale problems, which have not been clearly pointed out in many previous CC papers. Then, we formalize CC to a continuous-game model via simplification, but without losing its essential property. Different from previous evolutionary game theory for CC, our new model provides a much simpler but useful viewpoint to analyze its convergence, since only the pure Nash equilibrium concept is needed and more general fitness landscapes can be explicitly considered. Based on convergence analyses, we propose a hierarchical decomposition strategy for better generalization, as for any decomposition, there is a risk of getting trapped into a suboptimal Nash equilibrium. Finally, we use powerful distributed computing to accelerate it under the recent multi-level learning framework, which combines the fine-tuning ability from decomposition with the invariance property of CMA-ES. Experiments on a set of high-dimensional test functions validate both its search performance and scalability (w.r.t. CPU cores) on a clustering computing platform with 400 CPU cores.
We present a novel method for generating sequential parameter estimates and quantifying epistemic uncertainty in dynamical systems within a data-consistent (DC) framework. The DC framework differs from traditional Bayesian approaches due to the incorporation of the push-forward of an initial density, which performs selective regularization in parameter directions not informed by the data in the resulting updated density. This extends a previous study that included the linear Gaussian theory within the DC framework and introduced the maximal updated density (MUD) estimate as an alternative to both least squares and maximum a posterior (MAP) estimates. In this work, we introduce algorithms for operational settings of MUD estimation in real or near-real time where spatio-temporal datasets arrive in packets to provide updated estimates of parameters and identify potential parameter drift. Computational diagnostics within the DC framework prove critical for evaluating (1) the quality of the DC update and MUD estimate and (2) the detection of parameter value drift. The algorithms are applied to estimate (1) wind drag parameters in a high-fidelity storm surge model, (2) thermal diffusivity field for a heat conductivity problem, and (3) changing infection and incubation rates of an epidemiological model.
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. Here, regularized autoencoders provide a popular approach by learning the identity mapping on the set of normal examples, while trying to prevent good reconstruction on points outside of the manifold. Typically, this goal is implemented by controlling the capacity of the model, either directly by reducing the size of the bottleneck layer or implicitly by imposing some sparsity (or contraction) constraints on parts of the corresponding network. However, neither of these techniques does explicitly penalize the reconstruction of anomalous signals often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. Informally, our training objective regularizes the model to produce locally consistent reconstructions, while replacing irregularities by acting as a filter that removes anomalous patterns. To support this intuition, we perform a rigorous formal analysis of the proposed method and provide a number of interesting insights. In particular, we show that the resulting model resembles a non-linear orthogonal projection of partially corrupted images onto the submanifold of uncorrupted samples. On the other hand, we identify the orthogonal projection as an optimal solution for a number of regularized autoencoders including the contractive and denoising variants. We support our theoretical analysis by empirical evaluation of the resulting detection and localization performance of the proposed method. In particular, we achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to finetune on heavy transformer models. To this end, we propose to utilize prompt learning and mitigate the above two challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while only requiring less than 1% learnable parameters compared to training the entire model. We further explore the effect of different prompt configurations and analyze the robustness to missing modality. Extensive experiments are conducted to show the effectiveness of our prompt learning framework that improves the performance under various missing-modality cases, while alleviating the requirement of heavy model re-training. Code is available.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.