Regularization of inverse problems is of paramount importance in computational imaging. The ability of neural networks to learn efficient image representations has been recently exploited to design powerful data-driven regularizers. While state-of-the-art plug-and-play methods rely on an implicit regularization provided by neural denoisers, alternative Bayesian approaches consider Maximum A Posteriori (MAP) estimation in the latent space of a generative model, thus with an explicit regularization. However, state-of-the-art deep generative models require a huge amount of training data compared to denoisers. Besides, their complexity hampers the optimization of the latent MAP. In this work, we propose to use compressive autoencoders for latent estimation. These networks, which can be seen as variational autoencoders with a flexible latent prior, are smaller and easier to train than state-of-the-art generative models. We then introduce the Variational Bayes Latent Estimation (VBLE) algorithm, which performs this estimation within the framework of variational inference. This allows for fast and easy (approximate) posterior sampling. Experimental results on image datasets BSD and FFHQ demonstrate that VBLE reaches similar performance than state-of-the-art plug-and-play methods, while being able to quantify uncertainties faster than other existing posterior sampling techniques.
The recent progress in artificial intelligence has led to an ever-increasing usage of images and videos by machine analysis algorithms, mainly neural networks. Nonetheless, compression, storage and transmission of media have traditionally been designed considering human beings as the viewers of the content. Recent research on image and video coding for machine analysis has progressed mainly in two almost orthogonal directions. The first is represented by end-to-end (E2E) learned codecs which, while offering high performance on image coding, are not yet on par with state-of-the-art conventional video codecs and lack interoperability. The second direction considers using the Versatile Video Coding (VVC) standard or any other conventional video codec (CVC) together with pre- and post-processing operations targeting machine analysis. While the CVC-based methods benefit from interoperability and broad hardware and software support, the machine task performance is often lower than the desired level, particularly in low bitrates. This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bj{\o}ntegaard Delta rate reduction over VVC for image and video data, respectively, when evaluated on multiple different datasets and machine vision tasks. To the best of our knowledge, this is the first research paper showing a hybrid video codec that outperforms VVC on multiple datasets and multiple machine vision tasks.
The interpretability of models has become a crucial issue in Machine Learning because of algorithmic decisions' growing impact on real-world applications. Tree ensemble methods, such as Random Forests or XgBoost, are powerful learning tools for classification tasks. However, while combining multiple trees may provide higher prediction quality than a single one, it sacrifices the interpretability property resulting in "black-box" models. In light of this, we aim to develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. First, given a target tree-ensemble model, we develop a hierarchical visualization tool based on a heatmap representation of the forest's feature use, considering the frequency of a feature and the level at which it is selected as an indicator of importance. Next, we propose a mixed-integer linear programming (MILP) formulation for constructing a single optimal multivariate tree that accurately mimics the target model predictions. The goal is to provide an interpretable surrogate model based on oblique hyperplane splits, which uses only the most relevant features according to the defined forest's importance indicators. The MILP model includes a penalty on feature selection based on their frequency in the forest to further induce sparsity of the splits. The natural formulation has been strengthened to improve the computational performance of {mixed-integer} software. Computational experience is carried out on benchmark datasets from the UCI repository using a state-of-the-art off-the-shelf solver. Results show that the proposed model is effective in yielding a shallow interpretable tree approximating the tree-ensemble decision function.
Multivariate spatio-temporal models are widely applicable, but specifying their structure is complicated and may inhibit wider use. We introduce the R package tinyVAST from two viewpoints: the software user and the statistician. From the user viewpoint, tinyVAST adapts a widely used formula interface to specify generalized additive models, and combines this with arguments to specify spatial and spatio-temporal interactions among variables. These interactions are specified using arrow notation (from structural equation models), or an extended arrow-and-lag notation that allows simultaneous, lagged, and recursive dependencies among variables over time. The user also specifies a spatial domain for areal (gridded), continuous (point-count), or stream-network data. From the statistician viewpoint, tinyVAST constructs sparse precision matrices representing multivariate spatio-temporal variation, and parameters are estimated by specifying a generalized linear mixed model (GLMM). This expressive interface encompasses vector autoregressive, empirical orthogonal functions, spatial factor analysis, and ARIMA models. To demonstrate, we fit to data from two survey platforms sampling corals, sponges, rockfishes, and flatfishes in the Gulf of Alaska and Aleutian Islands. We then compare eight alternative model structures using different assumptions about habitat drivers and survey detectability. Model selection suggests that towed-camera and bottom trawl gears have spatial variation in detectability but sample the same underlying density of flatfishes and rockfishes, and that rockfishes are positively associated with sponges while flatfishes are negatively associated with corals. We conclude that tinyVAST can be used to test complicated dependencies representing alternative structural assumptions for research and real-world policy evaluation.
While deep neural networks have achieved remarkable performance, data augmentation has emerged as a crucial strategy to mitigate overfitting and enhance network performance. These techniques hold particular significance in industrial manufacturing contexts. Recently, image mixing-based methods have been introduced, exhibiting improved performance on public benchmark datasets. However, their application to industrial tasks remains challenging. The manufacturing environment generates massive amounts of unlabeled data on a daily basis, with only a few instances of abnormal data occurrences. This leads to severe data imbalance. Thus, creating well-balanced datasets is not straightforward due to the high costs associated with labeling. Nonetheless, this is a crucial step for enhancing productivity. For this reason, we introduce ContextMix, a method tailored for industrial applications and benchmark datasets. ContextMix generates novel data by resizing entire images and integrating them into other images within the batch. This approach enables our method to learn discriminative features based on varying sizes from resized images and train informative secondary features for object recognition using occluded images. With the minimal additional computation cost of image resizing, ContextMix enhances performance compared to existing augmentation techniques. We evaluate its effectiveness across classification, detection, and segmentation tasks using various network architectures on public benchmark datasets. Our proposed method demonstrates improved results across a range of robustness tasks. Its efficacy in real industrial environments is particularly noteworthy, as demonstrated using the passive component dataset.
This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.
Full-reference image quality metrics (FR-IQMs) aim to measure the visual differences between a pair of reference and distorted images, with the goal of accurately predicting human judgments. However, existing FR-IQMs, including traditional ones like PSNR and SSIM and even perceptual ones such as HDR-VDP, LPIPS, and DISTS, still fall short in capturing the complexities and nuances of human perception. In this work, rather than devising a novel IQM model, we seek to improve upon the perceptual quality of existing FR-IQM methods. We achieve this by considering visual masking, an important characteristic of the human visual system that changes its sensitivity to distortions as a function of local image content. Specifically, for a given FR-IQM metric, we propose to predict a visual masking model that modulates reference and distorted images in a way that penalizes the visual errors based on their visibility. Since the ground truth visual masks are difficult to obtain, we demonstrate how they can be derived in a self-supervised manner solely based on mean opinion scores (MOS) collected from an FR-IQM dataset. Our approach results in enhanced FR-IQM metrics that are more in line with human prediction both visually and quantitatively.
The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work. In this paper, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
We use tropical algebras as platforms for a very efficient digital signature protocol. Security relies on computational hardness of factoring one-variable tropical polynomials; this problem is known to be NP-hard.
Using well-known mathematical problems for encryption is a widely used technique because they are computationally hard and provide security against potential attacks on the encryption method. The subset sum problem (SSP) can be defined as finding a subset of integers from a given set, whose sum is equal to a specified integer. The classic SSP has various variants, one of which is the multiple-subset problem (MSSP). In the MSSP, the goal is to select items from a given set and distribute them among multiple bins, en-suring that the capacity of each bin is not exceeded while maximizing the total weight of the selected items. This approach addresses a related problem with a different perspective. Here a related different kind of problem is approached: given a set of sets A={A1, A2..., An}, find an integer s for which every subset of the given sets is summed up to, if such an integer exists. The problem is NP-complete when considering it as a variant of SSP. However, there exists an algorithm that is relatively efficient for known pri-vate keys. This algorithm is based on dispensing non-relevant values of the potential sums. In this paper we present the encryption scheme based on MSSP and present its novel usage and implementation in communication.
The optimal one-sided parametric polynomial approximants of a circular arc are considered. More precisely, the approximant must be entirely in or out of the underlying circle of an arc. The natural restriction to an arc's approximants interpolating boundary points is assumed. However, the study of approximants, which additionally interpolate corresponding tangent directions and curvatures at the boundary of an arc, is also considered. Several low-degree polynomial approximants are studied in detail. When several solutions fulfilling the interpolation conditions exist, the optimal one is characterized, and a numerical algorithm for its construction is suggested. Theoretical results are demonstrated with several numerical examples and a comparison with general (i.e. non-one-sided) approximants are provided.