We develop a theory for the representation of opaque solids as volumetric models. Starting from a stochastic representation of opaque solids as random indicator functions, we prove the conditions under which such solids can be modeled using exponential volumetric transport. We also derive expressions for the volumetric attenuation coefficient as a functional of the probability distributions of the underlying indicator functions. We generalize our theory to account for isotropic and anisotropic scattering at different parts of the solid, and for representations of opaque solids as implicit surfaces. We derive our volumetric representation from first principles, which ensures that it satisfies physical constraints such as reciprocity and reversibility. We use our theory to explain, compare, and correct previous volumetric representations, as well as propose meaningful extensions that lead to improved performance in 3D reconstruction tasks.
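To fix what "exponential volumetric transport" refers to here, the following is a minimal sketch of the standard exponential transmittance model; the functional that links the attenuation coefficient to the distribution of the random indicator function is developed in the paper and is not reproduced here.

```latex
% Standard exponential transmittance along a ray x(s) = o + s\,\omega:
\[
  T(t) \;=\; \exp\!\Bigl(-\int_0^{t} \sigma\bigl(x(s)\bigr)\,\mathrm{d}s\Bigr),
\]
% where \sigma is the volumetric attenuation coefficient. The paper establishes
% when an opaque solid, modeled as a random indicator function, admits this
% exponential form and how \sigma depends on the indicator's distribution.
```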
In this paper we consider various models of pattern recognition, representing each model as a pair of operators: a recognizing operator and a decision rule. Algebraic operations are introduced on the recognizing operators, and applying these operations generates a family of recognizing algorithms. An upper estimate is constructed for the model, which guarantees the completeness of the extension.
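As an illustrative reading of the two-operator structure (a generic sketch, not the paper's construction; the score representation and the thresholding rule below are assumptions): a recognizing operator maps objects to a matrix of class scores, a decision rule converts scores to class memberships, and algebraic operations such as weighted sums of recognizing operators generate a family of algorithms.

```python
import numpy as np

# A recognizing operator maps a batch of objects to a score matrix of shape
# (n_objects, n_classes); here it is modelled simply as a callable.

def linear_combination(operators, weights):
    """Algebraic operation on recognizing operators: a weighted sum,
    which is itself a recognizing operator."""
    def combined(objects):
        return sum(w * op(objects) for w, op in zip(weights, operators))
    return combined

def threshold_decision_rule(scores, threshold=0.0):
    """Decision rule: convert real-valued scores to 0/1 class memberships."""
    return (scores >= threshold).astype(int)

def make_algorithm(operator, decision_rule=threshold_decision_rule):
    """A recognition algorithm is the composition: decision rule after operator."""
    return lambda objects: decision_rule(operator(objects))

# Toy example: two recognizing operators and one member of the generated family.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                          # 4 objects, 3 features
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
op1, op2 = (lambda o: o @ W1), (lambda o: o @ W2)
algorithm = make_algorithm(linear_combination([op1, op2], [0.7, 0.3]))
print(algorithm(X))                                  # 0/1 memberships for 2 classes
```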
We develop new multilevel Monte Carlo (MLMC) methods to estimate the expectation of the smallest eigenvalue of a stochastic convection-diffusion operator with random coefficients. The MLMC method is based on a sequence of finite element (FE) discretizations of the eigenvalue problem on a hierarchy of increasingly finer meshes. For the discretized, algebraic eigenproblems we use both the Rayleigh quotient (RQ) iteration and implicitly restarted Arnoldi (IRA), providing an analysis of the cost in each case. By studying the variance on each level and adapting classical FE error bounds to the stochastic setting, we are able to bound the total error of our MLMC estimator and provide a complexity analysis. As expected, the complexity bound for our MLMC estimator is superior to plain Monte Carlo. To improve the efficiency of the MLMC further, we exploit the hierarchy of meshes and use coarser approximations as starting values for the eigensolvers on finer ones. To improve the stability of the MLMC method for convection-dominated problems, we employ two additional strategies. First, we consider the streamline upwind Petrov--Galerkin formulation of the discrete eigenvalue problem, which allows us to start the MLMC method on coarser meshes than is possible with standard FEs. Second, we apply a homotopy method to add stability to the eigensolver for each sample. Finally, we present a multilevel quasi-Monte Carlo method that replaces Monte Carlo with a quasi-Monte Carlo (QMC) rule on each level. Due to the faster convergence of QMC, this improves the overall complexity. We provide detailed numerical results comparing our different strategies to demonstrate the practical feasibility of the MLMC method in different use cases. The results support our complexity analysis and further demonstrate the superiority over plain Monte Carlo in all cases.
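To make the estimator structure concrete, here is a minimal generic MLMC sketch (not the paper's implementation): the level-l correction E[Q_l - Q_{l-1}] is averaged over coupled samples that share the same random input on both levels. In the eigenvalue setting, `quantity_of_interest` would return the smallest eigenvalue of the FE discretization on the given mesh level; here it is a toy stand-in so the sketch runs, and the per-level sample sizes are placeholders.

```python
import numpy as np

def quantity_of_interest(sample, level):
    """Placeholder: in the eigenvalue setting this would assemble the FE
    discretization of the convection-diffusion operator on mesh `level`
    for the coefficient realization `sample` and return its smallest
    eigenvalue (e.g. via Rayleigh quotient iteration or Arnoldi)."""
    # Toy stand-in with a level-dependent discretization error.
    return np.sin(sample) + 2.0 ** (-2 * level) * np.cos(3 * sample)

def mlmc_estimate(n_samples_per_level, rng=np.random.default_rng(0)):
    """Multilevel Monte Carlo estimator of E[Q_L]:
       E[Q_0] + sum_{l=1..L} E[Q_l - Q_{l-1}],
    with each correction estimated from coupled samples (the same random
    input is used on the fine and the coarse level)."""
    estimate = 0.0
    for level, n_l in enumerate(n_samples_per_level):
        samples = rng.normal(size=n_l)               # random coefficient input
        fine = quantity_of_interest(samples, level)
        if level == 0:
            corrections = fine
        else:
            coarse = quantity_of_interest(samples, level - 1)
            corrections = fine - coarse              # coupled level-l correction
        estimate += corrections.mean()
    return estimate

print(mlmc_estimate([1000, 400, 100, 25]))
```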
The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance our understanding of these models' capabilities. We propose a novel VQA benchmark, based on well-known visual classification datasets, that allows a granular evaluation of text-generative vision-language models and their comparison with discriminative vision-language models. To improve the assessment of coarse answers on fine-grained classification tasks, we suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category. Finally, we compare traditional NLP and LLM-based metrics for the problem of evaluating model predictions given ground-truth answers. We perform a human evaluation study on which we base our choice of the final metric. We apply our benchmark to a suite of vision-language models and show a detailed comparison of their abilities on object, action, and attribute classification. Our contributions aim to lay the foundation for more precise and meaningful assessments, facilitating targeted progress in the exciting field of vision-language modeling.
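A sketch of the follow-up idea (illustrative only; the label hierarchy and question template below are hypothetical, not taken from the benchmark): when a model answers with an ancestor of the ground-truth category, a more specific follow-up question is generated instead of scoring the coarse answer as wrong.

```python
# Hypothetical label hierarchy: child -> parent.
PARENT = {"beagle": "dog", "dog": "animal", "tabby": "cat", "cat": "animal"}

def ancestors(label):
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain

def follow_up_question(ground_truth, model_answer):
    """If the answer is correct but coarser than the ground truth,
    ask a more specific question instead of marking it incorrect."""
    if model_answer == ground_truth:
        return None                       # exact match, no follow-up needed
    if model_answer in ancestors(ground_truth):
        return f"What kind of {model_answer} is shown in the image?"
    return None                           # off the ground-truth path: incorrect

print(follow_up_question("beagle", "dog"))   # "What kind of dog is shown in the image?"
print(follow_up_question("beagle", "cat"))   # None
```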
Multi-sequence magnetic resonance imaging (MRI) has found wide applications in both modern clinical studies and deep learning research. However, in clinical practice, it frequently occurs that one or more of the MRI sequences are missing due to different image acquisition protocols or contrast agent contraindications of patients, limiting the utilization of deep learning models trained on multi-sequence data. One promising approach is to leverage generative models to synthesize the missing sequences, which can serve as a surrogate acquisition. State-of-the-art methods tackling this problem are based on convolutional neural networks (CNN) which usually suffer from spectral biases, resulting in poor reconstruction of high-frequency fine details. In this paper, we propose Conditional Neural fields with Shift modulation (CoNeS), a model that takes voxel coordinates as input and learns a representation of the target images for multi-sequence MRI translation. The proposed model uses a multi-layer perceptron (MLP) instead of a CNN as the decoder for pixel-to-pixel mapping. Hence, each target image is represented as a neural field that is conditioned on the source image via shift modulation with a learned latent code. Experiments on BraTS 2018 and an in-house clinical dataset of vestibular schwannoma patients showed that the proposed method outperformed state-of-the-art methods for multi-sequence MRI translation both visually and quantitatively. Moreover, we conducted spectral analysis, showing that CoNeS was able to overcome the spectral bias issue common in conventional CNN models. To further evaluate the usage of synthesized images in clinical downstream tasks, we tested a segmentation network using the synthesized images at inference.
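A minimal sketch of shift modulation as described (PyTorch-style; the layer widths, module name, and the way the latent code is obtained from the source image are assumptions): a per-image latent code is mapped to per-layer shift vectors that are added to the hidden activations of a coordinate MLP.

```python
import torch
import torch.nn as nn

class ShiftModulatedMLP(nn.Module):
    """Coordinate MLP whose hidden activations are shifted by vectors
    predicted from a per-image latent code (shift modulation)."""
    def __init__(self, coord_dim=2, hidden_dim=256, n_layers=4, latent_dim=128, out_dim=1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(coord_dim if i == 0 else hidden_dim, hidden_dim) for i in range(n_layers)]
        )
        # One shift vector per hidden layer, predicted from the latent code.
        self.to_shifts = nn.Linear(latent_dim, n_layers * hidden_dim)
        self.out = nn.Linear(hidden_dim, out_dim)
        self.n_layers, self.hidden_dim = n_layers, hidden_dim

    def forward(self, coords, latent):
        # coords: (N, coord_dim); latent: (latent_dim,) for one source image
        shifts = self.to_shifts(latent).view(self.n_layers, self.hidden_dim)
        h = coords
        for i, layer in enumerate(self.layers):
            h = torch.relu(layer(h) + shifts[i])     # shift modulation
        return self.out(h)

# In the full model the latent code would come from an encoder over the
# source sequence; here it is random for illustration.
model = ShiftModulatedMLP()
pred = model(torch.rand(1024, 2), torch.randn(128))  # 1024 voxel coordinates
print(pred.shape)                                    # torch.Size([1024, 1])
```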
We prove explicit uniform two-sided bounds for the phase functions of Bessel functions and of their derivatives. As a consequence, we obtain new enclosures for the zeros of Bessel functions and their derivatives in terms of inverse values of some elementary functions. These bounds are valid, with a few exceptions, for all zeros and all Bessel functions with non-negative indices. We provide numerical evidence showing that our bounds either improve or closely match the best previously known ones.
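For orientation, a standard modulus-phase convention shows how two-sided elementary bounds on the phase translate into enclosures for the zeros; whether the paper uses exactly this normalization of the phase is an assumption.

```latex
% Modulus-phase form: J_\nu(x) = M_\nu(x)\cos\theta_\nu(x), with \theta_\nu increasing.
% Zeros of J_\nu occur where the cosine vanishes, i.e. (under a suitable
% normalization of the phase) \theta_\nu(j_{\nu,k}) = (k - \tfrac12)\pi.
% Hence, for increasing elementary bounds f(x) \le \theta_\nu(x) \le g(x),
\[
  g^{-1}\!\bigl((k-\tfrac12)\pi\bigr) \;\le\; j_{\nu,k} \;\le\; f^{-1}\!\bigl((k-\tfrac12)\pi\bigr).
\]
```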
This paper presents a procedure to add broader diversity at the beginning of the evolutionary process. It consists of creating two initial populations with different parameter settings, evolving them for a small number of generations, selecting the best individuals from each population in equal proportion, and combining them to constitute a new initial population; the main loop of an evolutionary algorithm is then applied to this new population. We have carried out our experimentation on twelve data sets from the UCI repository and two complex real-world problems, which differ in their number of instances, features, and classes. The results show that our proposal considerably improves the efficiency of previous methodologies and, in most of the data sets, also significantly improves their efficacy.
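The procedure can be summarized with a small self-contained sketch (the toy fitness function, population sizes, and parameter settings are illustrative, not those of the paper): two populations created with different parameter settings are evolved briefly, the best individuals of each are taken in equal proportion, and the merged population seeds the main evolutionary loop.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10
fitness = lambda x: -np.sum(x**2, axis=1)        # toy problem: maximize -||x||^2

def init_population(size, spread):
    return rng.normal(scale=spread, size=(size, DIM))

def evolve(pop, generations, mutation_scale):
    """Minimal (mu+lambda)-style loop: mutate everyone, keep the best half."""
    for _ in range(generations):
        offspring = pop + rng.normal(scale=mutation_scale, size=pop.shape)
        combined = np.vstack([pop, offspring])
        pop = combined[np.argsort(fitness(combined))[::-1][: len(pop)]]
    return pop

# 1) Two initial populations with different parameter settings, evolved briefly.
pop_a = evolve(init_population(50, spread=1.0), generations=5, mutation_scale=0.5)
pop_b = evolve(init_population(50, spread=3.0), generations=5, mutation_scale=1.5)

# 2) Best individuals from each, in the same proportion, form the new initial population.
best = lambda pop, k: pop[np.argsort(fitness(pop))[::-1][:k]]
seed_population = np.vstack([best(pop_a, 25), best(pop_b, 25)])

# 3) Main evolutionary loop applied to the combined population.
final_pop = evolve(seed_population, generations=100, mutation_scale=0.3)
print("best fitness:", fitness(final_pop).max())
```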
Complex interval arithmetic is a powerful tool for the analysis of computational errors. The naturally arising rectangular, polar, and circular (together called primitive) interval types are not closed under simple arithmetic operations, and their use yields overly relaxed bounds. The later-introduced polygonal type, on the other hand, allows for arbitrarily precise representation of the above operations at a higher computational cost. We propose the polyarcular interval type as an effective extension of the previous types. The polyarcular interval can represent all primitive intervals and most of their arithmetic combinations precisely, and has an approximation capability competing with that of the polygonal interval. In particular, in antenna tolerance analysis it can achieve perfect accuracy at lower computational cost than the polygonal type, which we show in a relevant case study. In this paper, we present a rigorous analysis of the arithmetic properties of all five interval types, involving a new algebro-geometric method of boundary analysis.
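To illustrate why the primitive types are not closed and give relaxed bounds, here is a minimal sketch using the rectangular type (not the paper's implementation): the exact product of two rectangular complex intervals is not a rectangle, so rectangular arithmetic returns a rectangular hull that over-approximates the true set.

```python
from dataclasses import dataclass

def _mul(a, b):
    """Real interval multiplication: hull of the endpoint products."""
    products = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(products), max(products))

def _sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def _add(a, b):
    return (a[0] + b[0], a[1] + b[1])

@dataclass
class RectInterval:
    re: tuple   # (lo, hi) of the real part
    im: tuple   # (lo, hi) of the imaginary part

    def __mul__(self, other):
        # (a+bi)(c+di) = (ac - bd) + (ad + bc)i, evaluated in real interval
        # arithmetic; the result is a rectangle enclosing the true product
        # set, which is generally larger than the product set itself.
        return RectInterval(
            re=_sub(_mul(self.re, other.re), _mul(self.im, other.im)),
            im=_add(_mul(self.re, other.im), _mul(self.im, other.re)),
        )

z = RectInterval(re=(1.0, 2.0), im=(1.0, 2.0))
w = RectInterval(re=(0.0, 1.0), im=(0.0, 1.0))
print(z * w)   # rectangular over-approximation of {zw : z in z-box, w in w-box}
```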
We present a generic framework for gradient reconstruction schemes on unstructured meshes using the notion of a dyadic sum-vector product. The proposed formulation reconstructs centroidal gradients of a scalar from its directional derivatives along specific directions in a suitably defined neighbourhood. We show that existing gradient reconstruction schemes can be encompassed within this framework by a suitable choice of the geometric vectors that define the dyadic sum tensor. The proposed framework also allows us to re-interpret certain hybrid schemes, which might not be derivable through traditional routes. Additionally, a generalization of flexible gradient schemes is proposed that can be employed to enhance the robustness of consistent gradient schemes without compromising on the accuracy of the computed gradients.
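As one concrete instance of such a scheme, here is a sketch of the familiar weighted least-squares gradient reconstruction (the weights and the neighbourhood choice are illustrative assumptions, not prescriptions of the paper): the dyadic sum tensor is assembled from the neighbour direction vectors and solved against the accumulated directional differences.

```python
import numpy as np

def reconstruct_gradient(x0, phi0, neighbors, phi_neighbors, weights=None):
    """Reconstruct the centroidal gradient of a scalar phi at x0 from its
    values at neighbouring centroids.

    Solves  S g = b  with the dyadic sum tensor  S = sum_k w_k d_k d_k^T
    and     b = sum_k w_k (phi_k - phi0) d_k,   where d_k = x_k - x0.
    """
    d = neighbors - x0                            # (K, dim) direction vectors
    dphi = phi_neighbors - phi0                   # (K,) directional differences
    w = np.ones(len(d)) if weights is None else weights
    S = np.einsum("k,ki,kj->ij", w, d, d)         # dyadic sum tensor
    b = np.einsum("k,k,ki->i", w, dphi, d)
    return np.linalg.solve(S, b)

# Check on a linear field phi(x) = a . x, whose exact gradient is a.
rng = np.random.default_rng(1)
a = np.array([2.0, -1.0, 0.5])
x0 = np.zeros(3)
nbrs = rng.normal(size=(8, 3))
print(reconstruct_gradient(x0, a @ x0, nbrs, nbrs @ a))  # ~ [2.0, -1.0, 0.5]
```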
We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form of a pair of coupled partial differential equations: a forward-controlled Fokker-Planck (FP) equation, and a backward Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation. Second, with the mathematical structure at hand, we present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs in terms of training samples and training time. In addition, the WPO-informed kernel model is explicitly constructed to avoid the recently studied memorization effects of score-based generative models. The mathematical form of the new kernel-based models in combination with the use of the terminal condition of the MFG reveals new explanations for the manifold learning and generalization properties of SGMs, and provides a resolution to their memorization effects. Finally, our mathematically informed, interpretable kernel-based model suggests new scalable bespoke neural network architectures for high-dimensional applications.
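For the Cole-Hopf step, the classical mechanism is as follows (a textbook illustration only; the paper's HJB equation carries additional terms coupling it to the controlled FP equation and the cross-entropy functional):

```latex
% Classical Cole--Hopf linearization of a viscous HJB equation:
% for  \partial_t u + \tfrac12 |\nabla u|^2 = \nu\,\Delta u,
% the substitution  u = -2\nu \log\varphi  yields the linear (heat-type,
% uncontrolled Fokker--Planck-like) equation
\[
  \partial_t \varphi \;=\; \nu\, \Delta \varphi .
\]
```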
We provide a new theoretical framework for the variable-step deferred correction (DC) methods based on the well-known BDF2 formula. By using discrete orthogonal convolution kernels, some high-order BDF2-DC methods are proven to be stable on arbitrary time grids according to the recent definition of stability (SINUM, 60: 2253-2272). This significantly relaxes the existing step-ratio restrictions for the BDF2-DC methods (BIT, 62: 1789-1822). The associated sharp error estimates are established by taking the numerical effects of the starting approximations into account, and they suggest that the BDF2-DC methods have no aftereffect, that is, the lower-order scheme used to compute the starting values for the BDF2 scheme does not cause a loss of accuracy in the high-order BDF2-DC methods. Extensive tests on graded and random time meshes are presented to support the new theory.
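For reference, the variable-step BDF2 formula underlying the correction hierarchy, in its standard form with step ratio r_n (the DC corrections and the discrete orthogonal convolution kernels themselves are not reproduced here):

```latex
% Variable-step BDF2 with steps \tau_n = t_n - t_{n-1} and ratio r_n = \tau_n/\tau_{n-1}:
\[
  D_2 u^n \;=\; \frac{1+2r_n}{\tau_n(1+r_n)}\,\bigl(u^n - u^{n-1}\bigr)
           \;-\; \frac{r_n^2}{\tau_n(1+r_n)}\,\bigl(u^{n-1} - u^{n-2}\bigr)
  \;\approx\; u'(t_n).
\]
```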