A countable structure is indivisible if for every coloring with finite range there is a monochromatic isomorphic subcopy of the structure. Each indivisible structure $\mathcal{S}$ naturally corresponds to an indivisibility problem $\mathsf{Ind}\ \mathcal{S}$, which outputs such a subcopy given a presentation and coloring. We investigate the Weihrauch complexity of the indivisibility problems for two structures: the rational numbers $\mathbb{Q}$ as a linear order, and the equivalence relation $\mathscr{E}$ with countably many equivalence classes each having countably many members. We separate the Weihrauch degrees of both $\mathsf{Ind}\ \mathbb{Q}$ and $\mathsf{Ind}\ \mathscr{E}$ from several benchmark problems, showing in particular that $\mathsf{C}_\mathbb{N} \vert_\mathrm{W} \mathsf{Ind}\ \mathbb{Q}$ and hence $\mathsf{Ind}\ \mathbb{Q}$ is strictly weaker than the problem of finding an interval in which some color is dense for a given coloring of $\mathbb{Q}$; and that the Weihrauch degree of $\mathsf{Ind}\ \mathscr{E}_k$ is strictly between those of $\mathsf{SRT}^2_k$ and $\mathsf{RT}^2_k$, where $\mathsf{Ind}\ \mathcal{S}_k$ is the restriction of $\mathsf{Ind}\ \mathcal{S}$ to $k$-colorings.
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
We propose a novel algorithm for the support estimation of partially known Gaussian graphical models that incorporates prior information about the underlying graph. In contrast to classical approaches that provide a point estimate based on a maximum likelihood or a maximum a posteriori criterion using (simple) priors on the precision matrix, we consider a prior on the graph and rely on annealed Langevin diffusion to generate samples from the posterior distribution. Since the Langevin sampler requires access to the score function of the underlying graph prior, we use graph neural networks to effectively estimate the score from a graph dataset (either available beforehand or generated from a known distribution). Numerical experiments demonstrate the benefits of our approach.
Mendelian randomization uses genetic variants as instrumental variables to make causal inferences about the effects of modifiable risk factors on diseases from observational data. One of the major challenges in Mendelian randomization is that many genetic variants are only modestly or even weakly associated with the risk factor of interest, a setting known as many weak instruments. Many existing methods, such as the popular inverse-variance weighted (IVW) method, could be biased when the instrument strength is weak. To address this issue, the debiased IVW (dIVW) estimator, which is shown to be robust to many weak instruments, was recently proposed. However, this estimator still has non-ignorable bias when the effective sample size is small. In this paper, we propose a modified debiased IVW (mdIVW) estimator by multiplying a modification factor to the original dIVW estimator. After this simple correction, we show that the bias of the mdIVW estimator converges to zero at a faster rate than that of the dIVW estimator under some regularity conditions. Moreover, the mdIVW estimator has smaller variance than the dIVW estimator.We further extend the proposed method to account for the presence of instrumental variable selection and balanced horizontal pleiotropy. We demonstrate the improvement of the mdIVW estimator over the dIVW estimator through extensive simulation studies and real data analysis.
Inferring parameters of a latent variable model can be a daunting task when the conditional distribution of the latent variables given the observed ones is intractable. Variational approaches prove to be computationally efficient but, possibly, lack theoretical guarantees on the estimates, while sampling based solutions are quite the opposite. Starting from already available variational approximations, we define a first Monte Carlo EM algorithm to obtain maximum likelihood estimators, focusing on the Poisson log-normal model which provides a generic framework for the analysis of multivariate count data. We then extend this algorithm to the case of a composite likelihood in order to be able to handle higher dimensional count data.
The expansion of a polytope is an important parameter for the analysis of the random walks on its graph. A conjecture of Mihai and Vazirani states that all $0/1$-polytopes have expansion at least 1. We show that the generalization to half-integral polytopes does not hold by constructing $d$-dimensional half-integral polytopes whose expansion decreases exponentially fast with $d$. We also prove that the expansion of half-integral zonotopes is uniformly bounded away from $0$. As an intermediate result, we show that half-integral zonotopes are always graphical.
Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and assess the putative influence of covariates on it. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used to answer research questions related to non-linear changes in cognitive aging: the linear mixed model (LMM) with a quadratic term, LMM with splines, the functional mixed model, the piecewise linear mixed model, and the sigmoidal mixed model. We first theoretically describe the models. Next, using data from two prospective cohorts with annual cognitive testing, we compared the interpretation of the models by investigating associations of education on cognitive change before death. Lastly, we performed a simulation study to empirically evaluate the models and provide practical recommendations. Except for the LMM-quadratic, the fit of all models was generally adequate to capture non-linearity of cognitive change and models were relatively robust. Although spline-based models have no interpretable nonlinearity parameters, their convergence was easier to achieve, and they allow graphical interpretation. In contrast, piecewise and sigmoidal models, with interpretable non-linear parameters, may require more data to achieve convergence.
Electromagnetic forming and perforations (EMFP) are complex and innovative high strain rate processes that involve electromagnetic-mechanical interactions for simultaneous metal forming and perforations. Instead of spending costly resources on repetitive experimental work, a properly designed numerical model can be effectively used for detailed analysis and characterization of the complex process. A coupled finite element (FE) model is considered for analyzing the multi-physics of the EMFP because of its robustness and improved accuracy. In this work, a detailed understanding of the process has been achieved by numerically simulating forming and perforations of Al6061-T6 tube for 12 holes and 36 holes with two different punches, i.e., pointed and concave punches using Ls-Dyna software. In order to shed light on EMFP physics, a comparison between experimental data and the formulated numerical simulation has been carried out to compare the average hole diameter and the number of perforated holes, for different types of punches and a range of discharge energies. The simulated results show acceptable agreement with experimental studies, with maximum deviations being less than or equal to 6%, which clearly illustrates the efficacy and capability of the developed coupled Multi-physics FE model.
Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute the source string efficiently, having a tradeoff between scalability and avoiding information loss. The scalable de Bruijn graphs come at the price of losing crucial overlap information. The complete overlap information is stored in overlap graphs using quadratic space. Hierarchical overlap graphs [IPL20] (HOG) overcome these limitations, avoiding information loss despite using linear space. After a series of suboptimal improvements, Khan and Park et al. simultaneously presented two optimal algorithms [CPM2021], where only the former was seemingly practical. We empirically analyze all the practical algorithms for computing HOG, where the optimal algorithm [CPM2021] outperforms the previous algorithms as expected, though at the expense of extra memory. However, it uses non-intuitive approach and non-trivial data structures. We present arguably the most intuitive algorithm, using only elementary arrays, which is also optimal. Our algorithm empirically proves even better for both time and memory over all the algorithms, highlighting its significance in both theory and practice. We further explore the applications of hierarchical overlap graphs to solve various forms of suffix-prefix queries on a set of strings. Loukides et al. [CPM2023] recently presented state-of-the-art algorithms for these queries. However, these algorithms require complex black-box data structures and are seemingly impractical. Our algorithms, despite failing to match the state-of-the-art algorithms theoretically, answer different queries ranging from 0.01-100 milliseconds for a data set having around a billion characters.
The scale function holds significant importance within the fluctuation theory of Levy processes, particularly in addressing exit problems. However, its definition is established through the Laplace transform, thereby lacking explicit representations in general. This paper introduces a novel series representation for this scale function, employing Laguerre polynomials to construct a uniformly convergent approximate sequence. Additionally, we derive statistical inference based on specific discrete observations, presenting estimators of scale functions that are asymptotically normal.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.