Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions, and suggest potential improvements.
The notion of Laplacian of a graph can be generalized to simplicial complexes and hypergraphs, and contains information on the topology of these structures. Even for a graph, the consideration of associated simplicial complexes is interesting to understand its shape. Whereas the Laplacian of a graph has a simple probabilistic interpretation as the generator of a continuous time Markov chain on the graph, things are not so direct when considering simplicial complexes. We define here new Markov chains on simplicial complexes. For a given order~$k$, the state space is the set of $k$-cycles that are chains of $k$-simplexes with null boundary. This new framework is a natural generalization of the canonical Markov chains on graphs. We show that the generator of our Markov chain is the upper Laplacian defined in the context of algebraic topology for discrete structure. We establish several key properties of this new process: in particular, when the number of vertices is finite, the Markov chain is positive recurrent. This result is not trivial, since the cycles can loop over themselves an unbounded number of times. We study the diffusive limits when the simplicial complexes under scrutiny are a sequence of ever refining triangulations of the flat torus. Using the analogy between singular and Hodge homologies, we express this limit as valued in the set of currents. The proof of tightness and the identification of the limiting martingale problem make use of the flat norm and carefully controls of the error terms in the convergence of the generator. Uniqueness of the solution to the martingale problem is left open. An application to hole detection is carried.
Characterized by an outer integral connected to an inner integral through a nonlinear function, nested integration is a challenging problem in various fields, such as engineering and mathematical finance. The available numerical methods for nested integration based on Monte Carlo (MC) methods can be prohibitively expensive owing to the error propagating from the inner to the outer integral. Attempts to enhance the efficiency of these approximations using the quasi-MC (QMC) or randomized QMC (rQMC) method have focused on either the inner or outer integral approximation. This work introduces a novel nested rQMC method that simultaneously addresses the approximation of the inner and outer integrals. This method leverages the unique nested integral structure to offer a more efficient approximation mechanism. By incorporating Owen's scrambling techniques, we address integrands exhibiting infinite variation in the Hardy--Krause sense, enabling theoretically sound error estimates. As the primary contribution, we derive asymptotic error bounds for the bias and variance of our estimator, along with the regularity conditions under which these bounds can be attained. In addition, we provide nearly optimal sample sizes for the rQMC approximations underlying the numerical implementation of the proposed method. Moreover, we indicate how to combine this method with importance sampling to remedy the measure concentration arising in the inner integral. We verify the estimator quality through numerical experiments in the context of expected information gain estimation. We compare the computational efficiency of the nested rQMC method against standard nested MC integration for two case studies: one in thermomechanics and the other in pharmacokinetics. These examples highlight the computational savings and enhanced applicability of the proposed approach.
First order shape optimization methods, in general, require a large number of iterations until they reach a locally optimal design. While higher order methods can significantly reduce the number of iterations, they exhibit only local convergence properties, necessitating a sufficiently close initial guess. In this work, we present an unregularized shape-Newton method and combine shape optimization with homotopy (or continuation) methods in order to allow for the use of higher order methods even if the initial design is far from a solution. The idea of homotopy methods is to continuously connect the problem of interest with a simpler problem and to follow the corresponding solution path by a predictor-corrector scheme. We use a shape-Newton method as a corrector and arbitrary order shape derivatives for the predictor. Moreover, we apply homotopy methods also to the case of multi-objective shape optimization to efficiently obtain well-distributed points on a Pareto front. Finally, our results are substantiated with a set of numerical experiments.
We study the properties of a family of distances between functions of a single variable. These distances are examples of integral probability metrics, and have been used previously for comparing probability measures on the line; special cases include the Earth Mover's Distance and the Kolmogorov Metric. We examine their properties for general signals, proving that they are robust to a broad class of deformations. We also establish corresponding robustness results for the induced sliced distances between multivariate functions. Finally, we establish error bounds for approximating the univariate metrics from finite samples, and prove that these approximations are robust to additive Gaussian noise. The results are illustrated in numerical experiments, which include comparisons with Wasserstein distances.
In reinsurance, Poisson and Negative binomial distributions are employed for modeling frequency. However, the incomplete data regarding reported incurred claims above a priority level presents challenges in estimation. This paper focuses on frequency estimation using Schnieper's framework for claim numbering. We demonstrate that Schnieper's model is consistent with a Poisson distribution for the total number of claims above a priority at each year of development, providing a robust basis for parameter estimation. Additionally, we explain how to build an alternative assumption based on a Negative binomial distribution, which yields similar results. The study includes a bootstrap procedure to manage uncertainty in parameter estimation and a case study comparing assumptions and evaluating the impact of the bootstrap approach.
Using dominating sets to separate vertices of graphs is a well-studied problem in the larger domain of identification problems. In such problems, the objective is to choose a suitable dominating set $C$ of a graph $G$ such that the neighbourhoods of all vertices of $G$ have distinct intersections with $C$. Such a dominating and separating set $C$ is often referred to as a \emph{code} in the literature. Depending on the types of dominating and separating sets used, various problems arise under various names in the literature. In this paper, we introduce a new problem in the same realm of identification problems whereby the code, called \emph{open-separating dominating code}, or \emph{OSD-code} for short, is a dominating set and uses open neighbourhoods for separating vertices. The paper studies the fundamental properties concerning the existence, hardness and minimality of OSD-codes. Due to the emergence of a close and yet difficult to establish relation of the OSD-codes with another well-studied code in the literature called open locating dominating codes, or OLD-codes for short, we compare the two on various graph families. Finally, we also provide an equivalent reformulation of the problem of finding OSD-codes of a graph as a covering problem in a suitable hypergraph and discuss the polyhedra associated with OSD-codes, again in relation to OLD-codes of some graph families already studied in this context.
We generalize McDiarmid's inequality for functions with bounded differences on a high probability set, using an extension argument. Those functions concentrate around their conditional expectations. We further extend the results to concentration in general metric spaces.
With the recent success of generative models in image and text, the evaluation of generative models has gained a lot of attention. Whereas most generative models are compared in terms of scalar values such as Frechet Inception Distance (FID) or Inception Score (IS), in the last years (Sajjadi et al., 2018) proposed a definition of precision-recall curve to characterize the closeness of two distributions. Since then, various approaches to precision and recall have seen the light (Kynkaanniemi et al., 2019; Naeem et al., 2020; Park & Kim, 2023). They center their attention on the extreme values of precision and recall, but apart from this fact, their ties are elusive. In this paper, we unify most of these approaches under the same umbrella, relying on the work of (Simon et al., 2019). Doing so, we were able not only to recover entire curves, but also to expose the sources of the accounted pitfalls of the concerned metrics. We also provide consistency results that go well beyond the ones presented in the corresponding literature. Last, we study the different behaviors of the curves obtained experimentally.
Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.