We develop a new, powerful method for counting elements in a {\em multiset}. As a first application, we use this algorithm to study the number of occurrences of patterns in a permutation. For patterns of length 3 there are two Wilf classes, and the general behaviour of these is reasonably well known. We slightly extend some of the known results in that case, and exhaustively study the case of patterns of length 4, about which little was previously known. For such patterns, there are seven Wilf classes, and based on extensive enumerations and careful series analysis, we conjecture the asymptotic behaviour for all classes. Finally, we investigate a proposal of Blitvi\'c and Steingr\'{\i}msson as to the range of a parameter for which a particular generating function formed from the occurrence sequences is itself a Stieltjes moment sequence.
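For concreteness, an occurrence of a pattern in a permutation is a subsequence whose entries are in the same relative order as the pattern. The following brute-force Python check (a minimal illustration of the quantity being counted, not the enumeration method developed in the paper) makes this precise:
\begin{verbatim}
from itertools import combinations

def occurrences(perm, pattern):
    """Count subsequences of perm in the same relative order as pattern."""
    k = len(pattern)
    pat_order = tuple(sorted(range(k), key=lambda i: pattern[i]))
    count = 0
    for sub in combinations(perm, k):
        if tuple(sorted(range(k), key=lambda i: sub[i])) == pat_order:
            count += 1
    return count

# The permutation 2 4 1 3 5 contains 3 occurrences of the pattern 123
# (the increasing triples 2 4 5, 2 3 5 and 1 3 5).
print(occurrences((2, 4, 1, 3, 5), (1, 2, 3)))
\end{verbatim}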
We develop the no-propagate algorithm for sampling the linear response of random dynamical systems, that is, non-uniformly hyperbolic deterministic systems perturbed by noise with smooth density. We first derive a Monte Carlo-type formula and then the algorithm, which differs from ensemble (stochastic gradient) algorithms, finite-element algorithms, and fast-response algorithms; it does not involve propagating vectors or covectors, and only the density of the noise is differentiated, so the formula does not suffer from gradient explosion, the curse of dimensionality, or non-hyperbolicity. We demonstrate our algorithm on a tent map perturbed by noise and on a chaotic neural network with 51 layers $\times$ 9 neurons. By itself, this algorithm approximates the linear response of non-hyperbolic deterministic systems, with an additional error proportional to the noise. We also discuss the potential of using this algorithm as part of a larger algorithm with smaller error.
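The abstract does not reproduce the formula, so the following is only a rough sketch of the general flavour under assumptions of our own: a generic likelihood-ratio (score-function) estimator of the parameter derivative of a stationary average, for a tent-type map with additive Gaussian noise, in which only the Gaussian transition density is differentiated and no vectors or covectors are propagated. It is not the paper's algorithm, and its variance grows as the noise shrinks.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
frac = lambda x: x - np.floor(x)
f = lambda x, s: s * (1.0 - abs(2.0 * frac(x) - 1.0))   # tent-type map, height s
df_ds = lambda x: 1.0 - abs(2.0 * frac(x) - 1.0)        # derivative of f in s

def linear_response_lr(s=0.9, sigma=0.05, n_steps=200_000, burn=1000, window=20):
    """Estimate d/ds E[x] for x_{n+1} = f(x_n, s) + sigma * xi_n, xi_n ~ N(0,1).

    Only the noise density is differentiated:
    d/ds log p(x_{n+1} | x_n) = (x_{n+1} - f(x_n, s)) / sigma**2 * df_ds(x_n).
    Contributions are truncated to the last `window` transitions, which is
    justified when correlations decay quickly (a generic score-function sketch).
    """
    x = 0.3
    scores = np.zeros(window)          # scores of the most recent transitions
    total, count = 0.0, 0
    for n in range(n_steps):
        x_new = f(x, s) + sigma * rng.standard_normal()
        score = (x_new - f(x, s)) / sigma**2 * df_ds(x)
        scores = np.roll(scores, 1)
        scores[0] = score
        x = x_new
        if n >= burn:
            total += x * scores.sum()  # observable Phi(x) = x
            count += 1
    return total / count

print(linear_response_lr())
\end{verbatim}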
Performing inference with spatial extremal dependence models can be computationally burdensome, since these models involve intractable and/or censored likelihoods. Building on recent advances in likelihood-free inference with neural Bayes estimators, that is, neural networks that approximate Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that encode censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference for spatial extremal dependence models. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when our novel estimators are used for inference with popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture process models. We also show that a single neural Bayes estimator can be trained for a general censoring level, obviating the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess extreme concentrations of particulate matter with diameter 2.5 microns or less (PM2.5) over the whole of Saudi Arabia.
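As a very rough illustration of the underlying idea (the toy model, loss, and architecture below are our own assumptions, not the paper's models or networks), a censoring-aware neural Bayes estimator can be sketched by feeding censored observations together with a censoring indicator into a network trained under absolute-error loss, so that it approximates the Bayes estimator (posterior median) of a dependence parameter:
\begin{verbatim}
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 16)
dmat = np.abs(grid[:, None] - grid[None, :])

def simulate(n, tau=0.8):
    """Gaussian vectors with correlation exp(-d/rho), censored below the tau-quantile."""
    rho = rng.uniform(0.05, 0.5, size=n)          # "prior" draws of the range parameter
    xs, inds = [], []
    for r in rho:
        z = rng.multivariate_normal(np.zeros(len(grid)), np.exp(-dmat / r))
        u = np.quantile(z, tau)                   # censoring threshold
        xs.append(np.where(z > u, z, u))          # censored observations
        inds.append((z > u).astype(np.float64))   # censoring indicator
    x = np.concatenate([np.array(xs), np.array(inds)], axis=1)
    return torch.tensor(x, dtype=torch.float32), torch.tensor(rho, dtype=torch.float32)

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):                          # train on freshly simulated batches
    x, rho = simulate(128)
    loss = (net(x).squeeze(1) - rho).abs().mean() # L1 loss -> posterior median
    opt.zero_grad()
    loss.backward()
    opt.step()

x_test, rho_test = simulate(1000)
with torch.no_grad():
    print((net(x_test).squeeze(1) - rho_test).abs().mean())  # test mean absolute error
\end{verbatim}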
The problem of finding a solution to the linear system $Ax = b$ with certain minimization properties arises in numerous scientific and engineering areas. In the era of big data, stochastic optimization algorithms have become increasingly significant owing to their scalability to problems of unprecedented size. This paper focuses on the problem of minimizing a strongly convex function subject to linear constraints. We consider the dual formulation of this problem and adopt stochastic coordinate descent to solve it. The proposed algorithmic framework, called fast stochastic dual coordinate descent, uses sampling matrices drawn from user-defined distributions to extract gradient information. Moreover, it employs Polyak's heavy ball momentum acceleration with adaptive parameters learned through the iterations, overcoming the limitation that the heavy ball momentum method requires prior knowledge of certain parameters, such as the singular values of a matrix. With these extensions, the framework recovers many well-known methods in this context, including the randomized sparse Kaczmarz method, the randomized regularized Kaczmarz method, the linearized Bregman iteration, and a variant of the conjugate gradient (CG) method. We prove that, for a strongly admissible objective function, the proposed method converges linearly in expectation. Numerical experiments are provided to confirm our results.
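For illustration, the classical randomized Kaczmarz iteration for $Ax=b$ with a fixed heavy-ball momentum term (a hand-picked $\beta$ rather than the adaptive parameters proposed in the paper, and only one simple member of the family of methods discussed above) can be sketched as follows:
\begin{verbatim}
import numpy as np

def kaczmarz_momentum(A, b, beta=0.3, n_iters=5000, seed=0):
    """Randomized Kaczmarz with heavy-ball momentum:
    x_{k+1} = x_k - (A_i x_k - b_i) / ||A_i||^2 * A_i + beta * (x_k - x_{k-1}),
    where row i is sampled with probability proportional to ||A_i||^2.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.einsum("ij,ij->i", A, A)
    probs = row_norms / row_norms.sum()
    x_prev = np.zeros(n)
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)
        residual = A[i] @ x - b[i]
        x_new = x - residual / row_norms[i] * A[i] + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x

# Consistent overdetermined toy system: the iterates approach the exact solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
print(np.linalg.norm(kaczmarz_momentum(A, b) - x_true))
\end{verbatim}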
We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $y$ of values to the edges of a graph $G = (U \cup V, E)$, the algorithms return an integral solution $Y$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $Y_f = 1$, and the variables $Y_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $Y_{e_1}, Y_{e_2}$ have strong negative-correlation properties, i.e., the expectation of $Y_{e_1} Y_{e_2}$ is significantly below $y_{e_1} y_{e_2}$. This algorithm is a refinement of a dependent-rounding algorithm of Im \& Shadloo (2020) based on simulation of Poisson processes. Our algorithm allows greater flexibility; in particular, it allows ``irregular'' fractional assignments, and it gives more refined bounds on the negative correlation. Dependent-rounding schemes with negative-correlation properties have been used in approximation algorithms for job scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, \& Svensson (2021); Im \& Shadloo (2020); Im \& Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.407$-approximation for this problem. This significantly improves on the prior $1.45$-approximation ratio of Im \& Li (2023).
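To fix the setting, the following naive baseline (our own illustration, not the paper's algorithm) rounds a fractional bipartite assignment so that each right-node keeps at most one edge while preserving the marginals $\mathbb{E}[Y_e] = y_e$; unlike the algorithm described above, it provides no negative correlation between edges sharing a left-node, which is precisely the extra property the paper establishes:
\begin{verbatim}
import random
from collections import defaultdict

def round_per_right_node(y, seed=0):
    """Naive rounding of a bipartite fractional assignment.

    y maps edges (u, v) -> fractional value, with sum_u y[(u, v)] <= 1 for each
    right-node v.  Independently for each right-node, at most one incident edge
    is selected, edge e with probability y[e].  This preserves E[Y_e] = y_e and
    the right-node degree constraint, but gives no negative correlation across
    edges sharing a left-node.
    """
    rng = random.Random(seed)
    by_right = defaultdict(list)
    for (u, v), val in y.items():
        by_right[v].append(((u, v), val))
    Y = {e: 0 for e in y}
    for v, edges in by_right.items():
        r = rng.random()
        acc = 0.0
        for e, val in edges:
            acc += val
            if r < acc:
                Y[e] = 1
                break
    return Y

y = {("u1", "v1"): 0.5, ("u2", "v1"): 0.5, ("u1", "v2"): 0.3, ("u2", "v2"): 0.4}
print(round_per_right_node(y))
\end{verbatim}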
Bayesian linear mixed-effects models and Bayesian ANOVA are increasingly used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, it is often not clear how to specify the data and the model correctly. In Bayesian approaches, many authors aggregate the data at the by-subject level and estimate Bayes factors on the aggregated data. Here, we use simulation-based calibration for model inference, applied to several example experimental designs, to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., the sphericity assumption is violated), Bayes factors are too conservative for contrasts where the variance is small and too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can likewise lead to biased Bayes factor results if the sphericity assumption is violated. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian linear mixed-effects models on non-aggregated data, such as individual trials, and by explicitly modeling the full random-effects structure. Reproducible code is available from \url{//osf.io/mjf47/}.
We show that any application of the technique of unbiased simulation becomes perfect simulation when coalescence of the two coupled Markov chains can be practically assured in advance. This happens when a fixed number of iterations is high enough that the probability of needing any more to achieve coalescence is negligible; we suggest a value of $10^{-20}$. This finding enormously increases the range of problems for which perfect simulation, which exactly follows the target distribution, can be implemented. We design a new algorithm that makes practical use of the high number of iterations by producing extra perfect sample points with little extra computational effort, at the cost of a small, controllable amount of serial correlation within sample sets of about 20 points. Different sample sets remain completely independent. The algorithm includes maximal coupling for continuous processes, to bring together chains that are already close. We illustrate the methodology on a simple two-state Markov chain and on standard normal distributions in up to 20 dimensions. Our technical formulation involves a nonzero probability, which can be made arbitrarily small, that a single perfect sample point may have its place taken by a ``string'' of many points which are assigned weights, each equal to $\pm 1$, that sum to~$1$. A point with a weight of $-1$ is a ``hole'': an object that can be cancelled by an equivalent point with the same value but opposite weight $+1$.
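One ingredient mentioned above, maximal coupling of two continuous distributions, admits a short generic implementation (the standard construction for sampling a pair $(X,Y)$ with the largest possible meeting probability; a sketch of this building block only, not of the paper's full algorithm, and assuming SciPy-style frozen distributions):
\begin{verbatim}
import numpy as np
from scipy.stats import norm

def maximal_coupling(p, q, rng):
    """Sample (X, Y) with X ~ p, Y ~ q and P(X = Y) = 1 - TV(p, q).

    p and q are scipy.stats frozen distributions with .rvs and .pdf.
    Accept X as Y with probability min(1, q(X)/p(X)); otherwise sample Y
    from the part of q not overlapping p, by rejection.
    """
    x = p.rvs(random_state=rng)
    if rng.uniform(0.0, p.pdf(x)) <= q.pdf(x):
        return x, x                     # the two chains meet at this draw
    while True:                         # sample Y from the residual of q
        y = q.rvs(random_state=rng)
        if rng.uniform(0.0, q.pdf(y)) > p.pdf(y):
            return x, y

rng = np.random.default_rng(0)
p, q = norm(0.0, 1.0), norm(0.5, 1.0)
draws = [maximal_coupling(p, q, rng) for _ in range(10_000)]
meet_rate = np.mean([x == y for x, y in draws])
# The meeting rate should be close to 1 - TV(p, q) = 2*Phi(-0.25), about 0.80.
print(meet_rate)
\end{verbatim}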
We describe a `discretize-then-relax' strategy to globally minimize integral functionals over functions $u$ in a Sobolev space satisfying prescribed Dirichlet boundary conditions. The strategy applies whenever the integral functional depends polynomially on $u$ and its derivatives, even if it is nonconvex. The `discretize' step uses a bounded finite-element scheme to approximate the integral minimization problem with a convergent hierarchy of polynomial optimization problems over a compact feasible set, indexed by the decreasing size $h$ of the finite-element mesh. The `relax' step employs sparse moment-SOS relaxations to approximate each polynomial optimization problem with a hierarchy of convex semidefinite programs, indexed by an increasing relaxation order $\omega$. We prove that, as $\omega\to\infty$ and $h\to 0$, solutions of such semidefinite programs provide approximate minimizers that converge in $L^p$ to the global minimizer of the original integral functional, provided the latter is unique. We also report computational experiments showing that our numerical strategy works well even when the technical conditions required by our theoretical analysis are not satisfied.
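As a toy illustration of the `discretize' step (an example of our own, not taken from the paper), consider a nonconvex functional of $u$ and $u'$ on $(0,1)$ with $u(0)=u(1)=0$. For piecewise-linear finite elements on a uniform mesh of size $h = 1/N$ with nodal values $u_0,\dots,u_N$, the integral evaluates exactly to a polynomial in those values, so the discretized problem is a polynomial optimization problem to which the sparse moment-SOS hierarchy can be applied:
\[
\min_{u}\; \int_0^1 \Big( \frac14 (u')^4 - \frac12 u^2 \Big)\,\mathrm{d}x
\quad\longrightarrow\quad
\min_{u_1,\dots,u_{N-1}}\; \sum_{i=0}^{N-1} \Big[ \frac{h}{4}\Big(\frac{u_{i+1}-u_i}{h}\Big)^{4} - \frac{h}{6}\big(u_i^2 + u_i u_{i+1} + u_{i+1}^2\big) \Big],
\qquad u_0 = u_N = 0,
\]
where the derivative term is exact because $u'$ is constant on each element, and $\int_{x_i}^{x_{i+1}} u^2\,\mathrm{d}x = \frac{h}{3}(u_i^2 + u_i u_{i+1} + u_{i+1}^2)$ for linear $u$.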
Given a property (graph class) $\Pi$, a graph $G$, and an integer $k$, the \emph{$\Pi$-completion} problem consists in deciding whether we can turn $G$ into a graph with the property $\Pi$ by adding at most $k$ edges to $G$. The $\Pi$-completion problem is known to be NP-hard for general graphs when $\Pi$ is the property of being a proper interval graph (PIG). In this work, we study the PIG-completion problem within different subclasses of chordal graphs. We show that the problem remains NP-complete even when restricted to split graphs. We then turn our attention to positive results and present polynomial-time algorithms to solve the PIG-completion problem when the input is restricted to caterpillar and threshold graphs. We also present an efficient algorithm for minimum co-bipartite completion of quasi-threshold graphs, which provides a lower bound for the PIG-completion problem within this graph class.
Blumer et al. (1987, 1989) showed that any concept class that is learnable by Occam algorithms is PAC learnable. Board and Pitt (1990) showed a partial converse of this theorem: for concept classes that are closed under exception lists, any class that is PAC learnable is learnable by an Occam algorithm. However, their Occam algorithm outputs a hypothesis whose complexity is $\delta$-dependent, which is an important limitation. In this paper, we show that their partial converse applies to Occam algorithms with $\delta$-independent complexities as well. Thus, we provide a posteriori justification of various theoretical results and algorithm design methods which use the partial converse as a basis for their work.
This paper studies the performance of the amplitude-based model $\widehat{\mathbf x} \in {\rm argmin}_{{\mathbf x}\in \mathbb{C}^d}\sum_{j=1}^m\left(|\langle {\mathbf a}_j,{\mathbf x}\rangle|-b_j\right)^2$, where $b_j:=|\langle {\mathbf a}_j,{\mathbf x}_0\rangle|+\eta_j$ and ${\mathbf x}_0\in \mathbb{C}^d$ is a target signal. The model arises in phase retrieval as well as in absolute-value rectification neural networks. Many efficient algorithms have been developed to solve it in the past decades. However, very few results are available regarding its estimation performance in the complex case under noisy conditions. In this paper, we present a theoretical guarantee for the amplitude-based model for the noisy complex phase retrieval problem. Specifically, we show that $\min_{\theta\in[0,2\pi)}\|\widehat{\mathbf x}-\exp(\mathrm{i}\theta)\cdot{\mathbf x}_0\|_2 \lesssim \frac{\|\boldsymbol{\eta}\|_2}{\sqrt{m}}$ holds with high probability, provided the measurement vectors ${\mathbf a}_j\in \mathbb{C}^d$, $j=1,\ldots,m$, are i.i.d. complex sub-Gaussian random vectors and $m\gtrsim d$. Here $\boldsymbol{\eta}=(\eta_1,\ldots,\eta_m)\in \mathbb{R}^m$ is the noise vector, without any assumption on its distribution. Furthermore, we prove that this reconstruction error bound is sharp. For the case where the target signal ${\mathbf x}_0\in \mathbb{C}^{d}$ is sparse, we establish a similar result for the nonlinear constrained $\ell_1$ minimization model. To accomplish this, we leverage a strong version of the restricted isometry property for an operator on the space of simultaneously low-rank and sparse matrices.
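For orientation only, one of the many algorithms alluded to above for this model is a gradient-type (`amplitude flow') iteration on the loss combined with a spectral initialization; the sketch below (with hand-chosen step size and iteration count, and not a method introduced in this paper) solves a noiseless toy instance and reports the error up to a global phase, the quantity appearing in the bound above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def amplitude_flow(A, b, n_iters=500, step=0.6):
    """Gradient-type iteration for f(x) = (1/2m) sum_j (|<a_j, x>| - b_j)^2, x complex."""
    m, d = A.shape
    # Crude spectral initialization: leading eigenvector of the b^2-weighted
    # covariance of the measurement vectors, scaled by an estimate of ||x0||.
    Y = (A.conj().T * b**2) @ A / m
    _, V = np.linalg.eigh(Y)
    x = V[:, -1] * np.sqrt(np.mean(b**2))
    for _ in range(n_iters):
        z = A @ x
        x = x - step * (A.conj().T @ (z - b * z / np.abs(z))) / m
    return x

# Noiseless toy instance with complex Gaussian measurements.
m, d = 400, 20
A = (rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))) / np.sqrt(2)
x0 = rng.standard_normal(d) + 1j * rng.standard_normal(d)
b = np.abs(A @ x0)
x_hat = amplitude_flow(A, b)
# Relative error up to the global phase ambiguity.
phase = np.exp(-1j * np.angle(np.vdot(x_hat, x0)))
print(np.linalg.norm(x_hat - phase * x0) / np.linalg.norm(x0))
\end{verbatim}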