We consider a stationary Markov process that models certain queues with a bulk service of a fixed number $m$ of admitted customers. We find an integral expression of its transition probability function in terms of certain multi-orthogonal polynomials with respect to a system of distributions that contain measures supported on starlike subsets of the complex plane. We also study the corresponding steady state.
Spectral independence is a recently-developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal $O(n \log n)$ sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using $L^p$-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of $L^p$-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the $L^p$-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph $G(n,d/n)$, where the previously known algorithms run in time $n^{O(\log d)}$ or applied only to large $d$. We refine these algorithmic bounds significantly, and develop fast $n^{1+o(1)}$ algorithms based on Glauber dynamics that apply to all $d$, throughout the uniqueness regime.
We revisit the task of quantum state redistribution in the one-shot setting, and design a protocol for this task with communication cost in terms of a measure of distance from quantum Markov chains. More precisely, the distance is defined in terms of quantum max-relative entropy and quantum hypothesis testing entropy. Our result is the first to operationally connect quantum state redistribution and quantum Markov chains, and can be interpreted as an operational interpretation for a possible one-shot analogue of quantum conditional mutual information. The communication cost of our protocol is lower than all previously known ones and asymptotically achieves the well-known rate of quantum conditional mutual information. Thus, our work takes a step towards the important open question of near-optimal characterization of the one-shot quantum state redistribution.
The stability of an approximating sequence $(A_n)$ for an operator $A$ usually requires, besides invertibility of $A$, the invertibility of further operators, say $B, C, \dots$, that are well-associated to the sequence $(A_n)$. We study this set, $\{A,B,C,\dots\}$, of so-called stability indicators of $(A_n)$ and connect it to the asymptotics of $\|A_n\|$, $\|A_n^{-1}\|$ and $\kappa(A_n)=\|A_n\|\|A_n^{-1}\|$ as well as to spectral pollution by showing that $\limsup {\rm Spec}_\varepsilon A_n= {\rm Spec}_\varepsilon A\cup{\rm Spec}_\varepsilon B\cup{\rm Spec}_\varepsilon C\cup\dots$. We further specify, for each of $\|A_n\|$, $\|A_n^{-1}\|$, $\kappa(A_n)$ and ${\rm Spec}_\varepsilon A_n$, under which conditions even convergence applies.
Recurrent neural networks (RNNs) have yielded promising results for both recognizing objects in challenging conditions and modeling aspects of primate vision. However, the representational dynamics of recurrent computations remain poorly understood, especially in large-scale visual models. Here, we studied such dynamics in RNNs trained for object classification on MiniEcoset, a novel subset of ecoset. We report two main insights. First, upon inference, representations continued to evolve after correct classification, suggesting a lack of the notion of being ``done with classification''. Second, focusing on ``readout zones'' as a way to characterize the activation trajectories, we observe that misclassified representations exhibit activation patterns with lower L2 norm, and are positioned more peripherally in the readout zones. Such arrangements help the misclassified representations move into the correct zones as time progresses. Our findings generalize to networks with lateral and top-down connections, and include both additive and multiplicative interactions with the bottom-up sweep. The results therefore contribute to a general understanding of RNN dynamics in naturalistic tasks. We hope that the analysis framework will aid future investigations of other types of RNNs, including understanding of representational dynamics in primate vision.
This paper presents a new approach for training two-stage object detection ensemble models, more specifically, Faster R-CNN models to estimate uncertainty. We propose training one Region Proposal Network(RPN)~\cite{//doi.org/10.48550/arxiv.1506.01497} and multiple Fast R-CNN prediction heads is all you need to build a robust deep ensemble network for estimating uncertainty in object detection. We present this approach and provide experiments to show that this approach is much faster than the naive method of fully training all $n$ models in an ensemble. We also estimate the uncertainty by measuring this ensemble model's Expected Calibration Error (ECE). We then further compare the performance of this model with that of Gaussian YOLOv3, a variant of YOLOv3 that models uncertainty using predicted bounding box coordinates. The source code is released at \url{//github.com/Akola-Mbey-Denis/EfficientEnsemble}
We study the problem of adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under certain sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These functions are unknown to us and only few of them are non-zero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider the cases when $k$ is fixed and when $k=k_\varepsilon \to \infty$ and $k=o(d)$ as $\varepsilon \to 0$. In this work, we introduce an adaptive selection procedure that, under some model assumptions, identifies exactly all non-zero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the non-zero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.
The k-sample testing problem involves determining whether $k$ groups of data points are each drawn from the same distribution. The standard method for k-sample testing in biomedicine is Multivariate analysis of variance (MANOVA), despite that it depends on strong, and often unsuitable, parametric assumptions. Moreover, independence testing and k-sample testing are closely related, and several universally consistent high-dimensional independence tests such as distance correlation (Dcorr) and Hilbert-Schmidt-Independence-Criterion (Hsic) enjoy solid theoretical and empirical properties. In this paper, we prove that independence tests achieve universally consistent k-sample testing and that k-sample statistics such as Energy and Maximum Mean Discrepancy (MMD) are precisely equivalent to Dcorr. An empirical evaluation of nonparametric independence tests showed that they generally perform better than the popular MANOVA test, even in Gaussian distributed scenarios. The evaluation included several popular independence statistics and covered a comprehensive set of simulations. Additionally, the testing approach was extended to perform multiway and multilevel tests, which were demonstrated in a simulated study as well as a real-world fMRI brain scans with a set of attributes.
A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors, that can in turn be related to specific network inputs or components. There is however still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two of such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicate that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.
The generalized Golub-Kahan bidiagonalization has been used to solve saddle-point systems where the leading block is symmetric and positive definite. We extend this iterative method for the case where the symmetry condition no longer holds. We do so by relying on the known connection the algorithm has with the Conjugate Gradient method and following the line of reasoning that adapts the latter into the Full Orthogonalization Method. We propose appropriate stopping criteria based on the residual and an estimate of the energy norm for the error associated with the primal variable. Numerical comparison with GMRES highlights the advantages of our proposed strategy regarding its low memory requirements and the associated implications.
In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enable each agent to improve their task allocation strategy through reinforcement learning, while changing how much they explore the system in response to how optimal they believe their current strategy is, given their past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to 6.7% of the theoretical optimal within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems up to 100 agents with less than a 9% impact on the algorithms' performance.