We consider the online hitting set problem for the range space $\Sigma=(\cal X,\cal R)$, where the point set $\cal X$ is known beforehand, but the set $\cal R$ of geometric objects is not known in advance. Here, geometric objects arrive one by one, the objective is to maintain a hitting set of minimum cardinality by taking irrevocable decisions. In this paper, we have considered the problem when the objects are unit balls or unit hypercubes in $\mathbb{R}^d$, and the points from $\mathbb{Z}^d$ are used for hitting them. First, we consider the problem for objects (unit balls and unit hypercubes) in lower dimensions. We obtain $4$ and $8$-competitive deterministic online algorithms for hitting unit hypercubes in $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. On the other hand, we present $4$ and $14$-competitive deterministic online algorithms for hitting unit balls in $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Next, we consider the problem for objects (unit balls and unit hypercubes) in the higher dimension. For hitting unit hypercubes in $\mathbb{R}^d$, we present a $O(d^2)$-competitive randomized online algorithm for $d\geq 3$ and prove the competitive ratio of any deterministic algorithm for the problem is at least $d+1$ for any $d\in\mathbb{N}$. Then, for hitting unit balls in $\mathbb{R}^d$, we propose a $O(d^4)$-competitive deterministic algorithm, and for $d<4$, we establish that the competitive ratio of any deterministic algorithm is at least $d+1$.
This paper investigates Distributed Hypothesis testing (DHT), in which a source $\mathbf{X}$ is encoded given that side information $\mathbf{Y}$ is available at the decoder only. Based on the received coded data, the receiver aims to decide on the two hypotheses $H_0$ or $H_1$ related to the joint distribution of $\mathbf{X}$ and $\mathbf{Y}$. While most existing contributions in the literature on DHT consider i.i.d. assumptions, this paper assumes more generic, non-i.i.d., non-stationary, and non-ergodic sources models. It relies on information-spectrum tools to provide general formulas on the achievable Type-II error exponent under a constraint on the Type-I error. The achievability proof is based on a quantize-and-binning scheme. It is shown that with the quantize-and-binning approach, the error exponent boils down to a trade-off between a binning error and a decision error, as already observed for the i.i.d. sources. The last part of the paper provides error exponents for particular source models, \emph{e.g.}, Gaussian, stationary, and ergodic models.
In this paper, we put forward the model of zero-error distributed function compression system of two binary memoryless sources X and Y, where there are two encoders En1 and En2 and one decoder De, connected by two channels (En1, De) and (En2, De) with the capacity constraints C1 and C2, respectively. The encoder En1 can observe X or (X,Y) and the encoder En2 can observe Y or (X,Y) according to the two switches s1 and s2 open or closed (corresponding to taking values 0 or 1). The decoder De is required to compress the binary arithmetic sum f(X,Y)=X+Y with zero error by using the system multiple times. We use (s1s2;C1,C2;f) to denote the model in which it is assumed that C1 \geq C2 by symmetry. The compression capacity for the model is defined as the maximum average number of times that the function f can be compressed with zero error for one use of the system, which measures the efficiency of using the system. We fully characterize the compression capacities for all the four cases of the model (s1s2;C1,C2;f) for s1s2= 00,01,10,11. Here, the characterization of the compression capacity for the case (01;C1,C2;f) with C1>C2 is highly nontrivial, where a novel graph coloring approach is developed. Furthermore, we apply the compression capacity for (01;C1,C2;f) to an open problem in network function computation that whether the best known upper bound of Guang et al. on computing capacity is in general tight.
In this paper I propose a concept of a correct loss function in a generative model of supervised learning for an input space $\mathcal{X}$ and a label space $\mathcal{Y}$, which are measurable spaces. A correct loss function in a generative model of supervised learning must correctly measure the discrepancy between elements of a hypothesis space $\mathcal{H}$ of possible predictors and the supervisor operator, which may not belong to $\mathcal{H}$. To define correct loss functions, I propose a characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ for a probability measure $\mu$ on $\mathcal{X} \times \mathcal{Y}$ relative to the projection $\Pi_{\mathcal{X}}: \mathcal{X}\times\mathcal{Y}\to \mathcal{X}$ as a solution of a linear operator equation. If $\mathcal{Y}$ is a separable metrizable topological space with the Borel $\sigma$-algebra $ \mathcal{B} (\mathcal{Y})$, I propose another characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ as a minimizer of a mean square error on the space of Markov kernels, called probabilistic morphisms, from $\mathcal{X}$ to $\mathcal{Y}$, using kernel mean embedding. Using these results and using inner measure to quantify generalizability of a learning algorithm, I give a generalization of a result due to Cucker-Smale, which concerns the learnability of a regression model, to a setting of a conditional probability estimation problem. I also give a variant of Vapnik's method of solving stochastic ill-posed problem, using inner measure and discuss its applications.
High complexity models are notorious in machine learning for overfitting, a phenomenon in which models well represent data but fail to generalize an underlying data generating process. A typical procedure for circumventing overfitting computes empirical risk on a holdout set and halts once (or flags that/when) it begins to increase. Such practice often helps in outputting a well-generalizing model, but justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then proceed to introduce and argue for a hypothesis test by means of which both model performance may be evaluated using training data, and overfitting quantitatively defined and detected. We rely on said concentration bounds which guarantee that empirical means should, with high probability, approximate their true mean to conclude that they should approximate each other. We stipulate conditions under which this test is valid, describe how the test may be used for identifying overfitting, articulate a further nuance according to which distributional shift may be flagged, and highlight an alternative notion of learning which usefully captures generalization in the absence of uniform PAC guarantees.
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation $X$ on a directed graph $G$ into weighted source-to-sink paths whose superposition equals $X$. We show that, for acyclic graphs, considering the \emph{width} of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of \emph{width-stable} graphs, for which a popular heuristic is a \gwsimple-approximation ($|X|$ being the total flow of $X$), and strengthen its worst-case approximation ratio from $\Omega(\sqrt{m})$ to $\Omega(m / \log m)$ for sparse graphs, where $m$ is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a $(\lceil \log \Vert X \Vert \rceil +1)$-approximation ($\Vert X \Vert$ being the maximum absolute value of $X$ on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations ($\Vert X \Vert \leq 1$), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [ALENEX 2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.
Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). The selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to come up with an approximation based on Laplace's method and the Gaussian integral. We empirically assess BPLS for parametric generalized linear and non-parametric generalized additive models on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods.
A random algebraic graph is defined by a group $G$ with a uniform distribution over it and a connection $\sigma:G\longrightarrow[0,1]$ with expectation $p,$ satisfying $\sigma(g)=\sigma(g^{-1}).$ The random graph $\mathsf{RAG}(n,G,p,\sigma)$ with vertex set $[n]$ is formed as follows. First, $n$ independent vectors $x_1,\ldots,x_n$ are sampled uniformly from $G.$ Then, vertices $i,j$ are connected with probability $\sigma(x_ix_j^{-1}).$ This model captures random geometric graphs over the sphere and the hypercube, certain regimes of the stochastic block model, and random subgraphs of Cayley graphs. The main question of interest to the current paper is: when is a random algebraic graph statistically and/or computationally distinguishable from $\mathsf{G}(n,p)$? Our results fall into two categories. 1) Geometric. We focus on the case $G =\{\pm1\}^d$ and use Fourier-analytic tools. For hard threshold connections, we match [LMSY22b] for $p = \omega(1/n)$ and for $1/(r\sqrt{d})$-Lipschitz connections we extend the results of [LR21b] when $d = \Omega(n\log n)$ to the non-monotone setting. We study other connections such as indicators of interval unions and low-degree polynomials. 2) Algebraic. We provide evidence for an exponential statistical-computational gap. Consider any finite group $G$ and let $A\subseteq G$ be a set of elements formed by including each set of the form $\{g, g^{-1}\}$ independently with probability $1/2.$ Let $\Gamma_n(G,A)$ be the distribution of random graphs formed by taking a uniformly random induced subgraph of size $n$ of the Cayley graph $\Gamma(G,A).$ Then, $\Gamma_n(G,A)$ and $\mathsf{G}(n,1/2)$ are statistically indistinguishable with high probability over $A$ if and only if $\log|G|\gtrsim n.$ However, low-degree polynomial tests fail to distinguish $\Gamma_n(G,A)$ and $\mathsf{G}(n,1/2)$ with high probability over $A$ when $\log |G|=\log^{\Omega(1)}n.$
We study the question of local testability of low (constant) degree functions from a product domain $S_1 \times \dots \times {S}_n$ to a field $\mathbb{F}$, where ${S_i} \subseteq \mathbb{F}$ can be arbitrary constant sized sets. We show that this family is locally testable when the grid is "symmetric". That is, if ${S_i} = {S}$ for all i, there is a probabilistic algorithm using constantly many queries that distinguishes whether $f$ has a polynomial representation of degree at most $d$ or is $\Omega(1)$-far from having this property. In contrast, we show that there exist asymmetric grids with $|{S}_1| =\dots= |{S}_n| = 3$ for which testing requires $\omega_n(1)$ queries, thereby establishing that even in the context of polynomials, local testing depends on the structure of the domain and not just the distance of the underlying code. The low-degree testing problem has been studied extensively over the years and a wide variety of tools have been applied to propose and analyze tests. Our work introduces yet another new connection in this rich field, by building low-degree tests out of tests for "junta-degrees". A function $f : {S}_1 \times \dots \times {S}_n \to {G}$, for an abelian group ${G}$ is said to be a junta-degree-$d$ function if it is a sum of $d$-juntas. We derive our low-degree test by giving a new local test for junta-degree-$d$ functions. For the analysis of our tests, we deduce a small-set expansion theorem for spherical noise over large grids, which may be of independent interest.
Knowledge graph embedding (KGE) is a increasingly popular technique that aims to represent entities and relations of knowledge graphs into low-dimensional semantic spaces for a wide spectrum of applications such as link prediction, knowledge reasoning and knowledge completion. In this paper, we provide a systematic review of existing KGE techniques based on representation spaces. Particularly, we build a fine-grained classification to categorise the models based on three mathematical perspectives of the representation spaces: (1) Algebraic perspective, (2) Geometric perspective, and (3) Analytical perspective. We introduce the rigorous definitions of fundamental mathematical spaces before diving into KGE models and their mathematical properties. We further discuss different KGE methods over the three categories, as well as summarise how spatial advantages work over different embedding needs. By collating the experimental results from downstream tasks, we also explore the advantages of mathematical space in different scenarios and the reasons behind them. We further state some promising research directions from a representation space perspective, with which we hope to inspire researchers to design their KGE models as well as their related applications with more consideration of their mathematical space properties.
Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them. While the former has been addressed in prior work, the latter is often overlooked, resulting in informal model understanding derived from a handful of local explanations. In this paper, we introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding, and propose metrics for its quality assessment. On two domains, ExSum highlights various limitations in the current practice, helps develop accurate model understanding, and reveals easily overlooked properties of the model. We also connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual minimality and plausibility.