{mayi_des}
In this paper, I consider a fine-grained dichotomy of Boolean counting constraint satisfaction problem (#CSP), under the exponential time hypothesis of counting version (#ETH). Suppose $\mathscr{F}$ is a finite set of algebraic complex-valued functions defined on Boolean domain. When $\mathscr{F}$ is a subset of either two special function sets, I prove that #CSP($\mathscr{F}$) is polynomial-time solvable, otherwise it can not be computed in sub-exponential time unless #ETH fails. I also improve the result by proving the same dichotomy holds for #CSP with bounded degree (every variable appears at most constant constraints), even for #R$_3$-CSP. An important preparation before proving the result is to argue that pinning (two special unary functions $[1,0]$ and $[0,1]$ are used to reduce arity) can also keep the sub-exponential lower bound of a Boolean #CSP problem. I discuss this issue by utilizing some common methods in proving #P-hardness of counting problems. The proof illustrates the internal correlation among these commonly used methods.
For relational structures A, B of the same signature, the Promise Constraint Satisfaction Problem PCSP(A,B) asks whether a given input structure maps homomorphically to A or does not even map to B. We are promised that the input satisfies exactly one of these two cases. If there exists a structure C with homomorphisms $A\to C\to B$, then PCSP(A,B) reduces naturally to CSP(C). To the best of our knowledge all known tractable PCSPs reduce to tractable CSPs in this way. However Barto showed that some PCSPs over finite structures A, B require solving CSPs over infinite C. We show that even when such a reduction to finite C is possible, this structure may become arbitrarily large. For every integer $n>1$ and every prime p we give A, B of size n with a single relation of arity $n^p$ such that PCSP(A, B) reduces via a chain of homomorphisms $ A\to C\to B$ to a tractable CSP over some C of size p but not over any smaller structure. In a second family of examples, for every prime $p\geq 7$ we construct A, B of size $p-1$ with a single ternary relation such that PCSP(A, B) reduces via $A\to C\to B$ to a tractable CSP over some C of size p but not over any smaller structure. In contrast we show that if A, B are graphs and PCSP(A,B) reduces to tractable CSP(C) for some finite digraph C, then already A or B has a tractable CSP. This extends results and answers a question of Deng et al.
Community detection refers to the problem of clustering the nodes of a network into groups. Existing inferential methods for community structure mainly focus on unweighted (binary) networks. Many real-world networks are nonetheless weighted and a common practice is to dichotomize a weighted network to an unweighted one which is known to result in information loss. Literature on hypothesis testing in the latter situation is still missing. In this paper, we study the problem of testing the existence of community structure in weighted networks. Our contributions are threefold: (a). We use the (possibly infinite-dimensional) exponential family to model the weights and derive the sharp information-theoretic limit for the existence of consistent test. Within the limit, any test is inconsistent; and beyond the limit, we propose a useful consistent test. (b). Based on the information-theoretic limits, we provide the first formal way to quantify the loss of information incurred by dichotomizing weighted graphs into unweighted graphs in the context of hypothesis testing. (c). We propose several new and practically useful test statistics. Simulation study show that the proposed tests have good performance. Finally, we apply the proposed tests to an animal social network.
The binary rank of a $0,1$ matrix is the smallest size of a partition of its ones into monochromatic combinatorial rectangles. A matrix $M$ is called $(k_1, \ldots, k_m ; n_1, \ldots, n_m)$ circulant block diagonal if it is a block matrix with $m$ diagonal blocks, such that for each $i \in [m]$, the $i$th diagonal block of $M$ is the circulant matrix whose first row has $k_i$ ones followed by $n_i-k_i$ zeros, and all of whose other entries are zeros. In this work, we study the binary rank of these matrices and of their complement. In particular, we compare the binary rank of these matrices to their rank over the reals, which forms a lower bound on the former. We present a general method for proving upper bounds on the binary rank of block matrices that have diagonal blocks of some specified structure and ones elsewhere. Using this method, we prove that the binary rank of the complement of a $(k_1, \ldots, k_m ; n_1, \ldots, n_m)$ circulant block diagonal matrix for integers satisfying $n_i>k_i>0$ for each $i \in [m]$ exceeds its real rank by no more than the maximum of $\gcd(n_i,k_i)-1$ over all $i \in [m]$. We further present several sufficient conditions for the binary rank of these matrices to strictly exceed their real rank. By combining the upper and lower bounds, we determine the exact binary rank of various families of matrices and, in addition, significantly generalize a result of Gregory. Motivated by a question of Pullman, we study the binary rank of $k$-regular $0,1$ matrices and of their complement. As an application of our results on circulant block diagonal matrices, we show that for every $k \geq 2$, there exist $k$-regular $0,1$ matrices whose binary rank is strictly larger than that of their complement. Furthermore, we exactly determine for every integer $r$, the smallest possible binary rank of the complement of a $2$-regular $0,1$ matrix with binary rank $r$.
Given a set $P$ of $n$ points in the plane, the $k$-center problem is to find $k$ congruent disks of minimum possible radius such that their union covers all the points in $P$. The $2$-center problem is a special case of the $k$-center problem that has been extensively studied in the recent past \cite{CAHN,HT,SH}. In this paper, we consider a generalized version of the $2$-center problem called \textit{proximity connected} $2$-center (PCTC) problem. In this problem, we are also given a parameter $\delta\geq 0$ and we have the additional constraint that the distance between the centers of the disks should be at most $\delta$. Note that when $\delta=0$, the PCTC problem is reduced to the $1$-center(minimum enclosing disk) problem and when $\delta$ tends to infinity, it is reduced to the $2$-center problem. The PCTC problem first appeared in the context of wireless networks in 1992 \cite{ACN0}, but obtaining a nontrivial deterministic algorithm for the problem remained open. In this paper, we resolve this open problem by providing a deterministic $O(n^2\log n)$ time algorithm for the problem.
We study dynamic algorithms for the problem of maximizing a monotone submodular function over a stream of $n$ insertions and deletions. We show that any algorithm that maintains a $(0.5+\epsilon)$-approximate solution under a cardinality constraint, for any constant $\epsilon>0$, must have an amortized query complexity that is $\mathit{polynomial}$ in $n$. Moreover, a linear amortized query complexity is needed in order to maintain a $0.584$-approximate solution. This is in sharp contrast with recent dynamic algorithms of [LMNF+20, Mon20] that achieve $(0.5-\epsilon)$-approximation with a $\mathsf{poly}\log(n)$ amortized query complexity. On the positive side, when the stream is insertion-only, we present efficient algorithms for the problem under a cardinality constraint and under a matroid constraint with approximation guarantee $1-1/e-\epsilon$ and amortized query complexities $\smash{O(\log (k/\epsilon)/\epsilon^2)}$ and $\smash{k^{\tilde{O}(1/\epsilon^2)}\log n}$, respectively, where $k$ denotes the cardinality parameter or the rank of the matroid.
The stochastic gradient Langevin Dynamics is one of the most fundamental algorithms to solve sampling problems and non-convex optimization appearing in several machine learning applications. Especially, its variance reduced versions have nowadays gained particular attention. In this paper, we study two variants of this kind, namely, the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and Log-Sobolev inequality which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which is an improvement from any previous analyses. We also show some essential applications of our result to non-convex optimization.
The minimum energy path (MEP) describes the mechanism of reaction, and the energy barrier along the path can be used to calculate the reaction rate in thermal systems. The nudged elastic band (NEB) method is one of the most commonly used schemes to compute MEPs numerically. It approximates an MEP by a discrete set of configuration images, where the discretization size determines both computational cost and accuracy of the simulations. In this paper, we consider a discrete MEP to be a stationary state of the NEB method and prove an optimal convergence rate of the discrete MEP with respect to the number of images. Numerical simulations for the transitions of some several proto-typical model systems are performed to support the theory.
Universal coding of integers~(UCI) is a class of variable-length code, such that the ratio of the expected codeword length to $\max\{1,H(P)\}$ is within a constant factor, where $H(P)$ is the Shannon entropy of the decreasing probability distribution $P$. However, if we consider the ratio of the expected codeword length to $H(P)$, the ratio tends to infinity by using UCI, when $H(P)$ tends to zero. To solve this issue, this paper introduces a class of codes, termed generalized universal coding of integers~(GUCI), such that the ratio of the expected codeword length to $H(P)$ is within a constant factor $K$. First, the definition of GUCI is proposed and the coding structure of GUCI is introduced. Next, we propose a class of GUCI $\mathcal{C}$ to achieve the expansion factor $K_{\mathcal{C}}=2$ and show that the optimal GUCI is in the range $1\leq K_{\mathcal{C}}^{*}\leq 2$. Then, by comparing UCI and GUCI, we show that when the entropy is very large or $P(0)$ is not large, there are also cases where the average codeword length of GUCI is shorter. Finally, the asymptotically optimal GUCI is presented.
Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the $O(N^{-1})$ first order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and almost has the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically evaluate that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction, in the case of imbalanced data distribution. Our implementation is available at //github.com/ehsansaleh/firth_bias_reduction
There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.