For a complexity class $C$ and language $L$, a constructive separation of $L \notin C$ gives an efficient algorithm (also called a refuter) that finds counterexamples (bad inputs) for every $C$-algorithm attempting to decide $L$. We study the questions: Which lower bounds can be made constructive? What are the consequences of constructive separations? We build a case that "constructiveness" serves as a dividing line between many weak lower bounds we know how to prove and strong lower bounds against $P$, $ZPP$, and $BPP$. Put another way, constructiveness is the opposite of a complexity barrier: it is a property we want lower bounds to have. Our results fall into three broad categories.

1. Our first set of results shows that, for many well-known lower bounds against streaming algorithms, one-tape Turing machines, and query complexity, as well as lower bounds for the Minimum Circuit Size Problem, making these lower bounds constructive would imply breakthrough separations ranging from $EXP \neq BPP$ to even $P \neq NP$.

2. Our second set of results shows that for most major open problems in lower bounds against $P$, $ZPP$, and $BPP$, including $P \neq NP$, $P \neq PSPACE$, $P \neq PP$, $ZPP \neq EXP$, and $BPP \neq NEXP$, any proof of the separation would further imply a constructive separation. Our results generalize earlier results for $P \neq NP$ [Gutfreund, Shaltiel, and Ta-Shma, CCC 2005] and $BPP \neq NEXP$ [Dolev, Fandina, and Gutfreund, CIAC 2013].

3. Our third set of results shows that certain complexity separations cannot be made constructive. We observe that for all super-polynomially growing functions $t$, there are unconditionally no constructive separations for detecting high $t$-time Kolmogorov complexity (a task known not to be in $P$) from any complexity class.
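To pin down the refuter notion informally described above, here is a rough formalization; this is our paraphrase of the standard refuter template, and the paper's precise quantifiers (e.g., over input lengths) may differ:
\[
\forall A \in C \;\; \exists \text{ poly-time } R \text{ such that for infinitely many } n, \; R(1^n) \text{ outputs } x \in \{0,1\}^n \text{ with } A(x) \neq L(x).
\]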
A pervasive methodological error is the post-hoc interpretation of $p$-values: a $p$-value $p$ is not the level at which we reject the null; it is the level at which we would have rejected the null had we chosen level $p$. We introduce post-hoc $p$-values, which do admit this interpretation. We show that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value. This implies that the product of independent post-hoc $p$-values is a post-hoc $p$-value, making them easy to combine. If we permit external randomization, we find that any non-randomized post-hoc $p$-value can be trivially improved. However, only (essentially) non-randomized post-hoc $p$-values can be arbitrarily merged through multiplication. Our results extend to post-hoc anytime validity in a sequential setting. Moreover, we introduce two-way post-hoc $p$-values, whose reciprocal is also post-hoc under the alternative. Likelihood ratios are two-way post-hoc $p$-values, which supports the 'direct' interpretation often claimed for them in the context of Bayes factors and links that interpretation to post-hoc $p$-values. Finally, we extend to geometric post-hoc validity and show that GRO $e$-values are the reciprocals of post-hoc $p$-values that minimize the geometric post-hoc error under the alternative.
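To make the two central claims concrete, here is a short worked step using only what the abstract asserts (recall that an $e$-value is a nonnegative statistic $E$ with $\mathbb{E}[E] \le 1$ under the null). The characterization says
\[
p \text{ is a post-hoc } p\text{-value} \iff \mathbb{E}[1/p] \le 1 \iff 1/p \text{ is an } e\text{-value},
\]
so for independent post-hoc $p_1, p_2$,
\[
\mathbb{E}\!\left[\tfrac{1}{p_1 p_2}\right] = \mathbb{E}\!\left[\tfrac{1}{p_1}\right]\mathbb{E}\!\left[\tfrac{1}{p_2}\right] \le 1,
\]
and $p_1 p_2$ is again post-hoc. Markov's inequality then recovers the familiar guarantee $\Pr(p \le \alpha) \le \alpha$ simultaneously for all $\alpha$.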
We embark on a systematic study of the $(k+1)$-th derivative of $x^{k-r}H(x^r)$, where $H(x):=-x\log x-(1-x)\log(1-x)$ is the binary entropy and $k>r\geq 1$ are integers. Our motivation is the conjectural entropy inequality $\alpha_k H(x^k)\geq x^{k-1}H(x)$, where $0<\alpha_k<1$ is given by a functional equation. The $k=2$ case was the key technical tool driving recent breakthroughs on the union-closed sets conjecture. We express $\frac{d^{k+1}}{dx^{k+1}}x^{k-r}H(x^r)$ as a rational function, an infinite series, and a sum over generalized Stirling numbers. This allows us to reduce the proof of the entropy inequality for real $k$ to showing that an associated polynomial has only two real roots in the interval $(0,1)$, and to prove the inequality for fractional exponents such as $k=3/2$. The proof suggests a new framework for proving tight inequalities for sums of polynomials times logarithms of polynomials, which converts such an inequality into a statement about the real roots of a simpler associated polynomial.
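For readers who want to experiment, here is a minimal numerical sanity check of the conjectured inequality on a grid. Note that $\alpha_k$ is determined by a functional equation not reproduced in the abstract, so the value below is a placeholder of our choosing, not the paper's constant.

```python
# Numerically probe alpha_k * H(x^k) >= x^(k-1) * H(x) on a grid in (0, 1).
import numpy as np

def H(x):
    """Binary entropy H(x) = -x log x - (1 - x) log(1 - x)."""
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def gap(x, k, alpha_k):
    """alpha_k * H(x^k) - x^(k-1) * H(x); the inequality holds at x iff this is >= 0."""
    return alpha_k * H(x**k) - x**(k - 1) * H(x)

k = 2
alpha_k = 0.5  # PLACEHOLDER: the true alpha_k solves a functional equation from the paper
xs = np.linspace(1e-6, 1 - 1e-6, 100_000)
print("min gap on grid:", gap(xs, k, alpha_k).min())
```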
A linear code over $\mathbb{F}_q$ with the Hamming metric is called $\Delta$-divisible if the weights of all of its codewords are divisible by $\Delta$. Such codes were introduced by Harold Ward a few decades ago. Applications include subspace codes, partial spreads, vector space partitions, and distance-optimal codes. Determining the possible lengths of projective divisible codes is an interesting and comprehensive challenge.
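As a concrete illustration of the definition, the following sketch checks $\Delta$-divisibility for an example code we chose ourselves: the binary $[7,3]$ simplex code, whose nonzero codewords all have weight $4$, so it is $4$-divisible.

```python
# Verify Delta-divisibility by enumerating all codewords of a small linear code.
import itertools
import numpy as np

# Generator matrix of the binary [7, 3] simplex code:
# columns are all nonzero vectors of F_2^3.
G = np.array([[int(b) for b in f"{c:03b}"] for c in range(1, 8)]).T  # shape (3, 7)

def is_divisible(G, delta, q=2):
    """True iff every codeword weight of the code generated by G is divisible by delta."""
    k = G.shape[0]
    for msg in itertools.product(range(q), repeat=k):
        cw = np.mod(np.array(msg) @ G, q)
        if int(np.count_nonzero(cw)) % delta != 0:
            return False
    return True

print(is_divisible(G, delta=4))  # True: every weight is 0 or 4
```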
We derive basic algebraic and combinatorial properties of finite vector spaces in which individual vectors are allowed to have multiplicities larger than $1$. An application in coding theory is illustrated by showing that the multispace codes introduced here can be used in random linear network coding scenarios, and that they in fact generalize standard subspace codes (defined in the set of all subspaces of $\mathbb{F}_q^n$) and extend them to an infinitely larger set of parameters. In particular, in contrast to subspace codes, multispace codes of arbitrarily large cardinality and minimum distance exist for any fixed $n$ and $q$.
We explore the concept of separating systems of vertex sets of graphs. A separating system of a set $X$ is a collection of subsets of $X$ such that for any pair of distinct elements in $X$, there exists a set in the separating system that contains exactly one of the two elements. A separating system of the vertex set of a graph $G$ is called a vertex-separating path (tree) system of $G$ if the elements of the separating system are paths (trees) in the graph $G$. In this paper, we focus on the size of the smallest vertex-separating path (tree) system for different types of graphs, including trees, grids, and maximal outerplanar graphs.
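The definition is easy to explore computationally. Below is a brute-force search for a smallest vertex-separating path system on a toy instance we chose for illustration, the path graph $P_n$, where every path is a contiguous interval of vertices.

```python
# Brute-force smallest vertex-separating path system of the path graph P_n.
from itertools import combinations

def smallest_separating_path_system(n):
    vertices = range(n)
    # All subpaths of P_n, represented as vertex sets (intervals [i, j]).
    paths = [frozenset(range(i, j + 1)) for i in vertices for j in range(i, n)]
    pairs = list(combinations(vertices, 2))
    for size in range(1, len(paths) + 1):
        for family in combinations(paths, size):
            # Separating: for every pair, some path contains exactly one endpoint.
            if all(any((u in P) != (v in P) for P in family) for u, v in pairs):
                return [sorted(P) for P in family]

print(smallest_separating_path_system(5))  # a family of 3 subpaths suffices for P_5
```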
We study the problem of contextual feature selection, where the goal is to learn a predictive function while identifying subsets of informative features conditioned on specific contexts. Towards this goal, we generalize the recently proposed stochastic gates (STG) of Yamada et al. [2020] by modeling the probabilistic gates as conditional Bernoulli variables whose parameters are predicted from the contextual variables. Our new scheme, termed conditional-STG (c-STG), comprises two networks: a hypernetwork that maps contextual variables to probabilistic feature-selection parameters, and a prediction network that maps the selected features to the response variable. Training the two networks simultaneously ensures that context and feature selection are incorporated within a unified model. We provide a theoretical analysis of several properties of the proposed framework. Importantly, our model leads to improved flexibility and adaptability of feature selection and can therefore better capture the nuances and variations in the data. We apply c-STG to simulated and real-world datasets, including healthcare, housing, and neuroscience, and demonstrate that it effectively selects contextually meaningful features, thereby enhancing predictive performance and interpretability.
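Here is a minimal PyTorch sketch of the two-network idea, assuming the Gaussian-based gate relaxation used in STG (Yamada et al., 2020); the layer sizes and the exact regularizer weighting are our own illustrative choices, not the paper's.

```python
# Sketch of c-STG: a hypernetwork predicts gate parameters from context,
# and a prediction network consumes the gated features.
import torch
import torch.nn as nn

class ConditionalSTG(nn.Module):
    def __init__(self, d_context, d_features, d_hidden, sigma=0.5):
        super().__init__()
        self.sigma = sigma
        # Hypernetwork: context -> per-feature gate means mu.
        self.hyper = nn.Sequential(
            nn.Linear(d_context, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_features),
        )
        # Prediction network on the gated features.
        self.pred = nn.Sequential(
            nn.Linear(d_features, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, x, context):
        mu = self.hyper(context)
        eps = torch.randn_like(mu) * self.sigma if self.training else 0.0
        z = torch.clamp(mu + eps, 0.0, 1.0)  # relaxed Bernoulli gate in [0, 1]
        # Expected number of open gates: P(z > 0) = Phi(mu / sigma).
        reg = torch.distributions.Normal(0.0, 1.0).cdf(mu / self.sigma).sum(-1).mean()
        return self.pred(x * z), reg

# Training objective (lam is a sparsity weight): loss = task_loss + lam * reg
```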
For a permutation $\pi: [K]\rightarrow [K]$, a sequence $f: \{1,2,\ldots, n\}\rightarrow \mathbb R$ contains a $\pi$-pattern of size $K$ if there is a sequence of indices $(i_1, i_2, \ldots, i_K)$ with $i_1<i_2<\cdots<i_K$ such that $f(i_a)<f(i_b)$ whenever $\pi(a)<\pi(b)$, for all $a,b\in [K]$. Otherwise, $f$ is referred to as $\pi$-free. In the special case $\pi = (1,2,\ldots, K)$, the pattern is called the monotone pattern. \cite{newman2017testing} initiated the study of testing $\pi$-freeness with one-sided error, focusing on two specific problems: testing the monotone pattern and testing the $(1,3,2)$ pattern. For testing the monotone pattern $(1,2,\ldots,K)$, \cite{ben2019finding} improved the $(\log n)^{O(K^2)}$ non-adaptive query complexity of \cite{newman2017testing} to $O((\log n)^{\lfloor \log_{2} K\rfloor})$, and \cite{ben2019optimal} subsequently proposed an adaptive algorithm with $O(\log n)$ query complexity. Prior to this work, however, no progress had been made on testing $(1,3,2)$-freeness. We present an adaptive algorithm for testing $(1,3,2)$-freeness whose query complexity is $O(\epsilon^{-2}\log^4 n)$, significantly improving over the $O(\epsilon^{-7}\log^{26}n)$-query adaptive algorithm of \cite{newman2017testing}. The improvement is achieved mainly by identifying a new structure embedded in the patterns.
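Unpacking the definition for $\pi = (1,3,2)$: the sequence contains the pattern iff there are indices $i < j < k$ with $f(i) < f(k) < f(j)$. The brute-force checker below is only a correctness oracle; the testers discussed above make polylogarithmically many queries instead of reading the whole sequence.

```python
# O(n^3) reference checker for (1,3,2)-pattern containment.
from itertools import combinations

def contains_132(f):
    """True iff there exist i < j < k with f[i] < f[k] < f[j]."""
    return any(f[i] < f[k] < f[j] for i, j, k in combinations(range(len(f)), 3))

print(contains_132([1, 5, 3]))  # True: (1, 5, 3) realizes the (1,3,2) pattern
print(contains_132([3, 2, 1]))  # False: decreasing sequences are (1,3,2)-free
```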
Contrastive learning models have achieved great success in unsupervised visual representation learning; they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document, and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary, and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings than its counterpart without contrastive objectives.
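The following is a minimal sketch of one plausible sequence-level contrastive term, assuming document and summary representations have already been pooled from a seq2seq encoder (e.g., BART); the pooling scheme and the way negatives are drawn in the actual paper may differ.

```python
# In-batch InfoNCE: each document's positive is its own summary;
# the other summaries in the batch serve as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(doc_repr, sum_repr, temperature=0.1):
    doc = F.normalize(doc_repr, dim=-1)    # (B, d)
    summ = F.normalize(sum_repr, dim=-1)   # (B, d)
    logits = doc @ summ.t() / temperature  # (B, B) scaled cosine similarities
    targets = torch.arange(doc.size(0), device=doc.device)
    return F.cross_entropy(logits, targets)

# Combined objective: total_loss = nll_loss + lam * contrastive_loss(doc_repr, sum_repr)
```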
Graph Convolutional Networks (GCNs) have achieved extraordinary success in learning effective task-specific representations of nodes in graphs. However, on Heterogeneous Information Networks (HINs), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for a target object, which hinders both effectiveness and interpretability; (2) they often need to generate intermediate meta-path-based dense graphs, which leads to high computational complexity. To address these issues, we propose an interpretable and efficient Heterogeneous Graph Convolutional Network (ie-HGCN) to learn the representations of objects in HINs. It is designed as a hierarchical aggregation architecture: object-level aggregation first, followed by type-level aggregation. The novel architecture can automatically extract useful meta-paths for each object from all possible meta-paths (within a length limit), which brings good model interpretability. It also reduces the computational cost by avoiding intermediate HIN transformation and neighborhood attention. We provide a theoretical analysis of the proposed ie-HGCN in terms of evaluating the usefulness of all possible meta-paths, its connection to spectral graph convolution on HINs, and its quasi-linear time complexity. Extensive experiments on three real network datasets demonstrate the superiority of ie-HGCN over state-of-the-art methods.
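To make the hierarchical design concrete, here is a simplified PyTorch sketch of one layer of object-level then type-level aggregation; the projection sizes and attention form are illustrative choices of ours, not ie-HGCN's exact formulation.

```python
# One layer: project each neighbor type into the target type's space,
# aggregate within each type (object level), then attend over the
# type-specific summaries (type level).
import torch
import torch.nn as nn

class TwoLevelAggregation(nn.Module):
    def __init__(self, d_self, dims_by_type, d_out):
        """dims_by_type: dict mapping neighbor type name -> input feature dim."""
        super().__init__()
        self.self_proj = nn.Linear(d_self, d_out)
        self.proj = nn.ModuleDict({t: nn.Linear(d, d_out) for t, d in dims_by_type.items()})
        self.att = nn.Linear(d_out, 1)

    def forward(self, self_feat, neigh_feats, adj_by_type):
        # Object level: mean-aggregate projected neighbors of each type
        # via a row-normalized (N, N_t) adjacency matrix.
        per_type = [self.self_proj(self_feat)]
        for t, X in neigh_feats.items():
            per_type.append(adj_by_type[t] @ self.proj[t](X))
        stacked = torch.stack(per_type, dim=1)                   # (N, T+1, d_out)
        # Type level: softmax attention decides how much each relation
        # (and the object itself) contributes; stacking such layers is
        # what lets useful meta-paths emerge.
        w = torch.softmax(self.att(torch.tanh(stacked)), dim=1)  # (N, T+1, 1)
        return (w * stacked).sum(dim=1)
```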
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the optimal solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
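As a minimal sketch of the core smoothing step: with the negative-entropy regularizer, the smoothed max is log-sum-exp and its gradient is the softmax, so replacing each max in a DP recursion by this operator makes the optimal value differentiable. The Viterbi-style recursion below is our simplified illustration of that substitution, with a scale $\gamma > 0$ controlling the relaxation strength.

```python
# Smoothed max (log-sum-exp) and a smoothed Viterbi value recursion.
import numpy as np

def smoothed_max(x, gamma=1.0):
    """max_Omega(x) = gamma * log sum_i exp(x_i / gamma); -> max(x) as gamma -> 0."""
    m = x.max()
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_argmax(x, gamma=1.0):
    """Gradient of smoothed_max: a softmax, i.e., a differentiable argmax."""
    e = np.exp((x - x.max()) / gamma)
    return e / e.sum()

def smoothed_viterbi_value(theta, trans, gamma=1.0):
    """theta: (T, S) emission scores, trans: (S, S) transition scores.
    Each max of the classical Viterbi recursion is replaced by smoothed_max."""
    v = theta[0].copy()
    for t in range(1, theta.shape[0]):
        v = theta[t] + np.array([smoothed_max(v + trans[:, j], gamma)
                                 for j in range(theta.shape[1])])
    return smoothed_max(v, gamma)
```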