Multilevel lattice codes, such as those associated to Constructions $C$, $\overline{D}$, D and D', have relevant applications in communications. In this paper, we investigate some properties of lattices obtained via Constructions D and D' from $q$-ary linear codes. Connections with Construction A, generator matrices, expressions and bounds for the lattice volume and minimum distances are derived. Extensions of previous results regarding construction and decoding of binary and $p$-ary linear codes ($p$ prime) are also presented.
Existing methods have demonstrated effective performance on a single degradation type. In practical applications, however, the degradation is often unknown, and the mismatch between the model and the degradation will result in a severe performance drop. In this paper, we propose an all-in-one image restoration network that tackles multiple degradations. Due to the heterogeneous nature of different types of degradations, it is difficult to process multiple degradations in a single network. To this end, we propose to learn a neural degradation representation (NDR) that captures the underlying characteristics of various degradations. The learned NDR decomposes different types of degradations adaptively, similar to a neural dictionary that represents basic degradation components. Subsequently, we develop a degradation query module and a degradation injection module to effectively recognize and utilize the specific degradation based on NDR, enabling the all-in-one restoration ability for multiple degradations. Moreover, we propose a bidirectional optimization strategy to effectively drive NDR to learn the degradation representation by optimizing the degradation and restoration processes alternately. Comprehensive experiments on representative types of degradations (including noise, haze, rain, and downsampling) demonstrate the effectiveness and generalization capability of our method.
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception - a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector. Specifically, the cross-modal projector is implemented as a Q-Former to connect a graph encoder's representation space and an LM's text space. Further, MolCA employs a uni-modal adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks. Unlike previous studies that couple an LM with a graph encoder via cross-modal contrastive learning, MolCA retains the LM's ability of open-ended text generation and augments it with 2D graph information. To showcase its effectiveness, we extensively benchmark MolCA on tasks of molecule captioning, IUPAC name prediction, and molecule-text retrieval, on which MolCA significantly outperforms the baselines. Our codes and checkpoints can be found at //github.com/acharkq/MolCA.
We consider rather a general class of multi-level optimization problems, where a convex objective function is to be minimized subject to constraints of optimality of nested convex optimization problems. As a special case, we consider a trilevel optimization problem, where the objective of the two lower layers consists of a sum of a smooth and a non-smooth term.~Based on fixed-point theory and related arguments, we present a natural first-order algorithm and analyze its convergence and rates of convergence in several regimes of parameters.
Functional bootstrapping is a core technique in Fully Homomorphic Encryption (FHE). For large plaintext, to evaluate a general function homomorphically over a ciphertext, in the FHEW/TFHE approach, since the function in look-up table form is encoded in the coefficients of a test polynomial, the degree of the polynomial must be high enough to hold the entire table. This increases the bootstrapping time complexity and memory cost, as the size of bootstrapping keys and keyswitching keys need to be large accordingly. In this paper, we propose to encode the look-up table of any function in a polynomial vector, whose coefficients can hold more data. The corresponding representation of the additive group Zq used in the RGSW-based bootstrapping is the group of monic monomial permutation matrices, which integrates the permutation matrix representation used by Alperin-Sheriff and Peikert in 2014, and the monic monomial representation used in the FHEW/TFHE scheme. We make comprehensive investigation of the new representation, and propose a new bootstrapping algorithm based on it. The new algorithm has the prominent benefit of small bootstrapping key size and small key-switching key size, which leads to polynomial factor improvement in key size, in addition to constant factor improvement in run-time cost.
Large language models (LLMs), such as ChatGPT, have simplified text generation tasks, yet their inherent privacy risks are increasingly garnering attention. While differential privacy techniques have been successfully applied to text classification tasks, the resultant semantic bias makes them unsuitable for text generation. Homomorphic encryption inference methods have also been introduced, however, the significant computational and communication costs limit their viability. Furthermore, closed-source, black-box models such as GPT-4 withhold their architecture, thwarting certain privacy-enhancing strategies such as splitting inference into local and remote and then adding noise when communicating. To overcome these challenges, we introduce PrivInfer, the first privacy-preserving inference framework for black-box LLMs in text generation. Inspired by human writing, PrivInfer employs differential privacy methods to generate perturbed prompts for remote LLMs inference and extracts the meaningful response from the remote perturbed results. We also introduce RANTEXT, a differential privacy scheme specifically for LLMs that leverages random adjacency in text perturbations. Experimental results indicate that PrivInfer is comparable to GPT-4 in terms of text generation quality while protecting privacy, and RANTEXT provides enhanced privacy protection against three types of differential privacy attacks, including our newly introduced GPT inference attack, compared to baseline methods.
Recent work has shown how to prompt large language models with explanations to obtain strong performance on textual reasoning tasks, i.e., the chain-of-thought paradigm. However, subtly different explanations can yield widely varying downstream task accuracy. Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion. We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework. We first evaluate explanations for each in-context example in isolation according to two proxy metrics, log likelihood and accuracy on new examples. Then, we search over combinations of explanations to find one that yields high performance against a silver-labeled development set. Across four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference, results show that our proxy metrics correlate with ground truth accuracy and our overall method can effectively improve prompts over crowdworker annotations and naive search strategies
We introduce a novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions. In particular, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If it is ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving question responses, ClarifyGPT refines the ambiguous requirement and inputs it into the same LLM to generate a final code solution. To evaluate our ClarifyGPT, we first conduct a human evaluation involving ten participants who use ClarifyGPT for code generation on two publicly available benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to perform large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The automated evaluation results also demonstrate that ClarifyGPT can significantly enhance code generation performance compared to the baselines. In particular, ClarifyGPT improves the average performance of GPT-4 and ChatGPT across four benchmarks from 68.02% to 75.75% and from 58.55% to 67.22%, respectively. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.
We present streaming algorithms for the graph $k$-matching problem in both the insert-only and dynamic models. Our algorithms, with space complexity matching the best upper bounds, have optimal or near-optimal update time, significantly improving on previous results. More specifically, for the insert-only streaming model, we present a one-pass algorithm with optimal space complexity $O(k^2)$ and optimal update time $O(1)$, that with high probability computes a maximum weighted $k$-matching of a given weighted graph. The update time of our algorithm significantly improves the previous upper bound of $O(\log k)$, which was derived only for $k$-matching on unweighted graphs. For the dynamic streaming model, we present a one-pass algorithm that with high probability computes a maximum weighted $k$-matching in $O(Wk^2 \cdot \mbox{polylog}(n)$ space and with $O(\mbox{polylog}(n))$ update time, where $W$ is the number of distinct edge weights. Again the update time of our algorithm improves the previous upper bound of $O(k^2 \cdot \mbox{polylog}(n))$. This algorithm, when applied to unweighted graphs, gives a streaming algorithm on the dynamic model whose space and update time complexities are both near-optimal. Our results also imply a streaming approximation algorithm for maximum weighted $k$-matching whose space complexity matches the best known upper bound with a significantly improved update time.
Pre-trained Language Models (PLMs) which are trained on large text corpus via self-supervised learning method, have yielded promising performance on various tasks in Natural Language Processing (NLP). However, though PLMs with huge parameters can effectively possess rich knowledge learned from massive training text and benefit downstream tasks at the fine-tuning stage, they still have some limitations such as poor reasoning ability due to the lack of external knowledge. Research has been dedicated to incorporating knowledge into PLMs to tackle these issues. In this paper, we present a comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) to provide a clear insight into this thriving field. We introduce appropriate taxonomies respectively for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight these two main tasks of NLP. For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG), and rule knowledge. The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods. Finally, we point out some promising future directions of KE-PLMs.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.