Counting the linear extensions of a partial order was considered by Stanley about 50 years ago. For the partial order determined by inclusion on the vertices and edges of a graph, we call such linear extensions {\it construction sequences} for the graph, since each edge follows both of its endpoints. We find the number of such sequences for paths, cycles, stars, double-stars, and complete graphs. For paths, we recover Stanley's result (the Tangent numbers), and we obtain formulas for the other classes. Structure and applications are also studied.
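To make the definition above concrete, here is a small brute-force sketch (our own illustration, not the authors' counting technique) that enumerates the construction sequences of a tiny graph directly from the definition:

\begin{verbatim}
# Count construction sequences: orderings of the vertices and edges in which
# every edge appears after both of its endpoints (i.e., linear extensions of
# the inclusion order on vertices and edges).
from itertools import permutations

def count_construction_sequences(vertices, edges):
    elements = list(vertices) + list(edges)
    count = 0
    for perm in permutations(elements):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[(u, v)] > pos[u] and pos[(u, v)] > pos[v] for (u, v) in edges):
            count += 1
    return count

# Path on three vertices: 1 - 2 - 3.
print(count_construction_sequences([1, 2, 3], [(1, 2), (2, 3)]))
# Per the abstract, path counts agree with the Tangent numbers (here: 16).
\end{verbatim}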
Knowledge graphs (KGs) consist of links that describe relationships between entities. Because manually enumerating all relationships between entities is impractical, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is the task of inferring unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE, infer missing links using only the knowledge contained in the training data. In contrast, recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training, so it can produce missing links between entities by simply reusing memorized knowledge from pre-training rather than performing genuine inference. This is problematic because the purpose of building KGC models is to infer unseen links between entities. However, conventional KGC evaluations do not assess inference and memorization abilities separately. Thus, a PLM-based KGC method that achieves high performance in current KGC evaluations may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make genuine inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets tailored to this analysis, and we conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from the textual information of entities and relations.
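For orientation, a purely embedding-based scorer such as TransE (one of the models named above) ranks candidate links from learned vectors alone; a minimal sketch of its well-known scoring rule, with toy embeddings standing in for trained ones, is:

\begin{verbatim}
import numpy as np

def transe_score(h, r, t):
    # TransE scores a triple (head, relation, tail) as -||h + r - t||:
    # higher scores mean more plausible links. The embeddings come solely
    # from the training KG, so no pre-trained textual knowledge is involved.
    return -np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))   # toy 50-dimensional embeddings
print(transe_score(h, r, t))
\end{verbatim}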
Applying the recent advances in Natural Language Processing (NLP) to the legal sector poses challenging problems such as extremely long sequence lengths, specialized vocabulary that is usually understood only by legal professionals, and severe data imbalance. The recent surge of Large Language Models (LLMs) has begun to provide new opportunities for NLP in the legal domain, owing to their ability to handle lengthy, complex sequences. Moreover, the emergence of domain-specific LLMs has yielded extremely promising results on various tasks. In this study, we aim to quantify how general LLMs perform in comparison to legal-domain models (whether LLMs or otherwise). Specifically, we compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification. Although these LLMs were not explicitly trained on legal data, we observe that they still classify the provision theme correctly in most cases. However, their micro-F1/macro-F1 scores are up to 19.2/26.8\% lower than those of smaller models fine-tuned on the legal domain, underscoring the need for more powerful legal-domain LLMs.
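As a reminder of the two metrics being compared, micro-F1 pools all decisions while macro-F1 averages per-class F1 scores; a short scikit-learn sketch with hypothetical labels (not drawn from LEDGAR) illustrates the difference:

\begin{verbatim}
from sklearn.metrics import f1_score

# Hypothetical gold and predicted contract-provision labels.
y_true = ["Governing Law", "Terminations", "Indemnifications", "Terminations"]
y_pred = ["Governing Law", "Terminations", "Terminations", "Terminations"]

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
\end{verbatim}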
Various models have recently emerged for long-term time series forecasting. Recent studies have demonstrated that a single linear layer, using either Channel Dependent (CD) or Channel Independent (CI) modeling, can even outperform a large number of sophisticated models. However, current research primarily treats CD and CI as two complementary yet mutually exclusive approaches, unable to harness both extremes simultaneously. Moreover, CD and CI are both static strategies, and which one is optimal for a specific dataset cannot be determined without extensive experiments. In this paper, we reconsider whether the current CI strategy is the best solution for time series forecasting. First, we propose a simple yet effective strategy for linear models called $\mathbf{C}$hannel $\mathbf{S}$elf-$\mathbf{C}$lustering (CSC). CSC improves on the CI strategy's performance while reducing the parameter size, for example by over 10 times on the electricity dataset, and significantly cutting training time. Second, we propose Channel Rearrangement (CR), a method for deep models inspired by the self-clustering, which attains competitive performance against baselines. Finally, we discuss whether it is best to forecast future values using the historical values of the same channel as inputs. We hope our findings and methods will inspire new solutions beyond CD/CI.
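To fix the terminology, the two extremes can be sketched with single linear layers; the following is a schematic of CD versus CI modeling under our own simplified reading (it is not the paper's CSC or CR method):

\begin{verbatim}
import torch
import torch.nn as nn

L, H, C = 96, 24, 7   # look-back length, forecast horizon, number of channels

class CDLinear(nn.Module):
    """Channel Dependent: one linear map over all channels jointly."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(L * C, H * C)
    def forward(self, x):                       # x: (batch, L, C)
        b = x.size(0)
        return self.proj(x.reshape(b, L * C)).reshape(b, H, C)

class CILinear(nn.Module):
    """Channel Independent: one shared linear map applied per channel."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(L, H)
    def forward(self, x):                       # x: (batch, L, C)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)

x = torch.randn(8, L, C)
print(CDLinear()(x).shape, CILinear()(x).shape)   # both (8, 24, 7)
\end{verbatim}

The CD map has $LC \times HC$ weights while the shared CI map has only $L \times H$, which is why grouping channels somewhere between these extremes can trade parameter count against cross-channel information.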
This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment"). I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an outcome, given these conditions, is roughly 25%). In particular: if performing well in training is a good strategy for gaining power (as I think it might well be), then a very wide variety of goals would motivate scheming -- and hence, good training performance. This makes it plausible that training might either land on such a goal naturally and then reinforce it, or actively push a model's motivations towards such a goal as an easy way of improving performance. What's more, because schemers pretend to be aligned on tests designed to reveal their motivations, it may be quite difficult to tell whether this has occurred. However, I also think there are reasons for comfort. In particular: scheming may not actually be such a good strategy for gaining power; various selection pressures in training might work against schemer-like goals (for example, relative to non-schemers, schemers need to engage in extra instrumental reasoning, which might harm their training performance); and we may be able to increase such pressures intentionally. The report discusses these and a wide variety of other considerations in detail, and it suggests an array of empirical research directions for probing the topic further.
We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Ours is also the first result that allows infinite action sets.
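For readers unfamiliar with the algorithm being analyzed, a generic ensemble-sampling loop for a linear bandit maintains $m$ perturbed least-squares estimates and, in each round, acts greedily with respect to a uniformly sampled member; the following is an illustrative sketch under our own simplifying assumptions (the perturbation scales and regularization differ from the analyzed algorithm):

\begin{verbatim}
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, lam=1.0, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    A = lam * np.eye(d)          # shared regularized Gram matrix
    b = np.zeros((m, d))         # one independently perturbed target per member
    for _ in range(T):
        j = rng.integers(m)                         # pick an ensemble member
        theta_j = np.linalg.solve(A, b[j])          # its least-squares estimate
        a = actions[np.argmax(actions @ theta_j)]   # act greedily for that member
        r = a @ theta_star + sigma * rng.normal()   # observe a noisy reward
        A += np.outer(a, a)
        # every member is updated with its own perturbed copy of the reward
        b += np.outer(r + sigma * rng.normal(size=m), a)
    return np.linalg.solve(A, b.mean(axis=0))       # averaged final estimate

actions = np.eye(5)                                 # five arms, standard basis
print(ensemble_sampling(actions, theta_star=np.arange(5) / 4, T=500, m=10))
\end{verbatim}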
We present ColorFloat, a family of O(1) space complexity algorithms that solve the problem of attributing (coloring) fungible tokens to the entity that minted them (the minter). Tagging fungible tokens with metadata is not a new problem; it was first formalized in the Colored Coins protocol. In certain contexts, practical solutions to this challenge have been implemented and deployed, such as NFTs. We define the fungible token coloring problem, one specific aspect of the Colored Coins problem, as the problem of retaining the fungible characteristics of the underlying token while accurately tracking the attribution of fungible tokens to their respective minters. Fungible token coloring has a wide range of Web3 applications. One application, which we highlight in this paper, is an on-chain yield-sharing collateral-based stablecoin.
Self-testing is a method to certify quantum states and measurements in a device-independent way. Such device-independent certification of quantum properties is based purely on the input-output measurement statistics of the involved devices, with minimal knowledge about their internal workings. Bipartite pure entangled states can be self-tested, but in the case of multipartite pure entangled states the answer is not so straightforward. Nevertheless, \v{S}upi\'{c} et al. recently introduced a novel self-testing method for any pure entangled quantum state, which leverages network assistance and relies on bipartite entangled measurements; hence, their scheme loses the true device-independent flavor of self-testing. In this regard, we provide a self-testing scheme for genuine multipartite pure entangled states in the true sense by employing a generalized Hardy-type nonlocality argument. Notably, our approach involves only local operations and classical communication; it neither relies on bipartite entangled measurements nor requires any network assistance. In addition, we provide the device-independent bound on the maximum probability of success of the generalized Hardy-type nonlocality test.
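For context, the standard bipartite Hardy argument, which the generalized multipartite test above extends, asks for correlations of two binary measurements per side ($A_1, A_2$ and $B_1, B_2$) satisfying (in our notation)
\[
P(A_1{=}{+}1,\, B_1{=}{+}1) = q > 0, \quad
P(A_1{=}{+}1,\, B_2{=}{-}1) = 0, \quad
P(A_2{=}{-}1,\, B_1{=}{+}1) = 0, \quad
P(A_2{=}{+}1,\, B_2{=}{+}1) = 0.
\]
Any local hidden-variable model forces $q = 0$, whereas quantum mechanics allows $q$ up to $(5\sqrt{5}-11)/2 \approx 0.09$; the scheme above bounds the success probability of a genuinely multipartite, generalized analogue of this test in a device-independent manner.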
While the question of the quasi-uniformity of Sobol' sequences remains widely open in general, in this short note we prove that the two-dimensional Sobol' sequence is not quasi-uniform. This result partially answers an open problem of Sobol' and Shukhman (2007) in the negative.
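For context, one standard way to formalize quasi-uniformity (stated here in our own notation; the exact formulation used by Sobol' and Shukhman may differ in normalization) is that the mesh ratio of the first $N$ points stays bounded as $N \to \infty$:
\[
\sup_{N \ge 2} \frac{h_N}{\delta_N} < \infty,
\qquad
h_N := \sup_{\boldsymbol{y} \in [0,1]^d} \min_{1 \le n \le N} \|\boldsymbol{y} - \boldsymbol{x}_n\|,
\qquad
\delta_N := \min_{1 \le m < n \le N} \|\boldsymbol{x}_m - \boldsymbol{x}_n\|,
\]
where $h_N$ is the covering radius and $\delta_N$ the separation distance of the points $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N \in [0,1]^d$.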
Partiality is a natural phenomenon in computability that we cannot get around. So the question is whether we can give the areas where partiality occurs, that is, where non-termination happens, more structure. In this paper we consider function classes which, besides the total functions, contain only finite functions whose domain of definition is an initial segment of the natural numbers. Such functions appear naturally in computation. We show that a rich computability theory can be developed for these function classes which embraces the central results of classical computability theory, in which all partial (computable) functions are considered. To do so, the concept of a G\"odel number is generalised, resulting in a broader class of numberings. The central algorithmic idea in this approach is to search in enumerated lists. In this way, function computability is reduced to set listability. Besides the development of a computability theory for these function classes, the new numberings -- called quasi-G\"odel numberings -- are studied from a numbering-theoretic perspective: they are complete, and each of the function classes numbered in this way is a retract of the G\"odel-numbered set of all partial computable functions. Moreover, the Rogers semi-lattice of all computable numberings of the considered function classes is studied, and results analogous to those for the computable numberings of the partial computable functions are obtained. The function classes are shown to be effectively given algebraic domains in the sense of Scott-Ershov. The quasi-G\"odel numberings are exactly the admissible numberings of the computable elements of the domain. Moreover, the domain can be computably mapped onto every other effectively given one, so that every admissible numbering of the computable domain elements is generated by a quasi-G\"odel numbering via this mapping.
Compared with the cheap addition operation, multiplication is of much higher computational complexity. The widely used convolutions in deep neural networks are exactly cross-correlations that measure the similarity between the input features and the convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) that trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the $\ell_1$-norm distance between the filters and the input feature as the output response. The influence of this new similarity measure on the optimization of neural networks is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset without any multiplications in the convolution layers.
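Concretely, the output response described above replaces the convolutional dot product with an $\ell_1$ distance between each input patch and the filter; a minimal forward-pass sketch (ours, for illustration; the sign convention, special back-propagation, and adaptive learning rate of the full method are not reproduced here) is:

\begin{verbatim}
import torch
import torch.nn.functional as F

def adder2d(x, weight, stride=1, padding=0):
    # Adder "convolution": response = -sum |patch - filter|, so that patches
    # closer to the filter (in l1 distance) produce larger outputs, in place
    # of the usual cross-correlation sum(patch * filter).
    b, c_in, h_in, w_in = x.shape
    c_out, _, k, _ = weight.shape
    patches = F.unfold(x, k, stride=stride, padding=padding)   # (b, c_in*k*k, L)
    filters = weight.reshape(c_out, -1)                        # (c_out, c_in*k*k)
    diff = patches.unsqueeze(1) - filters.unsqueeze(0).unsqueeze(-1)
    out = -diff.abs().sum(dim=2)                               # (b, c_out, L)
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    return out.reshape(b, c_out, h_out, w_out)

x = torch.randn(1, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)
print(adder2d(x, w, padding=1).shape)   # torch.Size([1, 16, 8, 8])
\end{verbatim}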