In this work we initiate the study of buy-and-sell prophet inequalities. We start by considering what is arguably the most fundamental setting. In this setting the online algorithm observes a sequence of prices one after the other. At each time step, the online algorithm can decide to buy and pay the current price if it does not hold the item already; or it can decide to sell and collect the current price as a reward if it holds the item. We show that for i.i.d. prices a single-threshold online algorithm achieves at least $1/2$ of the expected profit of the optimal offline algorithm and we prove that this is optimal. For non-i.i.d. prices in random order, where prices are no longer independent, we give a single-threshold online algorithm that achieves at least a $1/16$ fraction of the expected profit of the optimal offline algorithm. We also show that for this setting no online algorithm can yield a better than $1/3$ approximation, and thus establish a formal separation from the i.i.d. case. On the other hand, we present a threshold-based online algorithm for this setting that yields a $1/2-o(1)$ approximation. For non-i.i.d. prices no approximation is possible. We use the results for these base cases to solve a variety of more complex settings. For instance, we show a $1/2-o(1)$ approximation for settings where prices are affiliated and the online algorithm has only access to a single sample. We also extend our upper and lower bounds for the single item case to $k$ items, and thus in particular show that it is impossible to achieve $1-o(1)$ approximations. For the budgeted version, where fractions of an item can be bought, and gains can be reinvested, we show a constant-factor approximation to the optimal offline algorithm's growth rate. In a setting with $k$ item types and price streams, we achieve a $\Omega(1/k)$ approximation for the unit-capacity case, which is optimal.
We introduce a new quantum algorithm for computing the Betti numbers of a simplicial complex. In contrast to previous quantum algorithms that work by estimating the eigenvalues of the combinatorial Laplacian, our algorithm is an instance of the generic Incremental Algorithm for computing Betti numbers that incrementally adds simplices to the simplicial complex and tests whether or not they create a cycle. In contrast to existing quantum algorithms for computing Betti numbers that work best when the complex has close to the maximal number of simplices, our algorithm works best for sparse complexes. To test whether a simplex creates a cycle, we introduce a quantum span-program algorithm. We show that the query complexity of our span program is parameterized by quantities called the effective resistance and effective capacitance of the boundary of the simplex. Unfortunately, we also prove upper and lower bounds on the effective resistance and capacitance, showing both quantities can be exponentially large with respect to the size of the complex, implying that our algorithm would have to run for exponential time to exactly compute Betti numbers. However, as a corollary to these bounds, we show that the spectral gap of the combinatorial Laplacian can be exponentially small. As the runtime of all previous quantum algorithms for computing Betti numbers are parameterized by the inverse of the spectral gap, our bounds show that all quantum algorithms for computing Betti numbers must run for exponentially long to exactly compute Betti numbers. Finally, we prove some novel formulas for effective resistance and effective capacitance to give intuition for these quantities.
Software testing is a mandatory activity in any serious software development process, as bugs are a reality in software development. This raises the question of quality: good tests are effective in finding bugs, but until a test case actually finds a bug, its effectiveness remains unknown. Therefore, determining what constitutes a good or bad test is necessary. This is not a simple task, and there are a number of studies that identify different characteristics of a good test case. A previous study evaluated 29 hypotheses regarding what constitutes a good test case, but the findings are based on developers' beliefs, which are subjective and biased. In this paper we investigate eight of these hypotheses, through an extensive empirical study based on open software repositories. Despite our best efforts, we were unable to find evidence that supports these beliefs. This indicates that, although these hypotheses represent good software engineering advice, they do not necessarily mean that they are enough to provide the desired outcome of good testing code.
We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve 1-center problem in any of the $\ell_p$-metrics, or in edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for 1-center problem in edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in Ulam metric with running time $\tilde{O_{\varepsilon}}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms which list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
We show how the basic Combinatory Homomorphic Automatic Differentiation (CHAD) algorithm can be optimised, using well-known methods, to yield a simple and generally applicable reverse-mode automatic differentiation (AD) technique that has the correct computational complexity that we would expect of a reverse AD algorithm. Specifically, we show that the standard optimisations of sparse vectors and state-passing style code (as well as defunctionalisation/closure conversion, for higher-order languages) give us a purely functional algorithm that is most of the way to the correct complexity, with (functional) mutable updates taking care of the final log-factors. We provide an Agda formalisation of our complexity proof. Finally, we discuss how the techniques apply to differentiating parallel functional programs: the key observations are 1) that all required mutability is (commutative, associative) accumulation, which lets us preserve task-parallelism and 2) that we can write down data-parallel derivatives for most data-parallel array primitives.
Recommender system is adored in the internet industry as one of the most profitable technologies. Unlike other sectors such as fraud detection in the Fintech industry, recommender system is both deep and broad. In recent years, many researchers start to focus on the cold-start problem of recommender systems. In spite of the large volume of research literature, the majority of the research utilizes transfer learning / meta learning and pretrained model to solve the problem. Although the researchers claim the effectiveness of the approaches, everyone of them does rely on extra input data from other sources. In 2021 and 2022, several zeroshot learning algorithm for recommender system such as ZeroMat, DotMat, PoissonMat and PowerMat were invented. They are the first batch of the algorithms that rely on no transfer learning or pretrained models to tackle the problem. In this paper, we follow this line and invent a new zeroshot learning algorithm named LogitMat. We take advantage of the Zipf Law property of the user item rating values and logistic regression model to tackle the cold-start problem and generate competitive results with other competing techniques. We prove in experiments that our algorithm is fast, robust and effective.
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on the top of transformers, self-supervised learning and transfer learning. Transformed-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks which avoids training of downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give brief overview of various benchmarks including both intrinsic and extrinsic. We present a summary of various useful libraries to work with T-PTLMs. Finally, we highlight some of the future research directions which will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with the recent happenings in T-PTLMs.
Video instance segmentation (VIS) is the task that requires simultaneously classifying, segmenting and tracking object instances of interest in video. Recent methods typically develop sophisticated pipelines to tackle this task. Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem. Given a video clip consisting of multiple image frames as input, VisTR outputs the sequence of masks for each instance in the video in order directly. At the core is a new, effective instance sequence matching and segmentation strategy, which supervises and segments instances at the sequence level as a whole. VisTR frames the instance segmentation and tracking in the same perspective of similarity learning, thus considerably simplifying the overall pipeline and is significantly different from existing approaches. Without bells and whistles, VisTR achieves the highest speed among all existing VIS models, and achieves the best result among methods using single model on the YouTube-VIS dataset. For the first time, we demonstrate a much simpler and faster video instance segmentation framework built upon Transformers, achieving competitive accuracy. We hope that VisTR can motivate future research for more video understanding tasks.
While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, this may not account for perturbations encountered in several real world settings. In many such cases although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space, without having access to the data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations, and the outer minimization finding model parameters by optimizing the loss on adversarial perturbations generated from the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations -- object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our codes are publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.
Distant supervision can effectively label data for relation extraction, but suffers from the noise labeling problem. Recent works mainly perform soft bag-level noise reduction strategies to find the relatively better samples in a sentence bag, which is suboptimal compared with making a hard decision of false positive samples in sentence level. In this paper, we introduce an adversarial learning framework, which we named DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as the negative samples to train the discriminator. The optimal generator is obtained until the discrimination ability of the discriminator has the greatest decline. We adopt the generator to filter distant supervision training dataset and redistribute the false positive instances into the negative set, in which way to provide a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction comparing to state-of-the-art systems.