In this paper, we examine the biases arising in A/B tests where a firm modifies a continuous parameter, such as price, to estimate the global treatment effect associated with a given performance metric. Such biases emerge from canonical designs and estimators due to interference among market participants. We employ structural modeling and differential calculus to derive intuitive structural characterizations of this bias. We then specialize our general model to a standard revenue management pricing problem. This setting highlights a key potential pitfall in the use of pricing experiments to guide profit maximization: notably, the canonical estimator for the change in profits can have the {\em wrong sign}. In other words, following the guidance of the canonical estimator may lead the firm to move prices in the wrong direction, and thereby decrease profits relative to the status quo. We apply these results to a two-sided market model, show how this ``change of sign'' regime depends on model parameters, and discuss structural and practical implications for platform operators.
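To make the sign-flip concrete, here is a minimal sketch in our own notation (not the paper's): suppose treated units face price $p+\delta$ and control units face price $p$, and the canonical estimator scales the difference in average outcomes by $\delta$,
\[
\widehat{\Delta} \;=\; \frac{\bar{Y}_{\mathrm{T}} - \bar{Y}_{\mathrm{C}}}{\delta},
\qquad
\Delta^{\ast} \;=\; \frac{d}{dp}\, Y^{\mathrm{global}}(p),
\]
where $\Delta^{\ast}$ is the global profit derivative the firm actually cares about. Under interference, $\bar{Y}_{\mathrm{C}}$ itself shifts with the treatment assignment (e.g., demand spills over from treated to control units), so $\widehat{\Delta}$ need not even agree with $\Delta^{\ast}$ in sign.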
In this paper, we provide novel tail bounds on the optimization error of Stochastic Mirror Descent for convex and Lipschitz objectives. Our analysis extends the existing tail bounds from the classical light-tailed Sub-Gaussian noise case to heavier-tailed noise regimes. We study the optimization error of the last iterate as well as the average of the iterates. We instantiate our results in two important cases: a class of noise with exponential tails and one with polynomial tails. A remarkable feature of our results is that they do not require an upper bound on the diameter of the domain. Finally, we support our theory with illustrative experiments that compare the behavior of the average of the iterates with that of the last iterate in heavy-tailed noise regimes.
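For a concrete picture of the algorithm being analyzed, the following is a minimal sketch of stochastic mirror descent with the negative-entropy mirror map on the probability simplex (i.e., exponentiated gradient), run under synthetic polynomial-tailed noise; the objective, step size, and noise model are our own illustration, not the paper's setting:

```python
import numpy as np

def smd_simplex(grad, x0, steps, eta, rng):
    """Stochastic mirror descent with the negative-entropy mirror map
    (exponentiated gradient) on the probability simplex."""
    x = x0.copy()
    avg = np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = grad(x, rng)                 # noisy (sub)gradient oracle
        x = x * np.exp(-eta * g)         # mirror step
        x /= x.sum()                     # Bregman projection onto the simplex
        avg += (x - avg) / t             # running average of the iterates
    return x, avg                        # last iterate, average iterate

# Illustrative linear objective f(x) = <c, x>; its gradient is constant,
# corrupted here by Student-t noise with polynomial tails (df = 2.5).
rng = np.random.default_rng(0)
c = np.array([1.0, 2.0, 3.0])
noisy_grad = lambda x, rng: c + rng.standard_t(df=2.5, size=c.shape)
x0 = np.ones(3) / 3
last, avg = smd_simplex(noisy_grad, x0, steps=10_000, eta=0.01, rng=rng)
print("last iterate:", last, "average iterate:", avg)
```

Returning both outputs makes it easy to compare the last iterate against the average of the iterates, the two estimators whose tail behavior the paper studies.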
In this work, we consider the problem of regularization in minimum mean-squared error (MMSE) linear filters. Exploiting the relationship with statistical machine learning methods, the regularization parameter is found from the observed signals in a simple and automatic manner. The proposed approach is illustrated through system identification examples, where the automatic regularization yields near-optimal results.
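The paper derives its automatic rule from the observed signals; purely for intuition, here is a sketch of a ridge-regularized filter with the regularization parameter chosen by a standard data-driven criterion (leave-one-out cross-validation in closed form). The toy setup and the selection criterion are our assumptions, not necessarily the paper's method:

```python
import numpy as np

def ridge_filter(X, d, lam):
    """Regularized least-squares/MMSE filter: w = (X^T X + lam I)^{-1} X^T d."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ d)

def auto_lambda(X, d, grid):
    """Pick lambda by leave-one-out CV, using the hat-matrix shortcut."""
    best, best_err = None, np.inf
    for lam in grid:
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
        resid = (d - H @ d) / (1.0 - np.diag(H))   # closed-form LOO residuals
        err = np.mean(resid ** 2)
        if err < best_err:
            best, best_err = lam, err
    return best

# Toy system identification: estimate a 4-tap FIR filter from noisy data.
rng = np.random.default_rng(1)
w_true = np.array([0.8, -0.4, 0.2, -0.1])
X = rng.standard_normal((200, 4))
d = X @ w_true + 0.3 * rng.standard_normal(200)
lam = auto_lambda(X, d, grid=np.logspace(-4, 2, 25))
w_hat = ridge_filter(X, d, lam)
```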
In this paper, we study the problem of using monetary incentives to encourage players to shift from an initial Nash equilibrium to a more favorable one within a game. Our main focus is computing the minimum reward required to facilitate this equilibrium transition. The game involves a single row player with $m$ strategies and $k$ column players, each with $n$ strategies. We show that determining whether the minimum reward is zero is NP-complete, and that computing the minimum reward is APX-hard. On the positive side, the problem can be solved efficiently if either $k$ or $n$ is a fixed constant, and we devise a polynomial-time approximation algorithm with an additive error. Lastly, we explore the special case in which the utility functions are single-peaked, and show that the optimal reward can be computed in polynomial time.
In this paper we investigate the parameterized complexity of counting and detecting occurrences of small patterns in unit disk graphs: given an $n$-vertex unit disk graph $G$ with an embedding of ply $p$ (that is, the graph is represented as an intersection graph of closed disks of unit size, and each point is contained in at most $p$ disks) and a $k$-vertex unit disk graph $P$, count the number of (induced) copies of $P$ in $G$. For general patterns $P$, we give a $2^{O(p k /\log k)}n^{O(1)}$ time algorithm for counting pattern occurrences. We show this is tight, even for ply $p=2$ and $k=n$: any $2^{o(n/\log n)}n^{O(1)}$ time algorithm violates the Exponential Time Hypothesis (ETH). For most natural classes of patterns, such as connected graphs and independent sets, we present the following results. First, we give a $(pk)^{O(\sqrt{pk})}n^{O(1)}$ time algorithm, which is nearly tight under the ETH for bounded ply and many patterns. Second, for $p= k^{O(1)}$ we provide a Turing kernelization (i.e., a polynomial-time preprocessing algorithm that reduces the instance size to $k^{O(1)}$). Our approach combines tools previously developed for planar subgraph isomorphism, such as `efficient inclusion-exclusion' from [Nederlof STOC'20] and `isomorphism checks' from [Bodlaender et al. ICALP'16], with a different separator hierarchy and a new bound on the number of non-isomorphic separations of small order tailored to unit disk graphs.
In this paper, we comprehensively analyze the vertical and horizontal extensions of the existing memory hierarchy, and clarify the differences between conventional memory and big memory. We survey state-of-the-art studies of big memory systems, together with their design methodologies and implementations. We argue that persistence is the first principle of big memory systems, and further examine full-stack persistence and moving persistence.
In this paper, we study the problem of publishing a stream of real-valued data satisfying differential privacy (DP). One major challenge is that the maximal possible value can be quite large; thus it is necessary to estimate a threshold so that values above it are truncated, reducing the amount of noise that must be added to the data. The estimation must be done from the data in a private fashion. We develop such a method using the Exponential Mechanism with a quality function that approximates the utility goal well while maintaining low sensitivity. Given the threshold, we then propose a novel online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps into a framework for private publishing of stream data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result via post-processing. Within our framework, we design an algorithm satisfying the more stringent setting of DP called local DP (LDP). To our knowledge, this is the first LDP algorithm for publishing streaming data. Using four real-world datasets, we demonstrate that our mechanism outperforms the state of the art by 6-10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query).
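As a schematic of the perturber component alone (the private threshold estimation and the smoother are omitted, and all names here are ours, not the paper's API), truncating at a threshold $\theta$ lets the Laplace noise be calibrated to $\theta/\epsilon$ rather than to the maximal possible value:

```python
import numpy as np

def perturb_stream(stream, theta, eps, rng):
    """Perturber stage: truncate each value at threshold theta, then add
    Laplace noise calibrated to the truncated sensitivity theta / eps.
    (Per-timestep release; schematic only, not the paper's full mechanism.)"""
    out = []
    for v in stream:
        clipped = min(v, theta)          # truncation bounds the sensitivity
        out.append(clipped + rng.laplace(scale=theta / eps))
    return out

rng = np.random.default_rng(0)
stream = rng.pareto(2.0, size=1000) * 10   # heavy-tailed real-valued stream
released = perturb_stream(stream, theta=50.0, eps=1.0, rng=rng)
```

The design tension is visible in the single parameter theta: too small and truncation biases the stream, too large and the Laplace scale drowns the signal, which is why estimating the threshold privately and well is the crux.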
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). FDN first decomposes the basic features extracted by a backbone network into a set of facial action-aware latent features to model expression similarities. FRN then captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, FRN comprises two modules: an intra-feature relation modeling module and an inter-feature relation modeling module. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
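As a rough numerical sketch of the decompose-then-reconstruct idea (shapes, projections, and the softmax weighting below are our own simplifications, not the FDRL architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, L = 512, 7, 64        # backbone dim, number of latent features, latent dim

# Decomposition: project the basic feature into K action-aware latent features.
W_dec = rng.standard_normal((K, L, D)) * 0.02
basic = rng.standard_normal(D)                  # feature from a backbone network
latents = np.einsum('kld,d->kl', W_dec, basic)  # (K, L) latent features

# Reconstruction: weight each latent feature and recombine into a single
# expression feature (a stand-in for the intra-/inter-feature relation modules).
scores = latents @ rng.standard_normal(L)       # per-latent importance logits
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax weights over latents
W_rec = rng.standard_normal((D, L)) * 0.02
expr_feature = W_rec @ (alpha @ latents)        # reconstructed expression feature
```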
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), arguably the most prominent machine learning models for large-scale data over the past decade. We generalize the formulation of classification margins from classical research to the latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since different methods adopt disparate viewpoints, we categorize them into groups for ease of comparison and discussion. We hope that our discussion and overview inspire new research aimed at improving the performance of DNNs, and we also point to directions in which the large margin principle can provide theoretical evidence for why certain regularizations of DNNs work well in practice. We keep the paper concise so that the crucial ideas of large margin learning and related methods stand out.
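For concreteness, the classical multiclass margin underlying much of this literature is
\[
\gamma(x, y) \;=\; f_y(x) \;-\; \max_{j \neq y} f_j(x),
\]
where $f_j$ is the score the model assigns to class $j$: the sample $(x, y)$ is correctly classified iff $\gamma(x, y) > 0$, and large margin training seeks to enlarge $\gamma$, suitably normalized (e.g., by a Lipschitz constant or a norm of $f$) so that the quantity remains meaningful for DNNs.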
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 on ROUGE-L. The code to reproduce our results is available at https://github.com/nlpyang/BertSum
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It consistently encourages the model to predict a more acceptable answer, addressing the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
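As a schematic of the reattention idea only (our own simplification: condition current attention on temporally memorized past attentions to damp redundant alignments; this is not the paper's exact formulation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reattend(logits, past_attns, gamma=1.0):
    """Refine current attention logits using attention distributions memorized
    from earlier alignment rounds, penalizing positions that were already
    heavily attended (a guard against attention redundancy)."""
    coverage = np.sum(past_attns, axis=0)       # accumulate past rounds
    return softmax(logits - gamma * coverage)

# Toy usage: two past rounds over 5 positions, then a refined third round.
rng = np.random.default_rng(0)
past = softmax(rng.standard_normal((2, 5)))
refined = reattend(rng.standard_normal(5), past, gamma=2.0)
```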