Consider a binary statistical hypothesis testing problem, where $n$ independent and identically distributed random variables $Z^n$ are either distributed according to the null hypothesis $P$ or the alternative hypothesis $Q$, and only $P$ is known. A well-known test that is suitable for this case is the so-called Hoeffding test, which accepts $P$ if the Kullback-Leibler (KL) divergence between the empirical distribution of $Z^n$ and $P$ is below some threshold. This work characterizes the first- and second-order terms of the type-II error probability, for a fixed type-I error probability, of the Hoeffding test as well as of divergence tests, in which the KL divergence is replaced by a general divergence. It is demonstrated that, irrespective of the divergence, divergence tests achieve the first-order term of the Neyman-Pearson test, which is the optimal test when both $P$ and $Q$ are known. In contrast, the second-order term of divergence tests is strictly worse than that of the Neyman-Pearson test. It is further demonstrated that divergence tests with an invariant divergence achieve the same second-order term as the Hoeffding test, but divergence tests with a non-invariant divergence may outperform the Hoeffding test for some alternative hypotheses $Q$. Potentially, this behavior could be exploited by a composite hypothesis test with partial knowledge of the alternative hypothesis $Q$ by tailoring the divergence of the divergence test to the set of possible alternative hypotheses.
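To make the test concrete, here is a minimal sketch of the Hoeffding test over a finite alphabet (our own illustration, not code from the paper; the alphabet size, sample size, and threshold `t` below are hypothetical choices, and in practice the threshold would be calibrated to the desired type-I error probability).

```python
import numpy as np

def hoeffding_test(z, p, t):
    """Accept the null hypothesis P iff D(P_hat || P) < t.

    z: 1-D array of n samples over the alphabet {0, ..., len(p)-1}
    p: null-hypothesis pmf
    t: threshold controlling the type-I error probability
    """
    n, k = len(z), len(p)
    emp = np.bincount(z, minlength=k) / n           # empirical distribution of Z^n
    mask = emp > 0                                  # 0 * log 0 = 0 convention
    kl = np.sum(emp[mask] * np.log(emp[mask] / p[mask]))
    return kl < t                                   # True: accept P

# Example: n = 1000 samples, uniform null over 4 symbols, threshold 0.05.
rng = np.random.default_rng(0)
p = np.ones(4) / 4
print(hoeffding_test(rng.integers(0, 4, size=1000), p, t=0.05))
```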
Discourse relation classification is an especially difficult task without explicit context markers \cite{Prasad2008ThePD}. Current approaches to implicit relation prediction rely solely on the two targeted neighboring sentences, ignoring the broader context of their surrounding environments \cite{Atwell2021WhereAW}. In this research, we propose three new methods for incorporating context into the task of sentence relation prediction: (1) Direct Neighbors (DNs), (2) Expanded Window Neighbors (EWNs), and (3) Part-Smart Random Neighbors (PSRNs). Our findings indicate that the inclusion of context beyond one discourse unit is harmful in the task of discourse relation classification.
We introduce and characterize the operational diversity order (ODO) in fading channels, as a proxy for the classical notion of diversity order at an arbitrary operational signal-to-noise ratio (SNR). This definition brings up relevant insights in a number of cases: (i) we quantify that in line-of-sight scenarios an increased diversity order is attainable compared to that achieved asymptotically; (ii) this effect is attenuated, but still visible, in the presence of an additional dominant specular component; (iii) we confirm that the decay slope in Rayleigh product channels increases very slowly and never fully reaches unit slope for finite values of SNR.
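For orientation, the contrast between the asymptotic notion and a finite-SNR slope can be sketched as follows (the notation is ours and purely illustrative; the paper's exact definition of the ODO may differ):

```latex
% Classical (asymptotic) diversity order of an error/outage probability P_e:
d \;=\; -\lim_{\bar{\gamma}\to\infty}
        \frac{\log P_e(\bar{\gamma})}{\log \bar{\gamma}},
% whereas a finite-SNR slope evaluates the local decay of P_e at the
% operating point \bar{\gamma}_0 instead of in the limit:
d(\bar{\gamma}_0) \;=\;
  -\left.\frac{\partial \log P_e(\bar{\gamma})}
              {\partial \log \bar{\gamma}}\right|_{\bar{\gamma}=\bar{\gamma}_0}.
```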
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well or better than mean/mode imputation with missing-indicators.
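As a loose illustration of the geometric idea only (the mapping below is our assumption, not the paper's exact construction): a $[0,1]$ value can be placed on the unit circle between the two one-hot vertices, and a missing value at the origin, which is then equidistant (distance 1) from every non-missing encoding.

```python
import numpy as np

def polar_encode(x):
    """Encode a [0,1] value, or None for missing, as a 2-D vector.

    Non-missing values land on the unit quarter-circle between the
    one-hot vertices (1, 0) and (0, 1); a missing value maps to the
    origin, at distance 1 from every non-missing encoding.
    (Illustrative mapping; the paper's construction may differ.)
    """
    if x is None:
        return np.zeros(2)
    theta = x * np.pi / 2
    return np.array([np.cos(theta), np.sin(theta)])

for v in (0.0, 0.25, 1.0, None):
    print(v, polar_encode(v))
```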
This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated ReLUs using known properties of the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by the scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability/performance condition and study the effect of the lifting horizon.
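As a brief illustration of the kind of constraints involved (a standard construction, not the paper's full class), the elementwise properties of the scalar ReLU combine into quadratic constraints on the repeated ReLU as follows:

```latex
% The scalar ReLU v = \max(0, u) satisfies, elementwise,
%   v \ge 0, \qquad v \ge u, \qquad v\,(v - u) = 0.
% For the repeated (vector) ReLU, weighting the last property with
% multipliers \lambda \in \mathbb{R}^{n}_{\ge 0}, \Lambda = \mathrm{diag}(\lambda),
% gives quadratic constraints of the form
\begin{bmatrix} u \\ v \end{bmatrix}^{\!\top}
\begin{bmatrix} 0 & \Lambda \\ \Lambda & -2\Lambda \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
\;=\; 2\,\lambda^{\top}\bigl(v \odot (u - v)\bigr) \;=\; 0 .
```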
We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients. Previous work overlooked this question and proposed various federated algorithms; we provide a comprehensive answer from the perspective of computational constraints. We propose a federated algorithm and analyze upper and lower bounds on the regret that show that (i) collaboration is unnecessary in the absence of additional constraints on the problem, and (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces. We clarify why the collaboration in previous federated algorithms is unnecessary, and improve the regret bounds of algorithms for distributed online multi-kernel learning at a smaller computational and communication cost. Our algorithm relies on three new techniques, namely an improved Bernstein inequality for martingales, a federated online mirror descent framework, and the decoupling of model selection from prediction, which might be of independent interest.
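For intuition about the mirror-descent building block, here is a minimal single-client sketch using the negative-entropy mirror map over the simplex, i.e., exponentiated gradient, across the $K$ candidates (the function name and learning rate are our own; the paper's federated variant differs):

```python
import numpy as np

def exponentiated_gradient(losses, eta=0.1):
    """Online mirror descent with the negative-entropy mirror map:
    maintain a weight per candidate hypothesis space and update
    multiplicatively from the observed per-round losses.

    losses: array of shape (T, K), one length-K loss vector per round.
    (Single-client sketch; the paper's federated variant differs.)
    """
    losses = np.asarray(losses, dtype=float)
    K = losses.shape[1]
    w = np.ones(K) / K
    for loss_t in losses:
        w = w * np.exp(-eta * loss_t)   # mirror-descent step
        w = w / w.sum()                 # project back onto the simplex
    return w

# Example: 3 candidates over 100 rounds; candidate 0 has the smallest loss.
rng = np.random.default_rng(1)
L = rng.random((100, 3)) + np.array([0.0, 0.3, 0.6])
print(exponentiated_gradient(L))
```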
Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq \epsilon$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ \epsilon\big)$.
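As a deliberately simplified caricature of the sample-complexity statement (not the global-bound, depth-limited A* coder analysed in the paper), one can draw a fixed number of candidates from $P$ and select one via the Gumbel-max trick over the unnormalised density ratio:

```python
import numpy as np

def fixed_runtime_sample(p, log_r, d_kl, eps, rng):
    """Caricature of fixed-runtime approximate channel simulation.

    Draw N = 2^(d_kl / eps) candidates from the coding distribution P
    and select the index maximising log r(x) + Gumbel noise.  The
    selected index is what the encoder would transmit.
    (Illustration only; the paper's A* coder has sharper guarantees.)
    """
    n = int(np.ceil(2.0 ** (d_kl / eps)))
    xs = rng.choice(len(p), size=n, p=p)       # candidates drawn from P
    gumbels = rng.gumbel(size=n)
    idx = int(np.argmax(log_r[xs] + gumbels))
    return idx, xs[idx]                        # (index to encode, sample)

# Toy example over a 4-symbol alphabet.
rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])
log_r = np.log(q / p)                          # Radon-Nikodym derivative
d_kl = float(np.sum(q * log_r))
print(fixed_runtime_sample(p, log_r, d_kl, eps=0.25, rng=rng))
```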
This paper revisits the problem of Private Information Retrieval (PIR), where there are $N$ replicated non-communicating databases containing the same $M$ messages and a user who wishes to retrieve one of the messages without revealing the wanted message's index to the databases. However, we assume a block-fading additive white Gaussian noise multiple access channel (AWGN MAC) linking the user and the databases. Previous work \cite{shmuel2021private} presented a joint channel-PIR scheme utilizing the Compute-and-Forward (C\&F) protocol, showing the potential of a joint channel-PIR scheme over a separated one. This paper proposes an improved joint scheme tailored to the PIR problem with $N$ databases over a block-fading AWGN MAC. Unlike the C\&F protocol, our scheme offers reduced computational complexity while improving the scaling laws governing the achievable rate. Specifically, the achievable rate scales with the number of databases $N$ and the power $P$ similarly to the channel capacity without the privacy constraint, and it outperforms the C\&F-based approach. Furthermore, the analysis demonstrates that the improved rate exhibits only a finite gap from the unconstrained channel capacity (one bit per second per Hz) as $N$ increases.
We popularize the question of whether, for $m$ large enough, all $m$-uniform shift-chain hypergraphs are properly $2$-colorable. On the other hand, we show that for every $m$ some $m$-uniform shift-chains do not admit a polychromatic $3$-coloring.
The airplane refueling problem is a nonlinear unconstrained optimization problem with $n!$ feasible solutions. Given a fleet of $n$ airplanes capable of mid-air refueling, the task is to find the refueling policy that lets the last remaining airplane travel the farthest. To deal with large-scale airplane refueling instances, we propose the notion of a sequential feasible solution, derived from the refueling properties of the problem's data structure. We prove that if an airplane refueling instance has feasible solutions, it must have sequential feasible solutions, and that the optimal feasible solution must be the optimal sequential feasible solution. We then propose a sequential search algorithm consisting of two steps. The first step seeks out all sequential feasible solutions. When the input size $n$ exceeds an index number, we prove that the number of sequential feasible solutions grows at a polynomial rate. The second step searches for the maximal sequential feasible solution by bubble-sorting all sequential feasible solutions. Moreover, we build an efficient computability scheme by which we can forecast, within polynomial time, the computational complexity of the sequential search algorithm on any given airplane refueling instance. We can thus provide a computational strategy for decision makers or algorithm users based on their available computing resources.
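For concreteness, a brute-force baseline under a common textbook distance model (the model is our assumption, not necessarily the paper's formalization) looks as follows; the sequential search algorithm replaces the $n!$ enumeration with the restricted space of sequential feasible solutions:

```python
from itertools import permutations

def distance(order, fuel):
    """Distance travelled by the last plane under a drop-out order.

    Textbook model (our assumption, not necessarily the paper's):
    while k planes fly together they burn fuel at rate k, and each
    plane's fuel is consumed by the whole group before it drops out,
    so the order pi yields sum_i fuel[pi(i)] / (n - i).
    """
    n = len(order)
    return sum(fuel[p] / (n - i) for i, p in enumerate(order))

def brute_force(fuel):
    """Baseline over all n! refueling policies; feasibility
    constraints such as tank capacities are omitted here."""
    return max(permutations(range(len(fuel))),
               key=lambda order: distance(order, fuel))

fuel = [10.0, 4.0, 7.0]
best = brute_force(fuel)
print(best, distance(best, fuel))
```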
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch leads to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style and illumination, and 2) the instance-level shift, such as object appearance and size. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning domain classifiers in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our approach on multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
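A minimal sketch of one such adversarial component, assuming a PyTorch-style gradient reversal layer (the module below is our illustration, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales the gradient in
    the backward pass, so the feature extractor is trained to fool
    the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ImageLevelDomainClassifier(nn.Module):
    """Illustrative image-level head: predicts the domain (source vs.
    target) from backbone features fed through gradient reversal."""
    def __init__(self, in_channels, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1))

    def forward(self, feats):
        return self.head(GradReverse.apply(feats, self.lam))

# Usage: domain_logit = ImageLevelDomainClassifier(1024)(backbone_feats),
# trained with BCEWithLogitsLoss against source/target domain labels.
```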