An important statistic for analyzing (finite) network data, called \emph{PageRank}, and a related new statistic, which we call \emph{MarkovRank}, are studied in this paper. PageRank was originally developed by the cofounders of \emph{Google}, Sergey Brin and Larry Page, to rank websites in their search engine results, and it is computed by an iterative algorithm based on the idea that nodes with a larger number of incoming edges are more important. The aim of this paper is to analyze the common features of, and some significant differences between, PageRank and the new rank. A common merit of the two ranks is that both statistics can be computed easily, either by direct mathematical computation or by an iterative algorithm. In the examples analyzed, the two statistics return somewhat different values, but the resulting rankings are close to each other. One notable difference is that only MarkovRank has the property that its ranking does not depend on any tuning parameter; it is determined solely by the adjacency matrix of the given network data. Moreover, it is also shown that the ranking induced by MarkovRank coincides with that of the stationary distribution vector of the corresponding Markov chain (with finite state space) whenever the Markov chain is regular. Thus, MarkovRank may have the potential to play a role similar to that of the well-established PageRank in practice, sharing its light computational burden while enjoying some additional theoretical validity.
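As an illustration of the light computational burden shared by the two ranks, the following is a minimal sketch of the standard PageRank power iteration on an adjacency matrix (not code from the paper; the damping factor d below is the kind of tuning parameter referred to above, and MarkovRank itself is not reproduced here).
\begin{verbatim}
import numpy as np

def pagerank(A, d=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank power iteration on an adjacency matrix A,
    where A[i, j] = 1 if there is an edge from node i to node j;
    d is the damping factor (tuning parameter)."""
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Row-normalize to a transition matrix; dangling nodes spread mass uniformly.
    P = np.where(out_deg[:, None] > 0,
                 A / np.maximum(out_deg, 1)[:, None], 1.0 / n).T
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = d * P @ r + (1.0 - d) / n
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r_new

# A small directed network given by its adjacency matrix.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
scores = pagerank(A)
print(scores)                 # PageRank values
print(np.argsort(-scores))    # the induced ranking of the nodes
\end{verbatim}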
While digital divide studies in the past focused primarily on access to information and communications technology (ICT), the divide's influence on other associated dimensions, such as privacy, is becoming critical, with a far-reaching impact on people and society. For example, the varying levels of government legislation and compliance on information privacy worldwide have created a new era of digital divide in the privacy preservation domain. In this article, the concept of a "digital privacy divide (DPD)" is introduced to describe the perceived gap in the privacy preservation of individuals based on the geopolitical location of different countries. To better understand the DPD phenomenon, we created an online questionnaire and collected answers from more than 700 respondents in four countries (the United States, Germany, Bangladesh, and India) representing two distinct cultural orientations under Hofstede's individualist vs. collectivist dimension. Our results, however, revealed some unexpected findings: DPD does not depend on Hofstede's cultural orientation of the countries. For example, individuals residing in Germany and Bangladesh share similar privacy concerns, as do individuals residing in the United States and India. Moreover, while most respondents acknowledge the importance of privacy legislation to protect their digital privacy, they do not mind their governments allowing domestic companies and organizations to collect personal data on individuals residing outside their countries, if there are economic, employment, and crime prevention benefits. These results suggest a social dilemma in perceived privacy preservation, which may depend on many other contextual factors beyond government legislation and countries' cultural orientation.
In this paper, a class of statistics based on high-frequency observations of oscillating and skew Brownian motions is considered. Their rate of convergence towards the local time of the underlying process is obtained in the form of a central limit theorem. Oscillating and skew Brownian motions are solutions to stochastic differential equations with singular coefficients: a piecewise constant diffusion coefficient, or a drift involving the local time. The result is applied to provide estimators of the parameter of the skew Brownian motion and to study their asymptotic behavior. Moreover, in the case of the classical statistic given by the normalized number of crossings, the result is proved to hold for a larger class of It\^o processes with singular coefficients.
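For orientation, and in notation that may differ from the paper's, the skew Brownian motion with skewness parameter $\alpha \in (0,1)$ solves the local-time equation
\[
X_t = x_0 + B_t + (2\alpha - 1)\, L_t^0(X),
\]
where $L_t^0(X)$ denotes the symmetric local time of $X$ at zero, and the classical crossing statistic built from observations at the times $i/n$ is
\[
N_t^n = \#\bigl\{\, 1 \le i \le \lfloor nt \rfloor : X_{(i-1)/n}\, X_{i/n} < 0 \,\bigr\}, \qquad \frac{1}{\sqrt{n}}\, N_t^n \;\longrightarrow\; c\, L_t^0(X),
\]
with a constant $c$ that depends on $\alpha$ and on the chosen normalization of the local time.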
We present a basis for studying questions of cause and effect in statistics which subsumes and reconciles the models proposed by Pearl, Robins, Rubin and others, and which, as far as mathematical notions and notation are concerned, is entirely conventional. In particular, we show that, contrary to what several authors had thought, standard probability can be used to treat problems that involve notions of causality, and in a way not essentially different from the way it has been used in the area generally known (since the 1960s, at least) as 'applied probability'. Conventional, elementary proofs are given of some of the most important results obtained by the various schools of 'statistical causality', and a variety of examples considered by those schools are worked out in detail. Pearl's 'calculus of intervention' is examined anew, and its first two rules are formulated and proved by means of elementary probability for the first time since they were stated 25 years or so ago.
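For reference, the two rules in question are usually stated graphically in Pearl's notation as follows (this is the standard formulation, not the elementary-probability restatement developed in the paper). For a causal DAG $G$, let $G_{\overline{X}}$ denote the graph with all edges into $X$ removed, and $G_{\overline{X}\,\underline{Z}}$ the graph with, in addition, all edges out of $Z$ removed. Then
\[
\text{Rule 1 (insertion/deletion of observations):}\quad P(y \mid do(x), z, w) = P(y \mid do(x), w) \ \text{ if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}},
\]
\[
\text{Rule 2 (action/observation exchange):}\quad P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w) \ \text{ if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}}.
\]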
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex relative to the size of the training dataset, and as a result they perfectly fit (i.e., interpolate) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings from a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
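As a toy numerical illustration of interpolating linear models (a minimal sketch under our own synthetic setup, not an experiment from the paper), one can fit least squares with a growing number of features, using the minimum-$\ell_2$-norm solution once the model interpolates; the test error typically peaks near the interpolation threshold p = n and descends again beyond it, mirroring double descent.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_max = 40, 1000, 200

# Latent linear signal observed through Gaussian features plus noise.
w_star = rng.normal(size=d_max) / np.sqrt(d_max)
X_train = rng.normal(size=(n_train, d_max))
X_test = rng.normal(size=(n_test, d_max))
y_train = X_train @ w_star + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_star + 0.5 * rng.normal(size=n_test)

for p in [5, 10, 20, 39, 40, 41, 60, 100, 200]:
    # Use only the first p features; np.linalg.pinv returns the ordinary
    # least-squares fit when p < n_train and the minimum-norm interpolator
    # when p >= n_train.
    w_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    test_mse = np.mean((X_test[:, :p] @ w_hat - y_test) ** 2)
    print(f"p = {p:3d}, test MSE = {test_mse:.3f}")
\end{verbatim}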
Many modern data analytics applications on graphs operate on domains where the graph topology is not known a priori, and hence its determination becomes part of the problem definition rather than serving as prior knowledge that aids the problem solution. Part III of this monograph starts by addressing ways to learn graph topology, from the case where the physics of the problem already suggest a possible topology, through to the most general cases where the graph topology is learned from the data. A particular emphasis is on graph topology definition based on the correlation and precision matrices of the observed data, combined with additional prior knowledge and structural conditions, such as smoothness or sparsity of graph connections. For learning sparse graphs (with a small number of edges), the least absolute shrinkage and selection operator, known as the LASSO, is employed, along with its graph-specific variant, the graphical LASSO. For completeness, both variants of the LASSO are derived in an intuitive way and explained. An in-depth elaboration of the graph topology learning paradigm is provided through several examples on physically well-defined graphs, such as electric circuits, linear heat transfer, social and computer networks, and spring-mass systems. As graph neural networks (GNNs) and graph convolutional networks (GCNs) are rapidly emerging, we also review the main trends in GNNs and GCNs from the perspective of graph signal filtering. Tensor representation of lattice-structured graphs is considered next, and it is shown that tensors (multidimensional data arrays) are a special class of graph signals, whereby the graph vertices reside on a high-dimensional regular lattice structure. This part of the monograph concludes with two emerging applications: financial data processing and the modeling of underground transportation networks.
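As a small illustration of sparse graph topology learning from the precision matrix (a sketch using scikit-learn's graphical LASSO implementation, not code from the monograph), one can estimate a sparse precision matrix from observed vertex signals and read off the graph edges from its nonzero off-diagonal entries.
\begin{verbatim}
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)

# Synthetic signals on 5 vertices with a simple chain dependence 0-1-2-3-4.
n_samples, n_vertices = 500, 5
x = rng.normal(size=(n_samples, n_vertices))
for k in range(1, n_vertices):
    x[:, k] += 0.6 * x[:, k - 1]

model = GraphicalLasso(alpha=0.05).fit(x)
precision = model.precision_

# Edges of the learned graph: non-negligible off-diagonal precision entries;
# these should be dominated by the chain edges (0,1), (1,2), (2,3), (3,4).
edges = [(i, j) for i in range(n_vertices) for j in range(i + 1, n_vertices)
         if abs(precision[i, j]) > 1e-3]
print(edges)
\end{verbatim}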
We propose a new method of estimation in topic models that is not a variation of the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p), and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than current ones, even though it starts at a computational and theoretical disadvantage: it must estimate the correct number of topics K, whereas the competing methods are given the correct value in our simulations.
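For context, a standard formulation of the model (in notation that may differ from the paper's) writes the expected word frequencies of document $i$ as a mixture of topics:
\[
\mathbb{E}\bigl[X_{\cdot i}\bigr] \;=\; A\, W_{\cdot i}, \qquad i = 1, \dots, n,
\]
where $X_{\cdot i} \in \mathbb{R}^p$ collects the observed word frequencies of document $i$ (based on its $N_i$ words), $A \in \mathbb{R}^{p \times K}$ is the word-topic matrix whose columns lie in the probability simplex, and $W_{\cdot i} \in \mathbb{R}^{K}$ contains the topic weights of document $i$, also lying in the simplex.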
Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the assumption of training data required by most existing models, creating a model that only requires a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited amounts of dictionary resources.
Sentiment analysis in Arabic is a challenging task due to the rich morphology of the language. Moreover, the task is further complicated when applied to Twitter data, which is known to be highly informal and noisy. In this paper, we develop a hybrid method for sentiment analysis of Arabic tweets in a specific Arabic dialect, the Saudi dialect. Several features were engineered and evaluated using a backward feature selection method. A hybrid method combining corpus-based and lexicon-based approaches was then developed for several classification models (two-way, three-way, and four-way). The best F1-scores for these models were 69.9, 61.63, and 55.07, respectively.
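As a generic illustration of the corpus-plus-lexicon idea (a hedged sketch with made-up English toy data and a toy lexicon, not the paper's actual features, dialect resources, or data), one can append lexicon-based polarity counts to bag-of-words features and train a standard classifier on the combined representation.
\begin{verbatim}
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy tweets and sentiment lexicon (real work would use
# Saudi-dialect tweets and an Arabic sentiment lexicon).
tweets = ["good service fast reply", "bad experience very slow",
          "fast and good", "slow bad support"]
labels = [1, 0, 1, 0]
pos_lex, neg_lex = {"good", "fast"}, {"bad", "slow"}

def lexicon_features(text):
    tokens = text.split()
    return [sum(t in pos_lex for t in tokens), sum(t in neg_lex for t in tokens)]

vec = TfidfVectorizer().fit(tweets)

def featurize(texts):
    # Hybrid features: corpus-based TF-IDF concatenated with lexicon counts.
    bow = vec.transform(texts)
    lex = csr_matrix(np.array([lexicon_features(t) for t in texts], dtype=float))
    return hstack([bow, lex])

clf = LogisticRegression().fit(featurize(tweets), labels)
print(clf.predict(featurize(["good and fast reply"])))
\end{verbatim}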
Deep distance metric learning (DDML), which learns image similarity metrics in an end-to-end manner based on convolutional neural networks, has achieved encouraging results in many computer vision tasks. $L2$-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate metric for the $L2$-normalized embedding space, i.e., a hyper-sphere. Another challenge of current DDML methods is that their loss functions are usually based on rigid data formats, such as the triplet tuple. Thus, an extra process is needed to prepare data in specific formats. In addition, their losses are obtained from a limited number of samples, which leads to a lack of a global view of the embedding space. In this paper, we replace the Euclidean distance with the cosine similarity to better exploit the $L2$-normalization, which is able to attenuate the curse of dimensionality. More specifically, a novel loss function based on the von Mises-Fisher distribution is proposed to learn a compact hyper-spherical embedding space. Moreover, a new efficient learning algorithm is developed to better capture the global structure of the embedding space. Experiments on both classification and retrieval tasks on several standard datasets show that our method achieves state-of-the-art performance with a simpler training procedure. Furthermore, we demonstrate that, even with a small number of convolutional layers, our model can still obtain significantly better classification performance than the widely used softmax loss.
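To make the cosine-similarity idea concrete (a simplified numpy sketch under our own assumptions, not the paper's exact von Mises-Fisher loss or training algorithm), one can $L2$-normalize both embeddings and class prototypes and feed scaled cosine similarities into a softmax cross-entropy, so that all comparisons take place on the hypersphere.
\begin{verbatim}
import numpy as np

def cosine_softmax_loss(embeddings, prototypes, labels, kappa=10.0):
    """Cross-entropy over scaled cosine similarities between L2-normalized
    embeddings and class prototypes; kappa acts as a concentration
    (inverse temperature) parameter."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    mu = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = kappa * z @ mu.T                    # cosine similarities on the hypersphere
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

# Toy usage: 4 samples, 3 classes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
proto = rng.normal(size=(3, 8))
print(cosine_softmax_loss(emb, proto, labels=np.array([0, 1, 2, 0])))
\end{verbatim}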
We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep-learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.
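To illustrate the general idea of compiling relational queries into differentiable linear-algebra operations (a schematic numpy sketch of the underlying principle, not TensorLog's actual compiler or API), entities can be encoded as one-hot vectors and each knowledge-base relation as a matrix; a chain rule such as grandparent(X, Y) :- parent(X, Z), parent(Z, Y) then becomes a product of relation matrices applied to the query entity, and because the whole computation is a chain of matrix products, relation weights can be treated as parameters and tuned by gradient descent in a framework such as TensorFlow.
\begin{verbatim}
import numpy as np

entities = ["ann", "bob", "cara", "dan"]
idx = {e: i for i, e in enumerate(entities)}
n = len(entities)

# The relation "parent" as a matrix: parent[i, j] = 1 if entities[i]
# is a parent of entities[j].
parent = np.zeros((n, n))
parent[idx["ann"], idx["bob"]] = 1.0    # parent(ann, bob)
parent[idx["bob"], idx["cara"]] = 1.0   # parent(bob, cara)
parent[idx["cara"], idx["dan"]] = 1.0   # parent(cara, dan)

def one_hot(name):
    v = np.zeros(n)
    v[idx[name]] = 1.0
    return v

# grandparent(X, Y) :- parent(X, Z), parent(Z, Y) compiles into two
# matrix-vector products starting from the one-hot query entity.
scores = one_hot("ann") @ parent @ parent
print({e: s for e, s in zip(entities, scores) if s > 0})   # {'cara': 1.0}
\end{verbatim}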