Extreme-value copulas arise as the limiting dependence structure of component-wise maxima. Defined in terms of a functional parameter, they are one of the most widespread copula families due to their flexibility and ability to capture asymmetry. Despite this, satisfying the complex analytical properties of this parameter in an unconstrained setting remains a challenge, restricting most uses to either models with very few parameters or non-parametric models. In this paper we focus on the bivariate case and propose a novel approach for estimating this functional parameter in a semiparametric manner. Our procedure relies on a series of basic transformations starting from a zero-integral spline. Spline coordinates are fitted through maximum likelihood estimation, leveraging gradient optimization, without imposing further constraints. We conduct several experiments on both simulated and real data. Specifically, we test our method on scarce data gathered by the gravitational-wave detection collaborations LIGO and Virgo.
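For orientation, the functional parameter referred to above is the Pickands dependence function $A$; in the bivariate case the extreme-value copula it induces can be written in the standard form (conventions for the argument of $A$ vary across references):
$$
C(u, v) = \exp\!\left\{ \log(uv)\, A\!\left( \frac{\log v}{\log(uv)} \right) \right\}, \qquad (u, v) \in (0,1)^2,
$$
where $A : [0,1] \to [1/2, 1]$ is convex and satisfies $\max(t, 1-t) \le A(t) \le 1$ with $A(0) = A(1) = 1$. These are the analytical constraints that make unconstrained estimation of $A$ difficult and that the transformation-based construction above is designed to satisfy.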
Multipoint evaluation is the computational task of evaluating a polynomial, given as a list of coefficients, at a given set of inputs. While \emph{nearly linear time} algorithms have been known for the univariate instance of multipoint evaluation for close to five decades due to a work of Borodin and Moenck \cite{BM74}, fast algorithms for the multivariate version have been much harder to come by. In a significant improvement to the state of the art for this problem, Umans \cite{Umans08} and Kedlaya \& Umans \cite{Kedlaya11} gave nearly linear time algorithms for this problem over fields of small characteristic and over all finite fields respectively, provided that the number of variables $n$ is at most $d^{o(1)}$, where the degree of the input polynomial in every variable is less than $d$. They also stated the question of designing fast algorithms for the large-variable case (i.e. $n \notin d^{o(1)}$) as an open problem. In this work, we show that there is a deterministic algorithm for multivariate multipoint evaluation over a field $\mathbb{F}_{q}$ of characteristic $p$ which evaluates an $n$-variate polynomial of degree less than $d$ in each variable on $N$ inputs in time $$\left((N + d^n)^{1 + o(1)}\,\text{poly}(\log q, d, p, n)\right),$$ provided that $p$ is at most $d^{o(1)}$ and $q$ is at most $\exp(\exp(\exp(\cdots(\exp(d)))))$, where the height of this tower of exponentials is fixed. When the number of variables is large (e.g. $n \notin d^{o(1)}$), this is the first nearly linear time algorithm for this problem over any (large enough) field. Our algorithm is based on elementary algebraic ideas, and this algebraic structure naturally leads to applications to data structure upper bounds for polynomial evaluation and to an upper bound on the rigidity of Vandermonde matrices.
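As a point of comparison for the running time above, here is a hedged, minimal Python sketch of the naive approach to multivariate multipoint evaluation over a prime field $\mathbb{F}_q$: it evaluates each monomial at each point directly, costing roughly $N \cdot d^n$ field operations, which is the baseline that nearly linear time algorithms improve upon. The coefficient layout (a dict keyed by exponent tuples) is an illustrative choice, not the paper's.

```python
def naive_multipoint_eval(coeffs, points, q):
    """Evaluate an n-variate polynomial at many points over F_q (q prime).

    coeffs: dict mapping exponent tuples (e_1, ..., e_n), each e_i < d,
            to coefficients in {0, ..., q-1}.
    points: list of n-tuples over F_q.
    Returns one evaluation per point; cost is O(N * d^n * n) field operations.
    """
    results = []
    for pt in points:
        val = 0
        for exps, c in coeffs.items():
            term = c
            for x, e in zip(pt, exps):
                term = (term * pow(x, e, q)) % q
            val = (val + term) % q
        results.append(val)
    return results

# Example: f(x1, x2) = 1 + 2*x1*x2 + x2^2 over F_7, evaluated at two points.
f = {(0, 0): 1, (1, 1): 2, (0, 2): 1}
print(naive_multipoint_eval(f, [(3, 4), (5, 6)], q=7))
```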
We consider the problem of assigning appearing times to the edges of a digraph in order to maximize the (average) temporal reachability between pairs of nodes. Motivated by the application to public transit networks, where edges cannot be scheduled independently of one another, we consider the setting where the edges are grouped into certain walks (called trips) in the digraph and where assigning the appearing time to the first edge of a trip forces the appearing times of the subsequent edges. In this setting, we show that, quite surprisingly, it is NP-complete to decide whether there exists an assignment of times connecting a given pair of nodes. This result allows us to prove that the problem of maximising the temporal reachability cannot be approximated within a factor better than some polynomial term in the size of the graph. We thus focus on the case where, for each pair of nodes, there exists an assignment of times such that one node is reachable from the other. We call this property strong temporalisability; it is a very natural assumption for the application to public transit networks. On the negative side, the problem of maximising the temporal reachability remains hard to approximate within a factor $\sqrt{n}/12$ in that setting. Moreover, we show the existence of collections of trips that are strongly temporalisable but for which any assignment of starting times to the trips connects at most an $O(1/\sqrt{n})$ fraction of all pairs of nodes. On the positive side, we show that there must exist an assignment of times that connects a constant fraction of all pairs in the strongly temporalisable and symmetric case, that is, when the set of trips to be scheduled is such that, for each trip, there is a symmetric trip visiting the same nodes in reverse order. Keywords: edge labeling; edge-scheduled network; network optimisation; temporal graph; temporal path; temporal reachability; time assignment.
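To make the objective concrete, here is a minimal Python sketch of evaluating one candidate temporalisation: given trips (walks given as node sequences) and assigned start times, expand each trip into timed edges and count ordered pairs connected by a time-respecting path. The unit travel times and the data layout are illustrative assumptions; the abstract does not fix edge durations.

```python
def temporal_edges(trips, start_times):
    """Expand each trip into timed edges.

    Illustrative convention: the k-th edge of a trip starting at time t
    departs at t + k and arrives at t + k + 1 (unit travel times).
    """
    edges = []
    for trip_id, nodes in trips.items():
        t = start_times[trip_id]
        for k, (u, v) in enumerate(zip(nodes, nodes[1:])):
            edges.append((t + k, u, v))
    return sorted(edges)  # process in order of departure time

def reachable_pairs(trips, start_times, all_nodes):
    """Count ordered pairs (s, v), s != v, connected by a time-respecting path."""
    edges = temporal_edges(trips, start_times)
    count = 0
    for s in all_nodes:
        earliest = {s: float("-inf")}
        # A single pass suffices because every arrival strictly exceeds its departure.
        for t, u, v in edges:
            if u in earliest and earliest[u] <= t and t + 1 < earliest.get(v, float("inf")):
                earliest[v] = t + 1
        count += len(set(earliest) - {s})
    return count

# Two trips sharing node b; trip "r" is usable after "g" only if it departs late enough.
trips = {"g": ["a", "b", "c"], "r": ["b", "d"]}
print(reachable_pairs(trips, {"g": 0, "r": 2}, {"a", "b", "c", "d"}))
```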
Accurate and trustworthy epidemic forecasting is an important problem with direct impact on public health planning and disease mitigation. Most existing epidemic forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions. Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations; e.g., it is difficult to specify meaningful priors in Bayesian NNs, while methods like deep ensembling are computationally expensive in practice. In this paper, we fill this important gap. We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP, which directly models the probability density of the forecast value. EPIFNP leverages a dynamic stochastic correlation graph to model the correlations between sequences in a non-parametric way, and designs different stochastic latent variables to capture functional uncertainty from different perspectives. Our extensive experiments in a real-time flu forecasting setting show that EPIFNP significantly outperforms previous state-of-the-art models in both accuracy and calibration metrics, up to 2.5x in accuracy and 2.4x in calibration. Additionally, due to properties of its generative process, EPIFNP learns the relations between the current season and similar patterns of historical seasons, enabling interpretable forecasts. Beyond epidemic forecasting, EPIFNP can be of independent interest for advancing principled uncertainty quantification in deep sequential models for predictive analytics.
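The sketch below is not EPIFNP itself (which uses a functional neural process with a stochastic correlation graph); it is only a minimal illustration of the contrast the abstract draws, namely a forecaster that outputs a full predictive density (here a Gaussian mean and variance trained with the negative log-likelihood) rather than a point estimate, so calibration can be assessed alongside accuracy. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class DensityForecaster(nn.Module):
    """Map a history window to a Gaussian predictive density (mean, log-variance)."""
    def __init__(self, window=10, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(window, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)
        self.logvar_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(mean, logvar, y):
    # Negative log-likelihood of y under N(mean, exp(logvar)); training on this,
    # rather than squared error alone, is what makes the output a density.
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()

# Toy usage: windows of past incidence -> density over the next value.
x, y = torch.randn(64, 10), torch.randn(64, 1)
model = DensityForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    mean, logvar = model(x)
    loss = gaussian_nll(mean, logvar, y)
    opt.zero_grad(); loss.backward(); opt.step()
```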
A fibration of graphs is a homomorphism that is a local isomorphism of in-neighbourhoods, much in the same way a covering projection is a local isomorphism of neighbourhoods. Recently, it has been shown that graph fibrations are useful tools to uncover symmetries and synchronization patterns in biological networks ranging from gene, protein, and metabolic networks to the brain. However, the inherent incompleteness and disordered nature of biological data precludes applying the definition of fibration as is; as a consequence, the currently known algorithms for identifying fibrations also fail in these domains. In this paper, we systematically introduce and develop the theory of quasifibrations, which attempts to capture more realistic patterns of almost-synchronization of units in biological networks. We provide an algorithmic solution to the problem of finding quasifibrations in networks where the existence of missing links and variability across samples preclude the identification of perfect symmetries in the connectivity structure. We test the algorithm against other strategies to repair missing links in incomplete networks using real connectome data and synthetic networks. Quasifibrations can be applied to reconstruct any incomplete network structure characterized by underlying symmetries and almost synchronized clusters.
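To pin down the definition the abstract starts from, here is a minimal Python sketch that checks whether a given node map is a fibration of simple digraphs, i.e., whether it is a homomorphism sending the in-neighbours of every node bijectively onto the in-neighbours of its image. The data layout and examples are illustrative; quasifibrations relax exactly this local condition.

```python
from collections import Counter

def in_neighbours(edges):
    """Map every node of a digraph to the list of its in-neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(v, []).append(u)
        nbrs.setdefault(u, [])
    return nbrs

def is_fibration(g_edges, b_edges, phi):
    """Check the local in-isomorphism condition for a map phi: nodes(G) -> nodes(B)."""
    b_edge_set = set(b_edges)
    if any((phi[u], phi[v]) not in b_edge_set for u, v in g_edges):
        return False  # not even a homomorphism
    g_in, b_in = in_neighbours(g_edges), in_neighbours(b_edges)
    for x, preds in g_in.items():
        if Counter(phi[w] for w in preds) != Counter(b_in.get(phi[x], [])):
            return False  # in-edges of x do not map bijectively onto those of phi(x)
    return True

# Two copies of a source-hub arc fibred onto one base arc: a fibration only if
# each copy of the hub keeps exactly one incoming arc per base arc.
print(is_fibration([("a1", "h1"), ("a2", "h2")], [("a", "H")],
                   {"a1": "a", "a2": "a", "h1": "H", "h2": "H"}))  # True
print(is_fibration([("a1", "h"), ("a2", "h")], [("a", "H")],
                   {"a1": "a", "a2": "a", "h": "H"}))              # False: two arcs into h map to one
```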
A confidence sequence is a sequence of confidence intervals that is uniformly valid over an unbounded time horizon. Our work develops confidence sequences whose widths go to zero, with nonasymptotic coverage guarantees under nonparametric conditions. We draw connections between the Cram\'er-Chernoff method for exponential concentration, the law of the iterated logarithm (LIL), and the sequential probability ratio test -- our confidence sequences are time-uniform extensions of the first; provide tight, nonasymptotic characterizations of the second; and generalize the third to nonparametric settings, including sub-Gaussian and Bernstein conditions, self-normalized processes, and matrix martingales. We illustrate the generality of our proof techniques by deriving an empirical-Bernstein bound growing at a LIL rate, as well as a novel upper LIL for the maximum eigenvalue of a sum of random matrices. Finally, we apply our methods to covariance matrix estimation and to estimation of the sample average treatment effect under the Neyman-Rubin potential outcomes model.
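To make the connection to the Cram\'er-Chernoff method concrete, here is the standard one-parameter version of the argument (a textbook bound written here for orientation, not the paper's sharpest result): if $S_t = \sum_{i \le t}(X_i - \mu)$ has $\sigma$-sub-Gaussian increments, then for any fixed $\lambda > 0$, $M_t = \exp(\lambda S_t - \lambda^2 \sigma^2 t / 2)$ is a nonnegative supermartingale with $M_0 = 1$, and Ville's inequality yields a time-uniform linear boundary:
$$
\mathbb{P}\left( \exists\, t \ge 1 : S_t \ge \frac{\lambda \sigma^2 t}{2} + \frac{\log(1/\alpha)}{\lambda} \right) \le \alpha .
$$
Boundaries whose widths shrink at an iterated-logarithm rate are then obtained by combining such linear boundaries over many values of $\lambda$, e.g. by mixing over $\lambda$ or stitching boundaries across geometrically spaced epochs.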
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fr\'echet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable to, or significantly improved over, that of existing methods; our algorithm is especially efficient when the length of the curves is bounded.
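The two curve distances mentioned above share the same dynamic-programming skeleton and differ only in how point distances along the best alignment are aggregated (maximum for discrete Fr\'echet, sum for DTW). A minimal Python sketch for curves in $\mathbb{R}^d$ with Euclidean point distance, included here only to fix the definitions being generalized:

```python
import math

def curve_distance(P, Q, kind="frechet"):
    """Discrete Frechet ("frechet") or Dynamic Time Warping ("dtw") distance.

    Both fill the same alignment table; Frechet takes the max of point
    distances along the best alignment, DTW takes their sum.
    """
    combine = max if kind == "frechet" else (lambda a, b: a + b)
    m, n = len(P), len(Q)
    D = [[math.inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(P[i], Q[j])  # Euclidean distance between sample points
            if i == 0 and j == 0:
                D[i][j] = d
            else:
                best_prev = min(
                    D[i - 1][j] if i > 0 else math.inf,
                    D[i][j - 1] if j > 0 else math.inf,
                    D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
                D[i][j] = combine(best_prev, d)
    return D[m - 1][n - 1]

P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (2, 1)]
print(curve_distance(P, Q, "frechet"), curve_distance(P, Q, "dtw"))
```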
Few-shot learning aims to learn classifiers for new classes with only a few training examples per class. Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with varying numbers of labels. The meta-learning approaches train a meta learner to predict weights of homogeneous-structured task-specific networks, requiring a uniform number of classes across tasks. The metric-learning approaches learn one task-invariant metric for all the tasks, and they fail if the tasks diverge. We propose to deal with these limitations with meta metric learning. Our meta metric learning approach consists of task-specific learners, which exploit metric learning to handle flexible labels, and a meta learner, which discovers good parameters and gradient descent steps to specify the metrics in the task-specific learners. Thus the proposed model is able to handle unbalanced classes as well as to generate task-specific metrics. We test our approach in the `$k$-shot $N$-way' few-shot learning setting used in previous work and in a new, realistic few-shot setting with diverse multi-domain tasks and flexible label numbers. Experiments show that our approach attains superior performance in both settings.
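The following is not the paper's meta metric learning architecture, but a minimal Python sketch of the episodic $k$-shot $N$-way evaluation protocol it is tested under: a task-specific metric (here an illustrative diagonal Mahalanobis-style weighting) scores queries against class prototypes built from the support set. All names and dimensions are illustrative.

```python
import numpy as np

def episode_accuracy(support_x, support_y, query_x, query_y, weights):
    """Classify queries by nearest class prototype under a weighted metric.

    support_x: (N*k, d) support embeddings; support_y: their class ids.
    weights:   (d,) nonnegative per-feature weights -- a stand-in for a
               learned task-specific metric.
    """
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Weighted squared distances from each query to each class prototype.
    diffs = query_x[:, None, :] - protos[None, :, :]
    dists = (weights * diffs ** 2).sum(axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return (preds == query_y).mean()

# Toy 2-way 3-shot episode in 4 dimensions.
rng = np.random.default_rng(0)
sx = rng.normal(size=(6, 4)) + np.repeat([[0.0], [2.0]], 3, axis=0)
sy = np.array([0, 0, 0, 1, 1, 1])
qx = rng.normal(size=(4, 4)) + np.repeat([[0.0], [2.0]], 2, axis=0)
qy = np.array([0, 0, 1, 1])
print(episode_accuracy(sx, sy, qx, qy, weights=np.ones(4)))
```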
Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., a Mahalanobis distance metric). However, traditional metric learning algorithms are shallow: they learn only one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metrics progressively and nonlinearly, as in deep learning, by just using existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers in the manner of deep learning. The proposed ODML enjoys several desirable properties: it can indeed learn metrics progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.
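A minimal Python sketch of the stacking idea described above: each metric layer is realized as a learned linear transformation (so that distances in its output space are Mahalanobis-style distances), followed by a ReLU, and the layers are stacked. The per-sample update shown is ordinary gradient descent on a contrastive-style loss, an illustrative stand-in for the specific online metric learning algorithm used as each layer.

```python
import torch
import torch.nn as nn

class MetricLayer(nn.Module):
    """One 'metric layer': a learned linear map L, so d(x, y) = ||Lx - Ly||^2
    corresponds to a (pseudo-)Mahalanobis metric M = L^T L."""
    def __init__(self, dim):
        super().__init__()
        self.L = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        return self.L(x)

class ODMLSketch(nn.Module):
    """Stack metric layers with ReLU in between, in the manner of deep learning."""
    def __init__(self, dim, depth=3):
        super().__init__()
        layers = []
        for i in range(depth):
            layers.append(MetricLayer(dim))
            if i < depth - 1:
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def distance(self, x, y):
        return ((self.net(x) - self.net(y)) ** 2).sum(dim=-1)

# Online-style updates on a stream of pairs: pull same-class pairs together,
# push different-class pairs at least `margin` apart (illustrative loss).
model = ODMLSketch(dim=8)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
margin = 1.0
for _ in range(200):
    x, y = torch.randn(8), torch.randn(8)
    same = bool(torch.rand(()) < 0.5)
    d = model.distance(x, y)
    loss = d if same else torch.clamp(margin - d, min=0.0)
    opt.zero_grad(); loss.backward(); opt.step()
```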
We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared to existing moment-based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvement in runtime and accuracy.
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
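Written out, the construction described above (a common plus a group-specific completely random measure, then normalised) takes the following schematic form; the notation is illustrative rather than the paper's:
$$
\tilde p_\ell = \frac{\mu_0 + \mu_\ell}{\mu_0(\mathbb{X}) + \mu_\ell(\mathbb{X})}, \qquad \ell = 1, \dots, k,
$$
where $\mu_0$ is a completely random measure shared across groups, the $\mu_\ell$ are group-specific completely random measures, and $\tilde p_\ell$ is the dependent random probability measure assigned to group $\ell$. The shared component $\mu_0$ is what induces dependence across groups, while the group-specific components prevent degeneracy to full exchangeability in the presence of ties.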