We propose a multicountry quantile factor augmeneted vector autoregression (QFAVAR) to model heterogeneities both across countries and across characteristics of the distributions of macroeconomic time series. The presence of quantile factors allows for summarizing these two heterogeneities in a parsimonious way. We develop two algorithms for posterior inference that feature varying level of trade-off between estimation precision and computational speed. Using monthly data for the euro area, we establish the good empirical properties of the QFAVAR as a tool for assessing the effects of global shocks on country-level macroeconomic risks. In particular, QFAVAR short-run tail forecasts are more accurate compared to a FAVAR with symmetric Gaussian errors, as well as univariate quantile autoregressions that ignore comovements among quantiles of macroeconomic variables. We also illustrate how quantile impulse response functions and quantile connectedness measures, resulting from the new model, can be used to implement joint risk scenario analysis.
A networked time series (NETS) is a family of time series on a given graph, one for each node. It has found a wide range of applications from intelligent transportation, environment monitoring to mobile network management. An important task in such applications is to predict the future values of a NETS based on its historical values and the underlying graph. Most existing methods require complete data for training. However, in real-world scenarios, it is not uncommon to have missing data due to sensor malfunction, incomplete sensing coverage, etc. In this paper, we study the problem of NETS prediction with incomplete data. We propose NETS-ImpGAN, a novel deep learning framework that can be trained on incomplete data with missing values in both history and future. Furthermore, we propose novel Graph Temporal Attention Networks by incorporating the attention mechanism to capture both inter-time series correlations and temporal correlations. We conduct extensive experiments on three real-world datasets under different missing patterns and missing rates. The experimental results show that NETS-ImpGAN outperforms existing methods except when data exhibit very low variance, in which case NETS-ImpGAN still achieves competitive performance.
The detection of exoplanets with the radial velocity method consists in detecting variations of the stellar velocity caused by an unseen sub-stellar companion. Instrumental errors, irregular time sampling, and different noise sources originating in the intrinsic variability of the star can hinder the interpretation of the data, and even lead to spurious detections. In recent times, work began to emerge in the field of extrasolar planets that use Machine Learning algorithms, some with results that exceed those obtained with the traditional techniques in the field. We seek to explore the scope of the neural networks in the radial velocity method, in particular for exoplanet detection in the presence of correlated noise of stellar origin. In this work, a neural network is proposed to replace the computation of the significance of the signal detected with the radial velocity method and to classify it as of planetary origin or not. The algorithm is trained using synthetic data of systems with and without planetary companions. We injected realistic correlated noise in the simulations, based on previous studies of the behaviour of stellar activity. The performance of the network is compared to the traditional method based on null hypothesis significance testing. The network achieves 28 % fewer false positives. The improvement is observed mainly in the detection of small-amplitude signals associated with low-mass planets. In addition, its execution time is five orders of magnitude faster than the traditional method. The superior performance exhibited by the algorithm has only been tested on simulated radial velocity data so far. Although in principle it should be straightforward to adapt it for use in real time series, its performance has to be tested thoroughly. Future work should permit evaluating its potential for adoption as a valuable tool for exoplanet detection.
Colonies of the arboreal turtle ant create networks of trails that link nests and food sources on the graph formed by branches and vines in the canopy of the tropical forest. Ants put down a volatile pheromone on edges as they traverse them. At each vertex, the next edge to traverse is chosen using a decision rule based on the current pheromone level. There is a bidirectional flow of ants around the network. In a field study, Chandrasekhar et al. (2021) observed that the trail networks approximately minimize the number of vertices, thus solving a variant of the popular shortest path problem without any central control and with minimal computational resources. We propose a biologically plausible model, based on a variant of the reinforced random walk on a graph, which explains this observation and suggests surprising algorithms for the shortest path problem and its variants. Through simulations and analysis, we show that when the rate of flow of ants does not change, the dynamics converges to the path with the minimum number of vertices, as observed in the field. The dynamics converges to the shortest path when the rate of flow increases with time, so the colony can solve the shortest path problem merely by increasing the flow rate. We also show that to guarantee convergence to the shortest path, bidirectional flow and a decision rule dividing the flow in proportion to the pheromone level are necessary, but convergence to approximately short paths is possible with other decision rules.
In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit. This result may be of interest for recommender systems, where the popularity of their items is often short-lived, as the exploration itself may be completed quickly before potential long-run non-stationarities come into play. However, in practice, exact knowledge of the linear model is difficult to justify. Furthermore, potential existence of unobservable covariates, uneven user arrival rates, interpretation of the necessary rank condition, and users opting out of private data tracking all need to be addressed for practical recommender system applications. In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret. Aside from proof techniques, the key differentiating assumption we make here is the presence of effective Synthetic Control Methods (SCM), which are shown to be a practical relaxation of the exact linear model knowledge assumption. We verify our theoretical bounded regret result using a minimal simulation experiment.
Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraints satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
Geometrical approaches for room acoustics simulation have the advantage of requiring limited computational resources while still achieving a high perceptual plausibility. A common approach is using the image source model for direct and early reflections in connection with further simplified models such as a feedback delay network for the diffuse reverberant tail. When recreating real spaces as virtual acoustic environments using room acoustics simulation, the perceptual relevance of individual parameters in the simulation is unclear. Here we investigate the importance of underlying acoustical measurements and technical evaluation methods to obtain high-quality room acoustics simulations in agreement with dummy-head recordings of a real space. We focus on the role of source directivity. The effect of including measured, modelled, and omnidirectional source directivity in room acoustics simulations was assessed in comparison to the measured reference. Technical evaluation strategies to verify and improve the accuracy of various elements in the simulation processing chain from source, the room properties, to the receiver are presented. Perceptual results from an ABX listening experiment with random speech tokens are shown and compared with technical measures for a ranking of simulation approaches.
Given the high incidence of cardio and cerebrovascular diseases (CVD), and its association with morbidity and mortality, its prevention is a major public health issue. A high level of blood pressure is a well-known risk factor for these events and an increasing number of studies suggest that blood pressure variability may also be an independent risk factor. However, these studies suffer from significant methodological weaknesses. In this work we propose a new location-scale joint model for the repeated measures of a marker and competing events. This joint model combines a mixed model including a subject-specific and time-dependent residual variance modeled through random effects, and cause-specific proportional intensity models for the competing events. The risk of events may depend simultaneously on the current value of the variance, as well as, the current value and the current slope of the marker trajectory. The model is estimated by maximizing the likelihood function using the Marquardt-Levenberg algorithm. The estimation procedure is implemented in a R-package and is validated through a simulation study. This model is applied to study the association between blood pressure variability and the risk of CVD and death from other causes. Using data from a large clinical trial on the secondary prevention of stroke, we find that the current individual variability of blood pressure is associated with the risk of CVD and death. Moreover, the comparison with a model without heterogeneous variance shows the importance of taking into account this variability in the goodness-of-fit and for dynamic predictions.
We consider geometric problems on planar $n^2$-point sets in the congested clique model. Initially, each node in the $n$-clique network holds a batch of $n$ distinct points in the Euclidean plane given by $O(\log n)$-bit coordinates. In each round, each node can send a distinct $O(\log n)$-bit message to each other node in the clique and perform unlimited local computations. We show that the convex hull of the input $n^2$-point set can be constructed in $O(\min\{ h,\log n\})$ rounds, where $h$ is the size of the hull, on the congested clique. We also show that a triangulation of the input $n^2$-point set can be constructed in $O(\log^2n)$ rounds on the congested clique. Finally, we demonstrate that the Voronoi diagram of $n^2$ points with $O(\log n)$-bit coordinates drawn uniformly at random from a unit square can be computed within the square with high probability in $O(1)$ rounds on the congested clique.
Time series forecasting is widely used in business intelligence, e.g., forecast stock market price, sales, and help the analysis of data trend. Most time series of interest are macroscopic time series that are aggregated from microscopic data. However, instead of directly modeling the macroscopic time series, rare literature studied the forecasting of macroscopic time series by leveraging data on the microscopic level. In this paper, we assume that the microscopic time series follow some unknown mixture probabilistic distributions. We theoretically show that as we identify the ground truth latent mixture components, the estimation of time series from each component could be improved because of lower variance, thus benefitting the estimation of macroscopic time series as well. Inspired by the power of Seq2seq and its variants on the modeling of time series data, we propose Mixture of Seq2seq (MixSeq), an end2end mixture model to cluster microscopic time series, where all the components come from a family of Seq2seq models parameterized by different parameters. Extensive experiments on both synthetic and real-world data show the superiority of our approach.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.