This paper presents an engine able to jointly forecast the concentrations of the main pollutants harming people's health: nitrogen dioxide (NO2), ozone (O3) and particulate matter (PM2.5 and PM10, the particles whose diameters are below 2.5 µm and 10 µm, respectively). The engine is fed with air quality monitoring stations' measurements, weather forecasts, physical models' outputs and traffic estimates to produce forecasts up to 24 hours ahead. The forecasts are produced at several spatial resolutions, from a few dozen meters to dozens of kilometers, fitting several use cases needing air quality data. We introduce the Scale-Unit block, which makes it possible to seamlessly integrate all available inputs at a given resolution and return forecasts at the same resolution. The engine is based on a U-Net architecture built with several of those blocks, giving it the ability to process inputs and to output predictions at different resolutions. We have implemented and evaluated the engine on the largest cities in Europe and the United States, and it clearly outperforms other prediction methods. In particular, the out-of-sample accuracy remains high, meaning that the engine can be used in cities which are not included in the training dataset. A valuable advantage of the engine is that it does not need much computing power: the forecasts can be built in a few minutes on a standard CPU. Thus, they can be updated very frequently, as soon as new measurements from air quality monitoring stations are available (generally every hour), which is not the case for the physical models traditionally used for air quality forecasting.
Traffic forecasting is important for the success of intelligent transportation systems. Deep learning models, including convolutional neural networks and recurrent neural networks, have been extensively applied in traffic forecasting problems to model spatial and temporal dependencies. In recent years, to model the graph structures in transportation systems as well as contextual information, graph neural networks have been introduced and have achieved state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of research using different graph neural networks, e.g., graph convolutional and graph attention networks, in various traffic forecasting problems, e.g., road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in ride-hailing platforms. We also present a comprehensive list of open data and source resources for each problem and identify future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks to traffic forecasting problems. We have also created a public GitHub repository where the latest papers, open data, and source resources will be updated.
This expository paper discusses Bayesian decision analysis perspectives on problems of constrained forecasting. Foundational and pedagogic discussion contrasts decision analytic approaches with the traditional, but typically inappropriate, inferential approach. Illustrative examples include development of novel constrained point forecasting and entropic tilting methodology to explore consistency of a predictive distribution with an imposed or hypothesized constraint. Linear, aggregate constraints define illuminating examples that relate to broadly important problems involving aggregate and hierarchical constraints in commercial and economic forecasting. Discussion explores the impact of different loss functions, questions of how constrained forecasting is impacted by dependencies among outcomes being predicted, and promotes the broader use of decision analysis including routine evaluation of predictive distributions of loss under chosen forecasts/decisions. Extensions to more general constrained forecasting problems, connections with broader interests in forecast reconciliation and other considerations are noted.
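As an illustration of the entropic tilting idea mentioned above, the following minimal sketch reweights a set of equally weighted draws from a predictive distribution so that the weighted mean meets an imposed value. It also reports the Kullback-Leibler divergence of the tilted weights from the original ones, as a rough indication of how consistent the constraint is with the forecast. The sample values, the target, and the function names `entropic_tilt` and `kl_from_uniform` are hypothetical and chosen only for illustration; this is a generic sample-based version under those assumptions, not necessarily the formulation used in the paper.

```python
import math


def entropic_tilt(values, target, lo=-50.0, hi=50.0, iters=200):
    """Tilt equally weighted samples so that their weighted mean equals target.

    Weights take the exponential form w_i proportional to exp(gamma * values[i]),
    which is the minimum-KL reweighting under a mean constraint.
    Assumes target lies strictly between min(values) and max(values),
    and that the solving gamma lies inside [lo, hi].
    """
    def weights(gamma):
        # Subtract the largest exponent for numerical stability.
        top = max(gamma * v for v in values)
        raw = [math.exp(gamma * v - top) for v in values]
        total = sum(raw)
        return [r / total for r in raw]

    def tilted_mean(gamma):
        return sum(w * v for w, v in zip(weights(gamma), values))

    # The tilted mean increases with gamma, so bisection finds the root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


def kl_from_uniform(w):
    """KL divergence of the weights w from the uniform weights 1/n."""
    n = len(w)
    return sum(wi * math.log(wi * n) for wi in w if wi > 0.0)


# Hypothetical draws of an aggregate outcome (illustrative numbers only).
samples = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
tilted = entropic_tilt(samples, target=11.5)
constrained_mean = sum(w * x for w, x in zip(tilted, samples))
divergence = kl_from_uniform(tilted)
```

A small divergence suggests that the imposed constraint is compatible with the original predictive distribution, while a large one suggests that the constraint pulls the forecast far from what the model implies.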
An essential factor in achieving high performance in face recognition systems is the quality of their samples. Since these systems are involved in daily life, there is a strong need to make face recognition processes understandable to humans. In this work, we introduce the concept of pixel-level face image quality, which determines the utility of pixels in a face image for recognition. We propose a training-free approach to assess the pixel-level qualities of a face image given an arbitrary face recognition network. To achieve this, a model-specific quality value of the input image is estimated and used to build a sample-specific quality regression model. Based on this model, quality-based gradients are back-propagated and converted into pixel-level quality estimates. In the experiments, we qualitatively and quantitatively investigated the meaningfulness of our proposed pixel-level qualities based on real and artificial disturbances and by comparing the explanation maps on faces that do not comply with the ICAO standards. In all scenarios, the results demonstrate that the proposed solution produces meaningful pixel-level qualities, enhancing the interpretability of the complete face image quality. The code is publicly available.
To reduce passenger waiting time and driver search friction, ride-hailing companies need to accurately forecast spatio-temporal demand and the supply-demand gap. However, due to the spatio-temporal dependencies of demand and the supply-demand gap in a ride-hailing system, making accurate forecasts for both is a difficult task. Furthermore, due to confidentiality and privacy issues, ride-hailing data are sometimes released to researchers with the spatial adjacency information of the zones removed, which hinders the detection of spatio-temporal dependencies. To that end, a novel spatio-temporal deep learning architecture is proposed in this paper for forecasting demand and the supply-demand gap in a ride-hailing system with anonymized spatial adjacency information. It integrates a feature importance layer with a spatio-temporal deep learning architecture containing a one-dimensional convolutional neural network (CNN) and a zone-distributed independently recurrent neural network (IndRNN). The developed architecture is tested with real-world datasets of Didi Chuxing, which shows that our models based on the proposed architecture can outperform conventional time-series models (e.g., ARIMA) and machine learning models (e.g., gradient boosting machine, distributed random forest, generalized linear model, artificial neural network). Additionally, the feature importance layer provides an interpretation of the model by revealing the contribution of the input features used in prediction.
We introduce NeuralProphet, a successor to Facebook Prophet, which set an industry standard for explainable, scalable, and user-friendly forecasting frameworks. With the proliferation of time series data, explainable forecasting remains a challenging task for business and operational decision making. Hybrid solutions are needed to bridge the gap between interpretable classical methods and scalable deep learning models. We view Prophet as a precursor to such a solution. However, Prophet lacks local context, which is essential for forecasting the near-term future, and it is challenging to extend due to its Stan backend. NeuralProphet is a hybrid forecasting framework based on PyTorch and trained with standard deep learning methods, making it easy for developers to extend the framework. Local context is introduced with auto-regression and covariate modules, which can be configured as classical linear regression or as neural networks. Otherwise, NeuralProphet retains the design philosophy of Prophet and provides the same basic model components. Our results demonstrate that NeuralProphet produces interpretable forecast components of equivalent or superior quality to Prophet on a set of generated time series. NeuralProphet outperforms Prophet on a diverse collection of real-world datasets. For short to medium-term forecasts, NeuralProphet improves forecast accuracy by 55 to 92 percent.
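To make the notion of local context through auto-regression concrete, the sketch below fits a first-order auto-regression by ordinary least squares and uses it for a one-step-ahead forecast. It deliberately does not use the NeuralProphet API: the function name `fit_ar1`, the single lag, and the toy series are assumptions made for illustration, showing only the "classical linear regression" configuration in its simplest form.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = intercept + slope * y_{t-1}."""
    x = series[:-1]  # lagged values
    y = series[1:]   # values to predict
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy series generated by y_t = 1.0 + 0.5 * y_{t-1} (illustrative only).
history = [0.0]
for _ in range(20):
    history.append(1.0 + 0.5 * history[-1])

intercept, slope = fit_ar1(history)
# One-step-ahead forecast from the most recent observation.
next_value = intercept + slope * history[-1]
```

Because the forecast depends on the most recent observation, it reacts to the current level of the series, which a model built only from trend and seasonality components cannot do.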
Producing an accurate weather forecast and a reliable quantification of its uncertainty is an open scientific challenge. Ensemble forecasting is, so far, the most successful approach to producing relevant forecasts along with an estimation of their uncertainty. The main limitations of ensemble forecasting are its high computational cost and the difficulty of capturing and quantifying different sources of uncertainty, particularly those associated with model errors. In this work, proof-of-concept model experiments are conducted to examine the performance of artificial neural networks (ANNs) trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input. We compare different training strategies: one based on direct training using the mean and spread of an ensemble forecast as the target, and others that rely on indirect training using a deterministic forecast as the target, in which the uncertainty is implicitly learned from the data. For the latter approach, two alternative loss functions are proposed and evaluated, one based on the data observation likelihood and the other based on a local estimation of the error. The performance of the networks is examined at different lead times and in scenarios with and without model errors. Experiments using the Lorenz'96 model show that the ANNs are able to emulate some of the properties of ensemble forecasts, such as the filtering of the most unpredictable modes and a state-dependent quantification of the forecast uncertainty. Moreover, the ANNs provide a reliable estimation of the forecast uncertainty in the presence of model error.
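A likelihood-based loss of the kind mentioned above can be illustrated with a Gaussian negative log-likelihood. The sketch assumes, beyond what the abstract states, that the network outputs a mean and a log-variance for each state variable and that errors are treated as independent Gaussians; the function name `gaussian_nll` and the numbers are hypothetical.

```python
import math


def gaussian_nll(mean, log_var, target):
    """Average negative log-likelihood of target under N(mean_i, exp(log_var_i))."""
    total = 0.0
    for m, lv, y in zip(mean, log_var, target):
        total += 0.5 * (math.log(2.0 * math.pi) + lv + (y - m) ** 2 / math.exp(lv))
    return total / len(target)


# A single prediction that misses the observation by 2 units (illustrative).
predicted, observed = [1.0], [3.0]

# Loss for an overconfident, a matched, and an underconfident variance.
too_small = gaussian_nll(predicted, [math.log(1.0)], observed)
matched = gaussian_nll(predicted, [math.log(4.0)], observed)
too_large = gaussian_nll(predicted, [math.log(16.0)], observed)
```

For a fixed error, this loss is smallest when the predicted variance equals the squared error, which is how a network trained against deterministic targets can still learn an uncertainty estimate implicitly from the data.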
Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons. We propose a novel cross interaction attention mechanism that exploits historical information of both persons and learns to predict cross dependencies between the two pose sequences. Since no dataset for training on such interactive situations is available, we collected ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing Lindy-hop dancing actions, which contains 115 sequences with 30K frames annotated with 3D body poses and shapes. We thoroughly evaluate our cross interaction network on ExPI and show that, in both short- and long-term predictions, it consistently outperforms state-of-the-art methods for single-person motion prediction.
Liver cancer is one of the most common malignant diseases in the world. Segmentation and labeling of liver tumors and blood vessels in CT images can assist doctors in liver tumor diagnosis and surgical intervention. In the past decades, automatic CT segmentation methods based on deep learning have received widespread attention in the medical field, and many state-of-the-art segmentation algorithms have appeared during this period. Yet most existing segmentation methods consider only the local feature context and fail to perceive the global relevance of medical images, which significantly degrades the segmentation of liver tumors and blood vessels. We introduce a multi-scale feature context fusion network called TransFusionNet, based on Transformer and SEBottleNet. This network can accurately detect and identify the details of the region of interest of the liver vessels, while also improving the recognition of the morphologic margins of liver tumors by exploiting the global information of CT images. Experiments show that TransFusionNet outperforms the state-of-the-art method on the public datasets LITS and 3Dircadb as well as on our clinical dataset. Finally, we propose an automatic 3D reconstruction algorithm based on the trained model. The algorithm can complete the reconstruction quickly and accurately in 1 second.
With the fast development of modern deep learning techniques, the studies of dynamical systems and neural networks are increasingly benefiting each other in many different ways. Since uncertainties often arise in real-world observations, stochastic differential equations (SDEs) come to play an important role. More specifically, in this paper, we use a collection of SDEs equipped with neural networks to predict the long-term trend of noisy time series that exhibit large jumps and strong distribution shift. Our contributions are as follows. First, we use the phase space reconstruction method to extract the intrinsic dimension of the time series data so as to determine the input structure for our forecasting model. Second, we explore SDEs driven by $\alpha$-stable L\'evy motion to model the time series data and solve the problem through neural network approximation. Third, we construct an attention mechanism to achieve multi-time-step prediction. Finally, we illustrate our method by applying it to stock market time series prediction and show that the results outperform several baseline deep learning models.
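Phase space reconstruction is commonly carried out with a time-delay embedding, which turns a scalar series into vectors of lagged values that can serve as model inputs. The sketch below shows only that embedding step. The function name `delay_embed` and the choice of embedding dimension and delay are assumptions for illustration, and the procedure the paper uses to select the dimension is not reproduced here.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors [x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau}]."""
    count = len(series) - (dim - 1) * tau
    if count <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [[series[t + k * tau] for k in range(dim)] for t in range(count)]


# Toy series 0..9 embedded in 3 dimensions with a delay of 2 steps.
vectors = delay_embed(list(range(10)), dim=3, tau=2)
```

Each row of `vectors` is one reconstructed state, so the embedding dimension directly fixes the number of lagged inputs that a forecasting model receives.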
Multivariate time series forecasting has been extensively studied over the years, with ubiquitous applications in areas such as finance, traffic, and the environment. Still, concerns have been raised that traditional methods are incapable of modeling the complex patterns or dependencies found in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, lack of explainability remains a serious drawback of deep neural network models. Inspired by the Memory Network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component, which are trained jointly. Additionally, the designed attention mechanism enables MTNet to be highly interpretable: we can easily tell which part of the historic data is referenced the most.
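The interpretability claim can be illustrated with a small attention computation over memory blocks. The sketch assumes that each block of historic data and the current input have already been encoded as vectors, and that attention is a softmax over dot-product scores, as is common in memory networks; the abstract does not specify MTNet's exact scoring function, and the vectors and the name `attention_weights` are hypothetical.

```python
import math


def attention_weights(query, memory):
    """Softmax over dot products between a query and each memory block."""
    scores = [sum(q * m for q, m in zip(query, block)) for block in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-written embeddings standing in for encoder outputs (illustrative).
query = [1.0, 0.0]
memory = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]

weights = attention_weights(query, memory)
# Index of the historic block that the forecast relies on most.
most_referenced = max(range(len(weights)), key=lambda i: weights[i])
```

Inspecting `weights` for a given forecast shows how strongly each historic block contributed, which is the kind of readout the abstract describes.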