District heating is a network of pipes through which heat is delivered from a centralised source. It is expected to play an important role in the decarbonisation of the energy sector in the coming years. In district heating, heat is traditionally generated through fossil fuels, often with combined heat and power (CHP) units. However, increasingly, waste heat is being used as a low carbon alternative, either directly or, for low temperature sources, via a heat pump. The design of district heating often has competing objectives: the need for inexpensive energy and meeting low carbon targets. In addition, the planning of district heating schemes is subject to multiple sources of uncertainty such as variability in heat demand and energy prices. This paper proposes a decision support tool to analyse and compare system designs for district heating under uncertainty using stochastic ordering (dominance). Contrary to traditional uncertainty metrics that provide statistical summaries and impose total ordering, stochastic ordering is a partial ordering and operates with full probability distributions. In our analysis, we apply the orderings in the mean and dispersion to the waste heat recovery problem in Brunswick, Germany.
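The partial-ordering idea can be illustrated with a minimal sketch. It assumes the usual (first-order) stochastic order on empirical samples of a cost-like quantity; the function names `ecdf` and `stochastically_smaller` are hypothetical, and the orderings in the mean and dispersion applied in the paper may be defined differently.

```python
def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at x."""
    return sum(v <= x for v in sample) / len(sample)


def stochastically_smaller(a, b):
    """True if F_a(x) >= F_b(x) at every observed point.

    This is the usual stochastic order a <=_st b (an assumption of this
    sketch, not necessarily the ordering used in the paper).
    """
    grid = sorted(set(a) | set(b))
    return all(ecdf(a, x) >= ecdf(b, x) for x in grid)


# Hypothetical cost samples for three designs (illustrative numbers only).
design_a = [1, 2, 3]
design_b = [2, 3, 4]
design_c = [0, 5]
design_d = [2, 3]

a_before_b = stochastically_smaller(design_a, design_b)
b_before_a = stochastically_smaller(design_b, design_a)
# A partial order can leave two designs incomparable: neither call holds.
c_before_d = stochastically_smaller(design_c, design_d)
d_before_c = stochastically_smaller(design_d, design_c)
```

Unlike a single summary statistic, the comparison can return "neither" for a pair of designs, which is what makes the ordering partial rather than total.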
Uncertainty in physical parameters can make the solution of forward or inverse light scattering problems in astrophysical, biological, and atmospheric sensing applications, cost prohibitive for real-time applications. For example, given a probability density in the parametric space of dimensions, refractive index and wavelength, the number of required evaluations for the expected scattering increases dramatically. In the case of dielectric and weakly absorbing spherical particles (both homogeneous and layered), we begin with a Fraunhofer approximation of the scattering coefficients consisting of Riccati-Bessel functions, and reduce it into simpler nested trigonometric approximations. They provide further computational advantages when parameterized on lines of constant optical path lengths. This can reduce the cost of evaluations by large factors $\approx$ 50, without a loss of accuracy in the integrals of these scattering coefficients. We analyze the errors of the proposed approximation, and present numerical results for a set of forward problems as a demonstration.
Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustable machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of the uncertainty estimation of transformer models is under-explored. In this work, we propose a novel way to enable transformers to have the capability of uncertainty estimation and, meanwhile, retain the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. Then new attention heads are formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention approximation by sampling from a Gumbel distribution is upper bounded. We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets. The experimental results demonstrate that our approach: (1) achieves the best predictive performance and uncertainty trade-off among compared methods; (2) exhibits very competitive (in most cases, improved) predictive performance on ID datasets; (3) is on par with Monte Carlo dropout and ensemble methods in uncertainty estimation on OOD datasets.
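For readers unfamiliar with the Gumbel-Softmax trick mentioned above, the following is a minimal standard-library sketch of the generic relaxation only. The function name `gumbel_softmax`, the temperature parameter `tau`, and the example logits are assumptions for illustration; the hierarchical attention over learnable centroids described in the abstract is not reproduced here.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalised logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0.
        u = max(rng.random(), 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


weights = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5, rng=random.Random(0))
```

Lower values of `tau` push the sampled weights towards a one-hot vector, while higher values make them closer to uniform.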
Black-box machine learning methods are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Distribution-free uncertainty quantification (distribution-free UQ) is a user-friendly paradigm for creating statistically rigorous confidence intervals/sets for such predictions. Critically, the intervals/sets are valid without distributional assumptions or model assumptions, possessing explicit guarantees even with finitely many datapoints. Moreover, they adapt to the difficulty of the input; when the input example is difficult, the uncertainty intervals/sets are large, signaling that the model might be wrong. Without much work and without retraining, one can use distribution-free methods on any underlying algorithm, such as a neural network, to produce confidence sets guaranteed to contain the ground truth with a user-specified probability, such as 90%. Indeed, the methods are easy to understand and general, applying to many modern prediction problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction is aimed at a reader interested in the practical implementation of distribution-free UQ who is not necessarily a statistician. We lead the reader through the practical theory and applications of distribution-free UQ, beginning with conformal prediction and culminating with distribution-free control of any risk, such as the false-discovery rate, false positive rate of out-of-distribution detection, and so on. We will include many explanatory illustrations, examples, and code samples in Python, with PyTorch syntax. The goal is to provide the reader with a working understanding of distribution-free UQ, allowing them to put confidence intervals on their algorithms, with one self-contained document.
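As a rough illustration of the conformal prediction step, here is a minimal split-conformal sketch for classification in plain Python. It assumes the conformity score is one minus the probability assigned to the true class, which is one common choice and not necessarily the one used in the document; the names `conformal_quantile` and `prediction_set` and all numbers are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank of the order statistic to use: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is kept.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical calibration scores: 1 - p(true class) on held-out examples.
calibration_scores = [0.3, 0.1, 0.2]
qhat = conformal_quantile(calibration_scores, alpha=0.5)

# Prediction set for a new example with hypothetical class probabilities.
labels = prediction_set([0.7, 0.25, 0.05], qhat=0.8)
```

The underlying model is only queried for its probabilities, so no retraining is involved; the calibration set determines the threshold `qhat`.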
We propose a framework for learning calibrated uncertainties under domain shifts. We consider the case where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts through the use of a binary domain classifier and integrate it with the task network and train them jointly end-to-end. The binary domain classifier yields a density ratio that reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift through adversarial risk minimization. We demonstrate that our method generates calibrated uncertainties that benefit many downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). In these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also demonstrate that the estimated density ratios show agreement with the human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.
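The link between a binary domain classifier and a density ratio follows from Bayes' rule, and can be sketched as below. The function name `density_ratio`, its parameters, and the clipping constant are assumptions for illustration; how the ratio then adjusts the task network's uncertainty is specific to the paper and is not shown.

```python
def density_ratio(p_source, n_source=1, n_target=1, eps=1e-12):
    """Estimate p_source(x) / p_target(x) from a domain classifier output.

    p_source is the classifier's probability that x comes from the source
    domain. By Bayes' rule,
        p_s(x) / p_t(x) = [P(s | x) / P(t | x)] * [P(t) / P(s)],
    where P(t) / P(s) is approximated by n_target / n_source.
    """
    # Clip to avoid division by zero for overconfident classifiers.
    p = min(max(p_source, eps), 1.0 - eps)
    return (p / (1.0 - p)) * (n_target / n_source)


# A sample the classifier cannot tell apart has ratio 1 (balanced domains).
ratio_ambiguous = density_ratio(0.5)
# A sample that looks source-like has a ratio above 1.
ratio_source_like = density_ratio(0.75)
```

A ratio far below 1 indicates a sample that is unlike the training distribution, which is where a larger predictive uncertainty would be expected.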
This paper proposes a Gibbs-distribution-based combinatorial optimization algorithm for the joint antenna splitting and user scheduling problem in full-duplex massive multiple-input multiple-output (MIMO) systems. The optimal solution of this problem can be determined by exhaustive search. However, the complexity of this approach becomes prohibitive in practice when the sample space is large, which is usually the case in massive MIMO systems. Our algorithm overcomes this drawback by converting the original problem into a Kullback-Leibler (KL) divergence minimization problem, and solving it through a related dynamical system via a stochastic gradient descent method. Using this approach, we improve the spectral efficiency (SE) of the system by performing joint antenna splitting and user scheduling. Additionally, numerical results show that the SE curves obtained with our proposed algorithm overlap with the curves achieved by exhaustive search for user scheduling.
In this paper, with the aid of the mathematical tool of stochastic geometry, we introduce analytical and computational frameworks for the distribution of three different definitions of delay, i.e., the time that it takes for a user to successfully receive a data packet, in large-scale cellular networks. We also provide an asymptotic analysis of one of the delay distributions, which can be regarded as the packet loss probability of a given network. To mitigate the inherent computational difficulties of the obtained analytical formulations in some cases, we propose efficient numerical approximations based on the numerical inversion method, the Riemann sum, and the Beta distribution. Finally, we demonstrate the accuracy of the obtained analytical formulations and the corresponding approximations against Monte Carlo simulation results, and unveil insights on the delay performance with respect to several design parameters, such as the decoding threshold, the transmit power, and the deployment density of the base stations. The proposed methods can facilitate the analysis and optimization of cellular networks subject to reliability constraints on the network packet delay that are not restricted to the local (average) delay, e.g., in the context of delay sensitive applications.
In Bayesian statistics, exploring multimodal posterior distributions poses major challenges for existing techniques such as Markov Chain Monte Carlo (MCMC). These problems are exacerbated in high-dimensional settings where MCMC methods typically rely upon localised proposal mechanisms. This paper introduces the Annealed Leap-Point Sampler (ALPS), which augments the target distribution state space with modified annealed (cooled) target distributions, in contrast to traditional approaches which have employed tempering. The temperature of the coldest state is chosen such that its corresponding annealed target density can be sufficiently well-approximated by a Laplace approximation. As a result, a Gaussian mixture independence Metropolis-Hastings sampler can perform mode-jumping proposals even in high-dimensional problems. The ability of this procedure to "mode hop" at this super-cold state is then filtered through to the target state using a sequence of tempered targets in a similar way to that in parallel tempering methods. ALPS also incorporates the best aspects of current gold-standard approaches to multimodal sampling in high-dimensional contexts. A theoretical analysis of the ALPS approach in high dimensions is given, providing practitioners with a gauge on the optimal setup as well as the scalability of the algorithm. For a $d$-dimensional problem, it is shown that the coldest inverse temperature level required for ALPS only needs to be linear in the dimension, $\mathcal{O}(d)$, and this means that for a collection of multimodal problems the algorithmic cost is polynomial, $\mathcal{O}\left(d^{3}\right)$. ALPS is illustrated on a complex multimodal posterior distribution that arises from a seemingly-unrelated regression (SUR) model of longitudinal data from U.S. manufacturing firms.
A fundamental problem in numerical analysis and approximation theory is approximating smooth functions by polynomials. A much harder version under recent consideration is to enforce bound constraints on the approximating polynomial. In this paper, we consider the problem of approximating functions by polynomials whose Bernstein coefficients with respect to a given degree satisfy such bounds, which in turn implies the same bounds on the approximant. We frame the problem as an inequality-constrained optimization problem and give an algorithm for finding the Bernstein coefficients of the exact solution. Additionally, our method can be modified slightly to include equality constraints such as mass preservation. It also extends naturally to multivariate polynomials over a simplex.
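The reason bounds on the Bernstein coefficients carry over to the polynomial is that the Bernstein basis functions are nonnegative on $[0, 1]$ and sum to one, so the polynomial's value is a convex combination of its coefficients. A small sketch of this property follows; the function name `bernstein_eval` and the example coefficients are hypothetical, and the constrained optimization algorithm itself is not shown.

```python
from math import comb


def bernstein_eval(coeffs, x):
    """Evaluate sum_k c_k * C(n, k) * x^k * (1 - x)^(n - k) on [0, 1]."""
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * x**k * (1 - x) ** (n - k)
        for k, c in enumerate(coeffs)
    )


# Hypothetical degree-3 coefficients, all lying in the interval [0, 1].
coeffs = [0.0, 1.0, 0.5, 1.0]
grid = [i / 100 for i in range(101)]
values = [bernstein_eval(coeffs, x) for x in grid]

# Each value is a convex combination of the coefficients, so it stays
# between the smallest and largest coefficient.
lowest, highest = min(values), max(values)
```

The converse does not hold in general: a polynomial can satisfy the bounds while some of its Bernstein coefficients fall outside them, which is why constraining the coefficients is a sufficient condition only.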
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
We study the problem of stock-related question answering (StockQA): automatically generating answers to stock-related questions, just as professional stock analysts provide action recommendations on stocks at users' requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values) and, due to the dynamic nature of StockQA, it is scarcely possible to get reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between keywords in a QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, and integrate different mechanisms of number understanding and generation, which is a critical component of StockQA. We build a large-scale Chinese dataset containing over 180K StockQA instances, based on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing achieves the best performance.\footnote{The data is publicly available at \url{//ai.tencent.com/ailab/nlp/dataset/}.}