The nature of the Fermi gamma-ray Galactic Center Excess (GCE) has remained a persistent mystery for over a decade. Although the excess is broadly compatible with emission expected from dark matter annihilation, an explanation in terms of a population of unresolved astrophysical point sources, e.g., millisecond pulsars, remains viable. The effort to uncover the origin of the GCE is hampered in particular by an incomplete understanding of diffuse emission of Galactic origin. This can lead to spurious features that make it difficult to robustly differentiate smooth emission, as expected for a dark matter origin, from the more "clumpy" emission expected for a population of relatively bright, unresolved point sources. We use recent advancements in the field of simulation-based inference, in particular density-estimation techniques using normalizing flows, to characterize the contribution of modeled components, including unresolved point-source populations, to the GCE. Compared to traditional techniques based on the statistical distribution of photon counts, our machine learning-based method is able to utilize more of the information contained in a given model of the Galactic Center emission; in particular, it can perform posterior parameter estimation while accounting for pixel-to-pixel spatial correlations in the gamma-ray map. This makes the method demonstrably more resilient to certain forms of model misspecification. When applied to Fermi data, the method generically attributes a smaller fraction of the GCE flux to unresolved point sources than traditional approaches do. We nevertheless infer such a contribution to make up a non-negligible fraction of the GCE across all analysis variations considered, with at least $38^{+9}_{-19}\%$ of the excess attributed to unresolved point sources in our baseline analysis.
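As a rough illustration of the simulation-based inference workflow described above, the sketch below trains a conditional density estimator on simulated (parameter, data) pairs and evaluates it as an approximate posterior. For brevity a Gaussian head stands in for the normalizing flow, and the one-parameter toy simulator is purely an illustrative assumption, not the paper's gamma-ray forward model.

```python
# Minimal neural-posterior-estimation sketch: learn q(theta | x) from
# simulations by maximizing the conditional log-likelihood. A Gaussian
# density network stands in for the normalizing flow; the simulator is a toy.
import math
import torch
import torch.nn as nn

def simulator(theta):
    # Hypothetical one-parameter forward model with observation noise.
    return 2.0 * theta + 0.5 * torch.randn_like(theta)

class ConditionalGaussian(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2))

    def log_prob(self, theta, x):
        mu, log_sigma = self.net(x).chunk(2, dim=-1)
        z = (theta - mu) / log_sigma.exp()
        return -0.5 * z ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)

prior = torch.distributions.Uniform(-1.0, 1.0)
theta = prior.sample((5000, 1))          # draw parameters from the prior
x = simulator(theta)                     # simulate matching data

model = ConditionalGaussian()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = -model.log_prob(theta, x).mean()
    loss.backward()
    opt.step()
# model.log_prob(theta_grid, x_obs) now approximates the posterior density.
```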
Modeling fuzziness and imprecision in human rating data is a crucial problem in many research areas, including applied statistics and the behavioral, social, and health sciences. Because of the interplay between cognitive, affective, and contextual factors, answering survey questions is a complex task that can hardly be captured by standard (crisp) rating responses. Fuzzy rating scales have progressively been adopted to overcome some of the limitations of standard rating scales, including their inability to disentangle decision uncertainty from individual responses. The aim of this article is to provide a novel fuzzy scaling procedure that uses Item Response Theory trees (IRTrees) as a psychometric model for the stage-wise latent response process. In so doing, the fuzziness of rating data is modeled using each rater's overall pattern of responses instead of being computed with a single-item-based approach. This offers a consistent system for interpreting fuzziness in terms of individual-based decision uncertainty. A simulation study and two empirical applications are used to assess the characteristics of the proposed model and provide converging evidence of its effectiveness in modeling fuzziness and imprecision in rating data.
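To make the stage-wise response process concrete, here is a small sketch of the IRTree building block the procedure rests on: each observed rating is recoded into binary pseudo-items along a response tree, and a toy decision-uncertainty score is read off the traversed nodes. The 4-point mapping and the spread heuristic are illustrative assumptions, not the paper's exact model.

```python
# IRTree recoding for a linear response tree on a 4-point scale.
# None marks a node that is not reached for that category.
import numpy as np

TREE_MAP = {1: (0, 0, None),   # node 1: disagree vs agree
            2: (0, 1, None),   # node 2: weak vs strong (disagree side)
            3: (1, None, 0),   # node 3: weak vs strong (agree side)
            4: (1, None, 1)}

def pseudo_items(rating):
    """Recode one rating into its stage-wise binary pseudo-responses."""
    return TREE_MAP[rating]

def fuzzy_spread(node_probs):
    """Toy uncertainty score: mean Bernoulli variance over traversed nodes,
    largest when the model-implied node probabilities sit near 0.5."""
    return float(np.mean([p * (1 - p) for p in node_probs]))

print(pseudo_items(3))             # (1, None, 0)
print(fuzzy_spread([0.55, 0.48]))  # close to the 0.25 maximum -> fuzzy rating
```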
Monocular 3D object detection is an important task in autonomous driving. It can easily become intractable when the ego-vehicle's pose changes with respect to the ground plane, which is common due to slight fluctuations in road smoothness and slope. Owing to a lack of insight from industrial applications, existing methods on open datasets neglect camera pose information, which inevitably leaves the detector susceptible to variations in the camera extrinsic parameters. Such extrinsic perturbations are prevalent in real-world autonomous driving scenarios for industrial products. To this end, we propose a novel method that captures the camera pose so as to make the detector robust to extrinsic perturbations. Specifically, the proposed framework predicts the camera extrinsic parameters by detecting the vanishing point and horizon change. A converter is designed to rectify perturbed features in the latent space. By doing so, our 3D detector works independently of extrinsic parameter variations and produces accurate results in realistic cases, e.g., potholed and uneven roads, which almost all existing monocular detectors fail to handle. Experiments demonstrate that our method outperforms other state-of-the-art approaches by a large margin on both the KITTI 3D and nuScenes datasets.
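For intuition on how camera pose can be read off image cues, the sketch below recovers pitch from the vertical coordinate of the vanishing point of road-parallel lines and roll from the horizon slope. The intrinsics, sign conventions, and example detections are illustrative assumptions, not the paper's detector.

```python
import numpy as np

def pitch_from_vanishing_point(v_vp, cy, fy):
    """Pitch (rad) implied by the vanishing point's vertical coordinate."""
    return np.arctan2(cy - v_vp, fy)

def roll_from_horizon(p_left, p_right):
    """Roll (rad) from two points on the detected horizon line."""
    return np.arctan2(p_right[1] - p_left[1], p_right[0] - p_left[0])

# Example with KITTI-like intrinsics (fy ~ 707 px, cy ~ 173 px):
pitch = pitch_from_vanishing_point(v_vp=165.0, cy=172.85, fy=707.09)
roll = roll_from_horizon((0.0, 168.0), (1242.0, 170.0))
print(np.degrees(pitch), np.degrees(roll))   # small perturbation angles
```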
In decision modelling with time-to-event data, parametric models are often used to extrapolate the survivor function. One such model is the piecewise exponential model, whereby the time axis is partitioned into segments, with the hazard constant within each segment and independent between segments; the boundaries of these segments are known as change-points. We present an approach for determining the location and number of change-points in piecewise exponential models. Inference is performed in a Bayesian framework using Markov chain Monte Carlo (MCMC), where the model parameters can be integrated out and the number of change-points can be sampled as part of the MCMC scheme. We can estimate both the uncertainty in the change-point locations and the hazards for a given change-point model, and we obtain a probabilistic interpretation of the number of change-points. We evaluate the model's ability to recover change-point numbers and locations in a simulation study and show the utility of the method on two time-to-event data sets. In a data set of glioblastoma patients, we use the piecewise exponential model to describe the general trends in the hazard function. In a data set of heart transplant patients, we show that the piecewise exponential model produces the best statistical fit and extrapolation among standard parametric models. Piecewise exponential models may be useful for survival extrapolation if a long-term constant hazard trend is clinically plausible. A key advantage of this method is that the number and locations of change-points are estimated automatically rather than specified by the analyst.
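For reference, the survivor function implied by a piecewise constant hazard is S(t) = exp(-Σ_j λ_j Δ_j(t)), where Δ_j(t) is the time spent in segment j up to t. The sketch below evaluates it for an assumed two-change-point model; the values are illustrative, not fitted to either data set.

```python
import numpy as np

def piecewise_exp_survival(t, change_points, hazards):
    """S(t) under a constant hazard hazards[j] on each segment."""
    edges = np.concatenate(([0.0], change_points, [np.inf]))
    t = np.atleast_1d(np.asarray(t, dtype=float))
    surv = np.empty_like(t)
    for i, ti in enumerate(t):
        # exposure time accumulated in each segment up to ti
        exposure = np.clip(np.minimum(ti, edges[1:]) - edges[:-1], 0.0, None)
        surv[i] = np.exp(-np.sum(np.asarray(hazards) * exposure))
    return surv

# Change-points at 6 and 18 months, hazard settling to a long-term constant:
print(piecewise_exp_survival([3, 12, 36], change_points=[6, 18],
                             hazards=[0.10, 0.05, 0.02]))
```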
Physical activity (PA) is an important risk factor for many health outcomes. Wearable devices such as accelerometers are increasingly used in biomedical studies to understand the associations between PA and health outcomes. Statistical analyses involving accelerometer data are challenging due to three characteristics: (i) high dimensionality, (ii) temporal dependence, and (iii) measurement error. To address these challenges, we treat accelerometer-based measures of physical activity as a single function-valued covariate prone to measurement error. Specifically, to determine the relationship between PA and a health outcome of interest, we propose a regression model with a functional covariate that accounts for measurement error. Using regression calibration, we develop a two-step estimation method for the model parameters and establish their consistency. We also propose a test for the significance of the estimated model parameters. Simulation studies are conducted to compare the proposed methods with existing alternative approaches under varying scenarios. Finally, the developed methods are used to assess the relationship between PA intensity and BMI using data from the National Health and Nutrition Examination Survey.
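The following sketch illustrates the two-step regression-calibration idea on simulated data: smooth the noisy curves to approximate E[X|W], then fit a functional linear model on the calibrated curves. The Fourier basis, noise levels, and time grid are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 48                    # subjects, time grid (e.g., half-hour bins)
grid = np.linspace(0, 1, T)
X_true = np.sin(2 * np.pi * grid) * rng.normal(1.0, 0.3, size=(n, 1))
W = X_true + rng.normal(0.0, 0.5, size=(n, T))   # error-prone curves

# Step 1: calibration by projecting the noisy curves onto a small basis
K = 5
basis = np.stack([np.ones(T)] + [f(2 * np.pi * (k // 2 + 1) * grid)
                 for k, f in enumerate([np.sin, np.cos] * (K // 2))], axis=1)
coefs, *_ = np.linalg.lstsq(basis, W.T, rcond=None)
X_hat = (basis @ coefs).T         # calibrated curves, approximating E[X | W]

# Step 2: fit y = alpha + \int X(t) beta(t) dt + e using the calibrated X
beta_true = np.cos(2 * np.pi * grid)
y = X_true @ beta_true / T + rng.normal(0, 0.1, n)
Z = X_hat @ basis / T             # integrals of X_hat against basis functions
b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), y, rcond=None)
beta_hat = basis @ b[1:]          # estimated coefficient function beta(t)
```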
We consider a situation where the distribution of a random variable is being estimated by the empirical distribution of noisy measurements of that variable. This is common practice in, for example, teacher value-added models and other fixed-effect models for panel data. We use an asymptotic embedding in which the noise shrinks with the sample size to calculate the leading bias in the empirical distribution arising from the presence of noise. The leading bias in the empirical quantile function is obtained as well. These calculations are new to the literature, where only results on smooth functionals such as the mean and variance have been derived. We provide both analytical and jackknife corrections that recenter the limit distribution and yield confidence intervals with correct coverage in large samples. Our approach can be connected to corrections for selection bias and shrinkage estimation and is to be contrasted with deconvolution. Simulation results confirm the much-improved sampling behavior of the corrected estimators. An empirical illustration on heterogeneity in deviations from the law of one price is also provided.
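A minimal sketch of the jackknife flavor of such corrections, under illustrative assumptions: each unit's panel is split in half, so the half-sample estimates carry twice the noise variance of the full-sample ones, and the extrapolation 2·F̂_full - F̂_half cancels the leading (variance-proportional) bias in the empirical CDF.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 500, 20                               # units, observations per unit
theta = rng.normal(0, 1, n)                  # true unit effects
data = theta[:, None] + rng.normal(0, 1, size=(n, t))

theta_hat = data.mean(axis=1)                # full-sample estimates
half1 = data[:, : t // 2].mean(axis=1)       # half-sample estimates,
half2 = data[:, t // 2:].mean(axis=1)        # each with double the noise

def ecdf(sample, x):
    return np.mean(sample[None, :] <= x[:, None], axis=1)

x = np.linspace(-3, 3, 61)
F_full = ecdf(theta_hat, x)
F_half = 0.5 * (ecdf(half1, x) + ecdf(half2, x))
F_jack = 2 * F_full - F_half                 # bias-corrected CDF estimate
```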
Physically inspired latent force models offer an interpretable alternative to purely data-driven tools for inference in dynamical systems. They carry the structure of differential equations and the flexibility of Gaussian processes, yielding interpretable parameters and latent functions constrained by the dynamics. However, the existing inference techniques associated with these models rely on the exact computation of posterior kernel terms, which are seldom available in analytical form. Most applications relevant to practitioners, such as Hill equations or diffusion equations, are hence intractable. In this paper, we overcome these computational problems by proposing a variational solution to a general class of non-linear and parabolic partial differential equation latent force models. Further, we show that a neural operator approach can scale our model to thousands of instances, enabling fast, distributed computation. We demonstrate the efficacy and flexibility of our framework by achieving competitive performance on several tasks where the kernels are of varying degrees of tractability.
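To show the structure these models impose, the sketch below simulates a first-order latent force model x'(t) = -γ x(t) + f(t) with a GP-distributed force f, where the output is the Green's-function convolution of the force. The variational and neural-operator machinery is omitted; the kernel and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 200)
dt = t[1] - t[0]

# Sample the latent force f from a squared-exponential GP prior
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.5 ** 2)
f = rng.multivariate_normal(np.zeros_like(t), K + 1e-8 * np.eye(t.size))

# Output x(t) = \int_0^t exp(-gamma (t - s)) f(s) ds (discretized)
gamma = 0.8
G = np.exp(-gamma * (t[:, None] - t[None, :])) * (t[:, None] >= t[None, :])
x = (G @ f) * dt    # dynamics-constrained output driven by the latent force
```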
The multi-user Holographic Multiple-Input Multiple-Output Surface (MU-HMIMOS) paradigm, which is capable of realizing large continuous apertures with minimal power consumption, has recently been considered as an energy-efficient solution for future wireless networks, offering increased flexibility in shaping electromagnetic wave propagation according to the desired communication, localization, and sensing objectives. Tractable channel modeling of MU-HMIMOS systems is one of the most critical challenges, mainly due to the coupling effect induced by the excessively large number of closely spaced patch antennas. In this paper, we address this challenge for downlink multi-user communications and model the electromagnetic channel in the wavenumber domain using a Fourier plane-wave representation. Based on the proposed channel model, we devise maximum-ratio transmission and Zero-Forcing (ZF) precoding schemes that capitalize on the sampled channel variance, which depends on the number and spacing of the patch antennas in MU-HMIMOS, and present their analytical spectral efficiency performance. Moreover, we propose a low-complexity ZF precoding scheme that leverages a Neumann series expansion to replace the matrix inversion, since direct matrix inversion is practically infeasible when the number of patch antennas is extremely large. Our extensive simulation results showcase the impact of the number of patch antennas and their spacing on the spectral efficiency of the considered systems. It is shown that more patch antennas and larger spacing result in improved performance due to the decreased correlation among the patches.
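The precoding shortcut is easy to state: the ZF Gram matrix inverse (H H^H)^{-1} is replaced by a truncated Neumann series Σ_k (I - D^{-1}A)^k D^{-1} around its diagonal D, avoiding direct inversion. The sketch below checks this on a random channel; the dimensions and i.i.d. channel statistics are illustrative, not the paper's wavenumber-domain model.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 256, 8                            # patch antennas, users
H = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)

A = H @ H.conj().T                       # K x K Gram matrix to invert
D_inv = np.diag(1.0 / np.diag(A).real)   # diagonal preconditioner
X = np.eye(K) - D_inv @ A

# Truncated Neumann series: A^{-1} ~ sum_{k=0}^{3} X^k D^{-1}
A_inv = np.zeros_like(A)
term = np.eye(K, dtype=complex)
for _ in range(4):                       # a few terms suffice when A is
    A_inv += term @ D_inv                # diagonally dominant
    term = term @ X

W_zf = H.conj().T @ A_inv                # approximate ZF precoder
print(np.linalg.norm(H @ W_zf - np.eye(K)))  # near-diagonalized channel
```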
Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry satisfying some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations, where the hidden layer is held fixed throughout training. Since ball indicators are known to be hard to approximate with respect to a certain heavy-tailed distribution when using depth-2 networks with a single layer of non-linearities (Safran and Shamir, 2017), this establishes what is, to the best of our knowledge, the first optimization-based separation result in which the approximation benefits of the stronger architecture provably manifest in practice. Our proof technique relies on a random-features approach that reduces the problem to learning with a single neuron, where new tools are required to show the convergence of gradient descent when the distribution of the data is heavy-tailed.
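The training setup in the result is simple to emulate: a depth-2 network with two layers of sigmoidal activations, a fixed (random) hidden layer, and gradient descent only on the outer layer, fit to a ball indicator target. The sketch below uses an illustrative Gaussian data distribution and sizes rather than the paper's heavy-tailed construction.

```python
import torch

torch.manual_seed(0)
d, width, n = 5, 512, 4096
X = torch.randn(n, d) * 1.5
y = (X.norm(dim=1) <= 1.5).float()           # ball indicator target

W = torch.randn(width, d)                    # hidden layer, held fixed
b = torch.randn(width)
v = torch.zeros(width, requires_grad=True)   # trained outer weights
c = torch.zeros(1, requires_grad=True)

opt = torch.optim.SGD([v, c], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    hidden = torch.sigmoid(X @ W.T + b)      # first sigmoidal layer (fixed)
    pred = torch.sigmoid(hidden @ v + c)     # second sigmoidal layer
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()
    opt.step()
```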
The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace: We first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork. We propose a subnetwork selection strategy that aims to maximally preserve the model's predictive uncertainty. Empirically, our approach is effective compared to ensembles and less expressive posterior approximations over full networks.
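A toy version of the recipe, under stated simplifications: train all weights to a MAP estimate, then place a full-covariance Gaussian over one small subset (here the 16 last-layer weights, a stand-in for the paper's predictive-uncertainty-based selection) via the loss Hessian, keeping every other weight at its point estimate.

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
X = torch.linspace(-2, 2, 100).unsqueeze(1)
y = torch.sin(3 * X) + 0.1 * torch.randn_like(X)
model = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))

opt = torch.optim.Adam(model.parameters(), lr=1e-2)   # step 1: MAP fit
for _ in range(1000):
    opt.zero_grad()
    torch.nn.functional.mse_loss(model(X), y).backward()
    opt.step()

# Step 2: full-covariance Laplace over the last-layer subnetwork only
w_map = model[2].weight.detach().clone()              # (1, 16)
feats = torch.tanh(model[0](X)).detach()              # fixed feature map

def subnet_loss(w):
    return torch.nn.functional.mse_loss(feats @ w.T + model[2].bias.detach(), y)

H = hessian(subnet_loss, w_map).reshape(16, 16)
precision = X.shape[0] * H + torch.eye(16)            # illustrative scaling
cov = torch.linalg.inv(precision)                     # posterior covariance
```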
Recently, neural machine translation (NMT) has emerged as a powerful alternative to conventional statistical approaches. However, its performance drops considerably in the presence of morphologically rich languages (MRLs). Neural engines usually fail to cope with the large vocabularies and high out-of-vocabulary (OOV) word rates of MRLs. It is therefore unsuitable to apply existing word-based models to translate this set of languages. In this paper, we propose an extension to the state-of-the-art model of Chung et al. (2016), which works at the character level and boosts the decoder with target-side morphological information. In our architecture, an additional morphology table is plugged into the model. Each time the decoder samples from the target vocabulary, the table sends auxiliary signals from the most relevant affixes to enrich the decoder's current state and constrain it to make better predictions. We evaluated our model on translating English into German, Russian, and Turkish as three MRLs and observed significant improvements.
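A sketch of the core mechanism: at each decoding step the current decoder state attends over a table of target-language affix embeddings, and the attended morphology summary is fused back into the state before prediction. Sizes, names, and the fusion rule are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MorphologyTable(nn.Module):
    def __init__(self, n_affixes=100, dim=256):
        super().__init__()
        self.table = nn.Embedding(n_affixes, dim)  # affix embeddings
        self.query = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, state):
        # state: (batch, dim); table weight: (n_affixes, dim)
        scores = self.query(state) @ self.table.weight.T
        attn = torch.softmax(scores, dim=-1)       # relevance of each affix
        morph = attn @ self.table.weight           # morphology summary
        return torch.tanh(self.fuse(torch.cat([state, morph], dim=-1)))

state = torch.randn(4, 256)                        # a batch of decoder states
enriched = MorphologyTable()(state)                # feeds the output softmax
```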