In statistical exploration, manipulating pseudo-random values to discern their impact on data distributions presents a compelling avenue of inquiry. This article investigates the question: is it possible to add pseudo-random values without compelling a shift towards a normal distribution? Employing Python techniques, the study explores the nuances of adding pseudo-random values, aiming to unravel the interplay between randomness and the resulting statistical characteristics. The Materials and Methods chapter details the construction of datasets comprising up to 300 billion pseudo-random values, employing three distinct layers of manipulation. The Results chapter explores the generated datasets visually and quantitatively, emphasizing distribution and standard deviation metrics. The study concludes with reflections on the implications of pseudo-random value manipulation and suggests avenues for future research. In the layered exploration, the first layer introduces subtle normalization as the number of summations increases, while the second layer enhances normality. The third layer disrupts typical distribution patterns, leaning towards randomness despite the summation of pseudo-random values. Standard deviation patterns across the layers further illuminate the dynamic interplay between pseudo-random operations and statistical characteristics. While not aiming to disrupt academic norms, this work modestly contributes insights into the complexities of data distributions. Future studies are encouraged to delve deeper into the implications of data manipulation on statistical outcomes, extending the understanding of pseudo-random operations in diverse contexts.
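The abstract does not spell out the exact operations of the three layers, so the following is only a minimal Python sketch of the baseline effect the first two layers allude to: sums of independent pseudo-random values drift toward normality as the number of summands grows (the central limit theorem). The sample sizes here are illustrative assumptions, far below the 300 billion values used in the study.

```python
# Minimal sketch (not the paper's pipeline): summing independent
# pseudo-random uniforms and watching the sums drift toward normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

for n_summands in (1, 2, 10, 100):
    # Each of the 100_000 samples is a sum of n_summands uniform values.
    sums = rng.uniform(0.0, 1.0, size=(100_000, n_summands)).sum(axis=1)
    skew = stats.skew(sums)
    excess_kurtosis = stats.kurtosis(sums)  # 0 for a normal distribution
    print(f"{n_summands:4d} summands: std={sums.std():.4f}, "
          f"skew={skew:+.4f}, excess kurtosis={excess_kurtosis:+.4f}")
```

As the number of summands grows, skewness and excess kurtosis approach zero, while the standard deviation of the sums grows as $\sqrt{n/12}$ for uniform summands.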
We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial "deflation" step to the standard generic chaining argument. The resulting tail bound is the sum of the complexity of the "deflated function class", in terms of a generalization of Talagrand's $\gamma$ functional, and the deviation of the function instance, both of which are formulated based on the natural seminorm induced by the corresponding Cram\'{e}r functions. We also provide approximations of this seminorm when the function class lies in a given (exponential type) Orlicz space, which can be used to make the complexity term and the deviation term more explicit.
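For context, the classical $\gamma_\alpha$ functional that the paper generalizes measures the complexity of a set $T$ under a (semi)metric $d$ via nets of doubly exponentially growing size; the paper's version, built from the Cram\'{e}r-induced seminorm, is not reproduced here.

\[
\gamma_\alpha(T,d) \;=\; \inf\; \sup_{t\in T}\; \sum_{n\ge 0} 2^{n/\alpha}\, d(t, T_n),
\]

where the infimum runs over all sequences of subsets $T_n\subseteq T$ with $|T_0|=1$ and $|T_n|\le 2^{2^n}$.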
In many communication contexts, the capabilities of the involved actors cannot be known beforehand, whether the recipient is a cell, a plant, an insect, or even a life form unknown to Earth. Regardless of the recipient, the message's space and time scales could be too fast, too slow, too large, or too small, and the message may never be decoded. It therefore pays to devise a way to encode messages agnostic of space and time scales. We propose the use of fractal functions as self-executable infinite-frequency carriers for sending messages, given their properties of structural self-similarity and scale invariance; we call this `fractal messaging'. Starting from a spatial embedding, we introduce a framework for a space-time scale-free approach to this challenge. In a space- and time-agnostic framework for message transmission, a message should be encoded such that it can be decoded at several spatio-temporal scales. Hence, the core idea of the framework proposed herein is to encode a binary message as waves along infinitely many frequencies (in power-like distributions) and amplitudes, transmit the message, and then decode and reproduce it. To do so, the components of the Weierstrass function, a well-known fractal, are used as carriers of the message: each component has its amplitude modulated to embed the binary stream, allowing for a space-time-agnostic approach to messaging.
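As a concrete illustration, here is a minimal Python sketch, under assumed parameter values, of the modulation scheme described above: each Weierstrass component $a^n\cos(b^n\pi x)$ carries one bit of the message through its amplitude, and the geometric spacing of the frequencies lets each bit be recovered by correlation with its carrier. This illustrates the idea; it is not the authors' implementation.

```python
import numpy as np

a, b = 0.5, 7.0         # classical Weierstrass parameters (0 < a < 1, b odd, ab > 1 + 3*pi/2)
bits = [1, 0, 1, 1, 0]  # the binary message, one bit per component
x = np.linspace(0.0, 2.0, 200_000, endpoint=False)

def carrier(n, x):
    """n-th Weierstrass component cos(b^n * pi * x)."""
    return np.cos(b**n * np.pi * x)

# Encode: bit 1 keeps the component's full amplitude a**n, bit 0 halves it.
signal = sum((a**n) * (1.0 if bit else 0.5) * carrier(n, x)
             for n, bit in enumerate(bits))

# Decode: project onto each carrier; with integer b the carriers are
# orthogonal over whole periods, so each projection isolates one amplitude.
decoded = []
for n in range(len(bits)):
    c = carrier(n, x)
    amp = signal @ c / (c @ c)  # estimated amplitude of component n
    decoded.append(1 if amp / a**n > 0.75 else 0)

print(decoded)  # expected to match `bits`
```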
The subject of this work is an adaptive stochastic Galerkin finite element method for parametric or random elliptic partial differential equations that generates sparse product polynomial expansions of the solutions with respect to the parametric variables. For the corresponding spatial approximations, an independently refined finite element mesh is used for each polynomial coefficient. The method relies on multilevel expansions of input random fields and achieves error reduction at a uniform rate. In particular, the saturation property for the refinement process is ensured by the algorithm. The results are illustrated by numerical experiments, including cases with random fields of low regularity.
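Schematically, and using the standard model problem from this literature rather than the paper's exact setting, the method targets problems and expansions of the following form:

\[
-\nabla\cdot\bigl(a(x,y)\,\nabla u(x,y)\bigr) = f(x), \qquad
a(x,y) = \bar a(x) + \sum_{k \ge 1} y_k\, \theta_k(x),
\]

with parameters $y=(y_k)_{k\ge 1}$, and a sparse product polynomial expansion of the solution

\[
u(x,y) \approx \sum_{\nu \in \Lambda} u_\nu(x)\, P_\nu(y),
\]

where $\Lambda$ is a finite set of multi-indices, the $P_\nu$ are tensorized polynomials in the parametric variables, and each coefficient $u_\nu$ is approximated on its own adaptively refined finite element mesh.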
Although there are obstacles related to obtaining data, ensuring model precision, and upholding ethical standards, the advantages of using machine learning to generate predictive models for unemployment rates in developing nations amid the implementation of Industry 4.0 (I4.0) are noteworthy. This research explores the use of machine learning techniques, through a predictive conceptual model, to understand and address factors that contribute to unemployment rates in developing nations during the implementation of I4.0. A literature review was carried out to determine the economic and social factors that affect unemployment rates in developing nations. The review revealed that unemployment rates in developing nations are considerably influenced by factors such as economic growth, inflation, population growth, education levels, and technological progress. A predictive conceptual model was developed indicating that the factors contributing to unemployment in developing nations can be addressed using machine learning techniques, such as regression analysis and neural networks, when adopting I4.0. The study's findings demonstrate the effectiveness of the proposed predictive conceptual model in accurately understanding and addressing unemployment rate factors within developing nations when deploying I4.0. The model serves the dual purpose of predicting future unemployment rates and tracking progress in reducing unemployment rates in emerging economies. Through continued research and improvement, decision-makers and enterprises can employ these patterns to make more informed judgments that advance economic growth, employment generation, and poverty alleviation in emerging nations.
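For the regression-analysis branch the abstract mentions, the following is a hedged, minimal Python sketch. All data here are synthetic placeholders; the feature names merely mirror the factors identified in the literature review, not any real dataset.

```python
# Hedged sketch: a linear regression of a synthetic "unemployment rate"
# on synthetic stand-ins for the factors named above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # hypothetical country-year observations

# Columns: economic growth, inflation, population growth,
# education level, technological progress (all synthetic).
X = rng.normal(size=(n, 5))
assumed_coefs = np.array([-1.2, 0.8, 0.5, -0.9, -0.4])  # assumed signs only
y = 8.0 + X @ assumed_coefs + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```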
We consider a geometric programming problem that consists of minimizing a function given by the supremum of finitely many log-Laplace transforms of discrete nonnegative measures on a Euclidean space. Under a coerciveness assumption, we show that an $\varepsilon$-minimizer can be computed in a time that is polynomial in the input size and in $|\log\varepsilon|$. This is obtained by establishing bit-size estimates on approximate minimizers and by applying the ellipsoid method. We also derive polynomial iteration complexity bounds for the interior point method applied to the same class of problems. We deduce that the spectral radius of a partially symmetric, weakly irreducible nonnegative tensor can be approximated within $\varepsilon$ error in poly-time. For strongly irreducible tensors, we also show that the logarithm of the positive eigenvector is poly-time computable. Our results also yield that the maximum of a nonnegative homogeneous $d$-form in the unit ball with respect to the $d$-H\"older norm can be approximated in poly-time. In particular, the spectral radius of uniform weighted hypergraphs and some known upper bounds for the clique number of uniform hypergraphs are poly-time computable.
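Concretely, writing each discrete nonnegative measure as $\mu_i=\sum_j c_{ij}\,\delta_{a_{ij}}$ with weights $c_{ij}\ge 0$, its log-Laplace transform at $x$ is $\log\sum_j c_{ij}e^{\langle a_{ij},x\rangle}$, so the problem takes the familiar convex (log-sum-exp) form of geometric programming:

\[
\min_{x\in\mathbb{R}^n}\ \max_{1\le i\le m}\ \log\sum_{j} c_{ij}\, e^{\langle a_{ij},\, x\rangle}.
\]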
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test and calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
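For reference, the $m$ conformal $p$-values in this setting follow the standard construction sketched below in Python. The sketch is generic: under exchangeability each $p$-value is marginally super-uniform, while the paper's contribution concerns their joint (P\'olya urn) behaviour.

```python
# Standard transductive conformal p-values from calibration scores.
# Convention: a higher score means more "non-conforming".
import numpy as np

rng = np.random.default_rng(1)
calib_scores = rng.normal(size=1000)        # n calibration scores
test_scores = rng.normal(loc=0.5, size=50)  # m test scores

n = len(calib_scores)
# p_i = (1 + #{calibration scores >= test score i}) / (n + 1)
p_values = (1 + (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n + 1)

print(p_values[:5])
```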
Hidden Markov models (HMMs) are probabilistic methods in which observations are seen as realizations of a latent Markov process with discrete states that switch over time. Moving beyond standard statistical tests, HMMs offer a statistical environment to optimally exploit the information present in multivariate time series, uncovering the latent dynamics that rule them. Here, we extend the Poisson HMM to the multilevel framework, accommodating variability between individuals with continuously distributed individual random effects following a lognormal distribution, and describe how to estimate the model in a fully parametric Bayesian framework. The proposed multilevel HMM enables probabilistic decoding of hidden state sequences from multivariate count time series based on individual-specific parameters, and offers a framework to formally quantify between-individual variability. Through a Monte Carlo study, we show that the multilevel HMM outperforms the standard HMM in scenarios involving heterogeneity between individuals, demonstrating improved decoding accuracy and estimation performance for the parameters of the emission distribution, and performs equally well when no between-individual heterogeneity is present. Finally, we illustrate how to use our model to explore the latent dynamics governing complex multivariate count data in an empirical application concerning pilot whale diving behaviour in the wild, and how to identify neural states from multi-electrode recordings of neural activity in the motor cortex of a macaque monkey in an experimental setup. We make the multilevel HMM introduced in this study publicly available in the R package mHMMbayes on CRAN.
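The estimation itself lives in the R package mHMMbayes; the following is only a hedged Python sketch of the data-generating model the abstract describes: a Poisson HMM whose state-dependent means vary per individual via lognormal random effects. Parameter values are illustrative, and the counts are univariate for brevity.

```python
import numpy as np

rng = np.random.default_rng(7)
n_individuals, T = 5, 200
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])          # group-level transition matrix
group_log_means = np.log([2.0, 10.0])   # group-level Poisson means, log scale
sigma = 0.3                             # between-individual SD on the log scale

for i in range(n_individuals):
    # Individual random effect: lognormal means around the group-level means.
    means = np.exp(rng.normal(group_log_means, sigma))
    states = np.empty(T, dtype=int)
    states[0] = rng.integers(2)
    for t in range(1, T):
        states[t] = rng.choice(2, p=trans[states[t - 1]])
    counts = rng.poisson(means[states])  # observed count series
    print(f"individual {i}: state-dependent means {np.round(means, 2)}, "
          f"mean observed count {counts.mean():.2f}")
```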
We consider the fundamental task of optimising a real-valued function defined on a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use notions from Riemannian geometry to restate the optimisation of a function on Euclidean space as optimisation on a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which the optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimisation along geodesics is known to be generally infeasible, yet we show that on this specific manifold we can analytically derive Taylor approximations of the geodesics up to third order. In general these approximations will not lie on the manifold, so we construct suitable retraction maps to pull them back onto the manifold. We can therefore efficiently optimise along the approximate geodesic curves. We cover the related theory, describe a practical optimisation algorithm, and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using third-order approximations of geodesics, tends to outperform standard Euclidean gradient-based counterparts in terms of the number of iterations until convergence.
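The warped metric and the third-order geodesic expansion are specific to the paper, so the Python sketch below only illustrates the generic pattern described above (step along an approximate geodesic, then retract onto the manifold), using the unit sphere with a normalisation retraction as a stand-in manifold.

```python
import numpy as np

def retract(x):
    """Pull a point in ambient space back onto the unit sphere."""
    return x / np.linalg.norm(x)

def riemannian_grad(x, euclid_grad):
    """Project the Euclidean gradient onto the tangent space at x."""
    return euclid_grad - (euclid_grad @ x) * x

# Toy objective on the sphere: f(x) = x^T A x, minimised at the
# eigenvector of A with the smallest eigenvalue.
A = np.diag([3.0, 1.0, 0.5])
x = retract(np.array([1.0, 1.0, 1.0]))
step = 0.1
for _ in range(200):
    g = riemannian_grad(x, 2.0 * A @ x)
    # First-order geodesic approximation x - step*g, then retraction;
    # the paper replaces this step with a third-order Taylor expansion.
    x = retract(x - step * g)
print(x)  # approaches +/- e_3, the minimiser on the sphere
```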
The use of discretized variables in the development of prediction models is a common practice, in part because the decision-making process is more natural when it is based on rules created from segmented models. Although this practice is perhaps more common in medicine, it extends to any area of knowledge where a predictive model helps in decision-making. Therefore, providing researchers with a useful and valid categorization method is a relevant issue when developing prediction models. In this paper, we propose a new general methodology for categorizing a predictor variable in any regression model where the response variable belongs to the exponential family of distributions. Furthermore, it can be applied in any multivariate context, allowing more than one continuous covariate to be categorized simultaneously. In addition, a computationally very efficient method, based on a pseudo-BIC criterion, is proposed to obtain the optimal number of categories. Several simulation studies demonstrate the efficiency of the method with respect to both the location and the number of estimated cut-off points. Finally, the categorization proposal is applied to a real data set of 543 patients with chronic obstructive pulmonary disease from Galdakao Hospital's five outpatient respiratory clinics, who were followed up for 10 years. We applied the proposed methodology to jointly categorize the continuous variables six-minute walking test and forced expiratory volume in one second in a multiple Poisson generalized additive model, with the rate of hospital admissions per year of follow-up as the response variable. The location and number of cut-off points obtained were clinically validated as being in line with the categorizations used in the literature.
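As a rough illustration of the kind of search involved (a generic BIC comparison, not the paper's pseudo-BIC), the following Python sketch picks the number of categories for a single continuous covariate in a Poisson regression on synthetic data.

```python
# Hedged sketch: discretize one covariate at quantile cut-offs and compare
# BIC across candidate category counts in a Poisson GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0, 10, size=n)
# True rate has a threshold-like dependence on x, so categorization helps.
y = rng.poisson(np.exp(0.2 + 0.8 * (x > 6)))

def bic_for_k(k):
    # k categories from quantile cut-offs; fit a Poisson GLM on the dummies.
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    cats = np.digitize(x, cuts)
    X = sm.add_constant(np.eye(k)[cats][:, 1:])  # drop reference category
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    return -2 * fit.llf + X.shape[1] * np.log(n)

best_k = min(range(2, 7), key=bic_for_k)
print("BIC-optimal number of categories:", best_k)
```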
Mendelian randomization uses genetic variants as instrumental variables to make causal inferences about the effects of modifiable risk factors on diseases from observational data. One of the major challenges in Mendelian randomization is that many genetic variants are only modestly, or even weakly, associated with the risk factor of interest, a setting known as many weak instruments. Many existing methods, such as the popular inverse-variance weighted (IVW) method, can be biased when instrument strength is weak. To address this issue, the debiased IVW (dIVW) estimator, which is shown to be robust to many weak instruments, was recently proposed. However, this estimator still has non-ignorable bias when the effective sample size is small. In this paper, we propose a modified debiased IVW (mdIVW) estimator by multiplying the original dIVW estimator by a modification factor. After this simple correction, we show that the bias of the mdIVW estimator converges to zero at a faster rate than that of the dIVW estimator under some regularity conditions. Moreover, the mdIVW estimator has smaller variance than the dIVW estimator. We further extend the proposed method to account for instrumental variable selection and balanced horizontal pleiotropy. We demonstrate the improvement of the mdIVW estimator over the dIVW estimator through extensive simulation studies and real data analysis.
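The paper's modification factor is not reproduced here; for orientation, this Python sketch contrasts the classical IVW estimator with the dIVW form from the weak-instrument literature (denominator debiased by subtracting the squared standard error of the exposure association) on synthetic summary data.

```python
# Hedged sketch of the two-sample summary-data setup: IVW vs dIVW.
import numpy as np

rng = np.random.default_rng(11)
p, beta = 200, 0.3                           # number of variants, true effect
gamma = rng.normal(0, 0.03, size=p)          # weak variant-exposure effects
se_x = np.full(p, 0.02)                      # SE of exposure associations
se_y = np.full(p, 0.02)                      # SE of outcome associations
beta_x = gamma + rng.normal(0, se_x)         # estimated exposure associations
beta_y = beta * gamma + rng.normal(0, se_y)  # estimated outcome associations

# Classical IVW: biased toward zero under many weak instruments, since
# E[beta_x^2] = gamma^2 + se_x^2 inflates the denominator.
ivw = np.sum(beta_x * beta_y / se_y**2) / np.sum(beta_x**2 / se_y**2)

# Debiased IVW: subtract se_x^2 in the denominator to remove that bias.
divw = np.sum(beta_x * beta_y / se_y**2) / np.sum((beta_x**2 - se_x**2) / se_y**2)

print(f"IVW: {ivw:.3f}   dIVW: {divw:.3f}   truth: {beta}")
```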