We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial "deflation" step to the standard generic chaining argument. The resulting tail bound is the sum of the complexity of the "deflated function class" in terms of a generalization of Talagrand's $\gamma$ functional, and the deviation of the function instance, both of which are formulated based on the natural seminorm induced by the corresponding Cram\'{e}r functions. We also provide certain approximations for the mentioned seminorm when the function class lies in a given (exponential type) Orlicz space, that can be used to make the complexity term and the deviation term more explicit.
We present a novel framework for the development of fourth-order lattice Boltzmann schemes to tackle multidimensional nonlinear systems of conservation laws. Our numerical schemes preserve two fundamental characteristics inherent in classical lattice Boltzmann methods: a local relaxation phase and a transport phase composed of elementary shifts on a Cartesian grid. Achieving fourth-order accuracy is accomplished through the composition of second-order time-symmetric basic schemes utilizing rational weights. This enables the representation of the transport phase in terms of elementary shifts. Introducing local variations in the relaxation parameter during each stage of relaxation ensures the entropic nature of the schemes. This not only enhances stability in the long-time limit but also maintains fourth-order accuracy. To validate our approach, we conduct comprehensive testing on scalar equations and systems in both one and two spatial dimensions.
Existing statistical methods for the analysis of micro-randomized trials (MRTs) are designed to estimate causal excursion effects using data from a single MRT. In practice, however, researchers can often find previous MRTs that employ similar interventions. In this paper, we develop data integration methods that capitalize on this additional information, leading to statistical efficiency gains. To further increase efficiency, we demonstrate how to combine these approaches according to a generalization of multivariate precision weighting that allows for correlation between estimates, and we show that the resulting meta-estimator possesses an asymptotic optimality property. We illustrate our methods in simulation and in a case study involving two MRTs in the area of smoking cessation.
The subject of this work is an adaptive stochastic Galerkin finite element method for parametric or random elliptic partial differential equations, which generates sparse product polynomial expansions with respect to the parametric variables of solutions. For the corresponding spatial approximations, an independently refined finite element mesh is used for each polynomial coefficient. The method relies on multilevel expansions of input random fields and achieves error reduction with uniform rate. In particular, the saturation property for the refinement process is ensured by the algorithm. The results are illustrated by numerical experiments, including cases with random fields of low regularity.
We propose a material design method via gradient-based optimization on compositions, overcoming the limitations of traditional methods: exhaustive database searches and conditional generation models. It optimizes inputs via backpropagation, aligning the model's output closely with the target property and facilitating the discovery of unlisted materials and precise property determination. Our method is also capable of adaptive optimization under new conditions without retraining. Applying to exploring high-Tc superconductors, we identified potential compositions beyond existing databases and discovered new hydrogen superconductors via conditional optimization. This method is versatile and significantly advances material design by enabling efficient, extensive searches and adaptability to new constraints.
We consider a geometric programming problem consisting in minimizing a function given by the supremum of finitely many log-Laplace transforms of discrete nonnegative measures on a Euclidean space. Under a coerciveness assumption, we show that a $\varepsilon$-minimizer can be computed in a time that is polynomial in the input size and in $|\log\varepsilon|$. This is obtained by establishing bit-size estimates on approximate minimizers and by applying the ellipsoid method. We also derive polynomial iteration complexity bounds for the interior point method applied to the same class of problems. We deduce that the spectral radius of a partially symmetric, weakly irreducible nonnegative tensor can be approximated within $\varepsilon$ error in poly-time. For strongly irreducible tensors, we also show that the logarithm of the positive eigenvector is poly-time computable. Our results also yield that the the maximum of a nonnegative homogeneous $d$-form in the unit ball with respect to $d$-H\"older norm can be approximated in poly-time. In particular, the spectral radius of uniform weighted hypergraphs and some known upper bounds for the clique number of uniform hypergraphs are poly-time computable.
Global variance-based reliability sensitivity indices arise from a variance decomposition of the indicator function describing the failure event. The first-order indices reflect the main effect of each variable on the variance of the failure event and can be used for variable prioritization; the total-effect indices represent the total effect of each variable, including its interaction with other variables, and can be used for variable fixing. This contribution derives expressions for the variance-based reliability indices of systems with multiple failure modes that are based on the first-order reliability method (FORM). The derived expressions are a function of the FORM results and, hence, do not require additional expensive model evaluations. They do involve the evaluation of multinormal integrals, for which effective solutions are available. We demonstrate that the derived expressions enable an accurate estimation of variance-based reliability sensitivities for general system problems to which FORM is applicable.
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test+calibration samples at training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
The direct parametrisation method for invariant manifold is a model-order reduction technique that can be applied to nonlinear systems described by PDEs and discretised e.g. with a finite element procedure in order to derive efficient reduced-order models (ROMs). In nonlinear vibrations, it has already been applied to autonomous and non-autonomous problems to propose ROMs that can compute backbone and frequency-response curves of structures with geometric nonlinearity. While previous developments used a first-order expansion to cope with the non-autonomous term, this assumption is here relaxed by proposing a different treatment. The key idea is to enlarge the dimension of the parametrising coordinates with additional entries related to the forcing. A new algorithm is derived with this starting assumption and, as a key consequence, the resonance relationships appearing through the homological equations involve multiple occurrences of the forcing frequency, showing that with this new development, ROMs for systems exhibiting a superharmonic resonance, can be derived. The method is implemented and validated on academic test cases involving beams and arches. It is numerically demonstrated that the method generates efficient ROMs for problems involving 3:1 and 2:1 superharmonic resonances, as well as converged results for systems where the first-order truncation on the non-autonomous term showed a clear limitation.
The use of discretized variables in the development of prediction models is a common practice, in part because the decision-making process is more natural when it is based on rules created from segmented models. Although this practice is perhaps more common in medicine, it is extensible to any area of knowledge where a predictive model helps in decision-making. Therefore, providing researchers with a useful and valid categorization method could be a relevant issue when developing prediction models. In this paper, we propose a new general methodology that can be applied to categorize a predictor variable in any regression model where the response variable belongs to the exponential family distribution. Furthermore, it can be applied in any multivariate context, allowing to categorize more than one continuous covariate simultaneously. In addition, a computationally very efficient method is proposed to obtain the optimal number of categories, based on a pseudo-BIC proposal. Several simulation studies have been conducted in which the efficiency of the method with respect to both the location and the number of estimated cut-off points is shown. Finally, the categorization proposal has been applied to a real data set of 543 patients with chronic obstructive pulmonary disease from Galdakao Hospital's five outpatient respiratory clinics, who were followed up for 10 years. We applied the proposed methodology to jointly categorize the continuous variables six-minute walking test and forced expiratory volume in one second in a multiple Poisson generalized additive model for the response variable rate of the number of hospital admissions by years of follow-up. The location and number of cut-off points obtained were clinically validated as being in line with the categorizations used in the literature.
Mendelian randomization uses genetic variants as instrumental variables to make causal inferences about the effects of modifiable risk factors on diseases from observational data. One of the major challenges in Mendelian randomization is that many genetic variants are only modestly or even weakly associated with the risk factor of interest, a setting known as many weak instruments. Many existing methods, such as the popular inverse-variance weighted (IVW) method, could be biased when the instrument strength is weak. To address this issue, the debiased IVW (dIVW) estimator, which is shown to be robust to many weak instruments, was recently proposed. However, this estimator still has non-ignorable bias when the effective sample size is small. In this paper, we propose a modified debiased IVW (mdIVW) estimator by multiplying a modification factor to the original dIVW estimator. After this simple correction, we show that the bias of the mdIVW estimator converges to zero at a faster rate than that of the dIVW estimator under some regularity conditions. Moreover, the mdIVW estimator has smaller variance than the dIVW estimator.We further extend the proposed method to account for the presence of instrumental variable selection and balanced horizontal pleiotropy. We demonstrate the improvement of the mdIVW estimator over the dIVW estimator through extensive simulation studies and real data analysis.