We consider a cellular network in which uplink transmissions to a base station (BS) suffer interference from other devices, a condition that may occur, e.g., in cell-free networks or when using non-orthogonal multiple access (NOMA) techniques. Assuming that the BS treats this interference as additional noise, we focus on the problem of estimating the interference correlation matrix from received signal samples. We consider a BS equipped with multiple antennas and operating in the millimeter-wave (mmWave) bands, and we propose techniques that exploit the fact that, at these frequencies, channels comprise only a few reflections. This yields a specific structure of the interference correlation matrix, which can be decomposed into three matrices: two rectangular matrices that depend on the angles of arrival (AoAs) of the interference, and a third, smaller square matrix. We resort to gridless approaches to estimate the AoAs and then project the least-squares estimate of the interference correlation matrix onto a lower-dimensional subspace, thus reducing the estimation error. Moreover, we derive two simplified estimators, also based on gridless angle estimation, which turn out to be convenient when estimating the interference over a larger number of samples.
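The projection step can be illustrated with a minimal NumPy sketch under stated assumptions: a uniform linear array with half-wavelength spacing, AoAs that are already available (the gridless estimation step is not shown; the angles are simply passed in), and the sample covariance used as the least-squares estimate. The helper names `steering_matrix` and `project_covariance` are hypothetical.

```python
import numpy as np


def steering_matrix(angles_rad, n_antennas):
    """ULA steering vectors with half-wavelength spacing (assumed array geometry)."""
    n = np.arange(n_antennas)[:, None]
    return np.exp(1j * np.pi * n * np.sin(angles_rad)[None, :])


def project_covariance(Y, angles_rad):
    """Project the sample covariance onto {A Q A^H : Q an L x L matrix}.

    Y: (n_antennas, n_samples) received samples.
    angles_rad: AoA estimates, assumed to come from a gridless estimator.
    Returns (structured estimate, sample covariance).
    """
    n_antennas, n_samples = Y.shape
    R_ls = Y @ Y.conj().T / n_samples            # least-squares (sample) estimate
    A = steering_matrix(angles_rad, n_antennas)  # M x L tall rectangular factor
    A_pinv = np.linalg.pinv(A)
    Q = A_pinv @ R_ls @ A_pinv.conj().T          # small L x L square factor
    return A @ Q @ A.conj().T, R_ls              # equals P R_ls P with P = A A^+
```

Because $P = AA^{+}$ is an orthogonal projector, when the AoAs are exact the Frobenius error of the projected estimate against a correlation matrix of the form $AQA^{H}$ can never exceed that of the sample covariance.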
In this work, we propose an extension of physics-informed supervised learning strategies to parametric partial differential equations (PDEs). Although parametric PDEs are indisputably useful in many applications, solving them can be computationally expensive, especially in real-time and many-query settings. Thus, our main goal is to provide a physics-informed learning paradigm that simulates parametrized phenomena in a short amount of time. The physics information is exploited in three ways: in the loss function (standard physics-informed neural networks), as an augmented input (extra feature employment), and as a guideline to build an effective structure for the neural network (physics-informed architecture). Combined, these three aspects lead to a faster training phase and a more accurate parametric prediction. The methodology has been tested on several equations and in an optimal control framework.
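Two of the three ingredients (a physics-informed loss and an extra input feature) can be sketched on an assumed toy parametric problem, $-u'' = \mu\pi^2\sin(\pi x)$ on $(0,1)$ with $u(0)=u(1)=0$. Central finite differences stand in for the automatic differentiation a real physics-informed neural network would use, and `model` is any callable on the feature array. The names `extra_features` and `physics_informed_loss`, and the choice of $\sin(\pi x)$ as extra feature, are hypothetical.

```python
import numpy as np


def extra_features(x, mu):
    """Inputs (x, mu) augmented with sin(pi x), the forcing shape (hypothetical choice)."""
    return np.stack([x, np.full_like(x, mu), np.sin(np.pi * x)], axis=-1)


def physics_informed_loss(model, mu, n_points=101, h=1e-3):
    """Mean squared PDE residual plus boundary penalty for -u'' = mu pi^2 sin(pi x)."""
    x = np.linspace(h, 1.0 - h, n_points)  # interior collocation points

    def u(z):
        return model(extra_features(z, mu))

    # Central second difference approximates u'' (stand-in for autodiff).
    u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2
    residual = -u_xx - mu * np.pi**2 * np.sin(np.pi * x)
    boundary = u(np.array([0.0, 1.0]))     # enforce u(0) = u(1) = 0
    return np.mean(residual**2) + np.mean(boundary**2)


# The exact solution mu * sin(pi x) is the product of two input features.
exact = lambda f: f[:, 1] * f[:, 2]
print(physics_informed_loss(exact, mu=2.0))                  # close to zero
print(physics_informed_loss(lambda f: 0.0 * f[:, 0], mu=2.0))  # large residual
```

The extra feature makes the exact solution trivially representable, which is the kind of shortcut that can speed up training when the feature matches the physics.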
Efficiently learning input-output mappings from irregularly spaced data is a fundamental challenge in modeling physical phenomena. We propose attention-based modeling of quantities at arbitrary spatial points conditioned on related measurements at different locations. Attention-based models exhibit excellent performance across domains, which makes them an interesting candidate for modeling data irregularly sampled in space. Our approach adapts a transformer encoder to process measurements and read-out positions together. We introduce a novel encoding strategy that applies the same transformation to the measurement and read-out positions, instead of relying on two different mappings, after which the encoded measurement positions are combined with the encoded measurement values. To evaluate the effectiveness of our model, we conduct experiments on diverse problem domains, including high-altitude wind nowcasting, two-day weather forecasting, fluid dynamics, and heat diffusion. Our attention-based model consistently outperforms state-of-the-art models for irregularly sampled data, such as Graph Element Networks and Conditional Neural Processes. Notably, our model reduces the root mean square error (RMSE) from 9.24 to 7.98 for wind nowcasting and from 0.126 to 0.084 for a heat diffusion task. We hypothesize that this superior performance stems from the enhanced flexibility of our latent representation and the improved data encoding technique. To support this hypothesis, we design a synthetic experiment that reveals excessive bottlenecking in the latent representations of alternative models, which hinders information utilization and impedes training.
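The shared-encoding idea can be sketched as follows. Everything beyond "the same transformation for measurement and read-out positions" is an assumption: a single `tanh` layer as position encoder, additive combination with a linear value encoding, and one untrained single-head attention layer standing in for the transformer encoder. All weight and function names are hypothetical.

```python
import numpy as np


def encode_tokens(meas_pos, meas_val, query_pos, W_pos, W_val):
    """Build one token sequence from measurements and read-out positions.

    The SAME position map encodes measurement and read-out positions;
    measurement tokens additionally receive an encoding of their values
    (additive combination is an assumption of this sketch).
    """
    def pos_enc(p):
        return np.tanh(p @ W_pos)

    meas_tokens = pos_enc(meas_pos) + meas_val @ W_val
    query_tokens = pos_enc(query_pos)
    return np.concatenate([meas_tokens, query_tokens], axis=0)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V


rng = np.random.default_rng(0)
d = 8
W_pos, W_val = rng.normal(size=(2, d)), rng.normal(size=(1, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
meas_pos, meas_val = rng.uniform(size=(5, 2)), rng.normal(size=(5, 1))
query_pos = rng.uniform(size=(3, 2))
tokens = encode_tokens(meas_pos, meas_val, query_pos, W_pos, W_val)
out = self_attention(tokens, Wq, Wk, Wv)
readout = out[-3:]  # read-out tokens; a linear head would map these to predictions
```

Because the position map is shared, a read-out placed exactly at a measurement location receives the same position code as that measurement, differing only by the value encoding.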
In this work, we propose tailored model order reduction for varying boundary optimal control problems governed by parametric partial differential equations. By varying boundary control, we mean that a specific parameter changes where the boundary control acts on the system. This peculiar formulation might benefit from model order reduction: fast and reliable simulations of this model can be extremely useful in many applied fields, such as geophysics and energy engineering. However, varying boundary control leads to very complicated and diverse parametric behaviour of the state and adjoint variables. For example, as the boundary control parameter changes, the state solution may exhibit transport phenomena. Moreover, the problem loses its affine structure. It is well known that classical model order reduction techniques fail in this setting, in terms of both accuracy and efficiency. Thus, we propose reduced approaches inspired by those used for wave-like phenomena. Specifically, we compare standard proper orthogonal decomposition with two tailored strategies: geometric recasting and local proper orthogonal decomposition. Geometric recasting solves the optimization system in a reference domain, which simplifies the problem at hand and avoids hyper-reduction, while local proper orthogonal decomposition builds local bases to increase the accuracy of the reduced solution in very general settings (where geometric recasting is unfeasible). We compare the various approaches on two numerical experiments based on geometries of increasing complexity.
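Standard and local proper orthogonal decomposition can be contrasted in a short NumPy sketch. Assumptions: snapshots are stored column-wise, modes are selected by an energy criterion, and local clusters are defined by parameter intervals (the source does not specify how the local bases are built; k-means is a common alternative). The Gaussian-pulse snapshots are a synthetic stand-in for a transport-dominated state, not data from the paper.

```python
import numpy as np


def pod_basis(snapshots, tol=1e-3):
    """POD basis from an SVD, keeping enough modes that the relative
    Frobenius projection error of the snapshots is at most `tol`."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = min(int(np.searchsorted(energy, 1.0 - tol**2)) + 1, len(s))
    return U[:, :r]


def local_pod(snapshots, params, edges, tol=1e-3):
    """Local POD: split snapshots by parameter interval, one basis per cluster."""
    labels = np.digitize(params, edges)
    bases = {k: pod_basis(snapshots[:, labels == k], tol) for k in np.unique(labels)}
    return bases, labels


# Transport-like snapshots: a pulse whose position is set by the parameter.
x = np.linspace(0.0, 1.0, 200)
params = np.linspace(0.2, 0.8, 60)
S = np.exp(-(((x[:, None] - params[None, :]) / 0.03) ** 2))
V = pod_basis(S)
bases, labels = local_pod(S, params, edges=np.array([0.4, 0.6]))
print(V.shape[1], [b.shape[1] for b in bases.values()])
```

The printed mode counts allow a direct comparison between the size of the global basis and the local ones at the same tolerance.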
In this paper, we study a radio access network (RAN) resource-slicing problem for energy-efficient communication in an orthogonal frequency division multiple access (OFDMA)-based millimeter-wave (mmWave) downlink (DL) network serving enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services. Specifically, assuming a fixed set of predefined beams, we address an energy efficiency (EE) maximization problem to obtain the optimal beam selection, resource block (RB) allocation, and transmit power allocation policy for serving URLLC and eMBB users on the same physical radio resources. The problem is formulated as a mixed-integer non-linear fractional programming (MINLFP) problem subject to minimum data rate and packet delivery latency constraints. By leveraging fractional programming theory, we first transform the non-convex problem from its fractional form into a tractable subtractive form. We then solve the transformed problem using a two-loop iterative algorithm. The inner loop solves the main resource-slicing problem using difference of convex (DC) programming and successive convex approximation (SCA) techniques, while the outer loop applies the Dinkelbach method, improving the solution at every iteration until convergence. Our simulation results illustrate the performance gains of the proposed methodology with respect to baseline algorithms with fixed and mixed resource grid models.
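The outer Dinkelbach loop can be illustrated on an assumed single-link toy problem, not the paper's MINLFP: maximize $\log_2(1+gp)/(p+P_c)$ over $0 \le p \le P_{\max}$. Each step solves the subtractive problem $\max_p \log_2(1+gp) - \lambda(p+P_c)$, which here has a closed form and replaces the DC/SCA inner loop. The function name `dinkelbach_power` is hypothetical.

```python
import numpy as np


def dinkelbach_power(gain, p_max, p_circuit, tol=1e-9, max_iter=100):
    """Maximize log2(1 + gain p) / (p + p_circuit) over 0 <= p <= p_max."""
    lam = 0.0  # current EE estimate (ratio parameter)
    p = p_max
    for _ in range(max_iter):
        # Inner subtractive problem: max_p log2(1 + gain p) - lam (p + p_circuit).
        # Setting the derivative gain / ((1 + gain p) ln 2) - lam to zero gives p.
        if lam > 0:
            p = float(np.clip(1.0 / (lam * np.log(2)) - 1.0 / gain, 0.0, p_max))
        else:
            p = p_max  # rate is increasing in p when lam = 0
        rate = np.log2(1.0 + gain * p)
        F = rate - lam * (p + p_circuit)  # Dinkelbach optimality gap
        lam = rate / (p + p_circuit)      # ratio update
        if abs(F) < tol:
            break
    return p, lam


p_opt, ee = dinkelbach_power(gain=10.0, p_max=2.0, p_circuit=0.5)
```

Since the gap $F(\lambda)$ is zero exactly at the optimal ratio, the loop stops once the subtractive problem can no longer improve on the current EE.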
In this paper, we study the problems of detecting and recovering hidden submatrices with elevated means inside a large Gaussian random matrix. We consider two different structures for the planted submatrices. In the first model, the planted submatrices are disjoint, and their row and column indices can be arbitrary. Inspired by scientific applications, the second model restricts the row and column indices to be consecutive. In the detection problem, under the null hypothesis, the observed matrix has independent and identically distributed standard normal entries. Under the alternative, a set of hidden submatrices with elevated means is planted inside the same standard normal matrix. Recovery refers to the task of locating the hidden submatrices. For both problems and both models, we characterize the statistical and computational barriers by deriving information-theoretic lower bounds, designing and analyzing algorithms that match those bounds, and proving computational lower bounds based on the low-degree polynomials conjecture. In particular, we show that the space of model parameters (i.e., the number of planted submatrices, their dimensions, and the elevated mean) can be partitioned into three regions: the impossible regime, where all algorithms fail; the hard regime, where detection or recovery is statistically possible but we give some evidence that no polynomial-time algorithm exists; and the easy regime, where polynomial-time algorithms exist.
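For the consecutive-index model, a natural polynomial-time detector and locator is a scan over all $k \times k$ windows of consecutive rows and columns. The sketch below assumes a single square planted block of known size $k$ (a simplification of the setting with several submatrices) and uses a 2D prefix sum so that each window sum costs $O(1)$; the name `scan_statistic` is hypothetical.

```python
import numpy as np


def scan_statistic(X, k):
    """Max over k x k windows of consecutive rows/columns of (window sum) / k.

    Returns the statistic and the top-left corner of the maximizing window.
    Under the null (i.i.d. N(0, 1) entries) each window sum / k is N(0, 1).
    """
    n_rows, n_cols = X.shape
    P = np.zeros((n_rows + 1, n_cols + 1))
    P[1:, 1:] = X.cumsum(axis=0).cumsum(axis=1)  # P[i, j] = X[:i, :j].sum()
    S = P[k:, k:] - P[:-k, k:] - P[k:, :-k] + P[:-k, :-k]
    r, c = np.unravel_index(np.argmax(S), S.shape)
    return S[r, c] / k, (int(r), int(c))


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 200))
X[50:60, 120:130] += 3.0  # planted block with elevated mean 3
stat, corner = scan_statistic(X, k=10)
```

Comparing `stat` with a threshold on the order of $\sqrt{2\log N_{\mathrm{win}}}$, the typical maximum of $N_{\mathrm{win}}$ standard normals, gives a detection test, and `corner` is the recovery estimate.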
Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism depends on the missing values themselves even conditioned on the observed data. Here, we consider an MNAR model that generalizes several popular prior MNAR models in two ways: first, it imposes less restrictive statistical independence assumptions on the underlying joint data distribution, and second, it allows all variables in the observed sample to have missing values. This MNAR model corresponds to a so-called criss-cross structure, considered in the literature on graphical models of missing data, which prevents nonparametric identification of the entire missing data model. Nonetheless, part of the complete-data distribution remains nonparametrically identifiable. By exploiting this fact and considering a rich class of exponential family distributions, we establish sufficient conditions for identification of the complete-data distribution as well as the entire missingness mechanism. We then propose methods for testing the independence restrictions encoded in such models, using the odds ratio as our parameter of interest. We adopt two semiparametric approaches for estimating the odds ratio parameter and establish the corresponding asymptotic theory: one maximizes a conditional likelihood with order statistics, and the other uses estimating equations. The utility of our methods is illustrated via simulation studies.
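Why the odds ratio is a natural parameter for independence restrictions can be seen in the simplest case of two binary, fully observed variables, where independence holds exactly when the odds ratio equals 1. The sketch below is only that complete-data illustration with a standard Wald test; it does not implement the conditional-likelihood or estimating-equation estimators described above, which are what make such a test valid under MNAR missingness. The function name is hypothetical.

```python
import numpy as np


def log_odds_ratio_test(table):
    """Log odds ratio and Wald z-statistic for H0: OR = 1 in a 2 x 2 table.

    table[i][j] counts units with X = i and Y = j (complete data, all counts > 0).
    """
    t = np.asarray(table, dtype=float)
    log_or = np.log(t[1, 1]) + np.log(t[0, 0]) - np.log(t[0, 1]) - np.log(t[1, 0])
    se = np.sqrt(np.sum(1.0 / t))  # large-sample standard error of log OR
    return log_or, log_or / se


print(log_odds_ratio_test([[20, 30], [40, 60]]))  # log OR ~ 0: no association
print(log_odds_ratio_test([[30, 10], [10, 30]]))  # log OR = log 9, z well above 1.96
```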
In this paper, we study the estimation of the derivative of a regression function in a standard univariate regression model. The estimators are defined either by differentiating nonparametric least-squares estimators of the regression function or by estimating the projection of the derivative. We prove two simple risk bounds that allow us to compare our estimators. More elaborate bounds under a stability assumption are then provided. The bases and spaces on which we illustrate our assumptions and first results are of both compact and non-compact type, and we discuss the rates reached by our estimators. These rates turn out to be optimal in the compact case. Lastly, we propose a model selection procedure and prove the associated risk bound. Considering bases with non-compact support makes the problem difficult.
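The first family of estimators (differentiate a least-squares fit) can be sketched with an assumed trigonometric basis on $[0,1]$, a compact-support example. The dimension `m` would be chosen by the model selection procedure, which is not shown; `trig_design` and `derivative_estimate` are hypothetical names.

```python
import numpy as np


def trig_design(x, m):
    """Trigonometric basis on [0, 1] (assumed) and its derivative, column by column."""
    cols, dcols = [np.ones_like(x)], [np.zeros_like(x)]
    for j in range(1, m + 1):
        w = 2.0 * np.pi * j
        cols += [np.sqrt(2) * np.cos(w * x), np.sqrt(2) * np.sin(w * x)]
        dcols += [-np.sqrt(2) * w * np.sin(w * x), np.sqrt(2) * w * np.cos(w * x)]
    return np.column_stack(cols), np.column_stack(dcols)


def derivative_estimate(x, y, m, x_eval):
    """Least-squares fit in the basis, then differentiate the fitted expansion."""
    Phi, _ = trig_design(x, m)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    _, dPhi = trig_design(x_eval, m)
    return dPhi @ coef


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)
f = lambda z: np.sin(2 * np.pi * z) + 0.5 * np.cos(4 * np.pi * z)
y = f(x) + 0.1 * rng.standard_normal(x.size)
grid = np.linspace(0.0, 1.0, 101)
d_hat = derivative_estimate(x, y, m=3, x_eval=grid)
```

With noisy data, differentiation multiplies the $j$-th coefficient pair by $2\pi j$, so noise in high-frequency coefficients is amplified; this is one reason the choice of `m` matters.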
The Tucker tensor decomposition is a natural extension of the singular value decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor decomposition algorithms by using randomization and parallelization. We present two algorithms that scale to large data and many processors, significantly reduce both computation and communication costs compared to previous deterministic and randomized approaches, and obtain nearly the same approximation errors. The key idea in our algorithms is to perform randomized sketches with Kronecker-structured random matrices, which reduces computation compared to unstructured matrices and can be implemented using a fundamental tensor computational kernel. We provide a probabilistic error analysis of our algorithms and implement a new parallel algorithm for the structured randomized sketch. Our experimental results demonstrate that our combination of randomization and parallelization achieves accurate Tucker decompositions much faster than alternative approaches. We observe up to a 16X speedup over the fastest deterministic parallel implementation on 3D simulation data.
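The Kronecker-structured sketch can be illustrated with a simplified, sequential, HOSVD-style variant (an assumption; it is not the parallel algorithm described above). For mode $n$, multiplying the tensor by a small Gaussian matrix in every other mode is equivalent to right-multiplying the mode-$n$ unfolding by the transpose of a Kronecker product of those matrices, but it is computed with tensor-times-matrix (TTM) products instead of forming the large random matrix. Function names and the `oversample` parameter are hypothetical.

```python
import numpy as np


def mode_product(T, M, mode):
    """Tensor-times-matrix: multiply every mode-`mode` fiber of T by M."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by mode `mode`."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def randomized_tucker(X, ranks, oversample=5, seed=None):
    rng = np.random.default_rng(seed)
    factors = []
    for n in range(X.ndim):
        Y = X
        for m in range(X.ndim):
            if m != n:
                # One small Gaussian matrix per mode: together they act as a
                # Kronecker-structured sketch of the mode-n unfolding.
                Omega = rng.standard_normal((ranks[m] + oversample, X.shape[m]))
                Y = mode_product(Y, Omega, m)
        U, _, _ = np.linalg.svd(unfold(Y, n), full_matrices=False)
        factors.append(U[:, : ranks[n]])  # leading left singular vectors
    core = X
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)  # project onto factor spans
    return core, factors


def reconstruct(core, factors):
    X_hat = core
    for n, U in enumerate(factors):
        X_hat = mode_product(X_hat, U, n)
    return X_hat


rng = np.random.default_rng(0)
G = rng.standard_normal((3, 4, 2))
X = reconstruct(G, [rng.standard_normal((s, r)) for s, r in [(20, 3), (25, 4), (15, 2)]])
core, factors = randomized_tucker(X, ranks=(3, 4, 2), seed=1)
rel_err = np.linalg.norm(reconstruct(core, factors) - X) / np.linalg.norm(X)
```

For an exactly low-rank tensor, the sketched unfolding almost surely spans the same column space as the full unfolding, so the reconstruction error is at round-off level.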
Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). Specifically, we formulate a novel task, Corr2Cause, which takes a set of correlational statements and requires determining the causal relationship between the variables. We curate a large-scale dataset of more than 400K samples, on which we evaluate seventeen existing LLMs. Through our experiments, we identify a key shortcoming of LLMs in terms of their causal inference skills and show that these models perform close to random on the task. This shortcoming is somewhat mitigated when we repurpose LLMs for this skill via fine-tuning, but we find that these models still fail to generalize: they can only perform causal inference in in-distribution settings, when the variable names and textual expressions used in the queries are similar to those in the training set, and they fail in out-of-distribution settings generated by perturbing these queries. Corr2Cause is a challenging task for LLMs and should help guide future research on improving LLMs' pure reasoning skills and generalizability. Our data is at //huggingface.co/datasets/causalnlp/corr2cause. Our code is at //github.com/causalNLP/corr2cause.
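The kind of inference the task targets, going from (conditional) correlation statements to causal conclusions, can be illustrated with the first two steps of the textbook PC algorithm: skeleton discovery and collider orientation, driven by an independence oracle built from the given statements. This is an illustration under a faithfulness assumption, not the authors' data-generation pipeline; `independent` and the oracles below are hypothetical.

```python
from itertools import combinations


def skeleton_and_colliders(variables, independent):
    """PC-style skeleton search plus v-structure (collider) detection.

    independent(x, y, cond) -> bool answers "x is independent of y given cond",
    i.e. it encodes the correlational statements; faithfulness is assumed.
    """
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    for size in range(len(variables) - 1):
        for x, y in combinations(variables, 2):
            if y not in adj[x]:
                continue
            candidates = sorted((adj[x] | adj[y]) - {x, y})
            for cond in combinations(candidates, size):
                if independent(x, y, frozenset(cond)):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = set(cond)
                    break
    colliders = set()
    for z in variables:
        for x, y in combinations(sorted(adj[z]), 2):
            # x - z - y is unshielded and z is not in sepset(x, y): x -> z <- y
            if y not in adj[x] and z not in sepset.get(frozenset((x, y)), set()):
                colliders.add((x, z, y))
    return adj, colliders


# "A and C are uncorrelated, but become correlated once B is controlled for."
collider_oracle = lambda x, y, cond: {x, y} == {"A", "C"} and "B" not in cond
# "A and C are correlated, but independent given B."
chain_oracle = lambda x, y, cond: {x, y} == {"A", "C"} and "B" in cond
print(skeleton_and_colliders(["A", "B", "C"], collider_oracle)[1])  # {('A', 'B', 'C')}
print(skeleton_and_colliders(["A", "B", "C"], chain_oracle)[1])     # set()
```

The two oracles share the same skeleton (A - B - C) yet license different causal conclusions, which is the distinction a model must draw from the statements alone.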
The ability to answer counterfactual "what if" questions is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes that a structural causal model is available. However, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without a given causal model or even direct estimation of conditional distributions. We recast counterfactual reasoning as an extended quantile regression problem solved with neural networks. The approach is statistically more efficient than existing ones, and it further makes it possible to study how well the estimated counterfactual outcomes generalize to unseen data and to provide an upper bound on the generalization error. Experimental results on multiple datasets strongly support our theoretical claims.
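The quantile view of counterfactuals can be sketched without covariates or neural networks (both simplifications). Quantile regression then reduces to minimizing the pinball loss over a constant, and a counterfactual is obtained by finding the quantile level of the observed outcome under the factual treatment and reading off the same level under the counterfactual one. This quantile matching relies on an assumed model in which the outcome is strictly increasing in a single noise term, such as the synthetic $Y = 2T + U$ below; all function names are hypothetical.

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Quantile (pinball) loss at level tau, averaged over y."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))


def fit_quantile(y, tau):
    """Constant-model quantile regression: the piecewise-linear pinball
    objective attains its minimum at one of the data points."""
    candidates = np.sort(y)
    losses = [pinball_loss(y, c, tau) for c in candidates]
    return candidates[int(np.argmin(losses))]


def counterfactual(y_obs, t_obs, t_cf, y, t):
    """Quantile matching: locate y_obs within its own arm, then read off
    the same quantile level in the counterfactual arm."""
    tau = np.mean(y[t == t_obs] <= y_obs)  # quantile level of the factual outcome
    return fit_quantile(y[t == t_cf], tau)


rng = np.random.default_rng(0)
t = rng.integers(0, 2, 6000)
y = 2.0 * t + rng.standard_normal(t.size)  # Y = 2T + U: treatment adds 2
print(counterfactual(0.5, t_obs=0, t_cf=1, y=y, t=t))  # close to 2.5
```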