As an established bandwidth-efficient coded modulation technique, bit-interleaved coded modulation (BICM) can achieve highly desirable error performance with relatively low implementation complexity for a large number of communication and storage systems, and it has attracted considerable attention from the research community over the past three decades. BICM is able to approach the Shannon capacity limits of various channels through the use of powerful forward-error-correction (FEC) codes, bit mappers (i.e., interleavers), and high-order modulations. Exploiting the natural serially-concatenated structure of BICM, iterative demapping and decoding (ID) can be adopted to boost system performance. Owing to their tremendous error-correction capability and simple structure, protograph low-density parity-check (PLDPC) codes and their spatially-coupled (SC) variants have emerged as a pragmatic and promising FEC solution for BICM systems, and in recent years have found widespread application in deep-space communication, satellite communication, wireless communication, optical communication, and flash-memory-based data storage. This article offers a comprehensive survey of the state-of-the-art development of PLDPC-coded BICM and its innovative SC variants over a variety of channel models, e.g., additive white Gaussian noise (AWGN) channels, fading channels, Poisson pulse position modulation (PPM) channels, and NAND flash-memory channels. Of particular interest are code construction, constellation shaping, and bit-mapper design, where the receiver is formulated as a serially-concatenated decoding framework consisting of a soft-decision demapper and a belief-propagation decoder. In addition, several promising research directions that have not been adequately addressed in the current literature are discussed.
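To make the serially-concatenated receiver structure concrete, the following is a minimal, self-contained sketch of the soft-decision demapping step for a Gray-mapped 4-QAM constellation over an AWGN channel; the constellation, bit labeling, and noise model are illustrative assumptions and are not tied to any particular scheme surveyed here. The resulting bit LLRs are what a belief-propagation PLDPC decoder (and, in an ID receiver, the feedback loop) would consume.

```python
# A minimal sketch (not any surveyed scheme's implementation) of the
# soft-decision demapper in a BICM receiver: exact per-bit LLRs for a
# Gray-mapped 4-QAM constellation over an AWGN channel.  Constellation,
# bit labeling, and noise variance are illustrative assumptions.
import numpy as np

def soft_demap(y, constellation, labels, noise_var):
    """Return per-bit LLRs (positive favours bit 0) for each received sample."""
    bits_per_sym = labels.shape[1]
    llrs = np.empty((len(y), bits_per_sym))
    d2 = np.abs(y[:, None] - constellation[None, :]) ** 2   # squared distances
    metric = np.exp(-d2 / noise_var)                         # symbol likelihoods
    for b in range(bits_per_sym):
        p0 = metric[:, labels[:, b] == 0].sum(axis=1)
        p1 = metric[:, labels[:, b] == 1].sum(axis=1)
        llrs[:, b] = np.log(p0) - np.log(p1)
    return llrs

# Gray-mapped 4-QAM: unit-energy symbols and their 2-bit labels.
const = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
rng = np.random.default_rng(0)
rx = const[[0, 3, 1]] + 0.1 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
print(soft_demap(rx, const, labels, noise_var=0.02))
```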
The accurate estimation of Channel State Information (CSI) is of crucial importance for the successful operation of Multiple-Input Multiple-Output (MIMO) communication systems, especially in a Multi-User (MU) time-varying environment and when employing the emerging technology of Reconfigurable Intelligent Surfaces (RISs). The predominantly passive nature of RISs renders the estimation of the channels involved in the user-RIS-base-station link a quite challenging problem. Moreover, the time-varying nature of most realistic wireless channels drives up the cost of real-time channel tracking significantly, especially when RISs of massive size are deployed. In this paper, we develop a channel tracking scheme for the uplink of RIS-enabled MU MIMO systems in the presence of channel fading. The starting point is a tensor representation of the received signal, whose PARAllel FACtor (PARAFAC) analysis we use both to obtain the initial channel estimates and to track their time variation. Simulation results for various system settings are reported, which validate the feasibility and effectiveness of the proposed channel tracking approach.
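As a rough illustration of the PARAFAC machinery underlying such schemes, the sketch below fits a CP (PARAFAC) decomposition to a synthetic three-way tensor by alternating least squares; the real-valued data (actual channels are complex), tensor dimensions, rank, and random initialization are illustrative assumptions, and the code is not the paper's tracking algorithm.

```python
# A minimal sketch of PARAFAC (CP) decomposition by alternating least squares
# on a 3-way tensor, as a stand-in for the estimation step described above.
# Dimensions, rank, and real-valued factors are illustrative assumptions.
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def unfold(X, mode):
    """Mode-n matricization of a 3-way array."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_als(X, rank, n_iter=200, seed=0):
    """Estimate factors A, B, C with X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, rank)) for dim in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Synthetic rank-3 tensor; the three factors play the roles of the user-RIS,
# RIS-BS, and RIS-phase-profile matrices in this illustrative model.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((d, 3)) for d in (8, 16, 10))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = parafac_als(X, rank=3)
approx = np.einsum('ir,jr,kr->ijk', A, B, C)
print("relative fit error:", np.linalg.norm(X - approx) / np.linalg.norm(X))
```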
In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees on the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non-i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and will hopefully pave the way for further developments in statistical learning based on spatial data.
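For concreteness, a minimal sketch of the simple Kriging predictive rule under a plugged-in covariance model is given below: for a zero-mean field with covariance $k$, the prediction at a new site $s_0$ is $\lambda^\top X$ with $\lambda = K^{-1} k(s_0)$. The isotropic exponential covariance and its parameters are illustrative assumptions standing in for the estimated covariance used by a plug-in rule.

```python
# A minimal sketch of simple Kriging with a (plug-in) isotropic covariance:
# prediction at s0 is lambda^T X with lambda = K^{-1} k(s0).  The exponential
# covariance model and its parameters are illustrative assumptions.
import numpy as np

def exp_cov(h, sigma2=1.0, rho=0.5):
    """Isotropic exponential covariance as a function of distance h."""
    return sigma2 * np.exp(-h / rho)

def simple_kriging(sites, values, s0, cov=exp_cov):
    """Predict X at s0 from observations (sites, values) of a zero-mean field."""
    d_obs = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    d_new = np.linalg.norm(sites - s0, axis=-1)
    K = cov(d_obs)                       # n x n covariance of the observations
    k0 = cov(d_new)                      # covariances between s0 and the sites
    weights = np.linalg.solve(K, k0)     # simple Kriging weights
    pred = weights @ values
    var = cov(0.0) - weights @ k0        # Kriging (prediction) variance
    return pred, var

# Observations on a small regular grid in [0,1]^2, mimicking the setting above.
xs = np.linspace(0.0, 1.0, 5)
sites = np.array([(x, y) for x in xs for y in xs])
rng = np.random.default_rng(0)
values = rng.multivariate_normal(
    np.zeros(len(sites)),
    exp_cov(np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)))
print(simple_kriging(sites, values, s0=np.array([0.32, 0.57])))
```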
This work proposes a framework for optimizing machine learning algorithms. The practicality of the framework is illustrated using an important case study from the healthcare domain: predicting the admission status of emergency department (ED) patients (i.e., admitted vs. discharged) using patient data available at the time of triage. The proposed framework can help mitigate the ED crowding problem by enabling proactive planning of the patient boarding process. A large retrospective dataset of patient records is obtained from the electronic health record database of all ED visits over three years from three major locations of a healthcare provider in the Midwest of the US. Three machine learning algorithms are proposed: T-XGB, T-ADAB, and T-MLP. T-XGB integrates extreme gradient boosting (XGB) and Tabu Search (TS), T-ADAB integrates AdaBoost and TS, and T-MLP integrates multi-layer perceptron (MLP) and TS. The proposed algorithms are compared with the traditional algorithms XGB, ADAB, and MLP, whose parameters are tuned using grid search. The three proposed algorithms and the traditional ones are trained and tested using nine data groups obtained from different feature selection methods; in other words, 54 models are developed. Performance is evaluated using five measures: area under the curve (AUC), sensitivity, specificity, F1, and accuracy. The results show that the newly proposed algorithms achieve high AUC and outperform the traditional algorithms, with T-ADAB performing best among the newly developed ones. The AUC, sensitivity, specificity, F1, and accuracy of the best model are 95.4%, 99.3%, 91.4%, 95.2%, and 97.2%, respectively.
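To illustrate the kind of integration the T-* algorithms perform, the sketch below runs a short Tabu Search over a discrete hyperparameter grid, scoring each candidate by cross-validated AUC; the grid, neighbourhood, tabu-list length, synthetic data, and the use of scikit-learn's gradient boosting classifier (in place of XGB, AdaBoost, or MLP) are illustrative assumptions rather than the paper's exact procedure.

```python
# A minimal sketch of the idea behind the T-* algorithms: Tabu Search over a
# discrete hyperparameter grid, scoring each candidate by cross-validated AUC.
# The grid, neighbourhood, tabu-list length, and scikit-learn's gradient
# boosting (in place of XGB/AdaBoost/MLP) are illustrative assumptions.
import random
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
grid = {"n_estimators": [50, 100, 150],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1, 0.2]}
keys = list(grid)
candidates = [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]

def auc(params):
    """Cross-validated AUC of a gradient boosting model with these parameters."""
    model = GradientBoostingClassifier(random_state=0, **params)
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

def neighbours(params):
    """Candidates differing from `params` in exactly one hyperparameter."""
    return [c for c in candidates if sum(c[k] != params[k] for k in keys) == 1]

random.seed(0)
current = best = random.choice(candidates)
best_score, tabu = auc(best), []
for _ in range(5):                                   # a short Tabu Search run
    moves = [c for c in neighbours(current) if c not in tabu] or neighbours(current)
    current = max(moves, key=auc)                    # best non-tabu neighbour
    tabu = (tabu + [current])[-5:]                   # fixed-length tabu list
    score = auc(current)
    if score > best_score:
        best, best_score = current, score
print(best, round(best_score, 4))
```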
In order to characterize, in a quantitative way, the fluctuation between the ergodic limit and the time-averaging estimator of a full discretization, we establish a central limit theorem for the full discretization of the parabolic stochastic partial differential equation. The theorem shows that the normalized time-averaging estimator converges to a normal distribution with the same variance as in the continuous case, where the scale used for the normalization corresponds to the temporal strong convergence order of the considered full discretization. A key ingredient in the proof is to extract an appropriate sum of a martingale difference sequence from the normalized time-averaging estimator, so that the convergence to the normal distribution of this sum and the convergence to zero in probability of the remainder are well balanced. The main novelty of our method for balancing these convergences lies in proposing an appropriately modified Poisson equation that possesses space-independent regularity estimates. As a byproduct, the full discretization is shown to fulfill the weak law of large numbers, namely, the time-averaging estimator converges to the ergodic limit in probability.
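In schematic notation (ours, not the paper's), the two objects involved are the time-averaging estimator built from the fully discrete numerical solution $X_k^{\tau,h}$ and a limit theorem of the form
\[
  \widehat{\pi}_N(\phi) \;=\; \frac{1}{N}\sum_{k=1}^{N}\phi\bigl(X_k^{\tau,h}\bigr),
  \qquad
  \alpha_{N,\tau}^{-1}\,\bigl(\widehat{\pi}_N(\phi)-\pi(\phi)\bigr) \;\xrightarrow{\,d\,}\; \mathcal{N}\bigl(0,\,v^{2}\bigr),
\]
where $\pi(\phi)$ denotes the ergodic limit, $v^{2}$ coincides with the asymptotic variance of the continuous-time average, and the normalization $\alpha_{N,\tau}$ stands for the scale tied to the temporal strong convergence order of the full discretization.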
We propose GNN-Surrogate, a graph neural network-based surrogate model to explore the parameter space of ocean climate simulations. Parameter space exploration is important for domain scientists to understand the influence of input parameters (e.g., wind stress) on the simulation output (e.g., temperature). Such exploration requires scientists to exhaustively sample the complicated parameter space by running batches of computationally expensive simulations. Our approach improves the efficiency of parameter space exploration with a surrogate model that predicts the simulation outputs accurately and efficiently. Specifically, GNN-Surrogate predicts the output field for given simulation parameters, so scientists can explore the simulation parameter space with visualizations generated from user-specified visual mappings. Moreover, our graph-based techniques are designed for unstructured meshes, making the exploration of simulation outputs on irregular grids efficient. For efficient training, we generate hierarchical graphs and use adaptive resolutions. We give quantitative and qualitative evaluations on the MPAS-Ocean simulation to demonstrate the effectiveness and efficiency of GNN-Surrogate. Source code is publicly available at //github.com/trainsn/GNN-Surrogate.
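As a rough, untrained illustration of this kind of surrogate evaluation, the sketch below broadcasts simulation parameters to every node of a toy unstructured mesh and applies two rounds of mean-aggregation message passing to produce a per-node output field; the two-layer architecture, toy mesh, and random weights are illustrative assumptions, not GNN-Surrogate's actual model.

```python
# A minimal sketch of graph-based surrogate evaluation: simulation parameters
# are broadcast to every mesh node and mean-aggregation message passing
# produces a per-node output field.  Architecture, mesh, and random weights
# are illustrative assumptions, not the paper's trained model.
import numpy as np

def gnn_forward(adj, node_feats, params, weights):
    """One surrogate evaluation: params -> predicted field on the mesh nodes."""
    deg = adj.sum(axis=1, keepdims=True)
    h = np.concatenate([node_feats, np.tile(params, (len(node_feats), 1))], axis=1)
    for W in weights[:-1]:
        h = np.maximum(0.0, (adj @ h) / deg @ W)   # mean aggregation + ReLU
    return (adj @ h) / deg @ weights[-1]           # linear readout per node

# Toy unstructured mesh: 6 nodes, symmetric adjacency with self-loops.
rng = np.random.default_rng(0)
adj = np.eye(6)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]:
    adj[i, j] = adj[j, i] = 1.0
node_feats = rng.standard_normal((6, 3))           # e.g. node coordinates
params = np.array([0.7, 1.3])                      # e.g. wind stress factors
weights = [rng.standard_normal((5, 8)), rng.standard_normal((8, 1))]
print(gnn_forward(adj, node_feats, params, weights).ravel())
```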
The application of code clone technology accelerates code search, improves code reuse efficiency, and assists in software quality assessment and code vulnerability detection. However, code clones also introduce software quality issues and increase the cost of software maintenance. As an important research field in software engineering, code clones have been extensively explored and studied, and related work has emerged in various sub-research fields, including code clone detection, code clone evolution, and code clone analysis. However, a comprehensive exploration of the entire field of code clones, as well as an analysis of the trends in each sub-research field, is still lacking. This paper collects work related to code clones from the past ten years. In summary, the contributions of this paper mainly include: (1) summarizing and classifying the sub-research fields of code clones, and exploring the relative popularity of and relations among these sub-research fields; (2) analyzing the overall research trend of code clones and of each sub-research field; (3) comparing and analyzing the differences between academia and industry regarding code clone research; (4) constructing a network of researchers and identifying the major contributors to the code clone research field; (5) statistically analyzing the popular conferences and journals. Promising future research directions include clone visualization and clone management, among others. For clone detection techniques, researchers can optimize the scalability and execution efficiency of their methods, target particular clone detection tasks and contextual environments, or continue to apply the technology to other related research fields.
Background: Increasingly, decision-making in healthcare relies on computer models, be it clinical prediction models at the point of care or decision-analytic models at the policymaking level. Given the important role models play in both contexts, their structure and implementation must be rigorously scrutinized. The ability to interrogate input/output associations without facing barriers can improve quality assurance mechanisms while satisfying privacy/confidentiality concerns and facilitating the integration of models into decision-making. This paper reports on the development of the Programmable Interface for Statistical & Simulation Models (PRISM), a cloud-based platform for model accessibility. Methods: PRISM emphasizes two main principles: 1) minimal specifications on the side of the model developer to make the model fit for cloud hosting, and 2) making client access completely independent of the resource requirements and software dependencies of the model. The server architecture integrates a RESTful Application Programming Interface (API) infrastructure, JSON for data transfer, a routing layer for access management, container technology for the management of computational resources and package dependencies, and the capacity for synchronous or asynchronous model calls. Results: We discuss the architecture, the minimal API standards that enable a universal language for access to such models, the underlying server infrastructure, and the standards used for data transfer. An instance of PRISM is available as a service via the Peer Models Network //peermodelsnetwork.com. Through a series of case studies, we demonstrate how interrogating models becomes possible in a standardized fashion that is independent of the specifics of any model. Conclusions: We have developed a publicly accessible platform and minimalist standards that facilitate model accessibility for both clinical and policy models.
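To give a flavour of such standardized access, the sketch below POSTs a JSON payload to a hypothetical PRISM-style endpoint and prints the JSON response; the URL, payload fields, and response shape are assumptions for illustration and do not reflect the platform's documented API.

```python
# A minimal sketch of what a client-side call to a cloud-hosted model could
# look like: a JSON payload POSTed to a RESTful endpoint, independent of the
# model's own language and dependencies.  The URL path, payload fields, and
# response shape are hypothetical, not the platform's documented API.
import json
import requests

BASE_URL = "https://example-prism-host.org/api"          # hypothetical host
payload = {
    "model": "example_model",                             # hypothetical model id
    "func": "predict",
    "args": {"age": 67, "fev1": 1.8},                     # hypothetical inputs
}

resp = requests.post(f"{BASE_URL}/run", json=payload, timeout=30)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))                  # model output as JSON
```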
We initiate the study of the parameterized complexity of $\textsf{QMA}$ problems in terms of the number of non-Clifford gates in the problem description. We show that for the parameterized quantum circuit satisfiability problem, there exists a classical algorithm whose runtime scales exponentially in the number of non-Clifford gates but only polynomially with the system size. This result follows from our main result: for any satisfiability problem on a Clifford + $t$ $T$-gate quantum circuit, the search space of optimal witnesses can be reduced to a stabilizer subspace isomorphic to at most $t$ qubits (independent of the system size). Furthermore, we derive new lower bounds on the $T$-count of circuit satisfiability instances and the $T$-count of the $W$-state, assuming the classical exponential time hypothesis ($\textsf{ETH}$). Lastly, we explore the parameterized complexity of the quantum non-identity check problem.
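Schematically, and in notation not used by the abstract itself, the two headline statements read
\[
  T(n,t) \;=\; \mathrm{poly}(n)\cdot 2^{O(t)},
  \qquad
  \dim\bigl(\mathcal{W}_{\mathrm{opt}}\bigr) \;\le\; 2^{\,t},
\]
where $n$ is the system size, $t$ the number of non-Clifford ($T$) gates, $T(n,t)$ the runtime of the classical satisfiability algorithm, and $\mathcal{W}_{\mathrm{opt}}$ the stabilizer subspace to which the search for optimal witnesses can be restricted.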
Given a way to evaluate an unknown polynomial with integer coefficients, we present new algorithms to recover its nonzero coefficients and corresponding exponents. As an application, we adapt this interpolation algorithm to the problem of computing the exact quotient of two given polynomials. These methods are efficient in terms of the bit-length of the sparse representation, that is, the number of nonzero terms, the size of coefficients, the number of variables, and the logarithm of the degree. At the core of our results is a new Monte Carlo randomized algorithm to recover an integer polynomial $f(x)$ given a way to evaluate $f(\theta) \bmod m$ for any chosen integers $\theta$ and $m$. This algorithm has nearly-optimal bit complexity, meaning that the total bit-length of the probes, as well as the computational running time, is softly linear (ignoring logarithmic factors) in the bit-length of the resulting sparse polynomial. To our knowledge, this is the first sparse interpolation algorithm with soft-linear bit complexity in the total output size. For integer polynomials, the best previously known results have at least a cubic dependency on the bit-length of the exponents.
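To illustrate the modular black-box model, the sketch below exposes a probe for $f(\theta) \bmod m$ and recovers a polynomial by evaluating at a single large base and reading off balanced base-$B$ digits; this naive recovery is dense in the degree and only illustrates the probe interface, not the paper's nearly-optimal sparse algorithm.

```python
# A minimal sketch of the modular black-box model: the algorithm may request
# f(theta) mod m for integers theta, m of its choice.  The naive recovery
# below (one probe at a large base, then balanced base-B digits) is dense in
# the degree and only illustrates the probe interface, not the paper's
# nearly-optimal sparse interpolation algorithm.
def make_blackbox(coeffs):
    """Black box for f(x) = sum_e coeffs[e] * x**e, probed modulo m."""
    def probe(theta, m):
        return sum(c * pow(theta, e, m) for e, c in coeffs.items()) % m
    return probe

def recover_dense(probe, degree, coeff_bound):
    """Recover all coefficients of f with |c| < coeff_bound from one probe."""
    B = 2 * coeff_bound + 1                 # base large enough to separate digits
    value = probe(B, B ** (degree + 1))     # equals f(B) exactly (here f(B) >= 0)
    coeffs = {}
    for e in range(degree + 1):
        digit = value % B
        if digit > coeff_bound:             # map back to a balanced (signed) digit
            digit -= B
        if digit:
            coeffs[e] = digit
        value = (value - digit) // B
    return coeffs

f = {0: -7, 3: 5, 100: 2}                   # sparse f(x) = -7 + 5x^3 + 2x^100
print(recover_dense(make_blackbox(f), degree=100, coeff_bound=10))
```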
We are interested in computing the treewidth $\mathrm{tw}(G)$ of a given graph $G$. Our approach is to design heuristic algorithms for computing a sequence of improving upper bounds and a sequence of improving lower bounds, which would hopefully converge to $\mathrm{tw}(G)$ from both sides. The upper bound algorithm extends and simplifies Tamaki's unpublished work on a heuristic use of the dynamic programming algorithm for deciding treewidth due to Bouchitt\'{e} and Todinca. The lower bound algorithm is based on the well-known fact that, for every minor $H$ of $G$, we have $\mathrm{tw}(H) \leq \mathrm{tw}(G)$. Starting from a greedily computed minor $H_0$ of $G$, the algorithm tries to construct a sequence of minors $H_0, H_1, \ldots, H_k$ with $\mathrm{tw}(H_i) < \mathrm{tw}(H_{i + 1})$ for $0 \leq i < k$ and, hopefully, $\mathrm{tw}(H_k) = \mathrm{tw}(G)$. We have implemented a treewidth solver based on this approach and have evaluated it on the bonus instances from the exact treewidth track of the PACE 2017 algorithm implementation challenge. The results show that our approach is extremely effective in tackling instances that are hard for conventional solvers. Our solver has an additional advantage over conventional ones in that it attaches a compact certificate to the lower bound it computes.
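A minimal sketch of the sandwiching idea is given below: a heuristic elimination-ordering upper bound (networkx's min-fill-in heuristic, standing in for the Tamaki/Bouchitté-Todinca machinery) and a minor-based lower bound obtained by repeatedly contracting a minimum-degree vertex into its lowest-degree neighbour, so that every recorded minimum degree is the minimum degree of some minor of $G$ and hence a lower bound on $\mathrm{tw}(G)$.

```python
# A minimal sketch (not the paper's solver) of the bound-sandwiching idea:
# an elimination-ordering upper bound from networkx's min-fill-in heuristic,
# and a minor-based lower bound obtained by greedily contracting a
# minimum-degree vertex into its lowest-degree neighbour, so that every
# recorded minimum degree is a lower bound on tw(G).
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_fill_in

def minor_lower_bound(G):
    """Maximum, over the constructed minors, of their minimum degree (<= tw(G))."""
    H, best = G.copy(), 0
    while H.number_of_nodes() > 1:
        v = min(H.nodes, key=H.degree)
        if H.degree(v) == 0:                       # drop isolated vertices
            H.remove_node(v)
            continue
        best = max(best, H.degree(v))
        u = min(H.neighbors(v), key=H.degree)      # contract v into u
        H = nx.contracted_nodes(H, u, v, self_loops=False)
    return best

G = nx.random_regular_graph(4, 30, seed=0)
ub, _decomposition = treewidth_min_fill_in(G)      # heuristic upper bound
lb = minor_lower_bound(G)                          # minor-based lower bound
print(f"{lb} <= tw(G) <= {ub}")
```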