In conventional joint communications and sensing (JCAS) designs for multi-carrier multiple-input multiple-output (MIMO) systems, the dual-functional waveforms are often optimized for the whole frequency band, resulting in limited communications--sensing performance tradeoff. To overcome the limitation, we propose employing a subset of subcarriers for JCAS, while the communications function is performed over all the subcarriers. This offers more degrees of freedom to enhance the communications performance under a given sensing accuracy. We first formulate the rate maximization under the sensing accuracy constraint to optimize the beamformers and JCAS subcarriers. The problem is solved via Riemannian manifold optimization and closed-form solutions. Numerical results for an 8x4 MIMO system with 64 subcarriers show that compared to the conventional subcarrier sharing scheme, the proposed scheme employing 16 JCAS subcarriers offers 60% improvement in the achievable communications rate at the signal-to-noise ratio of 10 dB. Meanwhile, this scheme generates the sensing beampattern with the same quality as the conventional JCAS design.
Individuals with complex communication needs (CCN) often rely on augmentative and alternative communication (AAC) systems to have conversations and communique their wants. Such systems allow message authoring by arranging pictograms in sequence. However, the difficulty of finding the desired item to complete a sentence can increase as the user's vocabulary increases. This paper proposes using BERTimbau, a Brazilian Portuguese version of BERT, for pictogram prediction in AAC systems. To finetune BERTimbau, we constructed an AAC corpus for Brazilian Portuguese to use as a training corpus. We tested different approaches to representing a pictogram for prediction: as a word (using pictogram captions), as a concept (using a dictionary definition), and as a set of synonyms (using related terms). We also evaluated the usage of images for pictogram prediction. The results demonstrate that using embeddings computed from the pictograms' caption, synonyms, or definitions have a similar performance. Using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. This paper provides insight into how to represent a pictogram for prediction using a BERT-like model and the potential of using images for pictogram prediction.
Delay alignment modulation (DAM) is a promising technology to achieve ISI-free wideband communication, by leveraging delay compensation and path-based beamforming, rather than the conventional channel equalization or multi-carrier transmission. In particular, when there exist a few strong time-dispersive channel paths, DAM can effectively align different propagation delays and achieve their constructive superposition, thus especially appealing for intelligent reflecting surfaces (IRSs)-aided communications with controllable multi-paths. In this paper, we apply DAM to multi-IRS aided wideband communication and study its practical design and achievable performance. We first provide an asymptotic analysis showing that when the number of base station (BS) antennas is much larger than that of IRSs, an ISI-free channel can be established with appropriate delay pre-compensation and the simple path-based MRT beamforming. We then consider the general system setup and study the problem of joint path-based beamforming and phase shifts design for DAM transmission, by considering the three classical beamforming techniques on a per-path basis, namely the low-complexity path-based MRT beamforming, the path-based ZF beamforming for ISI-free DAM communication, and the optimal path-based MMSE beamforming. As a comparison, OFDM-based multi-IRS aided communication is considered. Simulation results demonstrate that DAM outperforms OFDM in terms of spectral efficiency, BER, and PAPR.
Unmanned aerial vehicles (UAVs) are commonly used for edge collaborative computing in current transmission line object detection, where computationally intensive tasks generated by user nodes are offloaded to more powerful edge servers for processing. However, performing edge collaborative processing on transmission line image data may result in serious privacy breaches. To address this issue, we propose a secure single-stage detection model called SecYOLOv7 that preserves the privacy of object detecting. Based on secure multi-party computation (MPC), a series of secure computing protocols are designed for the collaborative execution of Secure Feature Contraction, Secure Bounding-Box Prediction and Secure Object Classification by two non-edge servers. Performance evaluation shows that both computational and communication overhead in this framework as well as calculation error significantly outperform existing works.
We propose fast and communication-efficient optimization algorithms for multi-robot rotation averaging and translation estimation problems that arise from collaborative simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and camera network localization applications. Our methods are based on theoretical relations between the Hessians of the underlying Riemannian optimization problems and the Laplacians of suitably weighted graphs. We leverage these results to design a collaborative solver in which robots coordinate with a central server to perform approximate second-order optimization, by solving a Laplacian system at each iteration. Crucially, our algorithms permit robots to employ spectral sparsification to sparsify intermediate dense matrices before communication, and hence provide a mechanism to trade off accuracy with communication efficiency with provable guarantees. We perform rigorous theoretical analysis of our methods and prove that they enjoy (local) linear rate of convergence. Furthermore, we show that our methods can be combined with graduated non-convexity to achieve outlier-robust estimation. Extensive experiments on real-world SLAM and SfM scenarios demonstrate the superior convergence rate and communication efficiency of our methods.
Sparse-view computed tomography (CT) -- using a small number of projections for tomographic reconstruction -- enables much lower radiation dose to patients and accelerated data acquisition. The reconstructed images, however, suffer from strong artifacts, greatly limiting their diagnostic value. Current trends for sparse-view CT turn to the raw data for better information recovery. The resultant dual-domain methods, nonetheless, suffer from secondary artifacts, especially in ultra-sparse view scenarios, and their generalization to other scanners/protocols is greatly limited. A crucial question arises: have the image post-processing methods reached the limit? Our answer is not yet. In this paper, we stick to image post-processing methods due to great flexibility and propose global representation (GloRe) distillation framework for sparse-view CT, termed GloReDi. First, we propose to learn GloRe with Fourier convolution, so each element in GloRe has an image-wide receptive field. Second, unlike methods that only use the full-view images for supervision, we propose to distill GloRe from intermediate-view reconstructed images that are readily available but not explored in previous literature. The success of GloRe distillation is attributed to two key components: representation directional distillation to align the GloRe directions, and band-pass-specific contrastive distillation to gain clinically important details. Extensive experiments demonstrate the superiority of the proposed GloReDi over the state-of-the-art methods, including dual-domain ones. The source code is available at //github.com/longzilicart/GloReDi.
The field of robotic Flexible Endoscopes (FEs) has progressed significantly, offering a promising solution to reduce patient discomfort. However, the limited autonomy of most robotic FEs results in non-intuitive and challenging manoeuvres, constraining their application in clinical settings. While previous studies have employed lumen tracking for autonomous navigation, they fail to adapt to the presence of obstructions and sharp turns when the endoscope faces the colon wall. In this work, we propose a Deep Reinforcement Learning (DRL)-based navigation strategy that eliminates the need for lumen tracking. However, the use of DRL methods poses safety risks as they do not account for potential hazards associated with the actions taken. To ensure safety, we exploit a Constrained Reinforcement Learning (CRL) method to restrict the policy in a predefined safety regime. Moreover, we present a model selection strategy that utilises Formal Verification (FV) to choose a policy that is entirely safe before deployment. We validate our approach in a virtual colonoscopy environment and report that out of the 300 trained policies, we could identify three policies that are entirely safe. Our work demonstrates that CRL, combined with model selection through FV, can improve the robustness and safety of robotic behaviour in surgical applications.
Autonomic computing investigates how systems can achieve (user) specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to a resource ensemble (e.g., multiple resources within a data center), research into integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale continues to be a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-management of systems can be achieved at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.