Reconfigurable intelligent surfaces (RIS) are recognized as a promising technique for sixth-generation (6G) wireless systems because they can intelligently manipulate channel paths through reflection to serve desired users. Full-duplex (FD) systems, which enable simultaneous transmission and reception at a base station (BS), offer the theoretical advantage of doubled spectrum efficiency. However, the strong self-interference (SI) in FD systems significantly degrades performance, an effect that can be mitigated by leveraging the capabilities of RIS. Moreover, accurately obtaining channel state information (CSI) from RIS poses a critical challenge. Our objective is to maximize downlink (DL) user data rates while ensuring quality-of-service (QoS) for uplink (UL) users under imperfect CSI from reflected channels. To address this, we introduce the robust active BS and passive RIS beamforming (RAPB) scheme for RIS-FD, accounting for both SI and imperfect CSI. RAPB incorporates distributionally robust design, conditional value-at-risk (CVaR), and penalty convex-concave programming (PCCP) techniques. Additionally, RAPB extends to active and passive beamforming (APB) with perfect channel estimation. Simulation results demonstrate the UL/DL rate improvements achieved under various levels of imperfect CSI. The proposed RAPB/APB schemes prove effective across different RIS deployments and RIS/BS configurations. Benefiting from robust beamforming, RAPB outperforms existing benchmarks, namely non-robust design, deployment without RIS, conventional successive convex approximation, and half-duplex systems.
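As a rough illustration of the CVaR ingredient named in the abstract above, the following sketch computes an empirical CVaR from sampled losses using the standard Rockafellar-Uryasev formula. It is not the RAPB formulation, which the abstract does not spell out; the function name `empirical_cvar` and the sample values are assumptions made for illustration only. In robust design, a chance constraint of the form Pr(f <= 0) >= alpha is commonly replaced by the more conservative condition CVaR_alpha(f) <= 0.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha via the Rockafellar-Uryasev formula.

    CVaR_alpha(L) = min over t of  t + E[max(L - t, 0)] / (1 - alpha).
    For an empirical distribution the objective is piecewise linear and
    convex in t, so the minimum is attained at one of the sample values.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)

    def objective(t):
        excess = sum(max(loss - t, 0.0) for loss in losses)
        return t + excess / ((1.0 - alpha) * n)

    return min(objective(t) for t in losses)


# Hypothetical per-scenario losses (e.g. a QoS shortfall under sampled CSI errors).
samples = [float(v) for v in range(1, 11)]

# With alpha = 0.8 and ten samples, this equals the mean of the two largest losses.
tail_risk = empirical_cvar(samples, alpha=0.8)
print(tail_risk)
```

Setting `alpha=0.0` recovers the plain sample mean, which is a quick sanity check on the implementation.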
The complex scenario of ultrasound images, in which adjacent tissues (i.e., background) share similar intensity with, and even contain richer texture patterns than, the lesion region (i.e., foreground), poses a unique challenge for accurate lesion segmentation. This work presents a decomposition-coupling network, called DC-Net, to deal with this challenge in a (foreground-background) saliency map disentanglement-fusion manner. The DC-Net consists of decomposition and coupling subnets: the former preliminarily disentangles the original image into foreground and background saliency maps, and the latter then performs accurate segmentation with the assistance of saliency prior fusion. The coupling subnet involves three fusion strategies: 1) regional feature aggregation (via a differentiable context pooling operator in the encoder) to adaptively preserve local contextual details with a larger receptive field during dimension reduction; 2) relation-aware representation fusion (via a cross-correlation fusion module in the decoder) to efficiently fuse low-level visual characteristics and high-level semantic features during resolution restoration; 3) dependency-aware prior incorporation (via a coupler) to reinforce the foreground-salient representation with complementary information derived from the background representation. Furthermore, a harmonic loss function is introduced to encourage the network to focus more attention on low-confidence and hard samples. The proposed method is evaluated on two ultrasound lesion segmentation tasks, where it demonstrates a remarkable performance improvement over existing state-of-the-art methods.
As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches primarily concentrate on the fusion paradigm of multi-modal entity features, while neglecting the challenges presented by the pervasive phenomenon of missing and intrinsically ambiguous visual images. In this paper, we present a further analysis of visual modality incompleteness, benchmarking the latest MMEA models on our proposed dataset MMEA-UMVM, which covers both bilingual and monolingual alignment KGs and uses standard (non-iterative) and iterative training paradigms to evaluate model performance. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise and exhibit performance oscillations or declines at high rates of missing modality. This shows that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves SOTA performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.
With the growing prevalence of electric vehicles (EVs) and advancements in EV electronics, vehicle-to-grid (V2G) techniques and large-scale scheduling strategies have emerged to promote renewable energy utilization and power grid stability. This study proposes a multi-stakeholder hierarchical V2G coordination strategy based on deep reinforcement learning (DRL) and the Proof of Stake algorithm. The stakeholders include the power grid, EV aggregators (EVAs), and users, and the proposed strategy can benefit all of them. On the grid side, load fluctuations and renewable energy consumption are considered, while on the EVA side, energy constraints and charging costs are considered. On the user side, the three critical battery conditioning parameters (battery SOX) are considered: state of charge, state of power, and state of health. Compared with four typical baselines, the multi-stakeholder hierarchical coordination strategy can enhance renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVAs, and reduce charging costs and battery degradation under realistic operating conditions.
With the continuous advancement of robot teleoperation technology, shared control is used to reduce the physical and mental load on the operator in teleoperation systems. This paper proposes an alternating shared control framework for object grasping that considers both the operator's preferences, expressed through manual manipulation, and the constraints of the follower robot. Switching between manual mode and automatic mode enables the operator to intervene in the task as they wish. The generation of the grasping pose takes into account the current state of the operator's hand pose, as well as the manipulability of the robot. The object grasping experiment indicates that the proposed grasping pose selection strategy leads to smoother follower movements when switching from manual mode to automatic mode.
Neural networks have proven to be effective at solving machine learning tasks but it is unclear whether they learn any relevant causal relationships, while their black-box nature makes it difficult for modellers to understand and debug them. We propose a novel method overcoming these issues by allowing a two-way interaction whereby neural-network-empowered machines can expose the underpinning learnt causal graphs and humans can contest the machines by modifying the causal graphs before re-injecting them into the machines. The learnt models are guaranteed to conform to the graphs and adhere to expert knowledge, some of which can also be given up-front. By building a window into the model behaviour and enabling knowledge injection, our method allows practitioners to debug networks based on the causal structure discovered from the data and underpinning the predictions. Experiments with real and synthetic tabular data show that our method improves predictive performance up to 2.4x while producing parsimonious networks, up to 7x smaller in the input layer, compared to SOTA regularised networks.
We propose an unconditionally energy-stable, orthonormality-preserving, component-wise splitting iterative scheme for the Kohn-Sham gradient flow based model in electronic structure calculations. We first study the scheme discretized in time but still continuous in space. The component-wise splitting iterative scheme changes one wave function at a time, similar to the Gauss-Seidel iteration for solving a linear equation system. Rigorous mathematical derivations are presented to show that our proposed scheme indeed satisfies the desired properties. We then study the fully-discretized scheme, where the space is further approximated by a conforming finite element subspace. For the fully-discretized scheme, not only can the preservation of orthogonality and normalization (together called orthonormality) be quickly shown using the same idea as for the semi-discretized scheme, but the key property of the scheme, i.e., the unconditional energy stability, can also be rigorously proven. The scheme allows us to use large time step sizes and to deal with small systems involving only a single wave function during each iteration step. Several numerical experiments are performed to verify the theoretical analysis, in which the number of iterations is indeed greatly reduced compared to similar examples solved by the Kohn-Sham gradient flow based model in the literature.
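The abstract above compares its component-wise update to the Gauss-Seidel iteration. For readers unfamiliar with that reference point, here is a minimal sketch of plain Gauss-Seidel on a small linear system; it illustrates only the "update one unknown at a time, reusing the newest values" pattern and is not the Kohn-Sham scheme itself. The matrix, right-hand side, and iteration count are arbitrary illustrative choices.

```python
def gauss_seidel(A, b, iters=50):
    """Solve A x = b by Gauss-Seidel sweeps, starting from x = 0.

    Each inner step updates a single unknown x[i] and immediately reuses
    the freshly updated values of x[0..i-1] within the same sweep.
    Convergence is guaranteed, for example, when A is strictly
    diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diagonal) / A[i][i]
    return x


# Small strictly diagonally dominant system, chosen only for illustration.
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
solution = gauss_seidel(A, b)
print(solution)  # approaches the exact solution (1/11, 7/11)
```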
Reconfigurable intelligent surfaces (RISs) allow controlling the propagation environment in wireless networks through reconfigurable elements. Recently, beyond diagonal RISs (BD-RISs) have been proposed as novel RIS architectures whose scattering matrix is not limited to being diagonal. However, BD-RISs have been studied assuming continuous-value scattering matrices, which are hard to implement in practice. In this paper, we address this problem by proposing two solutions to realize discrete-value group-connected and fully connected RISs. First, we propose scalar-discrete RISs, in which each entry of the RIS impedance matrix is independently discretized. Second, we propose vector-discrete RISs, where the entries in each group of the RIS impedance matrix are jointly discretized. In both solutions, the codebook is designed offline so as to minimize the distortion caused in the RIS impedance matrix by the discretization operation. Numerical results show that vector-discrete RISs achieve higher performance than scalar-discrete RISs at the cost of increased optimization complexity. Furthermore, fewer resolution bits per impedance are necessary to achieve the performance upper bound as the group size of the group-connected architecture increases. In particular, a single resolution bit is sufficient in fully connected RISs to approximately achieve the performance upper bound.
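The difference between scalar and vector discretization described in the abstract above can be sketched with a generic offline codebook design. The example below uses Lloyd's algorithm (k-means) as a stand-in for the distortion-minimizing design; the abstract does not state which algorithm the authors use, so this choice, the real-valued toy data (actual impedances would generally be complex), the evenly spaced initialization, and all function names are assumptions. Both quantizers spend one bit per entry: the scalar one has 2 levels per entry, the vector one has 4 codewords per group of two entries.

```python
def dist2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(codebook, v):
    """Codeword closest to v."""
    return min(codebook, key=lambda c: dist2(c, v))


def lloyd_codebook(samples, k, iters=20):
    """Offline codebook design with Lloyd's algorithm (deterministic init)."""
    n = len(samples)
    codebook = [list(samples[(i * n) // k]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in samples:
            idx = min(range(k), key=lambda i: dist2(codebook[i], s))
            clusters[idx].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Hypothetical training groups of two strongly correlated entries.
groups = [(0.0, 0.0), (0.1, 0.1), (0.4, 0.4), (0.5, 0.5),
          (1.0, 1.0), (1.1, 1.1), (1.4, 1.4), (1.5, 1.5)]

# Scalar-discrete: every entry is quantized on its own with a 1-bit codebook.
entries = [(x,) for g in groups for x in g]
scalar_codebook = lloyd_codebook(entries, k=2)
scalar_distortion = sum(
    dist2(g, [nearest(scalar_codebook, (x,))[0] for x in g]) for g in groups
) / len(groups)

# Vector-discrete: each group is quantized jointly with a 2-bit codebook.
vector_codebook = lloyd_codebook(groups, k=4)
vector_distortion = sum(
    dist2(g, nearest(vector_codebook, g)) for g in groups
) / len(groups)

print(scalar_distortion, vector_distortion)
```

On this toy data the joint codebook can place all four codewords along the diagonal where the samples lie, so it reaches a lower distortion at the same bit budget, which is in line with the trend the abstract reports.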
Image-level weakly supervised semantic segmentation (WSSS) is a fundamental yet challenging computer vision task that facilitates scene understanding and autonomous driving. Most existing methods resort to classification-based Class Activation Maps (CAMs) as the initial pseudo labels, which tend to focus on the discriminative image regions and lack customized characteristics for the segmentation task. To alleviate this issue, we propose a novel activation modulation and recalibration (AMR) scheme, which leverages a spotlight branch and a compensation branch to obtain weighted CAMs that can provide recalibration supervision and task-specific concepts. Specifically, an attention modulation module (AMM) is employed to rearrange the distribution of feature importance from the channel-spatial sequential perspective, which helps to explicitly model channel-wise interdependencies and spatial encodings to adaptively modulate segmentation-oriented activation responses. Furthermore, we introduce cross pseudo supervision for the dual branches, which can be regarded as a semantic similarity regularization that mutually refines the two branches. Extensive experiments show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with image-level supervision but also some methods relying on stronger supervision, such as saliency labels. Experiments also reveal that our scheme is plug-and-play and can be incorporated into other approaches to boost their performance.
Graph Neural Networks (GNNs) have received considerable attention in graph-structured data learning for a wide variety of tasks. The well-designed propagation mechanism, which has been demonstrated to be effective, is the most fundamental part of GNNs. Although most GNNs basically follow a message-passing scheme, little effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms and a unified optimization problem, showing that despite the proliferation of various GNNs, their proposed propagation mechanisms are in fact the optimal solutions of optimizing a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view for surveying the relations between different GNNs, but also opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for the feature fitting function, and we further develop two novel objective functions considering adjustable graph kernels showing low-pass or high-pass filtering capabilities, respectively. Moreover, we provide convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also alleviate over-smoothing well, and further verify the feasibility of designing GNNs with our unified optimization framework.
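To make the "propagation as optimization" claim in the abstract above concrete, consider one simple instance, assumed here for illustration rather than taken from the abstract: minimize ||Z - H||^2 + xi * tr(Z^T (I - A_hat) Z), where H holds the input features and A_hat is the symmetrically normalized adjacency with self-loops. Setting the gradient to zero gives Z = H / (1 + xi) + (xi / (1 + xi)) * A_hat Z, which is a personalized-PageRank-style propagation rule. The sketch below iterates that rule on a hypothetical 3-node path graph and checks that the result satisfies the optimality condition; all names and values are illustrative.

```python
import math


def normalized_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def propagate(a_hat, h, xi, steps=100):
    """Fixed-point iteration Z <- alpha * H + (1 - alpha) * A_hat Z, alpha = 1 / (1 + xi)."""
    alpha = 1.0 / (1.0 + xi)
    z = [row[:] for row in h]
    for _ in range(steps):
        az = matmul(a_hat, z)
        z = [[alpha * h[i][j] + (1.0 - alpha) * az[i][j]
              for j in range(len(h[0]))] for i in range(len(h))]
    return z


def objective(a_hat, h, z, xi):
    """Feature fitting term plus graph regularization term."""
    az = matmul(a_hat, z)
    rows, cols = range(len(h)), range(len(h[0]))
    fit = sum((z[i][j] - h[i][j]) ** 2 for i in rows for j in cols)
    smooth = sum(z[i][j] * (z[i][j] - az[i][j]) for i in rows for j in cols)
    return fit + xi * smooth


# Hypothetical 3-node path graph 0 - 1 - 2 with one feature per node.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h = [[1.0], [0.0], [-1.0]]
xi = 1.0

a_hat = normalized_adjacency(adj)
z = propagate(a_hat, h, xi)
print(z, objective(a_hat, h, z, xi), objective(a_hat, h, h, xi))
```

The propagated features have a lower objective value than the raw features, and the fixed point satisfies (1 + xi) Z = H + xi * A_hat Z, i.e. the stationarity condition of the objective.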
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation, and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
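For the first category in the survey abstract above (parameter quantization and pruning), a toy sketch may help fix ideas. It shows magnitude-based pruning followed by symmetric uniform quantization on a flat list of weights. These are generic textbook variants chosen for illustration, not methods attributed to the survey, and the function names, sparsity level, and weight values are assumptions.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits=8):
    """Symmetric uniform quantization to signed integer codes plus a scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


# Hypothetical layer weights.
weights = [0.5, -0.05, 0.2, -1.0]
pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_uniform(pruned, bits=8)
restored = dequantize(codes, scale)
print(pruned, codes, restored)
```

Pruning removes parameters outright, while quantization keeps every remaining parameter but stores it in fewer bits; the two are often combined, as in this sketch.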