Large intelligent surface (LIS) has gained momentum as a potential 6G-enabling technology that expands the benefits of massive multiple-input multiple-output (MIMO). On the other hand, orthogonal space-division multiplexing (OSDM) may give a promising direction for efficient exploitation of the spatial resources, analogous as what is achieved with orthogonal frequency-division multiplexing (OFDM) in the frequency domain. To this end, we study how to enforce channels orthogonality in a panel-based LIS scenario. Our proposed method consists of having a subset of active LIS-panels coherently serving a set of users, and another subset of LIS-panels operating in semi-passive mode by implementing a receive and re-transmit (RRTx) process. This results in an inter-symbol interference (ISI) channel, where we characterize the semi-passive processing required to achieve simultaneous orthogonality in time and space. We then employ the remaining degrees of freedom (DoFs) from the orthogonality constraint to minimize the semi-passive processing power, where we derive a closed-form global minimizer, allowing for efficient implementation of the proposed scheme.
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can proactively reshape the characteristics of wireless channel environments. In RIS-assisted communication systems, the acquisition of channel state information (CSI) and the optimization of reflecting coefficients constitute major design challenges. To address these issues, codebook-based solutions have been developed recently, which, however, are mostly environment-agnostic. In this paper, a novel environment-aware codebook protocol is proposed, which can significantly reduce both pilot overhead and computational complexity, while maintaining expected communication performance. Specifically, first of all, a channel training framework is introduced to divide the training phase into several blocks. In each block, we directly estimate the composite end-to-end channel and focus only on the transmit beamforming. Second, we propose an environment-aware codebook generation scheme, which first generates a group of channels based on statistical CSI, and then obtains their corresponding RIS configuration by utilizing the alternating optimization (AO) method offline. In each online training block, the RIS is configured based on the corresponding codeword in the environment-aware codebook, and the optimal codeword resulting in the highest sum rate is adopted for assisting in the downlink data transmission. Third, we analyze the theoretical performance of the environment-aware codebook-based protocol taking into account the channel estimation errors. Finally, numerical simulations are provided to verify our theoretical analysis and the performance of the proposed scheme. In particular, the simulation results demonstrate that our protocol is more competitive than conventional environment-agnostic codebooks.
Increasing rate of progress in hardware and artificial intelligence (AI) solutions is enabling a range of software systems to be deployed closer to their users, increasing application of edge software system paradigms. Edge systems support scenarios in which computation is placed closer to where data is generated and needed, and provide benefits such as reduced latency, bandwidth optimization, and higher resiliency and availability. Users who operate in highly-uncertain and resource-constrained environments, such as first responders, law enforcement, and soldiers, can greatly benefit from edge systems to support timelier decision making. Unfortunately, understanding how different architecture approaches for edge systems impact priority quality concerns is largely neglected by industry and research, yet crucial for national and local safety, optimal resource utilization, and timely decision making. Much of industry is focused on the hardware and networking aspects of edge systems, with very little attention to the software that enables edge capabilities. This paper presents our work to fill this gap, defining a reference architecture for edge systems in highly-uncertain environments, and showing examples of how it has been implemented in practice.
Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios, including knowledge tracing, student behavior detection, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. We then analyze the practical challenges in EDM and propose targeted solutions. Finally, we point out emerging trends and future directions in this research area.
Many robotic applications require to grasp objects not arbitrarily but at a very specific object part. This is especially important for manipulation tasks beyond simple pick-and-place scenarios or in robot-human interactions, such as object handovers. We propose AnyPart, a practical system that combines open-vocabulary object detection, open-vocabulary part segmentation and 6DOF grasp pose prediction to infer a grasp pose on a specific part of an object in 800 milliseconds. We contribute two new datasets for the task of open-vocabulary part-based grasping, a hand-segmented dataset containing 1014 object-part segmentations, and a dataset of real-world scenarios gathered during our robot trials for individual objects and table-clearing tasks. We evaluate AnyPart on a mobile manipulator robot using a set of 28 common household objects over 360 grasping trials. AnyPart is capable of producing successful grasps 69.52 %, when ignoring robot-based grasp failures, AnyPart predicts a grasp location on the correct part 88.57 % of the time.
Unbalanced optimal transport (UOT) has recently gained much attention due to its flexible framework for handling un-normalized measures and its robustness properties. In this work, we explore learning (structured) sparse transport plans in the UOT setting, i.e., transport plans have an upper bound on the number of non-sparse entries in each column (structured sparse pattern) or in the whole plan (general sparse pattern). We propose novel sparsity-constrained UOT formulations building on the recently explored maximum mean discrepancy based UOT. We show that the proposed optimization problem is equivalent to the maximization of a weakly submodular function over a uniform matroid or a partition matroid. We develop efficient gradient-based discrete greedy algorithms and provide the corresponding theoretical guarantees. Empirically, we observe that our proposed greedy algorithms select a diverse support set and we illustrate the efficacy of the proposed approach in various applications.
The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combined strengths of these diverse data sources. Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science to integrate various data modalities into a single framework adaptively. This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks, including sensors never seen during pretraining. DOFA's innovative design offers a promising leap towards more accurate, efficient, and unified Earth observation analysis, showcasing remarkable adaptability and performance in harnessing the potential of multimodal Earth observation data.
Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model output by finding correctly pronounced tokens from its pre-built datastore during the decoding phase. Moreover, we utilize x-vector to dynamically adjust kNN interpolation parameters for data sparsity issue. This approach was validated using KeSpeech and MagicData corpora under in-domain and all-domain settings. Our method consistently performs comparably to fine-tuning without the associated performance degradation during speaker changes. Furthermore, in the all-domain setting, our method achieves state-of-the-art results, reducing the CER in both single speaker and multi-speaker test scenarios.
The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly compact tensor train representation, is still lacking. This has prevented practitioners from deploying the full potential of tensor networks for visual data. To this end, we propose 'Prolongation Upsampling Tensor Train (PuTT)', a novel method for learning tensor train representations in a coarse-to-fine manner. Our method involves the prolonging or `upsampling' of a learned tensor train representation, creating a sequence of 'coarse-to-fine' tensor trains that are incrementally refined. We evaluate our representation along three axes: (1). compression, (2). denoising capability, and (3). image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. For full results see our project webpage: //sebulo.github.io/PuTT_website/
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.