We report on our effort to create a corpus dataset of different social context situations in an office setting for further disciplinary and interdisciplinary research in computer vision, psychology, and human-robot interaction. For social robots to behave appropriately, they need to be aware of the social context they act in. Consider, for example, a robot tasked with delivering a personal message to a person. If the person is arguing with an office mate at the time of message delivery, it might be more appropriate to delay playing the message so as to respect the recipient's privacy and not interfere with the current situation. This can only be done if the situation is classified correctly and, in a second step, an appropriate behavior is chosen that fits the social situation. Our work aims to enable robots to classify social situations by creating a dataset composed of semantically annotated video scenes of office situations from television soap operas. The dataset can then serve as a basis for conducting research in both computer vision and human-robot interaction.
We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we propose a way to reduce the replication to $s+1$ instead of $2s+1$ in the presence of $s$ malicious workers. Our method detects erroneous inputs from the malicious workers, transforming them into erasures. This comes at the expense of $s$ additional local computations at the main node and additional rounds of light communication between the main node and the workers. We define a general framework and give fundamental limits for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation and incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound. We furthermore show how additional redundancy can be exploited to reduce the number of local computations and communication cost, or, alternatively, tolerate straggling workers.
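The detection idea behind a replication of $s+1$ can be illustrated with a minimal sketch. It is a simplification and not the protocol of the paper: it only shows why unanimous agreement among $s+1$ replicas implies correctness when at most $s$ workers are malicious, and how a disagreement can be turned into erasures by one local computation. The function name `resolve_partial_gradient`, the worker identifiers, and the assumption that honest workers return bit-identical values are all hypothetical.

```python
def resolve_partial_gradient(responses, recompute_locally, s):
    """Resolve one partial gradient from replicated worker responses.

    responses: dict mapping worker id -> reported value (a hashable tuple).
    recompute_locally: zero-argument callable run at the main node; it is
        only invoked when the replicas disagree.
    s: assumed upper bound on the number of malicious workers.

    Returns (value, flagged), where flagged is the set of workers whose
    responses are treated as erasures from now on.
    """
    if len(responses) < s + 1:
        raise ValueError("need at least s + 1 replicas to detect corruption")

    distinct = set(responses.values())
    if len(distinct) == 1:
        # At least one of the s + 1 replicas is honest, so a unanimous
        # answer must equal the honest (correct) value.
        return next(iter(distinct)), set()

    # Disagreement: at least one replica lied. One local computation at the
    # main node identifies every worker that deviates from the true value.
    truth = recompute_locally()
    flagged = {worker for worker, value in responses.items() if value != truth}
    return truth, flagged


if __name__ == "__main__":
    replies = {"w0": (1.0, 2.0), "w1": (9.0, 9.0), "w2": (1.0, 2.0)}
    value, flagged = resolve_partial_gradient(replies, lambda: (1.0, 2.0), s=2)
    print(value, flagged)
```

The sketch spends a full local computation on every disagreement; the scheme summarized above instead bounds the local work by $s$ computations and uses additional rounds of light communication, which this example does not model.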
This work focuses on the synergy of rate-splitting multiple access (RSMA) and beyond diagonal reconfigurable intelligent surface (BD-RIS) to enlarge the coverage, improve the performance, and save on antennas. Specifically, we employ a multi-sector BD-RIS modeled as a prism, which can achieve highly directional full-space coverage, in a multiuser multiple-input single-output communication system. With the multi-sector BD-RIS aided RSMA model, we jointly design the transmit precoder and BD-RIS matrix under imperfect channel state information (CSI). The robust design is performed by solving a stochastic average sum-rate maximization problem. Using sample average approximation and the relationship between the weighted minimum mean square error and the rate, the stochastic problem is transformed into a deterministic one with multiple blocks, each of which is iteratively designed. Simulation results show that multi-sector BD-RIS aided RSMA outperforms space division multiple access schemes. More importantly, synergizing multi-sector BD-RIS with RSMA is an efficient strategy to reduce the number of active antennas at the transmitter and the number of passive antennas in BD-RIS.
Many biological processes display oscillatory behavior based on an approximately 24-hour internal timing system specific to each individual. One process of particular interest is gene expression, for which several circadian transcriptomic studies have identified associations between gene expression during a 24-hour period and an individual's health. A challenge in analyzing data from these studies is that each individual's internal timing system is offset relative to the 24-hour day-night cycle, whereas only the day-night cycle time is recorded for each collected sample. Laboratory procedures can accurately determine each individual's offset and thus the internal time of sample collection. However, these laboratory procedures are labor-intensive and expensive. In this paper, we propose a corrected score function framework to obtain a regression model of gene expression given internal time when the offset of each individual is too burdensome to determine. A feature of this framework is that it does not require the probability distribution generating offsets to be symmetric with a mean of zero. Simulation studies validate the use of this corrected score function framework for cosinor regression, which is prevalent in circadian transcriptomic studies. Illustrations with three real circadian transcriptomic data sets further demonstrate that the proposed framework consistently mitigates bias relative to using a score function that does not account for this offset.
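For readers unfamiliar with cosinor regression, the following is a minimal sketch of the standard single-component model $y = M + A\cos(\omega t - \varphi)$ with $\omega = 2\pi/24$, fitted by ordinary least squares after rewriting it as $y = M + \beta_1\cos(\omega t) + \beta_2\sin(\omega t)$. It is the naive fit that ignores individual offsets, not the corrected score function framework proposed above. The function name `fit_cosinor` and the synthetic data are assumptions made for illustration.

```python
import math


def fit_cosinor(times, values, period=24.0):
    """Least-squares fit of y = M + b1*cos(w*t) + b2*sin(w*t).

    Returns (mesor, amplitude, peak_time), where peak_time is the time
    within one period at which the fitted curve is largest.
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]

    # Build the augmented normal equations [X^T X | X^T y].
    aug = []
    for i in range(3):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(3)]
        rhs = sum(r[i] * y for r, y in zip(rows, values))
        aug.append(lhs + [rhs])

    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (aug[i][3] - tail) / aug[i][i]

    mesor, b1, b2 = beta
    amplitude = math.hypot(b1, b2)
    # b1 = A*cos(phi), b2 = A*sin(phi), so the peak occurs at t = phi / w.
    peak_time = (math.atan2(b2, b1) / w) % period
    return mesor, amplitude, peak_time


if __name__ == "__main__":
    w = 2.0 * math.pi / 24.0
    sample_times = list(range(0, 24, 2))
    expression = [5.0 + 2.0 * math.cos(w * (t - 6.0)) for t in sample_times]
    print(fit_cosinor(sample_times, expression))
```

On this noise-free synthetic series the fit recovers a mesor of 5, an amplitude of 2, and a peak time of 6 hours up to floating-point error. If each individual's recorded times were shifted by an unknown offset, the same fit applied to pooled data would generally be biased, which is the problem the abstract addresses.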
Fraud detection aims to discover fraudsters who deceive other users by, for example, leaving fake reviews or making abnormal transactions. Graph-based fraud detection methods treat this task as a classification problem with two classes: fraud or normal. We address this problem using Graph Neural Networks (GNNs) by proposing a dynamic relation-attentive aggregation mechanism. Based on the observation that many real-world graphs include different types of relations, we propose to learn a node representation per relation and aggregate the node representations using a learnable attention function that assigns a different attention coefficient to each relation. Furthermore, we combine the node representations from different layers to consider both the local and global structures of a target node, which is beneficial to improving the performance of fraud detection on graphs with heterophily. By employing dynamic graph attention in all the aggregation processes, our method adaptively computes the attention coefficients for each node. Experimental results show that our method, DRAG, outperforms state-of-the-art fraud detection methods on real-world benchmark datasets.
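The relation-level aggregation step can be pictured with a small sketch. This is not DRAG's implementation: it assumes a single attention vector, a LeakyReLU applied before scoring, and a softmax over relations, and it omits the learned weight matrices and the layer-wise combination. The names `relation_attentive_aggregate`, `attention_vector`, and the relation keys are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x


def relation_attentive_aggregate(relation_embeddings, attention_vector):
    """Combine one node's per-relation embeddings with softmax attention.

    relation_embeddings: dict mapping relation name -> list of floats,
        i.e. the representation of a single node under that relation.
    attention_vector: list of floats standing in for learned parameters.

    Returns (combined_embedding, weights), where weights maps each
    relation name to its attention coefficient.
    """
    names = list(relation_embeddings)
    scores = [
        sum(a * leaky_relu(h) for a, h in zip(attention_vector, relation_embeddings[n]))
        for n in names
    ]

    # Numerically stable softmax over the relations of this node.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}

    dim = len(attention_vector)
    combined = [
        sum(weights[n] * relation_embeddings[n][d] for n in names)
        for d in range(dim)
    ]
    return combined, weights


if __name__ == "__main__":
    per_relation = {"relation_a": [1.0, 0.0], "relation_b": [0.0, 1.0]}
    print(relation_attentive_aggregate(per_relation, [2.0, 0.0]))
```

Because the scores depend on the node's own embeddings, two nodes can weight the same relations differently, which is the sense in which the coefficients are computed adaptively per node.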
Financial firms commonly process and store billions of time-series data, generated continuously and at a high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time-series through a constrained Structured Query Language (SQL)-like format, enabling queries such as "Stocks with monthly price returns greater than 5%" that must be expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the required storage, computational time, and retrieval complexity to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time-series in a lower-dimensional latent space using deep encoders, such that the latent space projections capture not only the time-series trends but also other desirable information or properties of the financial time-series data (such as price volatility). Moreover, our approach supports user-friendly queries in the form of natural language text or sketches of time-series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
We propose a novel data-driven semi-confirmatory factor analysis (SCFA) model that addresses the absence of model specification and handles the estimation and inference tasks with high-dimensional data. Confirmatory factor analysis (CFA) is a prevalent and pivotal technique for statistically validating the covariance structure of latent common factors derived from multiple observed variables. In contrast to other factor analysis methods, CFA offers a flexible covariance modeling approach for common factors, enhancing the interpretability of relationships between the common factors, as well as between common factors and observations. However, the application of classic CFA models faces dual barriers: the lack of a prerequisite specification of "non-zero loadings" or factor membership (i.e., categorizing the observations into distinct common factors), and the formidable computational burden in high-dimensional scenarios where the number of observed variables surpasses the sample size. To bridge these two gaps, we propose the SCFA model by integrating the underlying high-dimensional covariance structure of observed variables into the CFA model. Additionally, we offer computationally efficient solutions (i.e., closed-form uniformly minimum variance unbiased estimators) and ensure accurate statistical inference through closed-form exact variance estimators for all model parameters and factor scores. Through an extensive simulation analysis benchmarking against standard computational packages, SCFA exhibits superior performance in estimating model parameters and recovering factor scores, while substantially reducing the computational load, across both low- and high-dimensional scenarios. It exhibits moderate robustness to model misspecification. We illustrate the practical application of the SCFA model by conducting factor analysis on a high-dimensional gene expression dataset.
In contrast to the conventional reconfigurable intelligent surface (RIS), the simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) has been proposed recently to enlarge the serving area from 180° to 360° coverage. This work considers the performance of a STAR-RIS aided full-duplex (FD) non-orthogonal multiple access (NOMA) communication system. The STAR-RIS is deployed at the cell edge to assist the cell-edge users, while the cell-center users can communicate directly with an FD base station (BS). We first introduce new user clustering schemes for the downlink and uplink transmissions. Then, based on the proposed transmission schemes, closed-form expressions of the ergodic rates in the downlink and uplink modes are derived, taking into account the system impairments caused by the self-interference at the FD-BS and the imperfect successive interference cancellation (SIC). Moreover, an optimization problem to maximize the total sum-rate is formulated and solved by optimizing the amplitudes and the phase shifts of the STAR-RIS elements and allocating the transmit power efficiently. The performance of the proposed user clustering schemes and the optimal STAR-RIS design is investigated through numerical results.
In semi-supervised domain adaptation, a few labeled samples per class in the target domain guide features of the remaining target samples to aggregate around them. However, the trained model cannot produce a highly discriminative feature representation for the target domain because the training data is dominated by labeled samples from the source domain. This could lead to disconnection between the labeled and unlabeled target samples as well as misalignment between unlabeled target samples and the source domain. In this paper, we propose a novel approach called Cross-domain Adaptive Clustering to address this problem. To achieve both inter-domain and intra-domain adaptation, we first introduce an adversarial adaptive clustering loss to group features of unlabeled target data into clusters and perform cluster-wise feature alignment across the source and target domains. We further apply pseudo labeling to unlabeled samples in the target domain and retain pseudo-labels with high confidence. Pseudo labeling expands the number of "labeled" samples in each class in the target domain, and thus produces a more robust and powerful cluster core for each class to facilitate adversarial learning. Extensive experiments on benchmark datasets, including DomainNet, Office-Home and Office, demonstrate that our proposed approach achieves state-of-the-art performance in semi-supervised domain adaptation.
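The pseudo-labeling step, retaining only confident predictions, can be sketched as follows. The threshold value of 0.95, the function name `select_pseudo_labels`, and the plain-list representation of class probabilities are assumptions for illustration; the abstract only states that pseudo-labels with high confidence are retained.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Keep confident predictions on unlabeled target samples.

    probabilities: list of per-sample class-probability lists, e.g. the
        softmax outputs of the current model on unlabeled target data.
    threshold: assumed confidence cut-off; a sample is kept only if its
        largest class probability reaches this value.

    Returns a list of (sample_index, predicted_class) pairs.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # Index of the most probable class for this sample.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((index, label))
    return selected


if __name__ == "__main__":
    model_outputs = [
        [0.97, 0.02, 0.01],  # confident: kept with class 0
        [0.40, 0.35, 0.25],  # uncertain: discarded
        [0.01, 0.01, 0.98],  # confident: kept with class 2
    ]
    print(select_pseudo_labels(model_outputs))
```

The retained pairs would then be treated like labeled target samples in the next training step, which is how the set of "labeled" samples per class grows.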
Social relations are often used to improve recommendation quality when user-item interaction data is sparse in recommender systems. Most existing social recommendation models exploit pairwise relations to mine potential user preferences. However, real-life interactions among users are very complicated and user relations can be high-order. Hypergraphs provide a natural way to model complex high-order relations, yet their potential for improving social recommendation is under-explored. In this paper, we fill this gap and propose a multi-channel hypergraph convolutional network to enhance social recommendation by leveraging high-order user relations. Technically, each channel in the network encodes a hypergraph that depicts a common high-order user relation pattern via hypergraph convolution. By aggregating the embeddings learned through multiple channels, we obtain comprehensive user representations to generate recommendation results. However, the aggregation operation might also obscure the inherent characteristics of different types of high-order connectivity information. To compensate for this aggregation loss, we integrate self-supervised learning into the training of the hypergraph convolutional network to regain the connectivity information through hierarchical mutual information maximization. The experimental results on multiple real-world datasets show that the proposed model outperforms state-of-the-art methods, and the ablation study verifies the effectiveness of the multi-channel setting and the self-supervised task. The implementation of our model is available via //github.com/Coder-Yu/RecQ.
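As a rough illustration of hypergraph convolution within a single channel, the sketch below performs one parameter-free propagation step: node features are averaged into each hyperedge and then averaged back to each node, which corresponds to $X' = D_v^{-1} H D_e^{-1} H^{\top} X$ for an incidence matrix $H$. This is a generic mean-aggregation variant chosen for clarity, not necessarily the exact operator used in the proposed model; the function name `hypergraph_conv` and the toy data are hypothetical.

```python
def hypergraph_conv(incidence, features):
    """One mean-aggregation hypergraph convolution step.

    incidence: list of rows, incidence[v][e] is 1 if node v belongs to
        hyperedge e and 0 otherwise.
    features: list of per-node feature lists, all of the same length.

    Returns the updated per-node features as a list of lists.
    """
    n_nodes = len(incidence)
    n_edges = len(incidence[0])
    dim = len(features[0])

    # Stage 1 (node -> hyperedge): each hyperedge takes the mean of the
    # features of its member nodes.
    edge_features = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if incidence[v][e]]
        if members:
            mean = [sum(features[v][d] for v in members) / len(members) for d in range(dim)]
        else:
            mean = [0.0] * dim
        edge_features.append(mean)

    # Stage 2 (hyperedge -> node): each node takes the mean of the
    # features of the hyperedges it belongs to.
    updated = []
    for v in range(n_nodes):
        edges = [e for e in range(n_edges) if incidence[v][e]]
        if edges:
            mean = [sum(edge_features[e][d] for e in edges) / len(edges) for d in range(dim)]
        else:
            mean = [0.0] * dim
        updated.append(mean)
    return updated


if __name__ == "__main__":
    # Three users; hyperedge 0 groups users {0, 1}, hyperedge 1 groups {1, 2}.
    H = [[1, 0], [1, 1], [0, 1]]
    X = [[1.0], [3.0], [5.0]]
    print(hypergraph_conv(H, X))
```

In a multi-channel setting, one such propagation would run per hypergraph (one per relation pattern), and the resulting per-channel embeddings would then be aggregated into a single user representation.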
The accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (also referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We analyze the dynamic graphs constructed from the time-series data and show that changes in the graph structures (e.g., edges connecting certain state nodes) can signal the occurrence of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, EvoNet models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.