Researchers frequently wish to assess the equality or inequality of groups, but this comes with the challenge of adequately adjusting for multiple comparisons. Statistically, all possible configurations of equality and inequality constraints can be uniquely represented as partitions of the groups, where any number of groups are equal if they are in the same partition. In a Bayesian framework, one can adjust for multiple comparisons by constructing a suitable prior distribution over all possible partitions. Inspired by work on variable selection in regression, we propose a class of flexible beta-binomial priors for Bayesian multiple comparison adjustment. We compare this prior setup to the Dirichlet process prior suggested by Gopalan and Berry (1998) and multiple comparison adjustment methods that do not specify a prior over partitions directly. Our approach to multiple comparison adjustment not only allows researchers to assess all pairwise (in)equalities, but in fact all possible (in)equalities among all groups. As a consequence, the space of possible partitions grows quickly - for ten groups, there are already 115,975 possible partitions - and we set up a stochastic search algorithm to efficiently explore the space. Our method is implemented in the Julia package EqualitySampler, and we illustrate it on examples related to the comparison of means, variances, and proportions.
In this paper, we propose an orthogonal block wise Kaczmarz (POBK) algorithm based on preprocessing techniques to solve large-scale sparse linear systems $Ax=f$. Firstly, the Reverse Cuthill McKee Algorithm (RCM) algorithm is used to preprocess the linear system, and then a new partitioning strategy is proposed to divide orthogonal blocks into one category, in order to accelerate the convergence rate of the Kaczmarz algorithm. The convergence of the POBK algorithm has been theoretically proven, and a theoretical analysis of its faster convergence is also provided. In addition, the experimental results confirm that this algorithm is far superior to GRBK, RBK(k), and GREBK(k) algorithms in both iteration steps (IT) and CPU time aspects.
We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate. This issue can be lessened by the analytic spectrum estimation of the Laplacian or normalized Laplacian matrices in spectral clustering, making the proposed algorithm very efficient for spectral clustering. Second, to make the proposed algorithm capable of analyzing big data, a distributed and parallel version has been developed with attractive scalability. The speedup by parallel computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the number of processes. {Numerical results will be provided to demonstrate its efficiency in spectral clustering and scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.}
sEMG pattern recognition algorithms have been explored extensively in decoding movement intent, yet are known to be vulnerable to changing recording conditions, exhibiting significant drops in performance across subjects, and even across sessions. Multi-channel surface EMG, also referred to as high-density sEMG (HD-sEMG) systems, have been used to improve performance with the information collected through the use of additional electrodes. However, a lack of robustness is ever present due to limited datasets and the difficulties in addressing sources of variability, such as electrode placement. In this study, we propose training on a collection of input channel subsets and augmenting our training distribution with data from different electrode locations, simultaneously targeting electrode shift and reducing input dimensionality. Our method increases robustness against electrode shift and results in significantly higher intersession performance across subjects and classification algorithms.
We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we propose a way to reduce the replication to $s+1$ instead of $2s+1$ in the presence of $s$ malicious workers. Our method detects erroneous inputs from the malicious workers, transforming them into erasures. This comes at the expense of $s$ additional local computations at the main node and additional rounds of light communication between the main node and the workers. We define a general framework and give fundamental limits for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation and incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound. We furthermore show how additional redundancy can be exploited to reduce the number of local computations and communication cost, or, alternatively, tolerate straggling workers.
This work focuses on the synergy of rate-splitting multiple access (RSMA) and beyond diagonal reconfigurable intelligent surface (BD-RIS) to enlarge the coverage, improve the performance, and save on antennas. Specifically, we employ a multi-sector BD-RIS modeled as a prism, which can achieve highly directional full-space coverage, in a multiuser multiple input single output communication system. With the multi-sector BD-RIS aided RSMA model, we jointly design the transmit precoder and BD-RIS matrix under the imperfect channel state information (CSI) conditions. The robust design is performed by solving a stochastic average sum-rate maximization problem. With sample average approximation and weighted minimum mean square error-rate relationship, the stochastic problem is transformed into a deterministic one with multiple blocks, each of which is iteratively designed. Simulation results show that multi-sector BD-RIS aided RSMA outperforms space division multiple access schemes. More importantly, synergizing multi-sector BD-RIS with RSMA is an efficient strategy to reduce the number of active antennas at the transmitter and the number of passive antennas in BD-RIS.
We introduce a new class of hybrid preconditioners for solving parametric linear systems of equations. The proposed preconditioners are constructed by hybridizing the deep operator network, namely DeepONet, with standard iterative methods. Exploiting the spectral bias, DeepONet-based components are harnessed to address low-frequency error components, while conventional iterative methods are employed to mitigate high-frequency error components. Our preconditioning framework comprises two distinct hybridization approaches: direct preconditioning (DP) and trunk basis (TB) approaches. In the DP approach, DeepONet is used to approximate an action of an inverse operator to a vector during each preconditioning step. In contrast, the TB approach extracts basis functions from the trained DeepONet to construct a map to a smaller subspace, in which the low-frequency component of the error can be effectively eliminated. Our numerical results demonstrate that utilizing the TB approach enhances the convergence of Krylov methods by a large margin compared to standard non-hybrid preconditioning strategies. Moreover, the proposed hybrid preconditioners exhibit robustness across a wide range of model parameters and problem resolutions.
Many biological processes display oscillatory behavior based on an approximately 24 hour internal timing system specific to each individual. One process of particular interest is gene expression, for which several circadian transcriptomic studies have identified associations between gene expression during a 24 hour period and an individual's health. A challenge with analyzing data from these studies is that each individual's internal timing system is offset relative to the 24 hour day-night cycle, where day-night cycle time is recorded for each collected sample. Laboratory procedures can accurately determine each individual's offset and determine the internal time of sample collection. However, these laboratory procedures are labor-intensive and expensive. In this paper, we propose a corrected score function framework to obtain a regression model of gene expression given internal time when the offset of each individual is too burdensome to determine. A feature of this framework is that it does not require the probability distribution generating offsets to be symmetric with a mean of zero. Simulation studies validate the use of this corrected score function framework for cosinor regression, which is prevalent in circadian transcriptomic studies. Illustrations with three real circadian transcriptomic data sets further demonstrate that the proposed framework consistently mitigates bias relative to using a score function that does not account for this offset.
Fraud detection aims to discover fraudsters deceiving other users by, for example, leaving fake reviews or making abnormal transactions. Graph-based fraud detection methods consider this task as a classification problem with two classes: frauds or normal. We address this problem using Graph Neural Networks (GNNs) by proposing a dynamic relation-attentive aggregation mechanism. Based on the observation that many real-world graphs include different types of relations, we propose to learn a node representation per relation and aggregate the node representations using a learnable attention function that assigns a different attention coefficient to each relation. Furthermore, we combine the node representations from different layers to consider both the local and global structures of a target node, which is beneficial to improving the performance of fraud detection on graphs with heterophily. By employing dynamic graph attention in all the aggregation processes, our method adaptively computes the attention coefficients for each node. Experimental results show that our method, DRAG, outperforms state-of-the-art fraud detection methods on real-world benchmark datasets.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.