In this paper, we find a necessary and sufficient condition for multi-twisted Reed-Solomon codes to be MDS. In particular, we introduce a new class of MDS double-twisted Reed-Solomon codes $\mathcal{C}_{\bm \alpha, \bm t, \bm h, \bm \eta}$ with twists $\bm t = (1, 2)$ and hooks $\bm h = (0, 1)$ over the finite field $\mathbb{F}_q$, providing a non-trivial example over $\mathbb{F}_{16}$ and an enumeration over finite fields of size up to 17. Moreover, we obtain necessary conditions for the existence of multi-twisted Reed-Solomon codes with small-dimensional hull. Consequently, we derive conditions for the existence of MDS multi-twisted Reed-Solomon codes with small-dimensional hull.
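As a quick illustration of the construction (a sketch only: the evaluation points and twist coefficients below are arbitrary choices, not the paper's $\mathbb{F}_{16}$ example), the following snippet builds the generator matrix of a double-twisted Reed-Solomon code with twists $\bm t = (1, 2)$ and hooks $\bm h = (0, 1)$ using the `galois` package and checks the MDS property by brute force:

```python
import itertools
import numpy as np
import galois

GF = galois.GF(16)
alpha = GF([1, 2, 3, 4, 5, 6])   # n = 6 distinct nonzero evaluation points (assumed)
k = 3                            # code dimension
eta1, eta2 = GF(7), GF(9)        # nonzero twist coefficients (assumed)

# With hooks h = (0, 1) and twists t = (1, 2), the coefficient a_0 also
# multiplies x^k and a_1 also multiplies x^(k+1); the generator rows evaluate
# the basis polynomials 1 + eta1*x^k, x + eta2*x^(k+1), and x^2.
G = GF([(alpha**0 + eta1 * alpha**k).tolist(),
        (alpha**1 + eta2 * alpha**(k + 1)).tolist(),
        (alpha**2).tolist()])

def is_mds(G):
    """Brute-force MDS check: every k x k minor must be nonsingular."""
    k, n = G.shape
    return all(np.linalg.det(G[:, list(cols)]) != 0
               for cols in itertools.combinations(range(n), k))

print(is_mds(G))   # True iff this choice of alpha, eta yields an MDS code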
We explore different aspects of cognitive diversity and its effect on the success of group deliberation. To evaluate this, we use the DeliData corpus, comprising 500 dialogues from small, online groups discussing the Wason Card Selection task. Leveraging the corpus, we perform a quantitative analysis of three different measures of cognitive diversity. First, we analyse the effect of group size as a proxy for diversity. Second, we evaluate the effect of the size of the initial idea pool. Finally, we look into the content of the discussion by analysing discussed solutions, discussion patterns, and how conversational probing can improve those characteristics. Despite the reputation of groups for compounding bias, we show that small groups can, through dialogue, overcome intuitive biases and improve individual decision-making. Across a large sample and different operationalisations, we consistently find that greater cognitive diversity is associated with more successful group deliberation. Code and data used for the analysis are available in the anonymised repository: //anonymous.4open.science/r/cogsci24-FD6D
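For concreteness, a minimal sketch of one operationalisation, the size of the initial idea pool; the file and column names are assumptions for illustration, not the actual DeliData schema:

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Hypothetical per-participant export: group_id, initial_solution, group_improved.
df = pd.read_csv("delidata_groups.csv")

# Idea-pool size = number of distinct solutions submitted before discussion.
pool = (df.groupby("group_id")["initial_solution"]
          .nunique()
          .rename("idea_pool_size"))
success = df.groupby("group_id")["group_improved"].first()

r, p = pointbiserialr(success.astype(int), pool)
print(f"idea-pool size vs. deliberation success: r={r:.2f}, p={p:.3f}")
```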
In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD), a recent particle-based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive, as it requires evolving a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that runs several interacting particle dynamics in parallel, corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application, we provide a numerical experiment for a PDE-driven inverse problem, which confirms the speed-up suggested by our theoretical results.
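For reference, a minimal single-level SVGD update in NumPy; the multilevel method runs several such particle systems with level-dependent likelihood approximations, and this sketch shows only the basic step repeated at every level:

```python
import numpy as np

def svgd_step(X, grad_log_p, eps=1e-2):
    """One SVGD update. X: (N, d) particles; grad_log_p maps (N, d) -> (N, d)."""
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]            # pairwise differences
    sq = np.sum(diff**2, axis=-1)                   # pairwise squared distances
    h = np.median(sq) / np.log(N + 1) + 1e-12       # median bandwidth heuristic
    K = np.exp(-sq / h)                             # RBF kernel matrix (symmetric)
    attract = K @ grad_log_p(X)                     # kernel-weighted score term
    repulse = (2.0 / h) * (K.sum(1)[:, None] * X - K @ X)  # kernel-gradient term
    return X + eps * (attract + repulse) / N

# Example: standard Gaussian target, so grad log p(x) = -x.
X = np.random.randn(100, 2) + 5.0
for _ in range(500):
    X = svgd_step(X, lambda X: -X)
```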
In this paper we consider four different features of the SAT solver CaDiCaL (blocked clause elimination, vivification, on-the-fly self-subsumption, and increasing the bound of variable elimination) and evaluate them on the SAT Competition benchmarks from 2009 to 2022. We study these features by both activating them one by one and deactivating them one by one. We have three hypotheses regarding the experiments: (i) disabling features is always harmful; (ii) the life span of the techniques is limited; and (iii) features simulate each other. Our experiments cannot confirm any of these hypotheses.
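A sketch of the ablation protocol; the flag names follow CaDiCaL's `--no-<option>` convention but are assumptions here, so verify them against `cadical --help` for the version under test:

```python
import subprocess

instances = ["bench/example1.cnf", "bench/example2.cnf"]   # competition CNF paths

# Assumed option names; the variable-elimination bound is a value option and
# would need the exact name from the solver's help output.
disable = {"blocked clause elimination": "--no-block",
           "vivification": "--no-vivify",
           "on-the-fly self-subsumption": "--no-otfs"}

def solve(instance, flags=(), timeout=5000):
    try:
        r = subprocess.run(["cadical", *flags, instance],
                           capture_output=True, text=True, timeout=timeout)
        return r.returncode            # 10 = SAT, 20 = UNSAT (competition convention)
    except subprocess.TimeoutExpired:
        return None                    # counted as unsolved

baseline = {i: solve(i) for i in instances}
for feature, flag in disable.items():  # deactivate one feature at a time
    ablated = {i: solve(i, [flag]) for i in instances}
    solved = sum(v in (10, 20) for v in ablated.values())
    print(f"without {feature}: solved {solved}/{len(instances)}")
```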
In this paper, we consider feature screening for ultrahigh-dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in the different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important cluster-relevant features have heterogeneous components in their mixture distributions, whereas unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish tail probability bounds of the EM-test statistic for homogeneous and heterogeneous features, and further show that the proposed screening procedure achieves the sure independence screening property and even consistency in selection. The limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, accurately screens for important cluster-relevant features, and helps to significantly improve clustering, as demonstrated in our extensive simulation and real data analyses.
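To make the screening idea concrete, here is an illustrative per-feature homogeneity statistic using a plain likelihood ratio between one- and two-component Gaussian mixture fits; it is a stand-in for the penalized EM-test statistic, not the paper's exact procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def homogeneity_stat(x):
    """Large values suggest a heterogeneous (cluster-relevant) feature."""
    x = x.reshape(-1, 1)
    ll1 = GaussianMixture(1).fit(x).score(x)             # homogeneous fit
    ll2 = GaussianMixture(2, n_init=5).fit(x).score(x)   # two-component fit
    return 2 * len(x) * (ll2 - ll1)                      # LRT-style statistic

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))     # n = 200 samples, p = 200 features
X[:100, :10] += 3.0                     # only the first 10 features carry clusters

stats = np.array([homogeneity_stat(X[:, j]) for j in range(X.shape[1])])
screened = np.argsort(stats)[::-1][:20] # keep top-ranked features for clustering
```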
In this paper, we introduce a novel Distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM), employing the Master/Worker architecture. Our non-parametric co-clustering algorithm divides observations and features into partitions using latent multivariate Gaussian block distributions. The workload on rows is evenly distributed among the workers, which communicate exclusively with the master and not among themselves. Experimental results demonstrate the impact of DisNPLBM on cluster labeling accuracy and execution time. Moreover, we present a real use case, applying our approach to co-cluster gene expression data. The source code is publicly available at //github.com/redakhoufache/Distributed-NPLBM.
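A schematic of the Master/Worker layout (not the authors' implementation; the function body and all names are placeholders): rows are split evenly across workers, each worker runs a local step on its row block, and messages flow only between workers and the master.

```python
import numpy as np
from multiprocessing import Pool

def local_step(args):
    rows, col_partition = args
    # Placeholder for the worker's local move; a real worker would resample
    # row-cluster assignments for its block given the master's column partition.
    return rows.mean(axis=0)

if __name__ == "__main__":
    X = np.random.randn(10_000, 50)
    n_workers = 4
    blocks = np.array_split(X, n_workers)            # even row workload per worker
    col_partition = np.zeros(X.shape[1], dtype=int)  # master's column clustering

    with Pool(n_workers) as pool:                    # master scatters and gathers
        stats = pool.map(local_step, [(b, col_partition) for b in blocks])
```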
In this paper, we consider interpolation by \textit{completely monotone} polynomials (CMPs for short), that is, polynomials with non-negative real coefficients. In particular, given a finite set $S\subset \mathbb{R}_{>0} \times \mathbb{R}_{\geq 0}$, we consider \textit{the minimal polynomial} of $S$, introduced by Berg [1985], which is `minimal' in the sense that it is eventually majorized by all other CMPs interpolating $S$. We give an upper bound on the degree of the minimal polynomial of $S$, when it exists. Furthermore, we give another algorithm for computing the minimal polynomial of a given $S$, which utilizes an order structure on sign sequences. Applying the upper bound above, we also analyze the computational complexity of algorithms for computing minimal polynomials, including ours.
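As a simple baseline (distinct from both Berg's algorithm and ours), deciding whether some CMP of degree at most $d$ interpolates $S$ is a linear-programming feasibility problem in the non-negative coefficients $a_0, \dots, a_d$:

```python
import numpy as np
from scipy.optimize import linprog

def cmp_interpolant(S, d):
    """Return coefficients a_0..a_d >= 0 with sum_j a_j x_i^j = y_i, or None."""
    xs, ys = map(np.array, zip(*S))
    V = np.vander(xs, d + 1, increasing=True)    # V[i, j] = x_i**j
    res = linprog(c=np.zeros(d + 1),             # pure feasibility: zero objective
                  A_eq=V, b_eq=ys,
                  bounds=[(0, None)] * (d + 1))  # non-negativity of coefficients
    return res.x if res.success else None

# y = x^3 + 1 interpolates these points with coefficients (1, 0, 0, 1) >= 0.
print(cmp_interpolant([(1.0, 2.0), (2.0, 9.0), (3.0, 28.0)], d=3))
```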
In this paper, we propose a novel approach to testing the equality of high-dimensional mean vectors of several populations via the weighted $L_2$-norm. We establish the asymptotic normality of the test statistics under the null hypothesis. We also explain theoretically why our test statistics can be highly useful in weakly dense cases, i.e., when the nonzero signal in the mean vectors is weak but dense. Furthermore, we compare the proposed test with existing tests through simulations, demonstrating that the weighted $L_2$-norm-based test statistic exhibits favorable properties in terms of both size and power.
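Schematically, in the two-sample case such a statistic has the shape $\sum_j w_j (\bar X_j - \bar Y_j)^2$; the sketch below uses inverse-variance weights, which are an illustrative assumption rather than the weighting analyzed in the paper, and the paper itself covers several populations:

```python
import numpy as np

def weighted_l2_stat(X, Y):
    """Weighted L2 distance between the two sample mean vectors."""
    n1, n2 = len(X), len(Y)
    diff = X.mean(0) - Y.mean(0)
    pooled_var = X.var(0, ddof=1) / n1 + Y.var(0, ddof=1) / n2
    w = 1.0 / pooled_var                 # assumed inverse-variance weights
    return np.sum(w * diff**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))
Y = rng.standard_normal((80, 500))
Y[:, :200] += 0.2                        # weakly dense signal: many small shifts
print(weighted_l2_stat(X, Y))
```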
In this paper, we provide an analysis of a recently proposed multicontinuum homogenization technique. The analysis differs from those used in classical homogenization methods for several reasons. First, the cell problems in multicontinuum homogenization are constrained problems and cannot be directly substituted into the differential operator. Second, the problem contains high contrast that remains in the homogenized problem: the homogenized problem averages the microstructure while still containing the small parameter. In this analysis, we first build on our previous technique, CEM-GMsFEM, to define a CEM downscaling operator that maps the multicontinuum quantities to an approximate microscopic solution. Under a regularity assumption on the multicontinuum quantities, we construct a downscaling operator and the homogenized multicontinuum equations using linear approximations of the multicontinuum quantities. The error analysis follows from a residual estimate for the homogenized equations and a well-posedness assumption on the homogenized equations.
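Schematically, and in notation that is a simplification of the multicontinuum literature rather than this paper's exact setup, the downscaling operator reconstructs the microscopic solution from the continua $U_i$ via a linear expansion
$$ u(x) \;\approx\; \sum_i \varphi_i(x)\, U_i(x) \;+\; \sum_{i,m} \varphi_i^{(m)}(x)\, \partial_{x_m} U_i(x), $$
where the auxiliary functions $\varphi_i$ and $\varphi_i^{(m)}$ solve the constrained cell problems; the error analysis then bounds the residual obtained by inserting this ansatz into the homogenized equations.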
This paper investigates the signal detection problem in colored noise with an unknown covariance matrix. In particular, we focus on detecting an unknown non-random signal by capitalizing on the leading eigenvalue of the whitened sample covariance matrix as the test statistic (a.k.a. Roy's largest root test). Since the unknown signal is non-random, the whitened sample covariance matrix turns out to have a non-central $F$-distribution. This distribution assumes a singular or non-singular form depending on whether the number of observations $p$ is smaller or larger than the system dimensionality $m$ (i.e., $p \lessgtr m$). Therefore, we statistically characterize the leading eigenvalue of the singular and non-singular $F$-matrices by deriving their cumulative distribution functions (c.d.f.s), which we then use to derive the corresponding receiver operating characteristic (ROC) profiles. We also extend our analysis to the high-dimensional regime. It turns out that, when the signal is sufficiently strong, the leading eigenvalue can reliably detect it in this regime; weak signals, however, cannot be detected in the high-dimensional regime with the leading eigenvalue.
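A minimal sketch of the detector; the dimensions, signal strength, and threshold below are illustrative, and in practice the threshold would be set from the derived c.d.f.s to meet a target false-alarm rate:

```python
import numpy as np
from scipy.linalg import eigh

m, p, n_noise = 8, 32, 64                # dimension, primary and noise-only samples
rng = np.random.default_rng(0)

s = rng.standard_normal(m)               # unknown non-random signal (assumed here)
X = rng.standard_normal((p, m)) + 0.5 * s      # primary data: signal plus noise
N = rng.standard_normal((n_noise, m))          # secondary, noise-only data

S = X.T @ X / p                          # sample covariance with signal present
Sigma_hat = N.T @ N / n_noise            # noise covariance estimate for whitening

# Leading eigenvalue of the whitened sample covariance = largest generalized
# eigenvalue of the pair (S, Sigma_hat); this is the F-matrix statistic.
lam_max = eigh(S, Sigma_hat, eigvals_only=True)[-1]
detect = lam_max > 2.5                   # illustrative threshold
print(lam_max, detect)
```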
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/DailyMail dataset, outperforming the previous best-performing system by 1.65 on ROUGE-L. The code to reproduce our results is available at //github.com/nlpyang/BertSum
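As a rough illustration of the extractive idea (this is not the BERTSUM implementation in the repository above, which inserts [CLS]/[SEP] tokens per sentence and learns summarization layers on top), one can rank sentences by their BERT [CLS] embeddings:

```python
import torch
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

sentences = ["The storm closed three highways.",
             "Officials expect repairs to take a week.",
             "In other news, the fair opens Friday."]

with torch.no_grad():
    enc = tok(sentences, padding=True, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0]   # one [CLS] vector per sentence
    doc = cls.mean(0)                           # crude document representation
    scores = torch.nn.functional.cosine_similarity(cls, doc.expand_as(cls))

# Pick the two sentences closest to the document vector as the "summary".
summary = [s for _, s in sorted(zip(scores.tolist(), sentences), reverse=True)[:2]]
print(summary)
```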