In this paper, we consider classes of decision tables closed under removal of attributes (columns) and changing of decisions attached to rows. For decision tables from closed classes, we study lower bounds on the minimum cardinality of reducts, which are minimal sets of attributes that allow us to recognize, for a given row, the decision attached to it. We assume that the number of rows in decision tables from the closed class is not bounded from above by a constant. We divide the set of such closed classes into two families. In one family, only standard lower bounds $\Omega (\log $ ${\rm cl}(T))$ on the minimum cardinality of reducts for decision tables hold, where ${\rm cl}(T)$ is the number of decision classes in the table $T$. In another family, these bounds can be essentially tightened up to $\Omega ({\rm cl}(T)^{1/q})$ for some natural $q$.
In this paper, we consider an active reconfigurable intelligent surface (RIS) to assist the multiuser downlink transmission in the presence of practical hardware impairments (HWIs), including the HWIs at the transceivers and the phase noise at the active RIS. The active RIS is deployed to amplify the incident signals to alleviate the multiplicative fading effect, which is a limitation in the conventional passive RIS-aided wireless systems. We aim to maximize the sum rate through jointly designing the transmit beamforming at the base station (BS), the amplification factors and the phase shifts at the active RIS. To tackle this challenging optimization problem effectively, we decouple it into two tractable subproblems. Subsequently, each subproblem is transformed into a second order cone programming problem. The block coordinate descent framework is applied to tackle them, where the transmit beamforming and the reflection coefficients are alternately designed. In addition, another efficient algorithm is presented to reduce the computational complexity. Specifically, by exploiting the majorization-minimization approach, each subproblem is reformulated into a tractable surrogate problem, whose closed-form solutions are obtained by Lagrange dual decomposition approach and element-wise alternating sequential optimization method. Simulation results validate the effectiveness of our developed algorithms, and reveal that the HWIs significantly limit the system performance of active RIS-empowered wireless communications. Furthermore, the active RIS noticeably boosts the sum rate under the same total power budget, compared with the passive RIS.
Numerous robust estimators exist as alternatives to the maximum likelihood estimator (MLE) when a completely observed ground-up loss severity sample dataset is available. However, the options for robust alternatives to MLE become significantly limited when dealing with grouped loss severity data, with only a handful of methods like least squares, minimum Hellinger distance, and optimal bounded influence function available. This paper introduces a novel robust estimation technique, the Method of Truncated Moments (MTuM), specifically designed to estimate the tail index of a Pareto distribution from grouped data. Inferential justification of MTuM is established by employing the central limit theorem and validating them through a comprehensive simulation study.
In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and scoring function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models are learning distinct distributions affected by the training data size. This is supported by the fact that the model reproducibility manifests in two distinct training regimes: (i) "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.
In this paper, we propose an efficient multi-stage algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented in trellis, e.g., finite state machines and Markov processes. We introduce a variation of List Viterbi Algorithm (LVA) to enable accurate recovery using much fewer tests than objectives, which efficiently gains from the correlated prior statistics structure. Our numerical results demonstrate that the proposed Multi-Stage GT (MSGT) algorithm can obtain the optimal Maximum A Posteriori (MAP) performance with feasible complexity in practical regimes, such as with COVID-19 and sparse signal recovery applications, and reduce in the scenarios tested the number of pooled tests by at least $25\%$ compared to existing classical low complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed MSGT algorithm that guarantees its efficiency.
The wayward quality of continuous prompts stresses the importance of their interpretability as unexpected and unpredictable behaviors appear following training, especially in the context of large language models automating people-sensitive tasks such as resume screening. In this paper we present a novel method of constructing continuous prompts via discrete prompt embeddings and evaluate improvements to continuous prompt interpretability and inference accuracy. For a set of manually designed discrete prompts $\mathcal{D}$, which we tokenize and embed each into tensor form, we train a model to predict the weights such that the linear combinations of those prompts correspond to higher performance on natural language understanding tasks.
In this paper, we propose a new deinterleaving method for mixtures of discrete renewal Markov chains. This method relies on the maximization of a penalized likelihood score. It exploits all available information about both the sequence of the different symbols and their arrival times. A theoretical analysis is carried out to prove that minimizing this score allows to recover the true partition of symbols in the large sample limit, under mild conditions on the component processes. This theoretical analysis is then validated by experiments on synthetic data. Finally, the method is applied to deinterleave pulse trains received from different emitters in a RESM (Radar Electronic Support Measurements) context and we show that the proposed method competes favorably with state-of-the-art methods on simulated warfare datasets.
Random binning is a widely utilized tool in information theory, finding applications in various domains. In this paper, we focus on the output statistics of random binning (OSRB) using the Tsallis divergence $T_\alpha$. Our investigation encompasses all values of $\alpha$ within the range of $(0,\infty)$. The proofs provided in this paper cover both the achievability and converse aspects. To accommodate the unbounded nature of $T_\infty$, we analyze the OSRB framework using the R\'enyi's divergence criterion with the order of infinity, denoted as $D_\infty$. During our exploration of OSRB, we encounter a specific form of R\'enyi's conditional entropy and delve into its properties. Additionally, we demonstrate the effectiveness of this framework in establishing achievability results for wiretap channel, where Tsallis divergence serves as a security measure.
Dehumanization is a mental process that enables the exclusion and ill treatment of a group of people. In this paper, we present two data sets of dehumanizing text, a large, automatically collected corpus and a smaller, manually annotated data set. Both data sets include a combination of political discourse and dialogue from movie subtitles. Our methods give us a broad and varied amount of dehumanization data to work with, enabling further exploratory analysis and automatic classification of dehumanization patterns. Both data sets will be publicly released.
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.