This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders: (1) suboptimal performances attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm that necessitates querying the target encoder every epoch. Specifically, we initially Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives. Demanding exponentially fewer queries compared to the end-to-end approach, prototypes can be instantiated to guide subsequent query-free training. For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning those matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across the board in various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely-used defenses.
The recent advances in natural language processing have predominantly favored well-resourced English-centric models, resulting in a significant gap with low-resource languages. In this work, we introduce the language model TURNA, which is developed for the low-resource language Turkish and is capable of both natural language understanding and generation tasks. TURNA is pretrained with an encoder-decoder architecture based on the unified framework UL2 with a diverse corpus that we specifically curated for this purpose. We evaluated TURNA with three generation tasks and five understanding tasks for Turkish. The results show that TURNA outperforms several multilingual models in both understanding and generation tasks, and competes with monolingual Turkish models in understanding tasks. TURNA is made available at //huggingface.co/boun-tabi-LMG/TURNA .
In this work, we introduce a novel approach to programming education - in-IDE courses implemented for IntelliJ-based IDEs via the JetBrains Academy Plugin. The primary objective of this approach is to address the challenge of familiarizing students with industrial technologies by moving all theory and practical materials to a professional IDE. This approach allows students to immediately use modern industrial tools as they are fully integrated into the learning process. We have already applied this approach in over 40 courses, and it successfully educates students across diverse topics such as Plugin Development, Algorithms, Data Analysis, and Language mastery in various programming languages, including Kotlin, Java, C++, and Python. Along with the paper, we are providing the community not only with a new way of learning and a set of ready-made courses but also a collection of helpful resources to assist educators in getting started with the plugin. Finally, we describe in detail an IDE plugin development course that demonstrates how the in-IDE approach covers complex topics easily.
In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations. The escalating complexity, involving system prompts, model-specific formats, instructions, and more, calls for a shift to a structured, modular, and customizable solution. Addressing this need, we present Unitxt, an innovative library for customizable textual data preparation and evaluation tailored to generative language models. Unitxt natively integrates with common libraries like HuggingFace and LM-eval-harness and deconstructs processing flows into modular components, enabling easy customization and sharing between practitioners. These components encompass model-specific formats, task prompts, and many other comprehensive dataset processing definitions. The Unitxt-Catalog centralizes these components, fostering collaboration and exploration in modern textual data workflows. Beyond being a tool, Unitxt is a community-driven platform, empowering users to build, share, and advance their pipelines collaboratively. Join the Unitxt community at //github.com/IBM/unitxt!
This paper investigates the unsourced random access (URA) problem with a massive multiple-input multiple-output receiver that serves wireless devices in the near-field of radiation. We employ an uncoupled transmission protocol without appending redundancies to the slot-wise encoded messages. To exploit the channel sparsity for block length reduction while facing the collapsed sparse structure in the angular domain of near-field channels, we propose a sparse channel sampling method that divides the angle-distance (polar) domain based on the maximum permissible coherence. Decoding starts with retrieving active codewords and channels from each slot. We address the issue by leveraging the structured channel sparsity in the spatial and polar domains and propose a novel turbo-based recovery algorithm. Furthermore, we investigate an off-grid compressed sensing method to refine discretely estimated channel parameters over the continuum that improves the detection performance. Afterward, without the assistance of redundancies, we recouple the separated messages according to the similarity of the users' channel information and propose a modified K-medoids method to handle the constraints and collisions involved in channel clustering. Simulations reveal that via exploiting the channel sparsity, the proposed URA scheme achieves high spectral efficiency and surpasses existing multi-slot-based schemes. Moreover, with more measurements provided by the overcomplete channel sampling, the near-field-suited scheme outperforms its counterpart of the far-field.
In causal inference with panel data under staggered adoption, the goal is to estimate and derive confidence intervals for potential outcomes and treatment effects. We propose a computationally efficient procedure, involving only simple matrix algebra and singular value decomposition. We derive non-asymptotic bounds on the entrywise error, establishing its proximity to a suitably scaled Gaussian variable. Despite its simplicity, our procedure turns out to be instance-optimal, in that our theoretical scaling matches a local instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. Using our insights, we develop a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage guarantees. Our analysis is based on a general inferential toolbox for the SVD algorithm applied to the matrix denoising model, which might be of independent interest.
In this paper, we present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two paralleled augmented views is formulated. After deriving the class tokens from the masked images by the Transformer encoder, three partial information learning modules are further incorporated, including the PISD module for training the auto-encoder via masked image reconstruction, the PICD module for employing two levels of contrastive learning, and the CLI module for mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments have been conducted on six real-world image datasets, which demononstrate the superior clustering performance of the proposed PICI approach over the state-of-the-art deep clustering approaches. The source code is available at //github.com/Regan-Zhang/PICI.
This paper studies transfer learning for estimating the mean of random functions based on discretely sampled data, where, in addition to observations from the target distribution, auxiliary samples from similar but distinct source distributions are available. The paper considers both common and independent designs and establishes the minimax rates of convergence for both designs. The results reveal an interesting phase transition phenomenon under the two designs and demonstrate the benefits of utilizing the source samples in the low sampling frequency regime. For practical applications, this paper proposes novel data-driven adaptive algorithms that attain the optimal rates of convergence within a logarithmic factor simultaneously over a large collection of parameter spaces. The theoretical findings are complemented by a simulation study that further supports the effectiveness of the proposed algorithms
With the advances of data-driven machine learning research, a wide variety of prediction problems have been tackled. It has become critical to explore how machine learning and specifically deep learning methods can be exploited to analyse healthcare data. A major limitation of existing methods has been the focus on grid-like data; however, the structure of physiological recordings are often irregular and unordered which makes it difficult to conceptualise them as a matrix. As such, graph neural networks have attracted significant attention by exploiting implicit information that resides in a biological system, with interactive nodes connected by edges whose weights can be either temporal associations or anatomical junctions. In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare. We provide an overview of these methods in a systematic manner, organized by their domain of application including functional connectivity, anatomical structure and electrical-based analysis. We also outline the limitations of existing techniques and discuss potential directions for future research.
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
This paper proposes a recommender system to alleviate the cold-start problem that can estimate user preferences based on only a small number of items. To identify a user's preference in the cold state, existing recommender systems, such as Netflix, initially provide items to a user; we call those items evidence candidates. Recommendations are then made based on the items selected by the user. Previous recommendation studies have two limitations: (1) the users who consumed a few items have poor recommendations and (2) inadequate evidence candidates are used to identify user preferences. We propose a meta-learning-based recommender system called MeLU to overcome these two limitations. From meta-learning, which can rapidly adopt new task with a few examples, MeLU can estimate new user's preferences with a few consumed items. In addition, we provide an evidence candidate selection strategy that determines distinguishing items for customized preference estimation. We validate MeLU with two benchmark datasets, and the proposed model reduces at least 5.92% mean absolute error than two comparative models on the datasets. We also conduct a user study experiment to verify the evidence selection strategy.