This paper discusses the constraints, difficulties, and solutions involved in the university course allocation problem. A hybrid evolutionary algorithm is defined that combines a Local Repair Algorithm with a Modified Genetic Algorithm to generate the best course assignment. After analyzing the collected dataset, all the necessary constraints were formulated. These constraints cover the aspects that must be considered when preparing clash-free and efficient class schedules for every faculty member. The goal is to generate an optimized solution that fulfills those constraints while maintaining time efficiency and reducing the workload of handling this task manually. The proposed algorithm was compared with several baseline optimization algorithms to demonstrate its better accuracy and running time.
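To make the idea of pairing a genetic search with a repair step concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each course has a fixed time slot, that a clash means one faculty member holding two courses in the same slot, and that a hypothetical preference matrix `pref[faculty][course]` serves as the fitness. The names `clashes`, `local_repair`, and `evolve` are illustrative.

```python
import random

def clashes(assign, slots):
    """Count courses whose (faculty, time slot) pair is already taken."""
    seen, count = set(), 0
    for course, fac in enumerate(assign):
        key = (fac, slots[course])
        if key in seen:
            count += 1
        seen.add(key)
    return count

def local_repair(assign, slots, n_faculty):
    """Move each clashing course to the first faculty member free in that slot."""
    assign, busy = list(assign), set()
    for course, fac in enumerate(assign):
        if (fac, slots[course]) in busy:
            for alt in range(n_faculty):
                if (alt, slots[course]) not in busy:
                    assign[course] = alt
                    break
        busy.add((assign[course], slots[course]))
    return assign

def evolve(slots, pref, n_faculty, rng, pop_size=20, generations=50):
    """Toy GA: one-point crossover, point mutation, repair, elitist selection."""
    n = len(slots)

    def score(a):
        return sum(pref[f][c] for c, f in enumerate(a))

    pop = [local_repair([rng.randrange(n_faculty) for _ in range(n)], slots, n_faculty)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = p[:cut] + q[cut:]           # one-point crossover
            child[rng.randrange(n)] = rng.randrange(n_faculty)  # point mutation
            children.append(local_repair(child, slots, n_faculty))
        pop = parents + children
    return max(pop, key=score)

# Hypothetical toy instance: six courses in three slots, three faculty members.
slots = [0, 0, 1, 1, 2, 2]
pref = [[3, 1, 2, 1, 1, 2], [1, 3, 1, 2, 2, 1], [2, 2, 3, 3, 1, 1]]
best = evolve(slots, pref, n_faculty=3, rng=random.Random(0))
```

Because every individual passes through `local_repair`, the search only ever compares clash-free schedules as long as no slot holds more courses than there are faculty members. The genetic operators are then free to optimize the soft objective.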
Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. However, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as a drop-in replacement for permutation-based methods.
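A small sketch may help show what conditional permutation means in practice. It is not the authors' implementation: it assumes only two covariates, a linear fit for the conditional expectation of one covariate given the other, and a squared-error loss, and it omits the statistical test. The helpers `fit_line`, `mse`, and `conditional_permutation_importance` are hypothetical names.

```python
import random

def fit_line(u, v):
    """Ordinary least squares for v ~ a + b * u with a single predictor."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    b = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / sum((ui - mu) ** 2 for ui in u)
    return mv - b * mu, b

def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)

def conditional_permutation_importance(model, X, y, j, rng):
    """Loss increase after shuffling only the part of x_j not explained by the other covariate."""
    k = 1 - j                                   # index of the other covariate (two-column case)
    xj = [row[j] for row in X]
    xk = [row[k] for row in X]
    a, b = fit_line(xk, xj)                     # estimate E[x_j | x_k]
    fitted = [a + b * v for v in xk]
    resid = [u - f for u, f in zip(xj, fitted)]
    rng.shuffle(resid)                          # permute residuals, not raw values
    X_tilde = []
    for row, f, r in zip(X, fitted, resid):
        new_row = list(row)
        new_row[j] = f + r                      # rebuilt covariate keeps its link to x_k
        X_tilde.append(new_row)
    return mse(model, X_tilde, y) - mse(model, X, y)

# Synthetic data: x2 is correlated with x1 but has no effect on y.
rng = random.Random(0)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * u + 0.6 * rng.gauss(0, 1) for u in x1]
y = [2 * u + 0.1 * rng.gauss(0, 1) for u in x1]
X = [list(pair) for pair in zip(x1, x2)]

def model(row):
    """Stand-in for a fitted learner that happens to use only the first covariate."""
    return 2 * row[0]

imp_x1 = conditional_permutation_importance(model, X, y, 0, rng)
imp_x2 = conditional_permutation_importance(model, X, y, 1, rng)
```

Shuffling residuals rather than raw values keeps the permuted covariate consistent with the one it is correlated with, so the model is not evaluated on implausible input combinations.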
We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to increase the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.
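The equivariance check and the resulting augmentation can be sketched as follows. This is an illustration under assumptions, not the paper's method: the dynamics model is a plain Python callable `model(state, action)`, the check uses a fixed tolerance, and the entropy regularizer and reward handling are omitted. All function names are hypothetical.

```python
def translate(vec, t):
    return [v + ti for v, ti in zip(vec, t)]

def is_translation_equivariant(model, s, a, t, tol=1e-6):
    """Check whether model(s + t, a) matches model(s, a) + t at one sample."""
    lhs = model(translate(s, t), a)
    rhs = translate(model(s, a), t)
    return max(abs(l - r) for l, r in zip(lhs, rhs)) <= tol

def augment(dataset, model, translations, tol=1e-6):
    """Append translated transitions wherever the model passes the check."""
    out = list(dataset)
    for s, a, s_next in dataset:
        for t in translations:
            if is_translation_equivariant(model, s, a, t, tol):
                out.append((translate(s, t), a, translate(s_next, t)))
    return out

def integrator(s, a):
    """Toy dynamics s' = s + a, which is translation equivariant."""
    return [si + ai for si, ai in zip(s, a)]

def contracting(s, a):
    """Toy dynamics s' = 0.5 * s + a, which is not translation equivariant."""
    return [0.5 * si + ai for si, ai in zip(s, a)]

dataset = [([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])]
shifts = [[2.0, -1.0]]
augmented = augment(dataset, integrator, shifts)
unchanged = augment(dataset, contracting, shifts)
```

An off-the-shelf offline RL algorithm would then be trained on `augmented` in place of the original dataset.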
This paper proposes Asynchronous Triggered Gradient Tracking, i.e., a distributed optimization algorithm to solve consensus optimization over networks with asynchronous communication. As a building block, we devise the continuous-time counterpart of the recently proposed (discrete-time) distributed gradient tracking called Continuous Gradient Tracking. By using a Lyapunov approach, we prove exponential stability of the equilibrium corresponding to agents' estimates being consensual to the optimal solution, with arbitrary initialization of the local estimates. Then, we propose two triggered versions of the algorithm. In the first one, the agents continuously integrate their local dynamics and exchange with neighbors their current local variables in a synchronous way. In Asynchronous Triggered Gradient Tracking, we propose a totally asynchronous scheme in which each agent sends to neighbors its current local variables based on a triggering condition that depends on a locally verifiable condition. The triggering protocol preserves the linear convergence of the algorithm and avoids the Zeno behavior, i.e., an infinite number of triggering events over a finite interval of time is excluded. By using the stability analysis of Continuous Gradient Tracking as a preparatory result, we show exponential stability of the equilibrium point holds for both triggered algorithms and any estimate initialization. Finally, the simulations validate the effectiveness of the proposed methods on a data analytics problem, showing also improved performance in terms of inter-agent communication.
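For orientation, the discrete-time gradient tracking scheme that the continuous-time and triggered variants build on can be sketched in a few lines. This is the standard synchronous iteration, not the proposed asynchronous algorithm. The scalar quadratic local costs, the weight matrix `W`, and the step size are assumptions chosen for illustration.

```python
def gradient_tracking(a, b, W, alpha=0.05, iters=2000):
    """Discrete-time gradient tracking on local costs f_i(x) = 0.5 * a[i] * (x - b[i])**2."""
    n = len(a)

    def grad(i, x):
        return a[i] * (x - b[i])

    x = [0.0] * n                          # local estimates
    s = [grad(i, x[i]) for i in range(n)]  # trackers start at the local gradients
    for _ in range(iters):
        # consensus step on the estimates, then a step along the tracked gradient
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * s[i] for i in range(n)]
        # consensus step on the trackers plus the change in the local gradient
        s = [sum(W[i][j] * s[j] for j in range(n)) + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x

a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # doubly stochastic weights
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)          # minimizer of the summed cost
estimates = gradient_tracking(a, b, W)
```

Each agent only combines values from its neighbors and its own gradient, yet all estimates approach the minimizer of the sum of the local costs. The triggered schemes described above aim to keep this behavior while exchanging variables less often.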
Incidence estimation of HIV infection can be performed using recent infection testing algorithm (RITA) results from a cross-sectional sample. This allows practitioners to understand population trends in the HIV epidemic without having to perform longitudinal follow-up on a cohort of individuals. The utility of the approach is limited by its precision, driven by the (low) sensitivity of the RITA at identifying recent infection. By utilizing results of previous HIV tests that individuals may have taken, we consider an enhanced RITA with increased sensitivity (and specificity). We use it to propose an enhanced estimator for incidence estimation. We prove the theoretical properties of the enhanced estimator and illustrate its numerical performance in simulation studies. We apply the estimator to data from a cluster-randomized trial to study the effect of community-level HIV interventions on HIV incidence. We demonstrate that the enhanced estimator provides a more precise estimate of HIV incidence compared to the standard estimator.
This paper surveys research in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we systematically review the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities and domains, along with an analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc.). We also review the potential pitfalls of IT and criticism against it, as well as efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
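As an illustration of how an (instruction, output) pair might become a supervised training example, here is a toy sketch. It assumes whitespace tokenization and the common (but not universal) practice of computing the loss only on output tokens. The function `build_example` and the growing `vocab` dictionary are hypothetical, and the survey does not prescribe this exact recipe.

```python
IGNORE = -100  # label value commonly skipped by the loss (PyTorch's default ignore_index)

def build_example(instruction, output, vocab):
    """Turn one (instruction, output) pair into input ids and per-position labels."""
    def encode(text):
        # toy tokenizer: split on whitespace and assign ids on first sight
        return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

    prompt_ids = encode(instruction)
    output_ids = encode(output)
    input_ids = prompt_ids + output_ids
    # mask the instruction so only output tokens contribute to the loss;
    # shifting labels by one position is left to the training loop
    labels = [IGNORE] * len(prompt_ids) + output_ids
    return input_ids, labels

vocab = {}
input_ids, labels = build_example("Translate to French : cat", "chat", vocab)
```

The model still sees the whole sequence as input, while the masked labels restrict the supervised signal to the output part.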
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
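The decomposition can be written out for the simplest case of a single subview. The notation is an assumption made for illustration: $x'$ is a subview of $x$ (hence a deterministic function of $x$), $f$ is a critic, and $y_1, \dots, y_K$ are candidates with $y_1 = y$ drawn jointly with $x$. The second expression is the familiar InfoNCE-style bound, which cannot exceed $\log K$ and therefore underestimates large MI values.

```latex
% Chain rule with one subview x' of x
I(x; y) = I(x', x; y) = I(x'; y) + I(x; y \mid x')

% With nested subviews x^1, ..., x^N = x, each a function of the next
I(x; y) = I(x^1; y) + \sum_{i=2}^{N} I(x^i; y \mid x^{i-1})

% Contrastive bound and its ceiling
I_{\mathrm{NCE}} = \mathbb{E}\left[ \log \frac{e^{f(x, y_1)}}{\frac{1}{K} \sum_{k=1}^{K} e^{f(x, y_k)}} \right],
\qquad I_{\mathrm{NCE}} \le I(x; y), \qquad I_{\mathrm{NCE}} \le \log K
```

Since each term in the sum carries only part of the total MI, a bound capped at $\log K$ is less restrictive for the individual terms than for the full quantity.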
This paper surveys recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), which have probably been the most prominent machine learning models for large-scale data over the past decade. We generalize the formulation of classification margins from classical research to the latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since the methods take different viewpoints, we categorize them into groups for ease of comparison and discussion. We hope that our discussion and overview inspire new research in the community that aims to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence for why certain regularizations for DNNs function well in practice. We have shortened the paper so that the crucial spirit of large margin learning and related methods is better emphasized.
While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting in which the model is randomly initialized. In this work, we consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm that can utilize both powerful pre-trained models from the source domain and labeled/unlabeled data in the target domain. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization that consists of two complementary components: Adaptive Knowledge Consistency (AKC) on the examples between the source and target model, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples. Examples involved in the consistency regularization are adaptively selected according to their potential contributions to the target task. We conduct extensive experiments on several popular benchmarks, including CUB-200-2011, MIT Indoor-67, and MURA, by fine-tuning the ImageNet pre-trained ResNet-50 model. Results show that our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch. Moreover, our algorithm is orthogonal to existing methods and thus able to gain additional improvements on top of MixMatch and FixMatch. Our code is available at //github.com/SHI-Labs/Semi-Supervised-Transfer-Learning.
This paper surveys the machine learning literature and presents machine learning as optimization models. Such models can benefit from the advancement of numerical optimization techniques, which have already played a distinctive role in several machine learning settings. In particular, mathematical optimization models are presented for commonly used machine learning approaches for regression, classification, clustering, and deep neural networks, as well as new emerging applications in machine teaching and empirical model learning. The strengths and shortcomings of these models are discussed and potential research directions are highlighted.
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns an object part to each filter in a high conv-layer during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logic inside a CNN, i.e., which patterns the CNN bases its decisions on. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.