Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0>0$ design parameters. It is most common in practice to take $\rho=0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense. The paper revisits step-size selection in a general Markovian setting. Under readily verifiable assumptions, the following conclusions are obtained provided $0<\rho<1$: $\bullet$ Parameter estimates converge with probability one, and also in $L_p$ for any $p\ge 1$. $\bullet$ The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging. $\bullet$ For linear stochastic approximation the source of slow convergence is identified: for any $\rho\in (0,1)$, averaging results in estimates for which the error $\textit{covariance}$ vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the $\textit{bias}$ converges to zero at rate $O(\alpha_n)$. This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho =0$ or even $\rho<1/2$ is justified only in select settings -- In general, bias may preclude fast convergence.
Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings. To this end, adversarial robustness is a standard framework, which views robustness of predictions through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. However, this binary perspective does not account for the degrees of vulnerability, as data points with a larger number of misclassified examples in their neighborhoods are more vulnerable. In this work, we consider a complementary framework for robustness, called average-case robustness, which measures the fraction of points in a local region that provides consistent predictions. However, computing this quantity is hard, as standard Monte Carlo approaches are inefficient especially for high-dimensional inputs. In this work, we propose the first analytical estimators for average-case robustness for multi-class classifiers. We show empirically that our estimators are accurate and efficient for standard deep learning models and demonstrate their usefulness for identifying vulnerable data points, as well as quantifying robustness bias of models. Overall, our tools provide a complementary view to robustness, improving our ability to characterize model behaviour.
Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed in some downstream tasks, data augmentation may introduce an unfair impact on classifications. While it can improve the performance of some classes, it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose a FAir Classification approach with a Two-player game (FACT). We first formulate the training of a classifier with data augmentation as a fair optimization problem, which can be further written as an adversarial two-player game. Following this formulation, we propose a novel multiplicative weight optimization algorithm, for which we theoretically prove that it can converge to a solution that is fair over classes. Interestingly, our formulation also reveals that this fairness issue over classes is not due to data augmentation only, but is in fact a general phenomenon. Our empirical experiments demonstrate that the performance of our learned classifiers is indeed more fairly distributed over classes in five datasets, with only limited impact on the average accuracy.
In the dynamic field of Software Engineering (SE), where practice is constantly evolving and adapting to new technologies, conducting research is a daunting quest. This poses a challenge for researchers: how to stay relevant and effective in their studies? Empirical Software Engineering (ESE) has emerged as a contending force aiming to critically evaluate and provide knowledge that informs practice in adopting new technologies. Empirical research requires a rigorous process of collecting and analyzing data to obtain evidence-based findings. Challenges to this process are numerous, and many researchers, novice and experienced, found difficulties due to many complexities involved in designing their research. The core of this chapter is to teach foundational skills in research design, essential for educating software engineers and researchers in ESE. It focuses on developing a well-structured research design, which includes defining a clear area of investigation, formulating relevant research questions, and choosing appropriate methodologies. While the primary focus is on research design, this chapter also covers aspects of research scoping and selecting research methods. This approach prepares students to handle the complexities of the ever-changing technological landscape in SE, making it a critical component of their educational curriculum.
Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complimentary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature representations. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos, where complex spatiotemporal relationships may be captured. Most of the existing fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complimentary nature of audio-visual (A-V) modalities. We introduce a cross-attentional fusion approach to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. Our new cross-attentional A-V fusion model efficiently leverages the inter-modal relationships. In particular, it computes cross-attention weights to focus on the more contributive features across individual modalities, and thereby combine contributive feature representations, which are then fed to fully connected layers for the prediction of valence and arousal. The effectiveness of the proposed approach is validated experimentally on videos from the RECOLA and Fatigue (private) data-sets. Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches. Code is available: \url{//github.com/praveena2j/Cross-Attentional-AV-Fusion}
Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.
Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed data in recommendation systems. Additionally, adding data from other domains can worsen the long-tail characteristics of the entire dataset, making it harder to train CDR models effectively. Recent studies have shown that hyperbolic methods are particularly suitable for modeling long-tail distributions, which has led us to explore hyperbolic representations for users and items in CDR scenarios. However, due to the distinct characteristics of the different domains, applying hyperbolic representation learning to CDR tasks is quite challenging. In this paper, we introduce a new framework called Hyperbolic Contrastive Learning (HCTS), designed to capture the unique features of each domain while enabling efficient knowledge transfer between domains. We achieve this by embedding users and items from each domain separately and mapping them onto distinct hyperbolic manifolds with adjustable curvatures for prediction. To improve the representations of users and items in the target domain, we develop a hyperbolic contrastive learning module for knowledge transfer. Extensive experiments on real-world datasets demonstrate that hyperbolic manifolds are a promising alternative to Euclidean space for CDR tasks.
Despite the recent progress in deep learning, most approaches still go for a silo-like solution, focusing on learning each task in isolation: training a separate neural network for each individual task. Many real-world problems, however, call for a multi-modal approach and, therefore, for multi-tasking models. Multi-task learning (MTL) aims to leverage useful information across tasks to improve the generalization capability of a model. This thesis is concerned with multi-task learning in the context of computer vision. First, we review existing approaches for MTL. Next, we propose several methods that tackle important aspects of multi-task learning. The proposed methods are evaluated on various benchmarks. The results show several advances in the state-of-the-art of multi-task learning. Finally, we discuss several possibilities for future work.
Geometric deep learning (GDL), which is based on neural network architectures that incorporate and process symmetry information, has emerged as a recent paradigm in artificial intelligence. GDL bears particular promise in molecular modeling applications, in which various molecular representations with different symmetry properties and levels of abstraction exist. This review provides a structured and harmonized overview of molecular GDL, highlighting its applications in drug discovery, chemical synthesis prediction, and quantum chemistry. Emphasis is placed on the relevance of the learned molecular features and their complementarity to well-established molecular descriptors. This review provides an overview of current challenges and opportunities, and presents a forecast of the future of GDL for molecular sciences.
The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.