Semantic communication is an emerging research topic that has recently attracted widespread attention. Despite this growing interest, there remains a notable absence of a comprehensive and widely accepted framework for characterizing semantic communication. This paper introduces a new conceptualization of semantic communication and formulates two fundamental problems, which we term language exploitation and language design. Our contention is that the challenge of language design can be effectively situated within the broader framework of joint source-channel coding theory, underpinned by a comprehensive end-to-end distortion metric. To tackle the language exploitation problem, we put forth three approaches: semantic encoding, semantic decoding, and a synergistic combination of both in the form of combined semantic encoding and decoding. Furthermore, we establish the semantic distortion-cost region as a critical framework for assessing the language exploitation problem. For each of the three proposed approaches, the achievable distortion-cost region is characterized. Overall, this paper aims to shed light on the intricate dynamics of semantic communication, paving the way for a deeper understanding of this evolving field.
Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically supported only on a lower-dimensional manifold, leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints with a new, efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular, and image data, demonstrating the competitive performance of the resulting model.
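As a rough, hedged illustration of the bottleneck idea above (not the paper's estimator), the sketch below trains a free-form encoder/decoder pair with a reconstruction term that keeps points on the learned manifold plus a standard-normal negative log-likelihood on the latent codes; the architecture, dimensions, and weighting are assumptions made for the example.

```python
# Illustrative sketch only: a free-form bottleneck autoencoder trained with
# reconstruction (stay on the manifold) + latent NLL (model the distribution).
import torch
import torch.nn as nn

class BottleneckFlow(nn.Module):
    def __init__(self, data_dim=64, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, data_dim))

    def loss(self, x, beta=1.0):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        recon = ((x - x_hat) ** 2).sum(dim=1).mean()   # projection onto the learned manifold
        nll = 0.5 * (z ** 2).sum(dim=1).mean()         # standard-normal NLL on latents (up to a constant)
        return recon + beta * nll

model = BottleneckFlow()
x = torch.randn(32, 64)                                # stand-in data batch
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
model.loss(x).backward()
opt.step()
```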
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small, human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements to either these methods or the model training. We observe two main causes of fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations appear to be a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods, incorporating the locality of pixels into robustness metrics and the diversity of pixel locations into attributions. Regarding the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets; however, this advantage disappears on larger datasets. Code is available at //github.com/ksandeshk/LENS.
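As a hedged illustration of how locality can soften the standard top-k intersection metric (this is not the paper's exact formulation), the sketch below counts a top-k pixel of the perturbed attribution as matched when it lies within a small window of some top-k pixel of the original; the window radius and k are arbitrary assumptions.

```python
# Locality-aware variant of top-k intersection: small local shifts in the
# attribution map are no longer fully penalized as they are by exact matching.
import numpy as np

def topk_coords(attr, k):
    flat = np.argsort(attr.ravel())[::-1][:k]
    return np.stack(np.unravel_index(flat, attr.shape), axis=1)

def local_topk_intersection(attr_orig, attr_pert, k=100, radius=2):
    orig, pert = topk_coords(attr_orig, k), topk_coords(attr_pert, k)
    matched = 0
    for p in pert:
        # Chebyshev distance <= radius means p falls in a window around some original top-k pixel
        if np.any(np.max(np.abs(orig - p), axis=1) <= radius):
            matched += 1
    return matched / k

a = np.random.rand(32, 32)
b = np.roll(a, shift=1, axis=0)          # a one-pixel shift of the same map
print(local_topk_intersection(a, b))     # close to 1.0, unlike exact top-k intersection
```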
Particle Swarm Optimization (PSO) has emerged as a powerful metaheuristic global optimization approach over the past three decades. Its appeal lies in its ability to tackle complex multidimensional problems that defy conventional algorithms. However, PSO faces challenges, such as premature stagnation in single-objective scenarios and the need to strike a balance between exploration and exploitation. Hybridizing PSO by integrating its cooperative nature with established optimization techniques from diverse paradigms offers a promising solution. In this paper, we investigate various strategies for synergizing gradient-based optimizers with PSO. We introduce different hybridization principles and explore several approaches, including sequential decoupled hybridization, coupled hybridization, and adaptive hybridization. These strategies aim to enhance the efficiency and effectiveness of PSO, ultimately improving its ability to navigate intricate optimization landscapes. By combining the strengths of gradient-based methods with the inherent social dynamics of PSO, we seek to address the critical objectives of intelligent exploration and exploitation in complex optimization tasks. Our study delves into the comparative merits of these hybridization techniques and offers insights into their application across different problem domains.
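As a minimal, hedged sketch of one of these principles, sequential decoupled hybridization, the code below runs a plain PSO pass for global exploration and then refines the swarm's best position with a few gradient-descent steps; the objective, hyper-parameters, and schedule are illustrative assumptions rather than settings from this study.

```python
# Sequential decoupled hybridization: PSO explores, gradient descent exploits.
import numpy as np

def f(x):                      # toy multimodal objective (Rastrigin)
    return np.sum(x**2 - 10*np.cos(2*np.pi*x) + 10, axis=-1)

def grad_f(x):
    return 2*x + 20*np.pi*np.sin(2*np.pi*x)

rng = np.random.default_rng(0)
pos = rng.uniform(-5, 5, (30, 2)); vel = np.zeros_like(pos)
pbest = pos.copy(); gbest = pos[np.argmin(f(pos))].copy()

for _ in range(200):           # exploration phase: standard PSO velocity/position update
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7*vel + 1.5*r1*(pbest - pos) + 1.5*r2*(gbest - pos)
    pos += vel
    better = f(pos) < f(pbest)
    pbest[better] = pos[better]
    gbest = pbest[np.argmin(f(pbest))].copy()

x = gbest.copy()
for _ in range(100):           # exploitation phase: local gradient refinement of the swarm best
    x -= 0.01 * grad_f(x)
print(f(gbest), f(x))
```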
Although mixed reality has been an emerging concept for years, recent technological and scientific advancements have now poised it to revolutionize industries and daily life by offering enhanced functionalities and improved services. In addition to reviewing the most highly cited of over a thousand research papers on mixed reality from the last 20 years, this systematic review surveys the state-of-the-art applications and utilities of mixed reality by primarily scrutinizing papers published in 2022 and 2023. We focus on the potential this technology has to provide digitally supported simulations and other utilities in the era of large language models, highlight the promise and limitations of the innovative solutions, and draw attention to emerging research directions such as telemedicine, remote control, and the optimization of direct volume rendering. The paper's associated repository is publicly accessible at //aizierjiang.github.io/mr.
Distinguishing cause from effect in bivariate observational data is a foundational problem with applications in many scientific disciplines. One solution is to assume that cause and effect are generated from a structural causal model, enabling identification of the causal direction after estimating the model in each direction. The heteroscedastic noise model is a type of structural causal model in which the cause can contribute to both the mean and the variance of the noise. Current methods for estimating heteroscedastic noise models use the Gaussian likelihood as the optimization objective, which can be suboptimal and unstable when the data have a non-Gaussian distribution. To address this limitation, we propose a novel approach to estimating this model with Student's $t$-distribution, which is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape. This adaptability is beneficial for capturing the parameters of the noise distribution in heteroscedastic noise models. Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
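As a hedged, simplified illustration of the estimation idea (not the authors' exact model class), the sketch below fits a heteroscedastic model y = m(x) + s(x)·ε by maximizing a Student's-t likelihood with a linear mean, a log-linear scale, and a fixed degrees-of-freedom ν; these functional forms are assumptions made for the example.

```python
# Fit a heteroscedastic noise model under a Student's-t likelihood.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 500)
y = 1.5*x + np.exp(0.5*x) * rng.standard_t(df=3, size=x.size)   # heavy-tailed, x-dependent noise

def neg_loglik(theta, nu=3.0):
    a, b, c, d = theta
    mean = a*x + b
    scale = np.exp(c*x + d)                    # positive scale via a log-link
    return -np.sum(stats.t.logpdf(y, df=nu, loc=mean, scale=scale))

res = optimize.minimize(neg_loglik, x0=np.zeros(4), method="L-BFGS-B")
print(res.x)   # recovered mean/scale parameters; the fitted likelihoods in the
               # two directions could then be compared to pick the causal one
```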
A smooth T-surface can be thought of as a generalization of a surface of revolution in which the axis of rotation is not fixed at one point but rather traces a smooth path on the base plane. Furthermore, the action by which the aforementioned surface is obtained need not be a mere rotation but can be any ``suitable'' planar equiform transformation applied to the points of a certain smooth profile curve. In analogy to the smooth setting, if the axis footpoints sweep a polyline on the base plane and the profile curve is chosen discretely, then a T-hedron (discrete T-surface) with trapezoidal faces is obtained. The goal of this article is to reconstruct a T-hedron from an already given point cloud of a T-surface. In doing so, a kinematic approach is taken: the algorithm first tries to find the aforementioned axis direction associated with the point cloud. It then finds the polygonal path through which the axis footpoint moves. Finally, by properly cutting the point cloud with the planes passing through the axis and its footpoints, it reconstructs the surface. The presented method is demonstrated by means of examples. From an applied point of view, the straightforwardness of the generation of these surfaces predestines them for building and design processes. In fact, one can find many built objects belonging to the sub-classes of T-surfaces, such as \emph{surfaces of revolution} and \emph{moulding surfaces}. Furthermore, the planarity of the faces of the discrete version paves the way for steel/glass construction in industry. Finally, these surfaces are also suitable for transformable designs as they allow an isometric deformation.
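As a hedged sketch of only the final cutting step described above (the axis-direction and footpoint-path estimation are not reproduced here), the snippet below extracts the points of a cloud that lie near a plane passing through an assumed axis direction and footpoint, which is how a discrete profile curve could be sampled from the data.

```python
# Extract a profile slice: points near a plane spanned by the axis direction
# and an in-plane direction through a given footpoint.
import numpy as np

def cut_with_axis_plane(points, footpoint, axis_dir, in_plane_dir, tol=1e-2):
    """Keep points within `tol` of the plane through `footpoint` spanned by
    `axis_dir` and `in_plane_dir` (both assumed to be unit vectors)."""
    normal = np.cross(axis_dir, in_plane_dir)
    dist = np.abs((points - footpoint) @ normal)
    return points[dist < tol]

pts = np.random.rand(1000, 3)                       # stand-in point cloud
profile = cut_with_axis_plane(pts,
                              footpoint=np.array([0.5, 0.5, 0.0]),
                              axis_dir=np.array([0.0, 0.0, 1.0]),
                              in_plane_dir=np.array([1.0, 0.0, 0.0]))
print(profile.shape)
```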
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
Self-supervised learning (SSL), dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to the training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying out the foundations and the latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.
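As one familiar ingredient from that menu, the hedged sketch below implements the normalized-temperature cross-entropy (NT-Xent / InfoNCE) loss used by contrastive recipes such as SimCLR; the batch size, embedding dimension, and temperature are arbitrary stand-ins for the knobs discussed above.

```python
# NT-Xent loss: each embedding's positive is the other augmented view of the
# same example; all remaining embeddings in the batch act as negatives.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # 2N x D, unit norm
    sim = z @ z.t() / temperature                             # pairwise cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])    # index of the positive view
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)           # two views of a batch
print(nt_xent(z1, z2).item())
```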
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
The attention model has become an important concept in neural networks and has been researched across diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review the different neural architectures in which attention has been incorporated, and also show how attention improves the interpretability of neural models. Finally, we discuss some applications in which modeling attention has a significant impact. We hope this survey will provide a succinct introduction to attention models and guide practitioners in developing approaches for their applications.
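As a minimal sketch of the core computation shared by the surveyed attention models, the snippet below implements standard scaled dot-product attention in plain NumPy; multi-head projections, masking, and the other architectural variations covered in the survey build on this primitive.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key compatibility
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                      # the weights are also what interpretability analyses inspect

Q, K, V = (np.random.rand(4, 8) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.sum(axis=-1))                  # (4, 8); each row of weights sums to 1
```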