We examine visual representations of data that make use of combinations of both 2D and 3D data mappings. Combining 2D and 3D representations is a common technique that allows viewers to understand multiple facets of the data with which they are interacting. While 3D representations focus on the spatial character of the data or the dedicated 3D data mapping, 2D representations often show abstract data properties and take advantage of the unique benefits of mapping to a plane. Many systems have used unique combinations of both types of data mappings effectively. Yet there are no systematic reviews of the methods in linking 2D and 3D representations. We systematically survey the relationships between 2D and 3D visual representations in major visualization publications -- IEEE VIS, IEEE TVCG, and EuroVis -- from 2012 to 2022. We closely examined 105 papers where 2D and 3D representations are connected visually, interactively, or through animation. These approaches are designed based on their visual environment, the relationships between their visual representations, and their possible layouts. Through our analysis, we introduce a design space as well as provide design guidelines for effectively linking 2D and 3D visual representations.
Arora & Ge introduced a noise-free polynomial system to compute the secret of a Learning With Errors (LWE) instance via linearization. Albrecht et al. later utilized the Arora-Ge polynomial model to study the complexity of Gr\"obner basis computations on LWE polynomial systems under the assumption of semi-regularity. In this paper we revisit the Arora-Ge polynomial and prove that it satisfies a genericity condition recently introduced by Caminata & Gorla, called being in generic coordinates. For polynomial systems in generic coordinates one can always estimate the complexity of DRL Gr\"obner basis computations in terms of the Castelnuovo-Mumford regularity and henceforth also via the Macaulay bound. Moreover, we generalize the Gr\"obner basis algorithm of Semaev & Tenti to arbitrary polynomial systems with a finite degree of regularity. In particular, existence of this algorithm yields another approach to estimate the complexity of DRL Gr\"obner basis computations in terms of the degree of regularity. In practice, the degree of regularity of LWE polynomial systems is not known, though one can always estimate the lowest achievable degree of regularity. Consequently, from a designer's worst case perspective this approach yields sub-exponential complexity estimates for general, binary secret and binary error LWE. In recent works by Dachman-Soled et al. the hardness of LWE in the presence of side information was analyzed. Utilizing their framework we discuss how hints can be incorporated into LWE polynomial systems and how they affect the complexity of Gr\"obner basis computations.
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing modality robustness of our proposed method on 5 different datasets for multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
This paper proposes an innovative Attention-GAN framework for enhancing cybersecurity, focusing on anomaly detection. In response to the challenges posed by the constantly evolving nature of cyber threats, the proposed approach aims to generate diverse and realistic synthetic attack scenarios, thereby enriching the dataset and improving threat identification. Integrating attention mechanisms with Generative Adversarial Networks (GANs) is a key feature of the proposed method. The attention mechanism enhances the model's ability to focus on relevant features, essential for detecting subtle and complex attack patterns. In addition, GANs address the issue of data scarcity by generating additional varied attack data, encompassing known and emerging threats. This dual approach ensures that the system remains relevant and effective against the continuously evolving cyberattacks. The KDD Cup and CICIDS2017 datasets were used to validate this model, which exhibited significant improvements in anomaly detection. It achieved an accuracy of 99.69% on the KDD dataset and 97.93% on the CICIDS2017 dataset, with precision, recall, and F1-scores above 97%, demonstrating its effectiveness in recognizing complex attack patterns. This study contributes significantly to cybersecurity by providing a scalable and adaptable solution for anomaly detection in the face of sophisticated and dynamic cyber threats. The exploration of GANs for data augmentation highlights a promising direction for future research, particularly in situations where data limitations restrict the development of cybersecurity systems. The attention-GAN framework has emerged as a pioneering approach, setting a new benchmark for advanced cyber-defense strategies.
We examine the possibility of approximating Maximum Vertex-Disjoint Shortest Paths. In this problem, the input is an edge-weighted (directed or undirected) $n$-vertex graph $G$ along with $k$ terminal pairs $(s_1,t_1),(s_2,t_2),\ldots,(s_k,t_k)$. The task is to connect as many terminal pairs as possible by pairwise vertex-disjoint paths such that each path is a shortest path between the respective terminals. Our work is anchored in the recent breakthrough by Lochet [SODA '21], which demonstrates the polynomial-time solvability of the problem for a fixed value of $k$. Lochet's result implies the existence of a polynomial-time $ck$-approximation for Maximum Vertex-Disjoint Shortest Paths, where $c \leq 1$ is a constant. Our first result suggests that this approximation algorithm is, in a sense, the best we can hope for. More precisely, assuming the gap-ETH, we exclude the existence of an $o(k)$-approximations within $f(k) \cdot $poly($n$) time for any function $f$ that only depends on $k$. Our second result demonstrates the infeasibility of achieving an approximation ratio of $n^{\frac{1}{2}-\varepsilon}$ in polynomial time, unless P = NP. It is not difficult to show that a greedy algorithm selecting a path with the minimum number of arcs results in a $\lceil\sqrt{\ell}\rceil$-approximation, where $\ell$ is the number of edges in all the paths of an optimal solution. Since $\ell \leq n$, this underscores the tightness of the $n^{\frac{1}{2}-\varepsilon}$-inapproximability bound. Additionally, we establish that Maximum Vertex-Disjoint Shortest Paths is fixed-parameter tractable when parameterized by $\ell$ but does not admit a polynomial kernel. Our hardness results hold for undirected graphs with unit weights, while our positive results extend to scenarios where the input graph is directed and features arbitrary (non-negative) edge weights.
With the incorporation of the UNet architecture, diffusion probabilistic models have become a dominant force in image generation tasks. One key design in UNet is the skip connections between the encoder and decoder blocks. Although skip connections have been shown to improve training stability and model performance, we reveal that such shortcuts can be a limiting factor for the complexity of the transformation. As the sampling steps decrease, the generation process and the role of the UNet get closer to the push-forward transformations from Gaussian distribution to the target, posing a challenge for the network's complexity. To address this challenge, we propose Skip-Tuning, a simple yet surprisingly effective training-free tuning method on the skip connections. Our method can achieve 100% FID improvement for pretrained EDM on ImageNet 64 with only 19 NFEs (1.75), breaking the limit of ODE samplers regardless of sampling steps. Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57). Comprehensive exploratory experiments are conducted to shed light on the surprising effectiveness. We observe that while Skip-Tuning increases the score-matching losses in the pixel space, the losses in the feature space are reduced, particularly at intermediate noise levels, which coincide with the most effective range accounting for image quality improvement.
Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparison across work are often misleading due to their reliance on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need for improving the modeling of word meanings, as well as focus on how, when, and why these meanings change, rather than solely focusing on the extent of semantic change.
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static, but rather evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge through growth. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention of the learning on growing snapshots of a KG without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, with an embedding transfer strategy to inject the learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms the state-of-the-art inductive and lifelong embedding baselines.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically in regards to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recent advanced techniques for compacting and accelerating CNNs model developed. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing will be described at the beginning, after that the other techniques will be introduced. For each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks etc. Then we will go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depths networks. After that, we survey the evaluation matrix, the main datasets used for evaluating the model performance and recent benchmarking efforts. Finally, we conclude this paper, discuss remaining challenges and possible directions on this topic.