Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS for each individual frame. In 4D-GS, a novel explicit representation containing both 3D Gaussians and 4D neural voxels is proposed. A decomposed neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build Gaussian features from 4D neural voxels and then a lightweight MLP is applied to predict Gaussian deformations at novel timestamps. Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800$\times$800 resolution on an RTX 3090 GPU while maintaining comparable or better quality than previous state-of-the-art methods. More demos and code are available at //guanjunwu.github.io/4dgs/.
The transition to 4th generation district heating creates a growing need for scalable, automated design tools that accurately capture the spatial and temporal details of heating network operation. This paper presents an automated design approach for the optimal design of district heating networks that combines scalable density-based topology optimization with a multi-period approach. In this way, temporal variations in demand, supply, and heat losses can be taken into account while optimizing the network design based on a nonlinear physics model. The transition of the automated design approach from worst-case to multi-period shows a design progression from separate branched networks to a single integrated meshed network topology connecting all producers. These integrated topologies emerge without imposing such structures a priori. They increase network connectivity, and allow for more flexible shifting of heat loads between different producers and heat consumers, resulting in more cost-effective use of heat. In a case study, this integrated design resulted in an increase in waste heat share of 42.8 % and a subsequent reduction in project cost of 17.9 %. We show how producer unavailability can be accounted for in the automated design at the cost of a 3.1 % increase in the cost of backup capacity. The resulting optimized network designs of this approach connect multiple low temperature heat sources in a single integrated network achieving high waste heat utilization and redundancy, highlighting the applicability of the approach to next-generation district heating networks.
The loss function plays an important role in optimizing the performance of a learning system. A crucial aspect of the loss function is the assignment of sample weights within a mini-batch during loss computation. In the context of continual learning (CL), most existing strategies uniformly treat samples when calculating the loss value, thereby assigning equal weights to each sample. While this approach can be effective in certain standard benchmarks, its optimal effectiveness, particularly in more complex scenarios, remains underexplored. This is particularly pertinent in training "in the wild," such as with self-training, where labeling is automated using a reference model. This paper introduces the Online Meta-learning for Sample Importance (OMSI) strategy that approximates sample weights for a mini-batch in an online CL stream using an inner- and meta-update mechanism. This is done by first estimating sample weight parameters for each sample in the mini-batch, then, updating the model with the adapted sample weights. We evaluate OMSI in two distinct experimental settings. First, we show that OMSI enhances both learning and retained accuracy in a controlled noisy-labeled data stream. Then, we test the strategy in three standard benchmarks and compare it with other popular replay-based strategies. This research aims to foster the ongoing exploration in the area of self-adaptive CL.
Refreshable tactile displays (RTDs) are predicted to soon become a viable option for the provision of accessible graphics for people who are blind or have low vision (BLV). This new technology for the tactile display of braille and graphics, usually using raised pins, makes it easier to generate and access a large number of graphics. However, it differs from existing tactile graphics in terms of scale, height and fidelity. Here, we share the perspectives of four key stakeholders -- blind touch readers, vision specialist teachers, accessible format producers and assistive technology providers -- to explore the potential uses, advantages and needs relating to the introduction of RTDs. We also provide advice on what role the data visualisation community can take to help ensure that people who are BLV are best able to benefit from the introduction of affordable RTDs.
Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we suggest a Rotation-Invariant Quantization (RIQ) technique that utilizes a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. Then, we prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ facilitates $\times 19.4$ and $\times 52.9$ compression ratios on pre-trained VGG dense and pruned models, respectively, with $<0.4\%$ accuracy degradation. Code is available in \url{//github.com/ehaleva/RIQ}.
Numerical computation is essential to many areas of artificial intelligence (AI), whose computing demands continue to grow dramatically, yet their continued scaling is jeopardized by the slowdown in Moore's law. Multi-function multi-way analog (MFMWA) technology, a computing architecture comprising arrays of memristors supporting in-memory computation of matrix operations, can offer tremendous improvements in computation and energy, but at the expense of inherent unpredictability and noise. We devise novel randomized algorithms tailored to MFMWA architectures that mitigate the detrimental impact of imperfect analog computations while realizing their potential benefits across various areas of AI, such as applications in computer vision. Through analysis, measurements from analog devices, and simulations of larger systems, we demonstrate orders of magnitude reduction in both computation and energy with accuracy similar to digital computers.
Contrastive learning models have achieved great success in unsupervised visual representation learning, which maximize the similarities between feature representations of different views of the same image, while minimize the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.