Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust discovery of network architectures without compromising performance or generalization ability. Additionally, we address the performance collapse that the widely used Differentiable Architecture Search (DARTS) method suffers in the unsupervised paradigm by introducing a multi-scale decoder. Through extensive experiments on various search spaces and datasets, we demonstrate the effectiveness and robustness of the proposed method, providing empirical evidence of its superiority over baseline approaches.
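To make the unsupervised search objective concrete, here is a minimal, hypothetical sketch of a DARTS-style mixed operation trained with an MAE-style masked-reconstruction loss instead of a supervised one; the operation set, masking ratio, and single-convolution decoder stand-in are illustrative assumptions, not the paper's architecture.

```python
# Sketch: swap DARTS's supervised loss for an MAE-style reconstruction
# loss. Op set, masking scheme, and decoder are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Weighted sum of candidate ops; weights = softmax over arch params."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

encoder = nn.Sequential(MixedOp(3), MixedOp(3))
decoder = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the multi-scale decoder
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(8, 3, 32, 32)                        # unlabeled images
mask = (torch.rand(8, 1, 32, 32) > 0.75).float()    # keep ~25% of pixels
recon = decoder(encoder(x * mask))
loss = F.mse_loss(recon * (1 - mask), x * (1 - mask))  # reconstruct masked region
opt.zero_grad(); loss.backward(); opt.step()
```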
Piecewise Polynomials (PPs) are utilized in several engineering disciplines, like trajectory planning, to approximate position profiles given in the form of a set of points. While the approximation target, along with domain-specific requirements like C^k-continuity, can be formulated as a system of equations and a result computed directly, such closed-form solutions possess limited flexibility with respect to polynomial degrees, polynomial bases, or additional domain-specific requirements. Sufficiently complex optimization goals soon call for numerical methods like gradient descent. Since gradient descent lies at the heart of training Artificial Neural Networks (ANNs), modern Machine Learning (ML) frameworks like TensorFlow come with a set of gradient-based optimizers potentially suitable for a wide range of optimization problems beyond training ANNs. Our approach is to combine the versatility of PP models with the potential of modern ML optimizers for function approximation in 1D trajectory planning in the context of electronic cam design. We use the available optimizers of the ML framework TensorFlow directly, outside the scope of ANNs, to optimize the parameters of our PP model. In this paper, we show how an orthogonal polynomial basis improves approximation and continuity optimization performance. Utilizing Chebyshev polynomials of the first kind, we develop a novel regularization approach that enables clearly improved convergence behavior. We show that, with this regularization, the Chebyshev basis outperforms the power basis for all relevant optimizers in the combined approximation and continuity optimization setting, and we demonstrate the usability of the presented approach within the electronic cam domain.
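As an illustration of driving a TensorFlow optimizer outside an ANN, the following sketch fits first-kind Chebyshev coefficients to sample points by gradient descent; the degree, data, and learning rate are assumptions, and the paper's continuity constraints and regularization term are omitted.

```python
# Sketch: fit Chebyshev coefficients with a TF optimizer, no ANN involved.
import tensorflow as tf

x = tf.linspace(-1.0, 1.0, 100)       # Chebyshev basis lives on [-1, 1]
y = tf.sin(3.0 * x)                   # points to approximate (toy target)
coeffs = tf.Variable(tf.zeros(8))     # degree-7 polynomial

def chebyshev_eval(c, x):
    # Evaluate sum_k c[k] * T_k(x) via T_k = 2x T_{k-1} - T_{k-2}.
    t_prev, t_curr = tf.ones_like(x), x
    result = c[0] * t_prev + c[1] * t_curr
    for k in range(2, c.shape[0]):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
        result += c[k] * t_curr
    return result

opt = tf.keras.optimizers.Adam(learning_rate=0.05)
for _ in range(500):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(chebyshev_eval(coeffs, x) - y))
    opt.apply_gradients([(tape.gradient(loss, coeffs), coeffs)])
```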
Graph Neural Networks are limited to processing features on graph nodes, neglecting data on higher-order structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNNs) overcome this limitation by representing higher-order structures with simplicial complexes, albeit at the cost of time efficiency. In this paper, we propose a novel neural network architecture on simplicial complexes named Binarized Simplicial Convolutional Neural Networks (Bi-SCNN), based on the combination of simplicial convolution with a binary-sign forward propagation strategy. Applying the Hodge Laplacian within binary-sign forward propagation enables Bi-SCNN to efficiently and effectively represent simplicial features of higher-order structures than traditional graph node representations. Compared to previous SCNNs, the reduced model complexity of Bi-SCNN shortens execution time without sacrificing prediction performance and makes the model less prone to the over-smoothing effect. Experiments on real-world citation and ocean-drifter data confirm that the proposed Bi-SCNN is efficient and accurate.
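A minimal sketch of the core idea on a toy complex, assuming a single layer and random weights: edge features are propagated with the Hodge 1-Laplacian and binarized with the sign function.

```python
# Sketch: one binary-sign simplicial propagation step on edge features.
import numpy as np

# Incidence matrices of a toy complex (3 nodes, 3 edges, 1 triangle):
# B1 maps edges to nodes, B2 maps triangles to edges.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)
B2 = np.array([[1], [-1], [1]], dtype=float)
L1 = B1.T @ B1 + B2 @ B2.T        # Hodge 1-Laplacian acting on edge signals

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))   # 3 edges, 4 feature channels
W = rng.standard_normal((4, 4))   # learnable weights (random here)

h = np.sign(L1 @ x @ W)           # binary-sign forward propagation step
```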
Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network (GAN)-based image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realism of the skin-tone representation using the Individual Typology Angle (ITA). Further, we analyze the quality characteristics using existing Face Image Quality Assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research in (i) developing both specific-ethnicity and general (any-to-any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, and (iii) creating datasets representing various ethnicities, which can further help mitigate bias while addressing privacy concerns.
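For reference, ITA is computed directly from CIELAB coordinates; the sketch below shows the standard formula, while the paper's skin-pixel selection and tone-category thresholds are not reproduced.

```python
# Sketch: the standard Individual Typology Angle (ITA) formula from
# CIELAB L* and b*; skin-pixel masking and categorization are omitted.
import numpy as np

def ita_degrees(L, b):
    """ITA = arctan((L* - 50) / b*), expressed in degrees."""
    return np.degrees(np.arctan2(L - 50.0, b))

# Example: a light skin patch with L* = 65, b* = 15 gives ITA = 45 deg.
print(ita_degrees(np.array([65.0]), np.array([15.0])))
```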
In temporal extensions of Answer Set Programming (ASP) based on linear time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. However, timing constraints are important in many applications, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time temporal equilibrium logic, in which temporal operators are constrained by intervals over the natural numbers. The resulting Metric Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. To this end, we define a translation of metric formulas into monadic first-order formulas and give a correspondence between their models in Metric Equilibrium Logic and Monadic Quantified Equilibrium Logic, respectively. Interestingly, our translation provides a blueprint for implementation in terms of ASP modulo difference constraints.
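As a point of reference, the interval constraint on a metric operator can be read as in standard metric temporal logic over timed traces; the paper's equilibrium-logic semantics refines this classical reading rather than coinciding with it.

```latex
% Standard MTL-style semantics for the interval-constrained "eventually"
% operator over a timed trace (tau gives each state its time stamp);
% illustrative only, not the paper's equilibrium-model definition.
\[
  (\sigma,\tau), k \models \Diamond_{[m,n]}\varphi
  \iff
  \exists j \geq k :\;
  \tau(j) - \tau(k) \in [m,n]
  \;\text{and}\;
  (\sigma,\tau), j \models \varphi .
\]
```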
Diabetic Retinopathy (DR), a prevalent complication in diabetes patients, can lead to vision impairment due to lesions formed on the retina. Detecting DR only at an advanced stage often results in irreversible blindness. The traditional process of diagnosing DR from retina fundus images by ophthalmologists is not only time-intensive but also expensive. While classical transfer learning models have been widely adopted for computer-aided detection of DR, their high maintenance costs can hinder detection efficiency. In contrast, Quantum Transfer Learning offers a more effective solution to this challenge, as its heuristic, variational principles are well suited to the task. Our proposed methodology leverages this hybrid quantum transfer learning technique to detect DR. To construct our model, we utilize the APTOS 2019 Blindness Detection dataset, available on Kaggle. We employ ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, and Inception-V3, all pre-trained classical neural networks, for initial feature extraction. For the classification stage, we use a Variational Quantum Classifier. Our hybrid quantum model has shown remarkable results, achieving an accuracy of 97% with ResNet-18. These results suggest that quantum computing, integrated with classical machine learning, can perform such tasks with a power and efficiency that classical pipelines alone struggle to match. By harnessing these advanced technologies, we can significantly improve the detection and diagnosis of Diabetic Retinopathy, potentially saving many from the risk of blindness.
Keywords: Diabetic Retinopathy, Quantum Transfer Learning, Deep Learning
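A hedged sketch of such a hybrid pipeline, using torchvision for the ResNet backbone and PennyLane for the variational circuit; the qubit count, circuit depth, and feature projection are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: pre-trained ResNet-18 features feeding a small variational
# quantum classifier. All hyperparameters are illustrative assumptions.
import torch
import torchvision.models as models
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(features, weights):
    qml.AngleEmbedding(features, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))   # sign of expectation -> DR / no DR

resnet = models.resnet18(weights="IMAGENET1K_V1")
resnet.fc = torch.nn.Linear(resnet.fc.in_features, n_qubits)  # project to qubits
weights = torch.randn(2, n_qubits, 3, requires_grad=True)     # 2 entangling layers

x = torch.rand(1, 3, 224, 224)       # a preprocessed fundus image (dummy here)
feats = torch.tanh(resnet(x))[0]     # squash features into embedding range
score = circuit(feats, weights)      # trainable end to end via the torch interface
```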
2D industrial anomaly detection has been widely studied; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched areas. Existing multimodal methods directly concatenate the multimodal features, which leads to strong disturbance between features and harms detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: first, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage interaction between different modal features; second, we use a decision-layer fusion with multiple memory banks to avoid loss of information, plus additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
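To illustrate the decision-layer idea, the following sketch scores test patches by distance to their nearest neighbor in a memory bank of nominal features; the hybrid fusion, contrastive training, and novelty classifiers of M3DM are omitted, and all shapes are toy assumptions.

```python
# Sketch: memory-bank nearest-neighbor anomaly scoring on patch features.
import torch

memory_bank = torch.randn(1000, 128)   # nominal patch features (collected at train time)
test_patches = torch.randn(196, 128)   # fused patch features of one test sample

dists = torch.cdist(test_patches, memory_bank)   # (196, 1000) pairwise distances
patch_scores = dists.min(dim=1).values           # anomaly score per patch (segmentation)
image_score = patch_scores.max()                 # image-level anomaly score (detection)
```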
Graph Neural Networks (GNNs) have been successfully used in many problems involving graph-structured data, achieving state-of-the-art performance. GNNs typically employ a message-passing scheme, in which every node aggregates information from its neighbors using a permutation-invariant aggregation function. Standard, well-examined choices such as the mean or sum aggregation functions have limited capabilities, as they cannot capture interactions among neighbors. In this work, we formalize these interactions using an information-theoretic framework that notably includes synergistic information. Driven by this definition, we introduce the Graph Ordering Attention (GOAT) layer, a novel GNN component that captures interactions between nodes in a neighborhood. This is achieved by learning local node orderings via an attention mechanism and processing the ordered representations using a recurrent neural network aggregator. This design lets us use a permutation-sensitive aggregator while maintaining the permutation-equivariance of the proposed GOAT layer. The GOAT model demonstrates improved performance in modeling graph metrics that capture complex information, such as the betweenness centrality and the effective size of a node. In practical use cases, its superior modeling capability is confirmed by its success on several real-world node classification benchmarks.
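A minimal sketch of the ordering-then-aggregation idea, with simplified shapes and scoring: neighbors are ranked by an attention score and the sorted sequence is summarized by an LSTM. Since the ordering depends only on features, not on input order, the aggregate is insensitive to neighbor permutations.

```python
# Sketch: attention-based neighbor ordering + LSTM aggregation (GOAT-style).
import torch
import torch.nn as nn

d = 16
attn = nn.Linear(2 * d, 1)     # scores a (center, neighbor) pair
lstm = nn.LSTM(d, d, batch_first=True)

h_center = torch.randn(d)
h_neigh = torch.randn(5, d)    # 5 neighbor representations

pairs = torch.cat([h_center.expand(5, d), h_neigh], dim=1)
scores = attn(pairs).squeeze(-1)          # one attention score per neighbor
order = scores.argsort(descending=True)   # learned local ordering
_, (h_agg, _) = lstm(h_neigh[order].unsqueeze(0))
h_out = h_agg.squeeze()                   # permutation-sensitive aggregate
```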
Recently, a considerable literature has grown up around Graph Convolutional Networks (GCNs). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focus on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation with the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond both. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as special cases, with a new perspective on graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods on the tasks of knowledge graph alignment and entity classification.
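As a loose illustration of jointly propagating node and edge embeddings, one step might look like the sketch below; this is a generic simplification, not KE-GCN's actual update rules, which are derived from knowledge-embedding scoring functions.

```python
# Sketch: a generic joint node-and-edge propagation step on triples.
import torch
import torch.nn as nn

d = 8
W_node = nn.Linear(2 * d, d)
W_edge = nn.Linear(3 * d, d)

h = torch.randn(4, d)                         # 4 entity embeddings
e = torch.randn(3, d)                         # 3 relation (edge) embeddings
triples = [(0, 0, 1), (1, 1, 2), (2, 2, 3)]   # (head, relation, tail)

# Node update: aggregate (tail, relation) messages into each head entity.
msgs = torch.zeros_like(h)
for s, r, t in triples:
    msgs[s] += W_node(torch.cat([h[t], e[r]]))
h_new = torch.relu(h + msgs)

# Edge update: refresh each relation from the triple it participates in.
e_new = torch.stack([
    torch.relu(W_edge(torch.cat([h[s], e[r], h[t]])))
    for s, r, t in triples
])
```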
Graph Neural Networks (GNNs) are an emerging approach for learning on non-Euclidean data. Recently, there has been increased interest in designing GNNs that scale to large graphs. Most existing methods use "graph sampling" or "layer-wise sampling" techniques to reduce training time; however, these methods still suffer from degrading performance and scalability problems when applied to graphs with billions of edges. This paper presents GBP, a scalable GNN built on a localized bidirectional propagation process that starts from both the feature vectors and the training/testing nodes. Theoretical analysis shows that GBP is the first method to achieve sub-linear time complexity for both the precomputation and training phases. An extensive empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time. Most notably, GBP can deliver superior performance on a graph with over 60 million nodes and 1.8 billion edges in less than half an hour on a single machine.
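To illustrate the precompute-then-train pattern, the sketch below propagates features once with a weighted PageRank-style recursion on a toy graph; the localized bidirectional estimator that gives GBP its sub-linear complexity is not reproduced, and the decay weights are assumptions.

```python
# Sketch: one-shot feature propagation, after which any plain MLP can be
# trained on Z with ordinary mini-batches.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 5)                    # node features
deg = A.sum(axis=1)
r = 0.5
T = np.diag(deg ** -r) @ A @ np.diag(deg ** (r - 1.0))  # generalized transition

alpha, L = 0.15, 4
P, Z = X.copy(), alpha * X
for l in range(1, L + 1):
    P = T @ P                                # one propagation hop
    Z += alpha * (1 - alpha) ** l * P        # PPR-style level weights

# Z now replaces X as the input to a standard classifier.
```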
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
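The lattice construction step can be illustrated with the classic example sentence from this line of work: for each character position, every lexicon word starting there becomes a word cell in the lattice; the gated character-word LSTM cells that consume these matches are omitted, and the lexicon here is a toy assumption.

```python
# Sketch: enumerate lexicon matches that form the word lattice over a sentence.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
sentence = "南京市长江大桥"   # "Nanjing City / Yangtze River Bridge"

matches = []   # (start, end, word) spans that would feed word cells
for i in range(len(sentence)):
    for j in range(i + 2, len(sentence) + 1):   # words of length >= 2
        if sentence[i:j] in lexicon:
            matches.append((i, j, sentence[i:j]))

print(matches)
# [(0, 2, '南京'), (0, 3, '南京市'), (2, 4, '市长'), (3, 5, '长江'),
#  (3, 7, '长江大桥'), (5, 7, '大桥')]
```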