Dynamic Movement Primitives (DMP) have found remarkable applicability and success in various robotic tasks, which can be mainly attributed to their generalization, modulation and robustness properties. Nevertheless, the spatial generalization of DMP can be problematic in some cases, leading to excessive or unnatural spatial scaling. Moreover, incorporating intermediate points (via-points) to adjust the DMP trajectory is not adequately addressed. In this work we propose an improved online spatial generalization that remedies the shortcomings of the classical DMP generalization and also allows the incorporation of dynamic via-points. This is achieved by designing an online adaptation scheme for the DMP weights, which is proved to minimize the distance from the demonstrated acceleration profile so as to retain the shape of the demonstration, subject to dynamic via-point and initial/final state constraints. Extensive comparative simulations with the classical and other DMP variants are conducted, while experimental results validate the applicability and efficacy of the proposed method.
Angiography is widely used to detect, diagnose, and treat cerebrovascular diseases. While numerous techniques have been proposed to segment the vascular network from different imaging modalities, deep learning (DL) has emerged as a promising approach. However, existing DL methods often depend on proprietary datasets and extensive manual annotation. Moreover, the availability of pre-trained networks specifically for medical domains and 3D volumes is limited. To overcome these challenges, we propose a few-shot learning approach called VesselShot for cerebrovascular segmentation. VesselShot leverages knowledge from a few annotated support images and mitigates the scarcity of labeled data and the need for extensive annotation in cerebral blood vessel segmentation. We evaluated the performance of VesselShot using the publicly available TubeTK dataset for the segmentation task, achieving a mean Dice coefficient (DC) of 0.62 (0.03).
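For readers unfamiliar with the reported metric, the Dice coefficient measures the overlap between a predicted mask and the ground-truth mask. The following is a minimal sketch, assuming binary masks flattened into sequences of 0/1 values; the function name `dice_coefficient` and the empty-mask convention are illustrative assumptions, not taken from the VesselShot evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels marked as vessel in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of vessel voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A value of 1 indicates identical masks and 0 indicates no overlap, so a mean of 0.62 corresponds to partial overlap between predicted and annotated vessels.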
Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. Data collected from real scenes often contain noise and generally need to be denoised by speech enhancement models. Noise-robust TTS models are often trained on the enhanced speech and thus suffer from speech distortion and residual background noise, which affect the quality of the synthesized speech. Meanwhile, it has been shown that self-supervised pre-trained models exhibit excellent noise robustness on many speech tasks, implying that the learned representation has a better tolerance for noise perturbations. In this work, we therefore explore pre-trained models to improve the noise robustness of TTS models. Based on HiFi-GAN, we first propose a representation-to-waveform vocoder, which learns to map the representation of pre-trained models to the waveform. We then propose a text-to-representation FastSpeech2 model, which learns to map text to pre-trained model representations. Experimental results on the LJSpeech and LibriTTS datasets show that our method outperforms those using speech enhancement methods in both subjective and objective metrics. Audio samples are available at: //zqs01.github.io/rep2wav/.
An important challenge in Machine Learning compilers like XLA is multi-pass optimization and analysis. Recent interest has focused chiefly on XLA target-dependent optimization at the graph-level, subgraph-level, and kernel-level phases. We specifically focus on target-independent optimization, namely XLA HLO pass ordering: our approach aims at finding the optimal sequence of compiler optimization passes, which is decoupled from target-dependent optimization. However, there is little domain-specific study of pass ordering for XLA HLO. To this end, we propose a deep Reinforcement Learning (RL) based search for optimal XLA HLO pass ordering. We also propose enhancements to the deep RL algorithms to further improve optimal search performance and open the research direction for domain-specific guidance for RL. We create an XLA Gym experimentation framework as a tool to enable RL algorithms to interact with the compiler for pass optimization and thereby train agents. Overall, in our experimentation we observe an average of $13.3\%$ improvement in operation count reduction on a benchmark of GPT-2 training graphs and $10.4\%$ improvement on a diverse benchmark including GPT-2, BERT, and ResNet graphs using the proposed approach over the compiler's default phase ordering.
Recently, instruction-following Large Language Models (LLMs), represented by ChatGPT, have exhibited exceptional performance in general Natural Language Processing (NLP) tasks. However, the unique characteristics of E-commerce data pose significant challenges to general LLMs. An LLM tailored specifically for E-commerce scenarios, possessing robust cross-dataset/task generalization capabilities, is a pressing necessity. To address this, we propose the first E-commerce instruction dataset, EcomInstruct, with a total of 2.5 million instruction data. EcomInstruct scales up the data size and task diversity by constructing atomic tasks with E-commerce basic data types, such as product information and user reviews. Atomic tasks are defined as intermediate tasks implicitly involved in solving a final task, which we also call Chain-of-Task tasks. We developed EcomGPT with different parameter scales by training the backbone model BLOOMZ on EcomInstruct. Benefiting from the fundamental semantic understanding capabilities acquired from the Chain-of-Task tasks, EcomGPT exhibits excellent zero-shot generalization capabilities. Extensive experiments and human evaluations demonstrate that EcomGPT outperforms ChatGPT in terms of cross-dataset/task generalization on E-commerce tasks.
Machine learning (ML) has made BigCloneBench popular for semantic clone detection tools. However, BigCloneBench only has a few Java semantic clones. In addition, due to the design principles of how the benchmark was created, imbalance issues have been identified, including the ambiguity in the definition of semantic clones. Thus, ML-based clone detection algorithms trained on BigCloneBench may overlook semantic clones or report incorrect results. SemanticCloneBench features Stack Overflow clones in several languages. However, it lacks sufficient samples for ML-based clone detection. There is also a marked lack of cross-language clone benchmarks. The widely used CLCDSA dataset lacks reusable examples applicable to real-world software systems, making it inadequate for ML-based clone detection. The OpenAI GPT-3 model has shown outstanding text production, including code generation and summarization. In this paper, we used the GPT-3 model to generate a complete benchmark for both semantic and cross-language clones. Using SemanticCloneBench's genuine language clones, we tested several prompts to see which yielded better results using GPT-3 question formulation. Then, we used NiCad to filter Type-1 and Type-2 clones from the GPT-3 output. We used a GUI-assisted Clone Validator tool to manually validate all clone pairings with nine judges. Functionality testing and CloneCognition verified that our benchmark has no syntactic clones. Later, we validated the SourcererCC, Oreo and CLCDSA tools on our benchmark. The poor performance of these tools suggests that GPTCloneBench has no syntactic clones. From 77,207 clone pairs of SemanticCloneBench/GPT-3 output, we created a benchmark with 37,149 genuine semantic clone pairs, 19,288 false semantic pairs, and 20,770 cross-language clones across four languages (Java, C, C#, and Python).
Modeling attacks, in which an adversary uses machine learning techniques to model a hardware-based Physically Unclonable Function (PUF), pose a great threat to the viability of these hardware security primitives. In most modeling attacks, a random subset of challenge-response pairs (CRPs) is used as the labeled data for the machine learning algorithm. Here, for the arbiter-PUF, a delay-based PUF which may be viewed as a linear threshold function with random weights (due to manufacturing imperfections), we investigate the role of active learning in Support Vector Machine (SVM) learning. We focus on challenge selection to help the SVM algorithm learn ``fast'' and learn ``slow''. Our methods construct challenges rather than relying on a sample pool of challenges as in prior work. Using active learning to learn ``fast'' (fewer CRPs revealed, higher accuracies) may help manufacturers learn the manufactured PUFs more efficiently, or may form a more powerful attack when the attacker may query the PUF for CRPs at will. Using active learning to select challenges from which learning is ``slow'' (low accuracy despite a large number of revealed CRPs) may provide a basis for slowing down attackers who are limited to overhearing CRPs.
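The view of the arbiter-PUF as a linear threshold function can be made concrete with the commonly used additive delay model. The sketch below is an assumption-laden illustration only: the names `parity_features`, `make_puf` and `response` are hypothetical, the Gaussian weights stand in for manufacturing variation, and the challenge construction strategies studied in the paper are not reproduced here.

```python
import random
from typing import List, Sequence


def parity_features(challenge: Sequence[int]) -> List[int]:
    """Map an n-bit challenge to n+1 features in {-1, +1}.

    phi_i = product over j >= i of (1 - 2*c_j), with a trailing constant 1.
    """
    features = [1] * (len(challenge) + 1)
    for i in range(len(challenge) - 1, -1, -1):
        features[i] = features[i + 1] * (1 - 2 * challenge[i])
    return features


def make_puf(n_stages: int, seed: int) -> List[float]:
    """Draw random delay weights (assumed Gaussian) for a simulated PUF instance."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]


def response(weights: Sequence[float], challenge: Sequence[int]) -> int:
    """Linear threshold response: 1 if w . phi(c) >= 0, else 0."""
    margin = sum(w * f for w, f in zip(weights, parity_features(challenge)))
    return 1 if margin >= 0 else 0
```

Because the response is linear in the transformed features, a linear SVM trained on `parity_features(c)` can in principle recover the decision boundary, which is why the choice of revealed challenges governs how quickly the model is learned.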
In recent years the use of FPGAs to accelerate scientific applications has grown, with numerous applications demonstrating the benefit of FPGAs for high performance workloads. However, whilst High Level Synthesis (HLS) has significantly lowered the barrier to entry in programming FPGAs by enabling programmers to use C++, a major challenge is that most often these codes are not originally written in C++. Instead, Fortran is the lingua franca of scientific computing, and so a complex and time-consuming initial step is required to convert codes into C++ even before considering the FPGA. In this paper we describe work enabling Fortran for AMD Xilinx FPGAs by connecting the LLVM Flang front end to AMD Xilinx's LLVM back end. This enables programmers to use Fortran as a first-class language for programming FPGAs and, as we demonstrate, to enjoy all the tuning and optimisation opportunities that HLS C++ provides. Furthermore, we demonstrate that certain language features of Fortran make it especially beneficial for programming FPGAs compared to C++. The result of this work is a lowering of the barrier to entry in using FPGAs for scientific computing, enabling programmers to leverage their existing codebase and language of choice on the FPGA directly.
We present a technical enhancement within the p4est software for parallel adaptive mesh refinement. In p4est, primitives are stored as octants in three and quadrants in two dimensions. While, classically, they are encoded by the native approach using their spatial coordinates and refinement level, any other mathematically equivalent encoding might be used instead. Recognizing this, we add two alternative representations to the classical, explicit version, based on a long monotonic index and 128-bit AVX quad integers, respectively. The first one requires changes in logic for low-level quadrant manipulating algorithms, while the other exploits data level parallelism and requires algorithms to be adapted to SIMD instructions. The resultant algorithms and data structures lead to higher performance and lower memory usage in comparison with the standard baseline. We benchmark selected algorithms on a cluster with two Intel(R) Xeon(R) Gold 6130 Skylake family CPUs per node, which provides support for AVX2 extensions, 192 GB RAM per node, and up to 512 computational cores in total.
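To illustrate what a mathematically equivalent encoding of a quadrant can look like, the sketch below folds 2D integer coordinates into a single index by bit interleaving (a Morton or Z-order index). This is an assumption chosen for illustration: the abstract does not specify how the long monotonic index is laid out, and the function names here are hypothetical rather than part of the p4est API.

```python
from typing import Tuple


def morton_encode_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y; x takes the even bit positions."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)
        index |= ((y >> b) & 1) << (2 * b + 1)
    return index


def morton_decode_2d(index: int, bits: int = 16) -> Tuple[int, int]:
    """Invert morton_encode_2d and recover the (x, y) coordinates."""
    x = 0
    y = 0
    for b in range(bits):
        x |= ((index >> (2 * b)) & 1) << b
        y |= ((index >> (2 * b + 1)) & 1) << b
    return x, y
```

Since the mapping is invertible, the explicit coordinates and the single index carry the same information; the trade-off lies in which low-level operations (comparison, parent or child computation) become cheaper under each form.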
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP can achieve strong performance under both the zero- and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
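The ROP sampling step can be sketched in a few lines. The version below is a simplified illustration under stated assumptions: it uses an unsmoothed unigram language model, samples fixed-size word lists with replacement, and all function names are hypothetical; the released PROP code may differ in each of these choices.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def unigram_lm(tokens: Sequence[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model of one document (no smoothing, by assumption)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def sample_word_set(lm: Dict[str, float], size: int, rng: random.Random) -> List[str]:
    """Draw `size` words from the document language model (with replacement)."""
    words = list(lm)
    weights = [lm[w] for w in words]
    return rng.choices(words, weights=weights, k=size)


def log_likelihood(lm: Dict[str, float], word_set: Sequence[str]) -> float:
    """Log-probability of the sampled words under the document language model."""
    return sum(math.log(lm[w]) for w in word_set)


def sample_preference_pair(
    tokens: Sequence[str], size: int, rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Return (more_representative, less_representative) word sets for one document."""
    lm = unigram_lm(tokens)
    first = sample_word_set(lm, size, rng)
    second = sample_word_set(lm, size, rng)
    if log_likelihood(lm, first) >= log_likelihood(lm, second):
        return first, second
    return second, first
```

Each returned pair would then serve as one training example for the pairwise preference objective, with the first element labeled as the preferred set.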
Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision tasks, such as object detection, image recognition, and image retrieval. These achievements stem from CNNs' outstanding capability to learn input features through deep layers of neuron structures and an iterative training process. However, these learned features are hard to identify and interpret from a human vision perspective, causing a lack of understanding of CNNs' internal working mechanism. To improve CNN interpretability, CNN visualization is widely used as a qualitative analysis method, which translates the internal features into visually perceptible patterns. Many CNN visualization works have been proposed in the literature to interpret CNNs from the perspectives of network structure, operation, and semantic concept. In this paper, we provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of motivations, algorithms, and experimental results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of CNN interpretability in areas such as network design, optimization, and security enhancement.