Crypto wallets are a key touch-point for cryptocurrency use. People use crypto wallets to make transactions, manage crypto assets, and interact with decentralized apps (dApps). However, as is often the case with emergent technologies, little attention has been paid to understanding and improving accessibility barriers in crypto wallet software. We present a series of user studies that explored how both blind and sighted individuals use MetaMask, one of the most popular non-custodial crypto wallets. We uncovered inter-related accessibility, learnability, and security issues with MetaMask. We also report on an iterative redesign of MetaMask to make it more accessible for blind users. This process involved multiple evaluations with 44 novice crypto wallet users, including 20 sighted users, 23 blind users, and one user with low vision. Our study results show notable improvements for accessibility after two rounds of design iterations. Based on the results, we discuss design implications for creating more accessible and secure crypto wallets for blind users.
While software engineers are optimistically adopting crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of crypto-detectors' effectiveness at finding crypto-API misuses in practice. This demo paper presents the technical details and usage scenarios of our tool, Mutation Analysis for evaluating Static Crypto-API misuse detectors (MASC). We developed $12$ generalizable, usage-based mutation operators and three mutation scopes, namely Main Scope, Similarity Scope, and Exhaustive Scope, which can be used to expressively instantiate compilable variants of crypto-API misuse cases. Using MASC, we evaluated nine major crypto-detectors and discovered $19$ unique, undocumented flaws. We designed MASC to be configurable and user-friendly; a user can configure the parameters to change the nature of the generated mutations. Furthermore, MASC comes with both a command-line interface and a web-based front end, making it practical for users with different levels of expertise.
In many applications, piecewise continuous functions are commonly interpolated over meshes. However, accurate high-order manipulations of such functions can be challenging due to potential spurious oscillations known as the Gibbs phenomena. To address this challenge, we propose a novel approach, Robust Discontinuity Indicators (RDI), which can efficiently and reliably detect both C^{0} and C^{1} discontinuities for node-based and cell-averaged values. We present a detailed analysis focusing on its derivation and the dual-thresholding strategy. A key advantage of RDI is its ability to handle potential inaccuracies associated with detecting discontinuities on non-uniform meshes, thanks to its innovative discontinuity indicators. We also extend the applicability of RDI to handle general surfaces with boundaries, features, and ridge points, thereby enhancing its versatility and usefulness in various scenarios. To demonstrate the robustness of RDI, we conduct a series of experiments on non-uniform meshes and general surfaces, and compare its performance with some alternative methods. By addressing the challenges posed by the Gibbs phenomena and providing reliable detection of discontinuities, RDI opens up possibilities for improved approximation and analysis of piecewise continuous functions, such as in data remap.
We prove that an $m$ out of $n$ bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that $m$ out of $n$ bootstrap works for continuous as well as for discrete and independent data; furthermore, simulations indicate that it also performs well for discrete and dependent data, and that it outperforms alternative estimation methods.
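As a rough illustration of the procedure described in this abstract, the sketch below computes Chatterjee's rank correlation in its general form (which allows ties, as resampled data always contain them) and then draws $m$ out of $n$ bootstrap replicates. The function names, the centering at the full-sample estimate, the redraw of resamples with constant $y$, and the crude quantile rule are all assumptions made for this example; the abstract does not specify them, and the choice of $m$ is left to the caller.

```python
import math
import random
from bisect import bisect_left, bisect_right


def chatterjee_xi(x, y, rng):
    """Chatterjee's xi_n in the general (ties allowed) form.

    xi_n = 1 - n * sum_i |r_{i+1} - r_i| / (2 * sum_i l_i * (n - l_i)),
    with r_i = #{j : y_j <= y_i} and l_i = #{j : y_j >= y_i},
    after sorting the pairs by x (ties in x broken at random).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: (x[i], rng.random()))
    y_by_x = [y[i] for i in order]
    y_sorted = sorted(y)
    r = [bisect_right(y_sorted, v) for v in y_by_x]
    l = [n - bisect_left(y_sorted, v) for v in y_by_x]
    denom = 2 * sum(li * (n - li) for li in l)
    if denom == 0:
        raise ValueError("xi is undefined when y is constant")
    numer = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - numer / denom


def m_out_of_n_bootstrap(x, y, m, n_boot=500, seed=0):
    """Return xi_n and the bootstrap roots sqrt(m) * (xi*_m - xi_n)."""
    rng = random.Random(seed)
    n = len(x)
    xi_n = chatterjee_xi(x, y, rng)
    roots = []
    while len(roots) < n_boot:
        idx = [rng.randrange(n) for _ in range(m)]  # m draws with replacement
        y_star = [y[i] for i in idx]
        if min(y_star) == max(y_star):
            continue  # assumption of this sketch: redraw degenerate resamples
        xi_star = chatterjee_xi([x[i] for i in idx], y_star, rng)
        roots.append(math.sqrt(m) * (xi_star - xi_n))
    return xi_n, roots


def basic_interval(xi_n, roots, n, alpha=0.05):
    """Interval for xi, using the roots as a proxy for sqrt(n) * (xi_n - xi)."""
    s = sorted(roots)
    lo_q = s[int(math.floor((alpha / 2) * (len(s) - 1)))]
    hi_q = s[int(math.ceil((1 - alpha / 2) * (len(s) - 1)))]
    return xi_n - hi_q / math.sqrt(n), xi_n - lo_q / math.sqrt(n)
```

A call such as `m_out_of_n_bootstrap(x, y, m=int(len(x) ** 0.5))` uses one common rule of thumb for $m$; that rule is an assumption of this example, not a recommendation taken from the abstract.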
Higher-order functions and imperative references are language features supported by many mainstream languages. Their combination makes it possible to package references to code blocks together with the captured state from their environment. Higher-order imperative programs are expressive and useful, but complicate formal specification and reasoning due to the use of yet-to-be-instantiated function parameters, especially when their invocations may mutate memory captured by or reachable from their arguments. Existing state-of-the-art works for verifying higher-order imperative behaviors are restricted in one of two ways: they achieve strong theoretical results without automated implementations, or they achieve automation with the help of strong assumptions from dedicated type systems (e.g., Rust). To enable an automated verification solution for imperative languages without the above restrictions, we introduce Higher-order Staged Separation Logic (HSSL), an extension of Hoare logic for call-by-value higher-order functions with ML-like local references. In this paper, we design a novel staged specification logic, prove its soundness, develop a new automated higher-order verifier, Heifer, for a core OCaml-like language, report on experimental results, and present various case studies investigating its capabilities.
Object detection is a fundamental task in computer vision and image processing. Current deep learning based object detectors have been highly successful when abundant labeled data is available. In real life, however, there is no guarantee that each object category has enough labeled samples for training, and these large object detectors overfit easily when the training data is limited. It is therefore necessary to introduce few-shot learning and zero-shot learning into object detection; together, these are referred to as low-shot object detection. Low-Shot Object Detection (LSOD) aims to detect objects from a few or even zero labeled samples, and is categorized accordingly into few-shot object detection (FSOD) and zero-shot object detection (ZSD). This paper presents a comprehensive survey of deep learning based FSOD and ZSD. First, this survey classifies methods for FSOD and ZSD into different categories and discusses their pros and cons. Second, this survey reviews dataset settings and evaluation metrics for FSOD and ZSD, then analyzes the performance of different methods on these benchmarks. Finally, this survey discusses future challenges and promising directions for FSOD and ZSD.
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method is two-fold. First, we extend RotatE from the 2D complex domain to a high-dimensional space with orthogonal transforms to model relations with greater modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for the knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on the data set (FB15k-237) that has many nodes with high in-degree.
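To make the baseline concrete, the following minimal sketch shows the RotatE-style distance that the abstract builds on: each relation rotates the head embedding element-wise in the complex plane, and a triple is plausible when the rotated head lands close to the tail. The function names are hypothetical, and taking the norm as a sum of complex moduli is an assumption of this example. The second helper shows that one complex rotation is the same as applying a 2x2 orthogonal matrix to (real, imaginary) coordinates, which is the special case that the proposed higher-dimensional orthogonal transforms generalize; the extension itself and the graph-context terms are not reproduced here.

```python
import cmath
import math


def rotate_distance(head, phases, tail):
    """Sum over dimensions of |h_i * exp(i * theta_i) - t_i| (lower is more plausible)."""
    return sum(
        abs(h * cmath.exp(1j * theta) - t)
        for h, theta, t in zip(head, phases, tail)
    )


def rotate_with_matrix(h, theta):
    """Apply the 2x2 rotation (an orthogonal matrix) to (Re h, Im h)."""
    c, s = math.cos(theta), math.sin(theta)
    re = c * h.real - s * h.imag
    im = s * h.real + c * h.imag
    return complex(re, im)


# A relation with phase pi maps 1 to -1, so this toy triple has distance ~0.
d = rotate_distance([1 + 0j], [math.pi], [-1 + 0j])
```

Because every relation acts as a norm-preserving map, the matrix view extends naturally from 2x2 rotations to larger orthogonal blocks, which is the direction the abstract describes.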
Answering questions that require reading text in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, resorting only to pre-trained word embedding models is far from enough. A desirable model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators that guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and clearly improves performance on two VQA tasks that require reading scene texts.
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
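The H-divergence that motivates the domain classifiers above can be illustrated with a toy computation. For a hypothesis class of domain classifiers, the empirical H-divergence is 2 * (1 - min over h of [err_S(h) + err_T(h)]), where err_S and err_T are the classifier's error rates on source and target samples. The sketch below is only an assumption-laden illustration: it uses scalar features and simple threshold classifiers rather than a learned network, and the function name is hypothetical. It does not reproduce the adversarial training or the Faster R-CNN components.

```python
def h_divergence_thresholds(source, target):
    """Empirical H-divergence over 1-D threshold domain classifiers.

    A hypothesis (thr, sign) predicts 'target' when sign * (x - thr) > 0
    and 'source' otherwise. The result lies in [0, 2]: 0 means the class
    cannot tell the domains apart, 2 means perfect separation.
    """
    points = sorted(set(source) | set(target))
    thresholds = [points[0] - 1.0] + points  # include "everything is target"
    best = float("inf")
    for thr in thresholds:
        for sign in (1, -1):
            # Source samples wrongly predicted as target.
            err_s = sum(1 for x in source if sign * (x - thr) > 0) / len(source)
            # Target samples wrongly predicted as source.
            err_t = sum(1 for x in target if not (sign * (x - thr) > 0)) / len(target)
            best = min(best, err_s + err_t)
    return 2.0 * (1.0 - best)


separated = h_divergence_thresholds([0.0, 0.1, 0.2], [1.0, 1.1, 1.2])
identical = h_divergence_thresholds([0.0, 1.0], [0.0, 1.0])
```

Adversarial feature learning pushes in the opposite direction to this search: the domain classifier tries to minimize its error, while the feature extractor is trained so that even the best classifier stays near chance, which keeps the divergence small.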
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, segmentation algorithms have a collection of parameters that control the degree of over-segmentation produced. Properly selecting such parameters for human-like perceptual grouping remains a challenge. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.