We propose a secure compilation chain for statically verified partial programs with input-output (IO). The source language is an F* subset in which a verified IO-performing program interacts with its IO-performing context via a higher-order interface that includes refinement types as well as pre- and post-conditions about past IO events. The target language is a smaller F* subset in which the compiled program is linked with an adversarial context via an interface without refinement types or pre- and post-conditions. To bridge this interface gap and make compilation and linking secure, we propose a novel combination of higher-order contracts and reference monitoring for recording and controlling IO operations. During compilation, we use contracts to convert the logical assumptions the program makes about the context into dynamic checks at each context-program boundary crossing. These boundary checks can depend on information about past IO events stored in the monitor's state, yet they cannot stop the adversarial target context before it performs dangerous IO operations. So, additionally, our linking forces the context to perform all IO via a secure IO library that uses reference monitoring to dynamically enforce an access control policy before each IO operation. We propose a novel way to model in F* that the context cannot directly access the IO operations and the monitor's internal state, based on F*'s recent support for flag-based effect polymorphism. We prove in F* that enforcing the access control policy on the context, in combination with static verification of the program, soundly enforces a global trace property. Moreover, we prove in F* that our secure compilation chain satisfies, by construction, Robust Relational Hyperproperty Preservation, a very strong secure compilation criterion. Finally, we illustrate our secure compilation chain at work on a simple web server example.
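To make the two mechanisms concrete, here is a minimal, hedged sketch in Python (the actual chain is implemented and verified in F*; all names below are hypothetical illustrations, not the paper's API). A context-supplied callback is wrapped in a higher-order contract that checks pre- and post-conditions against the monitor's recorded trace, and all context IO must go through a monitored library that enforces the access control policy before the operation happens:

```python
# Hypothetical sketch of the paper's two mechanisms; the real artifact is in F*.

trace = []  # reference monitor's state: the history of past IO events

def monitored_openfile(path, policy):
    """Secure IO library: enforce the access control policy *before* the IO."""
    if not policy(trace, ("openfile", path)):
        raise PermissionError(f"policy forbids opening {path}")
    trace.append(("openfile", path))
    return open(path)  # the actual IO operation

def with_contract(ctx_callback, precond, postcond):
    """Higher-order contract: turns the program's logical assumptions about
    the context into dynamic checks at each boundary crossing."""
    def checked(arg):
        assert precond(trace, arg), "context violated a precondition"
        result = ctx_callback(arg)
        assert postcond(trace, arg, result), "context violated a post-condition"
        return result
    return checked
```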
In device-to-device (D2D) coded caching problems, it is possible that not all users will make file requests in the delivery phase. Hence, we propose a new D2D centralized coded caching problem, named the 3-user D2D coded caching with two random requesters and one sender (2RR1S), where in the delivery phase, any two of the three users make file requests, and the user that does not make a file request is the designated sender. We find the optimal caching and delivery scheme, denoted the 2RR1S scheme, for any number of files N by proving matching converse and achievability results. It is shown that coded cache placement is needed to achieve the optimal performance. Furthermore, the optimal rate-memory tradeoff has a uniform expression for N >= 4 and different expressions for N = 2 and N = 3. To examine the usefulness of the proposed model and scheme, we adapt the 2RR1S scheme to two scenarios. The first is the 3-user D2D coded caching model proposed by Ji et al. By characterizing the optimal rate-memory tradeoff for 3-user D2D coded caching when N = 2, which was previously unknown, we show that the adapted 2RR1S scheme is in fact optimal for the 3-user D2D coded caching problem when N = 2 and the cache size is medium. The benefit comes from coded cache placement, which is missing from existing D2D coded caching schemes. The second scenario is one where, in the delivery phase, each user makes a file request randomly and independently with the same probability p. We call this model the request-random D2D coded caching problem. Adapting the 2RR1S scheme to this scenario, we show the superiority of our adapted scheme over existing D2D coded caching schemes for medium to large cache sizes.
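As background on why coded cache placement can outperform uncoded placement, the following toy Python example (a generic illustration under simplified assumptions, not the paper's 2RR1S scheme) shows a user caching the XOR of two subfiles and recovering either one from a single uncoded transmission:

```python
# Toy illustration of coded cache placement with N = 2 files A and B,
# each split into two subfiles. Caching the XOR A2 ^ B2 lets the user
# recover either missing subfile from one transmitted subfile.

def xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

A1, A2 = b"aaaa", b"AAAA"   # file A = A1 | A2
B1, B2 = b"bbbb", b"BBBB"   # file B = B1 | B2

cache = xor(A2, B2)         # coded placement: one cached symbol covers both files

# Delivery: the user requests file A; the sender transmits B2 uncoded.
recovered_A2 = xor(cache, B2)   # (A2 ^ B2) ^ B2 = A2
assert recovered_A2 == A2
```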
Almost 50 years after the invention of SQL, injection attacks remain top-tier vulnerabilities in today's ICT systems. Consequently, SQLi detection is still an active area of research, with the most recent works incorporating machine learning techniques into the proposed solutions. In this work, we highlight the shortcomings of previous ML-based results, focusing on four aspects: the evaluation methods, the optimization of model parameters, the distribution of the utilized datasets, and the feature selection. Since no single work has explored all of these aspects satisfactorily, we fill this gap and provide an in-depth and comprehensive empirical analysis. Moreover, we cross-validate the trained models using data from other distributions. To our knowledge, this aspect of ML models trained for SQLi detection has not been studied before, yet a model's sensitivity to distribution shift is crucial for any real-life deployment. Finally, we validate our findings on a real-world industrial SQLi dataset.
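A minimal sketch of the cross-distribution evaluation described above (the features and classifier here are placeholders, not the paper's setup): train an SQLi classifier on queries from one source and score it on queries drawn from a different distribution:

```python
# Hedged sketch of cross-distribution validation for an SQLi classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_distribution_f1(train_queries, train_labels, test_queries, test_labels):
    # Character n-grams are a simple stand-in for the paper's feature selection.
    vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_queries), train_labels)
    preds = clf.predict(vec.transform(test_queries))
    return f1_score(test_labels, preds)

# Comparing the in-distribution score with the score on data from another
# source exposes the model's sensitivity to distribution shift.
```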
Background: Several researchers have reported the impact of gender on software development teams, especially in relation to women. In general, women are under-represented on these teams and face challenges and difficulties in their workplaces. For women who are mothers, these challenges can be amplified and directly impact their professional lives, both in industry and in academia. However, little is known about women ICT practitioners' perceptions of the challenges of maternity in their professional careers. Objective: This paper investigates the challenges and difficulties mothers face in global software development teams. Method: We conducted a survey of women ICT practitioners who work in academia and in global technology companies. We surveyed 141 mothers from different countries and employed mixed methods to analyze the data. Results: Our findings reveal that these women face sociocultural challenges, including work-life balance issues, bad jokes, and moral harassment. Furthermore, few women occupy leadership positions in software teams, and most reported that they did not have a support network during and after maternity leave, leaving them feeling overloaded. The surveyed women suggested a set of actions to reduce the challenges they face in their workplaces, such as: i) changing the culture; ii) creating a code of conduct for men; iii) fostering more empathy; iv) creating childcare facilities within companies; and v) creating opportunities and programs for women in the software industry and academia. Conclusion: In addition to being under-represented in ICT roles, women also face many challenges during an important phase of their lives, maternity. Our findings explore these challenges and can help organizations develop policies to minimize them. Furthermore, they can help raise awareness among co-workers and managers, fostering a friendlier and more inclusive workplace.
Experiments on physical continuum robots are the gold standard for evaluation. Currently, as no commercial continuum robot platform is available, a large variety of early-stage prototypes exists. These prototypes are developed by individual research groups and are often used for a single publication. Thus, a significant amount of time is devoted to creating proprietary hardware and software, hindering the development of a common platform and diverting scarce time and effort away from the main research challenges. We address this problem by proposing an open-source actuation module that can be used to build different types of continuum robots. It consists of a high-torque brushless electric motor, a high-resolution optical encoder, and a low-gear-ratio transmission. For this letter, we create three different types of continuum robots. In addition, we illustrate, for the first time, that continuum robots built with our actuation module can proprioceptively detect external forces. Consequently, our approach opens untapped and under-investigated research directions related to the dynamics and advanced control of continuum robots, where sensing the generalized flow and effort is mandatory. Beyond that, we democratize continuum robotics research by providing open-source software and hardware through our initiative, the Open Continuum Robotics Project, to increase the accessibility and reproducibility of advanced methods.
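One common way such proprioceptive force detection can be realized (an assumption for illustration, not necessarily the paper's method) is a torque residual: a low-gear-ratio transmission makes motor-current-based torque estimates usable, so an external contact shows up as a discrepancy between the measured torque and the torque predicted by the dynamic model:

```python
# Hedged sketch of proprioceptive force detection via a torque residual.
def external_torque_residual(tau_measured, tau_model, threshold):
    """tau_measured: joint torque estimated from motor current;
    tau_model: torque predicted by the robot's dynamic model.
    A residual above the threshold indicates an external force."""
    residual = tau_measured - tau_model
    return residual, abs(residual) > threshold
```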
Let $\mathcal{T}$ be a set of $n$ flat (planar) semi-algebraic regions in $\mathbb{R}^3$ of constant complexity (e.g., triangles, disks), which we call plates. We wish to preprocess $\mathcal{T}$ into a data structure so that for a query object $\gamma$, which is also a plate, we can quickly answer various intersection queries, such as detecting whether $\gamma$ intersects any plate of $\mathcal{T}$, reporting all the plates intersected by $\gamma$, or counting them. We also consider two simpler cases of this general setting: (i) the input objects are plates and the query objects are constant-degree parametrized algebraic arcs in $\mathbb{R}^3$ (arcs, for short), or (ii) the input objects are arcs and the query objects are plates in $\mathbb{R}^3$. Besides being interesting in their own right, the data structures for these two special cases form the building blocks for handling the general case. By combining the polynomial-partitioning technique with additional tools from real algebraic geometry, we present many different data structures for intersection queries, which also provide trade-offs between their size and query time. The performance of these data structures depends on the complexity of the input and query objects. For example, if $\mathcal{T}$ is a set of plates and the query objects are algebraic arcs, we obtain a data structure that uses $O^*(n^{4/3})$ storage (where the $O^*(\cdot)$ notation hides subpolynomial factors) and answers an arc-intersection query in $O^*(n^{2/3})$ time. Alternatively, for a parameter $s\in [n^{4/3}, n^t]$ where $t\ge 3$ is the number of real parameters needed to specify a query arc, the query time can be decreased to $O^*((n^t/s)^{\tfrac{2}{3(t-1)}})$ by increasing the storage to $O^*(s)$.
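As a quick consistency check on the stated trade-off (using only the bound quoted above), the query time degenerates to subpolynomial at the upper end of the storage range:

```latex
Q(s) = O^*\!\left(\left(\frac{n^t}{s}\right)^{\frac{2}{3(t-1)}}\right),
\qquad
Q(n^t) = O^*\!\left(\left(\frac{n^t}{n^t}\right)^{\frac{2}{3(t-1)}}\right) = O^*(1).
```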
Context: AI-assisted code generation tools have become increasingly prevalent in software engineering, offering the ability to generate code from natural language prompts or partial code inputs. Notable examples of these tools include GitHub Copilot, Amazon CodeWhisperer, and OpenAI's ChatGPT. Objective: This study aims to compare the performance of these prominent code generation tools on code quality metrics, namely Code Validity, Code Correctness, Code Security, Code Reliability, and Code Maintainability, to identify their strengths and shortcomings. Method: We assess the code generation capabilities of GitHub Copilot, Amazon CodeWhisperer, and ChatGPT using the HumanEval benchmark dataset. The generated code is then evaluated based on the proposed code quality metrics. Results: Our analysis reveals that the latest versions of ChatGPT, GitHub Copilot, and Amazon CodeWhisperer generate correct code 65.2%, 46.3%, and 31.1% of the time, respectively. Compared with their earlier versions, GitHub Copilot improved by 18% and Amazon CodeWhisperer by 7%. The average technical debt, considering code smells, was found to be 8.9 minutes for ChatGPT, 9.1 minutes for GitHub Copilot, and 5.6 minutes for Amazon CodeWhisperer. Conclusions: This study highlights the strengths and weaknesses of some of the most popular code generation tools, providing valuable insights for practitioners. By comparing these generators, our results may assist practitioners in selecting the optimal tool for specific tasks, enhancing their decision-making process.
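For readers unfamiliar with HumanEval-style evaluation, the following sketch shows how Code Correctness can be measured (a simplified illustration; real harnesses sandbox execution, and the helper names are placeholders rather than the paper's tooling):

```python
# Hedged sketch: run generated code against a task's unit tests in a subprocess.
import os
import subprocess
import tempfile

def passes_tests(generated_code: str, test_code: str, timeout: float = 10.0) -> bool:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

# Code Correctness = fraction of benchmark tasks for which passes_tests is True.
```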
Knowledge graph reasoning (KGR), which aims to deduce new facts from existing facts based on logic rules mined from knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems. According to the graph types, existing KGR models can be roughly divided into three categories, i.e., static models, temporal models, and multi-modal models. Early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey paper or open-source repository comprehensively summarizes and discusses models in this important direction. To fill the gap, we conduct a survey of knowledge graph reasoning, tracing it from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: //github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
We hypothesize that, due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain in accuracy when the model has access to that modality in addition to another one. We refer to this gain as the conditional utilization rate. In our experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm that balances the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
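The conditional utilization rate itself is straightforward to express; the sketch below (with eval_accuracy as a placeholder callback for evaluating the model with only a subset of modalities enabled, e.g., by zeroing out the rest) computes the accuracy gain of adding one modality on top of another:

```python
# Hedged sketch of the conditional utilization rate u(m2 | m1).
def conditional_utilization_rate(eval_accuracy, m1, m2):
    """eval_accuracy(modalities) -> accuracy of the trained model when only
    the given modalities are available; all names here are illustrative."""
    return eval_accuracy({m1, m2}) - eval_accuracy({m1})
```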
Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to iterative parameter updates. However, individual methods are nevertheless compared in isolation from real-world application and are typically judged by monitoring accumulated test-set performance. The closed-world assumption remains predominant: it is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as that used for training. This poses a massive challenge, as neural networks are well known to provide overconfident false predictions on unknown instances and to break down in the face of corrupted data. In this work, we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning, and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but also highlights the natural synergies in a common framework. We empirically demonstrate improvements when alleviating catastrophic forgetting, querying data in active learning, and selecting task orders, while exhibiting robust open-world application where previously proposed methods fail.
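As a concrete baseline for the open set recognition ingredient mentioned above (a common confidence-thresholding heuristic, not necessarily the mechanism used in this work), one can refuse to classify inputs whose softmax confidence is too low:

```python
# Hedged sketch: a simple open-set baseline via softmax-confidence thresholding.
import numpy as np

def predict_open_set(logits: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    preds[probs.max(axis=1) < threshold] = -1  # -1 marks "unknown" inputs
    return preds
```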