Python is used by everyone from beginners to professional programmers. Python provides functionality to its community of users through PyPI libraries, which allow developers to reuse functionality in their applications. However, the extent to which these PyPI libraries require proficient code in their implementation is unknown. We conjecture that PyPI contributors may decide to implement more advanced Pythonic code, or stick with more basic Python code. Is complex code committed only by a few contributors, or only to specific files? The new idea in this paper is to identify who implements complex code and where it is implemented. Hence, we present a visualization that shows the relationship between proficient code, contributors, and files. Analyzing four PyPI projects, we are able to explore which files contain more elegant code, and which contributors committed to these files. Our results show that most files contain more basic-competency code, and that not every contributor contributes competent code. We show how our visualization is able to summarize such information, and how it opens up different possibilities for understanding how to make elegant contributions.
This paper presents the Functional Machine Calculus (FMC) as a simple model of higher-order computation with "reader/writer" effects: higher-order mutable store, input/output, and probabilistic and non-deterministic computation. The FMC derives from the lambda-calculus by taking the standard operational perspective of a call-by-name stack machine as primary, and introducing two natural generalizations. One, "locations", introduces multiple stacks, which each may represent an effect and so enable effect operators to be encoded into the abstraction and application constructs of the calculus. The second, "sequencing", is known from kappa-calculus and concatenative programming languages, and introduces the imperative notions of "skip" and "sequence". This enables the encoding of reduction strategies, including call-by-value lambda-calculus and monadic constructs. The encoding of effects into generalized abstraction and application means that standard results from the lambda-calculus may carry over to effects. The main result is confluence, which is possible because encoded effects reduce algebraically rather than operationally. Reduction generates the familiar algebraic laws for state, and unlike in the monadic setting, reader/writer effects combine seamlessly. A system of simple types confers termination of the machine.
An assisted living facility (ALF) is a place where someone can live, have access to social supports such as transportation, and receive assistance with the activities of daily living such as toileting and dressing. Despite the important role of ALFs, they are not required to be certified by Medicare, and there is no public national database of these facilities. We present the first public dataset of ALFs in the United States, covering all 50 states and DC with 44,638 facilities and over 1.2 million beds. This dataset can help provide answers to existing public health questions as well as help those in need find a facility. The dataset was validated by replicating the results of a nationwide study of ALFs that uses closed data [4], in which the prevalence of ALFs is assessed with respect to county-level socioeconomic variables related to health disparity such as race, disability, and income. To showcase the value of this dataset, we also propose a novel metric to assess access to community-based care: the average distance an individual in need must travel in order to reach an ALF. The dataset and all relevant code are available at github.com/antonstengel/assisted-living-data.
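To make the proposed access metric concrete, below is a minimal sketch of the distance computation, assuming a hypothetical table of facility coordinates and a set of population points; the column names and the haversine helper are illustrative and not part of the released pipeline.

```python
import math
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def mean_distance_to_nearest_alf(people, facilities):
    """Average distance (km) from each population point to its nearest ALF.

    `people` and `facilities` are DataFrames with 'lat' and 'lon' columns
    (a hypothetical schema, for illustration only).
    """
    nearest = []
    for _, p in people.iterrows():
        d = facilities.apply(
            lambda f: haversine_km(p.lat, p.lon, f.lat, f.lon), axis=1
        ).min()
        nearest.append(d)
    return sum(nearest) / len(nearest)
```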
This paper provides empirical knowledge of the user experience of collaborative visualization in a distributed asymmetric setting, gained through controlled user studies. With access to various computing devices, such as Virtual Reality (VR) head-mounted displays, scenarios emerge in which collaborators must, or prefer to, use different computing environments in different places. However, we still lack an understanding of how VR is used in an asymmetric setting for collaborative visualization. To gain an initial understanding and better inform the design of asymmetric systems, we first conducted a formative study with 12 pairs of participants. All participants collaborated in asymmetric (PC-VR) and symmetric settings (PC-PC and VR-VR). We then improved our asymmetric design based on the key findings and observations from the first study. Another ten pairs of participants collaborated under the enhanced PC-VR and PC-PC conditions in a follow-up study. We found that a well-designed asymmetric collaboration system can be as effective as a symmetric system. Surprisingly, participants using PC perceived less mental demand and effort in the asymmetric setting (PC-VR) than in the symmetric setting (PC-PC). We provide a fine-grained discussion of the trade-offs between different collaboration settings.
Artificial intelligence is finding its way into medical imaging, usually focusing on image reconstruction or on enhancing analytically reconstructed images. However, optimization along the complete processing chain, from signal detection to data processing, enables significant improvements. Thus, we present an approach to detector optimization using boosted learning that exploits the concept of residual physics. In our work, we improve the coincidence time resolution (CTR) of positron emission tomography (PET) detectors. PET enables imaging of metabolic processes by detecting γ-photons with scintillation detectors. Current research exploits light-sharing detectors, where the scintillation light is distributed over and digitized by an array of readout channels. While these detectors demonstrate excellent performance parameters, e.g., regarding spatial resolution, extracting precise timing information for time-of-flight (TOF) becomes more challenging due to degrading effects known as time skews. Conventional correction methods mainly rely on analytical formulations that are theoretically capable of covering all time skew effects, e.g., those caused by signal runtimes or physical effects. However, additional effects are involved for light-sharing detectors, so finding suitable analytical formulations can become arbitrarily complicated. The residual physics-based strategy uses gradient tree boosting (GTB) and a physics-informed data generation scheme that mimics an actual imaging process by shifting a radiation source. We used clinically relevant detectors with a height of 19 mm, coupled to digital photosensor arrays. All trained models improved the CTR significantly. Using the best model, we achieved CTRs down to 198 ps (185 ps) for energies ranging from 300 keV to 700 keV (450 keV to 550 keV).
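As a rough illustration of the residual-physics strategy, the sketch below fits a gradient tree boosting regressor to the residual timing error left after conventional corrections. The features and targets are synthetic placeholders, not the detector data used in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: one row per coincidence event.
# X holds light-sharing features (e.g., per-channel energies and raw timestamps);
# y holds the residual timing error, i.e., the measured time difference minus the
# ground-truth delay implied by the known (shifted) source position.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 16))  # placeholder features
y = 0.3 * X[:, 0] - 0.1 * X[:, 3] + rng.normal(scale=0.05, size=10_000)  # placeholder residuals

# Gradient tree boosting learns the remaining ("residual") time skew that the
# analytical corrections do not capture.
gtb = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
gtb.fit(X, y)

# At inference time, the predicted skew is subtracted from the measured timestamp
# difference before computing the coincidence time resolution (CTR).
corrected = y - gtb.predict(X)
```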
Factorial analyses offer a powerful nonparametric means to detect main or interaction effects among multiple treatments. For survival outcomes, e.g., from clinical trials, such techniques can be adopted for comparing reasonable quantifications of treatment effects. The key difficulty in survival analysis is the proper handling of censoring. So far, all existing factorial analyses for survival data have been developed under the independent censoring assumption, which is too strong for many applications. As a solution, the central aim of this article is to develop new methods for factorial survival analyses under quite general dependent censoring regimes. This is accomplished by combining existing results for factorial survival analyses with techniques developed for survival copula models. As a result, we present an appealing F-test that exhibits sound performance in our simulation study. The new methods are illustrated in a real data analysis. We implement the proposed method in the R function surv.factorial(.) in the R package compound.Cox.
In this paper, we study two problems: determining action model equivalence and minimizing the event space of an action model under certain structural relationships. Kripke model equivalence is exactly characterized by the structural relationship called bisimulation. In this paper, we propose generalized action emulation, which exactly characterizes action model equivalence. Previous structural relationships sufficient for action model equivalence, i.e., bisimulation, propositional action emulation, action emulation, and action emulation of canonical action models, can be described as restricted versions of generalized action emulation. We summarize four critical properties of the atom set over preconditions, and prove that any formula set satisfying these properties can be used to restrict generalized action emulation so as to determine action model equivalence by an iteration algorithm. We also construct a new formula set with these four properties, which is generally more efficient than the atom set. The technique of partition refinement has been used to minimize the world space of a Kripke model under bisimulation. Applying partition refinement to action models allows one to minimize their event spaces under bisimulation. Propositional action emulation is weaker than bisimulation but still sufficient for action model equivalence. We prove that it is PSPACE-complete to minimize the event space of an action model under propositional action emulation, and provide a PSPACE algorithm for it. Finally, we prove that minimizing the event space under action model equivalence is PSPACE-hard, and propose a computable method based on the canonical formulas of modal logics to solve this problem.
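For intuition about the partition-refinement step, here is a minimal sketch that minimizes a finite, single-relation Kripke model under bisimulation. It is a simplified illustration only; the paper applies refinement to the event spaces of action models, which additionally carry preconditions.

```python
def minimize_kripke(worlds, val, rel):
    """Partition refinement for bisimulation on a finite Kripke model.

    worlds : iterable of world names
    val    : dict mapping world -> frozenset of atoms true at that world
    rel    : dict mapping world -> set of successor worlds

    Returns a list of blocks (frozensets of worlds); each block becomes one
    world of the bisimulation-minimal model.
    """
    # Initial partition: group worlds with the same valuation.
    by_val = {}
    for w in worlds:
        by_val.setdefault(val[w], set()).add(w)
    partition = [frozenset(b) for b in by_val.values()]

    changed = True
    while changed:
        changed = False
        block_of = {w: b for b in partition for w in b}
        new_partition = []
        for block in partition:
            # Split a block by the set of blocks its successors fall into.
            groups = {}
            for w in block:
                sig = frozenset(block_of[v] for v in rel.get(w, ()))
                groups.setdefault(sig, set()).add(w)
            if len(groups) > 1:
                changed = True
            new_partition.extend(frozenset(g) for g in groups.values())
        partition = new_partition
    return partition
```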
Knowledge transfer has been shown to be a very successful technique for training neural classifiers: together with the ground-truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer. However, there has been little or no theoretical analysis of this phenomenon. To bridge this gap, we propose to approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with the PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two-layer network, it can be analyzed in the kernel regime and is able to interpolate between the PI and the given data. This characterization sheds new light on the relation between the training error and the capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network. We prove that the teacher reduces the number of iterations required for a student to learn, and consequently improves the generalization power of the student. We give a corresponding experimental analysis that validates the theoretical results and yields additional insights.
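A minimal sketch of the kind of regularized objective described above, written in PyTorch: a ground-truth term plus a penalty tying the student's output to the teacher's privileged information. The weighting parameter and the choice of penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, targets, lam=0.5):
    """Ground-truth fit plus a privileged-information (PI) regularizer.

    `lam` (illustrative) trades off fitting the labels against matching the
    teacher's soft predictions, which play the role of the PI here."""
    data_term = F.cross_entropy(student_logits, targets)
    pi_term = F.mse_loss(student_logits, teacher_logits.detach())
    return (1.0 - lam) * data_term + lam * pi_term

# Toy usage with random logits and labels:
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
loss = student_loss(s, t, y)
loss.backward()
```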
Aggregate measures of family planning are used to monitor demand for and usage of contraceptive methods in populations globally, for example as part of the FP2030 initiative. Family planning measures for low- and middle-income countries are typically based on data collected through cross-sectional household surveys. Recently proposed measures account for sexual activity through assessment of the distribution of time-between-sex (TBS) in the population of interest. In this paper, we propose a statistical approach to estimate the distribution of TBS using data typically available in low- and middle-income countries, while addressing two major challenges. The first challenge is that information on the timing of sex is typically limited to women's time-since-last-sex (TSLS) data collected in the cross-sectional survey. In our proposed approach, we adopt the current duration method to estimate the distribution of TBS using the available TSLS data, from which the frequency of sex at the population level can be derived. The second challenge is that the observed TSLS data are subject to reporting issues: they can be reported in different units and may be rounded off. To apply the current duration approach and account for these data reporting issues, we develop a flexible Bayesian model, and provide a detailed technical description of the proposed modeling approach.
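The identity underlying the current duration method can be illustrated with a short sketch: under the usual stationarity assumptions, the density of the observed TSLS is proportional to the survival function of TBS, so the latter can be recovered up to normalization. The exponential example below is a toy assumption for illustration only and does not reflect the paper's Bayesian model.

```python
import numpy as np

# Under standard current-duration (renewal/stationarity) assumptions, the density
# g of the observed time-since-last-sex (TSLS) satisfies  g(y) = S(y) / E[TBS],
# where S is the survival function of time-between-sex (TBS).  S can therefore be
# recovered, up to the normalization g(0), as  S(y) = g(y) / g(0).

grid = np.linspace(0.0, 60.0, 121)   # days (illustrative grid)
mean_tbs = 10.0                      # assumed mean TBS for the toy example
s_true = np.exp(-grid / mean_tbs)    # assumed exponential TBS survival function
g_tsls = s_true / mean_tbs           # implied TSLS density under the identity

# Recovering the TBS survival from the (here, exactly known) TSLS density:
s_recovered = g_tsls / g_tsls[0]
assert np.allclose(s_recovered, s_true)
```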
Satellite imagery is gaining popularity as a valuable tool to reduce the impact on natural resources and increase profits for farmers. The purpose of this study is twofold: to mine the scientific literature to reveal the structure of this research domain, and to investigate to what extent scientific results are able to reach a wider public. To these ends, a Web of Science dataset and a Twitter dataset were retrieved and analysed, respectively. Regarding the academic literature, different performances of the various countries were observed: the USA and China emerged as the leading actors, both in terms of published papers and of employed researchers. Among the categorised keywords, "resolution", "Landsat", "yield", "wheat" and "multispectral" are the most used. Then, by analysing the semantic network of the words used in the various abstracts, the different facets of research in satellite remote sensing were identified. The importance of retrieving meteorological parameters through remote sensing and the broad use of vegetation indices emerged. As emerging topics, classification tasks for land use assessment and crop recognition stand out, together with the use of hyperspectral sensors. Regarding the interaction of academia with the public, the analysis showed that it is practically absent on Twitter: most of the activity there is due to private companies advertising their business. Therefore, there is still a communication gap between academia and actors from other societal sectors.
Most of the internet today is composed of digital media, including videos and images. With pixels becoming the currency in which most transactions happen on the internet, it is becoming increasingly important to have a way of browsing through this ocean of information with relative ease. YouTube has 400 hours of video uploaded every minute, and many millions of images are browsed on Instagram, Facebook, etc. Inspired by recent advances in deep learning and the success it has achieved on various problems such as image captioning, machine translation, word2vec, and skip-thoughts, we present DeepSeek, a natural language processing based deep learning model that allows users to enter a description of the kind of images they want to search for; in response, the system retrieves all images that semantically and contextually relate to the query. Two approaches are described in the following sections.
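As a sketch of the retrieval step common to such systems, the snippet below ranks images by cosine similarity between a query embedding and precomputed image embeddings. The embedding model itself is abstracted away, and all names and dimensions are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def retrieve(query_embedding, image_embeddings, image_ids, top_k=10):
    """Rank images by cosine similarity between a query text embedding and
    precomputed image embeddings; the embeddings are assumed to come from the
    system's text and image encoders and are treated as given here."""
    q = query_embedding / np.linalg.norm(query_embedding)
    m = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:top_k]
    return [(image_ids[i], float(scores[i])) for i in order]

# Toy usage with random vectors (a real system would embed the query text first):
ids = ["img_%03d" % i for i in range(100)]
embs = np.random.default_rng(2).normal(size=(100, 256))
query = np.random.default_rng(3).normal(size=256)
print(retrieve(query, embs, ids, top_k=5))
```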