Despite the massive popularity of the Asian Handicap (AH) football (soccer) betting market, its efficiency has not been adequately studied in the relevant literature. This paper combines rating systems with Bayesian networks and presents the first published model specifically developed for prediction and assessment of the efficiency of the AH betting market. The results are based on 13 English Premier League seasons and are compared to the traditional market, where bets are placed on a win, a loss, or a draw. Different betting situations have been examined, including a) both average and maximum (best available) market odds, b) all possible betting decision thresholds between predicted and published odds, c) optimisations for both return-on-investment and profit, and d) simple stake adjustments to investigate how the variance of returns changes when targeting equivalent profit in both the traditional and AH markets. While the AH market is found to share the inefficiencies of the traditional market, the findings reveal both interesting differences and similarities between the two.
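As an illustration of the kind of decision rule referred to above (the abstract does not spell out the exact rule, so the probabilities, odds, and threshold below are hypothetical), a betting decision threshold can be expressed as a minimum required edge of the model's probability over the published decimal odds, with return-on-investment and profit computed from the resulting bets:

```python
# Minimal value-betting sketch (illustrative only; not the paper's model).
# A bet is placed when model_probability * decimal_odds exceeds 1 + threshold.

def simulate_bets(probs, odds, outcomes, threshold=0.05, stake=1.0):
    """probs: model win probabilities; odds: published decimal odds;
    outcomes: 1 if the corresponding bet would have won, else 0."""
    staked, returned = 0.0, 0.0
    for p, o, won in zip(probs, odds, outcomes):
        if p * o > 1.0 + threshold:      # perceived edge over the bookmaker
            staked += stake
            returned += stake * o * won  # decimal odds include the stake
    profit = returned - staked
    roi = profit / staked if staked else 0.0
    return profit, roi

# Hypothetical example: three matches, only the first clears the threshold.
print(simulate_bets(probs=[0.55, 0.30, 0.48],
                    odds=[2.10, 3.00, 1.95],
                    outcomes=[1, 0, 1]))
```

Sweeping `threshold` over a grid and optimising either `profit` or `roi` roughly corresponds to the decision-threshold and optimisation analyses described in b) and c).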
Due to their limited battery and computing resources, offloading the computation tasks of unmanned aerial vehicles (UAVs) to ground infrastructure, e.g., vehicles, is a fundamental framework. In such an open and untrusted environment, vehicles are reluctant to share their computing resources unless strong incentives, privacy protection, and fairness guarantees are provided. Specifically, without strategy-proofness guarantees, strategic vehicles can overclaim their participation costs to manipulate the market. Without fairness guarantees, vehicles can deliberately abort assigned tasks without punishment, and UAVs can refuse to pay in the end, causing an exchange dilemma. Lastly, strategy-proofness and fairness typically require the transparent exchange of payments and task results under public audit, which may disclose vehicles' sensitive information and makes privacy preservation a foremost issue. To achieve these three design goals, we propose SEAL, an integrated framework for strategy-proof, fair, and privacy-preserving UAV computation offloading. SEAL deploys a strategy-proof reverse combinatorial auction mechanism to optimize UAVs' task offloading under practical constraints while ensuring economic robustness and polynomial-time efficiency. Based on smart contracts and hash-chain micropayments, SEAL implements a fair on-chain exchange protocol that realizes the atomic exchange of batch payments and computing results in multi-round auctions. In addition, a privacy-preserving off-chain auction protocol is devised, with the assistance of a trusted processor, to efficiently protect vehicles' bid privacy. Using rigorous theoretical analysis and extensive simulations, we validate that SEAL can effectively prevent vehicles from manipulating the market, ensure privacy protection and fairness, and improve offloading efficiency.
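For intuition only (SEAL's actual mechanism is a reverse combinatorial auction and is not reproduced here), the sketch below shows a single-unit reverse auction with critical-value payments, the classic way to make truthful cost reporting a dominant strategy; all vehicle names and costs are hypothetical:

```python
# Minimal (k+1)-th price reverse auction sketch (illustrative only; SEAL's
# combinatorial mechanism is more involved). Each vehicle bids the cost of
# providing one unit of computation; the buyer (UAV) needs k units.

def reverse_auction(bids, k):
    """bids: dict vehicle_id -> claimed cost. Returns (winners, payment)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1])
    winners = [vid for vid, _ in ranked[:k]]
    # Each winner is paid the first losing bid (its "critical value"):
    # bidding above it loses, bidding below it does not change the payment,
    # so overclaiming the cost cannot increase a truthful vehicle's utility.
    payment = ranked[k][1] if len(ranked) > k else None
    return winners, payment

print(reverse_auction({"v1": 3.0, "v2": 5.0, "v3": 4.0, "v4": 9.0}, k=2))
# -> (['v1', 'v3'], 5.0)
```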
[Background] The Minimum Viable Product (MVP) concept has influenced the way in which development teams apply Software Engineering (SE) practices. However, the overall understanding of this influence of MVPs on SE practices is still limited. [Objective] Our goal is to characterize the publication landscape on practices that have been used in the context of software MVPs and to gather practitioner insights on the identified practices. [Method] We conducted a systematic mapping study and discussed its results in two focus group sessions involving twelve industry practitioners who extensively use MVPs in their projects, to capture their perceptions of the findings of the mapping study. [Results] We identified 33 papers published between 2013 and 2020 and observed some trends related to MVP ideation and evaluation practices. For instance, regarding ideation, we found six different approaches and mainly informal end-user involvement practices. Regarding evaluation, there is an emphasis on end-user validation based on practices such as usability tests, A/B testing, and usage data analysis. However, there is still limited research related to MVP technical feasibility assessment and effort estimation. The focus group practitioners reinforced our confidence in the results regarding ideation and evaluation practices, as they were aware of most of the identified practices. They also reported how they deal with technical feasibility assessment and effort estimation in practice. [Conclusion] Our analysis suggests that there are opportunities for solution proposals and evaluation studies to address literature gaps concerning technical feasibility assessment and effort estimation. Overall, more effort needs to be invested into empirically evaluating the existing MVP-related practices.
We consider four main goals when fitting spatial linear models: 1) estimating covariance parameters, 2) estimating fixed effects, 3) kriging (making point predictions), and 4) block-kriging (predicting the average value over a region). Each of these goals can present different challenges when analyzing large spatial data sets. Current research uses a variety of methods, including spatial basis functions (reduced rank) and covariance tapering, to achieve these goals. However, spatial indexing, which is closely related to composite likelihood, offers some advantages. We develop a simple framework for all four goals listed above by using indexing to create a block covariance structure and nearest-neighbor predictions while maintaining a coherent linear model. We show exact inference for fixed effects under this block covariance construction. Spatial indexing is very fast, and simulations are used to validate the methods and compare them to another popular method. We study various sample designs for indexing, and our simulations show that indexing schemes leading to spatially compact partitions perform best over a range of sample sizes, autocorrelation values, and generating processes. Partitions can be kept small, on the order of 50 samples per partition. We use nearest neighbors for kriging and block-kriging, finding that 50 nearest neighbors are sufficient. In all cases, confidence intervals for fixed effects, and prediction intervals for (block) kriging, have appropriate coverage. Some advantages of spatial indexing are that it is available for any valid covariance matrix, can take advantage of parallel computing, and easily extends to non-Euclidean topologies, such as stream networks. We use a stream network example to show how spatial indexing can achieve all four goals listed above for a very large data set in a matter of minutes, rather than days.
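A minimal sketch of the indexing idea under assumed ingredients (an exponential covariance, hypothetical coordinates, and k-means used only to form spatially compact partitions; the paper's actual estimation and prediction details are not reproduced): sites are grouped into partitions of roughly 50, and the covariance is treated as block-diagonal across partitions, so each likelihood term involves only a small dense block.

```python
# Sketch of spatial indexing via spatially compact partitions (illustration only).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(1000, 2))        # hypothetical site locations

# 1) Index: assign each site to a compact partition of roughly 50 sites.
n_blocks = coords.shape[0] // 50
index = KMeans(n_clusters=n_blocks, n_init=10, random_state=0).fit_predict(coords)

# 2) Block covariance: only within-partition covariances are kept, so the
#    log-likelihood decomposes into cheap, parallelizable per-block terms.
def exp_cov(d, sigma2=1.0, range_=20.0, nugget=0.1):
    return sigma2 * np.exp(-d / range_) + nugget * (d == 0)

block_covs = []
for b in range(n_blocks):
    xy = coords[index == b]
    block_covs.append(exp_cov(cdist(xy, xy)))        # one small dense block per partition

print(f"{n_blocks} blocks, largest block: {max(c.shape[0] for c in block_covs)} sites")
```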
Although the expenses associated with DNA sequencing have been decreasing rapidly, the current cost stands at roughly \$1.3K/TB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by studying the DNA coverage depth problem, which seeks to reduce the number of reads required to retrieve information from the storage system. Under this framework, our main goal is to understand how to optimally pair an error-correcting code with a given retrieval algorithm to minimize the sequencing coverage depth, while guaranteeing retrieval of the information with high probability. Additionally, we study the DNA coverage depth problem in the random-access setting.
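As background for why coding helps (a standard coupon-collector argument, not a result quoted from the abstract): if the information is stored on n distinct strands and reads are drawn uniformly at random, then without any coding the expected number of reads needed to see every strand at least once is

```latex
\[
  \mathbb{E}\!\left[R_{\mathrm{uncoded}}\right]
    \;=\; n \sum_{i=1}^{n} \frac{1}{i}
    \;=\; n H_n
    \;\approx\; n \ln n ,
\]
```

so the required coverage depth grows faster than linearly in n. Pairing the data with an erasure code that allows decoding from any sufficiently large subset of strands is the kind of trade-off the coverage depth problem formalizes.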
Coordinated inauthentic behavior is used as a tool on social media to shape public opinion by elevating or suppressing topics through systematic engagements -- e.g. via *likes* or similar reactions. In an honest world, reactions may be informative to users when choosing what to spend their attention on: through the wisdom of crowds, summed reactions may help identify relevant and high-quality content. This is nullified by coordinated inauthentic liking. To restore wisdom-of-crowds effects, it is therefore desirable to separate the inauthentic agents from the wise crowd and use only the latter as a voting *jury* on the relevance of a post. To this end, we design two *jury selection procedures* (JSPs) that discard agents classified as inauthentic. Using machine learning techniques, both cluster agents on binary vote data -- one using a Gaussian mixture model (GMM JSP), one the k-means algorithm (KM JSP) -- and label agents by logistic regression. We evaluate the jury selection procedures with an agent-based model and show that the GMM JSP detects more inauthentic agents, but both JSPs select juries with vastly improved majority-vote correctness. This proof of concept provides an argument for the release of reaction data from social media platforms through a direct use case in the fight against online misinformation.
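The pipeline described above (cluster agents on binary vote data, then label them with a logistic regression) can be sketched on synthetic data as follows; the scikit-learn components, the synthetic vote matrix, and the small labelled subset are assumptions for illustration, not the authors' implementation:

```python
# Sketch of a jury selection procedure in the spirit of the GMM JSP
# (illustration on synthetic data; not the authors' implementation).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical binary vote matrix: rows = agents, columns = posts (1 = reacted).
honest = rng.binomial(1, rng.uniform(0.2, 0.8, size=50), size=(400, 50))
coordinated = np.tile(rng.binomial(1, 0.5, size=50), (100, 1))   # near-identical voting
votes = np.vstack([honest, coordinated])
y = np.array([0] * 400 + [1] * 100)                              # 1 = inauthentic (known here)

# 1) Cluster agents on their binary vote vectors with a Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(votes)
features = gmm.predict_proba(votes)                              # soft cluster memberships

# 2) Label agents with a logistic regression trained on a small labelled subset.
labelled = np.r_[0:50, 400:450]
clf = LogisticRegression(max_iter=1000).fit(features[labelled], y[labelled])
inauthentic = clf.predict(features) == 1

# 3) The jury is every agent not flagged; only their reactions are aggregated.
jury = np.where(~inauthentic)[0]
print(f"jury size: {jury.size} of {votes.shape[0]} agents")
```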
A contiguous area cartogram is a geographic map in which the area of each region is proportional to numerical data (e.g., population size) while neighboring regions remain connected. In this study, we investigated whether value-to-area legends (square symbols annotated with the numerical values their areas represent) and grid lines help map readers make better area judgments. We conducted an experiment to determine the accuracy, speed, and confidence with which readers infer numerical data values for the mapped regions. We found that, when participants were informed only of the total numerical value represented by the whole cartogram and given no legend, the distribution of estimates for individual regions was centered near the true value but had substantial spread. Legends with grid lines significantly reduced the spread but led to a tendency to underestimate the values. For tasks involving comparisons of differences between regions or between cartograms, legends and grid lines slowed estimation without improving accuracy. However, participants were more likely to complete the tasks when legends and grid lines were present, particularly when the area units represented by these features could be interactively selected. We recommend considering the cartogram's use case and purpose before deciding whether to include grid lines or an interactive legend.
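For readers unfamiliar with value-to-area legends, the scaling behind them is simple proportionality between data values and printed area; a minimal sketch with hypothetical numbers (not taken from the experiment):

```python
# Value-to-area scaling behind a cartogram legend (hypothetical numbers only).
import math

total_value = 66_000_000      # e.g. total population represented by the cartogram
total_area_cm2 = 400.0        # printed area of the whole cartogram

def legend_square_side(legend_value):
    """Side length (cm) of a legend square whose area encodes legend_value."""
    area = total_area_cm2 * legend_value / total_value
    return math.sqrt(area)

for v in (1_000_000, 5_000_000, 10_000_000):
    print(f"{v:>10,d} -> {legend_square_side(v):.2f} cm square")
```

Regions are scaled by the same rule, so a reader can estimate a region's value by comparing its area to the legend square or, with grid lines, by counting grid cells.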
The Check-All-That-Apply (CATA) method was compared to the Adapted-Pivot-Test (APT) method, a recently published method based on paired comparisons between a coded wine and a reference sample, called the pivot, using a fixed list of attributes as in CATA. Both methods were compared using identical wines, very similar questionnaires, correspondence analyses, and Chi-square tests of independence. The list of attributes used for describing the wines was established in a prior analysis by a subset of the panel. The results showed that CATA was more robust and more descriptive than the APT with 50 to 60 panelists. The p-value of the Chi-square test of independence between wines and descriptors dropped below 0.05 at around 50 panelists with the CATA method, whereas it never dropped below 0.8 with the APT. The discussion highlights differences in settings and logistics that render CATA more robust and easier to run. One of the objectives was also to propose an easy setup for university and food industry laboratories. Practical applications: Our results describe a practical way of teaching and performing the CATA method with university students and online tools, as well as in extension courses. It should have applications in consumer studies for the characterization of various food products. Additionally, we provide an improved R script for the correspondence analyses used in descriptive analyses and a Chi-square test to estimate the number of panelists needed for robust results. Finally, we provide a data set that could be useful for teaching sensory science and statistics.
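The paper's analysis script is in R; purely as an illustration of the test being run, here is a Python analogue on a hypothetical wine-by-descriptor citation-count table (the wines, descriptors, and counts are invented):

```python
# Chi-square test of independence on a (hypothetical) wine x descriptor
# citation-count table; the paper provides an R script for the same purpose.
import numpy as np
from scipy.stats import chi2_contingency

descriptors = ["fruity", "floral", "oaky", "acidic"]
counts = np.array([   # rows = wines, columns = descriptor citation counts
    [34, 12,  5, 20],
    [10, 25,  8, 15],
    [ 8,  6, 30, 11],
])

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.4f}")
# A p-value below 0.05 indicates the wines differ in how descriptors are cited,
# which is the criterion used above to judge when the panel size is sufficient.
```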
Wildfire propagation is a highly stochastic process in which small changes in environmental conditions (such as wind speed and direction) can lead to large changes in observed behaviour. A traditional approach to quantifying uncertainty in fire-front progression is to generate probability maps via ensembles of simulations. However, the use of ensembles is typically computationally expensive, which can limit the scope of uncertainty analysis. To address this, we explore the use of a spatio-temporal neural-based modelling approach to directly estimate the likelihood of fire propagation given uncertainty in input parameters. The uncertainty is represented by deliberately perturbing the input weather forecast during model training. The computational load is concentrated in the model training process, which allows larger probability spaces to be explored during deployment. Empirical evaluations indicate that the proposed model produces fire boundaries comparable to those of the traditional SPARK simulation platform, with an overall Jaccard index (similarity score) of 67.4% on a set of 35 simulated fires. Compared to a related neural model (emulator) that was employed to generate probability maps via ensembles of emulated fires, the proposed approach produces competitive Jaccard similarity scores while being approximately an order of magnitude faster.
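The Jaccard index quoted above is the standard intersection-over-union of burnt areas; a minimal sketch on random binary maps (not real fire data):

```python
# Jaccard index (similarity score) between two binary burnt-area maps,
# as used above to compare predicted and simulated fire boundaries.
import numpy as np

def jaccard(pred, ref):
    """pred, ref: boolean arrays marking burnt cells."""
    intersection = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    return intersection / union if union else 1.0

rng = np.random.default_rng(0)
pred = rng.random((256, 256)) > 0.5   # placeholder predicted burnt area
ref = rng.random((256, 256)) > 0.5    # placeholder simulated burnt area
print(f"Jaccard index: {jaccard(pred, ref):.3f}")
```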
Designs for screening experiments usually include factors with two levels only. Adding a few four-level factors allows for the inclusion of multi-level categorical factors or quantitative factors with possible quadratic or third-order effects. Three examples motivated us to generate a large catalog of designs with both two-level and four-level factors. To create the catalog, we considered three methods. In the first method, we select designs using a search table, and in the second method, we use a procedure that selects candidate designs based on the properties of their projections onto fewer factors. The third method serves as a benchmark, in which we use a general orthogonal array enumeration algorithm. We compare the efficiencies of the new methods for generating complete sets of non-isomorphic designs. Finally, we use the most efficient method to generate a catalog of designs with up to three four-level factors and up to 20 two-level factors for run sizes 16, 32, 64, and 128. In some cases, a complete enumeration was infeasible; for these cases, we used a bounded enumeration strategy instead. We demonstrate the usefulness of the catalog by revisiting the motivating examples.
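As background on how four-level factors arise in two-level designs (the standard grouping construction; not necessarily the exact construction used to build the catalog), a pair of two-level columns and their interaction can be replaced by a single four-level factor:

```python
# Constructing a four-level factor from two two-level columns via the standard
# grouping scheme (background illustration; not necessarily the paper's method).
import itertools
import numpy as np

# Full 2^4 design in +/-1 coding: 16 runs, basic columns a, b, c, d.
runs = np.array(list(itertools.product([-1, 1], repeat=4)))
a, b, c, d = runs.T

# Group columns a and b (their interaction a*b is sacrificed) into one
# four-level factor F with levels 0..3.
F = (a > 0).astype(int) * 2 + (b > 0).astype(int)

design = np.column_stack([F, c, d])    # one four-level and two two-level factors
print(design[:6])                      # first six of the 16 runs
```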
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial on sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparing different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
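As a concrete instance of one of the many sparsification approaches surveyed (magnitude pruning for inference; the layer sizes and the 80% sparsity level are arbitrary), a minimal PyTorch sketch:

```python
# Minimal magnitude-pruning sketch in PyTorch (one of many sparsification
# approaches; layer sizes and the sparsity level here are arbitrary).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Remove the 80% of weights with the smallest magnitude in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")          # make the sparsity permanent

total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
print(f"weight sparsity: {zeros / total:.1%}")
```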