Power laws have been found to describe a wide variety of natural (physical, biological, astronomic, meteorological, geological) and man-made (social, financial, computational) phenomena over a wide range of magnitudes, although their underlying mechanisms are not always clear. In statistics, the power-law distribution is often found to fit data exceptionally well when the normal (Gaussian) distribution fails. Nevertheless, predicting power-law phenomena is notoriously difficult because of idiosyncratic properties such as the lack of a well-defined average value and potentially unbounded variance. Taylor's power law (TPL) was first discovered to characterize the spatial and/or temporal distributions of biological populations and has recently been extended to describe the spatiotemporal heterogeneities (distributions) of human microbiomes and other natural and artificial systems, such as fitness distributions in computational (artificial) intelligence. The power law with exponential cutoff (PLEC) is a variant of the power-law function in which an exponential term ultimately tapers off the power-law growth; it can be particularly useful for certain predictive problems such as biodiversity estimation and turning-point prediction for COVID-19 infection/fatality. Here, we propose coupling (integration) of TPL and PLEC to improve the prediction quality for certain power-law phenomena. The coupling takes advantage of variance prediction using TPL and asymptote estimation using PLEC, and delivers a confidence interval for the asymptote. We demonstrate the integrated approach by estimating potential (dark) biodiversity and the turning point of COVID-19 fatality. We expect this integrative approach to have wide applications, given the dual relationship between the power-law and normal statistical distributions.
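For concreteness, the functional forms involved can be written as follows; the notation ($a$, $b$, $c$, $w$, $d$) is chosen here for illustration and is not necessarily the paper's own:

$$V = a M^{b} \quad \text{(TPL: variance } V \text{ versus mean } M\text{)}, \qquad f(x) = c\, x^{w} e^{d x},\ d<0 \quad \text{(PLEC)},$$

$$x^{*} = -\,w/d \quad \text{(turning point of the PLEC curve, obtained from } f'(x^{*}) = 0\text{)}.$$

Under this reading, the TPL-predicted variance supplies the uncertainty that is placed around the PLEC-based asymptote or turning-point estimate, which is how the coupling described above can deliver a confidence interval for the asymptote.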
Biodiversity, the variation within and between species and ecosystems, is essential for human well-being and the equilibrium of the planet. It is critical for the sustainable development of human society and is an important global challenge. Biodiversity research has become increasingly data-intensive and deals with heterogeneous and distributed data made available by global and regional initiatives, such as GBIF, ILTER, LifeWatch, BODC, PANGAEA, and TERN, which apply different data management practices. In particular, a variety of metadata and semantic resources have been produced by these initiatives to describe biodiversity observations, introducing interoperability issues across data management systems. To address these challenges, the InteroperAble Descriptions of Observable Property Terminology WG (I-ADOPT WG) was formed by a group of international terminology providers and data center managers in 2019 with the aim of building a common approach to describe what is observed, measured, calculated, or derived. Based on an extensive analysis of existing semantic representations of variables, the WG has recently published the I-ADOPT framework ontology to facilitate interoperability between existing semantic resources and support the provision of machine-readable variable descriptions whose components are mapped to FAIR vocabulary terms. The I-ADOPT framework ontology defines a set of high-level semantic components that can be used to describe a variety of patterns commonly found in scientific observations. This contribution will focus on how the I-ADOPT framework can be applied to represent variables commonly used in the biodiversity domain.
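As a purely illustrative sketch (not taken from the contribution itself), a biodiversity variable might be decomposed into I-ADOPT-style components roughly as below; the component names follow the published framework ontology, while the example variable and its values are hypothetical.

```python
# Hypothetical decomposition of one biodiversity variable into I-ADOPT-style
# components. The component names follow the published framework ontology;
# the example variable and its values are illustrative only.
variable = {
    "label": "abundance of Calanus finmarchicus in sea water",
    "Property": "abundance",                     # what is measured or derived
    "ObjectOfInterest": "Calanus finmarchicus",  # the entity the property pertains to
    "Matrix": "sea water",                       # the medium the object is observed in
    "ContextObject": None,                       # optional: e.g. a sampling station
    "Constraint": None,                          # optional: e.g. "adult individuals only"
}
print(variable["label"])
```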
The ability to confront new questions, opportunities, and challenges is of fundamental importance to human progress and the resilience of human societies, yet the capacity of science to meet new demands remains poorly understood. Here we deploy a new measurement framework to investigate the scientific response to the COVID-19 pandemic and the adaptability of science as a whole. We find that science rapidly shifted to engage COVID-19 following the advent of the virus, with scientists across all fields making large jumps from their prior research streams. However, this adaptive response reveals a pervasive "pivot penalty", where the impact of the new research declines steeply the further scientists move from their prior work. The pivot penalty is severe amidst COVID-19 research, but it is not unique to COVID-19. Rather, it applies nearly universally across the sciences and has been growing in magnitude over the past five decades. While further features condition pivoting, including a scientist's career stage, prior expertise and impact, collaborative scale, the use of new coauthors, and funding, we find that the pivot penalty persists and remains substantial regardless of these features, suggesting that it acts as a fundamental friction governing science's ability to adapt. The pivot penalty not only holds key implications for the design of the scientific system and the human capacity to confront emergent challenges through scientific advance, but may also be relevant to other social and economic systems, where shifting to meet new demands is central to survival and success.
We present a barrier method for treating frictional contact on interfaces embedded in finite elements. The barrier treatment has several attractive features: (i) it does not introduce any additional degrees of freedom or iterative steps, (ii) it is free of inter-penetration, (iii) it avoids an ill-conditioned matrix system, and (iv) it allows one to control the solution accuracy directly. We derive the contact pressure from a smooth barrier energy function that is designed to satisfy the non-penetration constraint. Likewise, we make use of a smoothed friction law in which the stick-slip transition is described by a continuous function of the slip displacement. We discretize the formulation using the extended finite element method to embed interfaces inside elements, and devise an averaged surface integration scheme that effectively provides stable solutions without traction oscillations. Subsequently, we develop a way to tailor the parameters of the barrier method to embedded interfaces, such that the method can be used without parameter tuning. We verify and investigate the proposed method through numerical examples of varying complexity. The numerical results demonstrate that the proposed method is remarkably robust for challenging frictional contact problems, while requiring a computational cost comparable to that of the penalty method.
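The sketch below shows the kind of smooth barrier energy and derived contact pressure the abstract refers to, using a form popularized in barrier-based contact formulations; the specific function and the parameters kappa and g_hat are illustrative assumptions, not necessarily the authors' exact choice.

```python
import numpy as np

# Sketch of a smooth barrier energy for non-penetration contact. The energy is
# zero for gaps g >= g_hat and grows without bound as the gap g closes, so the
# contact pressure p(g) = -db/dg is smooth, vanishes at g = g_hat, and blows up
# as g -> 0. Form and parameters are illustrative assumptions.

def barrier_energy(g, g_hat=1e-3, kappa=1e4):
    """Barrier energy b(g) = -kappa * (g - g_hat)^2 * ln(g / g_hat) for 0 < g < g_hat."""
    g = np.asarray(g, dtype=float)
    active = (g > 0.0) & (g < g_hat)
    b = np.zeros_like(g)
    b[active] = -kappa * (g[active] - g_hat) ** 2 * np.log(g[active] / g_hat)
    return b

def contact_pressure(g, g_hat=1e-3, kappa=1e4):
    """Contact pressure p(g) = -db/dg, positive inside the barrier zone, zero beyond."""
    g = np.asarray(g, dtype=float)
    active = (g > 0.0) & (g < g_hat)
    p = np.zeros_like(g)
    gi = g[active]
    # d/dg [(g - g_hat)^2 ln(g/g_hat)] = 2(g - g_hat) ln(g/g_hat) + (g - g_hat)^2 / g
    p[active] = kappa * (2.0 * (gi - g_hat) * np.log(gi / g_hat) + (gi - g_hat) ** 2 / gi)
    return p

# Pressure is small near the barrier distance and grows rapidly as the gap closes.
print(contact_pressure([0.5e-3, 0.9e-3, 0.999e-3]))
```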
In this paper, we develop a provably energy-stable and conservative discontinuous spectral element method for the shifted wave equation in second order form. The proposed method combines the advantages and central ideas of several very successful numerical techniques: the summation-by-parts finite difference method, the spectral method and the discontinuous Galerkin method. We prove energy stability and a discrete conservation principle, and derive error estimates in the energy norm, for the (1+1)-dimensional shifted wave equation in second order form. The energy-stability results, the discrete conservation principle, and the error estimates generalise to multiple dimensions using tensor products of quadrilateral and hexahedral elements. Numerical experiments, in (1+1) and (2+1) dimensions, verify the theoretical results and demonstrate optimal convergence of $L^2$ numerical errors in the subsonic, sonic and supersonic regimes.
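For readers unfamiliar with the model problem, a common way to write the (1+1)-dimensional shifted wave equation in second order form, with constant shift $\beta$ and wave speed $c$ (the notation here is generic, not necessarily the paper's), is

$$\left(\partial_t - \beta\,\partial_x\right)^{2} u = c^{2}\,\partial_x^{2} u, \qquad \text{i.e.} \qquad \partial_t^{2} u - 2\beta\,\partial_t\partial_x u + \left(\beta^{2} - c^{2}\right)\partial_x^{2} u = 0,$$

with the subsonic, sonic and supersonic regimes corresponding to $|\beta| < c$, $|\beta| = c$ and $|\beta| > c$, respectively.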
When facing uncertainty, decision-makers want predictions they can trust. A machine learning provider can convey confidence to decision-makers by guaranteeing that their predictions are distribution calibrated -- amongst the inputs that receive a predicted class probability vector $q$, the actual distribution over classes is $q$. For multi-class prediction problems, however, achieving distribution calibration tends to be infeasible, requiring sample complexity exponential in the number of classes $C$. In this work, we introduce a new notion -- \emph{decision calibration} -- that requires the predicted distribution and the true distribution to be ``indistinguishable'' to a set of downstream decision-makers. When all possible decision-makers are under consideration, decision calibration is the same as distribution calibration. However, when we only consider decision-makers choosing between a bounded number of actions (e.g. polynomial in $C$), our main result shows that decision calibration becomes feasible -- we design a recalibration algorithm whose sample complexity is polynomial in the number of actions and the number of classes. We validate our recalibration algorithm empirically: compared to existing methods, decision calibration improves decision-making on skin lesion and ImageNet classification with modern neural network predictors.
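Informally, and in notation chosen here rather than taken from the paper, distribution calibration asks that $\mathbb{E}[Y \mid f(X) = q] = q$ for every predicted probability vector $q$ (with $Y$ one-hot over the $C$ classes), whereas decision calibration only asks that every admissible decision-maker, acting on the predictions via the induced best response $\delta_\ell(q) = \arg\min_{a \in \mathcal{A}} \sum_{c} q_c\, \ell(a, c)$, forecasts their loss correctly:

$$\mathbb{E}\big[\ell\big(\delta_\ell(f(X)),\, Y\big)\big] \;=\; \mathbb{E}\Big[\mathbb{E}_{\hat{Y} \sim f(X)}\,\ell\big(\delta_\ell(f(X)),\, \hat{Y}\big)\Big] \qquad \text{for all } \ell \text{ with } |\mathcal{A}| \le K.$$

Restricting to decision-makers with bounded action sets is what the abstract identifies as the source of tractability.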
The Panopticon (which means "watcher of everything") is a well-known structure of continuous surveillance and discipline proposed by Bentham in 1785. This device was later used by Foucault and other philosophers as a paradigm and metaphor for the study of constitutional power and knowledge, as well as a model of individuals' deprivation of freedom. Nowadays, technological achievements have given rise to new, non-physical (unlike prisons) means of constant surveillance that transcend physical boundaries. This, combined with the admission by some governmental institutions that they collaborate with Internet giants to collect or deduce information about people, creates a worrisome situation of several co-existing Panopticons that can act separately or in close collaboration. Thus, they can only be detected and identified at the expense of (perhaps considerable) effort. In this paper we provide a theoretical framework for studying the detectability status of Panopticons that fall under two theoretical, but not unrealistic, definitions. We show, using Oracle Turing Machines, that detecting modern-day, ICT-based Panopticons is an undecidable problem. Furthermore, we show that for each sufficiently expressive formal system, we can effectively construct a Turing Machine for which it is impossible to prove, within the formal system, either that it is a Panopticon or that it is not a Panopticon.
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks, while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model"), is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground-truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
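A schematic sketch of an observation-only imitation reward in this spirit is given below; the placeholder Gaussian models and the exact form of the reward (including whether a policy-model term is subtracted) are assumptions made for illustration, not the paper's precise objective.

```python
import numpy as np

# Schematic observation-only imitation reward: the agent is rewarded for
# reaching next observations that a learned generative model of the expert's
# transitions considers likely. The diagonal-Gaussian "models" are stand-ins
# for learned density models; the reward form is an illustrative assumption.

def log_density(obs_next, mean, var):
    """Log-density of a diagonal Gaussian placeholder model."""
    return -0.5 * np.sum((obs_next - mean) ** 2 / var + np.log(2 * np.pi * var))

def imitation_reward(obs, obs_next, expert_model, policy_model):
    """Reward = log p_expert(o' | o) - log p_policy(o' | o); no actions needed."""
    return (log_density(obs_next, *expert_model(obs))
            - log_density(obs_next, *policy_model(obs)))

# Toy stand-in models: each maps the current observation to (mean, var) of o'.
expert_model = lambda o: (o + 0.1, np.ones_like(o) * 0.01)  # expert drifts by +0.1
policy_model = lambda o: (o, np.ones_like(o) * 0.05)        # current policy stays put

o = np.zeros(3)
print(imitation_reward(o, o + 0.1, expert_model, policy_model))  # expert-like step: high reward
print(imitation_reward(o, o - 0.3, expert_model, policy_model))  # off-expert step: low reward
```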
Many tasks in natural language processing can be viewed as multi-label classification problems. However, most existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all labels, which completely ignores the complexity of, and dependencies among, different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are applied at prediction time. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
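The sketch below illustrates, with hypothetical values, what per-label training and prediction policies of the kind described above could look like once produced by the meta-learner; the specific weighting and thresholding scheme is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

# Illustrative per-label "policies": a weight per label applied inside the
# (binary) cross-entropy training loss, and a threshold per label applied at
# prediction time instead of a fixed 0.5 cutoff. The values are hypothetical.

def weighted_bce_loss(probs, targets, label_weights):
    """Per-label-weighted binary cross-entropy over a batch (probs in (0, 1))."""
    eps = 1e-12
    ce = -(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))
    return np.mean(ce * label_weights)          # label_weights: shape (num_labels,)

def predict(probs, label_thresholds):
    """Apply a learned threshold per label rather than a global 0.5 cutoff."""
    return (probs >= label_thresholds).astype(int)

probs = np.array([[0.70, 0.40, 0.20],
                  [0.55, 0.90, 0.35]])
targets = np.array([[1, 1, 0],
                    [1, 1, 1]])
label_weights = np.array([1.0, 2.0, 0.5])       # e.g. up-weight a hard label
label_thresholds = np.array([0.5, 0.35, 0.3])   # e.g. lower the cutoff for rare labels

print(weighted_bce_loss(probs, targets, label_weights))
print(predict(probs, label_thresholds))
```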
In this work, we compare three different modeling approaches for the scores of soccer matches with regard to their predictive performance, based on all matches from the four previous FIFA World Cups 2002-2014: Poisson regression models, random forests and ranking methods. While the former two are based on the teams' covariate information, the latter method estimates ability parameters that best reflect the teams' current strength. Within this comparison, the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods as an additional covariate, we can improve the predictive power substantially. Finally, this combination of methods is chosen as the final model and, based on its estimates, the FIFA World Cup 2018 is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain over the defending champion Germany. Additionally, we provide survival probabilities for all teams at all tournament stages as well as the most probable tournament outcome.
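A minimal sketch of the described combination follows, assuming the widely used scikit-learn random forest implementation; the features, data and hyperparameters are placeholders rather than the study's actual covariates or ability estimates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch: a random forest for the number of goals a team scores, with the
# ranking-based ability parameters of the team and its opponent appended to
# the usual covariates. All data and features below are synthetic placeholders.

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(size=n),   # e.g. FIFA rank difference
    rng.normal(size=n),   # e.g. market value, average age, ...
    rng.normal(size=n),   # ability parameter of the team (from the ranking method)
    rng.normal(size=n),   # ability parameter of the opponent
])
goals = rng.poisson(lam=np.exp(0.2 + 0.3 * X[:, 2] - 0.3 * X[:, 3]))

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, goals)
print(model.predict(X[:3]))   # expected goals for the first three (synthetic) matches
```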
Predictive models of student success in Massive Open Online Courses (MOOCs) are a critical component of effective content personalization and adaptive interventions. In this article we review the state of the art in predictive models of student success in MOOCs and present a categorization of MOOC research according to the predictors (features), predictions (outcomes), and underlying theoretical model. We critically survey work across each category, providing data on the raw data source, feature engineering, statistical model, evaluation method, prediction architecture, and other aspects of these experiments. Such a review is particularly useful given the rapid expansion of predictive modeling research in MOOCs since the emergence of major MOOC platforms in 2012. This survey reveals several key methodological gaps, which include extensive filtering of experimental subpopulations, ineffective student model evaluation, and the use of experimental data that would be unavailable for real-world student success prediction and intervention, the ultimate goal of such models. Finally, we highlight opportunities for future research, which include temporal modeling, research bridging predictive and explanatory student models, work which contributes to learning theory, and evaluating long-term learner success in MOOCs.