A key challenge when trying to understand innovation is that it is a dynamic, ongoing process, which can be highly contingent on ephemeral factors such as culture, economics, or luck. This means that any analysis of the real-world process must necessarily be historical - and thus probably too late to be most useful - but also cannot be sure what the properties of the web of connections between innovations is or was. Here I try to address this by designing and generating a set of synthetic innovation web "dictionaries" that can be used to host sampled innovation timelines, probe the overall statistics and behaviours of these processes, and determine the degree of their reliance on the structure or generating algorithm. Thus, inspired by the work of Fink, Reeves, Palma and Farr (2017) on innovation in language, gastronomy, and technology, I study how new symbol discovery manifests itself in terms of additional "word" vocabulary being available from dictionaries generated from a finite number of symbols. Several distinct dictionary generation models are investigated using numerical simulation, with emphasis on the scaling of knowledge as dictionary generators and parameters are varied, and the role of which order the symbols are discovered in.
Prediction models are used amongst others to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: e.g., an individual may be estimated to be at low risk because similar individuals in the past received an intervention which lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous works on prediction under interventions assumed that the prediction model was used only at one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred and re-evaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.
Threshold selection is a fundamental problem in any threshold-based extreme value analysis. While models are asymptotically motivated, selecting an appropriate threshold for finite samples can be difficult through standard methods. Inference can also be highly sensitive to the choice of threshold. Too low a threshold choice leads to bias in the fit of the extreme value model, while too high a choice leads to unnecessary additional uncertainty in the estimation of model parameters. In this paper, we develop a novel methodology for automated threshold selection that directly tackles this bias-variance trade-off. We also develop a method to account for the uncertainty in this threshold choice and propagate this uncertainty through to high quantile inference. Through a simulation study, we demonstrate the effectiveness of our method for threshold selection and subsequent extreme quantile estimation. We apply our method to the well-known, troublesome example of the River Nidd dataset.
Linear regression and classification methods with repeated functional data are considered. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions. Two regression methods based on fusion penalties are presented. The first one is a generalization of the variable fusion methodology based on the 1-nearest neighbor. The second one, called group fusion lasso, assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. A finite sample numerical simulation and an application on EEG data are presented.
Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. Unlike approximate Bayesian computation, SNPE techniques learn the posterior from sequential simulation using neural network-based conditional density estimators by minimizing a specific loss function. The SNPE method proposed by Lueckmann et al. (2017) used a calibration kernel to boost the sample weights around the observed data, resulting in a concentrated loss function. However, the use of calibration kernels may increase the variances of both the empirical loss and its gradient, making the training inefficient. To improve the stability of SNPE, this paper proposes to use an adaptive calibration kernel and several variance reduction techniques. The proposed method greatly speeds up the process of training, and provides a better approximation of the posterior than the original SNPE method and some existing competitors as confirmed by numerical experiments.
We construct an efficient class of increasingly high-order (up to 17th-order) essentially non-oscillatory schemes with multi-resolution (ENO-MR) for solving hyperbolic conservation laws. The candidate stencils for constructing ENO-MR schemes range from first-order one-point stencil increasingly up to the designed very high-order stencil. The proposed ENO-MR schemes adopt a very simple and efficient strategy that only requires the computation of the highest-order derivatives of a part of candidate stencils. Besides simplicity and high efficiency, ENO-MR schemes are completely parameter-free and essentially scale-invariant. Theoretical analysis and numerical computations show that ENO-MR schemes achieve designed high-order convergence in smooth regions which may contain high-order critical points (local extrema) and retain ENO property for strong shocks. In addition, ENO-MR schemes could capture complex flow structures very well.
Adjustment of statistical significance levels for repeated analysis in group sequential trials has been understood for some time. Similarly, methods for adjustment accounting for testing multiple hypotheses are common. There is limited research on simultaneously adjusting for both multiple hypothesis testing and multiple analyses of one or more hypotheses. We address this gap by proposing adjusted-sequential p-values that reject an elementary hypothesis when its adjusted-sequential p-values are less than or equal to the family-wise Type I error rate (FWER) in a group sequential design. We also propose sequential p-values for intersection hypotheses as a tool to compute adjusted sequential p-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests, comparing adjusted sequential p-values to a desired FWER for inference on each elementary hypothesis tested.
The responsibility of a method/function is to perform some desired computations and disseminate the results to its caller through various deliverables, including object fields and variables in output instructions. Based on this definition of responsibility, this paper offers a new algorithm to refactor long methods to those with a single responsibility. We propose a backward slicing algorithm to decompose a long method into slightly overlapping slices. The slices are computed for each output instruction, representing the outcome of a responsibility delegated to the method. The slices will be non-overlapping if the slicing criteria address the same output variable. The slices are further extracted as independent methods, invoked by the original method if certain behavioral preservations are made. The proposed method has been evaluated on the GEMS extract method refactoring benchmark and three real-world projects. On average, our experiments demonstrate at least a 29.6% improvement in precision and a 12.1% improvement in the recall of uncovering refactoring opportunities compared to the state-of-the-art approaches. Furthermore, our tool improves method-level cohesion metrics by an average of 20% after refactoring. Experimental results confirm the applicability of the proposed approach in extracting methods with a single responsibility.
We consider the problem of sequential change detection, where the goal is to design a scheme for detecting any changes in a parameter or functional $\theta$ of the data stream distribution that has small detection delay, but guarantees control on the frequency of false alarms in the absence of changes. In this paper, we describe a simple reduction from sequential change detection to sequential estimation using confidence sequences: we begin a new $(1-\alpha)$-confidence sequence at each time step, and proclaim a change when the intersection of all active confidence sequences becomes empty. We prove that the average run length is at least $1/\alpha$, resulting in a change detection scheme with minimal structural assumptions~(thus allowing for possibly dependent observations, and nonparametric distribution classes), but strong guarantees. Our approach bears an interesting parallel with the reduction from change detection to sequential testing of Lorden (1971) and the e-detector of Shin et al. (2022).
When modeling scientific and industrial problems, geometries are typically modeled by explicit boundary representations obtained from computer-aided design software. Unfitted (also known as embedded or immersed) finite element methods offer a significant advantage in dealing with complex geometries, eliminating the need for generating unstructured body-fitted meshes. However, current unfitted finite elements on nonlinear geometries are restricted to implicit (possibly high-order) level set geometries. In this work, we introduce a novel automatic computational pipeline to approximate solutions of partial differential equations on domains defined by explicit nonlinear boundary representations. For the geometrical discretization, we propose a novel algorithm to generate quadratures for the bulk and surface integration on nonlinear polytopes required to compute all the terms in unfitted finite element methods. The algorithm relies on a nonlinear triangulation of the boundary, a kd-tree refinement of the surface cells that simplify the nonlinear intersections of surface and background cells to simple cases that are diffeomorphically equivalent to linear intersections, robust polynomial root-finding algorithms and surface parameterization techniques. We prove the correctness of the proposed algorithm. We have successfully applied this algorithm to simulate partial differential equations with unfitted finite elements on nonlinear domains described by computer-aided design models, demonstrating the robustness of the geometric algorithm and showing high-order accuracy of the overall method.
We propose and compare methods for the analysis of extreme events in complex systems governed by PDEs that involve random parameters, in situations where we are interested in quantifying the probability that a scalar function of the system's solution is above a threshold. If the threshold is large, this probability is small and its accurate estimation is challenging. To tackle this difficulty, we blend theoretical results from large deviation theory (LDT) with numerical tools from PDE-constrained optimization. Our methods first compute parameters that minimize the LDT-rate function over the set of parameters leading to extreme events, using adjoint methods to compute the gradient of this rate function. The minimizers give information about the mechanism of the extreme events as well as estimates of their probability. We then propose a series of methods to refine these estimates, either via importance sampling or geometric approximation of the extreme event sets. Results are formulated for general parameter distributions and detailed expressions are provided when Gaussian distributions. We give theoretical and numerical arguments showing that the performance of our methods is insensitive to the extremeness of the events we are interested in. We illustrate the application of our approach to quantify the probability of extreme tsunami events on shore. Tsunamis are typically caused by a sudden, unpredictable change of the ocean floor elevation during an earthquake. We model this change as a random process, which takes into account the underlying physics. We use the one-dimensional shallow water equation to model tsunamis numerically. In the context of this example, we present a comparison of our methods for extreme event probability estimation, and find which type of ocean floor elevation change leads to the largest tsunamis on shore.