
This book is written to offer a humble, but unified, treatment of e-values in hypothesis testing. The book is organized into three parts: Fundamental Concepts, Core Ideas, and Advanced Topics. The first part includes three chapters that introduce the basic concepts. The second part includes five chapters on core ideas such as universal inference, log-optimality, e-processes, operations on e-values, and e-values in multiple testing. The third part contains five chapters on advanced topics. We hope that, by bringing these materials together, this book makes the concept of e-values more accessible for educational, research, and practical use.
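For readers encountering the topic for the first time, it may help to recall the basic definition and the elementary test it yields; this is standard material and is summarized here only as orientation.

```latex
% An e-variable E for a null hypothesis H_0 is a nonnegative random variable
% whose expectation is at most one under every distribution in the null:
\[
  E \ge 0, \qquad \sup_{P \in H_0} \mathbb{E}_P[E] \le 1 .
\]
% By Markov's inequality, rejecting H_0 when E \ge 1/\alpha is a level-\alpha test:
\[
  P\bigl(E \ge 1/\alpha\bigr) \le \alpha \, \mathbb{E}_P[E] \le \alpha
  \quad \text{for every } P \in H_0 .
\]
```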

Related content

Beginner's All-purpose Symbolic Instruction Code: originally written as BASIC by its creators, and later widely referred to by Microsoft as Basic.

Basic general properties of Fisher-type information involving higher-order derivatives are considered. These properties are used to explore various features of probability densities and to derive Stam-type inequalities.
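For orientation, the classical first-order objects behind this line of work are recalled below; the higher-order expression shown is one natural analogue and is not claimed to be the paper's exact definition.

```latex
% Classical Fisher information of a density f on the real line, and Stam's
% inequality linking it to the entropy power N(X) = (2\pi e)^{-1} e^{2 h(X)}:
\[
  I(X) = \int_{\mathbb{R}} \frac{\bigl(f'(x)\bigr)^2}{f(x)}\, dx ,
  \qquad
  N(X)\, I(X) \ge 1 .
\]
% One natural higher-order analogue (an assumption for illustration, not
% necessarily the definition used in the paper) replaces f' by f^{(k)}:
\[
  I_k(X) = \int_{\mathbb{R}} \frac{\bigl(f^{(k)}(x)\bigr)^2}{f(x)}\, dx .
\]
```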

Subclasses of TFNP (total functional NP) are usually defined by specifying a complete problem, which is necessarily in TFNP, and including all problems many-one reducible to it. We study two notions of how a TFNP problem can be reducible to an object, such as a complexity class, outside TFNP. This gives rise to subclasses of TFNP which capture some properties of that outside object. We show that well-known subclasses can arise in this way, for example PPA from reducibility to parity P and PLS from reducibility to P^NP. We study subclasses arising from PSPACE and the polynomial hierarchy, and show that they are characterized by the propositional proof systems Frege and constant-depth Frege, extending the known pairings between natural TFNP subclasses and proof systems. We study approximate counting from this point of view, and look for a subclass of TFNP that gives a natural home to combinatorial principles such as Ramsey which can be proved using approximate counting. We relate this to the recently-studied Long choice and Short choice problems.

Background: The standard regulatory approach to assess replication success is the two-trials rule, which requires both the original and the replication study to be significant with effect estimates in the same direction. The sceptical p-value was recently presented as an alternative method for the statistical assessment of the replicability of study results. Methods: We compare the statistical properties of the sceptical p-value and the two-trials rule. We illustrate the performance of the different methods using real-world evidence emulations of randomized controlled trials (RCTs) conducted within the RCT DUPLICATE initiative. Results: The sceptical p-value depends not only on the two p-values, but also on the sample sizes and effect sizes of the two studies. It can be calibrated to have the same Type-I error rate as the two-trials rule, but it has greater power to detect an existing effect. In the application to the results from the RCT DUPLICATE initiative, the sceptical p-value leads to qualitatively similar conclusions to the two-trials rule, but tends to show more evidence for treatment effects. Conclusion: The sceptical p-value is a valid statistical measure for assessing the replicability of study results and is especially useful in the context of real-world evidence emulations.
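As a concrete reference point, a minimal sketch of the two-trials rule described above follows; the threshold, inputs, and example numbers are illustrative assumptions, and the sceptical p-value itself (which additionally needs effect estimates and standard errors) is not implemented here.

```python
# Minimal sketch of the two-trials rule, assuming two-sided p-values and signed
# effect estimates are available and using the conventional alpha = 0.05.
import numpy as np

def two_trials_rule(p_orig, est_orig, p_rep, est_rep, alpha=0.05):
    """Replication 'succeeds' if both studies are significant at level alpha
    and their effect estimates point in the same direction."""
    same_direction = np.sign(est_orig) == np.sign(est_rep)
    return (p_orig < alpha) and (p_rep < alpha) and bool(same_direction)

# Hypothetical numbers, not taken from the RCT DUPLICATE initiative:
print(two_trials_rule(p_orig=0.01, est_orig=0.32, p_rep=0.04, est_rep=0.21))  # True
```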

The goal of uplift modeling is to recommend actions that optimize specific outcomes by determining which entities should receive treatment. One common approach involves two steps: first, an inference step that estimates conditional average treatment effects (CATEs), and second, an optimization step that ranks entities based on their CATE values and assigns treatment to the top k within a given budget. While uplift modeling typically focuses on binary treatments, many real-world applications are characterized by continuous-valued treatments, i.e., a treatment dose. This paper presents a predict-then-optimize framework to allow for continuous treatments in uplift modeling. First, in the inference step, conditional average dose responses (CADRs) are estimated from data using causal machine learning techniques. Second, in the optimization step, we frame the assignment task of continuous treatments as a dose-allocation problem and solve it using integer linear programming (ILP). This approach allows decision-makers to efficiently and effectively allocate treatment doses while balancing resource availability, with the possibility of adding extra constraints like fairness considerations or adapting the objective function to take into account instance-dependent costs and benefits to maximize utility. The experiments compare several CADR estimators and illustrate the trade-offs between policy value and fairness, as well as the impact of an adapted objective function. This showcases the framework's advantages and flexibility across diverse applications in healthcare, lending, and human resource management. All code is available on github.com/SimonDeVos/UMCT.
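The optimization step lends itself to a short sketch. The following is a hedged illustration of the dose-allocation ILP described above, not the authors' implementation (which is at the linked repository): `cadr`, `cost`, and `budget` are made-up inputs, the inference step that would produce the CADR estimates is omitted, and a recent SciPy with `scipy.optimize.milp` is assumed.

```python
# Hedged sketch of the dose-allocation step: pick at most one candidate dose per
# entity to maximize the estimated response subject to a budget constraint.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
n_entities, n_doses = 50, 4
cadr = rng.normal(size=(n_entities, n_doses))      # estimated dose responses (toy values)
cost = rng.uniform(1.0, 3.0, size=(n_entities, n_doses))
budget = 60.0

c = -cadr.ravel()                                  # milp minimizes, so negate the objective

# Each entity receives at most one dose.
one_dose = np.zeros((n_entities, n_entities * n_doses))
for i in range(n_entities):
    one_dose[i, i * n_doses:(i + 1) * n_doses] = 1.0

constraints = [
    LinearConstraint(one_dose, ub=np.ones(n_entities)),    # at most one dose per entity
    LinearConstraint(cost.ravel()[None, :], ub=[budget]),  # total cost within budget
]

res = milp(c=c, constraints=constraints,
           integrality=np.ones(c.size),            # binary decision variables
           bounds=Bounds(0, 1))

assignment = res.x.reshape(n_entities, n_doses).round().astype(int)
print("policy value:", float((cadr * assignment).sum()))
```

Fairness constraints or instance-dependent costs and benefits would enter as additional rows in `constraints` or as modifications of the objective vector `c`.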

Two sequential estimators are proposed for the odds p/(1-p) and the log odds log(p/(1-p)), respectively, using independent Bernoulli random variables with parameter p as inputs. The estimators are unbiased and guarantee that the variance of the estimation error divided by the true value of the odds, or the variance of the estimation error of the log odds, is less than a target value for any p in (0,1). The estimators are close to optimal in the sense of Wolfowitz's bound.
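For intuition about why sequential (rather than fixed-sample) sampling helps with unbiased odds estimation, here is a classical inverse-sampling scheme; it is a simplified illustration and is not claimed to be the estimator proposed in the paper or to satisfy its variance guarantee.

```python
# Classical inverse (negative-binomial) sampling: draw Bernoulli(p) observations
# until r failures have occurred. The number of successes S then has expectation
# r * p / (1 - p), so S / r is an unbiased estimator of the odds.
import numpy as np

def estimate_odds(p, r, rng):
    successes, failures = 0, 0
    while failures < r:              # sequential: the sample size is random
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes / r             # unbiased for p / (1 - p)

rng = np.random.default_rng(1)
p, r = 0.3, 200
estimates = [estimate_odds(p, r, rng) for _ in range(2000)]
print(np.mean(estimates), p / (1 - p))   # empirical mean is close to the true odds
```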

Since its introduction to the public, ChatGPT has had an unprecedented impact. While some experts praised AI advancements and highlighted their potential risks, others have been critical about the accuracy and usefulness of Large Language Models (LLMs). In this paper, we are interested in the ability of LLMs to identify causal relationships. We focus on the well-established GPT-4 (Turbo) and evaluate its performance under the most restrictive conditions, by isolating its ability to infer causal relationships based solely on the variable labels without being given any other context by humans, demonstrating the minimum level of effectiveness one can expect when it is provided with label-only information. We show that questionnaire participants judge the GPT-4 graphs as the most accurate in the evaluated categories, closely followed by knowledge graphs constructed by domain experts, with causal Machine Learning (ML) far behind. We use these results to highlight the important limitation of causal ML, which often produces causal graphs that violate common sense, affecting trust in them. However, we show that pairing GPT-4 with causal ML overcomes this limitation, resulting in graphical structures learnt from real data that align more closely with those identified by domain experts, compared to structures learnt by causal ML alone. Overall, our findings suggest that despite GPT-4 not being explicitly designed to reason causally, it can still be a valuable tool for causal representation, as it improves the causal discovery process of causal ML algorithms that are designed to do just that.

In a Monte-Carlo test, the observed dataset is fixed, and several resampled or permuted versions of the dataset are generated in order to test a null hypothesis that the original dataset is exchangeable with the resampled/permuted ones. Sequential Monte-Carlo tests aim to save computational resources by generating these additional datasets sequentially one by one, and potentially stopping early. While earlier tests yield valid inference at a particular prespecified stopping rule, our work develops a new anytime-valid Monte-Carlo test that can be continuously monitored, yielding a p-value or e-value at any stopping time possibly not specified in advance. It generalizes the well-known method by Besag and Clifford, allowing it to stop at any time, but also encompasses new sequential Monte-Carlo tests that tend to stop sooner under the null and alternative without compromising power. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that smartly bet on whether a generated test statistic is greater or smaller than the observed one. The betting strategies are guided by the derivation of a simple log-optimal betting strategy, have closed form expressions for the wealth process, provable guarantees on resampling risk, and display excellent power in practice.
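For context, a minimal sketch of the Besag and Clifford procedure that the paper generalizes is given below; the anytime-valid betting construction itself is not reproduced. `resample_statistic` is a hypothetical callable returning the test statistic of one resampled or permuted dataset, and the toy usage simulates the null with standard normal draws.

```python
# Sketch of the classical Besag-Clifford sequential Monte-Carlo p-value: stop as
# soon as h resampled statistics reach the observed one, or after max_sims draws.
import numpy as np

def besag_clifford_pvalue(t_obs, resample_statistic, h=10, max_sims=999):
    exceed = 0
    for l in range(1, max_sims + 1):
        if resample_statistic() >= t_obs:
            exceed += 1
            if exceed == h:
                return h / l                 # stopped early after l resamples
    return (exceed + 1) / (max_sims + 1)     # ran to the simulation budget

rng = np.random.default_rng(2)
print(besag_clifford_pvalue(t_obs=2.0, resample_statistic=lambda: rng.normal()))
```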

Ideally, all analyses of normally distributed data should include the full covariance information between all data points. In practice, the full covariance matrix is not always available, either because a result was published without one or because one tries to combine multiple results from separate publications. For simple hypothesis tests, it is possible to define robust test statistics that behave conservatively in the presence of unknown correlations. For model parameter fits, one can inflate the variance by a factor to ensure that the fit remains conservative at least up to a chosen confidence level. This paper describes a class of robust test statistics for simple hypothesis tests, as well as an algorithm to determine the necessary inflation factor for model parameter fits, goodness-of-fit tests, and composite hypothesis tests. It then presents example applications of the methods to real neutrino interaction data and model comparisons.
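To make the notion of "conservative under unknown correlations" concrete, the following small illustration bounds the variance of a weighted average by its worst case over all correlation structures; it is a generic device for intuition only, not the class of robust statistics or the inflation-factor algorithm described in the paper.

```python
# Worst-case variance bound: for estimates x_i with standard errors s_i and
# nonnegative weights w_i, Var(sum_i w_i x_i) <= (sum_i w_i s_i)^2 for any
# correlation matrix, with equality under perfect positive correlation.
import numpy as np

def conservative_z(x, s, mu0=0.0):
    """Z-statistic for a common mean using the worst-case standard deviation."""
    x, s = np.asarray(x, float), np.asarray(s, float)
    w = np.ones_like(x) / x.size
    worst_case_sd = np.sum(w * s)          # >= the true sd whatever the correlations
    return (np.sum(w * x) - mu0) / worst_case_sd

print(conservative_z([1.2, 0.8, 1.0], [0.4, 0.5, 0.3]))
```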

Delay Tolerant Networking (DTN) aims to address a myriad of significant networking challenges that appear in time-varying settings, such as mobile and satellite networks, wherein changes in network topology are frequent and often subject to environmental constraints. Within this paradigm, routing problems are often solved by extending classical graph-theoretic path finding algorithms, such as the Bellman-Ford or Floyd-Warshall algorithms, to the time-varying setting; such extensions are simple to understand, but they have strict optimality criteria and can exhibit non-polynomial scaling. Acknowledging this, we study time-varying shortest path problems on metric graphs whose vertices are traced by semi-algebraic curves. As an exemplary application, we establish a polynomial upper bound on the number of topological critical events encountered by a set of $n$ satellites moving along elliptic curves in low Earth orbit (per orbital period). Experimental evaluations on networks derived from STARLINK satellite TLEs demonstrate that not only does this geometric framework allow for routing schemes between satellites requiring recomputation an order of magnitude less often than graph-based methods, but it also demonstrates that metric spanner properties exist in metric graphs derived from real-world data, opening the door for broader applications of geometric DTN routing.
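As a point of reference for the graph-based baselines mentioned above, here is a hedged sketch of an earliest-arrival (Dijkstra-style) search over a contact plan; the contact tuples and delays are illustrative and are not derived from STARLINK TLEs, and this is not the paper's geometric routing scheme.

```python
# Each contact is (u, v, t_start, t_end, delay): a link from u to v that is open
# during [t_start, t_end] and delivers after `delay` time units.
import heapq
from collections import defaultdict

def earliest_arrival(contacts, source, target, t0=0.0):
    by_node = defaultdict(list)
    for u, v, ts, te, d in contacts:
        by_node[u].append((v, ts, te, d))

    best = defaultdict(lambda: float("inf"))
    best[source] = t0
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best[u]:
            continue
        for v, ts, te, d in by_node[u]:
            depart = max(t, ts)                  # wait for the contact to open
            if depart <= te and depart + d < best[v]:
                best[v] = depart + d
                heapq.heappush(heap, (best[v], v))
    return float("inf")

contacts = [("A", "B", 0, 10, 2), ("B", "C", 5, 12, 3), ("A", "C", 20, 30, 1)]
print(earliest_arrival(contacts, "A", "C"))      # 8, via B, rather than 21 via the direct contact
```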

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
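The implicit-regularization claim in the linear regime can be illustrated numerically: gradient descent on an overparametrized least-squares problem, started from zero, converges to the minimum-norm interpolating solution. The dimensions and data below are arbitrary toy choices.

```python
# Gradient descent from zero on overparametrized least squares converges to the
# minimum-norm interpolator (the pseudo-inverse solution).
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                   # more parameters than data points
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, ord=2) ** 2         # step size below 2 / lambda_max
for _ in range(5000):                            # plain gradient descent on squared loss
    w -= lr * X.T @ (X @ w - y)

w_min_norm = np.linalg.pinv(X) @ y               # minimum-norm interpolating solution
print("training residual:", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
```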
