
We consider a Bayesian estimator of sample size (BESS) and an application to oncology dose-optimization clinical trials. BESS is built upon three pillars: Sample size, Evidence from observed data, and Confidence in posterior inference. It follows a simple logic: given the evidence from the data, a specific sample size can achieve a degree of confidence in the posterior inference. The key distinction between BESS and standard sample size estimation (SSE) is that SSE, typically based on Frequentist inference, specifies the true parameter values in its calculation, while BESS assumes possible outcomes from the data to be observed. As a result, the sample size is calibrated not on type I or type II error rates but on posterior probabilities. We demonstrate that BESS leads to a more interpretable statement for investigators and easily accommodates prior information as well as sample size re-estimation. We compare its performance with standard SSE and demonstrate its usage through a case study of an oncology dose-optimization trial. BESS can be applied to general hypothesis tests. An R tool is available at //ccte.uchicago.edu/BESS.
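
As a rough illustration of the BESS-style logic (not the paper's own procedure), consider a single-arm binary endpoint with a Beta-Binomial model: for each candidate sample size, assume the data show a given response rate and check whether the posterior probability of exceeding a null rate reaches a desired confidence. The prior, null rate, assumed response rate, and confidence level below are illustrative choices, not values from the paper.

```python
# Minimal sketch of BESS-style reasoning for a single-arm binary endpoint.
# Assumptions (illustrative, not from the paper): Beta(a, b) prior, an assumed
# observed response rate, a null rate p0, and a target posterior confidence.
from scipy.stats import beta

def bess_sample_size(p_obs, p0, confidence, a=1.0, b=1.0, n_max=500):
    """Smallest n such that, if the data show round(n * p_obs) responses,
    the posterior Pr(p > p0 | data) reaches the desired confidence."""
    for n in range(1, n_max + 1):
        x = round(n * p_obs)                       # assumed evidence at size n
        post_prob = beta.sf(p0, a + x, b + n - x)  # Pr(p > p0 | x responses in n)
        if post_prob >= confidence:
            return n, post_prob
    return None

if __name__ == "__main__":
    n, prob = bess_sample_size(p_obs=0.4, p0=0.2, confidence=0.95)
    print(f"n = {n}, posterior Pr(p > 0.2) = {prob:.3f}")
```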

Related Content

We study the probabilistic modeling performed by Autoregressive Large Language Models (LLMs) through the angle of time directionality, addressing a question first raised in (Shannon, 1951). For large enough models, we empirically find a time asymmetry in their ability to learn natural language: a difference in the average log-perplexity when trying to predict the next token versus when trying to predict the previous one. This difference is at the same time subtle and very consistent across various modalities (language, model size, training time, ...). Theoretically, this is surprising: from an information-theoretic point of view, there should be no such difference. We provide a theoretical framework to explain how such an asymmetry can appear from sparsity and computational complexity considerations, and outline a number of perspectives opened by our results.
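
A toy version of the forward-versus-backward comparison can be sketched by training the same small model once on a token stream and once on its reversal, then comparing held-out average log-perplexity. The character-level GRU and tiny corpus below are stand-ins for an LLM and its training data; at this scale the measurement only illustrates the evaluation protocol and should not be expected to reproduce the reported asymmetry.

```python
# Toy forward-vs-backward comparison in the spirit of the paper's protocol.
# Assumptions (all illustrative): a character-level GRU stands in for an LLM,
# and a short repeated sentence stands in for a training corpus.
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog. " * 200
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

def train_and_eval(seq, steps=300, bptt=64):
    """Train on the first 90% of `seq`, return held-out cross-entropy (nats/char)."""
    split = int(0.9 * len(seq))
    train, test = seq[:split], seq[split:]
    model = CharLM(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        i = torch.randint(0, len(train) - bptt - 1, (1,)).item()
        x = train[i:i + bptt].unsqueeze(0)
        y = train[i + 1:i + bptt + 1].unsqueeze(0)
        loss = loss_fn(model(x).transpose(1, 2), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        x, y = test[:-1].unsqueeze(0), test[1:].unsqueeze(0)
        return loss_fn(model(x).transpose(1, 2), y).item()

fwd = train_and_eval(data)                        # next-token prediction
bwd = train_and_eval(torch.flip(data, dims=[0]))  # previous-token prediction
print(f"forward log-perplexity: {fwd:.3f}  backward: {bwd:.3f}")
```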

Why does a phenomenon occur? Addressing this question is central to most scientific inquiries and often relies on simulations of scientific models. As models become more intricate, deciphering the causes behind phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal Representation Learning (CRL) offers a promising avenue to uncover interpretable causal patterns within these simulations through an interventional lens. However, developing general CRL frameworks suitable for practical applications remains an open challenge. We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors that explain a specific target phenomenon. We propose an information theoretic objective to learn TCR from interventional data of simulations, establish identifiability for continuous variables under shift interventions and present a practical algorithm for learning TCRs. Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems, illustrating its potential to assist scientists in the study of complex phenomena in a broad range of disciplines.
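
The following toy sketch caricatures the data regime TCR targets: a linear micro-level model receives shift interventions, and a single scalar "macro cause" of a scalar target is recovered by least squares on intervention-level effects. It does not implement the paper's information-theoretic objective; it only illustrates what a reduction from many intervenable variables to one causal factor looks like.

```python
# Toy caricature of targeted causal reduction on a linear micro-model.
# Assumptions (illustrative only): micro variables x in R^d receive random
# shift interventions; the target y depends linearly on x. We recover a
# one-dimensional "macro cause" v^T x whose shifts explain the shifts in y.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 50, 200                 # micro dimension, interventions, samples each
w = rng.normal(size=d)                # true micro-level mechanism for the target

base_x = rng.normal(size=(n, d))
base_y = base_x @ w + 0.1 * rng.normal(size=n)
dx_mean, dy_mean = [], []
for s in rng.normal(size=(k, d)):     # one shift vector per intervention
    x = rng.normal(size=(n, d)) + s   # shift intervention on the micro variables
    y = x @ w + 0.1 * rng.normal(size=n)
    dx_mean.append(x.mean(0) - base_x.mean(0))
    dy_mean.append(y.mean() - base_y.mean())
dx_mean, dy_mean = np.array(dx_mean), np.array(dy_mean)

# Least-squares reduction: direction v such that v^T(shift in x) predicts the shift in y.
v, *_ = np.linalg.lstsq(dx_mean, dy_mean, rcond=None)
cos = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
print(f"alignment of learned macro cause with true mechanism: {cos:.3f}")
```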

We analyse the individual productivity effects of Italy's ban on ChatGPT, a generative pretrained transformer chatbot. We compile data on the daily coding output quantity and quality of over 36,000 GitHub users in Italy and other European countries and combine these data with the sudden announcement of the ban in a difference-in-differences framework. Among the affected users in Italy, we find a short-term increase in output quantity and quality for less experienced users and a decrease in productivity on more routine tasks for experienced users.
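
A minimal difference-in-differences specification of this kind can be sketched on synthetic data as follows; the column names, effect size, and clustering choice are illustrative and not taken from the study.

```python
# Minimal difference-in-differences sketch on synthetic data. All numbers,
# column names, and the true effect size here are made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_users, n_days = 400, 60
users = np.repeat(np.arange(n_users), n_days)
day = np.tile(np.arange(n_days), n_users)
treated = (users < n_users // 2).astype(int)   # e.g. users located in Italy
post = (day >= 30).astype(int)                 # after the ban announcement
effect = 0.3                                   # hypothetical treatment effect
output = (1.0 + 0.2 * treated + 0.1 * post
          + effect * treated * post + rng.normal(0, 1, n_users * n_days))

df = pd.DataFrame({"user": users, "output": output,
                   "treated": treated, "post": post})
model = smf.ols("output ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["user"]})   # cluster by user
print(model.params["treated:post"], model.bse["treated:post"])
```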

As is the case for many curved exponential families, the computation of maximum likelihood estimates in a multivariate normal model with a Kronecker covariance structure is typically carried out with an iterative algorithm, specifically, a block-coordinate ascent algorithm. In this article we highlight a setting, specified by a coprime relationship between the sample size and dimension of the Kronecker factors, where the likelihood equations have algebraic degree one and an explicit, easy-to-evaluate rational formula for the maximum likelihood estimator can be found. A partial converse of this result is provided that shows that outside of the aforementioned special setting and for large sample sizes, examples of data sets can be constructed for which the degree of the likelihood equations is larger than one.
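
For reference, the usual block-coordinate ascent ("flip-flop") iteration for the Kronecker-structured MLE looks as follows; the explicit rational formula available in the coprime setting discussed above is not reproduced in this sketch.

```python
# Block-coordinate ascent ("flip-flop") for the Kronecker-structured MLE.
import numpy as np

def kronecker_mle(X, iters=50):
    """X: array of shape (n, p, q); returns row covariance U (p x p) and
    column covariance V (q x q) for the model vec(X_i) ~ N(0, V kron U)."""
    n, p, q = X.shape
    U, V = np.eye(p), np.eye(q)
    for _ in range(iters):
        Vinv = np.linalg.inv(V)
        U = sum(Xi @ Vinv @ Xi.T for Xi in X) / (n * q)
        Uinv = np.linalg.inv(U)
        V = sum(Xi.T @ Uinv @ Xi for Xi in X) / (n * p)
    return U, V

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 4, 3))        # toy data; not from the paper
U_hat, V_hat = kronecker_mle(X)        # note: U, V identified only up to a scalar
```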

The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills this gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforward path to deriving statistical guarantees than the seminal Distributionally Robust Optimization (DRO) framework, resulting in a richer set of results. In particular, we establish two-sided confidence intervals for the optimal loss without the need to solve a minimax optimization problem explicitly. We further provide finite-sample generalization error bounds for the RS optimizer. Importantly, our results extend to scenarios involving distribution shifts, where discrepancies exist between the sampling and target distributions. Our numerical experiments show that the RS model consistently outperforms baseline empirical risk minimization in small-sample regimes and under distribution shifts. Furthermore, compared to the DRO model, the RS model exhibits lower sensitivity to hyperparameter tuning, highlighting its practicality for robustness considerations.
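
As a sketch of what an RS problem can look like in practice, consider linear regression with absolute loss under a type-1 Wasserstein deviation measure. For losses that are Lipschitz in the data point, the RS problem reduces to minimizing the loss's Lipschitz constant subject to the empirical loss meeting the target; this reformulation is standard in the RS literature and is assumed here, so it may differ from the paper's exact setting.

```python
# Robust satisficing sketch: linear regression with absolute loss under a
# type-1 Wasserstein deviation measure. The loss |y - z^T beta| is Lipschitz
# in (z, y) with constant sqrt(||beta||^2 + 1), so the RS problem reduces to
# minimizing that constant ("fragility") subject to the empirical loss meeting
# the target tau. Assumed standard reformulation; data and tau are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 100, 3
Z = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = Z @ beta_true + 0.3 * rng.normal(size=n)

def emp_loss(beta):                    # empirical absolute loss
    return np.mean(np.abs(y - Z @ beta))

beta_ls = np.linalg.lstsq(Z, y, rcond=None)[0]
tau = 1.1 * emp_loss(beta_ls)          # target: 110% of a baseline fit's loss

res = minimize(
    fun=lambda b: np.sqrt(np.sum(b**2) + 1.0),              # fragility k
    x0=beta_ls,                                             # feasible start
    constraints=[{"type": "ineq", "fun": lambda b: tau - emp_loss(b)}],
    method="SLSQP",
)
print("RS estimate:", res.x, " fragility:", np.sqrt(np.sum(res.x**2) + 1.0))
```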

Digital twins require computationally-efficient reduced-order models (ROMs) that can accurately describe complex dynamics of physical assets. However, constructing ROMs from noisy high-dimensional data is challenging. In this work, we propose a data-driven, non-intrusive method that utilizes stochastic variational deep kernel learning (SVDKL) to discover low-dimensional latent spaces from data and a recurrent version of SVDKL for representing and predicting the evolution of latent dynamics. The proposed method is demonstrated with two challenging examples -- a double pendulum and a reaction-diffusion system. Results show that our framework is capable of (i) denoising and reconstructing measurements, (ii) learning compact representations of system states, (iii) predicting system evolution in low-dimensional latent spaces, and (iv) quantifying modeling uncertainties.
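
One ingredient of such a framework, a stochastic variational deep kernel learning regressor, can be sketched with GPyTorch as below: a neural feature extractor feeds a sparse variational GP trained by maximizing the ELBO. The encoder width, number of inducing points, and synthetic data are illustrative, and the paper's decoder and recurrent latent-dynamics components are not reproduced.

```python
# Minimal SVDKL regressor with GPyTorch: a neural feature extractor feeding a
# sparse variational GP. Only a sketch of one ingredient of the framework.
import torch
import gpytorch

class FeatureExtractor(torch.nn.Sequential):
    def __init__(self, in_dim, feat_dim=2):
        super().__init__(torch.nn.Linear(in_dim, 64), torch.nn.ReLU(),
                         torch.nn.Linear(64, feat_dim))

class SVGPLayer(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        var_dist = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(0))
        strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, var_dist, learn_inducing_locations=True)
        super().__init__(strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

class SVDKL(gpytorch.Module):
    def __init__(self, in_dim, n_inducing=32, feat_dim=2):
        super().__init__()
        self.features = FeatureExtractor(in_dim, feat_dim)
        self.gp = SVGPLayer(torch.randn(n_inducing, feat_dim))
    def forward(self, x):
        return self.gp(self.features(x))

# Synthetic "high-dimensional noisy measurements" of a scalar quantity.
torch.manual_seed(0)
X = torch.randn(500, 50)
y = torch.sin(X[:, 0]) + 0.1 * torch.randn(500)

model, lik = SVDKL(in_dim=50), gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(lik, model.gp, num_data=X.size(0))
opt = torch.optim.Adam(list(model.parameters()) + list(lik.parameters()), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = -mll(model(X), y)           # negative ELBO
    loss.backward()
    opt.step()
```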

Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures, refine training strategies, increase context length, use higher-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
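
As a concrete reminder of the basic building blocks such surveys cover, the sketch below shows a minimal pre-norm decoder block in PyTorch (masked self-attention, feed-forward network, residual connections, normalization); the sizes are illustrative and no particular surveyed model is implied.

```python
# Minimal pre-norm transformer decoder block in PyTorch. Sizes are illustrative.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model), nn.Dropout(dropout))

    def forward(self, x):
        # Causal mask: position t may only attend to positions <= t.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

x = torch.randn(2, 16, 512)            # (batch, sequence, embedding)
y = DecoderBlock()(x)
```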

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
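
The criticality story can be illustrated with a small numerical experiment: propagate a random input through a deep ReLU MLP initialized below, at, and above the critical weight variance (2/fan-in for ReLU, i.e. He initialization) and observe whether the typical activation scale vanishes, stays stable, or explodes. The width and depth below are arbitrary illustrative choices.

```python
# Signal propagation at, below, and above criticality for a deep ReLU MLP.
# For ReLU, the critical weight variance is C_W = 2 / fan_in (He initialization):
# smaller variances make the signal vanish with depth, larger ones make it explode.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 100
x0 = rng.normal(size=width)

for c_w in (1.5, 2.0, 2.5):            # weight variance = c_w / width
    x = x0.copy()
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(c_w / width), size=(width, width))
        x = np.maximum(W @ x, 0.0)     # preactivation -> ReLU activation
    print(f"C_W = {c_w}: mean squared activation after {depth} layers "
          f"= {np.mean(x**2):.3e}")
```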

Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost-minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.
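
The hypothesis-generation and stability-scoring idea can be sketched as follows: sweep a segmentation parameter to obtain a stack of segmentations from over- to under-segmented, then score each segment by how stably it reappears across the stack. The graph-based segmenter, parameter grid, and IoU-based stability score are illustrative stand-ins, not the paper's cost-minimization framework.

```python
# Sketch: generate segmentation hypotheses over a parameter sweep and score
# segment stability across them. Segmenter, grid, and score are stand-ins.
import numpy as np
from skimage import data, segmentation

image = data.astronaut()
scales = [50, 100, 200, 400, 800]      # small scale -> over-segmentation
hypotheses = [segmentation.felzenszwalb(image, scale=s, sigma=0.8, min_size=30)
              for s in scales]

def stability(mask, others):
    """Best IoU of `mask` against segments of each other hypothesis, averaged."""
    scores = []
    for seg in others:
        labels_hit = np.unique(seg[mask])
        best = max(np.sum(mask & (seg == l)) / np.sum(mask | (seg == l))
                   for l in labels_hit)
        scores.append(best)
    return float(np.mean(scores))

seg0 = hypotheses[0]
for label in np.unique(seg0)[:5]:      # score a few segments of one hypothesis
    mask = seg0 == label
    print(label, round(stability(mask, hypotheses[1:]), 3))
```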
