
Reductionism is a viable strategy for designing and implementing practical programming languages, leading to solutions which are easier to extend, experiment with and formally analyze. We formally specify and implement an extensible programming language, based on a minimalistic first-order imperative core language plus strong abstraction mechanisms, reflection and self-modification features. The language can be extended to very high levels: by using Lisp-style macros and code-to-code transforms which automatically rewrite high-level expressions into core forms, we define closures and first-class continuations on top of the core. Non-self-modifying programs can be analyzed and formally reasoned about, thanks to the language's simple semantics. We formally develop a static analysis and prove a soundness property with respect to the dynamic semantics. We develop a parallel garbage collector suitable for multi-core machines to permit efficient execution of parallel programs.
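
As a rough illustration of the kind of code-to-code transform described above, the sketch below rewrites a hypothetical high-level "let" form into lower-level core forms over a toy s-expression representation; the form names and core constructs are assumptions made for illustration, not the paper's actual language.

    # Minimal sketch of a Lisp-style code-to-code transform over tuples acting as
    # s-expressions. The "let", "block", "define" and "set" forms are hypothetical.

    def expand(expr):
        """Recursively rewrite high-level forms into core forms."""
        if not isinstance(expr, tuple):
            return expr                      # literals and variable names pass through
        head, *args = expr
        if head == "let":                    # ("let", name, value, body) -> core block
            name, value, body = args
            return ("block",
                    ("define", name),
                    ("set", name, expand(value)),
                    expand(body))
        # default: expand sub-expressions, keep the operator unchanged
        return (head, *[expand(a) for a in args])

    # Example: a high-level 'let' expression rewritten into core forms.
    print(expand(("let", "x", ("+", 1, 2), ("*", "x", "x"))))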

Related content

Features provided by iOS 8 for interaction between apps and between apps and the system:
  • Today (iOS and OS X): widgets for the Today view of Notification Center
  • Share (iOS and OS X): post content to web services or share content with others
  • Actions (iOS and OS X): app extensions to view or manipulate inside another app
  • Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from third-party apps
  • Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
  • Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
  • Custom Keyboard (iOS): system-wide alternative keyboards

A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy facilitates impressive text generation quality even with smaller models, thus reducing computational demands by orders of magnitude. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics between LM inference and retrieval and (b) the various system requirements and bottlenecks for different RALM configurations such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and up to 26.2x better energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems.
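
For readers unfamiliar with the RALM pattern itself, the sketch below shows the basic retrieve-then-generate step that such a system accelerates; the retrieve and generate functions, the embedding size, and the brute-force search are placeholders, not Chameleon's actual FPGA/GPU interfaces.

    # Toy retrieve-then-generate loop; all components here are stand-ins.
    import numpy as np

    def retrieve(query_vec, database, k=4):
        """Brute-force nearest-neighbour search standing in for the retrieval accelerator."""
        keys = np.stack([d["embedding"] for d in database])
        scores = keys @ query_vec
        top = np.argsort(-scores)[:k]
        return [database[i]["text"] for i in top]

    def generate(prompt, retrieved_passages):
        """Placeholder for LM inference: condition generation on retrieved context."""
        context = "\n".join(retrieved_passages)
        return f"<answer conditioned on:\n{context}\nand prompt: {prompt}>"

    # Orchestration: embed the query, retrieve, then generate.
    database = [{"text": f"passage {i}", "embedding": np.random.randn(8)} for i in range(100)]
    query_vec = np.random.randn(8)
    print(generate("What is a RALM?", retrieve(query_vec, database)))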

Over the past few years, research has witnessed the advancement of deep learning models trained on large datasets, some even encompassing millions of examples. While these impressive performance on their hidden test sets, they often underperform when assessed on external datasets. Recognizing the critical role of generalization in medical AI development, many prestigious journals now require reporting results both on the local hidden test set as well as on external datasets before considering a study for publication. Effectively, the field of medical AI has transitioned from the traditional usage of a single dataset that is split into train and test to a more comprehensive framework using multiple datasets, some of which are used for model development (source domain) and others for testing (target domains). However, this new experimental setting does not necessarily resolve the challenge of generalization. This is because of the variability encountered in intended use and specificities across hospital cultures making the idea of universally generalizable systems a myth. On the other hand, the systematic, and a fortiori recurrent re-calibration, of models at the individual hospital level, although ideal, may be overoptimistic given the legal, regulatory and technical challenges that are involved. Re-calibration using transfer learning may not even be possible in some instances where reference labels of target domains are not available. In this perspective we establish a hierarchical three-level scale system reflecting the generalization level of a medical AI algorithm. This scale better reflects the diversity of real-world medical scenarios per which target domain data for re-calibration of models may or not be available and if it is, may or not have reference labels systematically available.

The optimization of open-loop shallow geothermal systems, which includes both design and operational aspects, is an important research area aimed at improving their efficiency and sustainability and at the effective management of groundwater as a shallow geothermal resource. This paper investigates various approaches to address optimization problems arising from these research and implementation questions about groundwater heat pump (GWHP) systems. The identified optimization approaches are thoroughly analyzed based on criteria such as computational cost and applicability. Moreover, a novel classification scheme is introduced that categorizes the approaches according to the type of groundwater simulation model and the optimization algorithm used. Simulation models are divided into two types, numerical and simplified (analytical or data-driven) models, while optimization algorithms are divided into gradient-based and derivative-free algorithms. Finally, a comprehensive review of existing approaches in the literature is provided, highlighting their strengths and limitations and offering recommendations for both the use of existing approaches and the development of new, improved ones in this field.
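
To make the "simplified model plus derivative-free algorithm" pairing concrete, the sketch below minimizes a made-up analytical cost for well spacing with a plain random search; the cost function and parameter bounds are illustrative assumptions, not a real GWHP model.

    # Illustrative derivative-free optimization of a hypothetical simplified model.
    import random

    def simplified_cost(spacing_m):
        """Hypothetical trade-off: land/piping cost grows with spacing, while a
        thermal-interference penalty grows as the wells move closer together."""
        land_cost = 0.5 * spacing_m
        interference_penalty = 200.0 / spacing_m
        return land_cost + interference_penalty

    def random_search(cost, lo, hi, n_eval=500, seed=0):
        """Derivative-free optimization: sample candidates, keep the best one."""
        rng = random.Random(seed)
        best_x, best_c = None, float("inf")
        for _ in range(n_eval):
            x = rng.uniform(lo, hi)
            c = cost(x)
            if c < best_c:
                best_x, best_c = x, c
        return best_x, best_c

    print(random_search(simplified_cost, lo=5.0, hi=100.0))  # optimum near 20 m for this toy cost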

Despite known differences between reading and listening in the brain, recent work has shown that text-based language models predict both text-evoked and speech-evoked brain activity to an impressive degree. This poses the question of what types of information language models truly predict in the brain. We investigate this question via a direct approach, in which we eliminate information related to specific low-level stimulus features (textual, speech, and visual) in the language model representations, and observe how this intervention affects the alignment with fMRI brain recordings acquired while participants read versus listened to the same naturalistic stories. We further contrast our findings with speech-based language models, which would be expected to predict speech-evoked brain activity better, provided they model language processing in the brain well. Using our direct approach, we find that both text-based and speech-based language models align well with early sensory regions due to shared low-level features. Text-based models continue to align well with later language regions even after removing these features, while, surprisingly, speech-based models lose most of their alignment. These findings suggest that speech-based models can be further improved to better reflect brain-like language processing.
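
A minimal sketch of the general recipe, removing low-level feature information from model representations before measuring brain alignment, is shown below; the toy data, the linear residualization, and the ridge encoding model are assumptions standing in for the paper's actual pipeline.

    # Regress low-level stimulus features out of model representations, then score
    # how well the residual predicts (toy) fMRI responses on held-out data.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    n_samples, d_model, d_lowlevel, n_voxels = 500, 64, 8, 20
    reps = rng.standard_normal((n_samples, d_model))         # language-model representations
    lowlevel = rng.standard_normal((n_samples, d_lowlevel))  # e.g. word length, phoneme rate
    brain = rng.standard_normal((n_samples, n_voxels))       # fMRI responses (random toy data)

    # 1. Remove information linearly predictable from the low-level features.
    residual_reps = reps - LinearRegression().fit(lowlevel, reps).predict(lowlevel)

    # 2. Fit a voxelwise encoding model and score alignment on held-out samples.
    train, test = slice(0, 400), slice(400, 500)
    enc = Ridge(alpha=1.0).fit(residual_reps[train], brain[train])
    pred = enc.predict(residual_reps[test])
    alignment = [np.corrcoef(pred[:, v], brain[test][:, v])[0, 1] for v in range(n_voxels)]
    print(np.nanmean(alignment))  # near zero here because the toy data carry no signal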

Modern generative machine learning models demonstrate surprising ability to create realistic outputs far beyond their training data, such as photorealistic artwork, accurate protein structures, or conversational text. These successes suggest that generative models learn to effectively parametrize and sample arbitrarily complex distributions. Beginning half a century ago, foundational works in nonlinear dynamics used tools from information theory to infer properties of chaotic attractors from time series, motivating the development of algorithms for parametrizing chaos in real datasets. In this perspective, we aim to connect these classical works to emerging themes in large-scale generative statistical learning. We first consider classical attractor reconstruction, which mirrors constraints on latent representations learned by state space models of time series. We next revisit early efforts to use symbolic approximations to compare minimal discrete generators underlying complex processes, a problem relevant to modern efforts to distill and interpret black-box statistical models. Emerging interdisciplinary works bridge nonlinear dynamics and learning theory, such as operator-theoretic methods for complex fluid flows, or detection of broken detailed balance in biological datasets. We anticipate that future machine learning techniques may revisit other classical concepts from nonlinear dynamics, such as transinformation decay and complexity-entropy tradeoffs.
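
As a reminder of what classical attractor reconstruction looks like in practice, the sketch below performs a Takens-style time-delay embedding of a scalar series; the series and the embedding parameters are purely illustrative.

    # Time-delay embedding: stack lagged copies of a scalar observable into state vectors.
    import numpy as np

    def delay_embed(x, dim=3, tau=5):
        """Return reconstructed state vectors built from delayed copies of x."""
        n = len(x) - (dim - 1) * tau
        return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

    # Toy series standing in for one observed coordinate of a dynamical system.
    t = np.linspace(0, 50, 2000)
    x = np.sin(t) + 0.5 * np.sin(2.3 * t)
    states = delay_embed(x, dim=3, tau=20)
    print(states.shape)  # (number of points, embedding dimension)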

Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and complexity of the interactions. Generative models can provide more rapid sample production, but currently require significant effort to optimize performance for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, without the ability to generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry, and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline unaware model by over $50\%$ in several metrics such as the Wasserstein distance between the generated and the true distributions of key quantities which summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. This proof-of-concept study motivates the design of a foundational model that will be a crucial tool for the study of future detectors, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.
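
The sketch below shows the kind of metric referred to above: a one-dimensional Wasserstein distance between generated and reference distributions of a summary quantity, computed here on synthetic samples rather than actual calorimeter observables.

    # Wasserstein distance between two sample sets of a summary quantity.
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    true_energy = rng.normal(loc=10.0, scale=2.0, size=10_000)       # "true" shower sums (synthetic)
    generated_energy = rng.normal(loc=10.5, scale=2.2, size=10_000)  # generative-model samples (synthetic)

    print(wasserstein_distance(true_energy, generated_energy))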

Background and Objectives: Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides >120 cancer image collections according to the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research. Methods: Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent. However, we observed small variations in AUC values of up to 0.045, indicating a practical limit to reproducibility. Conclusions: We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments.
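
A toy version of the reproducibility check is sketched below: the same evaluation is repeated several times and the spread in AUC is reported. The dataset, the model, and the source of run-to-run variation (here, the split seed) are stand-ins for the IDC-hosted data and identically configured cloud ML services.

    # Repeat an evaluation several times and measure the spread in AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20))
    y = (X[:, 0] + 0.5 * rng.standard_normal(1000) > 0).astype(int)

    aucs = []
    for seed in range(5):  # separate but otherwise identically configured runs
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

    print(f"AUC spread across runs: {max(aucs) - min(aucs):.4f}")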

Sparse polynomial approximation has become indispensable for approximating smooth, high- or infinite-dimensional functions from limited samples. This is a key task in computational science and engineering, e.g., surrogate modelling in uncertainty quantification where the function is the solution map of a parametric or stochastic differential equation (DE). Yet, sparse polynomial approximation lacks a complete theory. On the one hand, there is a well-developed theory of best $s$-term polynomial approximation, which asserts exponential or algebraic rates of convergence for holomorphic functions. On the other, there are increasingly mature methods such as (weighted) $\ell^1$-minimization for computing such approximations. While the sample complexity of these methods has been analyzed with compressed sensing, whether they achieve best $s$-term approximation rates is not fully understood. Furthermore, these methods are not algorithms per se, as they involve exact minimizers of nonlinear optimization problems. This paper closes these gaps. Specifically, we consider the following question: are there robust, efficient algorithms for computing approximations to finite- or infinite-dimensional, holomorphic and Hilbert-valued functions from limited samples that achieve best $s$-term rates? We answer this affirmatively by introducing algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic, and physical discretization errors. We tackle both scalar- and Hilbert-valued functions, this being key to parametric or stochastic DEs. Our results involve significant developments of existing techniques, including a novel restarted primal-dual iteration for solving weighted $\ell^1$-minimization problems in Hilbert spaces. Our theory is supplemented by numerical experiments demonstrating the efficacy of these algorithms.
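
For readers new to the terminology, the sketch below illustrates the notion of a best s-term approximation, keeping only the s largest-magnitude coefficients of an expansion; the coefficients are synthetic, and the weighted l1-minimization solver itself is not reproduced here.

    # Best s-term truncation of an expansion: keep the s largest coefficients.
    import numpy as np

    def best_s_term(coeffs, s):
        """Zero out all but the s largest-magnitude coefficients."""
        approx = np.zeros_like(coeffs)
        keep = np.argsort(-np.abs(coeffs))[:s]
        approx[keep] = coeffs[keep]
        return approx

    coeffs = np.array([2.0, -0.9, 0.4, -0.15, 0.05, 0.01])  # rapidly decaying expansion
    approx = best_s_term(coeffs, s=3)
    error = np.linalg.norm(coeffs - approx, 1)  # l1 error incurred by discarding the tail
    print(approx, error)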

Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, the factual information contained in the input sentences has not been fully mined, and the external knowledge to be injected has not been strictly checked. As a result, the context information cannot be fully exploited, and either extra noise is introduced or the amount of knowledge injected is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu and introduces a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.
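
As a generic illustration of entity-level masking in knowledge-aware pre-training (not MLRIP's actual two-stage replacement procedure), the sketch below masks whole entity spans rather than individual tokens; the tokens, entity spans, and masking rate are made up.

    # Mask whole entity spans in a token sequence at a given rate.
    import random

    def mask_entities(tokens, entity_spans, mask_token="[MASK]", rate=0.5, seed=0):
        """Replace every token inside a selected entity span with the mask token."""
        rng = random.Random(seed)
        out = list(tokens)
        for start, end in entity_spans:
            if rng.random() < rate:
                for i in range(start, end):
                    out[i] = mask_token
        return out

    tokens = ["the", "radar", "was", "deployed", "near", "the", "coastal", "base"]
    entity_spans = [(1, 2), (6, 8)]  # "radar" and "coastal base" as entity mentions
    print(mask_entities(tokens, entity_spans))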

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
