The diffusion of AI and big data is reshaping decision-making processes by increasing the amount of information that supports decisions while reducing direct interaction with data and empirical evidence. This paradigm shift introduces new sources of uncertainty, as limited data observability results in ambiguity and a lack of interpretability. The need for the proper analysis of data-driven strategies motivates the search for new models that can describe this type of bounded access to knowledge. This contribution presents a novel theoretical model for uncertainty in knowledge representation and its transfer mediated by agents. We provide a dynamical description of knowledge states by endowing our model with a structure to compare and combine them. Specifically, an update is represented through combinations, and its explainability is based on its consistency in different dimensional representations. We look at inequivalent knowledge representations in terms of multiplicity of inferences, preference relations, and information measures. Furthermore, we define a formal analogy with two scenarios that illustrate non-classical uncertainty in terms of ambiguity (Ellsberg's model) and reasoning about knowledge mediated by other agents observing data (Wigner's friend). Finally, we discuss some implications of the proposed model for data-driven strategies, with special attention to reasoning under uncertainty about business value dimensions and the design of measurement tools for their assessment.
We study the problem of testing whether the missing values of a potentially high-dimensional dataset are Missing Completely at Random (MCAR). We relax the problem of testing MCAR to the problem of testing the compatibility of a sequence of covariance matrices, motivated by the fact that this procedure is feasible when the dimension grows with the sample size. Tests of compatibility can be used to test the feasibility of positive semi-definite matrix completion problems with noisy observations, and thus our results may be of independent interest. Our first contributions are to define a natural measure of the incompatibility of a sequence of correlation matrices, which can be characterised as the optimal value of a Semi-definite Programming (SDP) problem, and to establish a key duality result allowing its practical computation and interpretation. By studying the concentration properties of the natural plug-in estimator of this measure, we introduce novel hypothesis tests that we prove have power against all distributions with incompatible covariance matrices. The choice of critical values for our tests relies on a new concentration inequality for the Pearson sample correlation matrix, which may be of wider interest. By considering key examples of missingness structures, we demonstrate that our procedures are minimax rate optimal in certain cases. We further validate our methodology with numerical simulations that provide evidence of validity and power, even when data are heavy tailed.
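To make the matrix-completion connection concrete, the snippet below is a minimal sketch of the noiseless setting: it checks whether correlation submatrices observed on overlapping subsets of variables can be completed to a single positive semi-definite correlation matrix, phrased as an SDP feasibility problem in cvxpy. The function name and the plain feasibility formulation are illustrative assumptions, not the paper's incompatibility measure or test.

```python
# Hypothetical sketch: SDP feasibility check for PSD correlation matrix completion.
import cvxpy as cp
import numpy as np

def compatible(submatrices, index_sets, d):
    """submatrices[k] is the correlation matrix observed on the variables in index_sets[k]."""
    R = cp.Variable((d, d), symmetric=True)
    constraints = [R >> 0, cp.diag(R) == 1]
    for S, C in zip(index_sets, submatrices):
        for a, i in enumerate(S):
            for b, j in enumerate(S):
                constraints.append(R[i, j] == C[a, b])
    problem = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility problem
    problem.solve()
    return problem.status == cp.OPTIMAL

# Three pairwise correlations (+0.9, +0.9, -0.9) that no 3x3 correlation matrix can realise.
subs = [np.array([[1.0, 0.9], [0.9, 1.0]]),
        np.array([[1.0, 0.9], [0.9, 1.0]]),
        np.array([[1.0, -0.9], [-0.9, 1.0]])]
print(compatible(subs, [(0, 1), (1, 2), (0, 2)], d=3))  # expected: False
```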
Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially as datasets grow in size. To address this issue, we propose a Blockwise Principal Component Analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data and then imputes the merged principal components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, our experiments also show that while applying MICE imputation directly to the missing data may fail to converge, applying MICE within the BPI framework may lead to convergence.
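As a rough illustration of the blockwise idea (a sketch under simplifying assumptions, not the paper's exact algorithm), the snippet below runs PCA on the fully observed rows of each monotone block, stacks the resulting scores, which again form a monotone missing pattern, and hands the reduced data to an off-the-shelf imputer; a mean imputer stands in for MICE here.

```python
# Hypothetical BPI-style sketch: blockwise PCA, then imputation on the reduced data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer  # stand-in for a MICE-style imputer

def blockwise_pca_impute(X, block_cols, n_components=2):
    """X: (n, p) array with NaNs in a monotone pattern; block_cols: column indices per block."""
    score_blocks = []
    for cols in block_cols:
        block = X[:, cols]
        observed = ~np.isnan(block).any(axis=1)        # rows fully observed in this block
        k = min(n_components, len(cols))
        scores = np.full((X.shape[0], k), np.nan)
        scores[observed] = PCA(n_components=k).fit_transform(block[observed])
        score_blocks.append(scores)
    Z = np.hstack(score_blocks)                         # reduced data, still monotone-missing
    return SimpleImputer(strategy="mean").fit_transform(Z)
```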
In the conventional change detection (CD) pipeline, two manually registered and labeled remote sensing datasets serve as the input to the model for training and prediction. In realistic scenarios, however, data from different periods or sensors may fail to align because they use different coordinate systems. Geometric distortion caused by coordinate shifting remains a thorny issue for CD algorithms. In this paper, we propose a reusable self-supervised framework for handling bitemporal geometric distortion in CD tasks. The framework consists of Pretext Representation Pre-training, Bitemporal Image Alignment, and Downstream Decoder Fine-Tuning. With only single-stage pre-training, the key components of the framework can be reused to assist bitemporal image alignment while simultaneously enhancing the performance of the CD decoder. Experimental results in two large-scale realistic scenarios demonstrate that our proposed method can alleviate bitemporal geometric distortion in CD tasks.
To handle the large scale of whole slide images in computational pathology, most approaches first tessellate the images into smaller patches, extract features from these patches, and finally aggregate the feature vectors with weakly-supervised learning. The performance of this workflow strongly depends on the quality of the extracted features. Recently, foundation models in computer vision have shown that leveraging huge amounts of data through supervised or self-supervised learning improves feature quality and generalizability for a variety of tasks. In this study, we benchmark the most popular vision foundation models as feature extractors for histopathology data. We evaluate the models in two settings: slide-level classification and patch-level classification. We show that foundation models are a strong baseline. Our experiments demonstrate that by finetuning a foundation model on a single GPU for only two hours or three days, depending on the dataset, we can match or outperform state-of-the-art feature extractors for computational pathology. These findings imply that even with limited resources one can finetune a feature extractor tailored towards a specific downstream task and dataset. This is a considerable shift from the current state, where only a few institutions with large amounts of resources and datasets are able to train a feature extractor. We publish all code used for training and evaluation, as well as the finetuned models.
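For context, the sketch below shows the kind of pipeline the abstract describes, with a generic pretrained ViT from timm standing in for a foundation-model feature extractor and a simple attention-pooling head (in the spirit of ABMIL) for slide-level aggregation. The model name, dimensions, and head are illustrative assumptions, not the study's exact setup.

```python
# Illustrative patch-feature extraction plus attention-based slide aggregation.
import torch
import torch.nn as nn
import timm  # provides pretrained vision backbones

extractor = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
extractor.eval()

class AttentionMIL(nn.Module):
    def __init__(self, in_dim=768, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.head = nn.Linear(in_dim, n_classes)

    def forward(self, feats):                        # feats: (n_patches, in_dim)
        w = torch.softmax(self.attn(feats), dim=0)   # attention weight per patch
        slide_feat = (w * feats).sum(dim=0)          # weighted average = slide embedding
        return self.head(slide_feat)

with torch.no_grad():
    patches = torch.randn(32, 3, 224, 224)   # placeholder for tessellated patches
    feats = extractor(patches)                # (32, 768) patch features
logits = AttentionMIL()(feats)                # slide-level prediction
```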
Dynamic crack branching in unsaturated porous media holds significant relevance in various fields, including geotechnical engineering, geosciences, and petroleum engineering. This article presents a numerical investigation into dynamic crack branching in unsaturated porous media using a recently developed coupled micro-periporomechanics paradigm. This paradigm extends the periporomechanics model by incorporating the micro-rotation of the solid skeleton. Within this framework, each material point is equipped with three degrees of freedom: displacement, micro-rotation, and fluid pressure. Consistent with the Cosserat continuum theory, a length scale associated with the micro-rotation of material points is inherently integrated into the model. This study encompasses several key aspects: (1) Validation of the coupled micro-periporomechanics paradigm for effectively modeling crack branching in deformable porous media, (2) Examination of the transition from a single branch to multiple branches in porous media under drained conditions, (3) Simulation of single crack branching in unsaturated porous media under dynamic loading conditions, and (4) Investigation of multiple crack branching in unsaturated porous media under dynamic loading conditions. The numerical results obtained in this study are systematically analyzed to elucidate the factors that influence dynamic crack branching in porous media subjected to dynamic loading. Furthermore, the comprehensive numerical findings underscore the efficacy and robustness of the coupled micro-periporomechanics paradigm in accurately modeling dynamic crack branching in variably saturated porous media.
Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. To improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today's most effective pruning method on ImageNet clusters data samples into separate concepts according to their embedding and prunes away the most prototypical samples. We scale this approach to LAION and improve it by noting that the pruning rate should be concept-specific and adapted to the complexity of the concept. Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training. By filtering the LAION dataset, we find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs. More specifically, we are able to outperform the LAION-trained OpenCLIP-ViT-B32 model on ImageNet zero-shot accuracy by 1.1 percentage points while using only 27.7% of the data and training compute. Despite the strong reduction in training cost, we also see improvements on ImageNet distribution shifts, retrieval tasks, and VTAB. On the DataComp Medium benchmark, we achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy across 38 evaluation tasks.
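The snippet below is an illustrative sketch (my simplification, not the paper's exact procedure) of concept-aware pruning: cluster embeddings into concepts with k-means, score each concept's complexity by its spread around the centroid, scale the keep rate per concept accordingly, and drop the most prototypical samples first.

```python
# Hypothetical concept-specific pruning of a dataset given precomputed embeddings.
import numpy as np
from sklearn.cluster import KMeans

def prune(embeddings, n_concepts=100, base_keep=0.25):
    km = KMeans(n_clusters=n_concepts, n_init=10).fit(embeddings)
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    complexity = np.array([dists[km.labels_ == c].mean() for c in range(n_concepts)])
    rates = base_keep * complexity / complexity.mean()       # concept-specific keep rates
    keep = np.zeros(len(embeddings), dtype=bool)
    for c in range(n_concepts):
        idx = np.where(km.labels_ == c)[0]
        n_keep = int(np.clip(rates[c], 0.05, 1.0) * len(idx))
        keep[idx[np.argsort(-dists[idx])[:n_keep]]] = True   # keep the least prototypical
    return keep
```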
In recent years, the amount of data available on the Internet and the number of Internet users have increased at an unparalleled pace. This exponential growth in accessible digital information and in the number of users has created the potential for information overload, impeding fast access to items of interest on the Internet. Information retrieval systems such as Google, DevilFinder, and Altavista have partly addressed this challenge, but they lack prioritization and personalization of information (where a system maps available content to a user's interests and preferences). This has resulted in a higher-than-ever demand for recommender systems. Recommender systems are information filtering systems that address the problem of information overload by filtering the important fragments out of a huge volume of dynamically generated data according to the user's interests, preferences, and ratings of the desired items. Recommender systems can predict whether a user will like an item based on the user's profile.
Dynamic graph embeddings, together with inductive and incremental learning, facilitate predictive tasks such as node classification and link prediction. However, predicting the structure of a graph at a future time step from a time series of graphs, while allowing for new nodes, has not received much attention. In this paper, we present such an approach. We use time series methods to predict the node degrees at future time points and combine these predictions with flux balance analysis -- a linear programming method used in biochemistry -- to obtain the structure of future graphs. Furthermore, we explore the predictive graph distribution for different parameter values. We evaluate the method on synthetic and real datasets and demonstrate its utility and applicability.
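To make the two-step idea concrete, here is a loose sketch (my own simplification; the forecaster, edge weights, and rounding are illustrative, and the paper's flux-balance formulation will differ): forecast next-step node degrees from their history, then solve a linear program over relaxed edge indicators so that each node's incident edge mass respects its forecast degree.

```python
# Hypothetical two-step sketch: degree forecasting followed by an LP over candidate edges.
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def predict_graph(degree_history, edge_prior, n_nodes):
    """degree_history: (T, n_nodes) array; edge_prior: dict {(i, j): weight}."""
    # Step 1: naive drift forecast of next-step degrees.
    d_pred = degree_history[-1] + (degree_history[-1] - degree_history[0]) / max(len(degree_history) - 1, 1)
    # Step 2: LP maximizing prior-weighted edge mass, with each node's incident
    # edge mass bounded by its forecast degree (flux-balance-style constraints).
    edges = list(combinations(range(n_nodes), 2))
    A = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        A[i, k] = A[j, k] = 1.0
    c = -np.array([edge_prior.get(e, 0.1) for e in edges])   # maximize => minimize negative
    res = linprog(c, A_ub=A, b_ub=np.maximum(d_pred, 0), bounds=(0, 1))
    return {e: x for e, x in zip(edges, res.x) if x > 0.5}   # threshold the relaxed solution
```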
This survey is concerned with the power of random information for approximation in the (deterministic) worst-case setting, with special emphasis on information consisting of functionals selected independently and identically distributed (iid) at random on a class of admissible information functionals. We present a general result based on a weighted least squares method and derive consequences for special cases. Improvements are available if the information is "Gaussian" or if we consider iid function values for Sobolev spaces. We include open questions to guide future research on the power of random information in the context of information-based complexity.
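As a toy illustration of approximation from random information (assumptions mine, not taken from the survey), the snippet below fits a function from n iid uniform point evaluations on [0, 1] by least squares in a fixed polynomial basis; unweighted least squares is the simplest special case of the weighted least squares methods the survey builds on.

```python
# Toy example: least-squares approximation from iid random point evaluations.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)

n, degree = 200, 8
x = rng.uniform(0, 1, n)                       # iid random information: point evaluations
V = np.vander(x, degree + 1, increasing=True)  # polynomial basis evaluated at the samples
coef, *_ = np.linalg.lstsq(V, f(x), rcond=None)

grid = np.linspace(0, 1, 1000)
approx = np.vander(grid, degree + 1, increasing=True) @ coef
print(f"sup-norm error ~ {np.max(np.abs(approx - f(grid))):.2e}")
```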
In many medical image classification problems, labeled data is scarce while unlabeled data is more plentiful. Semi-supervised learning and self-supervised learning are two different research directions that can improve accuracy by learning from extra unlabeled data. Recent methods from both directions have reported significant gains on traditional benchmarks. Yet past benchmarks do not focus on medical tasks and rarely compare self- and semi-supervised methods on an equal footing. Furthermore, past benchmarks often handle hyperparameter tuning suboptimally. First, they may not tune hyperparameters at all, leading to underfitting. Second, when tuning does occur, it often unrealistically uses a labeled validation set much larger than the training set. Both cases make previously published rankings of methods difficult to translate to practical settings. This study contributes a systematic evaluation of self- and semi-supervised methods with a unified experimental protocol intended to guide practitioners with scarce labeled data and a limited compute budget. We answer two key questions: Can hyperparameter tuning be effective with realistic-sized validation sets? If so, when all methods are tuned well, which self- or semi-supervised methods reach the best accuracy? Our study compares 13 representative semi- and self-supervised methods against strong labeled-set-only baselines on 4 medical datasets. From over 20,000 GPU hours of computation, we provide best practices for resource-constrained, results-focused practitioners.