
Query cost estimation is a classical task in database management. Recently, researchers have applied AI-driven models to query cost estimation in pursuit of high accuracy. However, two defects in feature design lead to a poor trade-off between estimation accuracy and time efficiency. On the one hand, existing works encode only the query plan and data statistics while ignoring other important variables, such as the storage structure, hardware, and database knobs, which also have a significant impact on query cost. On the other hand, due to their straightforward encoding design, existing works carry a heavy representation-learning burden on ineffective input dimensions. To address these two problems, we first propose an efficient feature engineering method for query cost estimation, called QCFE. Specifically, we design a novel feature called the feature snapshot to efficiently integrate the influence of the ignored variables. Further, we propose a difference-propagation feature reduction method for query cost estimation that filters out useless features. The experimental results demonstrate that QCFE largely improves the time-accuracy efficiency on extensive benchmarks.
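The abstract does not spell out the difference-propagation reduction, but the general idea of filtering features by the difference they induce in the model's output can be sketched as below. The function name, the perturbation scheme (replacing a feature with its mean), and the threshold are hypothetical simplifications for illustration, not the paper's exact algorithm.

```python
import numpy as np

def difference_based_feature_filter(model, X, threshold=1e-3):
    """Keep features whose perturbation noticeably changes the predicted cost.

    model     : a fitted estimator exposing .predict(X) (assumed given)
    X         : (n_samples, n_features) feature matrix
    threshold : minimum mean absolute output difference to keep a feature
    """
    baseline = model.predict(X)
    keep = []
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] = X[:, j].mean()          # neutralize feature j
        diff = np.abs(model.predict(X_pert) - baseline).mean()
        if diff >= threshold:                  # feature j affects the cost estimate
            keep.append(j)
    return keep  # indices of informative feature dimensions
```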

Related content

Generalized linear models (GLMs) are popular for data analysis in almost all quantitative sciences, but the choice of likelihood family and link function is often difficult. This motivates the search for likelihoods and links that minimize the impact of potential misspecification. We perform a large-scale simulation study on double-bounded and lower-bounded response data in which we systematically vary both the true and the assumed likelihoods and links. In contrast to previous studies, we also examine posterior calibration and uncertainty metrics in addition to point-estimate accuracy. Our results indicate that certain likelihoods and links can be remarkably robust to misspecification, performing almost on par with their respective true counterparts. Additionally, normal likelihood models with identity link (i.e., linear regression) often achieve calibration comparable to that of the more structurally faithful alternatives, at least in the studied scenarios. On the basis of our findings, we provide practical suggestions for robust likelihood and link choices in GLMs.
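A minimal sketch of the kind of comparison the study performs, on simulated lower-bounded data: a structurally faithful Gamma GLM with log link versus a deliberately misspecified normal-likelihood identity-link model. The simulation settings and coefficients here are illustrative, not the study's design.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

# Lower-bounded response: Gamma with log link is the "true" model here.
mu = np.exp(0.5 + 0.8 * x)
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # mean = shape * scale = mu

# Structurally faithful model: Gamma likelihood, log link.
gamma_fit = sm.GLM(y, X, family=sm.families.Gamma(sm.families.links.Log())).fit()

# Deliberately misspecified model: normal likelihood, identity link.
normal_fit = sm.OLS(y, X).fit()

print("Gamma/log  RMSE vs true mean:", np.sqrt(np.mean((gamma_fit.fittedvalues - mu) ** 2)))
print("Normal/id  RMSE vs true mean:", np.sqrt(np.mean((normal_fit.fittedvalues - mu) ** 2)))
```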

Theta oscillations (4-8 Hz) play a significant role in spatial learning and memory during navigation tasks, and frontal theta oscillations in particular are thought to support spatial navigation and memory. Electroencephalography (EEG) datasets are very complex, making any behaviour-related changes in the neural signal difficult to interpret. However, multiple analytical methods are available for examining such complex data structures, especially machine learning-based techniques. These methods have shown high classification performance, and combining them with feature engineering further enhances their capability. This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data. Based on the features engineered from frontal theta EEG data during a spatial navigation task in two key trials (first, last) and between two conditions (learner and non-learner), we analysed the performance of six machine learning methods (polynomial support vector machines, non-linear support vector machines, random forests, k-nearest neighbours, ridge regression, and deep neural networks) in classifying learner and non-learner participants. We also analysed how different standardisation methods used to pre-process the EEG data contribute to classification performance. We compared the classification performance in each trial against data gathered from the same subjects containing solely coordinate-based features, such as idle time and average speed. We found that more of the machine learning methods performed better when using the coordinate-based data. However, only deep neural networks achieved an area under the ROC curve higher than 80% using the theta EEG data alone. Our findings suggest that standardising the theta EEG data and using deep neural networks enhances the classification of learner and non-learner subjects in a spatial learning task.
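A hedged sketch of HMM-based feature engineering followed by classification, using hmmlearn and scikit-learn: each subject's signal is summarised by the average posterior occupancy of the hidden states, and the resulting features feed a classifier. The synthetic data, the three-state choice, and the random forest stand in for the study's actual pipeline.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Toy stand-in for frontal theta power series: 40 subjects x 200 samples.
n_subjects, n_samples = 40, 200
signals = rng.normal(size=(n_subjects, n_samples, 1))
labels = rng.integers(0, 2, size=n_subjects)  # learner vs non-learner

# Fit one HMM on the pooled data, then summarise each subject by the
# mean posterior occupancy of each hidden state (the engineered features).
hmm = GaussianHMM(n_components=3, n_iter=50, random_state=0)
hmm.fit(signals.reshape(-1, 1), lengths=[n_samples] * n_subjects)
features = np.stack([hmm.predict_proba(s).mean(axis=0) for s in signals])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV ROC AUC:",
      cross_val_score(clf, features, labels, scoring="roc_auc", cv=5).mean())
```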

This paper presents a novel centralized, variational data assimilation approach for calibrating transient dynamic models in electrical power systems, focusing on load model parameters. With the increasing importance of inverter-based resources, assessing power systems' dynamic performance under disturbances has become challenging, necessitating robust model calibration methods. The proposed approach expands on previous Bayesian frameworks by establishing a posterior distribution of the parameters using an approximation around the maximum a posteriori value. We illustrate the efficacy of our method by generating events of varying intensity, highlighting its ability to capture the system's evolution accurately and with associated uncertainty estimates. This research improves the precision of dynamic performance assessments in modern power systems, with potential applications in managing uncertainties and optimizing system operations.
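The posterior approximation around the maximum a posteriori value that the abstract refers to can be written, in generic Laplace-approximation form, as follows; the notation (parameters $\theta$, data $y$, Hessian $H$) is generic rather than the paper's own.

```latex
% Laplace approximation of the parameter posterior around the MAP estimate:
% p(theta | y) is replaced by a Gaussian centred at the MAP value.
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \left[ \log p(y \mid \theta) + \log p(\theta) \right],
\qquad
p(\theta \mid y) \approx \mathcal{N}\!\left(\hat{\theta}_{\mathrm{MAP}},\, H^{-1}\right),
\quad
H = -\nabla^{2}_{\theta} \left[ \log p(y \mid \theta) + \log p(\theta) \right] \Big|_{\theta = \hat{\theta}_{\mathrm{MAP}}}.
```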

To form precipitation datasets that are accurate and at the same time have high spatial density, data from satellites and gauges are often merged in the literature. However, uncertainty estimates for data acquired in this manner are scarcely provided, although the importance of uncertainty quantification in predictive modelling is widely recognized. Furthermore, the benefits that machine learning can bring to the task of providing such estimates have not been broadly realized and properly explored through benchmark experiments. The present study aims to fill this specific gap by conducting the first benchmark tests on the topic. On a large dataset comprising 15 years of monthly data spanning the contiguous United States, we extensively compared six learners that are, by construction, appropriate for predictive uncertainty quantification: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM) and quantile regression neural networks (QRNN). The comparison assessed the learners' competence in issuing predictive quantiles at nine levels that facilitate a good approximation of the entire predictive probability distribution, and was primarily based on the quantile and continuous ranked probability skill scores. Three types of predictor variables (i.e., satellite precipitation variables, distances between a point of interest and satellite grid points, and elevation at a point of interest) were used in the comparison and were additionally compared with each other. This additional comparison was based on the explainable machine learning concept of feature importance. The results suggest that the order from the best to the worst of the learners for the task investigated is the following: LightGBM, QRF, GRF, GBM, QRNN and QR...
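The quantile (pinball) score on which the comparison is primarily based has a simple closed form; a minimal sketch evaluating it at nine levels follows. The deciles 0.1, ..., 0.9 are assumed here as the nine levels, which is a common choice but not stated in the abstract, and the toy data are invented.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for predictions of the tau-th quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Nine quantile levels; deciles are assumed here for illustration.
levels = np.arange(0.1, 1.0, 0.1)
y_true = np.array([2.3, 0.0, 5.1, 1.2])
# y_pred[i, j]: predicted quantile at levels[j] for observation i.
y_pred = np.tile(np.linspace(0.5, 4.5, 9), (4, 1))

scores = [pinball_loss(y_true, y_pred[:, j], tau)
          for j, tau in enumerate(levels)]
print(dict(zip(np.round(levels, 1), np.round(scores, 3))))
```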

Visual inspection tasks often require humans to cooperate with AI-based image classifiers. To enhance this cooperation, explainable artificial intelligence (XAI) can highlight the image areas that contributed to an AI decision. However, the literature on visual cueing suggests that such XAI support might come with costs of its own. To better understand how the benefits and costs of XAI depend on the accuracy of AI classifications and XAI highlights, we conducted two experiments that simulated visual quality control in a chocolate factory. Participants had to decide whether chocolate moulds contained faulty bars or not, and were always informed whether the AI had classified the mould as faulty or not. In half of the experiment, they saw additional XAI highlights that justified this classification. While XAI sped up performance, its effects on error rates were highly dependent on (X)AI accuracy. XAI benefits were observed when the system correctly detected and highlighted the fault, but XAI costs were evident for misplaced highlights that marked an intact area while the actual fault was located elsewhere. Eye movement analyses indicated that participants then spent less time searching the rest of the mould and thus looked at the fault less often. However, we also observed large interindividual differences. Taken together, the results suggest that despite its potential, XAI can discourage people from investing effort into their own information analysis.

Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, in which a single model can respond to a broad range of natural human instructions. While existing foundation models have shown exciting potential on low-level visual tasks, their related abilities are still preliminary and need to be improved. In order to enhance these models, we conduct a large-scale subjective experiment collecting a vast amount of real human feedback on low-level vision. Each feedback item follows a pathway that starts with a detailed description of the low-level visual appearance (*e.g.*, clarity, color, brightness of an image) and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K detailed human feedback items on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion that processes this feedback into 200K instruction-response pairs of diverse formats. Experimental results indicate that **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundational models. We anticipate that our datasets can pave the way for a future in which general intelligence can perceive and understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at: //q-future.github.io/Q-Instruct.
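To make the pathway-to-instruction conversion concrete, here is a purely hypothetical example of how one feedback pathway might yield instruction-response pairs of two formats. The field names and fixed templates are invented; the paper describes a GPT-participated conversion rather than template filling.

```python
# Hypothetical illustration: one low-level-vision feedback "pathway"
# (description + overall conclusion) turned into two instruction-response
# pairs. All field names and templates below are invented for illustration.
feedback = {
    "image": "example_0001.jpg",
    "description": "The image is slightly blurry with muted colors "
                   "and low brightness in the foreground.",
    "conclusion": "Overall, the visual quality is below average.",
}

pairs = [
    {   # free-form question about low-level appearance
        "instruction": "Describe the low-level visual appearance of this image.",
        "response": feedback["description"],
    },
    {   # overall-quality judgement
        "instruction": "How would you rate the overall visual quality?",
        "response": feedback["conclusion"],
    },
]

for p in pairs:
    print(p["instruction"], "->", p["response"])
```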

This study compares the performance of (1) fine-tuned models and (2) extremely large language models on the task of check-worthy claim detection. For the purpose of the comparison, we composed a multilingual and multi-topical dataset comprising texts of various sources and styles. Building on this, we performed a benchmark analysis to determine the most general multilingual and multi-topical claim detector. We chose three state-of-the-art models for the check-worthy claim detection task and fine-tuned them. Furthermore, we selected three state-of-the-art extremely large language models without any fine-tuning. We adapted all models to multilingual settings and, through extensive experimentation and evaluation, assessed the performance of each model in terms of accuracy, recall, and F1-score in in-domain and cross-domain scenarios. Our results demonstrate that despite the technological progress in the area of natural language processing, the models fine-tuned for the task of check-worthy claim detection still outperform the zero-shot approaches in cross-domain settings.
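A minimal sketch of the in-domain versus cross-domain evaluation scheme described above, computing accuracy, recall, and F1-score with scikit-learn. The tiny corpora, the TF-IDF features, and the logistic regression classifier are placeholders, not the study's models or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.pipeline import make_pipeline

# Placeholder data; the study uses multilingual, multi-topical corpora.
train_texts = ["unemployment rose by 5% last year", "what a lovely day",
               "the vaccine was tested on 40,000 people", "I like this song"]
train_labels = [1, 0, 1, 0]  # 1 = check-worthy claim

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

def evaluate(name, texts, labels):
    pred = model.predict(texts)
    print(name,
          "acc=%.2f" % accuracy_score(labels, pred),
          "recall=%.2f" % recall_score(labels, pred, zero_division=0),
          "f1=%.2f" % f1_score(labels, pred, zero_division=0))

# In-domain vs cross-domain test splits (placeholders).
evaluate("in-domain ", ["GDP grew by 2% in 2020", "nice weather today"], [1, 0])
evaluate("cross-dom.", ["this politician claims crime fell 10%"], [1])
```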

During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate LLM research, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpora focus mainly on English, and there is still a lack of a complete tool-chain for extracting clean texts from web data. Furthermore, fine-grained information about the corpus, e.g. the quality of each text, is missing. To address these challenges, we propose in this paper EvalWeb, a new complete tool-chain for extracting clean Chinese texts from noisy web data. First, similar to previous work, manually crafted rules are employed to discard explicitly noisy texts from the raw crawled web contents. Second, a well-designed evaluation model is leveraged to assess the remaining, relatively clean data, and each text is assigned a specific quality score. Finally, an appropriate threshold can easily be applied to select high-quality pre-training data for Chinese. Using our proposed approach, we release ChineseWebText, the largest and latest large-scale high-quality Chinese web text corpus, which consists of 1.42 TB of data in which each text is associated with a quality score, allowing LLM researchers to choose data according to their desired quality thresholds. We also release a much cleaner 600 GB subset of Chinese data with quality scores exceeding 90%.
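The three-stage structure of the pipeline (rule-based filtering, quality scoring, threshold selection) can be sketched as below. The two rules and the heuristic scorer are placeholders standing in for EvalWeb's hand-crafted rule set and learned evaluation model.

```python
import re

def rule_filter(text):
    """Stage 1: discard explicitly noisy texts with hand-crafted rules.
    These two rules are placeholders for the actual rule set."""
    if len(text) < 20:
        return False
    if re.search(r"(click here|免费下载)", text):  # ad-like boilerplate
        return False
    return True

def quality_score(text):
    """Stage 2: placeholder for the learned evaluation model that
    assigns each text a quality score in [0, 1]."""
    letters = sum(ch.isalnum() for ch in text)
    return letters / max(len(text), 1)

def build_corpus(raw_texts, threshold=0.9):
    """Stage 3: keep texts whose score exceeds the threshold, mirroring
    the 90%-quality subset described in the abstract."""
    corpus = []
    for t in raw_texts:
        if not rule_filter(t):
            continue
        s = quality_score(t)
        if s > threshold:
            corpus.append((t, s))  # text paired with its quality score
    return corpus
```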

We present a framework for the efficient computation of optimal Bayesian decisions under intractable likelihoods, by learning a surrogate model for the expected utility (or its distribution) as a function of the action and data spaces. We leverage recent advances in simulation-based inference and Bayesian optimization to develop active learning schemes that choose where in the parameter and action spaces to simulate. This allows us to learn the optimal action in as few simulations as possible. The resulting framework is extremely simulation efficient, typically requiring fewer model calls than the associated posterior inference task alone, and a factor of $100-1000$ more efficient than Monte Carlo-based methods. Our framework opens up new capabilities for performing Bayesian decision making, particularly in the previously challenging regime where likelihoods are intractable and simulations are expensive.
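A minimal sketch of the surrogate-plus-active-learning loop: a Gaussian-process surrogate for the expected utility over a one-dimensional action space, with an upper-confidence-bound rule deciding where to simulate next. The toy utility function, the kernel, and the acquisition choice are illustrative stand-ins for the framework's components.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def simulate_utility(action):
    """Stand-in for an expensive simulator returning a noisy utility."""
    return -(action - 0.3) ** 2 + 0.05 * rng.normal()

# Start from a few random simulations.
actions = list(rng.uniform(0, 1, size=3))
utils = [simulate_utility(a) for a in actions]
grid = np.linspace(0, 1, 200).reshape(-1, 1)

for _ in range(10):  # active learning loop
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
    gp.fit(np.array(actions).reshape(-1, 1), utils)
    mean, std = gp.predict(grid, return_std=True)
    # Upper-confidence-bound acquisition: simulate where the surrogate
    # is both promising and uncertain.
    next_a = float(grid[np.argmax(mean + std)])
    actions.append(next_a)
    utils.append(simulate_utility(next_a))

print("estimated optimal action:", grid[np.argmax(mean)][0])
```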

Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently also entered the domain of agriculture. In this paper, we survey 40 research efforts that employ deep learning techniques to address various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of the data used, and the overall performance achieved according to the metrics used in each work. Moreover, we study comparisons of deep learning with other existing popular techniques with respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.
