好诱人的搜子好爽免费观看,精品自在线观看影片天天看

Inclinometer probes are devices that can be used to measure deformations within earthwork slopes. This paper demonstrates a novel application of Bayesian techniques to real-world inclinometer data, providing both anomaly detection and forecasting. Specifically, this paper details an analysis of data collected from inclinometer data across the entire UK rail network. Practitioners have effectively two goals when processing monitoring data. The first is to identify any anomalous or dangerous movements, and the second is to predict potential future adverse scenarios by forecasting. In this paper we apply Uncertainty Quantification (UQ) techniques by implementing a Bayesian approach to anomaly detection and forecasting for inclinometer data. Subsequently, both costs and risks may be minimised by quantifying and evaluating the appropriate uncertainties. This framework may then act as an enabler for enhanced decision making and risk analysis. We show that inclinometer data can be described by a latent autocorrelated Markov process derived from measurements. This can be used as the transition model of a non-linear Bayesian filter. This allows for the prediction of system states. This learnt latent model also allows for the detection of anomalies: observations that are far from their expected value may be considered to have `high surprisal', that is they have a high information content relative to the model encoding represented by the learnt latent model. We successfully apply the forecasting and anomaly detection techniques to a large real-world data set in a computationally efficient manner. Although this paper studies inclinometers in particular, the techniques are broadly applicable to all areas of engineering UQ and Structural Health Monitoring (SHM).

相關內容

異常檢測

關注 102

在數據挖掘中，異常檢測（英語：anomaly detection）對不符合預期模式或數據集中其他項目的項目、事件或觀測值的識別。通常異常項目會轉變成銀行欺詐、結構缺陷、醫療問題、文本錯誤等類型的問題。異常也被稱為離群值、新奇、噪聲、偏差和例外。特別是在檢測濫用與網絡入侵時，有趣性對象往往不是罕見對象，但卻是超出預料的突發活動。這種模式不遵循通常統計定義中把異常點看作是罕見對象，于是許多異常檢測方法（特別是無監督的方法）將對此類數據失效，除非進行了合適的聚集。相反，聚類分析算法可能可以檢測出這些模式形成的微聚類。有三大類異常檢測方法。[1] 在假設數據集中大多數實例都是正常的前提下，無監督異常檢測方法能通過尋找與其他數據最不匹配的實例來檢測出未標記測試數據的異常。監督式異常檢測方法需要一個已經被標記“正常”與“異常”的數據集，并涉及到訓練分類器（與許多其他的統計分類問題的關鍵區別是異常檢測的內在不均衡性）。半監督式異常檢測方法根據一個給定的正常訓練數據集創建一個表示正常行為的模型，然后檢測由學習模型生成的測試實例的可能性。

數據集 · 代碼 · HTTPS · 評論員 · 可理解性 ·

2023 年 8 月 24 日

npm-follower: A Complete Dataset Tracking the NPM Ecosystem

Donald Pinckney,Federico Cassano,Arjun Guha,Jonathan Bell

Software developers typically rely upon a large network of dependencies to build their applications. For instance, the NPM package repository contains over 3 million packages and serves tens of billions of downloads weekly. Understanding the structure and nature of packages, dependencies, and published code requires datasets that provide researchers with easy access to metadata and code of packages. However, prior work on NPM dataset construction typically has two limitations: 1) only metadata is scraped, and 2) packages or versions that are deleted from NPM can not be scraped. Over 330,000 versions of packages were deleted from NPM between July 2022 and May 2023. This data is critical for researchers as it often pertains to important questions of security and malware. We present npm-follower, a dataset and crawling architecture which archives metadata and code of all packages and versions as they are published, and is thus able to retain data which is later deleted. The dataset currently includes over 35 million versions of packages, and grows at a rate of about 1 million versions per month. The dataset is designed to be easily used by researchers answering questions involving either metadata or program analysis. Both the code and dataset are available at //dependencies.science.

知識 (knowledge) · 圖 · 論文 · 知識圖譜 · INFORMS ·

2023 年 8 月 23 日

An approach based on Open Research Knowledge Graph for Knowledge Acquisition from scientific papers

Azanzi Jiomekong,Sanju Tiwari

A scientific paper can be divided into two major constructs which are Metadata and Full-body text. Metadata provides a brief overview of the paper while the Full-body text contains key-insights that can be valuable to fellow researchers. To retrieve metadata and key-insights from scientific papers, knowledge acquisition is a central activity. It consists of gathering, analyzing and organizing knowledge embedded in scientific papers in such a way that it can be used and reused whenever needed. Given the wealth of scientific literature, manual knowledge acquisition is a cumbersome task. Thus, computer-assisted and (semi-)automatic strategies are generally adopted. Our purpose in this research was two fold: curate Open Research Knowledge Graph (ORKG) with papers related to ontology learning and define an approach using ORKG as a computer-assisted tool to organize key-insights extracted from research papers. This approach was used to document the "epidemiological surveillance systems design and implementation" research problem and to prepare the related work of this paper. It is currently used to document "food information engineering", "Tabular data to Knowledge Graph Matching" and "Question Answering" research problems and "Neuro-symbolic AI" domain.

語言模型化 · MoDELS · GROUP · GPT-4 · 控制器 ·

2023 年 8 月 23 日

Devising and Detecting Phishing: large language models vs. Smaller Human Models

Fredrik Heiding,Bruce Schneier,Arun Vishwanath,Jeremy Bernstein

AI programs, built using large language models, make it possible to automatically create phishing emails based on a few data points about a user. They stand in contrast to traditional phishing emails that hackers manually design using general rules gleaned from experience. The V-Triad is an advanced set of rules for manually designing phishing emails to exploit our cognitive heuristics and biases. In this study, we compare the performance of phishing emails created automatically by GPT-4 and manually using the V-Triad. We also combine GPT-4 with the V-Triad to assess their combined potential. A fourth group, exposed to generic phishing emails, was our control group. We utilized a factorial approach, sending emails to 112 randomly selected participants recruited for the study. The control group emails received a click-through rate between 19-28%, the GPT-generated emails 30-44%, emails generated by the V-Triad 69-79%, and emails generated by GPT and the V-Triad 43-81%. Each participant was asked to explain for why they pressed or did not press a link in the email. These answers often contradict each other, highlighting the need for personalized content. The cues that make one person avoid phishing emails make another person fall for them. Next, we used four popular large language models (GPT, Claude, PaLM, and LLaMA) to detect the intention of phishing emails and compare the results to human detection. The language models demonstrated a strong ability to detect malicious intent, even in non-obvious phishing emails. They sometimes surpassed human detection, although often being slightly less accurate than humans.

Continuity · prototype · Learning · 在線 · 類別 ·

2023 年 8 月 23 日

Non-Exemplar Online Class-incremental Continual Learning via Dual-prototype Self-augment and Refinement

Fushuo Huo,Wenchao Xu,Jingcai Guo,Haozhao Wang,Yunfeng Fan,Song Guo

This paper investigates a new, practical, but challenging problem named Non-exemplar Online Class-incremental continual Learning (NO-CL), which aims to preserve the discernibility of base classes without buffering data examples and efficiently learn novel classes continuously in a single-pass (i.e., online) data stream. The challenges of this task are mainly two-fold: (1) Both base and novel classes suffer from severe catastrophic forgetting as no previous samples are available for replay. (2) As the online data can only be observed once, there is no way to fully re-train the whole model, e.g., re-calibrate the decision boundaries via prototype alignment or feature distillation. In this paper, we propose a novel Dual-prototype Self-augment and Refinement method (DSR) for NO-CL problem, which consists of two strategies: 1) Dual class prototypes: vanilla and high-dimensional prototypes are exploited to utilize the pre-trained information and obtain robust quasi-orthogonal representations rather than example buffers for both privacy preservation and memory reduction. 2) Self-augment and refinement: Instead of updating the whole network, we optimize high-dimensional prototypes alternatively with the extra projection module based on self-augment vanilla prototypes, through a bi-level optimization problem. Extensive experiments demonstrate the effectiveness and superiority of the proposed DSR in NO-CL.

Automator · Analysis · MoDELS · 優化器 · 控制器 ·

2023 年 8 月 23 日

Resiliency Analysis of LLM generated models for Industrial Automation

Oluwatosin Ogundare,Gustavo Quiros Araya,Ioannis Akrotirianakis,Ankit Shukla

from arxiv, 8 Pages, Conference Manuscript

This paper proposes a study of the resilience and efficiency of automatically generated industrial automation and control systems using Large Language Models (LLMs). The approach involves modeling the system using percolation theory to estimate its resilience and formulating the design problem as an optimization problem subject to constraints. Techniques from stochastic optimization and regret analysis are used to find a near-optimal solution with provable regret bounds. The study aims to provide insights into the effectiveness and reliability of automatically generated systems in industrial automation and control, and to identify potential areas for improvement in their design and implementation.

編譯器 · 語言模型化 · DCC · Integration · MoDELS ·

2023 年 8 月 23 日

Integrating Large Language Models into the Debugging C Compiler for generating contextual error explanations

Andrew Taylor,Alexandra Vassar,Jake Renzella,Hammond Pearce

from arxiv, 7 pages, 2 figures

This paper introduces a method for Large Language Models (LLM) to produce enhanced compiler error explanations, in simple language, within our Debugging C Compiler (DCC). It is well documented that compiler error messages have been known to present a barrier for novices learning how to program. Although our initial use of DCC in introductory programming (CS1) has been instrumental in teaching C to novice programmers by providing safeguards to commonly occurring errors and translating the usually cryptic compiler error messages at both compile- and run-time, we proposed that incorporating LLM-generated explanations would further enhance the learning experience for novice programmers. Through an expert evaluation, we observed that LLM-generated explanations for compiler errors were conceptually accurate in 90% of compile-time errors, and 75% of run-time errors. Additionally, the new DCC-help tool has been increasingly adopted by students, with an average of 1047 unique runs per week, demonstrating a promising initial assessment of using LLMs to complement compiler output to enhance programming education for beginners. We release our tool as open-source to the community.

語音識別 · Learning · Performer · SimPLe · 未標記 ·

2023 年 8 月 23 日

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods

Antoine Nzeyimana

from arxiv, 9 pages, 2 figures, 5 tables

Despite recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda is still challenging. In this work, we show that using self-supervised pre-training, following a simple curriculum schedule during fine-tuning and using semi-supervised learning to leverage large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda. Our approach focuses on using public domain data only. A new studio-quality speech dataset is collected from a public website, then used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, defining a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in four successive generations. Our final model achieves 3.2% word error rate (WER) on the new dataset and 15.9% WER on Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that using syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda.

時間步 · 分離的 · SimPLe · 標量 · contrastive ·

2023 年 8 月 22 日

Multi-temporal decomposition for elastoplastic ratcheting solids

Jacinto Ulloa,Geert Degrande,José E. Andrade,Stijn Fran?ois

This paper presents a multi-temporal formulation for simulating elastoplastic solids under cyclic loading. We leverage the proper generalized decomposition (PGD) to decompose the displacements into multiple time scales, separating the spatial and intra-cyclic dependence from the inter-cyclic variation. In contrast with the standard incremental approach, which solves the (non-linear and computationally intensive) mechanical balance equations at every time step, the proposed PGD approach allows the mechanical balance equations to be solved exclusively for the small-time intra-cyclic response, while the large-time inter-cyclic response is described by simple scalar algebraic equations. Numerical simulations exhibiting complex cyclic responses, including a 2D problem and an application to a monopile foundation, demonstrate that PGD solutions with a limited number of space-time degrees of freedom may be obtained numerically, only requiring a few modes to accurately capture the reference response.

Microsoft Surface · Neural Networks · Networking · MoDELS · 損失函數（機器學習） ·

2021 年 5 月 28 日

Incorporating prior financial domain knowledge into neural networks for implied volatility surface prediction

Yu Zheng,Yongxin Yang,Bowei Chen

from arxiv, 8 pages, SIGKDD 2021

In this paper we develop a novel neural network model for predicting implied volatility surface. Prior financial domain knowledge is taken into account. A new activation function that incorporates volatility smile is proposed, which is used for the hidden nodes that process the underlying asset price. In addition, financial conditions, such as the absence of arbitrage, the boundaries and the asymptotic slope, are embedded into the loss function. This is one of the very first studies which discuss a methodological framework that incorporates prior financial domain knowledge into neural network architecture design and model training. The proposed model outperforms the benchmarked models with the option data on the S&P 500 index over 20 years. More importantly, the domain knowledge is satisfied empirically, showing the model is consistent with the existing financial theories and conditions related to implied volatility surface.

圖片分類 · 前饋網絡 · INTERACT · Networking · 前饋 ·

2021 年 5 月 7 日

ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron,Piotr Bojanowski,Mathilde Caron,Matthieu Cord,Alaaeldin El-Nouby,Edouard Grave,Armand Joulin,Gabriel Synnaeve,Jakob Verbeek,Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.