In this article, we study Euler characteristic techniques in topological data analysis. Computing the Euler characteristic pointwise over a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieves state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performance in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.
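To make the notion of an Euler characteristic profile concrete, here is a minimal sketch that evaluates the Euler characteristic of a Vietoris-Rips filtration at a list of thresholds. The choice of the Rips filtration, the function names, and the truncation at `max_dim` are assumptions made for illustration; the abstract does not specify which complexes or software the authors use.

```python
from itertools import combinations
from math import dist


def rips_simplices(points, max_dim=2):
    """List (filtration_value, dimension) for every Rips simplex up to max_dim.

    Assumption: a simplex enters the filtration at the length of its longest
    edge, and every vertex enters at value 0.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k is the number of vertices
        for verts in combinations(range(len(points)), k):
            value = max(
                (dist(points[i], points[j]) for i, j in combinations(verts, 2)),
                default=0.0,
            )
            simplices.append((value, k - 1))
    return simplices


def euler_profile(points, thresholds, max_dim=2):
    """Evaluate the Euler characteristic of the complex at each threshold.

    The Euler characteristic is the alternating sum of simplex counts,
    here truncated at dimension max_dim.
    """
    simplices = rips_simplices(points, max_dim)
    return [
        sum((-1) ** dim for value, dim in simplices if value <= t)
        for t in thresholds
    ]


# Four corners of the unit square: isolated points, then a cycle, then a filled tetrahedron.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
profile = euler_profile(square, [0.0, 1.0, 1.5], max_dim=3)
```

This brute-force enumeration is exponential in `max_dim` and is only meant for tiny point clouds; it does not reflect the low computational cost claimed for the method itself.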

Related content

Digital transformation in buildings accumulates massive operational data, which calls for smart solutions to utilize these data to improve energy performance. This study has proposed a solution, namely Deep Energy Twin, for integrating deep learning and digital twins to better understand building energy use and identify the potential for improving energy efficiency. Ontology was adopted to create parametric digital twins to provide consistency of data format across different systems in a building. Based on created digital twins and collected data, deep learning methods were used for performing data analytics to identify patterns and provide insights for energy optimization. As a demonstration, a case study was conducted in a public historic building in Norrköping, Sweden, to compare the performance of state-of-the-art deep learning architectures in building energy forecasting.

We present quantitative logics with two-step semantics based on the framework of quantitative logics introduced by Arenas et al. (2020) and the two-step semantics defined in the context of weighted logics by Gastin & Monmege (2018). We show that some of the fragments of our logics augmented with a least fixed point operator capture interesting classes of counting problems. Specifically, we answer an open question in the area of descriptive complexity of counting problems by providing logical characterizations of two subclasses of #P, namely SpanL and TotP, that play a significant role in the study of approximable counting problems. Moreover, we define logics that capture FPSPACE and SpanPSPACE, which are counting versions of PSPACE.

Background & Objective: Biomedical text data are increasingly available for research. Tokenization is an initial step in many biomedical text mining pipelines. Tokenization is the process of parsing an input biomedical sentence (represented as a digital character sequence) into a discrete set of word/token symbols, which convey focused semantic/syntactic meaning. The objective of this study is to explore variation in tokenizer outputs when applied across a series of challenging biomedical sentences. Method: Diaz [2015] introduces 24 challenging example biomedical sentences for comparing tokenizer performance. In this study, we descriptively explore variation in outputs of eight tokenizers applied to each example biomedical sentence. The tokenizers compared in this study are the NLTK white space tokenizer, the NLTK Penn Tree Bank tokenizer, Spacy and SciSpacy tokenizers, Stanza/Stanza-Craft tokenizers, the UDPipe tokenizer, and R-tokenizers. Results: For many examples, tokenizers performed similarly effectively; however, for certain examples, there was meaningful variation in returned outputs. The white space tokenizer often performed differently from other tokenizers. We observed performance similarities for tokenizers implementing rule-based systems (e.g. pattern matching and regular expressions) and tokenizers implementing neural architectures for token classification. Oftentimes, the challenging tokens resulting in the greatest variation in outputs are those words which convey substantive and focused biomedical/clinical meaning (e.g. x-ray, IL-10, TCR/CD3, CD4+ CD8+, and (Ca2+)-regulated). Conclusion: When state-of-the-art, open-source tokenizers from Python and R were applied to a series of challenging biomedical example sentences, we observed subtle variation in the returned outputs.
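The kind of variation described above can be illustrated with two toy tokenizers written against the Python standard library. Neither is one of the eight tokenizers compared in the study; the function names, the regular expression, and the example sentence are assumptions chosen only to show how tokens such as IL-10 and TCR/CD3 can be split differently.

```python
import re


def whitespace_tokenize(sentence):
    """Split on runs of whitespace only; punctuation stays attached to words."""
    return sentence.split()


def regex_tokenize(sentence):
    """Hypothetical rule-based tokenizer: word characters form one token,
    every other non-space character becomes its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = "IL-10 binds TCR/CD3."
ws_tokens = whitespace_tokenize(sentence)
rx_tokens = regex_tokenize(sentence)
# ws_tokens -> ['IL-10', 'binds', 'TCR/CD3.']
# rx_tokens -> ['IL', '-', '10', 'binds', 'TCR', '/', 'CD3', '.']
```

The whitespace rule keeps the biomedical terms whole but glues the final period to the last token, while the pattern-based rule separates punctuation at the cost of breaking the terms apart.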

With the increasing adoption of AI-based systems across everyday life, the need to understand their decision-making mechanisms is correspondingly accelerating. The level at which we can trust the statistical inferences made from AI-based decision systems is an increasing concern, especially in high-risk systems such as criminal justice or medical diagnosis, where incorrect inferences may have tragic consequences. Despite their successes in providing solutions to problems involving real-world data, deep learning (DL) models cannot quantify the certainty of their predictions and are frequently quite confident, even when their solutions are incorrect. This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text by employing techniques from topological and geometric data analysis. We create a graph of a model's prediction space and cluster the inputs into the graph's vertices by the similarity of features and prediction statistics. We then extract subgraphs demonstrating high predictive accuracy for a given label. These subgraphs contain a wealth of information about features that the DL model has recognized as relevant to its decisions. We infer these features for a given label using a distance metric between probability measures, and demonstrate the stability of our method compared to the LIME interpretability method. This work demonstrates that we may gain insights into the decision mechanism of a DL model, which allows us to ascertain if the model is making its decisions based on information germane to the problem or identifies extraneous patterns within the data.

Few-weight codes over finite chain rings are associated with combinatorial objects such as strongly regular graphs (SRGs), strongly walk-regular graphs (SWRGs) and finite geometries, and are also widely used in data storage systems and secret sharing schemes. The first objective of this paper is to characterize all possible parameters of Plotkin-optimal two-homogeneous weight regular projective codes over finite chain rings, as well as their weight distributions. We show the existence of codes with these parameters by constructing an infinite family of two-homogeneous weight codes. The parameters of their Gray images have the same weight distribution as that of the two-weight codes of type SU1 in the sense of Calderbank and Kantor (Bull Lond Math Soc 18: 97-122, 1986). Further, we also construct three-homogeneous weight regular projective codes over finite chain rings combined with some known results. Finally, we study applications of our constructed codes in secret sharing schemes and graph theory. In particular, infinite families of SRGs and SWRGs with non-trivial parameters are obtained.

Local search is a powerful heuristic in optimization and computer science, the complexity of which was studied in the white box and black box models. In the black box model, we are given a graph $G = (V,E)$ and oracle access to a function $f : V \to \mathbb{R}$. The local search problem is to find a vertex $v$ that is a local minimum, i.e. with $f(v) \leq f(u)$ for all $(u,v) \in E$, using as few queries as possible. The query complexity is well understood on the grid and the hypercube, but much less is known beyond. We show the query complexity of local search on $d$-regular expanders with constant degree is $\Omega\left(\frac{\sqrt{n}}{\log{n}}\right)$, where $n$ is the number of vertices. This matches within a logarithmic factor the upper bound of $O(\sqrt{n})$ for constant degree graphs from Aldous (1983), implying that steepest descent with a warm start is an essentially optimal algorithm for expanders. The best lower bound known from prior work was $\Omega\left(\frac{\sqrt[8]{n}}{\log{n}}\right)$, shown by Santha and Szegedy (2004) for quantum and randomized algorithms. We obtain this result by considering a broader framework of graph features such as vertex congestion and separation number. We show that for each graph, the randomized query complexity of local search is $\Omega\left(\frac{n^{1.5}}{g}\right)$, where $g$ is the vertex congestion of the graph; and $\Omega\left(\sqrt[4]{\frac{s}{\Delta}}\right)$, where $s$ is the separation number and $\Delta$ is the maximum degree. For separation number the previous bound was $\Omega\left(\sqrt[8]{\frac{s}{\Delta}} /\log{n}\right)$, given by Santha and Szegedy for quantum and randomized algorithms. We also show a variant of the relational adversary method from Aaronson (2006), which is asymptotically at least as strong as the version in Aaronson (2006) for all randomized algorithms and strictly stronger for some problems.
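The black box model and the steepest descent with a warm start mentioned above can be sketched as follows. This is an illustrative reading, not code from the paper: the adjacency-dictionary representation, the function name, and the `num_samples` parameter (on the order of $\sqrt{n}$ in the upper bound attributed to Aldous) are assumptions.

```python
import random


def local_search_warm_start(neighbors, f, num_samples, rng=None):
    """Find a local minimum of f on a graph given as {vertex: [neighbors]}.

    Warm start: query a random sample of vertices and begin at the best one.
    Steepest descent: repeatedly move to the neighbor with the smallest value.
    Returns the local minimum found and the number of distinct queries made.
    """
    rng = rng or random.Random(0)
    cache = {}

    def query(v):
        # Each distinct vertex costs one oracle query.
        if v not in cache:
            cache[v] = f(v)
        return cache[v]

    vertices = list(neighbors)
    sample = rng.sample(vertices, min(num_samples, len(vertices)))
    current = min(sample, key=query)
    while True:
        best = min(neighbors[current], key=query, default=current)
        if query(best) >= query(current):
            return current, len(cache)
        current = best


# Cycle on 16 vertices with a single local minimum at vertex 5.
n = 16
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
vertex, num_queries = local_search_warm_start(cycle, lambda v: abs(v - 5), 4)
```

Because every move strictly decreases the function value, the loop terminates, and the returned vertex satisfies $f(v) \leq f(u)$ for each neighbor $u$.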

Robust inference based on the minimization of statistical divergences has proved to be a useful alternative to classical techniques based on maximum likelihood and related methods. Basu et al. (1998) introduced the density power divergence (DPD) family as a measure of discrepancy between two probability density functions and used this family for robust estimation of the parameter for independent and identically distributed data. Ghosh et al. (2017) proposed a more general class of divergence measures, namely the S-divergence family and discussed its usefulness in robust parametric estimation through several asymptotic properties and some numerical illustrations. In this paper, we develop the results concerning the asymptotic breakdown point for the minimum S-divergence estimators (in particular the minimum DPD estimator) under general model setups. The primary result of this paper provides lower bounds to the asymptotic breakdown point of these estimators which are independent of the dimension of the data, in turn corroborating their usefulness in robust inference under high dimensional data.
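For reference, the density power divergence of Basu et al. (1998) between a true density $g$ and a model density $f$ is usually written as below. The symbols $g$, $f$ and the tuning parameter $\alpha > 0$ are notation assumed here, since the abstract itself does not fix any.

$$
d_\alpha(g, f) = \int \left\{ f^{1+\alpha}(x) - \left(1 + \frac{1}{\alpha}\right) g(x)\, f^{\alpha}(x) + \frac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx
$$

As $\alpha \to 0$ this tends to the Kullback-Leibler divergence, which corresponds to maximum likelihood, while larger values of $\alpha$ trade efficiency for robustness.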

We propose a data segmentation methodology for the high-dimensional linear regression problem where the regression parameters are allowed to undergo multiple changes. The proposed methodology, MOSEG, proceeds in two stages: the data is first scanned for multiple change points using a moving window-based procedure, which is followed by a location refinement stage. MOSEG enjoys computational efficiency thanks to the adoption of a coarse grid in the first stage, and achieves theoretical consistency in estimating both the total number and the locations of the change points without requiring independence or sub-Gaussianity. We also propose MOSEG.MS, a multiscale extension of MOSEG which, while comparable to MOSEG in terms of computational complexity, achieves theoretical consistency for a broader parameter space that permits multiscale change points. We demonstrate good performance of the proposed methods in comparative simulation studies and in an application to predicting the equity premium.

In decommissioning projects of nuclear facilities, the radiological characterisation step aims to estimate the quantity and spatial distribution of different radionuclides. To carry out the estimation, measurements are performed on site to obtain preliminary information. The usual industrial practice consists in applying spatial interpolation tools (such as the ordinary kriging method) to these data to predict the value of interest for the contamination (radionuclide concentration, radioactivity, etc.) at unobserved positions. This paper questions the ordinary kriging tool on the well-known problem of overoptimistic prediction variances, which arise because uncertainties in the estimation of the kriging parameters (variance and range) are not taken into account. To overcome this issue, the practical use of the Bayesian kriging method, where the model parameters are considered as random variables, is examined in greater depth. The usefulness of Bayesian kriging, compared with ordinary kriging, is demonstrated in the small data context (which is often the case in decommissioning projects). This result is obtained via several numerical tests on different toy models, and using complementary validation criteria: the predictivity coefficient ($Q^2$), the Predictive Variance Adequacy (PVA), the $\alpha$-Confidence Interval plot (and its associated Mean Squared Error alpha (MSEalpha)), and the Predictive Interval Adequacy (PIA). The latter is a new criterion adapted to the Bayesian kriging results. Finally, the same comparison is performed on a real dataset coming from the decommissioning project of the CEA Marcoule G3 reactor. It illustrates the practical interest of Bayesian kriging in industrial radiological characterisation.

Current deep learning research is dominated by benchmark evaluation. A method is regarded as favorable if it empirically performs well on the dedicated test set. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving sets of benchmark data are investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten due to the iterative parameter updates. However, comparison of individual methods is nevertheless treated in isolation from real world application and typically judged by monitoring accumulated test set performance. The closed world assumption remains predominant. It is assumed that during deployment a model is guaranteed to encounter data that stems from the same distribution as used for training. This poses a massive challenge as neural networks are well known to provide overconfident false predictions on unknown instances and break down in the face of corrupted data. In this work we argue that notable lessons from open set recognition, the identification of statistically deviating data outside of the observed dataset, and the adjacent field of active learning, where data is incrementally queried such that the expected performance gain is maximized, are frequently overlooked in the deep learning era. Based on these forgotten lessons, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Our results show that this not only benefits each individual paradigm, but highlights the natural synergies in a common framework. We empirically demonstrate improvements when alleviating catastrophic forgetting, querying data in active learning, selecting task orders, while exhibiting robust open world application where previously proposed methods fail.
