
This article introduces hinted dictionaries for expressing efficient ordered sets and maps functionally. In contrast to traditional ordered dictionaries, whose operations take logarithmic time, hinted dictionaries can achieve better performance by using cursor-like objects referred to as hints. Hinted dictionaries unify the interfaces of imperative ordered dictionaries (e.g., C++ maps) and functional ones (e.g., Adams' sets). We show that such dictionaries can use sorted arrays, unbalanced trees, and balanced trees as their underlying representations. Throughout the article, we use Scala to present the different components of hinted dictionaries. We also provide a C++ implementation to evaluate the effectiveness of hinted dictionaries. Hinted dictionaries provide superior performance for set-set operations in comparison with the C++ standard library, and competitive performance in comparison with the SciPy library for sparse vector operations.
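The paper presents its components in Scala; the following is a minimal Python sketch (with illustrative names, not the paper's API) of the core idea over a sorted-array representation: a hint is a cursor into the structure that lets an operation skip the logarithmic search whenever it is still valid.

```python
import bisect

class HintedSortedDict:
    """Ordered map over a sorted array, with hint-based insertion."""

    def __init__(self):
        self._keys, self._vals = [], []

    def hint(self, key):
        """Return a cursor (insertion index) for key in O(log n)."""
        return bisect.bisect_left(self._keys, key)

    def _hint_valid(self, key, h):
        # A hint is valid if inserting key at position h keeps order.
        if not 0 <= h <= len(self._keys):
            return False
        left_ok = h == 0 or self._keys[h - 1] < key
        right_ok = h == len(self._keys) or key <= self._keys[h]
        return left_ok and right_ok

    def insert(self, key, val, hint=None):
        """Insert key/val; a valid hint skips the logarithmic search."""
        h = hint if hint is not None and self._hint_valid(key, hint) \
            else self.hint(key)
        if h < len(self._keys) and self._keys[h] == key:
            self._vals[h] = val                  # key present: update
        else:
            self._keys.insert(h, key)            # shift-in at the cursor
            self._vals.insert(h, val)
        return h + 1                             # next hint for ascending keys

# Ascending-key workload (e.g., merging sorted inputs): each insert
# reuses the hint returned by the previous one.
d, h = HintedSortedDict(), 0
for k in range(5):
    h = d.insert(k, str(k), hint=h)
```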

Related Content

This letter proposes a direct construction of cross Z-complementary sets (CZCSs) with flexible lengths and a large zero-correlation zone (ZCZ). A CZCS is an extension of the cross Z-complementary pair (CZCP); the maximum possible ZCZ width of a CZCP is half of its sequence length. In this letter, for the first time, a generalized Boolean function-based construction of CZCSs with a large number of constituent sequences and a ZCZ ratio of $2/3$ is presented. For integers $m$ and $\delta$, the proposed construction produces CZCSs of length $2^{m-1}+2^\delta$ ($0 \leq \delta < m-1$, $m \geq 4$), covering both odd and even lengths. Additionally, the constructed CZCSs also feature a complementary set of the same length. Finally, the proposed construction is compared with existing works.
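The letter's Boolean-function construction is not reproduced here, but the defining property it generalizes can be illustrated. The sketch below, a minimal example under the standard definitions, verifies that a classical binary Golay complementary pair has aperiodic autocorrelation sums vanishing at every nonzero shift; CZCPs and CZCSs impose the same kind of zone condition on their (cross-)correlation sums.

```python
import numpy as np

def aperiodic_corr(a, b, tau):
    """Aperiodic cross-correlation: sum_i a[i] * conj(b[i + tau])."""
    n = len(a)
    if tau >= 0:
        return sum(a[i] * np.conj(b[i + tau]) for i in range(n - tau))
    return sum(a[i - tau] * np.conj(b[i]) for i in range(n + tau))

a = np.array([1, 1, 1, -1])   # a classical binary Golay complementary pair
b = np.array([1, 1, -1, 1])
for tau in range(1, 4):
    s = aperiodic_corr(a, a, tau) + aperiodic_corr(b, b, tau)
    assert s == 0, tau        # autocorrelation sums vanish off zero shift
print("complementary property verified for all nonzero shifts")
```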

We consider the problem of joint simultaneous confidence band (JSCB) construction for the regression coefficient functions of time series scalar-on-function linear regression when the regression model is estimated by a roughness penalization approach with flexible choices of orthonormal basis functions. A simple and unified multiplier bootstrap methodology is proposed for the JSCB construction, which is shown to achieve the correct coverage probability asymptotically. Furthermore, the JSCB is asymptotically robust to inconsistently estimated standard deviations of the model. The proposed methodology is applied to an electricity market time series data set to visually investigate and formally test the overall regression relationship as well as to perform model validation. A uniform Gaussian approximation and comparison result over all Euclidean convex sets for normalized sums of a class of moderately high-dimensional stationary time series is established. Finally, the proposed methodology can be applied to simultaneous inference for scalar-on-function linear regression of independent cross-sectional data.
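The estimator and penalization details are specific to the paper, but the multiplier bootstrap recipe itself is generic. Below is a hedged Python sketch for the simpler problem of a simultaneous band for a mean function on a grid: perturb the centered influence terms with i.i.d. Gaussian multipliers, take the supremum over the grid, and read off a joint critical value.

```python
import numpy as np

rng = np.random.default_rng(0)
n, grid = 200, np.linspace(0, 1, 101)
curves = np.sin(2 * np.pi * grid) + rng.normal(0, 0.5, (n, grid.size))

mean_hat = curves.mean(axis=0)
resid = curves - mean_hat                      # centered influence terms

B, sups = 1000, []
for _ in range(B):
    w = rng.normal(size=n)                     # i.i.d. N(0,1) multipliers
    boot = (w[:, None] * resid).mean(axis=0)   # perturbed fluctuation process
    sups.append(np.abs(boot).max())            # sup over the grid

q = np.quantile(sups, 0.95)                    # simultaneous critical value
band = (mean_hat - q, mean_hat + q)            # 95% joint confidence band
```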

We motivate a new methodological framework for era-adjusting baseball statistics. Our methodology is a crystallization of the conceptual ideas put forward by Stephen Jay Gould, and we name it the Full House Model in his honor. The Full House Model works by balancing the achievements of Major League Baseball (MLB) players within a given season against the size of the MLB-eligible population. We demonstrate the utility of our Full House Model in an application comparing baseball players' performance statistics across eras. Our results reveal a radical reranking of baseball's greatest players that is consistent with what one would expect under a sensible uniform talent generation assumption. Most importantly, we find that the greatest African American and Latino players now sit atop the greatest all-time lists of historical baseball players, whereas conventional wisdom ranks such players lower. Our conclusions largely refute a consensus of baseball greatness that is reinforced by nostalgic bias, recorded statistics, and statistical methodologies which we argue are not suited to the task of comparing players across eras.
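As a heavily hedged illustration of the population-size balancing idea (the Full House Model's actual estimation procedure is richer, and the pool sizes below are hypothetical), one can treat the rank-k player among N eligible players as the k-th largest of N draws from a common talent distribution, so the implied talent quantile grows with the eligible population:

```python
from scipy.stats import beta

def implied_talent_quantile(rank, n_eligible):
    """Median quantile of the rank-th best among n_eligible i.i.d. draws."""
    # The rank-th largest of n i.i.d. Uniform(0,1) draws follows
    # Beta(n - rank + 1, rank); its median is a point estimate of the
    # player's quantile in the underlying talent distribution.
    return beta.median(n_eligible - rank + 1, rank)

print(implied_talent_quantile(1, 20_000))    # best of a small, early-era pool
print(implied_talent_quantile(1, 300_000))   # best of a large modern pool
```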

Deep learning plays an increasingly important role in our daily life due to its competitive performance in multiple industrial application domains. As the core of DL-enabled systems, deep neural networks (DNNs) automatically learn knowledge from carefully collected and organized training data to gain the ability to predict the labels of unseen data. Like traditional software systems, which need to be comprehensively tested, DNNs also need to be carefully evaluated to ensure that the quality of the trained model meets the demand. In practice, the de facto industry standard for assessing the quality of DNNs is to check their performance (accuracy) on a collected set of labeled test data. However, preparing such labeled data is often difficult, partly because data labeling is labor-intensive, especially with massive new unlabeled data arriving every day. Recent studies show that test selection for DNNs is a promising direction that tackles this issue by selecting a minimal set of representative data to label and using these data to assess the model. However, it still requires human effort and cannot be fully automated. In this paper, we propose a novel technique, named Aries, that can estimate the performance of DNNs on new unlabeled data using only information obtained from the original test data. The key insight behind our technique is that a model should have similar prediction accuracy on data that lie at similar distances from the decision boundary. We performed a large-scale evaluation of our technique on 13 types of data transformation methods. The results demonstrate its usefulness: the accuracy estimated by Aries is only 0.03% -- 2.60% (on average 0.61%) off the true accuracy. Besides, Aries also outperforms the state-of-the-art selection-labeling-based methods in most (96 out of 128) cases.
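One plausible instantiation of this insight, sketched below in Python, uses the softmax margin as a proxy for distance to the decision boundary (an assumption; the paper's distance measure may differ): bucket the original test data by margin, record per-bucket accuracy, and reweight those accuracies by the bucket occupancy of the new unlabeled data.

```python
import numpy as np

def estimate_accuracy(probs_test, labels_test, probs_new, n_bins=10):
    """Estimate accuracy on unlabeled data by matching margin buckets."""
    def margin(p):                        # top-1 minus top-2 softmax score:
        part = np.sort(p, axis=1)         # a crude proxy for distance to
        return part[:, -1] - part[:, -2]  # the decision boundary

    m_test, m_new = margin(probs_test), margin(probs_new)
    edges = np.quantile(m_test, np.linspace(0, 1, n_bins + 1))
    bins_t = np.clip(np.digitize(m_test, edges[1:-1]), 0, n_bins - 1)
    bins_n = np.clip(np.digitize(m_new, edges[1:-1]), 0, n_bins - 1)

    correct = probs_test.argmax(axis=1) == labels_test
    acc = np.array([correct[bins_t == b].mean() if (bins_t == b).any() else 0.0
                    for b in range(n_bins)])        # per-bucket accuracy
    weights = np.bincount(bins_n, minlength=n_bins) / len(m_new)
    return float((acc * weights).sum())  # bucket-weighted accuracy estimate
```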

We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation that attains maximum mutual information with the response and follows a user-selected distribution. It can easily handle multi-dimensional continuous or categorical response variables. MSRL is shown to be consistent in the sense that the conditional probability density function of the response variable given the learned representation converges to the conditional probability density function of the response variable given the predictor. Non-asymptotic error bounds for MSRL are also established under suitable conditions. To establish the error bounds, we derive a generalized Dudley's inequality for an order-two U-process indexed by deep neural networks, which may be of independent interest. We discuss how to determine the intrinsic dimension of the underlying data distribution. Moreover, we evaluate the performance of MSRL via extensive numerical experiments and real data analysis, and demonstrate that MSRL outperforms some existing nonlinear sufficient dimension reduction methods.
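The variational formulation such approaches build on can be made concrete. Below is a minimal PyTorch sketch of the Donsker-Varadhan lower bound on mutual information (illustrative only; MSRL's full objective also matches the representation to the user-selected distribution, which is omitted here).

```python
import math
import torch
import torch.nn as nn

# Critic T(z, y) for 1-D z and y (dimensions are illustrative).
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def dv_mi_lower_bound(z, y):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginals[exp(T)]."""
    joint = critic(torch.cat([z, y], dim=1)).mean()
    y_shuffled = y[torch.randperm(y.size(0))]        # break the dependence
    t_marg = critic(torch.cat([z, y_shuffled], dim=1)).squeeze(1)
    return joint - (torch.logsumexp(t_marg, dim=0) - math.log(len(t_marg)))

# Maximizing this bound over the critic (and over an encoder producing z)
# drives z toward high mutual information with the response y.
z = torch.randn(256, 1)
y = 0.8 * z + 0.1 * torch.randn(256, 1)
print(dv_mi_lower_bound(z, y))   # near 0 for an untrained critic
```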

A variety of statistical and machine learning methods are used to model crash frequency on specific roadways, with machine learning methods generally attaining higher prediction accuracy. Recently, heterogeneous ensemble methods (HEMs), including stacking, have emerged as more accurate and robust intelligent techniques, often providing more reliable and accurate predictions for pattern recognition problems. In this study, we apply one of the key HEMs, stacking, to model crash frequency on five-lane undivided segments (5T) of urban and suburban arterials. The prediction performance of stacking is compared with that of parametric statistical models (Poisson and negative binomial) and three state-of-the-art machine learning techniques (decision tree, random forest, and gradient boosting), each of which serves as a base learner. By employing an optimal weight scheme to combine the individual base learners through stacking, the problem of biased predictions in individual base learners due to differences in specifications and prediction accuracies is avoided. Crash, traffic, and roadway inventory data from 2013 to 2017 were collected and integrated. The data were split into training, validation, and testing datasets. Estimation results of the statistical models reveal that, besides other factors, crashes increase with the density (number per mile) of different types of driveways. A comparison of out-of-sample predictions of the various models confirms the superiority of stacking over the alternative methods considered. From a practical standpoint, stacking can enhance prediction accuracy compared with using only one base learner with a particular specification. When applied systemically, stacking can help identify more appropriate countermeasures.
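A hedged scikit-learn sketch of this recipe follows (base learner settings are illustrative, and the study's data are not reproduced here); a non-negative linear meta-learner over out-of-fold predictions approximates the optimal weight scheme mentioned above.

```python
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import LinearRegression, PoissonRegressor
from sklearn.tree import DecisionTreeRegressor

stack = StackingRegressor(
    estimators=[
        ("poisson", PoissonRegressor()),          # parametric count model
        ("tree", DecisionTreeRegressor(max_depth=6)),
        ("rf", RandomForestRegressor(n_estimators=300)),
        ("gbm", GradientBoostingRegressor()),
    ],
    # Non-negative weights over base predictions stand in for the
    # "optimal weight scheme" described in the abstract.
    final_estimator=LinearRegression(positive=True),
    cv=5,                                         # out-of-fold meta-features
)
# stack.fit(X_train, y_train); stack.predict(X_test)
```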

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
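The double descent phenomenon is easy to reproduce in the simplest setting the paper discusses. The following self-contained sketch fits minimum-norm least squares on random nonlinear features: as the number of features p passes the sample size n, test error typically spikes near the interpolation threshold and then descends again (exact numbers are seed-dependent).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = X @ beta + rng.normal(0, 0.5, n)          # noisy training labels
Xt = rng.normal(size=(n_test, d))
yt = Xt @ beta                                # noiseless test targets

W = rng.normal(size=(d, 400))                 # fixed random feature bank
for p in [10, 50, 90, 100, 110, 200, 400]:
    F, Ft = np.tanh(X @ W[:, :p]), np.tanh(Xt @ W[:, :p])
    coef = np.linalg.pinv(F) @ y              # min-norm interpolator if p >= n
    err = np.mean((Ft @ coef - yt) ** 2)
    print(f"p={p:4d}  test MSE={err:8.3f}")   # peak near p = n, then descent
```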

Multivariate time series forecasting has been extensively studied over the years, with ubiquitous applications in areas such as finance, traffic, and the environment. Still, concerns have been raised that traditional methods are incapable of modeling the complex patterns and dependencies in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time series forecasting. Furthermore, a lack of explainability remains a serious drawback of deep neural network models. Inspired by the Memory Network proposed for the question-answering task, we propose a deep learning based model named Memory Time-series Network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component, all trained jointly. Additionally, the designed attention mechanism makes MTNet highly interpretable: we can easily tell which part of the historical data is referenced the most.
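As a loose, toy-scale illustration of the mechanism (shapes, encoders, and the combination rule are placeholders, not MTNet's architecture), the sketch below encodes chunked long-term history into a memory, attends over it with the encoded recent window, and combines the read with an autoregressive term; the attention weights expose which history blocks were referenced.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_blocks, d = 24, 8, 16
blocks = rng.normal(size=(n_blocks, T))        # long-term history, chunked
recent = rng.normal(size=T)                    # most recent window

enc_m, enc_c, enc_q = (rng.normal(size=(T, d)) for _ in range(3))  # 3 encoders
memory, content, query = blocks @ enc_m, blocks @ enc_c, recent @ enc_q

scores = memory @ query                        # attention logits per block
attn = np.exp(scores - scores.max())
attn /= attn.sum()                             # softmax over memory blocks
context = attn @ content                       # weighted read from memory
ar = 0.9 * recent[-1]                          # toy autoregressive term
prediction = 0.1 * context.sum() + ar          # combined forecast (toy)
print(attn.round(2), prediction)               # attn shows referenced blocks
```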

Recently, graph neural networks (GNNs) have revolutionized the field of graph representation learning through effectively learned node embeddings, and achieved state-of-the-art results in tasks such as node classification and link prediction. However, current GNN methods are inherently flat and do not learn hierarchical representations of graphs---a limitation that is especially problematic for the task of graph classification, where the goal is to predict the label associated with an entire graph. Here we propose DiffPool, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. DiffPool learns a differentiable soft cluster assignment for nodes at each layer of a deep GNN, mapping nodes to a set of clusters, which then form the coarsened input for the next GNN layer. Our experimental results show that combining existing GNN methods with DiffPool yields an average improvement of 5-10% accuracy on graph classification benchmarks, compared to all existing pooling approaches, achieving a new state-of-the-art on four out of five benchmark data sets.
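A single DiffPool layer is compact enough to sketch directly. In the numpy illustration below (random weights stand in for trained GNNs), one GNN branch produces node embeddings Z, a second produces a soft cluster assignment matrix S, and the graph is coarsened as X' = S^T Z and A' = S^T A S for the next layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 8, 3                       # nodes, feature dim, clusters
A = rng.integers(0, 2, (n, n))
A = ((A + A.T) > 0).astype(float)       # symmetric adjacency
X = rng.normal(size=(n, d))             # node features

def gnn(A, X, W):                       # one-layer message passing: ReLU(AXW)
    return np.maximum(A @ X @ W, 0.0)

def row_softmax(M):
    E = np.exp(M - M.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

Z = gnn(A, X, rng.normal(size=(d, d)))              # embedding GNN
S = row_softmax(gnn(A, X, rng.normal(size=(d, k)))) # assignment GNN
X_coarse = S.T @ Z                                  # k x d pooled features
A_coarse = S.T @ A @ S                              # k x k coarsened adjacency
print(X_coarse.shape, A_coarse.shape)               # input to the next layer
```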

To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book and news recommendation, over several state-of-the-art baselines.
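A hedged numpy sketch of one propagation step follows (embeddings are random placeholders): for each hop, the candidate item embedding is matched against the (head, relation) pairs of the user's ripple set, and the attention-weighted tail entities form that hop's response; summing the responses yields the user vector used for the click probability.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_triples, n_hops = 16, 32, 2
v = rng.normal(size=d)                            # candidate item embedding

responses = []
for _ in range(n_hops):                           # one ripple set per hop
    H = rng.normal(size=(n_triples, d))           # head entity embeddings
    R = rng.normal(size=(n_triples, d, d))        # relation matrices
    T = rng.normal(size=(n_triples, d))           # tail entity embeddings
    logits = np.einsum("d,nde,ne->n", v, R, H)    # relevance v^T R_i h_i
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # softmax over the ripple set
    responses.append(p @ T)                       # weighted sum of tails

u = np.sum(responses, axis=0)                     # user vector from all hops
click_prob = 1 / (1 + np.exp(-u @ v))             # predicted click probability
```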
