
Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (IC) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of IC can often be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of IC, which we call PanIC (from the Greek root 'pan', meaning 'of everything'), with easily verifiable regularity conditions. The PanIC are applicable in any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of regularity conditions for model selection problems regarding finite mixture models, least absolute deviation and support vector regression, and principal component analysis, and we demonstrate the effectiveness of the PanIC for such problems via numerical simulations. Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons of the BIC to PanIC.
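As a rough illustration of the general idea of penalizing a fitted objective to pick the most parsimonious model, the sketch below selects a polynomial degree with the familiar BIC-style criterion for Gaussian least squares. It is not the PanIC: the abstract does not state the PanIC penalty, so the penalty `(d + 1) * log(n)`, the function name `select_degree`, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def select_degree(x, y, max_degree=5):
    """Return the degree minimizing a BIC-style penalized least-squares loss.

    Assumption: standard Gaussian BIC, n * log(RSS / n) + k * log(n),
    with k = degree + 1 coefficients. This is NOT the PanIC penalty.
    """
    n = len(x)
    scores = {}
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)  # least-squares polynomial fit
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        # Goodness-of-fit term plus a complexity penalty that grows with n.
        scores[d] = n * np.log(rss / n) + (d + 1) * np.log(n)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical simulated data: a quadratic signal with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.5, size=200)

best_degree, scores = select_degree(x, y)
print("selected degree:", best_degree)
```

Underfitted models (degrees 0 and 1) are ruled out by their much larger residual error, while the `log(n)` penalty discourages degrees above the true one.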

Related content

Optimal model reduction for large-scale linear dynamical systems is studied. In contrast to most existing works, the systems under consideration are not required to be stable, in either discrete or continuous time. As a consequence, the underlying rational transfer functions are allowed to have poles in general domains in the complex plane. In particular, this covers the case of specific conservative partial differential equations such as the linear Schr\"odinger and the undamped linear wave equation with spectra on the imaginary axis. By an appropriate modification of the classical continuous-time Hardy space $\mathcal{H}_2$, a new $\mathcal{H}_2$-like optimal model reduction problem is introduced and first-order optimality conditions are derived. As in the classical $\mathcal{H}_2$ case, these conditions exhibit a rational Hermite interpolation structure for which an iterative model reduction algorithm is proposed. Numerical examples demonstrate the effectiveness of the new method.

Necessary and sufficient conditions for uniform consistency are explored. Nonparametric sets of alternatives are bounded convex sets in $\mathbb{L}_p$ with "small" balls deleted. The "small" balls are centered at the point of the hypothesis, and their radii tend to zero as the sample size increases. For the problem of hypothesis testing on a density, we show that, for such sets of alternatives, uniformly consistent tests exist for some sequence of radii of the balls if and only if the convex set is compact. The results are established for hypothesis testing on a density, for signal detection in Gaussian white noise, for linear ill-posed problems with random Gaussian noise, and so on.

We characterize the uniqueness condition in the hardcore model for bipartite graphs with degree bounds only on one side, and provide a nearly linear time sampling algorithm that works up to the uniqueness threshold. We show that the uniqueness threshold for bipartite graphs has almost the same form as the tree uniqueness threshold for general graphs, except with degree bounds only on one side of the bipartition. The hardcore model from statistical physics can be seen as a weighted enumeration of independent sets. Its bipartite version (#BIS) is a central open problem in approximate counting. Compared to the same problem in a general graph, surprisingly tractable regimes have been identified that are believed to be hard in general. This is made possible by two lines of algorithmic approaches: the high-temperature algorithms starting from Liu and Lu (STOC 2015), and the low-temperature algorithms starting from Helmuth, Perkins, and Regts (STOC 2019). In this work, we study the limit of these algorithms in the high-temperature case. Our characterization of the uniqueness condition is obtained by proving decay of correlations for arguably the best possible regime, which involves locating fixpoints of multivariate iterative rational maps and showing their contraction. We also give a nearly linear time sampling algorithm based on simulating field dynamics only on one side of the bipartite graph that works up to the uniqueness threshold. Our algorithm is very different from the original high-temperature algorithm of Liu and Lu, and it makes use of a connection between correlation decay and spectral independence of Markov chains. Last but not least, we are able to show that the standard Glauber dynamics on both sides of the bipartite graph mixes in polynomial time up to the uniqueness threshold.

We propose fast and communication-efficient optimization algorithms for multi-robot rotation averaging and translation estimation problems that arise from collaborative simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and camera network localization applications. Our methods are based on theoretical relations between the Hessians of the underlying Riemannian optimization problems and the Laplacians of suitably weighted graphs. We leverage these results to design a collaborative solver in which robots coordinate with a central server to perform approximate second-order optimization, by solving a Laplacian system at each iteration. Crucially, our algorithms permit robots to employ spectral sparsification to sparsify intermediate dense matrices before communication, and hence provide a mechanism to trade off accuracy with communication efficiency with provable guarantees. We perform rigorous theoretical analysis of our methods and prove that they enjoy (local) linear rate of convergence. Furthermore, we show that our methods can be combined with graduated non-convexity to achieve outlier-robust estimation. Extensive experiments on real-world SLAM and SfM scenarios demonstrate the superior convergence rate and communication efficiency of our methods.

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding. However, obtaining embeddings at the document level is challenging due to computational requirements and lack of appropriate data. Instead, most approaches fall back on computing document embeddings based on sentence representations. Although there exist architectures and models to encode documents fully, they are in general limited to English and a few other high-resource languages. In this work, we provide a systematic comparison of methods to produce document-level representations from sentences based on LASER, LaBSE, and Sentence BERT pre-trained multilingual models. We compare input token number truncation, sentence averaging, some simple windowing and, in some cases, new augmented and learnable approaches, on 3 multi- and cross-lingual tasks in 8 languages belonging to 3 different language families. Our task-based extrinsic evaluations show that, independently of the language, a clever combination of sentence embeddings is usually better than encoding the full document as a single unit, even when this is possible. We demonstrate that while a simple sentence average results in a strong baseline for classification tasks, more complex combinations are necessary for semantic tasks.
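The sentence-averaging baseline mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `average_document_embedding`, the optional L2 normalization, and the toy vectors are assumptions, and in practice the sentence vectors would come from an encoder such as LASER, LaBSE, or Sentence BERT.

```python
import numpy as np


def average_document_embedding(sentence_embeddings, normalize=True):
    """Combine sentence vectors into one document vector by averaging.

    Assumption: L2 normalization of the result is an illustrative choice,
    not something the abstract specifies.
    """
    emb = np.asarray(sentence_embeddings, dtype=float)
    if emb.ndim != 2 or emb.shape[0] == 0:
        raise ValueError("expected a non-empty (num_sentences, dim) array")
    doc = emb.mean(axis=0)  # element-wise mean over sentences
    if normalize:
        norm = np.linalg.norm(doc)
        if norm > 0:
            doc = doc / norm
    return doc


# Toy stand-ins for encoder outputs (hypothetical 3-dimensional vectors).
toy_sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_vector = average_document_embedding(toy_sentences)
print(doc_vector)
```

Averaging ignores sentence order and length, which is consistent with the abstract's remark that it is a strong baseline for classification but that semantic tasks need richer combinations.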

Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and privacy regulations. The federated learning technique is well-suited to tackle these challenges. However, federated learning comes with a new set of open problems related to communication overhead, efficient parameter aggregation, client selection strategies and more. In this work, we address the step prior to the initiation of a federated network for model training, client recruitment. By intelligently recruiting clients, communication overhead and overall cost of training can be reduced without sacrificing predictive performance. Client recruitment aims at pre-excluding potential clients from partaking in the federation based on a set of criteria indicative of their eventual contributions to the federation. In this work, we propose a client recruitment approach using only the output distribution and sample size at the client site. We show how a subset of clients can be recruited without sacrificing model performance whilst, at the same time, significantly improving computation time. By applying the recruitment approach to the training of federated models for accurate patient Length of Stay prediction using data from 189 Intensive Care Units, we show how the models trained in federations made up from recruited clients significantly outperform federated models trained with the standard procedure in terms of predictive power and training time.

Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training even when keeping the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed for a specific student model. In this paper, we develop a generic approach for stable model distillation based on the central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a corpus size such that the consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process. The code is publicly available at //github.com/yunzhe-zhou/GenericDistillation.

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

Graph machine learning has been extensively studied in both academia and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To tackle the challenge, automated graph machine learning, which aims at discovering the best hyper-parameter and neural architecture configuration for different graph tasks/data without manual design, is gaining increasing attention from the research community. In this paper, we extensively discuss automated graph machine learning approaches, covering hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We briefly overview existing libraries designed for either graph machine learning or automated machine learning, and introduce in depth AutoGL, our dedicated and the world's first open-source library for automated graph machine learning. Last but not least, we share our insights on future research directions for automated graph machine learning. This paper is the first systematic and comprehensive discussion of approaches, libraries, and directions for automated graph machine learning.

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
