
Establishing limiting distributions of Chatterjee's rank correlation for a general, possibly non-independent, pair of random variables has been eagerly awaited by many. This paper shows that (a) Chatterjee's rank correlation is asymptotically normal as long as one variable is not a measurable function of the other, and (b) the corresponding asymptotic variance is uniformly bounded by 36. Similar results also hold for Azadkia-Chatterjee's graph-based correlation coefficient, a multivariate analogue of Chatterjee's original proposal. The proofs are given by appealing to the H\'ajek representation and Chatterjee's nearest-neighbor CLT.
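
For concreteness, here is a minimal Python sketch of the statistic itself, using the standard no-ties formula $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ are the ranks of $Y$ after sorting the pairs by $X$; the asymptotic-variance results above concern this quantity, and the code makes no inferential claims:

    import numpy as np

    def chatterjee_xi(x, y):
        # Chatterjee's xi, no-ties formula: sort the pairs by x,
        # rank the y's, and sum absolute rank differences.
        n = len(x)
        r = np.argsort(np.argsort(y[np.argsort(x)])) + 1
        return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    print(chatterjee_xi(x, x ** 2 + 0.1 * rng.normal(size=2000)))  # near 1
    print(chatterjee_xi(x, rng.normal(size=2000)))                 # near 0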

Related content

In supervised learning with kernel methods, we often encounter a large-scale finite-sum minimization over a reproducing kernel Hilbert space (RKHS). Large-scale finite-sum problems can be solved using efficient variants of Newton's method, where the Hessian is approximated via sub-samples of the data. In an RKHS, however, the dependence of the penalty function on the kernel makes standard sub-sampling approaches inapplicable, since the Gram matrix is not readily available in low-rank form. In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up Newton's method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence (with high probability). We derive a theoretical lower bound on the number of random features required for the approximated Hessian to be close to the true Hessian in norm. Our numerical experiments on real-world data verify the efficiency of our method compared to several benchmarks.
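
As a toy illustration of the idea (not the paper's algorithm), the sketch below builds random Fourier features for a Gaussian kernel and runs Newton's method on regularized logistic regression in the resulting $m$-dimensional feature space, so an $m \times m$ approximated Hessian stands in for the $n \times n$ Gram matrix; all dimensions and constants are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m, lam = 500, 5, 100, 1e-3
    X = rng.normal(size=(n, d))
    y = np.sign(X[:, 0] + 0.5 * rng.normal(size=n))   # labels in {-1, +1}

    # Random Fourier features approximating a Gaussian kernel (Rahimi-Recht).
    W = rng.normal(size=(d, m))
    b = rng.uniform(0, 2 * np.pi, size=m)
    Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)          # n x m feature map

    w = np.zeros(m)
    for _ in range(20):                               # Newton iterations
        p = 1.0 / (1.0 + np.exp(-y * (Z @ w)))        # sigmoid(y * margin)
        g = -(Z.T @ (y * (1 - p))) / n + 2 * lam * w  # gradient
        D = p * (1 - p)                               # per-sample curvature
        H = (Z.T * D) @ Z / n + 2 * lam * np.eye(m)   # m x m approx. Hessian
        w -= np.linalg.solve(H, g)                    # Newton step

    print("train accuracy:", np.mean(np.sign(Z @ w) == y))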

A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended to handle a flexible number of speakers by introducing speaker-wise attractors. However, the number of output speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally the results for each block are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than this limit. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method refines the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inference.
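
The cross-block linking step can be illustrated with a toy greedy clustering of locally-calculated attractors by cosine similarity; the actual clustering procedure and threshold in EEND-GLA are design choices of the paper, and the function name and threshold below are made up for illustration:

    import numpy as np

    def cluster_attractors(block_attractors, threshold=0.7):
        # Greedily assign each block-local attractor to the most similar
        # global speaker centroid, or open a new global speaker if no
        # centroid is similar enough. Returns per-block global ids.
        centroids, labels = [], []
        for A in block_attractors:
            A = A / np.linalg.norm(A, axis=1, keepdims=True)
            ids = []
            for a in A:
                sims = [float(a @ c) for c in centroids]
                if sims and max(sims) >= threshold:
                    ids.append(int(np.argmax(sims)))   # merge with best match
                else:
                    centroids.append(a)                # new global speaker
                    ids.append(len(centroids) - 1)
            labels.append(ids)
        return labels

    rng = np.random.default_rng(0)
    spk = rng.normal(size=(3, 16))                     # 3 true speakers
    blocks = [spk[[0, 1]] + 0.05 * rng.normal(size=(2, 16)),
              spk[[1, 2]] + 0.05 * rng.normal(size=(2, 16))]
    print(cluster_attractors(blocks))                  # e.g. [[0, 1], [1, 2]]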

We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited inventory). Despite their outsize practical importance, the best estimators for this 'Markovian' interference problem are largely heuristic in nature, and their bias is not well understood. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art heuristics. We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general have exponentially smaller variance than off-policy evaluation. At the same time, its bias is second order in the impact of the intervention. This yields a striking bias-variance tradeoff so that the DQ estimator effectively dominates state-of-the-art alternatives. From a theoretical perspective, we introduce three separate novel techniques that are of independent interest in the theory of Reinforcement Learning (RL). Our empirical evaluation includes a set of experiments on a city-scale ride-hailing simulator.
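
In the spirit of the DQ idea (an illustrative toy only, not the paper's estimator or simulator), the sketch below runs a 50/50 randomized policy on a small inventory-like chain, estimates Q-values on-policy with an expected-SARSA-style TD(0) update, and reports the difference of Q's between treatment and control averaged over the visited states, next to the naive difference in average rewards:

    import numpy as np

    rng = np.random.default_rng(0)
    K, T, lr, gamma = 10, 100_000, 0.05, 0.99
    Q = np.zeros((K + 1, 2))                 # Q-values under the 50/50 policy
    visits = np.zeros(K + 1)
    rew = [[], []]
    s = K // 2
    for _ in range(T):
        a = rng.integers(2)                  # randomized treatment assignment
        visits[s] += 1
        # treatment sells faster but depletes shared inventory (interference)
        sold = int(s > 0 and rng.random() < (0.6 if a else 0.4))
        r = float(sold)
        s2 = min(K, s - sold + int(rng.random() < 0.5))   # restock w.p. 1/2
        # expected-SARSA-style TD(0) target for the uniform policy
        Q[s, a] += lr * (r + gamma * Q[s2].mean() - Q[s, a])
        rew[a].append(r)
        s = s2

    pi = visits / visits.sum()               # empirical state distribution
    dq = pi @ (Q[:, 1] - Q[:, 0])            # Differences-in-Q's style estimate
    print("naive:", np.mean(rew[1]) - np.mean(rew[0]), "DQ:", dq)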

We study the log-rank conjecture from the perspective of point-hyperplane incidence geometry. We formulate the following conjecture: Given a point set in $\mathbb{R}^d$ that is covered by constant-sized sets of parallel hyperplanes, there exists an affine subspace that accounts for a large (i.e., $2^{-{\operatorname{polylog}(d)}}$) fraction of the incidences. Alternatively, our conjecture may be interpreted linear-algebraically as follows: Any rank-$d$ matrix containing at most $O(1)$ distinct entries in each column contains a submatrix of fractional size $2^{-{\operatorname{polylog}(d)}}$, in which each column contains one distinct entry. We prove that our conjecture is equivalent to the log-rank conjecture. Motivated by the connections above, we revisit well-studied questions in point-hyperplane incidence geometry without structural assumptions (i.e., the existence of partitions). We give an elementary argument for the existence of complete bipartite subgraphs of density $\Omega(\epsilon^{2d}/d)$ in any $d$-dimensional configuration with incidence density $\epsilon$. We also improve an upper-bound construction of Apfelbaum and Sharir (SIAM J. Discrete Math. '07), yielding a configuration whose complete bipartite subgraphs are exponentially small and whose incidence density is $\Omega(1/\sqrt d)$. Finally, we discuss various constructions (due to others) which yield configurations with incidence density $\Omega(1)$ and bipartite subgraph density $2^{-\Omega(\sqrt d)}$. Our framework and results may help shed light on the difficulty of improving Lovett's $\tilde{O}(\sqrt{\operatorname{rank}(f)})$ bound (J. ACM '16) for the log-rank conjecture; in particular, any improvement on this bound would imply the first bipartite subgraph size bounds for parallel $3$-partitioned configurations which beat our generic bounds for unstructured configurations.
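
In symbols, the linear-algebraic form of the conjecture can be written as follows (our notation: $M \in \mathbb{R}^{m \times n}$ has rank $d$ and at most $c = O(1)$ distinct entries per column):

    \exists\, R \subseteq [m],\ C \subseteq [n]:\qquad
    \frac{|R|\,|C|}{mn} \;\ge\; 2^{-\operatorname{polylog}(d)}
    \quad\text{and}\quad
    \bigl|\{\, M_{ij} : i \in R \,\}\bigr| = 1 \ \ \text{for every } j \in C.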

In this paper, a new weighted average estimator (WAVE) is proposed to enhance the performance of the simple-averaging distributed estimator under a general loss with a high-dimensional parameter. To obtain an efficient estimator, a weighted least-squares ensemble framework with an adaptive $L_1$ penalty is proposed, in which each local estimator is computed via the adaptive lasso and its weight is inversely proportional to its variance. We prove that WAVE enjoys the same asymptotic properties as the global estimator while incurring very low communication cost, requiring each local worker to deliver only two vectors to the master. Moreover, WAVE is shown to be effective even when the samples across local workers have different means and covariances. In particular, asymptotic normality is established under such conditions, while other competitors may not enjoy this property. The effectiveness of WAVE is further illustrated by an extensive numerical study and a real data analysis.
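
The core weighting idea can be sketched in a scalar toy; the paper's setting is a high-dimensional parameter with an adaptive-lasso local step, whereas here each of K hypothetical workers ships just its local estimate and estimated variance, and the master forms the inverse-variance weighted average:

    import numpy as np

    rng = np.random.default_rng(0)
    K, theta = 10, 2.0                       # workers and the common parameter
    est, var = np.zeros(K), np.zeros(K)
    for k in range(K):
        n_k = rng.integers(50, 500)          # heterogeneous local sample sizes
        x = theta + rng.normal(scale=1.0 + k, size=n_k)  # heterogeneous noise
        est[k] = x.mean()                    # local estimate (one "vector")
        var[k] = x.var(ddof=1) / n_k         # estimated variance (the other)

    w = (1.0 / var) / (1.0 / var).sum()      # weights inversely prop. to variance
    print("simple average:", est.mean(), " inverse-variance:", w @ est)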

The synthetic control method has become a widely popular tool to estimate causal effects with observational data. Despite this, inference for synthetic control methods remains challenging. Often, inferential results rely on linear factor model data generating processes. In this paper, we characterize the conditions on the factor model primitives (the factor loadings) for which the statistical risk minimizers are synthetic controls (in the simplex). Then, we propose a Bayesian alternative to the synthetic control method that preserves the main features of the standard method and provides a new way of doing valid inference. We explore a Bernstein-von Mises-style result to link our Bayesian inference to the frequentist inference. For linear factor model frameworks we show that a maximum likelihood estimator (MLE) of the synthetic control weights can consistently estimate the predictive function of the potential outcomes for the treated unit and that our Bayes estimator is asymptotically close to the MLE in the total variation sense. Through simulations, we show that there is convergence between the Bayes and frequentist approach even in sparse settings. Finally, we apply the method to revisit the study of the economic costs of the German re-unification. The Bayesian synthetic control method is available in the bsynth R-package.
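
For reference, the frequentist point estimate being generalized here is the simplex-constrained least-squares fit. A minimal numpy sketch, using projected gradient with the standard sorting-based simplex projection of Duchi et al. (2008); the data Y0, y1 and the step size are illustrative:

    import numpy as np

    def project_simplex(v):
        # Euclidean projection onto the probability simplex.
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def synthetic_control(Y0, y1, iters=5000):
        # minimize ||y1 - Y0 w||^2 subject to w >= 0, sum(w) = 1
        n = Y0.shape[1]
        w = np.full(n, 1.0 / n)
        lr = 1.0 / np.linalg.norm(Y0, 2) ** 2   # 1 / Lipschitz constant
        for _ in range(iters):
            w = project_simplex(w - lr * (Y0.T @ (Y0 @ w - y1)))
        return w

    rng = np.random.default_rng(0)
    Y0 = rng.normal(size=(40, 6))               # pre-period outcomes, 6 donors
    true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])
    y1 = Y0 @ true_w + 0.01 * rng.normal(size=40)   # treated unit, pre-period
    print(synthetic_control(Y0, y1).round(3))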

Bonnet et al. (FOCS 2020) introduced the graph invariant twin-width and showed that many NP-hard problems are tractable for graphs of bounded twin-width, generalizing similar results for other width measures, including treewidth and clique-width. In this paper, we investigate the use of twin-width for solving the propositional satisfiability problem (SAT) and propositional model counting. We particularly focus on Bounded-ones Weighted Model Counting (BWMC), which takes as input a CNF formula $F$ along with a bound $k$ and asks for the weighted sum of all models with at most $k$ positive literals. BWMC generalizes not only SAT but also (weighted) model counting. We develop the notion of "signed" twin-width of CNF formulas and establish that BWMC is fixed-parameter tractable when parameterized by the certified signed twin-width of $F$ plus $k$. We show that this result is tight: it is neither possible to drop the bound $k$ nor to use vanilla twin-width instead if one wishes to retain fixed-parameter tractability, even for the easier problem SAT. Our theoretical results are complemented with an empirical evaluation and comparison of signed twin-width on various classes of CNF formulas.
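
To make the BWMC objective concrete, here is a brute-force reference implementation (exponential in the number of variables, whereas the paper's contribution is an FPT algorithm in certified signed twin-width plus $k$; the clause encoding and weight function are our own conventions):

    from itertools import combinations

    def bwmc(clauses, n_vars, k, weight):
        # Weighted sum over all models of the CNF that set at most k
        # variables to true. Clauses are lists of signed ints; a literal
        # l is satisfied iff (l > 0) matches (|l| assigned true).
        total = 0.0
        for ones in range(k + 1):
            for pos in combinations(range(1, n_vars + 1), ones):
                assign = set(pos)
                if all(any((l > 0) == (abs(l) in assign) for l in c)
                       for c in clauses):
                    total += weight(assign)
        return total

    # (x1 or x2) and (not x1 or x3): one model with <= 1 positive literal
    clauses = [[1, 2], [-1, 3]]
    print(bwmc(clauses, n_vars=3, k=1, weight=lambda s: 1.0))  # 1.0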

What do applications like semantic optimization, data exchange and integration, answering queries under dependencies, query reformulation with constraints, and data cleaning have in common? All these applications can be processed by the Chase, a family of algorithms for reasoning with constraints. While the theory of the Chase is well understood, existing implementations are confined to specific use cases and application scenarios, making it difficult to reuse them in other settings. ChaTEAU overcomes this limitation: It takes the logical core of the Chase, generalizes it, and provides a software library for different Chase applications in a single toolkit.
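
The logical core being generalized is easy to sketch: the loop below is a toy standard chase for TGDs with a single body atom and a single head atom, firing a dependency only when its head is not yet matched and inventing labeled nulls for existential variables (a minimal Python sketch; ChaTEAU's actual library interface is not shown here):

    import itertools

    def chase(facts, tgds, rounds=5):
        # Toy standard chase. A TGD is ((body_rel, body_vars),
        # (head_rel, head_vars)); head variables absent from the body are
        # existential and receive fresh labeled nulls. A TGD fires only
        # if no existing head atom agrees on the non-existential positions.
        fresh = itertools.count()
        facts = set(facts)
        for _ in range(rounds):
            fired = set()
            for (brel, bvars), (hrel, hvars) in tgds:
                for fact in facts:
                    if fact[0] != brel:
                        continue
                    env = dict(zip(bvars, fact[1:]))
                    fixed = [(i, env[v]) for i, v in enumerate(hvars)
                             if v in env]
                    known = facts | fired
                    if any(f[0] == hrel and
                           all(f[1 + i] == c for i, c in fixed)
                           for f in known):
                        continue
                    row = tuple(env[v] if v in env else f"_N{next(fresh)}"
                                for v in hvars)
                    fired.add((hrel,) + row)
            if not fired:
                break
            facts |= fired
        return facts

    # Emp(name, dept) -> exists m: Dept(dept, m)
    tgds = [(("Emp", ("x", "d")), ("Dept", ("d", "m")))]
    db = {("Emp", "alice", "cs"), ("Emp", "bob", "cs"), ("Emp", "carol", "ee")}
    for f in sorted(chase(db, tgds)):
        print(f)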

This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. We show that this phenomenon, first observed in the deep learning literature, can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test objectives, we further explain why this already occurs in classification tasks without assuming any particular mismatch between train and test data distributions.
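
The quadratic case is easy to reproduce numerically: along an eigenvector of the Hessian $H$ with eigenvalue $\lambda$, $t$ steps of gradient descent at learning rate $\eta$ recover a fraction $1 - (1 - \eta\lambda)^t$ of the target component, so under early stopping the learning rate reshapes the solution's spectral profile. A small sketch (dimensions and spectrum are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # Quadratic objective 0.5 w'Hw - b'w whose Hessian has a spread spectrum.
    lams = np.array([1.0, 0.1, 0.01])
    U = np.linalg.qr(rng.normal(size=(3, 3)))[0]
    H = U @ np.diag(lams) @ U.T
    w_star = rng.normal(size=3)
    b = H @ w_star

    def early_stopped_gd(lr, steps=50):
        w = np.zeros(3)
        for _ in range(steps):
            w -= lr * (H @ w - b)
        return w

    for lr in (0.1, 1.9):   # small vs. near-maximal stable step (2 / max eig)
        w = early_stopped_gd(lr)
        # fraction of w_star recovered along each eigenvector of H;
        # in closed form this equals 1 - (1 - lr * lam) ** steps
        print(lr, np.round((U.T @ w) / (U.T @ w_star), 3))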

Aggregating signals from a collection of noisy sources is a fundamental problem in many domains including crowd-sourcing, multi-agent planning, sensor networks, signal processing, voting, ensemble learning, and federated learning. The core question is how to aggregate signals from multiple sources (e.g. experts) in order to reveal an underlying ground truth. While a full answer depends on the type of signal, correlation of signals, and desired output, a problem common to all of these applications is that of differentiating sources based on their quality and weighting them accordingly. It is often assumed that this differentiation and aggregation is done by a single, accurate central mechanism or agent (e.g. a judge). We complicate this model in two ways. First, we investigate both the setting with a single judge and one with multiple judges. Second, given this multi-agent interaction of judges, we investigate various constraints on the judges' reporting space. We build on known results for the optimal weighting of experts and prove that an ensemble of sub-optimal mechanisms can perform optimally under certain conditions. We then show empirically that the ensemble approximates the performance of the optimal mechanism under a broader range of conditions.
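
One classical instance of the known results alluded to is the rule of Nitzan and Paroush (1982) for conditionally independent binary experts, where the optimal weights are the log-odds of the individual accuracies; the toy simulation below compares it with an unweighted majority vote (the accuracies p are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.9, 0.7, 0.6, 0.55, 0.55])  # assumed expert accuracies
    w = np.log(p / (1 - p))                    # optimal log-odds weights

    def trial():
        truth = rng.integers(2) * 2 - 1                 # truth in {-1, +1}
        votes = np.where(rng.random(p.size) < p, truth, -truth)
        maj = np.sign(votes.sum())                      # unweighted majority
        opt = np.sign(w @ votes)                        # log-odds weighting
        return maj == truth, opt == truth

    res = np.array([trial() for _ in range(20_000)])
    print("majority:", res[:, 0].mean(), "log-odds:", res[:, 1].mean())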
