亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the balanced $k$-way hypergraph partitioning problem, with a special focus on its practical applications to manycore scheduling. Given a hypergraph on $n$ nodes, our goal is to partition the node set into $k$ parts of size at most $(1+\epsilon)\cdot \frac{n}{k}$ each, while minimizing the cost of the partitioning, defined as the number of cut hyperedges, possibly also weighted by the number of partitions they intersect. We show that this problem cannot be approximated to within a $n^{1/\text{poly} \log\log n}$ factor of the optimal solution in polynomial time if the Exponential Time Hypothesis holds, even for hypergraphs of maximal degree 2. We also study the hardness of the partitioning problem from a parameterized complexity perspective, and in the more general case when we have multiple balance constraints. Furthermore, we consider two extensions of the partitioning problem that are motivated from practical considerations. Firstly, we introduce the concept of hyperDAGs to model precedence-constrained computations as hypergraphs, and we analyze the adaptation of the balanced partitioning problem to this case. Secondly, we study the hierarchical partitioning problem to model hierarchical NUMA (non-uniform memory access) effects in modern computer architectures, and we show that ignoring this hierarchical aspect of the communication cost can yield significantly weaker solutions.

相關內容

Several problems in statistics involve the combination of high-variance unbiased estimators with low-variance estimators that are only unbiased under strong assumptions. A notable example is the estimation of causal effects while combining small experimental datasets with larger observational datasets. There exist a series of recent proposals on how to perform such a combination, even when the bias of the low-variance estimator is unknown. To build intuition for the differing trade-offs of competing approaches, we argue for examining the finite-sample estimation error of each approach as a function of the unknown bias. This includes understanding the bias threshold -- the largest bias for which a given approach improves over using the unbiased estimator alone. Though this lens, we review several recent proposals, and observe in simulation that different approaches exhibits qualitatively different behavior. We also introduce a simple alternative approach, which compares favorably in simulation to recent alternatives, having a higher bias threshold and generally making a more conservative trade-off between best-case performance (when the bias is zero) and worst-case performance (when the bias is adversarially chosen). More broadly, we prove that for any amount of (unknown) bias, the MSE of this estimator can be bounded in a transparent way that depends on the variance / covariance of the underlying estimators that are being combined.

Probabilistic graphical models have become an important unsupervised learning tool for detecting network structures for a variety of problems, including the estimation of functional neuronal connectivity from two-photon calcium imaging data. However, in the context of calcium imaging, technological limitations only allow for partially overlapping layers of neurons in a brain region of interest to be jointly recorded. In this case, graph estimation for the full data requires inference for edge selection when many pairs of neurons have no simultaneous observations. This leads to the Graph Quilting problem, which seeks to estimate a graph in the presence of block-missingness in the empirical covariance matrix. Solutions for the Graph Quilting problem have previously been studied for Gaussian graphical models; however, neural activity data from calcium imaging are often non-Gaussian, thereby requiring a more flexible modeling approach. Thus, in our work, we study two approaches for nonparanormal Graph Quilting based on the Gaussian copula graphical model, namely a maximum likelihood procedure and a low-rank based framework. We provide theoretical guarantees on edge recovery for the former approach under similar conditions to those previously developed for the Gaussian setting, and we investigate the empirical performance of both methods using simulations as well as real data calcium imaging data. Our approaches yield more scientifically meaningful functional connectivity estimates compared to existing Gaussian graph quilting methods for this calcium imaging data set.

This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a R\'enyi-Ulam style analytical framework for this problem, and prove upper and lower bounds on its worst-case query complexity. Further, we bound the expected performance of a relevant randomized algorithm. Finally, we study the relationship between adaptivity and query complexity for this problem and related variants.

In this work we consider a new family of algorithms for sequential prediction, Hierarchical Partitioning Forecasters (HPFs). Our goal is to provide appealing theoretical - regret guarantees on a powerful model class - and practical - empirical performance comparable to deep networks - properties at the same time. We built upon three principles: hierarchically partitioning the feature space into sub-spaces, blending forecasters specialized to each sub-space and learning HPFs via local online learning applied to these individual forecasters. Following these principles allows us to obtain regret guarantees, where Constant Partitioning Forecasters (CPFs) serve as competitor. A CPF partitions the feature space into sub-spaces and predicts with a fixed forecaster per sub-space. Fixing a hierarchical partition $\mathcal H$ and considering any CPF with a partition that can be constructed using elements of $\mathcal H$ we provide two guarantees: first, a generic one that unveils how local online learning determines regret of learning the entire HPF online; second, a concrete instance that considers HPF with linear forecasters (LHPF) and exp-concave losses where we obtain $O(k \log T)$ regret for sequences of length $T$ where $k$ is a measure of complexity for the competing CPF. Finally, we provide experiments that compare LHPF to various baselines, including state of the art deep learning models, in precipitation nowcasting. Our results indicate that LHPF is competitive in various settings.

Average treatment effect (ATE) estimation is an essential problem in the causal inference literature, which has received significant recent attention, especially with the presence of high-dimensional confounders. We consider the ATE estimation problem in high dimensions when the observed outcome (or label) itself is possibly missing. The labeling indicator's conditional propensity score is allowed to depend on the covariates, and also decay uniformly with sample size - thus allowing for the unlabeled data size to grow faster than the labeled data size. Such a setting fills in an important gap in both the semi-supervised (SS) and missing data literatures. We consider a missing at random (MAR) mechanism that allows selection bias - this is typically forbidden in the standard SS literature, and without a positivity condition - this is typically required in the missing data literature. We first propose a general doubly robust 'decaying' MAR (DR-DMAR) SS estimator for the ATE, which is constructed based on flexible (possibly non-parametric) nuisance estimators. The general DR-DMAR SS estimator is shown to be doubly robust, as well as asymptotically normal (and efficient) when all the nuisance models are correctly specified. Additionally, we propose a bias-reduced DR-DMAR SS estimator based on (parametric) targeted bias-reducing nuisance estimators along with a special asymmetric cross-fitting strategy. We demonstrate that the bias-reduced ATE estimator is asymptotically normal as long as either the outcome regression or the propensity score model is correctly specified. Moreover, the required sparsity conditions are weaker than all the existing doubly robust causal inference literature even under the regular supervised setting - this is a special degenerate case of our setting. Lastly, this work also contributes to the growing literature on generalizability in causal inference.

PDE solutions are numerically represented by basis functions. Classical methods employ pre-defined bases that encode minimum desired PDE properties, which naturally cause redundant computations. What are the best bases to numerically represent PDE solutions? From the analytical perspective, the Kolmogorov $n$-width is a popular criterion for selecting representative basis functions. From the Bayesian computation perspective, the concept of optimality selects the modes that, when known, minimize the variance of the conditional distribution of the rest of the solution. We show that these two definitions of optimality are equivalent. Numerically, both criteria reduce to solving a Singular Value Decomposition (SVD), a procedure that can be made numerically efficient through randomized sampling. We demonstrate computationally the effectiveness of the basis functions so obtained on several linear and nonlinear problems. In all cases, the optimal accuracy is achieved with a small set of basis functions.

Off-policy evaluation (OPE) is the problem of estimating the value of a target policy using historical data collected under a different logging policy. OPE methods typically assume overlap between the target and logging policy, enabling solutions based on importance weighting and/or imputation. In this work, we approach OPE without assuming either overlap or a well-specified model by considering a strategy based on partial identification under non-parametric assumptions on the conditional mean function, focusing especially on Lipschitz smoothness. Under such smoothness assumptions, we formulate a pair of linear programs whose optimal values upper and lower bound the contributions of the no-overlap region to the off-policy value. We show that these linear programs have a concise closed form solution that can be computed efficiently and that their solutions converge, under the Lipschitz assumption, to the sharp partial identification bounds on the off-policy value. Furthermore, we show that the rate of convergence is minimax optimal, up to log factors. We deploy our methods on two semi-synthetic examples, and obtain informative and valid bounds that are tighter than those possible without smoothness assumptions.

Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and small maximum degree is important. In this study, we propose a novel topology combining both a fast consensus rate and small maximum degree called the Base-$(k + 1)$ Graph. Unlike the existing topologies, the Base-$(k + 1)$ Graph enables all nodes to reach the exact consensus after a finite number of iterations for any number of nodes and maximum degree k. Thanks to this favorable property, the Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-$(k + 1)$ Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than the existing topologies.

Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.

Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics system, learning molecular fingerprints, predicting protein interface, and classifying diseases require that a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures, like the dependency tree of sentences and the scene graph of images, is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with an arbitrary depth. Although the primitive graph neural networks have been found difficult to train for a fixed point, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful learning with them. In recent years, systems based on graph convolutional network (GCN) and gated graph neural network (GGNN) have demonstrated ground-breaking performance on many tasks mentioned above. In this survey, we provide a detailed review over existing graph neural network models, systematically categorize the applications, and propose four open problems for future research.

北京阿比特科技有限公司