
Streaming is a model in which an input graph arrives one edge at a time, rather than being available for inspection at will. In this work, we take a parameterized approach by assuming that a vertex cover of the graph is given, building on work of Bishnu et al. [COCOON 2020]. We show the further potency of combining this parameter with the Adjacency List streaming model to obtain results for vertex deletion problems. This includes kernels, parameterized algorithms, and lower bounds for $\Pi$-free Deletion, $H$-free Deletion, and the more specific Cluster Vertex Deletion and Odd Cycle Transversal. We focus on the complexity in terms of the number of passes over the input stream and the memory used. This leads to a pass/memory trade-off, where a different algorithm might be favourable depending on the context and instance. We also discuss implications for parameterized complexity in the non-streaming setting.
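To make the role of the vertex cover concrete, here is a minimal sketch (our own illustration, not an algorithm from the paper) of a single pass in the Adjacency List model. Since every edge has at least one endpoint in a vertex cover $K$, one pass suffices to record the adjacency within $K$ and to bucket every remaining vertex into a "twin class" given by its neighbourhood in $K$; the memory used depends on $|K|$ but not on the total number of vertices.

```python
from collections import defaultdict

# One pass over an adjacency-list stream, given a vertex cover K.
# Because every edge touches K, vertices outside K have all of their
# neighbors inside K, so they can be grouped into twin classes.
def one_pass_compress(stream, K):
    """stream yields (v, neighbors_of_v) pairs, one per vertex."""
    K = set(K)
    cover_edges = set()              # edges with both endpoints in K
    twin_classes = defaultdict(int)  # neighborhood-in-K -> multiplicity
    for v, neighbors in stream:
        if v in K:
            cover_edges.update(frozenset({v, u}) for u in neighbors if u in K)
        else:
            twin_classes[frozenset(neighbors)] += 1
    return cover_edges, twin_classes
```

Kernelization arguments for vertex deletion problems can then reason about the (at most $2^{|K|}$) twin classes instead of individual vertices.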

Related content

In this work, we consider the $d$-{\sc Hyperedge Estimation} and $d$-{\sc Hyperedge Sample} problems in a hypergraph $\mathcal{H}(U(\mathcal{H}),\mathcal{F}(\mathcal{H}))$ in the query complexity framework, where $U(\mathcal{H})$ denotes the set of vertices and $\mathcal{F}(\mathcal{H})$ denotes the set of hyperedges. The oracle access to the hypergraph is called the {\sc Colorful Independence Oracle} ({\sc CID}), which takes $d$ (non-empty) pairwise disjoint subsets of vertices $A_1,\ldots,A_d \subseteq U(\mathcal{H})$ as input, and answers whether there exists a hyperedge in $\mathcal{H}$ having (exactly) one vertex in each $A_i, i \in \{1,2,\ldots,d\}$. The problems of $d$-{\sc Hyperedge Estimation} and $d$-{\sc Hyperedge Sample} with {\sc CID} oracle access are important in their own right as combinatorial problems. Also, Dell {\it{et al.}}~[SODA '20] established that the {\em decision} vs {\em counting} complexities of a number of combinatorial optimization problems can be abstracted out as $d$-{\sc Hyperedge Estimation} problems with {\sc CID} oracle access. The main technical contribution of the paper is an algorithm that estimates $m = \lvert \mathcal{F}(\mathcal{H}) \rvert$ with $\widehat{m}$ such that $$ \frac{1}{C_{d}\log^{d-1} n} \;\leq\; \frac{\widehat{m}}{m} \;\leq\; C_{d} \log^{d-1} n $$ using at most $C_{d}\log^{d+2} n$ {\sc CID} queries, where $n$ denotes the number of vertices in the hypergraph $\mathcal{H}$ and $C_{d}$ is a constant that depends only on $d$. Our result coupled with the framework of Dell {\it{et al.}}~[SODA '20] implies improved bounds for a number of fundamental problems.
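For intuition, the following toy sketch (ours, not the paper's algorithm) shows a {\sc CID} oracle for a $d$-uniform hypergraph together with a crude estimator: subsample vertices into $d$ random parts at rate $p$ and find the rate at which colorful queries stop succeeding; the hyperedge count is then roughly $p^{-d}$, far cruder than the $C_d \log^{d-1} n$-factor guarantee above.

```python
import math
import random

# Hyperedges are frozensets of d vertices; `parts` are pairwise
# disjoint vertex sets. The oracle asks: does some hyperedge have
# exactly one vertex in each part?
def make_cid(hyperedges):
    def cid(parts):
        return any(all(len(e & A) == 1 for A in parts) for e in hyperedges)
    return cid

def rough_estimate(U, cid, d, trials=64, seed=0):
    rng = random.Random(seed)
    for i in range(int(math.log2(len(U))) + 2):
        p = 2.0 ** -i
        hits = 0
        for _ in range(trials):
            parts = [set() for _ in range(d)]
            for v in U:
                if rng.random() < p:                # keep vertex at rate p
                    parts[rng.randrange(d)].add(v)  # in a uniform random part
            hits += cid(parts)
        if hits < trials // 4:  # colorful queries mostly fail at this rate,
            return p ** -d      # so the number of hyperedges is ~ p^{-d}
    return 1.0
```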

We study the following two fixed-cardinality optimization problems (a maximization and a minimization variant). For a fixed $\alpha$ between zero and one, we are given a graph and two numbers $k \in \mathbb{N}$ and $t \in \mathbb{Q}$. The task is to find a vertex subset $S$ of exactly $k$ vertices that has value at least $t$ (resp. at most $t$ for the minimization variant). Here, the value of a vertex set is $\alpha$ times the number of edges with exactly one endpoint in $S$ plus $1-\alpha$ times the number of edges with both endpoints in $S$. These two problems generalize many prominent graph problems, such as Densest $k$-Subgraph, Sparsest $k$-Subgraph, Partial Vertex Cover, and Max $(k,n-k)$-Cut. In this work, we complete the picture of their parameterized complexity on several types of sparse graphs that are described by structural parameters. In particular, we provide kernelization algorithms and kernel lower bounds for these problems. A somewhat surprising consequence of our kernelizations is that Partial Vertex Cover and Max $(k,n-k)$-Cut not only behave in the same way, but the kernels for both problems can be obtained by the same algorithms.
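The objective is simple enough to state in a few lines; the sketch below just evaluates the value of a candidate set $S$ as defined above.

```python
# value(S) = alpha * #(edges with exactly one endpoint in S)
#          + (1 - alpha) * #(edges with both endpoints in S)
def value(edges, S, alpha):
    S = set(S)
    one_end = sum((u in S) != (v in S) for u, v in edges)
    both_ends = sum((u in S) and (v in S) for u, v in edges)
    return alpha * one_end + (1 - alpha) * both_ends
```

For instance, $\alpha = 0$ yields Densest (resp. Sparsest) $k$-Subgraph, $\alpha = 1$ yields Max $(k,n-k)$-Cut, and $\alpha = 1/2$ counts, up to a factor of two, the covered edges as in Partial Vertex Cover.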

Decision forests, including random forests and gradient boosting trees, remain the leading machine learning methods for many real-world data problems, especially on tabular data. However, current standard implementations only operate in batch mode, and therefore cannot incrementally update when more data arrive. Several previous works developed streaming trees and ensembles to overcome this limitation. Nonetheless, we found that those state-of-the-art algorithms suffer from a number of drawbacks, including performing very poorly on some problems and requiring a huge amount of memory on others. We therefore developed the simplest possible extension of decision trees we could think of: given new data, simply update existing trees by continuing to grow them, and replace some old trees with new ones to control the total number of trees. On three standard datasets, we illustrate that our approach, Stream Decision Forest (SDF), does not suffer from either of the aforementioned limitations. In a benchmark suite containing 72 classification problems (the OpenML-CC18 data suite), we illustrate that our approach often performs as well as, and sometimes even better than, the batch mode random forest algorithm. Thus, we believe SDFs establish a simple standard for streaming trees and forests that could readily be applied to many real-world problems, including those with distribution drift and continual learning.
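The update rule is simple enough to sketch. The class below is a hedged illustration of the idea as described: `IncrementalTree` is a hypothetical tree type that supports being grown further on new batches (standard batch libraries such as scikit-learn do not expose this), and the forest replaces its oldest trees to keep the ensemble size fixed; details such as subsampling are omitted.

```python
import numpy as np

class StreamDecisionForest:
    """Sketch of the SDF idea; `IncrementalTree` is hypothetical."""
    def __init__(self, n_trees=100, n_replace=2):
        self.trees, self.n_trees, self.n_replace = [], n_trees, n_replace

    def partial_fit(self, X, y):
        if not self.trees:   # first batch: build the whole forest
            self.trees = [IncrementalTree().fit(X, y)
                          for _ in range(self.n_trees)]
            return self
        for tree in self.trees:
            tree.grow(X, y)  # continue growing each tree on the new batch
        # replace the oldest trees with fresh ones trained on this batch
        self.trees = self.trees[self.n_replace:] + \
            [IncrementalTree().fit(X, y) for _ in range(self.n_replace)]
        return self

    def predict(self, X):    # majority vote over the ensemble
        votes = np.stack([t.predict(X) for t in self.trees])
        return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```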

We consider parameter estimation for a linear parabolic second-order stochastic partial differential equation (SPDE) in two space dimensions driven by two types of $Q$-Wiener processes, based on high frequency data in time and space. We first estimate the parameters which appear in the coordinate process of the SPDE using the minimum contrast estimator based on the thinned data with respect to space, and then construct an approximate coordinate process of the SPDE. Furthermore, we propose estimators of the coefficient parameters of the SPDE utilizing the approximate coordinate process based on the thinned data with respect to time. We also give some simulation results.
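As a trivial illustration of the two thinning steps (ours; the estimators themselves are specific to the SPDE model), suppose the field is observed on a regular time-space grid, with one space dimension shown for brevity:

```python
import numpy as np

# X[i, j] is the observed field at time point i and spatial point j.
def thin_in_space(X, k):
    return X[:, ::k]   # stage 1: keep every k-th spatial point
                       # (coordinate-process parameter estimation)

def thin_in_time(X, l):
    return X[::l, :]   # stage 2: keep every l-th time point
                       # (coefficient parameter estimation)
```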

Let $P$ be a polyhedron defined by a system $A x \leq b$, where $A \in Z^{m \times n}$, $rank(A) = n$, and $b \in Z^{m}$. In the Integer Feasibility Problem, we need to decide whether $P \cap Z^n = \emptyset$ or to find some $x \in P \cap Z^n$ in the opposite case. The current state-of-the-art algorithm, due to \cite{DadushDis,DadushFDim} (see also \cite{Convic,ConvicComp,DConvic} for more general formulations), has the complexity bound $O(n)^n \cdot poly(\phi)$, where $\phi = size(A,b)$. It is a long-standing open problem to break the $O(n)^n$ dimension-dependence in the complexity of ILP algorithms. We show that if the matrix $A$ has a small $l_1$ or $l_\infty$ norm, or if $A$ is sparse and has bounded elements, then the integer feasibility problem can be solved faster. More precisely, we give the following complexity bounds: \begin{gather*} \min\{\|A\|_{\infty}, \|A\|_1\}^{5 n} \cdot 2^n \cdot poly(\phi), \\ \bigl( \|A\|_{\max} \bigr)^{5 n} \cdot \min\{cs(A),rs(A)\}^{3 n} \cdot 2^n \cdot poly(\phi). \end{gather*} Here $\|A\|_{\max}$ denotes the maximal absolute value of the elements of $A$, while $cs(A)$ and $rs(A)$ denote the maximal number of nonzero elements in the columns and rows of $A$, respectively. We present similar results for the integer linear counting and optimization problems. Additionally, we apply the last result to multipacking and multicover problems on graphs and hypergraphs, where we need to choose a minimal/maximal multiset of vertices to cover/pack the edges a prescribed number of times. For example, we show that the stable multiset and vertex multicover problems on simple graphs admit FPT-algorithms with the complexity bound $2^{O(|V|)} \cdot poly(\phi)$, where $V$ is the vertex set of the given graph.
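The quantities appearing in these bounds are elementary to compute; the following sketch evaluates them for a given integer matrix (here $\|A\|_\infty$ and $\|A\|_1$ are the usual operator norms, i.e. the maximal row and column $l_1$-norms).

```python
import numpy as np

def matrix_stats(A):
    A = np.asarray(A)
    return {
        "norm_inf": np.abs(A).sum(axis=1).max(),  # ||A||_inf: max row l1-norm
        "norm_1":   np.abs(A).sum(axis=0).max(),  # ||A||_1: max column l1-norm
        "norm_max": np.abs(A).max(),              # ||A||_max: max |a_ij|
        "rs": np.count_nonzero(A, axis=1).max(),  # rs(A): max nonzeros per row
        "cs": np.count_nonzero(A, axis=0).max(),  # cs(A): max nonzeros per col
    }
```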

The rapid development of high-throughput technologies has enabled the generation of data from biological or disease processes that span multiple layers, like genomic, proteomic or metabolomic data, and further pertain to multiple sources, like disease subtypes or experimental conditions. In this work, we propose a general statistical framework based on Gaussian graphical models for horizontal (i.e. across conditions or subtypes) and vertical (i.e. across different layers containing data on molecular compartments) integration of information in such datasets. We start by decomposing the multi-layer problem into a series of two-layer problems. For each two-layer problem, we model the outcomes at a node in the lower layer as dependent on those of other nodes in that layer, as well as all nodes in the upper layer. We use a combination of neighborhood selection and group-penalized regression to obtain sparse estimates of all model parameters. Following this, we develop a debiasing technique, together with asymptotic distributions of inter-layer directed edge weights, that utilizes the already computed neighborhood selection coefficients for nodes in the upper layer. Subsequently, we establish global and simultaneous testing procedures for these edge weights. Performance of the proposed methodology is evaluated on synthetic and real data.
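A hedged sketch of one two-layer step: each lower-layer node is regressed on the remaining lower-layer nodes and on all upper-layer nodes. For brevity this uses a plain lasso from scikit-learn in place of the combination of neighborhood selection and group-penalized regression used in the paper, and it omits the debiasing step.

```python
import numpy as np
from sklearn.linear_model import Lasso

# X: n x p matrix of upper-layer nodes; Y: n x q matrix of lower-layer
# nodes. For each lower-layer node j, fit a sparse regression on the
# other lower-layer nodes (within-layer, undirected edges) and on all
# upper-layer nodes (inter-layer, directed edges X -> Y_j).
def two_layer_neighborhoods(X, Y, lam=0.1):
    n, q = Y.shape
    within, between = [], []
    for j in range(q):
        design = np.hstack([np.delete(Y, j, axis=1), X])
        fit = Lasso(alpha=lam).fit(design, Y[:, j])
        within.append(fit.coef_[: q - 1])   # coefficients on other Y-nodes
        between.append(fit.coef_[q - 1 :])  # coefficients on X-nodes
    return np.array(within), np.array(between)
```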

A determinantal point process (DPP) on a collection of $M$ items is a model, parameterized by a symmetric kernel matrix, that assigns a probability to every subset of those items. Recent work shows that removing the kernel symmetry constraint, yielding nonsymmetric DPPs (NDPPs), can lead to significant predictive performance gains for machine learning applications. However, existing work leaves open the question of scalable NDPP sampling. There is only one known DPP sampling algorithm, based on Cholesky decomposition, that also applies directly to NDPPs. Unfortunately, its runtime is cubic in $M$, and thus it does not scale to large item collections. In this work, we first note that this algorithm can be transformed into a linear-time one for kernels with low-rank structure. Furthermore, we develop a scalable sublinear-time rejection sampling algorithm by constructing a novel proposal distribution. Additionally, we show that imposing certain structural constraints on the NDPP kernel enables us to bound the rejection rate in a way that depends only on the kernel rank. In our experiments we compare the speed of all of these samplers for a variety of real-world tasks.
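For reference, here is a compact sketch of the cubic-time sequential sampler referred to above, which conditions the marginal kernel item by item via Schur complements; it applies to nonsymmetric kernels as well, but touches the full $M \times M$ matrix and hence does not scale. Numerical safeguards are omitted.

```python
import numpy as np

# Sequential (Cholesky-style) sampling from a DPP with marginal
# kernel K: include item i with its current conditional marginal
# probability K[i, i], then condition the rest of K on that outcome.
def sequential_sample(K, rng=np.random.default_rng()):
    K = K.astype(float).copy()
    M, sample = K.shape[0], []
    for i in range(M):
        p = K[i, i]
        if rng.random() < p:
            sample.append(i)
        else:
            p -= 1.0  # conditioning on exclusion flips the pivot
        # Schur-complement update of the kernel on the remaining items
        K[i + 1:, i + 1:] -= np.outer(K[i + 1:, i], K[i, i + 1:]) / p
    return sample
```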

The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm for building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectrum, such as the first few eigenvectors of the original graph Laplacian. Our approach immediately leads to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable amount of off-tree edges to be recovered to the LSSTs, we propose an eigenvalue stability checking scheme, which allows the first few Laplacian eigenvectors to be robustly preserved within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results for a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering.
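A hedged sketch of the kind of check this enables: compare the bottom eigenspaces of the original and sparsified Laplacians via principal angles. This is a verification step for spectrum preservation, not the LSST-based sparsification algorithm itself.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def bottom_eigvecs(adj, k):
    """First k eigenvectors of the normalized graph Laplacian."""
    L = laplacian(adj, normed=True)
    vals, vecs = eigsh(L, k=k, which="SM")  # smallest eigenpairs
    return vecs

def subspace_distance(adj_full, adj_sparse, k=10):
    U = bottom_eigvecs(adj_full, k)
    V = bottom_eigvecs(adj_sparse, k)
    # singular values of U^T V are cosines of the principal angles
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - s.min() ** 2))  # sin of largest angle
```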

Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve it. Despite this, there still remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results showing that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previously could not be handled.
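As a concrete example of one fairness metric from this class, the following sketch computes the statistical rate (demographic parity ratio) of a classifier's predictions across the groups of a single sensitive attribute; a constraint of the form statistical rate $\geq \tau$ is one kind of input such a meta-algorithm can accept.

```python
import numpy as np

# y_pred: 0/1 predictions; groups: sensitive-attribute value per sample.
# Returns min over groups of the positive rate divided by the max,
# so 1.0 means the classifier is perfectly fair under this metric.
def statistical_rate(y_pred, groups):
    rates = [np.mean(y_pred[groups == g]) for g in np.unique(groups)]
    return min(rates) / max(rates)
```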

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component; we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than that of solving the original problem, where one learns the parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy and improved computational complexity compared with existing moment-based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms, showing significant improvements in runtime and accuracy.
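To illustrate the flavor of the setting (a toy of our own, not the paper's matrix-based algorithm): if side information supplies a direction $v$ that correlates more strongly with the target component's mean than with the others, reweighting samples by their alignment with $v$ already biases a simple mean estimate toward that single component.

```python
import numpy as np

# X: n x d samples from a mixture; v: unit vector assumed (as side
# information) to correlate most with the target component's mean.
def search_component_mean(X, v, temperature=1.0):
    scores = X @ v / temperature
    w = np.exp(scores - scores.max())    # softmax-style sample weights
    w /= w.sum()
    return (w[:, None] * X).sum(axis=0)  # weighted mean, biased toward
                                         # the target component
```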
