
The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $({nd}/{\epsilon})^{O(d)}$ and, under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples, even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms had been established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates attribute sparsity into a sparsity pattern of the Chow vector under the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm that uses exclusively a restricted Frobenius norm either to certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter to detect corrupted samples.
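For concreteness, under the standard convention the degree-$d$ Chow vector of a concept $f:\mathbb{R}^n\to\{\pm 1\}$ with respect to the Gaussian measure collects its correlations with the Hermite basis, and attribute sparsity forces most of these entries to vanish. The display below is a hedged restatement of that structural fact, not the paper's exact formulation:
$$\tilde f(S)\;=\;\mathbb{E}_{x\sim\mathcal{N}(0,I_n)}\big[f(x)\,He_S(x)\big],\qquad He_S(x)=\prod_{i=1}^{n}He_{S_i}(x_i),\quad |S|=\textstyle\sum_i S_i\le d,$$
and if $f$ depends only on a set $T$ of $K$ coordinates, then $\tilde f(S)=0$ whenever $S_i\ge 1$ for some $i\notin T$, so at most $\binom{K+d}{d}$ entries of the Chow vector can be nonzero.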

Related content

In this work, we give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems, the regret which arises when comparing against a benchmark that is $\gamma$ times the optimal solution. The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable. Our characterization is given in terms of the $\gamma$-DEC, a statistical complexity parameter for the class $\mathcal{F}$, which is a modification of the constrained Decision-Estimation Coefficient (DEC) of Foster et al., 2023 (and closely related to the original offset DEC of Foster et al., 2021). Our lower bound shows that the $\gamma$-DEC is a fundamental limit for any model class $\mathcal{F}$: for any algorithm, there exists some $f \in \mathcal{F}$ for which the $\gamma$-regret of that algorithm scales (nearly) with the $\gamma$-DEC of $\mathcal{F}$. We provide an upper bound showing that there exists an algorithm attaining a nearly matching $\gamma$-regret. Due to significant challenges in applying the prior results on the DEC to the $\gamma$-regret case, both our lower and upper bounds require novel techniques and a new algorithm.
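As a hedged illustration (one common convention for maximization problems with $\gamma \in (0,1]$, not necessarily the paper's exact normalization), the $\gamma$-regret of a decision sequence $\pi_1,\dots,\pi_T$ against the true model $f^\star \in \mathcal{F}$ can be written as
$$\mathrm{Reg}_\gamma(T) \;=\; \sum_{t=1}^{T}\Big(\gamma\cdot\max_{\pi} f^\star(\pi)\;-\;f^\star(\pi_t)\Big),$$
which reduces to the standard regret at $\gamma = 1$ and becomes easier to control as $\gamma$ decreases.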

It has been observed by several authors that well-known periodization strategies like tent or Chebychev transforms lead to remarkable results for the recovery of multivariate functions from few samples. So far, theoretical guarantees have been missing. The goal of this paper is twofold. On the one hand, we give such guarantees and briefly describe the difficulties of the proof involved. On the other hand, we combine these periodization strategies with recent novel constructive methods for the efficient subsampling of finite frames in $\mathbb{C}$. As a result, we are able to reconstruct non-periodic multivariate functions from very few samples. The sampling nodes used are the result of a two-step procedure. Firstly, a random draw with respect to the Chebychev measure provides an initial node set. A further sparsification technique selects a significantly smaller subset of these nodes with equal approximation properties. This set of sampling nodes scales linearly in the dimension of the subspace on which we project and works universally for the whole class of functions. The method is based on principles developed by Batson, Spielman, and Srivastava and can be numerically implemented. Samples on these nodes are then used in a (plain) least-squares sampling recovery step on a suitable hyperbolic cross subspace of functions, resulting in near-optimal behavior of the sampling error. Numerical experiments indicate the applicability of our results.
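A minimal one-dimensional sketch of the recovery pipeline is given below: nodes are drawn from the Chebychev (arcsine) measure and a plain least-squares fit is performed in a Chebyshev polynomial subspace. The subsampling/sparsification step and the multivariate hyperbolic-cross construction from the paper are omitted, and the function and degree choices are illustrative assumptions.

```python
import numpy as np

def chebyshev_least_squares(f, m, degree, rng=None):
    """Least-squares recovery of f on [-1, 1] from random Chebyshev nodes.

    Simplified 1-D sketch only: `f` is assumed to be a vectorized callable;
    the paper's sparsification step and hyperbolic-cross spaces are omitted.
    """
    rng = np.random.default_rng(rng)
    u = rng.uniform(0.0, np.pi, size=m)
    nodes = np.cos(u)  # cos(U), U ~ Unif[0, pi], follows the Chebyshev (arcsine) measure
    A = np.polynomial.chebyshev.chebvander(nodes, degree)   # design matrix in the Chebyshev basis
    coef, *_ = np.linalg.lstsq(A, f(nodes), rcond=None)     # plain least-squares fit
    return np.polynomial.chebyshev.Chebyshev(coef)           # recovered approximant
```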

The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, the Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. RSH asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\mathbb{R}^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/\mathrm{poly}(k)$ and margin at least $\Omega \left(\delta/\sqrt{d} \right)$. An immediate consequence of our result is the first near-optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the ``Hausdorff problem'', of learning a unit-diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge, this is the first provable algorithm for the Hausdorff problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
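Read as a formula (a hedged paraphrase of the statement above, with $\theta$ a uniformly random unit vector; the paper's exact sampling distribution and constants may differ), RSH says that if $\mathrm{dist}(a, K) \ge \delta$, then
$$\Pr_{\theta}\Big[\langle \theta, a\rangle \;\ge\; \max_{x\in K}\langle \theta, x\rangle \;+\; \Omega\big(\delta/\sqrt{d}\big)\Big] \;\ge\; \frac{1}{\mathrm{poly}(k)}.$$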

We demonstrate the possibility of consensus under the model and conditions used by Fischer, Lynch, and Paterson (FLP) to prove the impossibility of binary consensus: complete asynchrony and up to one unannounced process crash failure. We also show that: i) the assembly by every process of a dataset containing the initial values of the individual processes is an inevitable phase of binary consensus; and ii) agreeing on this dataset is sufficient for a quasi-binary consensus. Key finding: there is no direct causal relationship between complete asynchrony and the impossibility of solving consensus. The impossibility is caused by the dependence of the agreement on the content of the initial values.

Randomized quadratures for integrating functions in Sobolev spaces of order $\alpha \ge 1$, where the integrability condition is with respect to the Gaussian measure, are considered. In this function space, the optimal rate for the worst-case root-mean-squared error (RMSE) is established. Here, optimality is for a general class of quadratures, in which adaptive non-linear algorithms with a possibly varying number of function evaluations are also allowed. The optimal rate is established by showing matching bounds. First, a lower bound on the worst-case RMSE of order $n^{-\alpha-1/2}$ is proven, where $n$ denotes an upper bound on the expected number of function evaluations. It turns out that a suitably randomized trapezoidal rule attains this rate, up to a logarithmic factor. A practical error estimator for this trapezoidal rule is also presented. Numerical results support our theory.
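For orientation, the quantity being bounded is the worst-case RMSE over the unit ball of the Sobolev class; a hedged schematic form of the matching bounds (with $\gamma$ the Gaussian measure and the expectation taken over the algorithm's internal randomness) is
$$e(A_n)\;=\;\sup_{\|f\|_{\alpha}\le 1}\Big(\mathbb{E}\big[\big(A_n(f)-\textstyle\int_{\mathbb{R}} f\,\mathrm{d}\gamma\big)^2\big]\Big)^{1/2}\;\asymp\; n^{-\alpha-1/2},$$
up to the logarithmic factor mentioned for the randomized trapezoidal rule.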

This paper is devoted to the statistical and numerical properties of the geometric median, and its applications to the problem of robust mean estimation via the median of means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in $\mathbb{R}^d$, and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension $d$; (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median of means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, which is an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a "local quadratic growth" condition that allows one to translate suboptimality bounds for the objective function to the corresponding bounds for the numerical approximation to the median itself, and we propose a simple stopping rule, applicable to any optimization method, that yields explicit error guarantees. We conclude with numerical experiments, including an application to the estimation of mean log-returns for S&P 500 data.
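Since the numerical approximation of the median is discussed above, here is a minimal sketch of one standard way to compute it, Weiszfeld's fixed-point iteration with a simple stopping rule; this is an illustration only and is not claimed to be the paper's procedure or stopping criterion.

```python
import numpy as np

def geometric_median(X, tol=1e-8, max_iter=1000):
    """Weiszfeld iterations for the geometric median of the rows of X.

    Illustrative sketch: a standard fixed-point scheme, not the authors' method.
    """
    z = X.mean(axis=0)  # initialize at the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - z, axis=1)
        d = np.maximum(d, 1e-12)                    # guard against division by zero
        w = 1.0 / d
        z_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) <= tol:        # simple stopping rule
            return z_new
        z = z_new
    return z
```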

Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose $\textbf{SurCo}$ that learns linear $\underline{\text{Sur}}$rogate costs which can be used in existing $\underline{\text{Co}}$mbinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. The surrogate costs are learned end-to-end with nonlinear loss by differentiating through the linear surrogate solver, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We propose three $\texttt{SurCo}$ variants: $\texttt{SurCo}-\texttt{zero}$ for individual nonlinear problems, $\texttt{SurCo}-\texttt{prior}$ for problem distributions, and $\texttt{SurCo}-\texttt{hybrid}$ to combine both distribution and problem-specific information. We give theoretical intuition motivating $\texttt{SurCo}$, and evaluate it empirically. Experiments show that $\texttt{SurCo}$ finds better solutions faster than state-of-the-art and domain expert approaches in real-world optimization problems such as embedding table sharding, inverse photonic design, and nonlinear route planning.
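A heavily simplified sketch of the SurCo-zero idea is below, assuming a black-box `solver(c)` that returns the minimizer of the linear surrogate objective $\langle c, x\rangle$ over the feasible set, and using the crude surrogate gradient $\partial x^\star/\partial c \approx -I$; the paper instead differentiates through the linear surrogate solver, so this is an assumption-laden illustration, not the authors' method.

```python
import torch

def surco_zero(nonlinear_loss, solver, dim, steps=200, lr=0.1):
    """Toy SurCo-zero style loop (illustrative sketch, not the authors' code).

    Assumes `solver(c)` returns a float tensor x*(c) minimizing <c, x> over the
    feasible combinatorial set, and `nonlinear_loss` returns a scalar tensor.
    """
    c = torch.zeros(dim)
    for _ in range(steps):
        x = solver(c).detach().float().requires_grad_(True)
        loss = nonlinear_loss(x)                     # nonlinear objective at the solution
        (grad_x,) = torch.autograd.grad(loss, x)     # d loss / d x at x*(c)
        # with the heuristic dx*/dc ~ -I, gradient descent on the loss w.r.t. c
        # raises the cost of coordinates whose increase would hurt the objective
        c = c + lr * grad_x
    return solver(c)
```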

We study Bayesian histograms for distribution estimation on $[0,1]^d$ under the Wasserstein $W_v$, $1 \leq v < \infty$, distance in the i.i.d. sampling regime. We show, for the first time, that when $d < 2v$, histograms possess a special \textit{memory efficiency} property, whereby, in reference to the sample size $n$, order $n^{d/2v}$ bins are needed to obtain minimax rate optimality. This result holds both for the posterior mean histogram and with respect to posterior contraction, under the class of Borel probability measures and some classes of smooth densities. The attained memory footprint improves upon existing minimax-optimal procedures by a polynomial factor in $n$; for example, an $n^{1 - d/2v}$-factor reduction in the footprint compared to the empirical measure, a minimax estimator in the Borel probability measure class. Additionally, constructing both the posterior mean histogram and the posterior itself can be done super-linearly in $n$. Due to the popularity of the $W_1, W_2$ metrics and the coverage provided by the $d < 2v$ case, our results are of most practical interest in the $(d=1, v=1,2)$, $(d=2, v=2)$, and $(d=3, v=2)$ settings, and we provide simulations demonstrating the theory in several of these instances.
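As a toy illustration of the memory-efficiency claim in $d = 1$, the sketch below builds a posterior mean histogram with on the order of $n^{1/(2v)}$ equal-width bins under a symmetric Dirichlet prior; the specific prior, constants, and bin choice are assumptions for illustration and not the paper's exact construction.

```python
import numpy as np

def posterior_mean_histogram(samples, v=1, alpha=1.0):
    """Posterior mean histogram on [0, 1] (d = 1), an illustrative sketch.

    Uses m ~ n**(1/(2v)) equal-width bins, motivated by the memory-efficiency
    result, with a symmetric Dirichlet(alpha) prior on the bin probabilities.
    """
    n = len(samples)
    m = max(1, int(np.ceil(n ** (1.0 / (2 * v)))))        # order n^{d/2v} bins, d = 1
    counts, edges = np.histogram(samples, bins=m, range=(0.0, 1.0))
    probs = (counts + alpha) / (n + alpha * m)             # Dirichlet posterior mean
    return edges, probs
```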

The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal-size, consecutive fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the fragments are relatively long, and typically no fragment appears more than once, the scaling of the failure probability of the maximum-likelihood reconstruction algorithm is exactly determined for perfect reconstruction and bounded for partial reconstruction. Second, the regime in which the fragments are relatively short and repeating fragments abound is characterized. A trade-off is established between the fraction of fragments that cannot be adequately reconstructed and the distortion level allowed for the reconstruction of each fragment, while still allowing a vanishing failure probability.

Image segmentation remains an open problem, especially when the intensities of the objects of interest overlap due to the presence of intensity inhomogeneity (also known as a bias field). To segment images with intensity inhomogeneities, a bias-correction-embedded level set model is proposed in which Inhomogeneities are Estimated by Orthogonal Primary Functions (IEOPF). In the proposed model, the smoothly varying bias is estimated by a linear combination of a given set of orthogonal primary functions. An inhomogeneous intensity clustering energy is then defined, and membership functions of the clusters described by the level set function are introduced to rewrite the energy as the data term of the proposed model. As in popular level set methods, a regularization term and an arc length term are also included to regularize and smooth the level set function, respectively. The proposed model is then extended to multichannel and multiphase patterns to segment colour images and images with multiple objects, respectively. It has been extensively tested on synthetic and real images that are widely used in the literature, as well as on the public BrainWeb and IBSR datasets. Experimental results and comparisons with state-of-the-art methods demonstrate the advantages of the proposed model in terms of bias correction and segmentation accuracy.
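To make the bias model concrete, the sketch below fits a smooth bias field as a linear combination of orthogonal (Legendre) primary functions by least squares; the choice of Legendre polynomials, the degree, and the plain least-squares fit are illustrative assumptions, and the full IEOPF model couples this estimate with the level-set clustering energy, which is not reproduced here.

```python
import numpy as np

def estimate_bias_field(image, degree=3):
    """Fit a smooth bias field to a 2-D image as a linear combination of
    orthogonal (Legendre) basis functions via least squares (illustrative only)."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    # normalize pixel coordinates to [-1, 1], the natural domain of Legendre polynomials
    xn = 2.0 * x / (w - 1) - 1.0
    yn = 2.0 * y / (h - 1) - 1.0
    basis = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            Pi = np.polynomial.legendre.Legendre.basis(i)(xn)
            Pj = np.polynomial.legendre.Legendre.basis(j)(yn)
            basis.append((Pi * Pj).ravel())          # separable 2-D basis function
    A = np.stack(basis, axis=1)                       # (num pixels, num basis functions)
    coef, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)                   # estimated smooth bias field
```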
