亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the complexity of small-depth Frege proofs and give the first tradeoffs between the size of each line and the number of lines. Existing lower bounds apply to the overall proof size -- the sum of sizes of all lines -- and do not distinguish between these notions of complexity. For depth-$d$ Frege proofs of the Tseitin principle where each line is a size-$s$ formula, we prove that $\exp(n/2^{\Omega(d\sqrt{\log s})})$ many lines are necessary. This yields new lower bounds on line complexity that are not implied by H{\aa}stad's recent $\exp(n^{\Omega(1/d)})$ lower bound on the overall proof size. For $s = \mathrm{poly}(n)$, for example, our lower bound remains $\exp(n^{1-o(1)})$ for all $d = o(\sqrt{\log n})$, whereas H{\aa}stad's lower bound is $\exp(n^{o(1)})$ once $d = \omega_n(1)$. Our main conceptual contribution is the simple observation that techniques for establishing correlation bounds in circuit complexity can be leveraged to establish such tradeoffs in proof complexity.

相關內容

We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that as the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $\Omega(\sigma\sqrt{T / \log (T)}) \cap \mathcal{O}(\sigma\sqrt{T} \cdot \log T)$ and $\Omega(\sigma / \sqrt{T \log (T)}) \cap \mathcal{O}(\sigma\log T / \sqrt{T})$ respectively, where $\sigma^2$ is the noise variance. Thus, our upper and lower bounds are tight up to a factor of $\mathcal{O}( (\log T)^{1.5} )$. The upper bound uses an algorithm based on confidence bounds and the Markov property of Brownian motion (among other useful properties), and the lower bound is based on a reduction to binary hypothesis testing.

In this paper, we study code-based signatures constructed from Proof of Knowledge (PoK). This line of work can be traced back to Stern who introduces the first efficient PoK for the syndrome decoding problem in 1993. Afterward, different variations were proposed in order to reduce signature's size. In practice, obtaining a smaller signature size relies on the interaction of two main considerations: (i) the underlying protocol and its soundness error and (ii) the type of optimizations which are compatible with a given protocol. Over the years, different variations were proposed to improve the Stern scheme such as the Veron scheme (with public key a noisy codeword rather than a syndrome), the AGS scheme which is a 5-pass protocol with cheating probability asymptotically equal to 1/2 and more recently the FJR approach which permits to decrease the cheating probability to 1/N but induces a performance overhead. Overall the length of the signature depends on a trade-off between: the scheme in itself, the possible optimizations and the cost of the implementation. The recent approaches which increase the cost of the implementation opens the door to many different type of trade-offs. In this paper we propose three new schemes and different trade-offs, which are all interesting in themselves, since depending on potential future optimizations a scheme may eventually become more efficient than another. All the schemes we propose use a trusted helper: a first scheme permits to get a 1/2 cheating probability, a second scheme permits to decrease the cheating probability in 1/N but with a different approach than the recent FJR scheme and at last a third scheme propose a Veron-like adaptation of the FJR scheme in which the public key is a noisy codeword rather than a syndrome. We provide an extensive comparison table which lists various trade-offs between our schemes and previous ones.

Given polynomials $f_0,\dots, f_k$ the Ideal Membership Problem, IMP for short, asks if $f_0$ belongs to the ideal generated by $f_1,\dots, f_k$. In the search version of this problem the task is to find a proof of this fact. The IMP is a well-known fundamental problem with numerous applications, for instance, it underlies many proof systems based on polynomials such as Nullstellensatz, Polynomial Calculus, and Sum-of-Squares. Although the IMP is in general intractable, in many important cases it can be efficiently solved. Mastrolilli [SODA'19] initiated a systematic study of IMPs for ideals arising from Constraint Satisfaction Problems (CSPs), parameterized by constraint languages, denoted IMP($\Gamma$). The ultimate goal of this line of research is to classify all such IMPs accordingly to their complexity. Mastrolilli achieved this goal for IMPs arising from CSP($\Gamma$) where $\Gamma$ is a Boolean constraint language, while Bulatov and Rafiey [ArXiv'21] advanced these results to several cases of CSPs over finite domains. In this paper we consider IMPs arising from CSPs over `affine' constraint languages, in which constraints are subgroups (or their cosets) of direct products of Abelian groups. This kind of CSPs include systems of linear equations and are considered one of the most important types of tractable CSPs. Some special cases of the problem have been considered before by Bharathi and Mastrolilli [MFCS'21] for linear equation modulo 2, and by Bulatov and Rafiey [ArXiv'21] to systems of linear equations over $GF(p)$, $p$ prime. Here we prove that if $\Gamma$ is an affine constraint language then IMP($\Gamma$) is solvable in polynomial time assuming the input polynomial has bounded degree.

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergece-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.

Contrastive learning (CL) is a popular technique for self-supervised learning (SSL) of visual representations. It uses pairs of augmentations of unlabeled training examples to define a classification task for pretext learning of a deep embedding. Despite extensive works in augmentation procedures, prior works do not address the selection of challenging negative pairs, as images within a sampled batch are treated independently. This paper addresses the problem, by introducing a new family of adversarial examples for constrastive learning and using these examples to define a new adversarial training algorithm for SSL, denoted as CLAE. When compared to standard CL, the use of adversarial examples creates more challenging positive pairs and adversarial training produces harder negative pairs by accounting for all images in a batch during the optimization. CLAE is compatible with many CL methods in the literature. Experiments show that it improves the performance of several existing CL baselines on multiple datasets.

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.

Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.

Quantum machine learning is expected to be one of the first potential general-purpose applications of near-term quantum devices. A major recent breakthrough in classical machine learning is the notion of generative adversarial training, where the gradients of a discriminator model are used to train a separate generative model. In this work and a companion paper, we extend adversarial training to the quantum domain and show how to construct generative adversarial networks using quantum circuits. Furthermore, we also show how to compute gradients -- a key element in generative adversarial network training -- using another quantum circuit. We give an example of a simple practical circuit ansatz to parametrize quantum machine learning models and perform a simple numerical experiment to demonstrate that quantum generative adversarial networks can be trained successfully.

We consider the task of learning the parameters of a {\em single} component of a mixture model, for the case when we are given {\em side information} about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main contributions are the development of a simple but general model for the notion of side information, and a corresponding simple matrix-based algorithm for solving the search problem in this general setting. We then specialize this model and algorithm to four common scenarios: Gaussian mixture models, LDA topic models, subspace clustering, and mixed linear regression. For each one of these we show that if (and only if) the side information is informative, we obtain parameter estimates with greater accuracy, and also improved computation complexity than existing moment based mixture model algorithms (e.g. tensor methods). We also illustrate several natural ways one can obtain such side information, for specific problem instances. Our experiments on real data sets (NY Times, Yelp, BSDS500) further demonstrate the practicality of our algorithms showing significant improvement in runtime and accuracy.

北京阿比特科技有限公司