亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We show that under minimal assumptions on a random vector $X\in\mathbb{R}^d$ and with high probability, given $m$ independent copies of $X$, the coordinate distribution of each vector $(\langle X_i,\theta \rangle)_{i=1}^m$ is dictated by the distribution of the true marginal $\langle X,\theta \rangle$. Specifically, we show that with high probability, \[\sup_{\theta \in S^{d-1}} \left( \frac{1}{m}\sum_{i=1}^m \left|\langle X_i,\theta \rangle^\sharp - \lambda^\theta_i \right|^2 \right)^{1/2} \leq c \left( \frac{d}{m} \right)^{1/4},\] where $\lambda^{\theta}_i = m\int_{(\frac{i-1}{m}, \frac{i}{m}]} F_{ \langle X,\theta \rangle }^{-1}(u)\,du$ and $a^\sharp$ denotes the monotone non-decreasing rearrangement of $a$. Moreover, this estimate is optimal. The proof follows from a sharp estimate on the worst Wasserstein distance between a marginal of $X$ and its empirical counterpart, $\frac{1}{m} \sum_{i=1}^m \delta_{\langle X_i, \theta \rangle}$.

相關內容

Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be $\mathcal{P}$-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions $\mathcal{P}$. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in $\mathcal{P}$ and in the sample size $n$. This paper fills that gap by deriving a novel $\mathcal{P}$-uniform strong Gaussian approximation theorem, enabling $\mathcal{P}$-uniform anytime-valid inference for the first time. Along the way, our Gaussian approximation also yields a $\mathcal{P}$-uniform law of the iterated logarithm.

We consider the problem of maintaining a $(1+\epsilon)\Delta$-edge coloring in a dynamic graph $G$ with $n$ nodes and maximum degree at most $\Delta$. The state-of-the-art update time is $O_\epsilon(\text{polylog}(n))$, by Duan, He and Zhang [SODA'19] and by Christiansen [STOC'23], and more precisely $O(\log^7 n/\epsilon^2)$, where $\Delta = \Omega(\log^2 n / \epsilon^2)$. The following natural question arises: What is the best possible update time of an algorithm for this task? More specifically, \textbf{ can we bring it all the way down to some constant} (for constant $\epsilon$)? This question coincides with the \emph{static} time barrier for the problem: Even for $(2\Delta-1)$-coloring, there is only a naive $O(m \log \Delta)$-time algorithm. We answer this fundamental question in the affirmative, by presenting a dynamic $(1+\epsilon)\Delta$-edge coloring algorithm with $O(\log^4 (1/\epsilon)/\epsilon^9)$ update time, provided $\Delta = \Omega_\epsilon(\text{polylog}(n))$. As a corollary, we also get the first linear time (for constant $\epsilon$) \emph{static} algorithm for $(1+\epsilon)\Delta$-edge coloring; in particular, we achieve a running time of $O(m \log (1/\epsilon)/\epsilon^2)$. We obtain our results by carefully combining a variant of the \textsc{Nibble} algorithm from Bhattacharya, Grandoni and Wajc [SODA'21] with the subsampling technique of Kulkarni, Liu, Sah, Sawhney and Tarnawski [STOC'22].

Given a definable function $f: S \to \mathbb{R}$ on a definable set $S$, we study sublevel sets of the form $S^f_t = \{x \in S: f(x) \leq t\}$ for all $t \in \mathbb{R}$. Using o-minimal structures, we prove that the Euler characteristic of $S^f_t$ is right continuous with respect to $t$. Furthermore, when $S$ is compact, we show that $S^f_{t+\delta}$ deformation retracts to $S^f_t$ for all sufficiently small $\delta > 0$. Applying these results, we also characterize the connections between the following concepts in topological data analysis: the Euler characteristic transform (ECT), smooth ECT, Euler-Radon transform (ERT), and smooth ERT.

In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that in general, it is computationally hard to answer more than $\Theta(n^2)$ adaptive queries, assuming the existence of one-way functions. However, these negative results strongly rely on an adversarial model that significantly advantages the adversarial analyst over the mechanism, as the analyst, who chooses the adaptive queries, also chooses the underlying distribution $D$. This imbalance raises questions with respect to the applicability of the obtained hardness results -- an analyst who has complete knowledge of the underlying distribution $D$ would have little need, if at all, to issue statistical queries to a mechanism which only holds a finite number of samples from $D$. We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separated algorithms: The \emph{sampler} who is the entity that chooses the distribution and provides the samples to the mechanism, and the \emph{analyst} who chooses the adaptive queries, but has no prior knowledge of the underlying distribution (and hence has no a priori advantage with respect to the mechanism). We improve the quality of previous lower bounds by revisiting them using an efficient \emph{balanced} adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded \emph{balanced} adversary that has the structure of all known attacks, implies the existence of public-key cryptography.

A $k$-uniform hypergraph $H = (V, E)$ is $k$-partite if $V$ can be partitioned into $k$ sets $V_1, \ldots, V_k$ such that every edge in $E$ contains precisely one vertex from each $V_i$. We call such a graph $n$-balanced if $|V_i| = n$ for each $i$. An independent set $I$ in $H$ is balanced if $|I\cap V_i| = |I|/k$ for each $i$, and a coloring is balanced if each color class induces a balanced independent set in $H$. In this paper, we provide a lower bound on the balanced independence number $\alpha_b(H)$ in terms of the average degree $D = |E|/n$, and an upper bound on the balanced chromatic number $\chi_b(H)$ in terms of the maximum degree $\Delta$. Our results match those of recent work of Chakraborti for $k = 2$.

We obtain certain algebraic invariants relevant to study codes on subgroups of weighted projective tori inside an $n$-dimensional weighted projective space. As application, we compute all the main parameters of generalized toric codes on these subgroups of tori lying inside a weighted projective plane of the form $\Pp(1,1,a)$.

Any continuous function $f^*$ can be approximated arbitrarily well by a neural network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. Approximating $f^*$ with a neural network with $n< k$ neurons can thus be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons. As the student has fewer neurons than the teacher, it is unclear, whether each of the $n$ student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that "copy-average" configurations are critical points if the teacher's incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when $n-1$ student neurons each copy one teacher neuron and the $n$-th student neuron averages the remaining $k-n+1$ teacher neurons. For the student network with $n=1$ neuron, we provide additionally a closed-form solution of the non-trivial critical point(s) for commonly used activation functions through solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of underparameterized networks has a universal structure.

When the unknown regression function of a single variable is known to have derivatives up to the $(\gamma+1)$th order bounded in absolute values by a common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness), the minimax optimal rate of the mean integrated squared error (MISE) is stated as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature. This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal degree of smoothness to exploit is roughly $\max\left\{ \left\lfloor \frac{\log n}{2\log\left(\log n\right)}\right\rfloor ,\,1\right\} $; (ii) if $n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree of smoothness to exploit is $\gamma+1$. The fundamental contribution of this paper is a set of metric entropy bounds we develop for smooth function classes. Some of our bounds are original, and some of them improve and/or generalize the ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric entropy bounds allow us to show phase transitions in the minimax optimal MISE rates associated with some commonly seen smoothness classes as well as non-standard smoothness classes, and can also be of independent interest outside the nonparametric regression problems.

We present a novel stochastic variational Gaussian process ($\mathcal{GP}$) inference method, based on a posterior over a learnable set of weighted pseudo input-output points (coresets). Instead of a free-form variational family, the proposed coreset-based, variational tempered family for $\mathcal{GP}$s (CVTGP) is defined in terms of the $\mathcal{GP}$ prior and the data-likelihood; hence, accommodating the modeling inductive biases. We derive CVTGP's lower bound for the log-marginal likelihood via marginalization of the proposed posterior over latent $\mathcal{GP}$ coreset variables, and show it is amenable to stochastic optimization. CVTGP reduces the learnable parameter size to $\mathcal{O}(M)$, enjoys numerical stability, and maintains $\mathcal{O}(M^3)$ time- and $\mathcal{O}(M^2)$ space-complexity, by leveraging a coreset-based tempered posterior that, in turn, provides sparse and explainable representations of the data. Results on simulated and real-world regression problems with Gaussian observation noise validate that CVTGP provides better evidence lower-bound estimates and predictive root mean squared error than alternative stochastic $\mathcal{GP}$ inference methods.

We investigate the relation between $\delta$ and $\epsilon$ required for obtaining a $(1+\delta)$-approximation in time $N^{2-\epsilon}$ for closest pair problems under various distance metrics, and for other related problems in fine-grained complexity. Specifically, our main result shows that if it is impossible to (exactly) solve the (bichromatic) inner product (IP) problem for vectors of dimension $c \log N$ in time $N^{2-\epsilon}$, then there is no $(1+\delta)$-approximation algorithm for (bichromatic) Euclidean Closest Pair running in time $N^{2-2\epsilon}$, where $\delta \approx (\epsilon/c)^2$ (where $\approx$ hides $\polylog$ factors). This improves on the prior result due to Chen and Williams (SODA 2019) which gave a smaller polynomial dependence of $\delta$ on $\epsilon$, on the order of $\delta \approx (\epsilon/c)^6$. Our result implies in turn that no $(1+\delta)$-approximation algorithm exists for Euclidean closest pair for $\delta \approx \epsilon^4$, unless an algorithmic improvement for IP is obtained. This in turn is very close to the approximation guarantee of $\delta \approx \epsilon^3$ for Euclidean closest pair, given by the best known algorithm of Almam, Chan, and Williams (FOCS 2016). By known reductions, a similar result follows for a host of other related problems in fine-grained hardness of approximation. Our reduction combines the hardness of approximation framework of Chen and Williams, together with an MA communication protocol for IP over a small alphabet, that is inspired by the MA protocol of Chen (Theory of Computing, 2020).

北京阿比特科技有限公司