
The problem of recovering a signal $\boldsymbol x\in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol x^\top\boldsymbol A_i\boldsymbol x,\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol A_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol A_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol x$. First, we consider a $k$-sparse $\boldsymbol x$ and introduce the thresholded Wirtinger flow (TWF) algorithm, which does not require knowledge of the sparsity level $k$. TWF comprises two steps: a spectral initialization that identifies a point sufficiently close to $\boldsymbol x$ (up to a sign flip) when $m=O(k^2\log n)$, and a thresholded gradient descent which, given a good initialization, produces a sequence converging linearly to $\boldsymbol x$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol x$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. Starting from an estimate correlated with the signal, we develop the projected gradient descent (PGD) algorithm, which also comprises two steps: a projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and a projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that (i) our approach for the sparse case notably outperforms the existing provable algorithm, sparse power factorization, and (ii) leveraging the generative prior allows for precise image recovery on the MNIST dataset from a small number of quadratic measurements.
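To make the two-step structure concrete, here is a minimal numerical sketch, assuming the spectral initializer is the top eigenvector of the $y$-weighted average of the symmetrized measurement matrices and using a fixed soft-threshold in place of the adaptive thresholding rule; it is an illustration, not the paper's exact estimator.

```python
import numpy as np

def twf_sketch(A, y, iters=500, eta=0.02, tau=0.02):
    """Illustrative thresholded-Wirtinger-flow-style recovery for
    y_i = x^T A_i x with Gaussian A_i (recovery is up to a sign flip).
    The fixed relative threshold tau simplifies the adaptive rule."""
    m, n, _ = A.shape
    As = 0.5 * (A + A.transpose(0, 2, 1))        # symmetrize each A_i
    # Spectral initialization: (1/m) sum_i y_i As_i concentrates around x x^T.
    Y = np.tensordot(y, As, axes=1) / m
    w, V = np.linalg.eigh(Y)
    z = V[:, -1] * np.sqrt(max(w[-1], 0.0))      # top eigenpair as warm start
    for _ in range(iters):
        r = np.einsum('j,ijk,k->i', z, As, z) - y          # residuals
        grad = 2.0 / m * np.einsum('i,ijk,k->j', r, As, z) # least-squares gradient
        z = z - eta * grad
        z[np.abs(z) < tau * np.abs(z).max()] = 0.0         # kill small entries
    return z

# Tiny demo with m far below n(n+1)/2 quadratic measurements.
rng = np.random.default_rng(0)
n, m, k = 100, 60, 3
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n, n))
y = np.einsum('j,ijk,k->i', x, A, x)
x_hat = twf_sketch(A, y)
print(min(np.linalg.norm(x_hat - x), np.linalg.norm(x_hat + x)))  # error up to sign
```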

Related content

Martin-L\"{o}f type theory $\mathbf{MLTT}$ was extended by Setzer with the so-called Mahlo universe types. The extension of $\mathbf{MLTT}$ with one Mahlo universe, called $\mathbf{MLM}$, was introduced to develop a variant of $\mathbf{MLTT}$ equipped with an analogue of a large cardinal. Another instance of a constructive system extended with an analogue of a large set was formulated in the context of Aczel's constructive set theory $\mathbf{CZF}$: Rathjen, Griffor and Palmgren extended $\mathbf{CZF}$ with inaccessible sets of all transfinite orders. While Rathjen proved that this extension of $\mathbf{CZF}$ is interpretable in an extension of $\mathbf{MLM}$ with one ordinary universe type above the Mahlo universe, it was unknown whether it can be interpreted using the Mahlo universe alone, without a universe type above it. We extend $\mathbf{MLM}$ not with a universe type but with the accessibility predicate, and show that $\mathbf{CZF}$ with inaccessible sets can be interpreted in $\mathbf{MLM}$ with the accessibility predicate. Our interpretation of this extension of $\mathbf{CZF}$ is the same as that of Rathjen, Griffor and Palmgren, formulated in $\mathbf{MLTT}$ with second-order universe operators, except that we construct the inaccessible sets using the Mahlo universe and the accessibility predicate. We formalised the main part of our interpretation in the proof assistant Agda.
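The accessibility predicate used here is the standard inductive characterization of well-foundedness. For readers who have not seen it, the following Lean rendering (analogous to, but not taken from, the paper's Agda development) shows the idea; `Accessible` mirrors Lean's built-in `Acc`.

```lean
universe u

-- The accessibility predicate: an element is accessible under `r`
-- precisely when all of its `r`-predecessors are accessible.
inductive Accessible {α : Sort u} (r : α → α → Prop) : α → Prop where
  | intro (x : α) (h : ∀ y, r y x → Accessible r y) : Accessible r x

-- A relation is well-founded when every element is accessible;
-- induction over `Accessible` then justifies transfinite constructions.
def WellFoundedRel {α : Sort u} (r : α → α → Prop) : Prop :=
  ∀ x, Accessible r x
```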

Given a point set $P$ in a metric space and a real number $t \geq 1$, an \emph{oriented $t$-spanner} is an oriented graph $\overrightarrow{G}=(P,\overrightarrow{E})$, where for every pair of distinct points $p$ and $q$ in $P$, the shortest oriented closed walk in $\overrightarrow{G}$ that contains $p$ and $q$ is at most a factor $t$ longer than the perimeter of the smallest triangle in $P$ containing $p$ and $q$. The \emph{oriented dilation} of a graph $\overrightarrow{G}$ is the minimum $t$ for which $\overrightarrow{G}$ is an oriented $t$-spanner. We present the first algorithm that computes, in Euclidean space, a sparse oriented spanner whose oriented dilation is bounded by a constant. More specifically, for any set of $n$ points in $\mathbb{R}^d$, where $d$ is a constant, we construct an oriented $(2+\varepsilon)$-spanner with $\mathcal{O}(n)$ edges in $\mathcal{O}(n \log n)$ time and $\mathcal{O}(n)$ space. Our construction uses the well-separated pair decomposition and an algorithm that computes a $(1+\varepsilon)$-approximation of the minimum-perimeter triangle in $P$ containing two given query points in $\mathcal{O}(\log n)$ time. While our algorithm is based on first computing a suitable undirected graph and then orienting it, we show that, in general, computing the orientation of an undirected graph that minimises its oriented dilation is NP-hard, even for point sets in the Euclidean plane. We further prove that even if the orientation is already given, computing the oriented dilation is APSP-hard for points in a general metric space. We complement this result with an algorithm that approximates the oriented dilation of a given graph in subcubic time for point sets in $\mathbb{R}^d$, where $d$ is a constant.
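To make the definitions concrete, the following brute-force checker (a sketch, not the paper's construction) computes the oriented dilation of a given oriented graph on points in $\mathbb{R}^d$ directly from the definition, taking the smallest containing triangle to be the minimum-perimeter triangle with vertices $p$, $q$, and a third point of $P$.

```python
import numpy as np
from itertools import combinations

def oriented_dilation(points, arcs):
    """Oriented dilation straight from the definition: Floyd-Warshall for
    shortest directed paths, then for each pair compare the shortest
    closed walk through p and q with the minimum triangle perimeter.
    O(n^3) checker for illustration; requires at least 3 points."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for u, v in arcs:                       # arcs are oriented edges u -> v
        D[u, v] = np.linalg.norm(P[u] - P[v])
    for k in range(n):                      # all-pairs shortest directed paths
        D = np.minimum(D, D[:, k:k+1] + D[k:k+1, :])
    t = 0.0
    for p, q in combinations(range(n), 2):
        walk = D[p, q] + D[q, p]            # shortest closed walk through p, q
        peri = min(np.linalg.norm(P[p] - P[q]) + np.linalg.norm(P[q] - P[r])
                   + np.linalg.norm(P[r] - P[p])
                   for r in range(n) if r not in (p, q))
        t = max(t, walk / peri)
    return t

# A unit square oriented as a directed 4-cycle.
print(oriented_dilation([(0, 0), (1, 0), (1, 1), (0, 1)],
                        [(0, 1), (1, 2), (2, 3), (3, 0)]))
```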

The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
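As a rough illustration of the residual-connection idea (the module names, gating rule, and layer structure below are assumptions for the sketch, not the paper's implementation), a MoDULA-Res-style layer might add a universal LoRA expert and router-weighted domain-specific experts on top of a frozen base weight:

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """Rank-r adapter: x -> B(A(x)), added to the frozen layer's output."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.B.weight)      # adapter starts as a no-op

    def forward(self, x):
        return self.B(self.A(x))

class ResidualMoELayer(nn.Module):
    """Illustrative MoDULA-Res-style layer (an assumption, not the paper's
    code): frozen base weight + universal expert + router-weighted
    domain experts, combined additively via residual connections."""
    def __init__(self, base: nn.Linear, n_domains=4, rank=8):
        super().__init__()
        self.base = base.requires_grad_(False)         # frozen pretrained weight
        d_in, d_out = base.in_features, base.out_features
        self.universal = LoRAExpert(d_in, d_out, rank)
        self.domain = nn.ModuleList(LoRAExpert(d_in, d_out, rank)
                                    for _ in range(n_domains))
        self.router = nn.Linear(d_in, n_domains)       # trained in its own stage

    def forward(self, x):
        gates = torch.softmax(self.router(x), dim=-1)  # (..., n_domains)
        dom = sum(g.unsqueeze(-1) * e(x)
                  for g, e in zip(gates.unbind(-1), self.domain))
        return self.base(x) + self.universal(x) + dom
```

Because each expert enters the output additively, a new domain expert can be appended and trained while the base weight, the universal expert, and previously trained experts stay fixed, which is the pluggability property described above.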

Diffuse domain methods (DDMs) have garnered significant attention for approximating solutions to partial differential equations on complex geometries. These methods implicitly represent the geometry by replacing the sharp boundary interface with a diffuse layer of thickness $\varepsilon$, which scales with the minimum grid size. This approach reformulates the original equations on an extended regular domain, incorporating boundary conditions through singular source terms. In this work, we conduct a matched asymptotic analysis of a DDM for a two-sided problem with transmission Robin boundary conditions. Our results show that, in one dimension, the solution of the diffuse domain approximation asymptotically converges to the solution of the original problem, with exactly first-order accuracy in $\varepsilon$. We provide numerical simulations that validate and illustrate the analytical result. Furthermore, for the Neumann boundary condition case, we show that the associated energy functional of the diffuse domain approximation $\Gamma$-converges to the energy functional of the original problem, and the solution of the diffuse domain approximation strongly converges, up to a subsequence, to the solution of the original problem in $H^1(\Omega)$, as $\varepsilon \to 0$.
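For readers unfamiliar with the construction, a representative one-sided diffuse domain approximation with a Robin boundary condition takes the following form; the two-sided transmission problem analyzed in this work is more involved, so this is illustrative only.

```latex
% Sharp-interface problem:  -\Delta u = f in \Omega,
%                            \partial_n u + \alpha u = g on \partial\Omega.
% A representative diffuse domain approximation on a larger box
% D \supset \Omega, with phase field \varphi_\varepsilon \approx \chi_\Omega:
\begin{equation*}
  -\nabla \cdot \big( \varphi_\varepsilon \nabla u_\varepsilon \big)
  + \alpha\, u_\varepsilon \,\lvert \nabla \varphi_\varepsilon \rvert
  = \varphi_\varepsilon f + g \,\lvert \nabla \varphi_\varepsilon \rvert
  \qquad \text{in } D,
\end{equation*}
% where |\nabla\varphi_\varepsilon| concentrates in the diffuse layer of
% width \varepsilon and plays the role of the singular source term
% enforcing the boundary condition.
```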

We study the problem of solving matrix games of the form $\max_{\mathbf{w}\in\mathcal{W}}\min_{\mathbf{p}\in\Delta}\mathbf{p}^{\top}A\mathbf{w}$, where $A$ is some matrix and $\Delta$ is the probability simplex. This problem encapsulates canonical tasks such as finding a linear separator and computing Nash equilibria in zero-sum games. However, perhaps surprisingly, its inherent complexity (as formalized in the standard framework of oracle complexity [Nemirovski and Yudin, 1983]) is not well-understood. In this work, we first identify different oracle models which are implicitly used by prior algorithms, amounting to multiplying the matrix $A$ by a vector from either one or both sides. We then prove complexity lower bounds for algorithms under both access models, which in particular imply a separation between them. Specifically, we start by proving that algorithms for linear separability based on one-sided multiplications must require $\Omega(\gamma_A^{-2})$ iterations, where $\gamma_A$ is the margin, as matched by the Perceptron algorithm. We then prove that accelerated algorithms for this task, which utilize multiplications from both sides, must require $\tilde{\Omega}(\gamma_{A}^{-2/3})$ iterations, establishing the first oracle complexity barrier for such algorithms. Finally, by adapting our lower bound to $\ell_1$ geometry, we prove that computing an $\epsilon$-approximate Nash equilibrium requires $\tilde{\Omega}(\epsilon^{-2/5})$ iterations, which is an exponential improvement over the previously best-known lower bound due to Hadiji et al. [2024].
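As a concrete instance of an algorithm that accesses $A$ only through products with a vector, here is a sketch of the classical Perceptron for linear separability, whose $O(\gamma_A^{-2})$ iteration bound matches the lower bound above (the greedy row selection below is one common variant).

```python
import numpy as np

def perceptron(A, max_iters=10000):
    """Classical Perceptron for finding w with A w > 0, where the rows
    of A are label-signed examples.  Each iteration touches A only
    through vector products; with margin gamma_A it halts within
    O(gamma_A^{-2}) updates.  A sketch for illustration."""
    m, n = A.shape
    w = np.zeros(n)
    for _ in range(max_iters):
        margins = A @ w                    # one multiplication of A by w
        i = np.argmin(margins)
        if margins[i] > 0:                 # every example strictly separated
            return w / np.linalg.norm(w)
        w += A[i]                          # update with the violated row
    raise RuntimeError("no separator found within max_iters")
```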

Direct sum theorems state that the cost of solving $k$ instances of a problem is at least $\Omega(k)$ times the cost of solving a single instance. We prove the first such results in the randomised parity decision tree model. We show that a direct sum theorem holds whenever (1) the lower bound for parity decision trees is proved using the discrepancy method; or (2) the lower bound is proved relative to a product distribution.

The hypergraph Zarankiewicz's problem, introduced by Erd\H{o}s in 1964, asks for the maximum number of hyperedges in an $r$-partite hypergraph with $n$ vertices in each part that does not contain a copy of $K_{t,t,\ldots,t}$. Erd\H{o}s obtained a near optimal bound of $O(n^{r-1/t^{r-1}})$ for general hypergraphs. In recent years, several works obtained improved bounds under various algebraic assumptions -- e.g., if the hypergraph is semialgebraic. In this paper we study the problem in a geometric setting -- for $r$-partite intersection hypergraphs of families of geometric objects. Our main results are essentially sharp bounds for families of axis-parallel boxes in $\mathbb{R}^d$ and families of pseudo-discs. For axis-parallel boxes, we obtain the sharp bound $O_{d,t}(n^{r-1}(\frac{\log n}{\log \log n})^{d-1})$. The best previous bound was larger by a factor of about $(\log n)^{d(2^{r-1}-2)}$. For pseudo-discs, we obtain the bound $O_t(n^{r-1}(\log n)^{r-2})$, which is sharp up to logarithmic factors. As this hypergraph has no algebraic structure, no improvement of Erd\H{o}s' 60-year-old $O(n^{r-1/t^{r-1}})$ bound was known for this setting. Furthermore, even in the special case of discs, for which the semialgebraic structure can be used, our result improves the best known result by a factor of $\tilde{\Omega}(n^{\frac{2r-2}{3r-2}})$. To obtain our results, we use the recently improved results for the graph Zarankiewicz's problem in the corresponding settings, along with a variety of combinatorial and geometric techniques, including shallow cuttings, biclique covers, transversals, and planarity.

In this paper, we describe an algorithm for approximating functions of the form $f(x)=\int_{a}^{b} x^{\mu} \sigma(\mu) \, d \mu$ over $[0,1]$, where $\sigma(\mu)$ is some signed Radon measure, or, more generally, of the form $f(x) = \langle \sigma(\mu),\, x^\mu \rangle$, where $\sigma(\mu)$ is some distribution supported on $[a,b]$, with $0 <a < b < \infty$. One example from this class of functions is $x^c (\log{x})^m=(-1)^m \langle \delta^{(m)}(\mu-c), \, x^\mu \rangle$, where $a\leq c \leq b$ and $m \geq 0$ is an integer. Given the desired accuracy $\epsilon$ and the values of $a$ and $b$, our method determines a priori a collection of non-integer powers $t_1$, $t_2$, $\ldots$, $t_N$, so that the functions are approximated by series of the form $f(x)\approx \sum_{j=1}^N c_j x^{t_j}$, and a set of collocation points $x_1$, $x_2$, $\ldots$, $x_N$, such that the expansion coefficients can be found by collocating the function at these points. We prove that the uniform approximation error of our method is proportional to $\epsilon$, up to small constants, and that the number of singular powers and collocation points grows as $N=O(\log{\frac{1}{\epsilon}})$. We demonstrate the performance of our algorithm with several numerical experiments.
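Once the powers $t_j$ and collocation points $x_j$ are fixed (determining them a priori from $\epsilon$, $a$, and $b$ is the paper's contribution; below they are plain inputs), recovering the coefficients is a linear solve. A sketch:

```python
import numpy as np

def fit_power_series(f, powers, nodes):
    """Fit f(x) ~ sum_j c_j x^{t_j} by collocation: build V_ij = x_i^{t_j}
    and solve V c = f(x).  Choosing powers/nodes well is the paper's
    method; here they are given inputs.  lstsq guards against the
    (possibly severe) ill-conditioning of V."""
    t = np.asarray(powers, dtype=float)
    x = np.asarray(nodes, dtype=float)
    V = x[:, None] ** t[None, :]                   # collocation matrix
    c = np.linalg.lstsq(V, f(x), rcond=None)[0]    # expansion coefficients
    return lambda s: (np.asarray(s, dtype=float)[..., None] ** t) @ c

# Example from the abstract's function class: x^c (log x)^m with c = 0.5,
# m = 1 (the power/node choices below are ad hoc, not the paper's).
f = lambda x: np.sqrt(x) * np.log(x)
approx = fit_power_series(f, np.geomspace(0.05, 3.0, 24),
                          np.geomspace(1e-4, 1.0, 24))
xs = np.geomspace(1e-3, 1.0, 5)
print(np.abs(approx(xs) - f(xs)))
```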

Augmenting a smooth cost function with an $\ell_1$ penalty allows analysts to conduct estimation and variable selection simultaneously in sophisticated models, and the resulting objective can be optimized efficiently using proximal gradient methods. However, one drawback of the $\ell_1$ penalty is bias: nonzero parameters are underestimated in magnitude, motivating techniques such as the Adaptive Lasso, which endow each parameter with its own penalty coefficient. It is not clear, though, how these parameter-specific penalties should be set in complex models. In this article, we study the approach of treating the penalty coefficients as additional decision variables to be learned in a \textit{Maximum a Posteriori} manner, developing a proximal gradient approach to the joint optimization of these coefficients together with the parameters of any differentiable cost function. Beyond reducing bias in estimates, this procedure can also encourage arbitrary sparsity structure via a prior on the penalty coefficients. We compare our method to implementations of specific sparsity structures for non-Gaussian regression on synthetic and real datasets, finding our more general method to be competitive in terms of both speed and accuracy. We then consider nonlinear models for two case studies, COVID-19 vaccination behavior and international refugee movement, highlighting the applicability of this approach to complex problems and intricate sparsity structures.
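A minimal sketch of the joint update under an assumed Gamma prior on each penalty coefficient (the loss, prior, and step sizes below are illustrative placeholders, not the article's exact objective): alternate a proximal gradient step on the parameters, soft-thresholding each coordinate with its own penalty, with a gradient step on the penalties.

```python
import numpy as np

def soft_threshold(v, thr):
    """Proximal operator of thr * |.| (coordinate-wise)."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def joint_prox_gradient(X, y, a=2.0, b=0.1, iters=500):
    """Sketch of jointly learning coefficients beta and per-coordinate
    penalties lam by proximal gradient on the MAP objective
        0.5*||y - X beta||^2 + sum_j lam_j |beta_j| - log p(lam),
    with an assumed Gamma(a, b) prior on each lam_j.  Illustrative only."""
    n, p = X.shape
    eta = 1.0 / np.linalg.norm(X, 2) ** 2          # step from Lipschitz bound
    beta, lam = np.zeros(p), np.ones(p)
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)                # smooth-loss gradient
        beta = soft_threshold(beta - eta * grad, eta * lam)
        # Gradient step on lam_j*|beta_j| - (a-1)*log(lam_j) + b*lam_j,
        # then clip to keep penalties positive; small |beta_j| thus
        # attract large penalties, in the spirit of the Adaptive Lasso.
        lam -= 0.5 * (np.abs(beta) - (a - 1.0) / lam + b)
        lam = np.clip(lam, 1e-6, None)
    return beta, lam
```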

We propose a novel method that solves global optimization problems in two steps: (1) perform an (exponential) power-$N$ transformation of the not-necessarily-differentiable objective function $f$ to obtain $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, for any $\delta>0$, we prove that with a sufficiently large power $N_\delta$, this method converges to a solution in the $\delta$-neighborhood of the global maximum point of $f$. The convergence rate is $O(d^2\sigma^4\varepsilon^{-2})$, which is faster than that of both the standard and the single-loop homotopy methods. Extensive experiments show that our method requires significantly fewer iterations than the compared algorithms to produce a high-quality solution.
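A toy rendering of the two steps, assuming the transform is $f_N = e^{Nf}$ and using a self-normalized score-function estimator of the smoothed gradient (all constants below are placeholders):

```python
import numpy as np

def smoothed_power_ascent(f, x0, N=20.0, sigma=0.5, eta=0.05,
                          batch=64, iters=2000, seed=0):
    """Toy version of the two-step scheme: (1) exponential power-N
    transform f_N = exp(N f) (our reading of the transform; an
    assumption), (2) stochastic ascent on the Gaussian smoothing of f_N
    via the score-function estimator
        grad E[f_N(x + sigma*u)] ~ E[f_N(x + sigma*u) * u] / sigma,
    self-normalized over the batch for numerical stability."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        u = rng.standard_normal((batch, x.size))
        vals = np.exp(N * np.array([f(x + sigma * ui) for ui in u]))
        vals /= vals.sum()                  # softmax-like reweighting
        x += eta * (vals @ u)               # ascend the smoothed surrogate
    return x

# Example: a multimodal 2-D objective with global maximum at the origin.
f = lambda z: -np.linalg.norm(z)**2 + 0.5 * np.cos(3 * z).sum()
print(smoothed_power_ascent(f, x0=np.array([2.0, -2.0])))
```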
