
In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for constructing Lagrangian-based methods, which perform a single-step update of the primal variables via some subgradient method in each iteration. These subgradient methods are ``embedded'' into our framework, in the sense that they are incorporated as black-box updates to the primal variables. We prove that our proposed framework inherits the global convergence guarantees of these embedded subgradient methods under mild conditions. In addition, we show that our framework can be extended to solve constrained optimization problems with expectation constraints. Based on the proposed framework, we show that a wide range of existing stochastic subgradient methods, including proximal SGD, proximal momentum SGD, and proximal ADAM, can be embedded into Lagrangian-based methods. Preliminary numerical experiments on deep learning tasks illustrate that our proposed framework yields efficient variants of Lagrangian-based methods with convergence guarantees for nonsmooth nonconvex constrained optimization problems.
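To make the single-loop structure concrete, here is a minimal sketch, not the paper's exact method: each iteration forms a subgradient of the augmented Lagrangian $L(x,\lambda)=f(x)+\lambda^\top c(x)+\tfrac{\rho}{2}\|c(x)\|^2$, applies one embedded proximal-SGD step to the primal variables, and then takes one dual ascent step on the multipliers. All names (`prox`, `rho`, the step sizes) are hypothetical placeholders.

```python
import numpy as np

def proximal_sgd_step(x, grad, lr, prox):
    """One black-box primal update: a gradient step followed by a prox/projection."""
    return prox(x - lr * grad)

def embedded_lagrangian_method(f_grad, c, c_jac, prox, x0, lam0,
                               rho=10.0, lr=1e-2, dual_lr=1e-2, iters=1000):
    """Single-loop augmented-Lagrangian scheme with an embedded subgradient update.

    Each iteration takes ONE primal step (the embedded method, used as a black
    box) followed by one dual ascent step on the multipliers.
    """
    x, lam = x0.copy(), lam0.copy()
    for _ in range(iters):
        J = c_jac(x)                                 # (sub)Jacobian of c at x
        grad_L = f_grad(x) + J.T @ (lam + rho * c(x))
        x = proximal_sgd_step(x, grad_L, lr, prox)   # embedded primal update
        lam = lam + dual_lr * c(x)                   # dual ascent step
    return x, lam
```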

Related content

A nearest neighbor representation of a Boolean function $f$ is a set of vectors (anchors) labeled by $0$ or $1$ such that $f(\vec{x}) = 1$ if and only if the closest anchor to $\vec{x}$ is labeled by $1$. This model was introduced by Hajnal, Liu, and Tur\'an (2022), who studied bounds on the number of anchors required to represent Boolean functions under different choices of anchors (real vs. Boolean vectors) as well as the more expressive model of $k$-nearest neighbors. We initiate the study of the representational power of nearest and $k$-nearest neighbors through Boolean circuit complexity. To this end, we establish a connection between Boolean functions with polynomial nearest neighbor complexity and those that can be efficiently represented by classes based on linear inequalities -- min-plus polynomial threshold functions -- previously studied in relation to threshold circuits. This extends an observation of Hajnal et al. (2022). We obtain exponential lower bounds on the $k$-nearest neighbors complexity of explicit $n$-variate functions, assuming $k \leq n^{1-\epsilon}$. Previously, no superlinear lower bound was known for any $k>1$. Next, we further extend the connection between nearest neighbor representations and circuits to the $k$-nearest neighbors case. As a result, we show that proving superpolynomial lower bounds for the $k$-nearest neighbors complexity of an explicit function for arbitrary $k$ would require a breakthrough in circuit complexity. In addition, we prove an exponential separation between the nearest neighbor and $k$-nearest neighbors complexity (for unrestricted $k$) of an explicit function. These results address questions raised by Hajnal et al. (2022) of proving strong lower bounds for $k$-nearest neighbors and understanding the role of the parameter $k$. Finally, we devise new bounds on the nearest neighbor complexity for several explicit functions.
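To make the model concrete, here is a small self-contained check (our illustration, not from the paper) that two real anchors suffice to represent the $n$-variate AND function under the nearest neighbor rule: the all-ones vector labeled $1$, and the vector with all coordinates $1-1/n$ labeled $0$.

```python
import numpy as np

def nn_represent(anchors, labels, x):
    """Evaluate a nearest-neighbor representation: return the label of the
    anchor closest to x in Euclidean distance (ties broken by first index)."""
    d = np.linalg.norm(anchors - x, axis=1)
    return labels[int(np.argmin(d))]

# Two real anchors for n-variate AND: f(x) = 1 iff x is the all-ones vector.
n = 4
anchors = np.vstack([np.ones(n), np.full(n, 1 - 1.0 / n)])
labels = [1, 0]
for bits in np.ndindex(*(2,) * n):
    x = np.array(bits, dtype=float)
    assert nn_represent(anchors, labels, x) == int(x.min() == 1.0)
```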

We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speedup, allowing many more Monte Carlo samples to be evaluated and hence better accuracy to be achieved. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
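For reference, the plain Monte Carlo estimator that such methods accelerate can be sketched as follows. This naive version refits a least-squares model per feature subset rather than using the paper's computational tricks, and it assumes the data are centered so that the empty model has baseline $R^2 = 0$; the marginal gains along each random feature ordering telescope to the full-model $R^2$.

```python
import numpy as np

def r2_of_subset(X_tr, y_tr, X_te, y_te, S):
    """Out-of-sample R^2 of a least-squares fit on feature subset S.
    Assumes y_te is centered, so total sum of squares is y_te @ y_te."""
    if len(S) == 0:
        return 0.0
    beta, *_ = np.linalg.lstsq(X_tr[:, S], y_tr, rcond=None)
    resid = y_te - X_te[:, S] @ beta
    return 1.0 - resid @ resid / (y_te @ y_te)

def shapley_r2_mc(X_tr, y_tr, X_te, y_te, num_perms=200, seed=0):
    """Monte Carlo Shapley attribution of out-of-sample R^2: average each
    feature's marginal R^2 gain over random feature orderings."""
    rng = np.random.default_rng(seed)
    p = X_tr.shape[1]
    attr = np.zeros(p)
    for _ in range(num_perms):
        perm = rng.permutation(p)
        prev = 0.0
        for i, j in enumerate(perm):
            cur = r2_of_subset(X_tr, y_tr, X_te, y_te, list(perm[: i + 1]))
            attr[j] += cur - prev
            prev = cur
    return attr / num_perms
```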

We provide a self-contained proof of a result of Dudley [Dud64], which shows that a bounded convex body in $\mathbb{R}^d$ can be $\varepsilon$-approximated by the intersection of $O_d\bigl(\varepsilon^{-(d-1)/2}\bigr)$ halfspaces, where $O_d$ hides constants that depend on $d$.
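For intuition in the plane ($d=2$, where the bound reads $O(\varepsilon^{-1/2})$), a circumscribed regular $k$-gon with $k \approx \pi/\sqrt{2\varepsilon}$ already $\varepsilon$-approximates the unit disk. The sketch below is our illustration, valid for small $\varepsilon$; it constructs the tangent halfspaces $a^\top x \le 1$.

```python
import numpy as np

def disk_halfspace_cover(eps):
    """Approximate the unit disk by a circumscribed regular k-gon.
    The Hausdorff error is sec(pi/k) - 1 ~ pi^2 / (2 k^2), so choosing
    k ~ pi / sqrt(2 eps) suffices -- matching Dudley's O(eps^{-1/2}) for d = 2."""
    k = int(np.ceil(np.pi / np.sqrt(2 * eps)))
    thetas = 2 * np.pi * np.arange(k) / k
    normals = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    return normals, k          # halfspaces: normals @ x <= 1

normals, k = disk_halfspace_cover(1e-4)
print(k)                       # 223 halfspaces for eps = 1e-4
```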

We provide a framework to analyze the convergence of discretized kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-convex potentials. Our approach gives convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration schemes which are popular in the molecular dynamics and machine learning communities. Further, we introduce the property ``$\gamma$-limit convergent'' (GLC) to characterize underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and which have stepsize restrictions that are independent of the friction parameter; we show that this property is not generic by exhibiting methods from both the class and its complement. Finally, we provide asymptotic bias estimates for the BAOAB scheme, which remain accurate in the high-friction limit by comparison to a modified stochastic dynamics which preserves the invariant measure.
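For concreteness, the BAOAB scheme mentioned above splits the dynamics into half kicks (B), half drifts (A), and an exact Ornstein-Uhlenbeck step (O). A minimal unit-mass implementation (our sketch, with friction `gamma` and inverse temperature `beta`):

```python
import numpy as np

def baoab(grad_U, x, v, h, gamma, n_steps, beta=1.0, rng=None):
    """BAOAB splitting for kinetic Langevin dynamics (unit mass):
        dx = v dt,  dv = -grad_U(x) dt - gamma v dt + sqrt(2 gamma / beta) dW.
    Steps per iteration: B (half kick), A (half drift), O (exact OU), A, B."""
    rng = rng or np.random.default_rng()
    c1 = np.exp(-gamma * h)
    c2 = np.sqrt((1 - c1 ** 2) / beta)
    for _ in range(n_steps):
        v -= 0.5 * h * grad_U(x)                            # B
        x += 0.5 * h * v                                    # A
        v = c1 * v + c2 * rng.standard_normal(x.shape)      # O
        x += 0.5 * h * v                                    # A
        v -= 0.5 * h * grad_U(x)                            # B
    return x, v
```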

Self-supervised contrastive learning, which directly extracts inherent data correlations from unlabeled data, has been widely utilized to mitigate the data sparsity issue in sequential recommendation. The majority of existing methods create different augmented views of the same user sequence via random augmentation, and subsequently minimize their distance in the embedding space to enhance the quality of user representations. However, random augmentation often disrupts the semantic information and interest evolution pattern inherent in the user sequence, leading to the generation of semantically distinct augmented views. Promoting similarity of these semantically diverse augmented sequences can render the learned user representations insensitive to variations in user preferences and interest evolution, contradicting the core learning objectives of sequential recommendation. To address this issue, we leverage the inherent characteristics of sequential recommendation and propose the use of context information to generate more reasonable augmented positive samples. Specifically, we introduce a context-aware diffusion-based contrastive learning method for sequential recommendation. Given a user sequence, our method selects certain positions and employs a context-aware diffusion model to generate alternative items for these positions with the guidance of context information. These generated items then replace the corresponding original items, creating a semantically consistent augmented view of the original sequence. Additionally, to maintain representation cohesion, item embeddings are shared between the diffusion model and the recommendation model, and the entire framework is trained in an end-to-end manner. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method.
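A skeleton of this augment-then-contrast pipeline might look as follows; here `generator` stands in for the paper's context-aware diffusion model, and all shapes, names, and hyperparameters are our assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def augment_with_generator(seq_emb, generator, mask_ratio=0.2):
    """Build a semantically consistent view: pick random positions and replace
    their item embeddings with ones produced by a context-conditioned generator
    (a stand-in for the context-aware diffusion model)."""
    B, L, D = seq_emb.shape
    mask = torch.rand(B, L, device=seq_emb.device) < mask_ratio
    generated = generator(seq_emb, mask)      # (B, L, D), conditioned on unmasked context
    return torch.where(mask.unsqueeze(-1), generated, seq_emb)

def info_nce(z1, z2, tau=0.2):
    """InfoNCE loss aligning the two views of each sequence within the batch."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```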

Vizing's celebrated theorem states that every simple graph with maximum degree $\Delta$ admits a $(\Delta+1)$ edge coloring, which can be found in $O(m \cdot n)$ time on $n$-vertex, $m$-edge graphs. This is just one color more than the trivial lower bound of $\Delta$ colors needed in any proper edge coloring. After a series of simplifications and variations, this running time was eventually improved by Gabow, Nishizeki, Kariv, Leven, and Terada in 1985 to $O(m\sqrt{n\log{n}})$ time. This has effectively remained the state of the art, modulo an $O(\sqrt{\log{n}})$-factor improvement by Sinnamon in 2019. As our main result, we present a novel randomized algorithm that computes a $\Delta+O(\log{n})$ coloring of any given simple graph in $O(m\log{\Delta})$ expected time; in other words, a near-linear time randomized algorithm for a ``near''-Vizing coloring. As a corollary of this algorithm, we also obtain the following results:

* A randomized algorithm for $(\Delta+1)$ edge coloring in $O(n^2\log{n})$ expected time. This is near-linear in the input size for dense graphs and presents the first polynomial-time improvement over the longstanding bounds of Gabow et al. for Vizing's theorem in almost four decades.
* A randomized algorithm for $(1+\varepsilon)\Delta$ edge coloring in $O(m\log{(1/\varepsilon)})$ expected time for any $\varepsilon = \omega(\log{n}/\Delta)$. The dependence on $\varepsilon$ exponentially improves upon a series of recent results that obtain algorithms with runtime $\Omega(m/\varepsilon)$ for this problem.
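For contrast with these bounds, the textbook greedy baseline below (our illustration, not the paper's algorithm) colors edges one at a time and may need up to $2\Delta-1$ colors, since each endpoint can block at most $\Delta-1$ colors; this is the gap that Vizing-type algorithms close.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Assign each edge the smallest color unused at both of its endpoints.
    Uses at most 2*Delta - 1 colors (Vizing guarantees Delta + 1 exist)."""
    used = defaultdict(set)      # vertex -> colors already used on incident edges
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c); used[v].add(c)
    return coloring

print(greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (0, 3)]))
```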

Let $\alpha(G)$ denote the cardinality of a maximum independent set, while $\mu(G)$ be the size of a maximum matching in $G=\left( V,E\right) $. It is known that if $\alpha(G)+\mu(G)=\left\vert V\right\vert $, then $G$ is a K\"{o}nig-Egerv\'{a}ry graph. The critical difference $d(G)$ is $\max\{d(I):I\in\mathrm{Ind}(G)\}$, where $d(I)=\left\vert I\right\vert -\left\vert N(I)\right\vert $ and $\mathrm{Ind}(G)$ denotes the family of all independent sets of $G$. If $A\in\mathrm{Ind}(G)$ with $d\left( A\right) =d(G)$, then $A$ is a critical independent set. For a graph $G$, let $\mathrm{diadem}(G)=\bigcup\{S:S$ is a critical independent set in $G\}$, and let $\varrho_{v}\left( G\right) $ denote the number of vertices $v\in V\left( G\right) $ such that $G-v$ is a K\"{o}nig-Egerv\'{a}ry graph. A graph is called almost bipartite if it has a unique odd cycle. In this paper, we show that if $G$ is an almost bipartite non-K\"{o}nig-Egerv\'{a}ry graph with the unique odd cycle $C$, then the following assertions are true:

1. every maximum matching of $G$ contains $\left\lfloor \left\vert V(C)\right\vert /2\right\rfloor $ edges belonging to $C$;
2. $V(C)\cup N_{G}\left[ \mathrm{diadem}\left( G\right) \right] =V$ and $V(C)\cap N_{G}\left[ \mathrm{diadem}\left( G\right) \right] =\emptyset$;
3. $\varrho_{v}\left( G\right) =\left\vert \mathrm{corona}\left( G\right) \right\vert -\left\vert \mathrm{diadem}\left( G\right) \right\vert $, where $\mathrm{corona}\left( G\right) $ is the union of all maximum independent sets of $G$;
4. $\varrho_{v}\left( G\right) =\left\vert V\right\vert $ if and only if $G=C_{2k+1}$ for some integer $k\geq1$.
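Since $d(G)$ and $\mathrm{diadem}(G)$ are defined by maximizing over all independent sets, they can be checked by brute force on small graphs. The sketch below is our illustration; run on $C_5$ (an almost bipartite non-K\"onig-Egerv\'ary graph whose unique odd cycle is the whole graph), it returns $d(C_5)=0$ with an empty diadem, consistent with assertion 2.

```python
from itertools import combinations

def critical_difference_and_diadem(V, E):
    """Brute force: d(G) = max{ d(I) = |I| - |N(I)| } over independent sets I,
    and diadem(G) = union of all independent sets attaining that maximum."""
    V = list(V)
    adj = {v: set() for v in V}
    for u, w in E:
        adj[u].add(w); adj[w].add(u)
    best, union = None, set()
    for r in range(len(V) + 1):
        for S in map(set, combinations(V, r)):
            if any(adj[u] & S for u in S):
                continue                                  # S is not independent
            N = set().union(*(adj[v] for v in S)) if S else set()
            d = len(S) - len(N)
            if best is None or d > best:
                best, union = d, set(S)
            elif d == best:
                union |= S
    return best, union

# C5: almost bipartite and non-Konig-Egervary (alpha = mu = 2, but |V| = 5).
print(critical_difference_and_diadem(range(5), [(i, (i + 1) % 5) for i in range(5)]))
# -> (0, set()): d(C5) = 0 and diadem(C5) is empty.
```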

In this work, we address the problem of approximate pattern matching with wildcards. Given a pattern $P$ of length $m$ containing $D$ wildcards, a text $T$ of length $n$, and an integer $k$, our objective is to identify all fragments of $T$ within Hamming distance $k$ from $P$. Our primary contribution is an algorithm with runtime $O(n+(D+k)(G+k)\cdot n/m)$ for this problem. Here, $G \le D$ represents the number of maximal wildcard fragments in $P$. We derive this algorithm by elaborating in a non-trivial way on the ideas presented by [Charalampopoulos et al., FOCS'20] for pattern matching with mismatches (without wildcards). Our algorithm improves over the state of the art when $D$, $G$, and $k$ are small relative to $n$. For instance, if $m = n/2$, $k=G=n^{2/5}$, and $D=n^{3/5}$, our algorithm operates in $O(n)$ time, surpassing the $\Omega(n^{6/5})$ time requirement of all previously known algorithms. In the case of exact pattern matching with wildcards ($k=0$), we present a much simpler algorithm with runtime $O(n+DG\cdot n/m)$ that clearly illustrates our main technical innovation: the utilisation of positions of $P$ that do not belong to any fragment of $P$ with a density of wildcards much larger than $D/m$ as anchors for the sought (approximate) occurrences. Notably, our algorithm outperforms the best-known $O(n\log m)$-time FFT-based algorithms of [Cole and Hariharan, STOC'02] and [Clifford and Clifford, IPL'04] if $DG = o(m\log m)$. We complement our algorithmic results with a structural characterization of the $k$-mismatch occurrences of $P$. We demonstrate that in a text of length $O(m)$, these occurrences can be partitioned into $O((D+k)(G+k))$ arithmetic progressions. Additionally, we construct an infinite family of examples with $\Omega((D+k)k)$ arithmetic progressions of occurrences, leveraging a combinatorial result on progression-free sets [Elkin, SODA'10].
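As a baseline for the problem statement (our illustration, not the paper's algorithm), a direct $O(nm)$ scan that treats wildcard positions of $P$ as free matches looks as follows.

```python
def occurrences_with_wildcards(T, P, k, wc="?"):
    """Naive O(n*m) baseline: report every position i such that the Hamming
    distance between T[i:i+m] and P is at most k, where wildcard positions
    of P match any character for free."""
    n, m = len(T), len(P)
    out = []
    for i in range(n - m + 1):
        mismatches = sum(1 for j in range(m) if P[j] != wc and T[i + j] != P[j])
        if mismatches <= k:
            out.append(i)
    return out

print(occurrences_with_wildcards("abracadabra", "a?ra", 1))   # -> [0, 7]
```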

In the trace reconstruction problem, one observes the output of passing a binary string $s \in \{0,1\}^n$ through a deletion channel $T$ times and wishes to recover $s$ from the resulting $T$ "traces." Most of the literature has focused on characterizing the hardness of this problem in terms of the number of traces $T$ needed for perfect reconstruction either in the worst case or in the average case (over input sequences $s$). In this paper, we propose an alternative, instance-based approach to the problem. We define the "Levenshtein difficulty" of a problem instance $(s,T)$ as the probability that the resulting traces do not provide enough information for correct recovery with full certainty. One can then try to characterize, for a specific $s$, how $T$ needs to scale in order for the Levenshtein difficulty to go to zero, and seek reconstruction algorithms that match this scaling for each $s$. For a class of binary strings with alternating long runs, we precisely characterize the scaling of $T$ for which the Levenshtein difficulty goes to zero. For this class, we also prove that a simple "Las Vegas algorithm" has an error probability that decays to zero with the same rate as that with which the Levenshtein difficulty tends to zero.
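The channel itself is easy to simulate, which is convenient for experimenting with instance-based difficulty. Below is a minimal sketch; the deletion probability `q` and the long-run input string are our choices for illustration.

```python
import random

def trace(s, q, rng=random):
    """One pass of s through a deletion channel: each bit is deleted
    independently with probability q; the surviving bits are concatenated."""
    return "".join(b for b in s if rng.random() > q)

random.seed(0)
s = "0" * 20 + "1" * 20                    # two long runs, as in the class above
traces = [trace(s, q=0.3) for _ in range(200)]
# Each run length shrinks to Binomial(run, 1 - q), so rescaling recovers |s|:
est = sum(len(t) for t in traces) / len(traces) / (1 - 0.3)
print(round(est))                          # close to len(s) = 40
```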

We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document $x \in X$ is indexed by $t \in T$, and a neural autoregressive model is trained to map queries $Q$ to $T$. GDR can be considered to involve information transmission from documents $X$ to queries $Q$, with the requirement to transmit more bits via the indexes $T$. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes $T$ can then be regarded as a {\em bottleneck} in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods.
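The central quantity can be made concrete: given a joint distribution over indexes $T$ and queries $Q$, the mutual information (in bits) is computed as below. This is a generic estimator from a joint pmf or empirical counts, not the paper's implementation.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint pmf given as a 2-D array (normalized here,
    so raw co-occurrence counts are also accepted)."""
    p_xy = p_xy / p_xy.sum()
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])).sum())

joint = np.array([[4, 1], [1, 4]], dtype=float)   # toy counts over (T, Q)
print(mutual_information(joint))                  # ~0.278 bits
```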
