亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, $f \in F$ and $g \in G$, fit on a given training distribution, but evaluated on a test distribution which exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts} in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves qualitatively similarly to orthogonal machine learning: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for partial non-indentifiability introduced by the additive structure. These results rely on a novel H\"older style inequality for the Dudley integral which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.

相關內容

We study various aspects of Dyck words appearing in binary sequences, where $0$ is treated as a left parenthesis and $1$ as a right parenthesis. We show that binary words that are $7/3$-power-free have bounded nesting level, but this no longer holds for larger repetition exponents. We give an explicit characterization of the factors of the Thue-Morse word that are Dyck, and show how to count them. We also prove tight upper and lower bounds on $f(n)$, the number of Dyck factors of Thue-Morse of length $2n$.

We provide an algorithm that maintains, against an adaptive adversary, a $(1-\varepsilon)$-approximate maximum matching in $n$-node $m$-edge general (not necessarily bipartite) undirected graph undergoing edge deletions with high probability with (amortized) $O(\mathrm{poly}(\varepsilon^{-1}, \log n))$ time per update. We also obtain the same update time for maintaining a fractional approximate weighted matching (and hence an approximation to the value of the maximum weight matching) and an integral approximate weighted matching in dense graphs. Our unweighted result improves upon the prior state-of-the-art which includes a $\mathrm{poly}(\log{n}) \cdot 2^{O(1/\varepsilon^2)}$ update time [Assadi-Bernstein-Dudeja 2022] and an $O(\sqrt{m} \varepsilon^{-2})$ update time [Gupta-Peng 2013], and our weighted result improves upon the $O(\sqrt{m}\varepsilon^{-O(1/\varepsilon)}\log{n})$ update time due to [Gupta-Peng 2013]. To obtain our results, we generalize a recent optimization approach to dynamic algorithms from [Jambulapati-Jin-Sidford-Tian 2022]. We show that repeatedly solving entropy-regularized optimization problems yields a lazy updating scheme for fractional decremental problems with a near-optimal number of updates. To apply this framework we develop optimization methods compatible with it and new dynamic rounding algorithms for the matching polytope.

In this paper, we consider the problem of maintaining a $(1-\varepsilon)$-approximate maximum weight matching in a dynamic graph $G$, while the adversary makes changes to the edges of the graph. In the fully dynamic setting, where both edge insertions and deletions are allowed, Gupta and Peng gave an algorithm for this problem with an update time of $\tilde{O}_{\varepsilon}(\sqrt{m})$. We study a natural relaxation of this problem, namely the decremental model, where the adversary is only allowed to delete edges. For the cardinality version of this problem in general (possibly, non-bipartite) graphs, Assadi, Bernstein, and Dudeja gave a decremental algorithm with update time $O_{\varepsilon}(\text{poly}(\log n))$. However, beating $\tilde{O}_{\varepsilon}(\sqrt{m})$ update time remained an open problem for the \emph{weighted} version in \emph{general graphs}. In this paper, we bridge the gap between unweighted and weighted general graphs for the decremental setting. We give a $O_{\varepsilon}(\text{poly}(\log n))$ update time algorithm that maintains a $(1-\varepsilon)$-approximate maximum weight matching under adversarial deletions. Like the decremental algorithm of Assadi, Bernstein, and Dudeja, our algorithm is randomized, but works against an adaptive adversary. It also matches the time bound for the cardinality version upto dependencies on $\varepsilon$ and a $\log R$ factor, where $R$ is the ratio between the maximum and minimum edge weight in $G$.

We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq \epsilon\|b\|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log1/\epsilon),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a poly-logarithmic in $n$ factor. When $k=O(n^{1-\theta})$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly, as well as on the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.

L1-norm regularized logistic regression models are widely used for analyzing data with binary response. In those analyses, fusing regression coefficients is useful for detecting groups of variables. This paper proposes a binomial logistic regression model with Bayesian fused lasso. Assuming a Laplace prior on regression coefficients and differences between adjacent regression coefficients enables us to perform variable selection and variable fusion simultaneously in the Bayesian framework. We also propose assuming a horseshoe prior on the differences to improve the flexibility of variable fusion. The Gibbs sampler is derived to estimate the parameters by a hierarchical expression of priors and a data-augmentation method. Using simulation studies and real data analysis, we compare the proposed methods with the existing method.

We propose a novel framework DropTop that suppresses the shortcut bias in online continual learning (OCL) while being adaptive to the varying degree of the shortcut bias incurred by continuously changing environment. By the observed high-attention property of the shortcut bias, highly-activated features are considered candidates for debiasing. More importantly, resolving the limitation of the online environment where prior knowledge and auxiliary data are not ready, two novel techniques -- feature map fusion and adaptive intensity shifting -- enable us to automatically determine the appropriate level and proportion of the candidate shortcut features to be dropped. Extensive experiments on five benchmark datasets demonstrate that, when combined with various OCL algorithms, DropTop increases the average accuracy by up to 10.4% and decreases the forgetting by up to 63.2%.

In the standard formulation of the denoising problem, one is given a probabilistic model relating a latent variable $\Theta \in \Omega \subset \mathbb{R}^m \; (m\ge 1)$ and an observation $Z \in \mathbb{R}^d$ according to: $Z \mid \Theta \sim p(\cdot\mid \Theta)$ and $\Theta \sim G^*$, and the goal is to construct a map to recover the latent variable from the observation. The posterior mean, a natural candidate for estimating $\Theta$ from $Z$, attains the minimum Bayes risk (under the squared error loss) but at the expense of over-shrinking the $Z$, and in general may fail to capture the geometric features of the prior distribution $G^*$ (e.g., low dimensionality, discreteness, sparsity, etc.). To rectify these drawbacks, in this paper we take a new perspective on this denoising problem that is inspired by optimal transport (OT) theory and use it to propose a new OT-based denoiser at the population level setting. We rigorously prove that, under general assumptions on the model, our OT-based denoiser is well-defined and unique, and is closely connected to solutions to a Monge OT problem. We then prove that, under appropriate identifiability assumptions on the model, our OT-based denoiser can be recovered solely from information of the marginal distribution of $Z$ and the posterior mean of the model, after solving a linear relaxation problem over a suitable space of couplings that is reminiscent of a standard multimarginal OT (MOT) problem. In particular, thanks to Tweedie's formula, when the likelihood model $\{ p(\cdot \mid \theta) \}_{\theta \in \Omega}$ is an exponential family of distributions, the OT-based denoiser can be recovered solely from the marginal distribution of $Z$. In general, our family of OT-like relaxations is of interest in its own right and for the denoising problem suggests alternative numerical methods inspired by the rich literature on computational OT.

Oblivious routing is a well-studied paradigm that uses static precomputed routing tables for selecting routing paths within a network. Existing oblivious routing schemes with polylogarithmic competitive ratio for general networks are tree-based, in the sense that routing is performed according to a convex combination of trees. However, this restriction to trees leads to a construction that has time quadratic in the size of the network and does not parallelize well. In this paper we study oblivious routing schemes based on electrical routing. In particular, we show that general networks with $n$ vertices and $m$ edges admit a routing scheme that has competitive ratio $O(\log^2 n)$ and consists of a convex combination of only $O(\sqrt{m})$ electrical routings. This immediately leads to an improved construction algorithm with time $\tilde{O}(m^{3/2})$ that can also be implemented in parallel with $\tilde{O}(\sqrt{m})$ depth.

We give a fully dynamic algorithm maintaining a $(1-\varepsilon)$-approximate directed densest subgraph in $\tilde{O}(\log^3(n)/\varepsilon^6)$ amortized time or $\tilde{O}(\log^4(n)/\varepsilon^7)$ per edge update (where $\tilde{O}$ hides $\log\log$ factors), based on earlier work by Chekuri and Quanrud [arXiv:2210.02611]. This result improves on earlier work done by Sawlani and Wang [arXiv:1907.03037], which guarantees $O(\log^5(n)/\varepsilon^7)$ worst case time for edge insertions and deletions.

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.

北京阿比特科技有限公司