
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems, which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.
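
To make the setting concrete, here is a minimal hedged sketch (not the paper's estimator) of how samples truncated to an unknown set arise: data are drawn from a Gaussian, but only the points landing in a survival set $S$, here a hypothetical halfspace, are revealed. All parameters are illustrative placeholders.

```python
# Hypothetical illustration of the truncated-sample model: data come from an
# unknown Gaussian N(mu, Sigma), but an observation is revealed only when it
# falls in an unknown survival set S (here, a made-up halfspace {x : w.x >= b}).
# All parameters below are illustrative placeholders, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu, Sigma = np.zeros(d), np.eye(d)
w, b = np.ones(d) / np.sqrt(d), 0.5           # hypothetical truncation halfspace

def truncated_samples(n):
    """Draw n samples from N(mu, Sigma) conditioned on w.x >= b (rejection sampling)."""
    out = []
    while len(out) < n:
        x = rng.multivariate_normal(mu, Sigma)
        if w @ x >= b:                         # only samples landing in S are observed
            out.append(x)
    return np.array(out)

X = truncated_samples(1000)
# The naive empirical mean is biased toward S; the algorithms above are about
# recovering the true (mu, Sigma) without knowing S.
print("naive mean estimate:", X.mean(axis=0))
```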

Related content

Given a database of bit strings $A_1,\ldots,A_m\in \{0,1\}^n$, a fundamental data structure task is to estimate the distances between a given query $B\in \{0,1\}^n$ and all the strings in the database. In addition, one might further want to ensure the integrity of the database by releasing these distance statistics in a secure manner. In this work, we propose differentially private (DP) data structures for tasks of this type, with a focus on Hamming and edit distance. On top of the strong privacy guarantees, our data structures are also time- and space-efficient. In particular, our data structure is $\epsilon$-DP against any sequence of queries of arbitrary length, and for any query $B$ such that the maximum distance to any string in the database is at most $k$, we output $m$ distance estimates. Moreover, for Hamming distance, our data structure answers any query in $\widetilde O(mk+n)$ time, and each estimate deviates from the true distance by at most $\widetilde O(k/e^{\epsilon/\log k})$; for edit distance, it answers any query in $\widetilde O(mk^2+n)$ time, and each estimate deviates from the true distance by at most $\widetilde O(k/e^{\epsilon/(\log k \log n)})$. For moderate $k$, both data structures support sublinear query operations. We obtain these results via a novel adaptation of the randomized response technique as a bit-flipping procedure, applied to the sketched strings.
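
As a hedged illustration of the randomized-response idea referenced above (the paper applies it to sketched strings to obtain the guarantees quoted; the toy version below only shows the flip-and-debias step on raw strings, with made-up sizes and privacy budget):

```python
# Toy illustration of randomized response as bit flipping for private Hamming
# distance estimation. The paper applies the idea to *sketched* strings with the
# time/error guarantees quoted above; this sketch only shows the flip-and-debias
# step on raw strings, with made-up sizes and privacy budget.
import numpy as np

rng = np.random.default_rng(1)

def privatize(bits, eps):
    """Flip each bit independently with prob 1/(1+e^eps) (randomized response)."""
    p = 1.0 / (1.0 + np.exp(eps))
    flips = (rng.random(bits.shape) < p).astype(bits.dtype)
    return np.bitwise_xor(bits, flips), p

def estimate_hamming(noisy_a, b, p):
    """Debias the raw count: E[raw] = n*p + d*(1 - 2p) when the true distance is d."""
    raw = np.count_nonzero(noisy_a != b)
    return (raw - len(b) * p) / (1.0 - 2.0 * p)

n, eps = 10_000, 2.0
a = rng.integers(0, 2, n, dtype=np.uint8)
b = a.copy()
b[:300] ^= 1                                   # plant a true Hamming distance of 300
noisy_a, p = privatize(a, eps)
print("debiased distance estimate:", estimate_hamming(noisy_a, b, p))
```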

Accurate approximation of a real-valued function depends on two aspects of the available data: the density of inputs within the domain of interest and the variation of the outputs over that domain. There are few methods for assessing whether the density of inputs is \textit{sufficient} to identify the relevant variations in outputs -- i.e., the ``geometric scale'' of the function -- despite the fact that sampling density is closely tied to the success or failure of an approximation method. In this paper, we introduce a general-purpose computational approach to detecting the geometric scale of real-valued functions over a fixed domain using a deterministic interpolation technique from computational geometry. The algorithm is intended to work on scalar data in moderate dimensions (2-10). Our algorithm is based on the observation that a sequence of piecewise linear interpolants will converge to a continuous function at a quadratic rate (in $L^2$ norm) if and only if the data are sampled densely enough to distinguish the feature from noise (assuming sufficiently regular sampling). We present numerical experiments demonstrating how our method can identify feature scale, estimate uncertainty in feature scale, and assess the sampling density for fixed (i.e., static) datasets of input-output pairs. We include analytical results in support of our numerical findings and have released lightweight code that can be adapted for use in a variety of data science settings.
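
A minimal one-dimensional sketch of the convergence diagnostic described above; the paper's algorithm uses a deterministic computational-geometry interpolant in moderate dimensions, while this toy version just checks whether the $L^2$ error of piecewise linear interpolants decays at a quadratic rate as the sampling density grows (the test function and grid sizes are made up):

```python
# One-dimensional toy version of the diagnostic: refine a uniform sample of f and
# check whether the L2 error of the piecewise linear interpolant decays at the
# quadratic rate (observed order ~ 2). The test function and grid sizes are made
# up; the paper's method uses a computational-geometry interpolant in 2-10 dims.
import numpy as np

def f(x):
    return np.sin(2 * np.pi * x) + 0.1 * np.sin(40 * np.pi * x)  # coarse + fine feature

x_ref = np.linspace(0.0, 1.0, 100_001)         # dense reference grid for the L2 norm
f_ref = f(x_ref)

prev_err = None
for n in (10, 20, 40, 80, 160, 320, 640):
    x = np.linspace(0.0, 1.0, n + 1)
    interp = np.interp(x_ref, x, f(x))         # piecewise linear interpolant on n intervals
    err = np.sqrt(np.mean((interp - f_ref) ** 2))
    order = "-" if prev_err is None else f"{np.log2(prev_err / err):.2f}"
    print(f"n={n:4d}  L2 error={err:.2e}  observed order={order}")
    prev_err = err
# The observed order stays below 2 until n resolves the sin(40*pi*x) oscillation,
# then settles near 2: the sampling density has reached the function's geometric scale.
```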

In submodular multiway partition (SUB-MP), the input is a non-negative submodular function $f:2^V \rightarrow \mathbb{R}_{\ge 0}$ given by an evaluation oracle along with $k$ terminals $t_1, t_2, \ldots, t_k\in V$. The goal is to find a partition $V_1, V_2, \ldots, V_k$ of $V$ with $t_i\in V_i$ for every $i\in [k]$ in order to minimize $\sum_{i=1}^k f(V_i)$. In this work, we focus on SUB-MP when the input function is monotone (termed MONO-SUB-MP). MONO-SUB-MP formulates partitioning problems over several interesting structures -- e.g., matrices, matroids, graphs, and hypergraphs. MONO-SUB-MP is NP-hard since the graph multiway cut problem can be cast as a special case. We investigate the approximability of MONO-SUB-MP: we show that it admits a $4/3$-approximation and does not admit a $(10/9-\epsilon)$-approximation for every constant $\epsilon>0$. Next, we study a special case of MONO-SUB-MP where the monotone submodular function of interest is the coverage function of an input graph, termed GRAPH-COVERAGE-MP. GRAPH-COVERAGE-MP is equivalent to the classic multiway cut problem for the purposes of exact optimization. We show that GRAPH-COVERAGE-MP admits a $1.125$-approximation and does not admit a $(1.00074-\epsilon)$-approximation for every constant $\epsilon>0$ assuming the Unique Games Conjecture. These results separate GRAPH-COVERAGE-MP from graph multiway cut in terms of approximability.
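
For concreteness, a brute-force sketch of the GRAPH-COVERAGE-MP objective on a toy instance (made-up edges and terminals; none of the paper's approximation algorithms are implemented here):

```python
# Toy instance of GRAPH-COVERAGE-MP: each block is scored by the coverage function
# of the graph (edges touching the block), terminals are pinned to their blocks,
# and we brute-force the minimizing partition. Edges and terminals are made up.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
terminals = {0: 0, 2: 1}                       # vertex -> required block index
vertices = list(range(4))
k = 2

def coverage(block):
    """Monotone submodular coverage function of the graph: edges with an endpoint in block."""
    return sum(1 for u, v in edges if u in block or v in block)

def objective(assignment):
    """GRAPH-COVERAGE-MP objective: sum of coverage values over the k blocks."""
    blocks = [{v for v in vertices if assignment[v] == i} for i in range(k)]
    return sum(coverage(b) for b in blocks)

best, best_val = None, float("inf")
for labels in product(range(k), repeat=len(vertices)):   # brute force over partitions
    assignment = dict(zip(vertices, labels))
    if any(assignment[t] != i for t, i in terminals.items()):
        continue                                          # terminal t_i must lie in V_i
    val = objective(assignment)
    if val < best_val:
        best, best_val = assignment, val

# Every uncut edge is counted once and every cut edge twice, so for exact
# optimization this objective equals |E| plus the multiway cut value.
print("best assignment:", best, "objective value:", best_val)
```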

We consider metrical task systems on general metric spaces with $n$ points, and show that any fully randomized algorithm can be turned into a randomized algorithm that uses only $2\log n$ random bits and achieves the same competitive ratio up to a factor of $2$. This provides the first order-optimal barely random algorithms for metrical task systems, i.e., algorithms that use a number of random bits that does not depend on the number of requests addressed to the system. We discuss implications for various aspects of online decision-making, such as distributed systems, advice complexity, and transaction costs, suggesting broad applicability. We put forward an equivalent view that we call collective metrical task systems, where $k$ agents in a metrical task system team up and suffer the average cost paid by each agent. Our results imply that such a team can be $O(\log^2 n)$-competitive as soon as $k\geq n^2$. In comparison, a single agent is always $\Omega(n)$-competitive.

Multivariate random effects with unstructured variance-covariance matrices of large dimension, $q$, can be a major challenge to estimate. In this paper, we introduce a new implementation of a reduced-rank approach for fitting large-dimensional multivariate random effects by writing them as a linear combination of $d < q$ latent variables. By adding reduced-rank functionality to the package glmmTMB, we expand the range of mixed models that can be fitted to include random effects of dimensions that were previously not feasible. We apply reduced-rank random effects to two examples, estimating a generalized latent variable model for multivariate abundance data and a random-slopes model.
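
In symbols (our notation, not necessarily the package's), a $q$-dimensional random effect $\mathbf{b}_i$ is modeled as $\mathbf{b}_i = \boldsymbol{\Lambda}\mathbf{u}_i$ with latent $\mathbf{u}_i \sim \mathcal{N}_d(\mathbf{0}, \mathbf{I}_d)$ and loading matrix $\boldsymbol{\Lambda} \in \mathbb{R}^{q \times d}$, so that $\operatorname{cov}(\mathbf{b}_i) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top}$ has rank at most $d$; taking $\boldsymbol{\Lambda}$ lower triangular replaces the $q(q+1)/2$ free parameters of an unstructured covariance with $qd - d(d-1)/2$.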

Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviors -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored $\mathcal{NC}$ in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modeling presents a curious frontier, as \textit{training by token prediction} constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs. This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards $\mathcal{NC}$. We find that $\mathcal{NC}$ properties that develop with scale (and regularization) are linked to generalization. Moreover, there is evidence of some relationship between $\mathcal{NC}$ and generalization independent of scale. Our work thereby underscores the generality of $\mathcal{NC}$ as it extends to the novel and more challenging setting of language modeling. Downstream, we seek to inspire further research on the phenomenon to deepen our understanding of LLMs -- and neural networks at large -- and improve existing architectures based on $\mathcal{NC}$-related properties. Our code is hosted on GitHub at //github.com/rhubarbwu/linguistic-collapse .
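
A hedged sketch of two generic neural-collapse diagnostics of the kind referenced above, computed on placeholder features and labels (not the paper's exact metric suite, models, or corpora):

```python
# Generic neural-collapse diagnostics on placeholder data: coefficient of variation
# of the centered class-mean norms (equinorm) and the mean pairwise cosine between
# centered class means (equiangularity; a simplex ETF gives -1/(K-1)). The features
# and labels here are random placeholders, not the paper's models or corpora.
import numpy as np

def nc_diagnostics(H, y):
    classes = np.unique(y)
    global_mean = H.mean(axis=0)
    means = np.stack([H[y == c].mean(axis=0) for c in classes]) - global_mean

    norms = np.linalg.norm(means, axis=1)
    equinorm_cv = norms.std() / norms.mean()              # 0 under perfect equinorm

    unit = means / norms[:, None]
    cos = unit @ unit.T
    off_diag = cos[~np.eye(len(classes), dtype=bool)]     # pairwise cosines
    return equinorm_cv, off_diag.mean(), -1.0 / (len(classes) - 1)

rng = np.random.default_rng(2)
H = rng.normal(size=(600, 64))                 # placeholder last-layer features
y = rng.integers(0, 10, 600)                   # placeholder labels, 10 classes
cv, mean_cos, etf_target = nc_diagnostics(H, y)
print(f"equinorm CV={cv:.3f}  mean pairwise cosine={mean_cos:.3f}  ETF target={etf_target:.3f}")
```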

We provide bounds on the tail probabilities for simple procedures that generate random samples _without replacement_, when the probabilities of being selected need not be equal.

Traditional CNC technology mostly constructs $C^2$ continuous NURBS curves by increasing the degree of the interpolation polynomial, but this often produces the Runge phenomenon in the interpolation curve. Alternatively, adding boundary conditions at the endpoints can make it difficult to control the error range of the interpolation curve. This article presents a $C^2$ continuous cubic B-spline curve interpolation method, which achieves $C^2$ continuity of the interpolation curve while keeping the interpolation polynomial cubic. The article also studies the corresponding error control methods.
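
As a hedged illustration of the general fact the article builds on (using SciPy's generic cubic B-spline interpolation, not the article's specific boundary-condition or error-control scheme):

```python
# Illustration of the general fact the article builds on: an interpolating cubic
# B-spline is already C2 across interior knots, so C2 continuity does not require
# raising the degree. This uses SciPy's generic construction, not the article's
# specific boundary-condition or error-control scheme; the data are made up.
import numpy as np
from scipy.interpolate import make_interp_spline

x = np.linspace(0.0, 2.0 * np.pi, 9)           # sampled tool-path points (toy data)
y = np.sin(x)
spline = make_interp_spline(x, y, k=3)         # cubic B-spline interpolant

d2 = spline.derivative(2)                      # second derivative of the spline
t = x[4]
print("second derivative just left/right of an interior knot:",
      float(d2(t - 1e-8)), float(d2(t + 1e-8)))        # (nearly) equal => C2
print("max interpolation residual:", float(np.max(np.abs(spline(x) - y))))
```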

Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples, which can result in incorrect outputs. To make DL more robust, several post-hoc anomaly detection techniques that detect (and discard) these anomalous samples have been proposed in the recent past. This survey provides a structured and comprehensive overview of the research on anomaly detection for DL-based applications. We provide a taxonomy of existing techniques based on their underlying assumptions and adopted approaches. We discuss the techniques in each category and describe the relative strengths and weaknesses of the approaches. Our goal is to give readers a clearer understanding of the techniques in each category and of the research done on this topic. Finally, we highlight the unsolved challenges in applying anomaly detection techniques to DL systems and present some high-impact future research directions.

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
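
A small hedged sketch of the lattice construction described above, matching every lexicon word over a toy character sequence (the sentence and lexicon are illustrative, not the paper's resources):

```python
# Toy version of the lattice construction: enumerate every span of the character
# sequence that matches a lexicon word. These word spans are the extra paths that
# a lattice LSTM gates into the character chain (each matched word feeds the cell
# of its ending character). The sentence and lexicon below are illustrative only.
sentence = "南京市长江大桥"                     # "Nanjing City Yangtze River Bridge"
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}

def lattice_words(chars, lexicon, max_len=4):
    """Return (start, end, word) for every lexicon word equal to chars[start:end]."""
    spans = []
    for i in range(len(chars)):
        for j in range(i + 2, min(i + max_len, len(chars)) + 1):
            if chars[i:j] in lexicon:
                spans.append((i, j, chars[i:j]))
    return spans

for start, end, word in lattice_words(sentence, lexicon):
    print(f"chars[{start}:{end}] = {word}")
# A character-only model sees just the 7 characters; the lattice adds these 6 word
# paths, and the gates decide how much each one contributes to the NER decision.
```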
