
The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries ($\mathsf{CSQ}$) -- potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling.
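
To make the query model concrete, the following is one plausible formalization consistent with the abstract (the paper's exact tolerance model may differ). A $\mathsf{DLQ}$ on loss $\ell$ with model $f(\cdot;\theta)$ returns, up to adversarial noise $\xi$ with $\|\xi\|_\infty \le \tau$,
\[
q_{\theta}(\mathcal{D}) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\nabla_{\theta}\,\ell\big(f(x;\theta),\,y\big)\big] \;+\; \xi .
\]
For the squared loss this collapses to $\mathsf{CSQ}$, since
\[
\nabla_{\theta}\,\tfrac{1}{2}\big(f(x;\theta)-y\big)^{2} \;=\; f(x;\theta)\,\nabla_{\theta} f(x;\theta) \;-\; y\,\nabla_{\theta} f(x;\theta),
\]
i.e., each coordinate is a label-free term plus a correlation query of the form $\mathbb{E}[y\,\phi(x)]$; a gradient that is nonlinear in $y$, as produced e.g. by the $\ell_1$ loss, is what lets $\mathsf{DLQ}$ escape the $\mathsf{CSQ}$ restriction.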

Related content

Privately counting distinct elements in a stream is a fundamental data analysis problem with many applications in machine learning. In the turnstile model, Jain et al. [NeurIPS2023] initiated the study of this problem parameterized by the maximum flippancy of any element, i.e., the number of times that the count of an element changes from 0 to above 0 or vice versa. They give an item-level $(\epsilon,\delta)$-differentially private algorithm whose additive error is tight with respect to that parameterization. In this work, we show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for item-level $(\epsilon,\delta)$-differential privacy and item-level $\epsilon$-differential privacy with respect to a different parameterization, namely the sum of all flippancies. Our second result shows that, for a large class of algorithms including all existing differentially private algorithms for this problem, the lower bound from item-level differential privacy extends to event-level differential privacy. This partially answers an open question by Jain et al. [NeurIPS2023].
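
As a rough illustration of why the sparse vector technique fits this problem, the sketch below (ours, with illustrative noise scales rather than the paper's calibration) re-releases the distinct count only when it has drifted past a noisy threshold; each element can trigger such drift at most as often as it flips, which is how a sum-of-flippancies parameterization enters.

```python
import numpy as np

def lap(scale, rng):
    return rng.laplace(0.0, scale)

def svt_distinct_count(counts, eps, threshold, rng=None):
    """Continually release the number of distinct elements, re-releasing only
    when the true count drifts from the last release by more than a noisy
    threshold. Noise scales are illustrative: a rigorous version must split
    eps across the threshold noise, the tests, and the releases."""
    rng = rng or np.random.default_rng(0)
    released = counts[0] + lap(2.0 / eps, rng)         # initial noisy release
    noisy_T = threshold + lap(2.0 / eps, rng)
    out = []
    for c in counts:                                   # true count after each update
        if abs(c - released) + lap(4.0 / eps, rng) >= noisy_T:
            released = c + lap(2.0 / eps, rng)         # fresh noisy release
            noisy_T = threshold + lap(2.0 / eps, rng)  # reset the SVT gadget
        out.append(released)
    return out
```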

Extreme Classification (XC) aims to map a query to the most relevant documents from a very large document set. XC algorithms used in real-world applications learn this mapping from datasets curated from implicit feedback, such as user clicks. However, these datasets inevitably suffer from missing labels. In this work, we observe that systematic missing labels lead to missing knowledge that is critical for accurately modelling relevance between queries and documents. We formally show that this absent knowledge cannot be recovered by existing methods, such as propensity weighting and data imputation strategies, that rely solely on the training dataset. While LLMs offer an attractive way to augment the missing knowledge, leveraging them in applications with low latency requirements and large document sets is challenging. To incorporate missing knowledge at scale, we propose SKIM (Scalable Knowledge Infusion for Missing Labels), an algorithm that combines a small LM with abundant unstructured meta-data to effectively mitigate the missing label problem. We show the efficacy of our method on large-scale public datasets through exhaustive unbiased evaluation, ranging from human annotations to simulations inspired by industrial settings. SKIM outperforms existing methods on Recall@100 by more than 10 absolute points. Additionally, SKIM scales to proprietary query-ad retrieval datasets containing 10 million documents, outperforming contemporary methods by 12% in offline evaluation and increasing ad click-yield by 1.23% in an online A/B test conducted on a popular search engine. We release our code, prompts, trained XC models and finetuned SLMs at: //github.com/bicycleman15/skim
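
A hypothetical skeleton of the knowledge-infusion step is sketched below; the function names, prompt, and retrieval interface are ours, not the released code's (see the repository linked above for the actual pipeline).

```python
# Hypothetical SKIM-style augmentation pass: a small LM expands each query with
# knowledge mined from unstructured meta-data, and the expanded query is matched
# against the corpus to propose extra (query, document) training pairs.
def skim_augment(queries, metadata_index, small_lm, retriever, k=5):
    new_pairs = []
    for q in queries:
        context = metadata_index.lookup(q)                # unstructured meta-data
        prompt = f"Query: {q}\nContext: {context}\nList related concepts:"
        concepts = small_lm.generate(prompt)              # knowledge-infusion step
        for doc in retriever.search(q + " " + concepts, top_k=k):
            new_pairs.append((q, doc))                    # candidate missing labels
    return new_pairs
```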

Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and broader applications in public health and social science research.
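
For concreteness, one plausible way to turn community posts into instruction-tuning examples is shown below; the paper's exact template is not given in the abstract, so field names and instruction wording are our guesses.

```python
# Hypothetical formatting of a community post as an instruction-tuning example.
def to_instruction_example(community, thread_title, post_body):
    return {
        "instruction": f"Write a post as a member of the online community "
                       f"'{community}' replying to the thread '{thread_title}'.",
        "output": post_body,  # authentic community text serves as the target
    }
```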

The goal of this paper is to demonstrate the general modeling and practical simulation of approximate solutions of random equations with mixture-model parameter random variables. Random equations, understood as stationary (non-dynamical) equations with parameters given as random variables, have a long history and a broad range of applications. The specific novelty of this exploratory study lies in demonstrating the combinatorial complexity of these equations with mixture-model parameters. Within a Bayesian argumentation framework, we derive a likelihood function and a posterior density of approximate best-fit solutions while avoiding significant restrictions on the type of nonlinearity of the equation or the mixture models, and demonstrate their numerically efficient implementation for the applied researcher. In the results section, we focus on expressive example simulations showcasing the combinatorial potential of random linear equation systems and nonlinear systems of random conic section equations. Introductory applications to portfolio optimization, stochastic control and random matrix theory are provided to show the wide applicability of the presented methodology.
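
A minimal Monte Carlo sketch of the posterior construction, assuming a scalar equation and a Gaussian mixture over the parameter (the paper treats far more general nonlinearities and mixture models), could look as follows; the residual tolerance sigma is our illustrative choice.

```python
import numpy as np

def unnorm_posterior(x, g, weights, means, stds, sigma=0.1, n_mc=2000, seed=0):
    """Unnormalized posterior density of an approximate solution x of the
    random equation g(x; a) = 0, with a drawn from a Gaussian mixture
    (weights, means, stds given as arrays). Monte Carlo over parameter draws."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n_mc, p=weights)  # mixture labels
    a = rng.normal(means[comps], stds[comps])               # parameter draws
    resid = g(x, a)                                         # residuals g(x; a)
    return np.mean(np.exp(-resid**2 / (2 * sigma**2)))      # MC likelihood

# Example: random linear equation a*x - 1 = 0 with a bimodal parameter a,
# yielding a bimodal posterior over approximate solutions x.
post = unnorm_posterior(0.5, lambda x, a: a * x - 1.0,
                        weights=[0.5, 0.5], means=np.array([1.0, 4.0]),
                        stds=np.array([0.1, 0.1]))
```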

Jacobi sets are an important tool for studying the relationship between functions. Defined as the set of all points where the functions' gradients are linearly dependent, Jacobi sets extend the notion of critical point to multifields. In practice, Jacobi sets of piecewise-linear approximations of smooth functions can become very complex and large due to noise and numerical errors. Methods that simplify Jacobi sets exist, but they either do not address how the functions' values have to change in order to obtain simpler Jacobi sets, or remain purely theoretical. In this paper, we present a method that modifies 2D bivariate scalar fields such that Jacobi set components caused by noise are removed while the essential structures of the fields are preserved. The method uses the Jacobi set to decompose the domain, stores and weighs the resulting regions in a neighborhood graph, and then uses this graph to determine which regions to join by collapsing the image of a region's cells. We investigate the influence of different tie-breaks when building the neighborhood graphs, as well as the treatment of collapsed cells. We apply our algorithm to a range of datasets, both analytical and real-world, and compare its performance to simple data smoothing.
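
For intuition, the Jacobi set of two bivariate fields $f$ and $g$ is where $\det[\nabla f \mid \nabla g] = 0$; a grid-based sketch of locating candidate Jacobi-set points is given below (the paper operates on piecewise-linear meshes, and the tolerance here is our illustrative choice).

```python
import numpy as np

def jacobi_mask(f, g, tol=1e-3):
    """Candidate Jacobi-set pixels of two bivariate fields sampled on a grid:
    points where grad f and grad g are (nearly) linearly dependent, i.e. the
    determinant of the 2x2 Jacobian is (nearly) zero."""
    fy, fx = np.gradient(f)          # np.gradient returns (d/drow, d/dcol)
    gy, gx = np.gradient(g)
    det = fx * gy - fy * gx          # determinant of [grad f | grad g]
    return np.abs(det) < tol
```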

This thesis embarks on a comprehensive exploration of formal computational models that underlie typed programming languages. We focus on programming calculi, both functional (sequential) and concurrent, as they provide a compelling, rigorous framework for evaluating program semantics and for developing analyses and program verification techniques. This is the full version of the thesis, including the appendices.

We consider the problem of learning stable matchings with unknown preferences in a decentralized and uncoordinated manner, where "decentralized" means that players make decisions individually without the influence of a central platform, and "uncoordinated" means that players do not need to synchronize their decisions using pre-specified rules. First, we provide a game formulation for this problem with known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and mixed NE can be rounded to a stable matching. Then, we show that for hierarchical markets, applying the exponential weight (EXP) learning algorithm to the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated fashion. Moreover, we show that EXP converges locally and exponentially fast to a stable matching in general markets. We also introduce another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability. Finally, we provide stronger feedback conditions under which it is possible to drive the market faster toward an approximate stable matching. Our proposed game-theoretic framework bridges the discrete problem of learning stable matchings with the problem of learning NE in continuous-action games.
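
A sketch of one player's exponential-weight (EXP) update over candidate partners in the induced game is shown below; the reward signal (e.g. partner utility if matched, 0 if rejected) is our simplification of the paper's payoff structure.

```python
import numpy as np

class ExpPlayer:
    """One player's decentralized EXP learner over proposals to partners."""
    def __init__(self, n_partners, eta=0.1, seed=0):
        self.w = np.zeros(n_partners)        # log-weights over proposals
        self.eta = eta
        self.rng = np.random.default_rng(seed)

    def propose(self):
        p = np.exp(self.w - self.w.max())    # softmax, numerically stable
        p /= p.sum()
        return self.rng.choice(len(self.w), p=p)

    def update(self, partner, reward):
        self.w[partner] += self.eta * reward  # exponential-weight step
```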

Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning thanks to the strong feature representations it learns from remote sensing images. However, ground truth for the stereo matching task relies on expensive airborne LiDAR data, making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper we study key training factors from three perspectives. (1) For the selection of the training dataset, it is important to select data with a regional target distribution similar to the test set, rather than data from the same sensor. (2) For the model structure, a cascaded structure that flexibly adapts to different feature sizes is preferred. (3) For the training manner, unsupervised methods generalize better than supervised ones, and we design an unsupervised early-stop strategy to retain the best model, using pre-trained weights as the starting point. Extensive experiments support these findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at //github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.
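
A sketch of the unsupervised early-stop idea follows, assuming a photometric reconstruction loss as the proxy signal (the concrete criterion in the released code may differ).

```python
def unsupervised_early_stop(train_step, proxy_loss, max_epochs, patience=5):
    """Keep the checkpoint with the lowest *unsupervised* proxy loss (e.g.
    left-right photometric error), since no LiDAR ground truth is available.
    train_step and proxy_loss are user-supplied callbacks."""
    best, best_state, bad = float("inf"), None, 0
    for epoch in range(max_epochs):
        state = train_step(epoch)            # one epoch of unsupervised training
        val = proxy_loss(state)              # proxy loss on held-out image pairs
        if val < best:
            best, best_state, bad = val, state, 0
        else:
            bad += 1
            if bad >= patience:              # stop once the proxy stops improving
                break
    return best_state
```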

Classical knowledge graph completion (KGC) methods rely solely on structural information, struggling with the inherent sparsity of knowledge graphs (KGs). Large Language Models (LLMs) learn extensive knowledge from large corpora with powerful context modeling, which is ideal for mitigating the limitations of previous methods. Directly fine-tuning LLMs offers great capability but comes at the cost of huge time and memory consumption, while utilizing frozen LLMs yields suboptimal results. In this work, we aim to leverage LLMs for KGC effectively and efficiently. We capture the context-aware hidden states of knowledge triples by employing prompts to stimulate the intermediate layers of LLMs. We then train a data-efficient classifier on these hidden states to harness the inherent capabilities of frozen LLMs in KGC. We also generate entity descriptions with subgraph sampling on KGs, reducing the ambiguity of triples and enriching the knowledge representation. Extensive experiments on standard benchmarks showcase the efficiency and effectiveness of our approach. We outperform classical KGC methods on most datasets and match the performance of fine-tuned LLMs. Additionally, compared to fine-tuned LLMs, we boost GPU memory efficiency by \textbf{$188\times$} and speed up training and inference by \textbf{$13.48\times$}.
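
As a sketch of the frozen-LLM pipeline described above (layer index, last-token pooling, and prompt wording are our guesses), one would prompt the model with a verbalized triple, read an intermediate layer's hidden state, and fit a light classifier on top.

```python
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def triple_state(model, tokenizer, head, rel, tail, layer=-8):
    """Context-aware hidden state of a verbalized triple from a frozen LLM."""
    prompt = f"Is this fact true? {head} {rel} {tail}."
    ids = tokenizer(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1].float().numpy()  # last-token state

# Usage sketch: X = [triple_state(model, tok, *t) for t in triples]; y = labels
# clf = LogisticRegression(max_iter=1000).fit(X, y)   # data-efficient head
```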

Density functional theory (DFT) is a powerful computational method used to obtain the physical and chemical properties of materials. In the materials discovery framework, it is often necessary to virtually screen a large, high-dimensional chemical space to find materials with desired properties. However, grid searching a large chemical space with DFT is inefficient due to its high computational cost. We propose an approach utilizing Bayesian optimization (BO) with an artificial neural network kernel to enable smart search. This method leverages the BO algorithm, in which the neural network, trained on a limited number of DFT results, determines the most promising regions of the chemical space to explore in subsequent iterations. The approach aims to discover materials with target properties while minimizing the number of DFT calculations required. To demonstrate its effectiveness, we investigated 63 doped graphene quantum dots (GQDs) with sizes ranging from 1 to 2 nm to find the structure with the highest light absorbance. By employing the BO algorithm with the neural network kernel, we needed only 12 time-dependent DFT (TDDFT) calculations, roughly 20% of the computational cost of a full grid search. Considering that TDDFT calculations for a single GQD require about half a day of wall time on high-performance computing nodes, this reduction is substantial. Our approach generalizes to the discovery of new drugs, chemicals, crystals, and alloys with high-dimensional and large chemical spaces, offering a scalable solution for various applications in materials science.
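
Conceptually, the search loop is standard Bayesian optimization with the surrogate swapped in; the sketch below uses a generic surrogate interface and a crude UCB-style acquisition rather than the paper's neural-network-kernel GP, and run_tddft stands for the expensive evaluation.

```python
import numpy as np

def bo_screen(candidates, run_tddft, surrogate, n_iters=12, n_init=3, seed=0):
    """Pick the candidate with the highest absorbance using at most n_iters
    expensive evaluations. surrogate.fit / surrogate.predict are a generic
    interface; the paper uses a GP with a neural network kernel."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), n_init, replace=False))
    y = [run_tddft(candidates[i]) for i in idx]        # expensive evaluations
    for _ in range(n_iters - n_init):
        surrogate.fit([candidates[i] for i in idx], y)
        mu, sigma = surrogate.predict(candidates)      # posterior mean / std
        score = mu + 1.64 * sigma                      # UCB-style acquisition
        score[idx] = -np.inf                           # don't re-evaluate
        j = int(np.argmax(score))
        idx.append(j)
        y.append(run_tddft(candidates[j]))
    return candidates[idx[int(np.argmax(y))]], max(y)  # best absorber found
```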
