
We consider estimation of the spot volatility in a stochastic boundary model with one-sided microstructure noise for high-frequency limit order prices. Based on discrete, noisy observations of an It\^o semimartingale with jumps and general stochastic volatility, we present a simple and explicit estimator using local order statistics. We establish consistency and stable central limit theorems as asymptotic properties. The asymptotic analysis builds upon an expansion of tail probabilities for the order statistics based on a generalized arcsine law. In order to use the involved distribution of local order statistics for a bias correction, an efficient numerical algorithm is developed. We demonstrate the finite-sample performance of the estimator in a Monte Carlo simulation.
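
The abstract does not spell out the estimator, but the local-order-statistics idea can be illustrated with a rough Python sketch: block the noisy observations, take block minima (the order statistic least contaminated by nonnegative, one-sided noise), and smooth scaled squared differences of consecutive minima into a spot-variance proxy. The simulated model, the block size, the smoothing window and the constant C_BIAS below are illustrative placeholders; the paper's actual bias correction, derived from the generalized arcsine-law expansion, is not reproduced.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: an Ito-type log-price with time-varying volatility,
# observed with nonnegative (one-sided) microstructure noise.
n = 100_000
t = np.linspace(0.0, 1.0, n + 1)
sigma = 0.5 + 0.3 * np.sin(2.0 * np.pi * t[:-1])          # spot volatility path
X = np.concatenate(([0.0], np.cumsum(sigma * rng.normal(scale=np.sqrt(1.0 / n), size=n))))
Y = X + rng.exponential(scale=1e-4, size=n + 1)           # one-sided noise added to X

# Local order-statistics sketch: block minima are least affected by the noise.
block = 100
mins = Y[: (len(Y) // block) * block].reshape(-1, block).min(axis=1)
h = block / n                                             # time span of one block
C_BIAS = 1.0   # placeholder constant; the paper's bias correction (from the
               # generalized arcsine-law expansion) is not reproduced here
rv_blocks = np.diff(mins) ** 2 / (C_BIAS * h)             # block-wise variance proxies

K = 25                                                    # local averaging window (in blocks)
spot_var_hat = np.convolve(rv_blocks, np.ones(K) / K, mode="same")
print(spot_var_hat[:5])                                   # rough spot-variance path estimate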

Related content

Every sufficiently large matrix with small spectral norm has a nearby low-rank matrix if the distance is measured in the maximum norm (Udell \& Townsend, SIAM J Math Data Sci, 2019). We use the Hanson--Wright inequality to improve the estimate of the distance for matrices with incoherent column and row spaces. In numerical experiments with several classes of matrices, we study how well the theoretical upper bound describes the approximation errors achieved with the method of alternating projections.
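
As a concrete reading of the last sentence, the method of alternating projections for this problem can be sketched as alternating between the set of matrices of rank at most k (projection via truncated SVD) and the set of matrices within a prescribed entrywise distance of the original matrix (projection via clipping). The matrix size, rank and tolerance below are arbitrary choices for illustration, not the settings used in the experiments.

import numpy as np

def low_rank_in_max_norm(A, rank, eps, iters=100):
    """Alternate between the matrices of rank <= `rank` (truncated SVD) and the
    entrywise ball {X : |X_ij - A_ij| <= eps}; return the low-rank iterate."""
    X = A.copy()
    L = X
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # nearest matrix of rank <= rank
        X = np.clip(L, A - eps, A + eps)              # nearest matrix with max-norm error <= eps
    return L

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 300)) / np.sqrt(300)    # random matrix with O(1) spectral norm
L = low_rank_in_max_norm(A, rank=60, eps=0.1)
print("entrywise (max-norm) error:", np.abs(L - A).max())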

We present a discretization of the dynamic optimal transport problem for which we can obtain a rate of convergence of the discrete transport cost to its continuous value as the temporal and spatial step sizes vanish. This convergence result does not require any regularity assumption on the measures, though experiments suggest that the rate is not sharp. Via an analysis of the duality gap, we also obtain convergence rates for the gradient of the optimal potentials and for the velocity field under mild regularity assumptions. To obtain such rates, we discretize the dual formulation of the dynamic optimal transport problem and draw on the mature literature on the error incurred in discretizing the Hamilton-Jacobi equation.
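
For orientation, the continuous objects being discretized are the standard Benamou--Brenier formulation and its dual, whose constraint is a Hamilton--Jacobi inequality; the paper's particular space-time discretization of these quantities is not reproduced here:
\[
\tfrac12 W_2^2(\mu_0,\mu_1)
 \;=\; \inf_{(\rho,v)} \int_0^1\!\!\int \tfrac12\,|v(t,x)|^2\,\rho(t,x)\,\mathrm{d}x\,\mathrm{d}t
 \quad\text{subject to}\quad
 \partial_t\rho + \nabla\!\cdot(\rho v)=0,\;\; \rho(0,\cdot)=\mu_0,\;\rho(1,\cdot)=\mu_1,
\]
\[
\tfrac12 W_2^2(\mu_0,\mu_1)
 \;=\; \sup_{\varphi}\; \int \varphi(1,x)\,\mathrm{d}\mu_1(x)-\int \varphi(0,x)\,\mathrm{d}\mu_0(x)
 \quad\text{subject to}\quad
 \partial_t\varphi + \tfrac12\,|\nabla\varphi|^2 \le 0,
\]
with the optimal velocity field recovered from the potential as $v=\nabla\varphi$.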

Machine Learning (ML) in low-data settings remains an underappreciated yet crucial problem. This challenge is pronounced in low-to-middle income countries, where access to large datasets is often limited or even absent. Hence, data augmentation methods to increase the sample size of datasets needed for ML are key to unlocking the transformative potential of ML in data-deprived regions and domains. Unfortunately, the limited training set constrains traditional tabular synthetic data generators in their ability to generate the large and diverse augmented dataset needed for ML tasks. To address this technical challenge, we introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime. As with any generative model, the data generated by LLMs, while diverse, do not all increase utility on a downstream task. Consequently, we introduce a principled curation process, leveraging learning dynamics coupled with confidence and uncertainty metrics, to obtain a high-quality dataset. Empirically, on multiple real-world datasets, we demonstrate the superior performance of LLMs in the low-data regime compared to conventional generators. We further show that our curation mechanism improves the downstream performance of all generators, including LLMs. Additionally, we provide insights into the LLM generation and curation mechanism, shedding light on the features that enable them to output high-quality augmented datasets. CLLM paves the way for wider usage of ML in data-scarce domains and regions by combining the strengths of LLMs with a robust data-centric approach.
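
The curation step can be illustrated, in a much simplified form, by filtering candidate synthetic rows with an ensemble trained on the small real dataset and keeping only rows that are predicted confidently and consistently. The sketch below uses noisy copies of real rows in place of LLM output, pseudo-labels them with the ensemble, and applies arbitrary confidence/disagreement thresholds; CLLM's actual curation additionally uses learning dynamics and is not reproduced here.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# A small "real" tabular dataset and a larger pool of candidate synthetic rows.
# Here the candidates are just noisy copies of real rows; in CLLM they would
# come from an LLM conditioned on the real data and its prior knowledge.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_real, X_test, y_real, y_test = train_test_split(X, y, train_size=100, random_state=0)
X_synth = X_real[rng.integers(0, len(X_real), 1000)] + rng.normal(0.0, 0.3, (1000, 10))

# Score the candidates with an ensemble trained on the real data only.
ens = [RandomForestClassifier(n_estimators=50, random_state=s).fit(X_real, y_real)
       for s in range(5)]
probs = np.stack([m.predict_proba(X_synth) for m in ens])   # (models, rows, classes)
mean_p = probs.mean(axis=0)
y_synth = mean_p.argmax(axis=1)                             # pseudo-labels
confidence = mean_p.max(axis=1)                             # average predicted probability
disagreement = probs.max(axis=2).std(axis=0)                # spread across ensemble members

# Curation: keep only rows predicted confidently and consistently (thresholds arbitrary).
keep = (confidence > 0.8) & (disagreement < 0.1)
X_aug = np.vstack([X_real, X_synth[keep]])
y_aug = np.concatenate([y_real, y_synth[keep]])

clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print("kept", int(keep.sum()), "of 1000 candidates; test accuracy:",
      round(clf.score(X_test, y_test), 3))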

Detection and identification of emitters provide vital information for defensive strategies in electronic intelligence. Given a received signal containing pulses from an unknown number of emitters, this paper introduces an unsupervised methodology for deinterleaving RADAR signals that combines clustering algorithms and optimal transport distances. The first step separates the pulses with a clustering algorithm under the constraint that the pulses of two different emitters cannot belong to the same cluster. Then, since the emitters exhibit complex behavior and can be represented by several clusters, we propose a hierarchical clustering algorithm based on an optimal transport distance to merge these clusters. A variant capable of handling more complex signals is also developed. Finally, the proposed methodology is evaluated on simulated data produced by a realistic simulator. Results show that the proposed methods are capable of deinterleaving complex RADAR signals.
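
A toy sketch of the two stages, with invented pulse descriptors and thresholds: pulses are first over-segmented by a clustering algorithm, and clusters are then merged when the 1-D Wasserstein distance between one of their feature distributions is small. This is only a crude stand-in for the paper's hierarchical, optimal-transport-based merging and its clustering constraint.

import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Invented pulse descriptor words (carrier frequency in GHz, pulse width in us)
# for two emitters; emitter B hops between two frequency modes and therefore
# splits into two clusters in the first stage.
emitter_a  = np.column_stack([rng.normal(9.40, 0.01, 300), rng.normal(1.0, 0.05, 300)])
emitter_b1 = np.column_stack([rng.normal(9.60, 0.01, 150), rng.normal(2.0, 0.05, 150)])
emitter_b2 = np.column_stack([rng.normal(9.70, 0.01, 150), rng.normal(2.0, 0.05, 150)])
pulses = np.vstack([emitter_a, emitter_b1, emitter_b2])

# Stage 1: over-segment the pulses with a clustering algorithm.
labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(pulses)
clusters = [pulses[labels == k] for k in sorted(set(labels)) if k != -1]

# Stage 2: greedily merge clusters whose pulse-width distributions are close in
# 1-D Wasserstein distance (a crude stand-in for the paper's OT-based merging).
def same_emitter(c1, c2, tol=0.2):
    return wasserstein_distance(c1[:, 1], c2[:, 1]) < tol

emitters = []
for c in clusters:
    for group in emitters:
        if same_emitter(group[0], c):
            group.append(c)
            break
    else:
        emitters.append([c])

print(f"{len(clusters)} clusters merged into {len(emitters)} estimated emitters")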

We present ParrotTTS, a modularized text-to-speech synthesis model leveraging disentangled self-supervised speech representations. It can train a multi-speaker variant effectively using transcripts from a single speaker. ParrotTTS adapts to a new language in a low-resource setup and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on bilingual or parallel examples, ParrotTTS can transfer voices across languages while preserving speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results in monolingual and multi-lingual scenarios. ParrotTTS outperforms state-of-the-art multi-lingual TTS models while using only a fraction of the paired data the latter require.

In current applied research the most-used route to an analysis of composition is through log-ratios -- that is, contrasts among log-transformed measurements. Here we argue instead for a more direct approach, using a statistical model for the arithmetic mean on the original scale of measurement. Central to the approach is a general variance-covariance function, derived by assuming multiplicative measurement error. Quasi-likelihood analysis of logit models for composition is then a general alternative to the use of multivariate linear models for log-ratio transformed measurements, and it has important advantages. These include robustness to secondary aspects of model specification, stability when there are zero-valued or near-zero measurements in the data, and more direct interpretation. The usual efficiency property of quasi-likelihood estimation applies even when the error covariance matrix is unspecified. We also indicate how the derived variance-covariance function can be used, instead of the variance-covariance matrix of log-ratios, with more general multivariate methods for the analysis of composition. A specific feature is that the notion of `null correlation' -- for compositional measurements on their original scale -- emerges naturally.
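
In generic notation, the quasi-likelihood analysis referred to here solves a quasi-score equation with a multinomial-logit mean for the composition on its original scale; the paper's specific variance-covariance function, derived from multiplicative measurement error, is represented only abstractly by $V_i$ below:
\[
\mu_{ij}(\beta) \;=\; \frac{\exp(x_i^{\top}\beta_j)}{\sum_{k}\exp(x_i^{\top}\beta_k)},
\qquad
U(\beta) \;=\; \sum_i D_i^{\top}\,V_i(\mu_i;\theta)^{-1}\,\bigl(y_i-\mu_i(\beta)\bigr)\;=\;0,
\qquad
D_i \;=\; \frac{\partial\mu_i}{\partial\beta^{\top}},
\]
where $y_i$ is the observed composition for unit $i$, $\mu_i(\beta)$ its modelled mean, and $V_i(\mu_i;\theta)$ a working variance-covariance function.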

Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities, whereby samples from the base are computed conditionally given samples from the target in a way that is different from (but does not preclude) incorporating information about class labels or continuous embeddings. This enables us to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
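
A minimal training-loop sketch of the square-loss regression with a dependent coupling, written with PyTorch: the base sample x0 is generated conditionally on the target sample x1 (here by a toy degradation, standing in for down-sampling or masking), and a velocity field is regressed onto the time derivative of a linear interpolant. The stochastic interpolant framework in the paper is more general (it can include a latent noise term and other interpolants); the model, data and degradation below are illustrative placeholders.

import torch
import torch.nn as nn

dim = 64                                      # toy "image" dimension
model = nn.Sequential(                        # velocity field v_theta(x, t)
    nn.Linear(dim + 1, 256), nn.SiLU(),
    nn.Linear(256, 256), nn.SiLU(),
    nn.Linear(256, dim),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def degrade(x1):
    """Dependent coupling: draw the base sample conditionally on the target
    (a toy blur-and-noise operation standing in for down-sampling or masking)."""
    blurred = 0.5 * (x1 + torch.roll(x1, shifts=1, dims=-1))
    return blurred + 0.5 * torch.randn_like(x1)

for step in range(1000):
    x1 = torch.randn(128, dim)                # stand-in for samples from the target density
    x0 = degrade(x1)                          # coupled base samples
    t = torch.rand(128, 1)
    xt = (1.0 - t) * x0 + t * x1              # linear interpolant between the coupled pair
    target = x1 - x0                          # its time derivative
    loss = ((model(torch.cat([xt, t], dim=1)) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling would degrade an observed image to obtain x0 and integrate
# dx/dt = v_theta(x, t) from t = 0 to t = 1, e.g. with an Euler scheme.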

Ad hoc architectures have emerged as a valuable alternative to centralized participatory sensing systems due to their infrastructureless nature, which ensures good availability, easy maintenance and direct user communication. Precisely because of this decentralization, they need to incorporate content-aware assessment mechanisms to deal with a common problem in participatory sensing: information assessment. Easy contribution encourages user participation and improves the sensing task, but may result in large amounts of data that are not necessarily valid or relevant. Currently, prioritization is the only fully ad hoc scheme for assessing user-generated alerts. This strategy prevents duplicates from congesting the network, but it neither assesses every generated alert nor deals with low-quality or irrelevant alerts. In order to ensure that users receive only interesting alerts and that the network is not compromised, we propose two collaborative alert assessment mechanisms that, while keeping the network flat, provide an effective message filter. Both rely on opportunistic collaboration with nearby peers. By simulating their behavior in a real urban area, we show that they decrease network load while maintaining the alert delivery ratio.

Sentence embeddings induced with various transformer architectures encode much semantic and syntactic information in a distributed manner in a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations encoded as one-dimensional arrays do not easily support extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these two-dimensional reshaped sentence embeddings and successfully learn a model based on smaller amounts of simpler training data, which performs well on more complex test data. This indicates that current sentence embeddings contain information that is regularly distributed, and which can be captured when the embeddings are reshaped into higher dimensional arrays. Our results cast light on representations produced by language models and help move towards developing few-shot learning approaches.
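
The reshaping idea can be sketched as follows: a 768-dimensional embedding is viewed as a 24x32 grid and passed through a small convolutional classifier. The embedding dimension, grid shape, network and random data below are placeholders; only the reshape-then-convolve pattern reflects the approach described above.

import torch
import torch.nn as nn

# Toy stand-ins for sentence embeddings (e.g., 768-dim transformer outputs)
# and binary agreement labels; in the experiments these come from real models/data.
emb = torch.randn(256, 768)
labels = torch.randint(0, 2, (256,))

class ReshapeCNN(nn.Module):
    """Reshape a 768-dim embedding into a 24x32 'image' and classify it."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # -> 16 x 12 x 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # -> 32 x 1 x 1
        )
        self.head = nn.Linear(32, 2)

    def forward(self, x):
        x = x.view(-1, 1, 24, 32)                  # the key step: 1-D vector -> 2-D grid
        return self.head(self.conv(x).flatten(1))

model = ReshapeCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(model(emb), labels)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))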

To capture the extremal behaviour of complex environmental phenomena in practice, flexible techniques for modelling tail behaviour are required. In this paper, we introduce a variety of such methods, which were used by the Lancopula Utopiversity team to tackle the data challenge of the 2023 Extreme Value Analysis Conference. This data challenge was split into four sections, labelled C1-C4. Challenges C1 and C2 comprise univariate problems, where the goal is to estimate extreme quantiles for a non-stationary time series exhibiting several complex features. We propose a flexible modelling technique, based on generalised additive models, with diagnostics indicating generally good performance for the observed data. Challenges C3 and C4 concern multivariate problems where the focus is on estimating joint extremal probabilities. For challenge C3, we propose an extension of available models in the multivariate literature and use this framework to estimate extreme probabilities in the presence of non-stationary dependence. Finally, for challenge C4, which concerns a 50-dimensional random vector, we employ a clustering technique to achieve dimension reduction and use a conditional modelling approach to estimate extremal probabilities across independent groups of variables.
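
For the univariate challenges, the underlying extreme-quantile machinery can be illustrated by a stationary peaks-over-threshold sketch: fit a generalized Pareto distribution to threshold exceedances and invert the fitted tail to read off a far quantile. The paper's generalised-additive (non-stationary) models, their diagnostics, and the multivariate challenges C3-C4 are not reproduced; the data, threshold and quantile level below are arbitrary.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
x = rng.standard_t(df=4, size=20_000)           # toy heavy-tailed "observations"

u = np.quantile(x, 0.95)                        # threshold
exc = x[x > u] - u
zeta = exc.size / x.size                        # exceedance probability
xi, _, scale = genpareto.fit(exc, floc=0.0)     # fit GPD to exceedances (location fixed at 0)

def extreme_quantile(p):
    """Return the level exceeded with probability 1 - p (p close to 1)."""
    return u + genpareto.ppf((p - (1 - zeta)) / zeta, xi, loc=0.0, scale=scale)

print("estimated 0.9999 quantile:", extreme_quantile(0.9999))
print("empirical 0.9999 quantile:", np.quantile(x, 0.9999))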
