
We present a novel approach to automatically detect and classify great ape calls from continuous raw audio recordings collected during field research. Our method leverages deep pretrained and sequential neural networks, including wav2vec 2.0 and an LSTM, and is validated on three data sets from three different great ape lineages (orangutans, chimpanzees, and bonobos). The recordings were collected by different researchers and use different annotation schemes, which our pipeline preprocesses and trains on in a uniform fashion. Our results attain high accuracy for both call detection and classification. The method is designed to generalize to other animal species and, more broadly, to sound event detection tasks. To foster future research, we make our pipeline and methods publicly available.
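
The following is a minimal sketch, not the authors' released pipeline, of how a pretrained wav2vec 2.0 encoder can be paired with an LSTM for frame-level call detection and classification; the checkpoint name, hidden size, and number of call classes are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class CallDetector(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # Pretrained speech encoder; its CNN + Transformer stack yields one
        # feature vector per roughly 20 ms of 16 kHz audio.
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of mono audio at 16 kHz
        feats = self.encoder(waveform).last_hidden_state  # (batch, frames, dim)
        seq, _ = self.lstm(feats)
        return self.head(seq)                             # per-frame class logits

# Usage sketch: e.g. background plus three hypothetical call types.
detector = CallDetector(num_classes=4)
logits = detector(torch.randn(2, 16000))  # one second of audio per example
```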

Related Content

Neural Networks is the archival journal of the world's three oldest neural modelling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. Neural Networks welcomes high-quality submissions that contribute to the full range of neural networks research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to engineering and technological applications of systems that make significant use of neural network concepts and techniques. This unique and broad scope promotes the exchange of ideas between biological and technological research and helps foster the development of an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the Neural Networks editorial board represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: cognitive science, neuroscience, learning systems, mathematical and computational analysis, and engineering and applications. Official website:

Learning good self-supervised graph representations that are beneficial to downstream tasks is challenging. Among a variety of methods, contrastive learning enjoys competitive performance. The embeddings produced by contrastive learning are arranged on a hypersphere, which enables Cosine distance measurement in Euclidean space. However, the underlying structure of many domains, such as graphs, exhibits highly non-Euclidean latent geometry. To this end, we propose a novel contrastive learning framework to learn high-quality graph embeddings. Specifically, we design an alignment metric that effectively captures hierarchical, data-invariant information, and we propose a substitute for the uniformity metric to prevent so-called dimensional collapse. We show that in hyperbolic space one has to address leaf- and height-level uniformity, which are related to properties of trees, whereas in the ambient space of the hyperbolic manifold these notions translate into imposing an isotropic ring density towards the boundary of the Poincar\'e ball. This ring density can be easily imposed by promoting an isotropic feature distribution on the tangent space of the manifold. In our experiments, we demonstrate the efficacy of the proposed method across different hyperbolic graph embedding techniques in both supervised and self-supervised learning settings.
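
As a minimal sketch of the tangent-space idea, assuming a Poincaré-ball embedding: map the embeddings to the tangent space at the origin with the logarithmic map and penalize anisotropy of their covariance. The log map below is the standard one for curvature -c; the covariance penalty is an illustrative stand-in for the paper's exact uniformity substitute.

```python
import torch

def logmap0(x: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    # Logarithmic map at the origin of the Poincare ball with curvature -c.
    sqrt_c = c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh((sqrt_c * norm).clamp(max=1 - eps)) * x / (sqrt_c * norm)

def isotropy_penalty(z: torch.Tensor) -> torch.Tensor:
    # Push the tangent-space covariance towards a scaled identity; an isotropic
    # tangent distribution corresponds to a ring density that is uniform in
    # direction towards the boundary of the ball.
    v = logmap0(z)
    v = v - v.mean(dim=0, keepdim=True)
    cov = (v.T @ v) / (v.shape[0] - 1)
    target = cov.diagonal().mean() * torch.eye(cov.shape[0], device=z.device)
    return ((cov - target) ** 2).mean()
```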

We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at //kylesargent.github.io/zeronvs/
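
As a rough sketch of how depth-scale ambiguity across mixed data sources can be handled, one can rescale each scene so that camera translations and depths share a common scale before conditioning the model. The median-depth statistic below is an assumption for illustration; ZeroNVS's actual conditioning parameterization and normalization may differ.

```python
import numpy as np

def normalize_scene_scale(cam_to_world: np.ndarray, depths: np.ndarray):
    """cam_to_world: (N, 4, 4) camera poses; depths: per-scene depth samples."""
    scale = np.median(depths)            # per-scene scale estimate (assumed)
    poses = cam_to_world.copy()
    poses[:, :3, 3] /= scale             # rescale camera translations
    return poses, depths / scale         # scene now has a canonical scale
```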

Interpretation and understanding of video presents a challenging computer vision task in numerous fields - e.g. autonomous driving and sports analytics. Existing approaches to interpreting the actions taking place within a video clip are based upon Temporal Action Localisation (TAL), which typically identifies short-term actions. The emerging field of Complex Activity Detection (CompAD) extends this analysis to long-term activities, with a deeper understanding obtained by modelling the internal structure of a complex activity taking place within the video. We address the CompAD problem using a hybrid graph neural network which combines attention applied to a graph encoding the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Our approach is as follows: i) First, we propose a novel feature extraction technique which, for each video snippet, generates spatiotemporal `tubes' for the active elements (`agents') in the (local) scene by detecting individual objects, tracking them and then extracting 3D features from all the agent tubes as well as the overall scene. ii) Next, we construct a local scene graph where each node (representing either an agent tube or the scene) is connected to all other nodes. Attention is then applied to this graph to obtain an overall representation of the local dynamic scene. iii) Finally, all local scene graph representations are interconnected via a temporal graph, to estimate the complex activity class together with its start and end time. The proposed framework outperforms all previous state-of-the-art methods on all three datasets: ActivityNet-1.3, Thumos-14, and ROAD.
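
A minimal sketch of step (ii), the attention over the fully connected local scene graph: each node carries the 3D features of an agent tube or of the whole scene, and unmasked self-attention plus pooling yields one representation of the local dynamic scene. The feature dimension, number of heads, and mean pooling are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LocalSceneGraph(nn.Module):
    def __init__(self, feat_dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, num_nodes, feat_dim); node 0 is the scene node,
        # the remaining nodes are agent tubes. Full connectivity corresponds
        # to unmasked self-attention over all nodes.
        out, _ = self.attn(node_feats, node_feats, node_feats)
        return out.mean(dim=1)  # pooled representation of the local scene
```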

We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels. No expensive 3D labels are needed. Thus, instead of using 3D labels, our model is trained with easy-to-annotate 2D labels along with the physical knowledge of the object's motion. Given this information, the model can infer the latent third dimension, even though it has never seen this information during training. Our method is evaluated on both synthetic and real-world datasets, and we are able to achieve a mean distance error of just 6 cm in our experiments on real data. The results indicate the method's potential as a step towards learning 3D object location estimation, where collecting 3D data for training is not feasible.

A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C EnDe, EnEs, and EnFr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.
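
The following is a minimal sketch of the Future-Aware Inference idea: append a few trainable "future" embeddings to the features of the partial streaming input so the encoder sees a stand-in for the unseen context. The number of appended vectors and the injection point are assumptions made for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn

class FutureAwareFrontend(nn.Module):
    def __init__(self, dim: int = 512, num_future: int = 8):
        super().__init__()
        # Trainable embeddings standing in for the not-yet-received speech.
        self.future = nn.Parameter(torch.randn(num_future, dim) * 0.02)

    def forward(self, partial_feats: torch.Tensor) -> torch.Tensor:
        # partial_feats: (batch, frames_so_far, dim) from the streaming input
        b = partial_feats.size(0)
        future = self.future.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([partial_feats, future], dim=1)
```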

This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes. The first contribution involves reparameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise, specifically setting the conventional $\displaystyle \sqrt{\bar{\alpha}}=\cos(\eta)$. This reparameterization eliminates two singularities and allows the diffusion evolution to be expressed as a well-behaved ordinary differential equation (ODE). In turn, this allows higher-order ODE solvers such as Runge-Kutta methods to be used effectively. The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network, which enables more stable calculation of the update step in the inverse diffusion process, as accurate estimation of both the image and noise is crucial at different stages of the process. With these changes, our model achieves faster generation, converging on high-quality images more quickly, and produces higher-quality images, as measured by metrics such as Frechet Inception Distance (FID), spatial Frechet Inception Distance (sFID), precision, and recall.
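
As a sketch of how this parameterization plays out, assuming the standard forward process $\mathbf{x} = \sqrt{\bar{\alpha}}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}}\,\boldsymbol{\epsilon}$, the substitution $\sqrt{\bar{\alpha}}=\cos(\eta)$ gives
$$\mathbf{x}(\eta) = \cos(\eta)\,\mathbf{x}_0 + \sin(\eta)\,\boldsymbol{\epsilon}, \qquad \frac{d\mathbf{x}}{d\eta} = -\sin(\eta)\,\mathbf{x}_0 + \cos(\eta)\,\boldsymbol{\epsilon},$$
so $\eta = 0$ recovers the clean image, $\eta = \pi/2$ pure noise, and the right-hand side stays bounded over the whole arc. Once the network supplies estimates of both $\mathbf{x}_0$ and $\boldsymbol{\epsilon}$ at each step, this well-behaved ODE can be integrated with standard higher-order solvers such as Runge-Kutta methods.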

While diffusion models demonstrate a remarkable capability for generating high-quality images, their tendency to `replicate' training data raises privacy concerns. Although recent research suggests that this replication may stem from insufficient generalization of training data captions and duplication of training images, effective mitigation strategies remain elusive. To address this gap, our paper first introduces a generality score that measures caption generality and employs a large language model (LLM) to generalize training captions. Subsequently, we leverage the generalized captions and propose a novel dual fusion enhancement approach to mitigate the replication of diffusion models. Our empirical results demonstrate that our proposed methods can significantly reduce replication by 43.5% compared to the original diffusion model while maintaining the diversity and quality of generations.

Achieving high-performance audio denoising remains a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of the generated frequency-domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex-image-generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detail loss functions to generate high-quality images, and develop an SDR loss to minimize the difference between the denoised and clean audio. Extensive experiments on two benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods.
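
A minimal sketch of an SDR-style loss between denoised and clean waveforms is shown below; the exact formulation in the paper may differ (for example, scale-invariant variants or clipping), and this is the plain signal-to-distortion ratio taken with a negative sign so it can be minimized.

```python
import torch

def sdr_loss(denoised: torch.Tensor, clean: torch.Tensor, eps: float = 1e-8):
    # denoised, clean: (batch, samples)
    signal = (clean ** 2).sum(dim=-1)
    distortion = ((clean - denoised) ** 2).sum(dim=-1) + eps
    sdr = 10.0 * torch.log10(signal / distortion + eps)  # dB
    return -sdr.mean()  # maximize SDR by minimizing its negative
```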

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of the large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GAN. Experimental results obtained on two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using only a small amount of labeled data for training.

Recent advances in fluorescence microscopy enable acquisition of 3D image volumes with better quality and deeper penetration into tissue. Segmentation is a required step to characterize and analyze biological structures in the images. 3D segmentation using deep learning has achieved promising results in microscopy images. One issue is that deep learning techniques require a large set of ground truth data, which is impractical to annotate manually for microscopy volumes. This paper describes a 3D nuclei segmentation method using 3D convolutional neural networks. A set of synthetic volumes and the corresponding ground truth volumes are generated automatically using a generative adversarial network. Segmentation results demonstrate that our proposed method is capable of successfully segmenting nuclei in 3D for various data sets.
