
Recent work on intracranial brain-machine interfaces has demonstrated that spoken speech can be decoded with high accuracy, essentially by treating the problem as an instance of supervised learning and training deep neural networks to map from neural activity to text. However, such networks pay for their expressiveness with a need for very large amounts of labeled data, a requirement that is particularly burdensome for invasive neural recordings acquired from human patients. On the other hand, these patients typically produce speech outside of the experimental blocks used for training decoders. Making use of such data, and data from other patients, to improve decoding would ease the burden of data collection -- especially onerous for dys- and anarthric patients. Here we demonstrate that this is possible, by reengineering wav2vec -- a simple, self-supervised, fully convolutional model that learns latent representations of audio using a noise-contrastive loss -- for electrocorticographic (ECoG) data. We train this model on unlabeled ECoG recordings, and subsequently use it to transform ECoG from labeled speech sessions into wav2vec's representation space, before finally training a supervised encoder-decoder to map these representations to text. We experiment with various numbers of labeled blocks; for almost all choices, the new representations yield superior decoding performance to the original ECoG data, and in no case do they yield worse performance. Performance can also be improved in some cases by pretraining wav2vec on another patient's data. In the best cases, wav2vec's representations decrease word error rates over the original data by upwards of 50%.
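To make the phrase "noise-contrastive loss" concrete, the following is a minimal sketch of a binary contrastive objective of the kind wav2vec-style models use: a prediction vector is scored against one true future latent and several distractors. The function names, the toy two-dimensional vectors, and the absence of any weighting on the negative term are illustrative assumptions, not the abstract's implementation.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(prediction, positive, negatives):
    # Reward a high score for the true latent and low scores for distractors.
    loss = -log_sigmoid(dot(prediction, positive))
    for negative in negatives:
        loss -= log_sigmoid(-dot(prediction, negative))
    return loss

# Hypothetical toy vectors (assumed for illustration only).
prediction = [1.0, 0.0]
loss_aligned = contrastive_loss(prediction, [2.0, 0.0], [[-2.0, 0.0], [0.0, 1.0]])
loss_misaligned = contrastive_loss(prediction, [-2.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])
print(loss_aligned < loss_misaligned)
```

The loss is smaller when the prediction points toward the true latent and away from the distractors, which is the signal that lets such a model learn from unlabeled recordings.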

Related content

Networking · Tensor · Quantum computing · State space · Design

July 9, 2024

Quantum Computing and Tensor Networks for Laminate Design: A Novel Approach to Stacking Sequence Retrieval

Arne Wulff,Boyang Chen,Matthew Steinberg,Yinglu Tang,Matthias Möller,Sebastian Feld

from arxiv, 54 pages, 9 figures. Accompanying code repository: //github.com/ArneWulff/ssr-with-qc-and-tn . Accompanying data repository: //doi.org/10.4121/ae276609-55b0-4af1-88c0-1102b1b58990 . Changes: Major revision, addition of state-vector simulations of VQAs

As with many tasks in engineering, structural design frequently involves navigating complex and computationally expensive problems. A prime example is the weight optimization of laminated composite materials, which to this day remains a formidable task, due to an exponentially large configuration space and non-linear constraints. The rapidly developing field of quantum computation may offer novel approaches for addressing these intricate problems. However, before applying any quantum algorithm to a given problem, it must be translated into a form that is compatible with the underlying operations on a quantum computer. Our work specifically targets stacking sequence retrieval with lamination parameters. To adapt this problem for quantum computational methods, we map the possible stacking sequences onto a quantum state space. We further derive a linear operator, the Hamiltonian, within this state space that encapsulates the loss function inherent to the stacking sequence retrieval problem. Additionally, we demonstrate the incorporation of manufacturing constraints on stacking sequences as penalty terms in the Hamiltonian. This quantum representation is suitable for a variety of classical and quantum algorithms for finding the ground state of a quantum Hamiltonian. For a practical demonstration, we performed state-vector simulations of two variational quantum algorithms and additionally chose a classical tensor network algorithm, the DMRG algorithm, to numerically validate our approach. Although this work primarily concentrates on quantum computation, the application of tensor network algorithms presents a novel quantum-inspired approach for stacking sequence retrieval.

Networking · Neural Networks · MoDELS · state-of-the-art · Model evaluation

July 8, 2024

Multi-Fidelity Bayesian Neural Network for Uncertainty Quantification in Transonic Aerodynamic Loads

Andrea Vaiuso,Gabriele Immordino,Marcello Righi,Andrea Da Ronch

Multi-fidelity models are becoming more prevalent in engineering, particularly in aerospace, as they combine the computational efficiency of low-fidelity models with the high accuracy of higher-fidelity simulations. Various state-of-the-art techniques exist for fusing data from different fidelity sources, including Co-Kriging and transfer learning in neural networks. This paper aims to implement a multi-fidelity Bayesian neural network model that applies transfer learning to fuse data generated by models at different fidelities. Bayesian neural networks use probability distributions over network weights, enabling them to provide predictions along with estimates of their confidence. This approach harnesses the predictive and data fusion capabilities of neural networks while also quantifying uncertainty. The results demonstrate that the multi-fidelity Bayesian model outperforms the state-of-the-art Co-Kriging in terms of overall accuracy and robustness on unseen data.

Facebook AI Research · Data augmentation · Class · Performer · Optimizer

July 8, 2024

Enhancing Class Fairness in Classification with A Two-Player Game Approach

Yunpeng Jiang,Paul Weng,Yutong Ban

Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed in some downstream tasks, data augmentation may introduce an unfair impact on classifications. While it can improve the performance of some classes, it can actually be detrimental for other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose a FAir Classification approach with a Two-player game (FACT). We first formulate the training of a classifier with data augmentation as a fair optimization problem, which can be further written as an adversarial two-player game. Following this formulation, we propose a novel multiplicative weight optimization algorithm, for which we theoretically prove that it can converge to a solution that is fair over classes. Interestingly, our formulation also reveals that this fairness issue over classes is not due to data augmentation only, but is in fact a general phenomenon. Our empirical experiments demonstrate that the performance of our learned classifiers is indeed more fairly distributed over classes in five datasets, with only limited impact on the average accuracy.
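The adversarial two-player view described above can be illustrated with a generic multiplicative-weights (Hedge-style) update, in which an adversary shifts weight toward the classes that currently have the highest loss. This is a minimal sketch of that general technique, not the FACT algorithm itself; the function names, the step size, and the per-class losses are assumptions made for illustration.

```python
import math

def multiplicative_weights_step(weights, class_losses, step_size=0.5):
    # Scale each class weight exponentially in its loss, then renormalize
    # so that the weights remain a probability distribution.
    scaled = [w * math.exp(step_size * loss)
              for w, loss in zip(weights, class_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def weighted_loss(weights, class_losses):
    # Objective the classifier would minimize against the current weights.
    return sum(w * loss for w, loss in zip(weights, class_losses))

# Hypothetical per-class losses for three classes (assumed values).
class_losses = [0.2, 0.9, 0.4]
weights = [1.0 / 3.0] * 3
new_weights = multiplicative_weights_step(weights, class_losses)
print(new_weights)
```

After one step the worst-performing class carries the largest weight, so a classifier trained against the reweighted loss is pushed to improve that class first.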

Estimation/estimator · MoDELS · Pair · SimPLe · Functional

July 8, 2024

Woven Fabric Capture with a Reflection-Transmission Photo Pair

Yingjie Tang,Zixuan Li,Miloš Hašan,Jian Yang,Beibei Wang

from arxiv, 10 pages, 16 figures (in the main paper). Accepted by SIGGRAPH 2024 conference

Digitizing woven fabrics would be valuable for many applications, from digital humans to interior design. Previous work introduces a lightweight woven fabric acquisition approach by capturing a single reflection image and estimating the fabric parameters with a differentiable geometric and shading model. The renderings of the estimated fabric parameters can closely match the photo; however, the captured reflection image is insufficient to fully characterize the fabric sample reflectance. For instance, fabrics with different thicknesses might have similar reflection images but lead to significantly different transmission. We propose to recover the woven fabric parameters from two captured images: reflection and transmission. At the core of our method is a differentiable bidirectional scattering distribution function (BSDF) model, handling reflection and transmission, including single and multiple scattering. We propose a two-layer model, where the single scattering uses an SGGX phase function as in previous work, and multiple scattering uses a new azimuthally-invariant microflake definition, which we term ASGGX. This new fabric BSDF model closely matches real woven fabrics in both reflection and transmission. We use a simple setup for capturing reflection and transmission photos with a cell phone camera and two point lights, and estimate the fabric parameters via a lightweight network, together with a differentiable optimization. We also model the out-of-focus effects explicitly with a simple solution to match the thin-lens camera better. As a result, the renderings of the estimated parameters can agree with the input images on both reflection and transmission for the first time. The code for this paper is at //github.com/lxtyin/FabricBTDF-Recovery.

Inference · Tokenizer · Large language models · ReQuEST · MoDELS

July 7, 2024

A Queueing Theoretic Perspective on Low-Latency LLM Inference with Variable Token Length

Yuqing Yang,Yuedong Xu,Lei Jiao

from arxiv, 8 pages

Large language models (LLMs) drive the success of interactive AI applications, exemplified by ChatGPT, that demand timely responses from inference services. However, LLM inference is computation intensive and memory intensive, and improper parameter configuration at LLM platforms may increase the inference time. In this paper, we analyze the impact of the LLM output token distribution on the inference queueing delay, where max-token clipping and batched inference are considered. By formulating an M/G/1 model, we observe that enforcing a maximum output token limit on a very small fraction of inference requests can significantly reduce the queueing delay, and our model facilitates the selection of the optimal limit. For batch inference, we model the service process as a bulk queue in which the batch processing time is affected jointly by the batch size and the maximum token size inside the batch. The queueing delays of the batching of all buffered requests (dynamic batching), the batching of a constant number of requests (fixed batching), and the batching without intra-batch waiting (elastic batching) are derived. Experimental results show that our mathematical models agree well with event-driven simulations.
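The effect of max-token clipping on an M/G/1 queue can be sketched with the standard Pollaczek-Khinchine formula for the mean waiting time, W_q = λ E[S²] / (2 (1 - ρ)) with ρ = λ E[S]. The sketch below assumes that service time is proportional to the output token count; the workload, the per-token time, and the arrival rate are hypothetical values chosen for illustration and do not come from the paper.

```python
def mg1_mean_wait(arrival_rate, service_times):
    # Pollaczek-Khinchine mean queueing delay for an M/G/1 queue,
    # using empirical first and second moments of the service time.
    n = len(service_times)
    mean_s = sum(service_times) / n
    second_moment = sum(s * s for s in service_times) / n
    rho = arrival_rate * mean_s
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization >= 1")
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))

def clip_tokens(token_counts, max_tokens):
    # Enforce a maximum output token limit on every request.
    return [min(t, max_tokens) for t in token_counts]

# Hypothetical workload: 99 short requests and one very long one.
tokens = [100] * 99 + [10000]
time_per_token = 0.001  # assumed seconds per output token
arrival_rate = 2.0      # assumed requests per second

unclipped = [t * time_per_token for t in tokens]
clipped = [t * time_per_token for t in clip_tokens(tokens, 1000)]
wait_unclipped = mg1_mean_wait(arrival_rate, unclipped)
wait_clipped = mg1_mean_wait(arrival_rate, clipped)
print(wait_unclipped, wait_clipped)
```

Because the delay depends on the second moment of the service time, clipping the single long request in this toy workload reduces the mean wait far more than its 1% share of requests would suggest.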

MoDELS · 3D · Dataset · Estimation/estimator · INFORMS

July 5, 2024

Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

Octave Mariotti,Oisin Mac Aodha,Hakan Bilen

Recent progress in self-supervised representation learning has resulted in models that are capable of extracting image features that are effective at encoding not only image-level but also pixel-level semantics. These features have been shown to be effective for dense visual semantic correspondence estimation, even outperforming fully-supervised methods. Nevertheless, current self-supervised approaches still fail in the presence of challenging image characteristics such as symmetries and repeated parts. To address these limitations, we propose a new approach for semantic correspondence estimation that supplements discriminative self-supervised features with 3D understanding via a weak geometric spherical prior. Compared to more involved 3D pipelines, our model only requires weak viewpoint information, and the simplicity of our spherical representation enables us to inject informative geometric priors into the model during training. We propose a new evaluation metric that better accounts for repeated part and symmetry-induced mistakes. We present results on the challenging SPair-71k dataset, where we show that our approach is capable of distinguishing between symmetric views and repeated parts across many object categories, and also demonstrate that we can generalize to unseen classes on the AwA dataset.

Performer · 3D · Scenario · Mutually independent · Multimodal

July 5, 2024

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

Penglei Sun,Yaoxian Song,Xinglin Pan,Peijie Dong,Xiaofei Yang,Qiang Wang,Zhixu Li,Tiefeng Li,Xiaowen Chu

Existing work on object-level language grounding with 3D objects mostly focuses on improving performance by using off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, it has not explored the cross-modal representation of language-vision alignment in the cross-domain setting. To address this problem, we propose a novel method called Domain Adaptation for Language Grounding (DA4LG) with 3D objects. Specifically, the proposed DA4LG consists of a visual adapter module with multi-task learning to realize vision-language alignment through a comprehensive multimodal feature representation. Experimental results demonstrate that DA4LG performs competitively across visual and non-visual language descriptions, independent of the completeness of observation. DA4LG achieves state-of-the-art performance in the single-view and multi-view settings, with accuracies of 83.8% and 86.8% respectively on the language grounding benchmark SNARE. Simulation experiments show that DA4LG is practical and generalizes well compared to existing methods. Our project is available at //sites.google.com/view/da4lg.

Performer · Learning · Optimizer · Recommender systems · Reinforcement learning

July 4, 2024

Deep Pareto Reinforcement Learning for Multi-Objective Recommender System

Pan Li,Alexander Tuzhilin

Optimizing multiple objectives simultaneously is an important task in recommendation platforms to improve their performance on different fronts. However, this task is particularly challenging since the relationships between different objectives are heterogeneous across different consumers and fluctuate dynamically according to different contexts. Especially when objectives conflict with each other, the recommendation results form a Pareto frontier, where an improvement on any objective comes at the cost of a performance decrease in another objective. Unfortunately, existing multi-objective recommender systems do not systematically consider such relationships; instead, they balance between these objectives in a static and uniform manner, resulting in performance that is significantly worse than Pareto-optimal. In this paper, we propose a Deep Pareto Reinforcement Learning (DeepPRL) approach, where we (1) comprehensively model the complex relationships between multiple objectives in recommendations; (2) effectively capture the personalized and contextual consumer preference towards each objective and update the recommendations correspondingly; (3) optimize both the short-term and the long-term performance of multi-objective recommendations. As a result, our method achieves significant Pareto dominance over state-of-the-art baselines in extensive offline experiments conducted on three real-world datasets. Furthermore, we conduct a large-scale online controlled experiment at the video streaming platform of Alibaba, where our method simultaneously improves the three conflicting objectives of Click-Through Rate, Video View, and Dwell Time by 2%, 5%, and 7% respectively over the latest production system, demonstrating its tangible economic impact in industrial applications.
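The notions of Pareto dominance and the Pareto frontier used above can be made concrete with a short sketch. It assumes every objective is to be maximized; the function names and the candidate scores (two hypothetical objectives per recommendation policy) are illustrative and are not taken from the paper.

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every objective
    # and strictly better on at least one (all objectives maximized).
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical (click-through rate, video views) scores per policy.
candidates = [(0.10, 5.0), (0.12, 4.0), (0.09, 4.5), (0.11, 4.5)]
front = pareto_front(candidates)
print(front)
```

Points on the frontier cannot be improved on one objective without giving up some of another, which is the trade-off the abstract describes.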

Knowledge · Machine Learning · MoDELS · Learned · Conformer

May 10, 2022

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Julian Wörmann,Daniel Bogdoll,Etienne Bührle,Han Chen,Evaristus Fuh Chuo,Kostadin Cvejoski,Ludger van Elst,Tobias Gleißner,Philip Gottschall,Stefan Griesche,Christian Hellert,Christian Hesels,Sebastian Houben,Tim Joseph,Niklas Keil,Johann Kelsch,Hendrik Königshof,Erwin Kraft,Leonie Kreuser,Kevin Krone,Tobias Latka,Denny Mattern,Stefan Matthes,Mohsin Munir,Moritz Nekolla,Adrian Paschke,Maximilian Alexander Pintz,Tianming Qiu,Faraz Qureishi,Syed Tahseen Raza Rizvi,Jörg Reichardt,Laura von Rueden,Stefan Rudolph,Alexander Sagel,Gerhard Schunk,Hao Shen,Hendrik Stapelbroek,Vera Stehr,Gurucharan Srinivas,Anh Tuan Tran,Abhishek Vivekanandan,Ya Wang,Florian Wasserrab,Tino Werner,Christian Wirth,Stefan Zwicklbauer

from arxiv, 93 pages

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Video captioning · INFORMS · Performer · Distillation · Extensibility

March 31, 2020

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Boxiao Pan,Haoye Cai,De-An Huang,Kuan-Hui Lee,Adrien Gaidon,Ehsan Adeli,Juan Carlos Niebles

from arxiv, CVPR 2020

Video captioning is a challenging task that requires a deep understanding of visual scenes. State-of-the-art methods generate captions using either scene-level or object-level information but without explicitly modeling object interactions. Thus, they often fail to make visually grounded predictions, and are sensitive to spurious correlations. In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid unstable performance caused by the variable number of objects, we further propose an object-aware knowledge distillation mechanism, in which local object information is used to regularize global scene features. We demonstrate the efficacy of our approach through extensive experiments on two benchmarks, showing our approach yields competitive performance with interpretable predictions.