
We compare a PHOIBLE-based phone mapping method against phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings' applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality, and the latter performs better, but these effects also depend on the specific language combination. We also compare the recently-proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
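The abstract does not spell out how ASPF is computed; as a hedged illustration of the underlying idea, the sketch below scores the similarity of two phone-frequency vectors by converting their cosine similarity to an angle and rescaling it to [0, 1]. The toy vectors and the exact rescaling are assumptions made here for illustration, not necessarily the paper's formulation.

```python
import math

def angular_similarity(p, q):
    """Angular similarity of two phone-frequency vectors, in [0, 1]."""
    # Cosine similarity between the two frequency vectors.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    cos = max(-1.0, min(1.0, dot / (norm_p * norm_q)))
    # Map the angle between the vectors (0..pi) to a similarity in [0, 1].
    return 1.0 - 2.0 * math.acos(cos) / math.pi

# Toy relative frequencies over a shared phone inventory (invented numbers).
source_freqs = [0.30, 0.20, 0.50, 0.00]
target_freqs = [0.25, 0.25, 0.45, 0.05]
similarity = angular_similarity(source_freqs, target_freqs)
```

Under this definition, a source language whose phone-frequency vector points in nearly the same direction as the target's scores close to 1, which matches ASPF's use as a source-selection criterion.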

Related content

Transfer learning is a machine learning method in which knowledge from one domain (the source domain) is transferred to another domain (the target domain), so that learning in the target domain improves. Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars can be applied when trying to recognize trucks. Although the formal ties between the two fields are limited, research in this area is related to the long history of psychological literature on transfer of learning. From a practical standpoint, reusing or transferring information from previously learned tasks when learning new tasks can significantly improve the sample efficiency of reinforcement learning agents.
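The idea above can be sketched with a deliberately tiny numeric example (the function names and toy data are invented for illustration): a slope learned on a data-rich source task is reused on a data-poor target task, where only the intercept is re-fit.

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y = a*x + b (closed form, 1-D).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def transfer_fit(source_xy, target_xy):
    # "Transfer": keep the slope learned on the plentiful source data
    # and re-fit only the intercept on the scarce target data.
    a, _ = fit_linear(*source_xy)
    xs, ys = target_xy
    b = sum(y - a * x for x, y in zip(xs, ys)) / len(xs)
    return a, b
```

With only two target points, the transferred model already recovers the target relationship, illustrating the sample-efficiency benefit described above.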


Current gaze input methods for VR headsets predominantly utilize the gaze ray as a pointing cursor, often neglecting the depth information it carries. This study introduces FocusFlow, a novel gaze interaction technique that integrates focal depth into gaze input dimensions, enabling users to actively shift their focus along the depth dimension for interaction. A detection algorithm to identify the user's focal depth is developed. Based on this, a layer-based UI is proposed, which uses focal depth changes to enable layer switch operations, offering an intuitive hands-free selection method. We also designed visual cues to guide users to adjust focal depth accurately and get familiar with the interaction process. Preliminary evaluations demonstrate the system's usability, and several potential applications are discussed. Through FocusFlow, we aim to enrich the input dimensions of gaze interaction, achieving more intuitive and efficient human-computer interactions on headset devices.
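The abstract does not describe the focal-depth detection algorithm itself. As a hedged sketch of one common approach, the snippet below estimates focal depth from binocular vergence (the angle between the two eyes' gaze rays) and maps the estimate to discrete UI layers. The pinhole geometry, the inter-pupillary distance value, and the layer boundaries are all assumptions for illustration, not the paper's method.

```python
import math

def focal_depth_from_vergence(ipd_m, vergence_deg):
    # Simplified model: the two gaze rays converge at the fixation point,
    # so depth follows from the inter-pupillary distance (IPD) and the
    # vergence angle between the rays.
    theta = math.radians(vergence_deg)
    return ipd_m / (2.0 * math.tan(theta / 2.0))

def layer_for_depth(depth_m, boundaries_m=(0.5, 1.5)):
    # Map the estimated depth to one of three hypothetical UI layers:
    # near (< 0.5 m), middle (0.5-1.5 m), far (>= 1.5 m).
    for layer, bound in enumerate(boundaries_m):
        if depth_m < bound:
            return layer
    return len(boundaries_m)
```

A layer-switch event would then be triggered whenever `layer_for_depth` changes between consecutive depth estimates, which is the kind of hands-free selection the abstract describes.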

Recently, transformers have been trending as replacements for CNNs in vision tasks, including compression. This trend compels us to question the inherent limitations of CNNs compared to transformers and to explore whether CNNs can be enhanced to achieve the same or even better performance than transformers. We want to design a pure CNN-based model for compression, as most devices are well optimized for CNNs. In our analysis, we find that the key strengths of transformers lie in their dynamic weights and large receptive fields. To endow CNNs with such properties, we propose a novel transform module with large receptive field learning and self-conditioned adaptability for learned image compression, named SLIC. Specifically, we enlarge the receptive field of depth-wise convolution with suitable complexity and generate the weights according to given conditions. In addition, we also investigate the self-conditioned factor for channels. To prove the effectiveness of our proposed transform module, we equip it with the existing entropy models ChARM, SCCTX, and SWAtten, obtaining the models SLIC-ChARM, SLIC-SCCTX, and SLIC-SWAtten. Extensive experiments demonstrate that SLIC-ChARM, SLIC-SCCTX, and SLIC-SWAtten yield significant improvements over their corresponding baselines and achieve SOTA performance with suitable complexity on 5 test datasets (Kodak, Tecnick, CLIC 20, CLIC 21, JPEGAI). Code will be available at //github.com/JiangWeibeta/SLIC.

Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn significant attention in both the computer vision and natural language processing communities. Existing VLN tasks are built for agents that navigate on the ground, either indoors or outdoors. However, many tasks require intelligent agents to operate in the sky, such as UAV-based goods delivery, traffic/security patrol, and scenery tours, to name a few. Navigating in the sky is more complicated than on the ground because agents need to consider flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and oriented towards outdoor environments. We develop a 3D simulator rendered with near-realistic pictures of 25 city-level scenarios. Our simulator supports continuous navigation, environment extension and configuration. We also propose an extended baseline model based on the widely-used cross-modal-alignment (CMA) navigation methods. We find that there is still a significant gap between the baseline model and human performance, which suggests AerialVLN is a new challenging task. Dataset and code are available at //github.com/AirVLN/AirVLN.

Understanding the social context of eating is crucial for promoting healthy eating behaviors. Multimodal smartphone sensor data could provide valuable insights into eating behavior, particularly in mobile food diaries and mobile health apps. However, research on the social context of eating with smartphone sensor data is limited, despite extensive studies in nutrition and behavioral science. Moreover, the impact of country differences on the social context of eating, as measured by multimodal phone sensor data and self-reports, remains under-explored. To address this research gap, our study focuses on a dataset of approximately 24K self-reports on eating events provided by 678 college students in eight countries to investigate the country diversity that emerges from smartphone sensors during eating events for different social contexts (alone or with others). Our analysis revealed that while some smartphone usage features during eating events were similar across countries, others exhibited unique trends in each country. We further studied how user and country-specific factors impact social context inference by developing machine learning models with population-level (non-personalized) and hybrid (partially personalized) experimental setups. We showed that models based on the hybrid approach achieve AUC scores up to 0.75 with XGBoost models. These findings emphasize the importance of considering country differences in building and deploying machine learning models to minimize biases and improve generalization across different populations.
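The difference between the population-level and hybrid experimental setups described above comes down to how each held-out user's data is split. The sketch below makes that split concrete; the function name, tuple layout, and the 30% personalization budget are hypothetical choices for illustration, not the study's exact protocol.

```python
import random

def split_events(events, test_user, hybrid_frac=0.3, seed=0):
    """Build train/test sets for one held-out user.

    events: list of (user_id, features, label) tuples.
    Population-level setup: no data from test_user appears in training.
    Hybrid setup: a fraction of test_user's events (here 30%, an
    assumed budget) is moved into training for partial personalization.
    """
    rng = random.Random(seed)
    own = [e for e in events if e[0] == test_user]
    others = [e for e in events if e[0] != test_user]
    rng.shuffle(own)
    k = int(len(own) * hybrid_frac)
    population = (others, own)            # (train, test): fully unseen user
    hybrid = (others + own[:k], own[k:])  # (train, test): partially seen user
    return population, hybrid
```

A classifier such as XGBoost would then be trained on each training split in turn; the abstract's finding is that the hybrid split yields the higher AUC (up to 0.75).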

The Internet of Things (IoT) devices are rapidly increasing in popularity, with more individuals using Internet-connected devices that continuously monitor their activities. This work explores privacy concerns and expectations of end-users related to Trigger-Action platforms (TAPs) in the context of the Internet of Things (IoT). TAPs allow users to customize their smart environments by creating rules that trigger actions based on specific events or conditions. As personal data flows between different entities, there is a potential for privacy concerns. In this study, we aimed to identify the privacy factors that impact users' concerns and preferences for using IoT TAPs. To address this research objective, we conducted three focus groups with 15 participants and we extracted nine themes related to privacy factors using thematic analysis. Our participants particularly prefer to have control and transparency over the automation and are concerned about unexpected data inferences, risks and unforeseen consequences for themselves and for bystanders that are caused by the automation. The identified privacy factors can help researchers derive predefined and selectable profiles of privacy permission settings for IoT TAPs that represent the privacy preferences of different types of users as a basis for designing usable privacy controls for IoT TAPs.

An increasing number of researchers are finding use for nth-order gradient computations for a wide variety of applications, including graphics, meta-learning (MAML), scientific computing, and most recently, implicit neural representations (INRs). Recent work shows that the gradient of an INR can be used to edit the data it represents directly without needing to convert it back to a discrete representation. However, given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient due to the higher demand for computing power and higher complexity in data movement. This makes it a promising target for FPGA acceleration. In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We address this problem in two phases. First, we design a dataflow architecture that uses FIFO streams and an optimized computation kernel library, ensuring high memory efficiency and parallel computation. Second, we propose a compiler that extracts and optimizes computation graphs, automatically configures hardware parameters such as latency and stream depths to optimize throughput, while ensuring deadlock-free operation, and outputs High-Level Synthesis (HLS) code for FPGA implementation. We utilize INR editing as our benchmark, presenting results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively. Furthermore, we obtain 3.1-8.9x and 1.7-4.3x lower memory usage, and 1.7-11.3x and 5.5-32.8x lower energy-delay product. Our framework will be made open-source and available on GitHub.
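To make the notion of nth-order gradients concrete, below is a minimal forward-mode automatic differentiation sketch using nested dual numbers. This is a generic textbook technique, not the INR-Arch dataflow implementation; it only illustrates why computing higher-order derivatives multiplies the work per node of the computation graph.

```python
class Dual:
    # Minimal forward-mode dual number: a value plus a derivative part.
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule carried by the dual part.
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def nth_derivative(f, x, n):
    # Nest dual numbers n levels deep, evaluate f once, then peel off
    # the dual part n times. Each extra order roughly doubles the
    # arithmetic per graph node -- the cost growth motivating the paper.
    v = x
    for _ in range(n):
        v = Dual(v, 1.0)
    out = f(v)
    for _ in range(n):
        out = out.eps
    return out
```

For example, `nth_derivative(lambda x: x * x * x, 2.0, 2)` evaluates the second derivative of x^3 at x = 2, i.e. 6x = 12.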

Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, the target designs are all relatively simple and small in scale, and are proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works focus only on design correctness, without evaluating the quality of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals, named the syntax goal, functionality goal, and design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 on our proposed benchmark.
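The abstract does not give the actual prompt wording for self-planning. As a hedged sketch, two-stage prompting of this kind generally looks like the following: first ask the model for an implementation plan, then feed that plan back when requesting the RTL. All wording below is hypothetical, not RTLLM's exact prompts.

```python
def self_planning_prompts(spec: str):
    """Build the two prompts of a hypothetical self-planning pipeline.

    Stage 1: ask the LLM to plan the implementation from the spec.
    Stage 2: given the LLM's plan, ask it to write the Verilog.
    """
    plan_prompt = (
        "You are a hardware designer. Read the following design "
        "specification and list the implementation steps:\n" + spec
    )

    def rtl_prompt(plan: str) -> str:
        # The model's own plan is echoed back to condition generation.
        return (
            "Specification:\n" + spec + "\n\n"
            "Your plan:\n" + plan + "\n\n"
            "Following your plan, write the complete Verilog module."
        )

    return plan_prompt, rtl_prompt
```

The generated Verilog would then be checked against the benchmark's three goals in order: does it parse (syntax), does it simulate correctly (functionality), and how good is the synthesized result (design quality).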

The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort required for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously yield many valuable insights about FGL for the community. Moreover, we employ FS-G to serve FGL applications in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at //github.com/alibaba/FederatedScope to promote FGL research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start by introducing popular model compression methods, including pruning, factorization, quantization as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.
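Two of the compression methods mentioned above, magnitude pruning and uniform quantization, can be sketched in a few lines. These are generic toy versions operating on a flat list of weights, not any specific system's implementation; real frameworks apply them per-layer or per-channel with calibration.

```python
def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude fraction of weights; the surviving
    # weights are kept unchanged (no retraining in this toy version).
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_uniform(weights, bits=8):
    # Symmetric uniform quantization to signed integers, immediately
    # dequantized so the rounding error is visible in float form.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # guard all-zero input
    return [round(w / scale) * scale for w in weights]
```

Automated pruning and quantization, as surveyed in the paper, essentially search over the `sparsity` and `bits` knobs per layer instead of fixing them by hand.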

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey will provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.
