亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Our previous interview study explores the needs and uses of diagrammatic information by the Blind and Low Vision (BLV) community, resulting in a framework called the Ladder of Diagram Access. The framework outlines five levels of information access when interacting with a diagram. In this paper, we connect this framework to include the global activity of sensemaking and discuss its (in)accessibility to the BLV demographic. We also discuss the integration of this framework into the sensemaking process and explore the current sensemaking practices and strategies employed by the BLV community, the challenges they face at different levels of the ladder, and potential solutions to enhance inclusivity towards a data-driven workforce.

相關內容

《計算機信息》雜志發表高質量的論文,擴大了運籌學和計算的范圍,尋求有關理論、方法、實驗、系統和應用方面的原創研究論文、新穎的調查和教程論文,以及描述新的和有用的軟件工具的論文。官網鏈接: · MoDELS · CASE · 情景 · Learning ·
2024 年 5 月 20 日

To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of methods for test case generation techniques are being developed. When confronted with testing DNN models, the user can apply any existing test generation technique. However, it needs to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer from existing DNN models. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover similar properties on the transferred test sets as he would have obtained by generating the test set from scratch with a test cases generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC). Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.

Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reason about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing via differentiability for continuous parts, such as physical parameters. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis.

It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS) synthesis. Prior studies have primarily focused on learning a global prosodic representation at the utterance level, which strongly correlates with linguistic prosody. Our goal is to construct a hierarchical emotion distribution (ED) that effectively encapsulates intensity variations of emotions at various levels of granularity, encompassing phonemes, words, and utterances. During TTS training, the hierarchical ED is extracted from the ground-truth audio and guides the predictor to establish a connection between emotional and linguistic prosody. At run-time inference, the TTS model generates emotional speech and, at the same time, provides quantitative control of emotion over the speech constituents. Both objective and subjective evaluations validate the effectiveness of the proposed framework in terms of emotion prediction and control.

The widely used 'Counterfactual' definition of Causal Effects was derived for unbiasedness and accuracy - and not generalizability. We propose a Combinatorial definition for the External Validity (EV) of intervention effects. We first define the concept of an effect observation 'background'. We then formulate conditions for effect generalization based on their sets of (observed and unobserved) backgrounds. This reveals two limits for effect generalization: (1) when effects are observed under all their enumerable backgrounds, or, (2) when backgrounds have become sufficiently randomized. We use the resulting combinatorial framework to re-examine several issues in the original counterfactual formulation: out-of-sample validity, concurrent estimation of multiple effects, bias-variance tradeoffs, statistical power, and connections to current predictive and explaining techniques. Methodologically, the definitions also allow us to replace the parametric estimation problems that followed the counterfactual definition by combinatorial enumeration and randomization problems in non-experimental samples. We use this non-parametric framework to demonstrate (External Validity, Unconfoundness and Precision) tradeoffs in the performance of popular supervised, explaining, and causal-effect estimators. We also illustrate how the approach allows for the use of supervised and explaining methods in non-i.i.d. samples. The COVID19 pandemic highlighted the need for learning solutions to provide predictions in severally incomplete samples. We demonstrate applications in this pressing problem.

The advent of Large Language Models (LLMs) has ushered in a new era for design science in Information Systems, demanding a paradigm shift in tailoring LLMs design for business contexts. We propose and test a novel framework to customize LLMs for general business contexts that aims to achieve three fundamental objectives simultaneously: (1) aligning conversational patterns, (2) integrating in-depth domain knowledge, and (3) embodying theory-driven soft skills and core principles. We design methodologies that combine domain-specific theory with Supervised Fine Tuning (SFT) to achieve these objectives simultaneously. We instantiate our proposed framework in the context of medical consultation. Specifically, we carefully construct a large volume of real doctors' consultation records and medical knowledge from multiple professional databases. Additionally, drawing on medical theory, we identify three soft skills and core principles of human doctors: professionalism, explainability, and emotional support, and design approaches to integrate these traits into LLMs. We demonstrate the feasibility of our framework using online experiments with thousands of real patients as well as evaluation by domain experts and consumers. Experimental results show that the customized LLM model substantially outperforms untuned base model in medical expertise as well as consumer satisfaction and trustworthiness, and it substantially reduces the gap between untuned LLMs and human doctors, elevating LLMs to the level of human experts. Additionally, we delve into the characteristics of textual consultation records and adopt interpretable machine learning techniques to identify what drives the performance gain. Finally, we showcase the practical value of our model through a decision support system designed to assist human doctors in a lab experiment.

Amidst the recent strides in evaluating Large Language Models for Code (Code-LLMs), existing benchmarks have mainly focused on functional correctness, overlooking the importance of computational efficiency. To fill the gap, we present Mercury, the first computational efficiency benchmark for Code-LLMs. It comprises 1,889 Python tasks, each with adequate solutions to support a runtime distribution. Based on the distribution, we introduce a new metric Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and computational efficiency simultaneously. On Mercury, leading Code-LLMs can achieve 67% on Pass, while less than 50% on Beyond. Given that an ideal Beyond score would be aligned with the Pass score, it indicates that while Code-LLMs exhibit impressive capabilities in generating functionally correct code, there remains a notable gap in their efficiency. Finally, our empirical experiments reveal that Direct Preference Optimization (DPO) serves as a robust baseline for enhancing computational efficiency compared with Supervised Fine Tuning (SFT), which paves a promising avenue for future exploration of efficient code generation.

We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.

We propose UniViLM: a Unified Video and Language pre-training Model for multimodal understanding and generation. Motivated by the recent success of BERT based pre-training technique for NLP and image-language tasks, VideoBERT and CBT are proposed to exploit BERT model for video and language pre-training using narrated instructional videos. Different from their works which only pre-train understanding task, we propose a unified video-language pre-training model for both understanding and generation tasks. Our model comprises of 4 components including two single-modal encoders, a cross encoder and a decoder with the Transformer backbone. We first pre-train our model to learn the universal representation for both video and language on a large instructional video dataset. Then we fine-tune the model on two multimodal tasks including understanding task (text-based video retrieval) and generation task (multimodal video captioning). Our extensive experiments show that our method can improve the performance of both understanding and generation tasks and achieves the state-of-the art results.

北京阿比特科技有限公司