
John M. Scanlon,Eric R. Teoh,David G. Kidd,Kristofer D. Kusano,Jonas Bärgman,Geoffrey Chi-Johnston,Luigi Di Lillo,Francesca Favaro,Carol Flannagan,Henrik Liers,Bonnie Lin,Magdalena Lindman,Shane McLaughlin,Miguel Perez,Trent Victor

The public, regulators, and domain experts alike seek to understand the effect of deployed SAE level 4 automated driving system (ADS) technologies on safety. The recent expansion of ADS technology deployments is paving the way for early stage safety impact evaluations, whereby the observational data from both an ADS and a representative benchmark fleet are compared to quantify safety performance. In January 2024, a working group of experts across academia, insurance, and industry came together in Washington, DC to discuss the current and future challenges in performing such evaluations. A subset of this working group then met, virtually, on multiple occasions to produce this paper. This paper presents the RAVE (Retrospective Automated Vehicle Evaluation) checklist, a set of fifteen recommendations for performing and evaluating retrospective ADS performance comparisons. The recommendations are centered around the concepts of (1) quality and validity, (2) transparency, and (3) interpretation. Over time, it is anticipated there will be a large and varied body of work evaluating the observed performance of these ADS fleets. Establishing and promoting good scientific practices benefits the work of stakeholders, many of whom may not be subject matter experts. This working group's intentions are to: i) strengthen individual research studies and ii) make the at-large community more informed on how to evaluate this collective body of work.

Related content

Automator

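Automator is software developed by Apple for its Mac OS X system. By clicking and dragging with the mouse, you can combine a series of actions into a workflow that automatically (and repeatably) completes complex tasks. Automator also works across many different kinds of programs, including Finder, the Safari web browser, iCal, Address Book, and others. It can also work with some third-party programs, such as Microsoft Office, Adobe Photoshop, or Pixelmator.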

Code · Design · Scaling · Triangulation · Identifiable

October 1, 2024

Code Interviews: Design and Evaluation of a More Authentic Assessment for Introductory Programming Assignments

Suhas Kannam,Yuri Yang,Aarya Dharm,Kevin Lin

from arxiv, Experience Reports and Tools paper in the Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1 (SIGCSE 2025); 7 pages

Generative artificial intelligence poses new challenges around assessment and academic integrity, increasingly driving introductory programming educators to employ invigilated exams often conducted in-person on pencil-and-paper. But the structure of exams often fails to accommodate authentic programming experiences that involve planning, implementing, and debugging programs with computer interaction. In this experience report, we describe code interviews: a more authentic assessment method for take-home programming assignments. Through action research, we experimented with varying the number and type of questions as well as whether interviews were conducted individually or with groups of students. To scale the program, we converted most of our weekly teaching assistant (TA) sections to conduct code interviews on 5 major weekly take-home programming assignments. By triangulating data from 5 sources, we identified 4 themes. Code interviews (1) pushed students to discuss their work, motivating more nuanced but sometimes repetitive insights; (2) enabled peer learning, reducing stress in some ways but increasing stress in other ways; (3) scaled with TA-led sections, replacing familiar practice with an unfamiliar assessment; (4) focused on student contributions, limiting opportunities for TAs to give guidance and feedback. We conclude by discussing the different decisions about the design of code interviews with implications for student experience, academic integrity, and teaching workload.

Neural Networks · Networking · Learning · Recurrent neural networks · Backpropagation

October 1, 2024

Gradient-Free Training of Recurrent Neural Networks using Random Perturbations

Jesus Garcia Fernandez,Sander Keemink,Marcel van Gerven

Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle to propagate gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. We subsequently conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability compared to BPTT, strongly outperforming standard node and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs which can be ideally suited for neuromorphic computing applications.
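
The abstract above describes perturbation-based learning as needing only forward passes and a global scalar feedback signal. A minimal sketch of that general idea follows, using plain weight perturbation on a toy linear model. It is not the paper's ANP method and not an RNN; the function names, the toy data, and the values of `sigma`, `lr`, and `steps` are all illustrative assumptions.

```python
import random


def loss(weights, data):
    """Mean squared error of the toy linear model y = w0 * x + w1."""
    return sum((weights[0] * x + weights[1] - y) ** 2 for x, y in data) / len(data)


def weight_perturbation_step(weights, data, rng, sigma=0.01, lr=0.02):
    """One update that uses two forward passes and no backpropagation."""
    # Draw a random perturbation for every weight (assumed Gaussian noise).
    noise = [rng.gauss(0.0, sigma) for _ in weights]
    base = loss(weights, data)
    perturbed = loss([w + n for w, n in zip(weights, noise)], data)
    # The scalar change in loss acts as the global feedback signal.
    delta = perturbed - base
    # Step against perturbation directions that increased the loss;
    # dividing by sigma**2 makes the expected step a gradient step.
    return [w - lr * delta * n / sigma ** 2 for w, n in zip(weights, noise)]


def train(steps=5000, seed=0):
    """Fit the toy model to points on the line y = 2x - 1."""
    rng = random.Random(seed)
    data = [(k / 10.0, 2.0 * (k / 10.0) - 1.0) for k in range(-10, 11)]
    weights = [0.0, 0.0]
    for _ in range(steps):
        weights = weight_perturbation_step(weights, data, rng)
    return weights, loss(weights, data)


if __name__ == "__main__":
    final_weights, final_loss = train()
    print(final_weights, final_loss)
```

Each step is a noisy estimate of a gradient step, which illustrates why the abstract calls the updates simple but inefficient: the variance of the estimate grows with the number of perturbed parameters.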

Multimodal · Prompt · INTERACT · MoDELS · Language modeling

September 30, 2024

POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

Jianben He,Xingbo Wang,Shiyi Liu,Guande Wu,Claudio Silva,Huamin Qu

from arxiv, 11 pages, 6 figures

Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modalities within multimodal inputs. This oversight hinders the development of effective prompts that guide model multimodal reasoning processes by fully exploiting the rich context provided by multiple modalities. In this paper, we present POEM, a visual analytics system to facilitate efficient prompt engineering for enhancing the multimodal reasoning performance of LLMs. The system enables users to explore the interaction patterns across modalities at varying levels of detail for a comprehensive understanding of the multimodal knowledge elicited by various prompts. Through diverse recommendations of demonstration examples and instructional principles, POEM supports users in iteratively crafting and refining prompts to better align and enhance model knowledge with human insights. The effectiveness and efficiency of our system are validated through two case studies and interviews with experts.

Networking · Analysis · Loss function (machine learning) · Directed · Linear

September 30, 2024

Beyond Derivative Pathology of PINNs: Variable Splitting Strategy with Convergence Analysis

Yesom Park,Changhoon Song,Myungjoo Kang

Physics-informed neural networks (PINNs) have recently emerged as effective methods for solving partial differential equations (PDEs) in various problems. Substantial research focuses on the failure modes of PINNs due to their frequent inaccuracies in predictions. However, most are based on the premise that minimizing the loss function to zero causes the network to converge to a solution of the governing PDE. In this study, we prove that PINNs encounter a fundamental issue that the premise is invalid. We also reveal that this issue stems from the inability to regulate the behavior of the derivatives of the predicted solution. Inspired by the derivative pathology of PINNs, we propose a variable splitting strategy that addresses this issue by parameterizing the gradient of the solution as an auxiliary variable. We demonstrate that using the auxiliary variable eludes derivative pathology by enabling direct monitoring and regulation of the gradient of the predicted solution. Moreover, we prove that the proposed method guarantees convergence to a generalized solution for second-order linear PDEs, indicating its applicability to various problems.
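
To make the idea of an auxiliary gradient variable concrete, here is a hedged illustration on a Poisson problem, $-\Delta u = f$ in $\Omega$ with $u = g$ on $\partial\Omega$. The choice of PDE, the networks $u_\theta$ and $V_\phi$, and the weights $\lambda$ and $\alpha$ are assumptions made for this sketch, not the paper's exact formulation.

```latex
% Standard PINN loss: second derivatives of u_theta appear directly.
\mathcal{L}_{\mathrm{PINN}}(\theta)
  = \left\| \Delta u_\theta + f \right\|_{L^2(\Omega)}^2
  + \lambda \left\| u_\theta - g \right\|_{L^2(\partial\Omega)}^2

% Variable splitting (sketch): a second network V_phi stands in for
% the gradient of u_theta, and a coupling term ties the two together.
\mathcal{L}_{\mathrm{VS}}(\theta, \phi)
  = \left\| \nabla \cdot V_\phi + f \right\|_{L^2(\Omega)}^2
  + \alpha \left\| V_\phi - \nabla u_\theta \right\|_{L^2(\Omega)}^2
  + \lambda \left\| u_\theta - g \right\|_{L^2(\partial\Omega)}^2
```

In this sketch the coupling term $\| V_\phi - \nabla u_\theta \|^2$ is what lets the gradient of the predicted solution be monitored and penalized directly, and only first derivatives of each network are needed.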

Performer · Language modeling · INTERACT · MoDELS · Agent

September 30, 2024

Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models

David Castillo-Bolado,Joseph Davidson,Finlay Gray,Marek Rosa

from arxiv, Accepted as a poster at NeurIPS D&B Track 2024

We introduce a dynamic benchmarking system for conversational agents that evaluates their performance through a single, simulated, and lengthy user↔agent interaction. The interaction is a conversation between the user and agent, where multiple tasks are introduced and then undertaken concurrently. We context switch regularly to interleave the tasks, which constructs a realistic testing scenario in which we assess the Long-Term Memory, Continual Learning, and Information Integration capabilities of the agents. Results from both proprietary and open-source Large-Language Models show that LLMs in general perform well on single-task interactions, but they struggle on the same tasks when they are interleaved. Notably, short-context LLMs supplemented with an LTM system perform as well as or better than those with larger contexts. Our benchmark suggests that there are other challenges for LLMs responding to more natural interactions that contemporary benchmarks have heretofore not been able to capture.

Large language models · Inference · Channel · cache · Prompt

September 30, 2024

The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems

Linke Song,Zixuan Pang,Wenhao Wang,Zihao Wang,XiaoFeng Wang,Hongbo Chen,Wei Song,Yier Jin,Dan Meng,Rui Hou

The wide deployment of Large Language Models (LLMs) has given rise to strong demands for optimizing their inference performance. Today's techniques serving this purpose primarily focus on reducing latency and improving throughput through algorithmic and hardware enhancements, while largely overlooking their privacy side effects, particularly in a multi-user environment. In our research, for the first time, we discovered a set of new timing side channels in LLM systems, arising from shared caches and GPU memory allocations, which can be exploited to infer both confidential system prompts and those issued by other users. These vulnerabilities echo security challenges observed in traditional computing systems, highlighting an urgent need to address potential information leakage in LLM serving infrastructures. In this paper, we report novel attack strategies designed to exploit such timing side channels inherent in LLM deployments, specifically targeting the Key-Value (KV) cache and semantic cache widely used to enhance LLM inference performance. Our approach leverages timing measurements and classification models to detect cache hits, allowing an adversary to infer private prompts with high accuracy. We also propose a token-by-token search algorithm to efficiently recover shared prompt prefixes in the caches, showing the feasibility of stealing system prompts and those produced by peer users. Our experimental studies on black-box testing of popular online LLM services demonstrate that such privacy risks are completely realistic, with significant consequences. Our findings underscore the need for robust mitigation to protect LLM systems against such emerging threats.

Class · binary · Binary classification · Performer · Threshold

September 29, 2024

Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification

Mohamed Abdelhamid,Abhyuday Desai

from arxiv, 13 pages including appendix, 4 tables

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely used strategies for handling class imbalance: Synthetic Minority Over-sampling Technique (SMOTE), Class Weights tuning, and Decision Threshold Calibration. We compare these methods against a baseline scenario of no-intervention across 15 diverse machine learning models and 30 datasets from various domains, conducting a total of 9,000 experiments. Performance was primarily assessed using the F1-score, although our study also tracked results on 9 additional metrics including F2-score, precision, recall, Brier-score, PR-AUC, and AUC. Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently effective technique. However, we observed substantial variability in the best-performing method across datasets, highlighting the importance of testing multiple approaches for specific problems. This study provides valuable insights for practitioners dealing with imbalanced datasets and emphasizes the need for dataset-specific analysis in evaluating class imbalance handling techniques.
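
Decision threshold calibration, as named in the abstract above, can be sketched as a search over cut-offs applied to a classifier's predicted scores, keeping the cut-off that maximizes F1 on held-out data. The helper names, the candidate grid, and the toy labels and scores below are illustrative assumptions, not the study's actual protocol.

```python
def f1_score(labels, predictions):
    """F1 for the positive class (label 1); returns 0.0 with no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(labels, scores, candidates=None):
    """Return the (threshold, F1) pair with the highest F1 over a candidate grid."""
    if candidates is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99.
        candidates = [i / 100 for i in range(1, 100)]
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in candidates:
        predictions = [1 if s >= threshold else 0 for s in scores]
        score = f1_score(labels, predictions)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Toy imbalanced validation set: 7 negatives, 3 positives (made-up scores).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.45, 0.35, 0.40, 0.70]

if __name__ == "__main__":
    default_preds = [1 if s >= 0.5 else 0 for s in toy_scores]
    print("F1 at 0.5:", f1_score(toy_labels, default_preds))
    print("calibrated:", calibrate_threshold(toy_labels, toy_scores))
```

On this toy set the default cut-off of 0.5 misses two of the three positives, while a lower cut-off recovers them at the cost of one false positive. In practice the threshold should be chosen on a validation split and then evaluated on separate test data.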

Learning · Comprehensibility · Prompt · Large language models · Language modeling

September 27, 2024

CausalBench: A Comprehensive Benchmark for Causal Learning Capability of LLMs

Yu Zhou,Xingyu Wu,Beicheng Huang,Jibin Wu,Liang Feng,Kay Chen Tan

The ability to understand causality significantly impacts the competence of large language models (LLMs) in output explanation and counterfactual reasoning, as causality reveals the underlying data distribution. However, the lack of a comprehensive benchmark currently limits the evaluation of LLMs' causal learning capabilities. To fill this gap, this paper develops CausalBench based on data from the causal research community, enabling comparative evaluations of LLMs against traditional causal learning algorithms. To provide a comprehensive investigation, we offer three tasks of varying difficulties, including correlation, causal skeleton, and causality identification. Evaluations of 19 leading LLMs reveal that, while closed-source LLMs show potential for simple causal relationships, they significantly lag behind traditional algorithms on larger-scale networks (>50 nodes). Specifically, LLMs struggle with collider structures but excel at chain structures, especially at long-chain causality analogous to Chains-of-Thought techniques. This supports the current prompt approaches while suggesting directions to enhance LLMs' causal reasoning capability. Furthermore, CausalBench incorporates background knowledge and training data into prompts to thoroughly unlock LLMs' text-comprehension ability during evaluation. The findings indicate that LLMs understand causality through semantic associations with distinct entities, rather than directly from contextual information or numerical distributions.

MoDELS · Language modeling · Performer · ACID · Large language models

September 27, 2024

SciDFM: A Large Language Model with Mixture-of-Experts for Science

Liangtai Sun,Danyu Luo,Da Ma,Zihan Zhao,Baocai Chen,Zhennan Shen,Su Zhu,Lu Chen,Xin Chen,Kai Yu

from arxiv, 12 pages, 1 figure, 9 tables. Technical Report, Under Review

Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs only focus on general science, while they lack domain-specific knowledge, such as chemical molecules and amino acid sequences. To bridge these gaps, we introduce SciDFM, a mixture-of-experts LLM, which is trained from scratch and is able to conduct college-level scientific reasoning and understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines as well as data from domain-specific databases. We further fine-tune the pre-trained model on a large amount of instruction data to improve performance on downstream benchmarks. Experimental results show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and it reaches SOTA performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the results of expert selection vary with data from different disciplines. To benefit the broader research community, we open-source SciDFM at //huggingface.co/OpenDFM/SciDFM-MoE-A5.6B-v1.0.

Monte Carlo · Better · Analysis · Scenario · Same

September 26, 2024

Multilevel Metamodels: A Novel Approach to Enhance Efficiency and Generalizability in Monte Carlo Simulation Studies

Joshua Gilbert,Luke Miratrix

Metamodels, or the regression analysis of Monte Carlo simulation results, provide a powerful tool to summarize simulation findings. However, an underutilized approach is the multilevel metamodel (MLMM) that accounts for the dependent data structure that arises from fitting multiple models to the same simulated data set. In this study, we articulate the theoretical rationale for the MLMM and illustrate how it can improve the interpretability of simulation results, better account for complex simulation designs, and provide new insights into the generalizability of simulation findings.
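
One common way to write a multilevel metamodel is as a regression with a random intercept for each simulated data set. The form below is a hedged sketch: the symbols, the single design factor $x_{ij}$, and the normality assumptions are illustrative choices, not the authors' exact specification.

```latex
% y_{ij}: performance measure (e.g., estimation error) of method i
%         fitted to simulated data set j
% x_{ij}: design factor or method indicator
% u_j   : random intercept shared by all methods fitted to data set j
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \tau^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

A single-level metamodel would drop $u_j$ and treat all results as independent. Keeping $u_j$ accounts for the dependence that arises when several models are fitted to the same simulated data set, which is the structure the abstract describes.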