Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD generalization capabilities. This paper shows with multiple datasets that inverse correlations between ID and OOD performance do happen in real-world data - not only in theoretical worst-case settings. We also explain theoretically how these cases can arise even in a minimal linear setting, and why past studies could miss such cases due to a biased selection of models. Our observations lead to recommendations that contradict those found in much of the current literature.

- High OOD performance sometimes requires trading off ID performance.
- Focusing on ID performance alone may not lead to optimal OOD performance. It may produce diminishing (eventually negative) returns in OOD performance.
- In these cases, studies on OOD generalization that use ID performance for model selection (a common recommended practice) will necessarily miss the best-performing models, making these studies blind to a whole range of phenomena.

Related content

Mean · GROUP · Estimation/Estimator · Analysis · Inference

July 6, 2023

A two-sample comparison of mean survival times of uncured sub-populations

Dennis Dobler,Eni Musta

Comparing the survival times of two groups is a common problem in time-to-event analysis, for example if one would like to understand whether one medical treatment is superior to another. In the standard survival analysis setting, there has been a lot of discussion on how to quantify such a difference and what would be an intuitive, easily interpretable summary measure. In the presence of subjects that are immune to the event of interest ("cured"), we illustrate that it is not appropriate to just compare the overall survival functions. Instead, it is more informative to compare the cure fractions and the survival of the uncured sub-populations separately from each other. Our research is mainly driven by the question: if the cure fraction is similar for two available treatments, how else can we determine which is preferable? To this end, we estimate the mean survival times in the uncured fractions of both treatment groups ($MST_u$) and develop permutation tests for inference. In the first of two connected papers, we focus on nonparametric approaches. The methods are illustrated with medical data of leukemia patients. In Part II we adjust the mean survival time of the uncured for potential confounders, which is crucial in observational settings. For each group, we employ the widely used logistic-Cox mixture cure model and estimate the $MST_u$ conditionally on a given covariate value. An asymptotic and a permutation-based approach have been developed for making inference on the difference of conditional $MST_u$'s between two groups. Contrary to available results in the literature, in the simulation study we do not observe a clear advantage of the permutation method over the asymptotic one to justify its increased computational cost. The methods are illustrated through a practical application to breast cancer data.
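
As a rough illustration of the permutation idea mentioned in this abstract, the sketch below runs a generic two-sample permutation test on the difference of sample means. It is not the authors' procedure: it assumes fully observed (uncensored) times and ignores cure fractions, and the function name `permutation_test_mean_diff` and its parameters are hypothetical.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in sample means.

    Simplifying assumption: all values are fully observed, so there is no
    censoring and no cure fraction to estimate.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = sum(perm_a) / n_a - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)
```

The computational cost discussed in the abstract is visible here: every permutation recomputes the test statistic, so an estimator that is expensive to fit makes the loop correspondingly expensive.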

Multimodal · Dataset · Performer · Modality · Comprehensibility

July 6, 2023

Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection

Huixuan Zhang,Xiaojun Wan

from arXiv, 11 pages, 6 figures, 6 tables

Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of them focus on the text modality only. However, with the development of social media, people can create hyperbolic expressions with various modalities, including text, images, and videos. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset (to be released to the community) from Weibo (a Chinese social media platform) and carry out several studies on it. We treat the text and image of a Weibo post as two modalities and explore the role of each in hyperbole detection. Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance. In addition, since the dataset is constructed from five different topics, we also evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out directions for further study on multimodal hyperbole detection.

DNS · Extensibility · Reducible · False positive · INFORMS

July 5, 2023

Information-Based Heavy Hitters for Real-Time DNS Data Exfiltration Detection and Prevention

Yarin Ozery,Asaf Nadler,Asaf Shabtai

Data exfiltration over the DNS protocol and its detection have been researched extensively in recent years. Prior studies focused on offline detection methods, which although capable of detecting attacks, allow a large amount of data to be exfiltrated before the attack is detected and dealt with. In this paper, we introduce Information-based Heavy Hitters (ibHH), a real-time detection method which is based on live estimations of the amount of information transmitted to registered domains. ibHH uses constant-size memory and supports constant-time queries, which makes it suitable for deployment on recursive DNS servers to further reduce detection and response time. In our evaluation, we compared the performance of the proposed method to that of leading state-of-the-art DNS exfiltration detection methods on real-world datasets comprising over 250 billion DNS queries. The evaluation demonstrates ibHH's ability to successfully detect exfiltration rates as slow as 0.7B/s, with a false positive alert rate of less than 0.004, with significantly lower resource consumption compared to other methods.

Performer · Scenario · Constraint · Rejection sampling · MoDELS

July 4, 2023

On the Constrained Time-Series Generation Problem

Andrea Coletta,Sriram Gopalakrishan,Daniel Borrajo,Svitlana Vyetrenko

Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including "GuidedDiffTime", a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our "GuidedDiffTime" model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.
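
The rejection-sampling baseline that this abstract contrasts itself with can be sketched in a few lines. This is only an illustration of the general idea, not the paper's code: the Gaussian random-walk generator stands in for a trained generative model, and the names `random_walk` and `rejection_sample` are hypothetical.

```python
import random


def random_walk(length, rng, step_scale=1.0):
    """Toy generator: a Gaussian random walk standing in for a trained model."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, step_scale)
        series.append(value)
    return series


def rejection_sample(constraint, length=20, max_tries=10_000, seed=0):
    """Draw candidate series until one satisfies `constraint`.

    Returns the accepted series and the number of draws it took.
    """
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        candidate = random_walk(length, rng)
        if constraint(candidate):
            return candidate, tries
    raise RuntimeError("no sample satisfied the constraint within max_tries")
```

For example, `rejection_sample(lambda s: s[-1] > 0)` accepts any series that ends above zero. The number of draws grows as the constraint becomes harder to satisfy, which is the computational cost the abstract points to.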

Automator · Functional · Design · Performance · Evolutionary computation

July 4, 2023

Automated design of relocation rules for minimising energy consumption in the container relocation problem

Marko Đurasević,Mateja Đumić,Rebeka Čorić,Francisco Javier Gil-Gala

The container relocation problem is a combinatorial optimisation problem aimed at finding a sequence of container relocations to retrieve all containers in a predetermined order while minimising a given objective. Relocation rules (RRs), which consist of a priority function and a relocation scheme, are heuristics commonly used for solving this problem due to their flexibility and efficiency. Recently, it has become increasingly important to consider energy consumption in many real-world problems. However, no RRs exist for this variant, and they would need to be designed manually. One possibility to circumvent this issue is to apply hyperheuristics to automatically design new RRs. In this study we use genetic programming to obtain priority functions used in RRs whose goal is to minimise energy consumption. We compare the proposed approach with a genetic algorithm from the literature used to design the priority function. The results obtained demonstrate that the RRs designed by genetic programming achieve the best performance.

Performer · MoDELS · Dataset · Learning · Data-generating process

July 3, 2023

MADS: Modulated Auto-Decoding SIREN for time series imputation

Tom Bamford,Elizabeth Fons,Yousef El-Laham,Svitlana Vyetrenko

from arxiv, 8 pages (inc. refs), 1 figure

Time series imputation remains a significant challenge across many fields due to the potentially significant variability in the type of data being modelled. Whilst traditional imputation methods often impose strong assumptions on the underlying data generation process, limiting their applicability, researchers have recently begun to investigate the potential of deep learning for this task, inspired by the strong performance shown by these models in both classification and regression problems across a range of applications. In this work we propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations. Our method leverages the capabilities of SIRENs for high fidelity reconstruction of signals and irregular data, and combines it with a hypernetwork architecture which allows us to generalise by learning a prior over the space of time series. We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation. On the human activity dataset, it improves imputation performance by at least 40%, while on the air quality dataset it is shown to be competitive across all metrics. When evaluated on synthetic data, our model results in the best average rank across different dataset configurations over all baselines.

Diversity · Dataset · Robot · Learning · CASES

July 2, 2023

RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Hao-Shu Fang,Hongjie Fang,Zhenyu Tang,Jirong Liu,Junbo Wang,Haoyi Zhu,Cewu Lu

from arxiv, RSS 2023 workshop on LTAMP. The project page is at rh20t.github.io

A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improving task and motion planning. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. The dataset is made publicly available at rh20t.github.io.

AI · INFORMS · Identifiable · Readability · Model evaluation

June 30, 2023

AI and Non AI Assessments for Dementia

Mahboobeh Parsapoor,Hamed Ghodrati,Vincenzo Dentamaro,Christopher R. Madan,Ioulietta Lazarou,Spiros Nikolopoulos,Ioannis Kompatsiaris

from arxiv, 49 pages

Current progress in the artificial intelligence domain has led to the development of various types of AI-powered dementia assessments, which can be employed to identify patients at the early stage of dementia. These assessments could revolutionize dementia care settings. It is essential that the medical community be aware of the various AI assessments and choose among them considering their degrees of validity, efficiency, practicality, reliability, and accuracy concerning the early identification of patients with dementia (PwD). On the other hand, AI developers should be informed about various non-AI assessments as well as recently developed AI assessments. Thus, this paper, which is intended to be readable by both clinicians and AI engineers, fills a gap in the literature by explaining the existing solutions for the recognition of dementia to clinicians, as well as the techniques used and the most widespread dementia datasets to AI engineers. It reviews papers on AI and non-AI assessments for dementia to provide valuable information about various dementia assessments for both the AI and medical communities. The discussion and conclusion highlight the most prominent research directions and the maturity of existing solutions.

Optimizer · INTERACT · Networking · Knowledge · Performer

May 11, 2022

Dynamic neighbourhood optimisation for task allocation using multi-agent

Niall Creech,Natalia Criado Pacheco,Simon Miles

from arxiv, 28 pages

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints such as on computation, storage, and network communication. We can increase scalability by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communications and synchronisation, and is difficult to scale. In this paper we present four algorithms to solve these problems. The combination of these algorithms enables each agent to improve its task allocation strategy through reinforcement learning, while changing how much it explores the system in response to how optimal it believes its current strategy is, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, limiting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, to then carry out those tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum within the system configurations considered. It provides 5x better performance recovery over no-knowledge retention approaches when system connectivity is impacted, and is tested against systems of up to 100 agents with less than a 9% impact on the algorithms' performance.

Learned · Generalization theory · AIM · state-of-the-art · Reinforcement learning

October 24, 2019

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Tianhe Yu,Deirdre Quillen,Zhanpeng He,Ryan Julian,Karol Hausman,Chelsea Finn,Sergey Levine

from arXiv, CoRL 2019. Videos are available at meta-world.github.io and open-source code is available at //github.com/rlworkgroup/metaworld

Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn with multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.