
Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation process tedious and costly. In this paper, we present APEx, Automatic Programming of Experiments, the first framework for automatic benchmarking of LMMs. Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand, and progressively compile a scientific report. The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions. Finally, the LLM refines the report, presenting the results to the user in natural language. Thanks to its modularity, our framework is flexible and extensible as new tools become available. Empirically, APEx reproduces the findings of existing studies while allowing for arbitrary analyses and hypothesis testing.
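
To make the report-driven loop concrete, here is a minimal Python sketch of how such a procedure could be organized. The tool registry, the stand-in "LLM" decision functions, and the stopping rule are illustrative assumptions, not APEx's actual implementation.

```python
# Minimal sketch of a report-driven evaluation loop in the spirit of APEx.
# The tool registry, prompts, and stopping rule below are illustrative
# assumptions, not the actual APEx implementation.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Report:
    question: str
    findings: list[str] = field(default_factory=list)

def run_accuracy_probe(model, report: Report) -> str:
    # Hypothetical tool: evaluate the LMM on a small probe set.
    return "accuracy probe: 0.71 on held-out prompts"

TOOLS: dict[str, Callable] = {"accuracy_probe": run_accuracy_probe}

def llm_choose_tool(report: Report) -> str | None:
    # Stand-in for an LLM call that reads the current report and picks the
    # next experiment; returns None once it judges the evidence sufficient.
    return "accuracy_probe" if len(report.findings) < 3 else None

def llm_write_report(report: Report) -> str:
    # Stand-in for an LLM call that turns the findings into natural language.
    return f"Q: {report.question}\n" + "\n".join(report.findings)

def apex_loop(model, question: str) -> str:
    report = Report(question)
    while (tool_name := llm_choose_tool(report)) is not None:
        report.findings.append(TOOLS[tool_name](model, report))
    return llm_write_report(report)

print(apex_loop(model=None, question="Is the LMM robust to image noise?"))
```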

Related content

Among recent developments in definitions and analysis of selection bias is the potential outcomes approach of Kenah (Epidemiology, 2023), which allows non-parametric analysis using single-world intervention graphs, linking selection of study participants to identification of causal effects. Mohan & Pearl (JASA, 2021) provide a framework for missing data via directed acyclic graphs augmented with nodes indicating missingness for each sometimes-missing variable, which allows for analysis of more general missing data problems but cannot easily encode scenarios in which different groups of variables are observed in specific subsamples. We give an alternative formulation of the potential outcomes framework based on conditional separable effects and indicators for selection into subsamples. This is practical for problems intermediate between the single-sample scenarios considered by Kenah and the variable-wise missingness considered by Mohan & Pearl. It simplifies identification conditions and admits generalizations to scenarios with multiple, potentially nested or overlapping study samples, as well as multiple or time-dependent exposures. We give examples of identifiability arguments for case-cohort studies, multiple or time-dependent exposures, and direct effects of selection.
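
As a schematic illustration of the selection-indicator idea (a textbook identification argument, not the paper's exact conditions), let $A$ be the exposure, $Y$ the outcome, $L$ covariates, and $S \in \{0,1\}$ an indicator of selection into a subsample:

```latex
% Schematic identification with a selection indicator (illustrative; not
% the paper's conditional-separable-effects formulation). Assume conditional
% exchangeability of Y(a) and A given L, and ignorable selection given
% (A, L), i.e. Y independent of S given (A, L).
\begin{align*}
  \mathbb{E}[Y(a)]
    &= \sum_{l} \mathbb{E}[Y \mid A = a, L = l]\, \Pr(L = l) \\
    &= \sum_{l} \mathbb{E}[Y \mid A = a, L = l, S = 1]\, \Pr(L = l),
\end{align*}
% so the conditional means are estimable from the selected subsample,
% while the covariate distribution must come from the full cohort.
```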

In particle systems, flocking refers to the phenomenon where particles' individual velocities eventually align. The Cucker-Smale model is a well-known mathematical framework that describes this behavior. Many continuous descriptions of the Cucker-Smale model use PDEs with both particle position and velocity as independent variables, thus providing a full description of the particles' mean-field-limit (MFL) dynamics. In this paper, we introduce a novel reduced inertial PDE model consisting of two equations that depend solely on particle position. In contrast to other reduced models, ours is not derived from the MFL, but directly includes the model reduction at the level of the empirical densities, thus allowing for a straightforward connection to the underlying particle dynamics. We present a thorough analytical investigation of our reduced model, showing that: firstly, our reduced PDE satisfies a natural and interpretable continuous definition of flocking; secondly, in specific cases, we can fully quantify the discrepancy between PDE solution and particle system. Our theoretical results are supported by numerical simulations.
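
For reference, the particle-level Cucker-Smale dynamics in their classical formulation read:

```latex
% Classical Cucker-Smale particle system: positions x_i and velocities v_i,
% with velocities aligning through a distance-weighted average of velocity
% differences.
\begin{align*}
  \dot{x}_i &= v_i, \\
  \dot{v}_i &= \frac{1}{N} \sum_{j=1}^{N}
    \psi\big(\lVert x_j - x_i \rVert\big)\,(v_j - v_i),
  \qquad i = 1, \dots, N,
\end{align*}
% with the classical communication weight
\[
  \psi(r) = \frac{K}{(1 + r^2)^{\beta}}, \qquad K > 0,\ \beta \ge 0.
\]
```

With this weight, velocity alignment is known to occur unconditionally when $\beta \le 1/2$, and conditionally on the initial configuration otherwise.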

Modern approaches to computer vision tasks rely heavily on machine learning, which in turn requires large numbers of high-quality images. While there is a plethora of image datasets containing a single type of image, datasets collected from multiple cameras are scarce. In this thesis, we introduce Paired Image and Video data from three CAMeraS, namely PIV3CAMS, aimed at multiple computer vision tasks. The PIV3CAMS dataset consists of 8385 pairs of images and 82 pairs of videos taken from three different cameras: a Canon 5D Mark IV, a Huawei P20, and a ZED stereo camera. The dataset includes various indoor and outdoor scenes from different locations in Zurich (Switzerland) and Cheonan (South Korea). Computer vision applications that can benefit from the PIV3CAMS dataset include image/video enhancement, view interpolation, and image matching, among others. We provide a careful explanation of the data collection process and a detailed analysis of the data. The second part of this thesis studies the use of depth information in the view synthesis task. In addition to reproducing a current state-of-the-art algorithm, we investigate several alternative models that integrate depth information geometrically. Through extensive experiments, we show that the effect of depth is crucial for small view changes. Finally, we apply our model to the introduced PIV3CAMS dataset to synthesize novel target views as an example application of PIV3CAMS.
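
A minimal sketch of the geometric role depth plays in view synthesis, assuming a pinhole camera with known intrinsics K and a relative pose (R, t). This is the standard reprojection identity, not the thesis's specific architecture:

```python
# Sketch of depth-based reprojection for view synthesis (standard pinhole
# geometry; not the thesis's specific model). Each source pixel is lifted
# to 3D using its depth, moved into the target camera frame, and projected.
import numpy as np

def reproject(depth: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray):
    """Map source pixel coordinates into a target view given per-pixel depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    rays = np.linalg.inv(K) @ pix        # back-project to unit-depth rays
    pts = rays * depth.reshape(1, -1)    # scale rays by depth -> 3D points
    pts_tgt = R @ pts + t[:, None]       # rigid transform into target frame
    proj = K @ pts_tgt                   # project with target intrinsics
    uv = proj[:2] / proj[2:3]            # perspective divide
    return uv.T.reshape(h, w, 2)         # target pixel coords per source pixel

# Toy usage with an identity pose and constant depth: pixels map to themselves.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
uv = reproject(np.full((480, 640), 2.0), K, np.eye(3), np.zeros(3))
print(np.allclose(uv[0, 0], [0, 0]))  # True
```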

The growing prevalence of high-dimensional data has fostered the development of multidimensional projection (MP) techniques, such as t-SNE, UMAP, and LAMP, for data visualization and exploration. However, conventional MP methods typically employ generic quality metrics, neglecting individual user preferences. This study proposes a new framework that tailors MP techniques to user-specific quality criteria, enhancing projection interpretability. Our approach combines three visual quality metrics (stress, neighborhood preservation, and silhouette score) into a composite metric for precise MP evaluation. We then optimize the projection scale by maximizing the composite metric value. We conducted an experiment involving two users with different projection preferences, generating projections using t-SNE, UMAP, and LAMP. The users rated the projections according to their own criteria, producing two training sets. We derived optimal weights for each set and applied them to other datasets to determine the best projection for each user. Our findings demonstrate that personalized projections effectively capture user preferences, fostering better data exploration and enabling more informed decision-making. This user-centric approach promotes advancements in multidimensional projection techniques that accommodate diverse user preferences and enhance interpretability.
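
A minimal sketch of such a weighted composite metric, assuming common textbook definitions of the three components; the equal weights shown are placeholders for the user-derived weights the study learns from ratings:

```python
# Sketch of a weighted composite projection-quality metric (the component
# definitions are textbook versions; the weights are illustrative stand-ins
# for the user-derived weights learned from ratings).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import silhouette_score

def normalized_stress(X_high, X_low):
    dh, dl = pdist(X_high), pdist(X_low)
    return np.sum((dh - dl) ** 2) / np.sum(dh ** 2)  # 0 is best

def neighborhood_preservation(X_high, X_low, k=7):
    def knn(D):
        return np.argsort(D, axis=1)[:, 1:k + 1]  # skip self at index 0
    nh = knn(squareform(pdist(X_high)))
    nl = knn(squareform(pdist(X_low)))
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nh, nl)]
    return float(np.mean(overlap))  # 1 is best

def composite(X_high, X_low, labels, w=(1 / 3, 1 / 3, 1 / 3)):
    s = 1.0 - min(normalized_stress(X_high, X_low), 1.0)  # higher is better
    npres = neighborhood_preservation(X_high, X_low)
    sil = (silhouette_score(X_low, labels) + 1) / 2       # [-1,1] -> [0,1]
    return w[0] * s + w[1] * npres + w[2] * sil

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y = X[:, :2]  # stand-in "projection"
print(composite(X, Y, labels=(X[:, 0] > 0).astype(int)))
```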

Deep learning still struggles with certain kinds of scientific data. Notably, pretraining data may not cover relevant distribution shifts (e.g., shifts induced by the use of different measurement instruments). We consider deep learning models trained to classify the synthesis conditions of uranium ore concentrates (UOCs) and show that model editing is particularly effective for improving generalization to distribution shifts common in this domain. In particular, model editing outperforms finetuning on two curated datasets comprising micrographs of U$_{3}$O$_{8}$ aged in humidity chambers and micrographs acquired with different scanning electron microscopes, respectively.

We propose and study a one-dimensional model which consists of two cross-diffusion systems coupled via a moving interface. The motivation stems from the modelling of complex diffusion processes in the context of the vapor deposition of thin films. In our model, cross-diffusion of the various chemical species is modelled by a size-exclusion system for the solid phase and the Stefan-Maxwell system for the gaseous phase, respectively. The coupling between the two phases is modelled by linear phase transition laws of Butler-Volmer type, resulting in an interface evolution. The properties of the continuous model are investigated, in particular its entropy variational structure and stationary states. We introduce a two-point flux approximation finite volume scheme. The moving interface is addressed with a moving-mesh approach, where the mesh is locally deformed around the interface. The resulting discrete nonlinear system is shown to admit a solution that preserves the main properties of the continuous system, namely: mass conservation, nonnegativity, volume-filling constraints, decay of the free energy, and asymptotics. In particular, the moving-mesh approach is compatible with the entropy structure of the continuous model. Numerical results illustrate these properties and the dynamics of the model.
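
For readers unfamiliar with the discretization, a generic two-point flux approximation takes the textbook form below (the scheme in the paper uses nonlinear fluxes compatible with the entropy structure; this is only the basic template):

```latex
% Textbook two-point flux approximation (TPFA) between neighboring cells
% K and L sharing edge sigma; the paper's scheme builds nonlinear variants
% of such fluxes that respect the entropy structure.
\[
  F_{K,\sigma} \approx \tau_\sigma \,(u_K - u_L),
  \qquad
  \tau_\sigma = \frac{|\sigma|}{d_{K,L}},
\]
% where |sigma| is the edge measure and d_{K,L} the distance between cell
% centers; local conservativity holds since F_{K,\sigma} = -F_{L,\sigma}.
```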

High-dimensional compositional data are frequently encountered in many fields of modern scientific research. In regression analysis of compositional data, the presence of covariate measurement errors poses major challenges for existing errors-in-variables regression methods, since measurement error in one component of a composition affects the others. To simultaneously address the compositional nature of, and the measurement errors in, the high-dimensional design matrix of compositional covariates, we propose a new method named Error-in-composition (Eric) Lasso for regression analysis of corrupted compositional predictors. We establish estimation error bounds for Eric Lasso and its asymptotic sign-consistent selection properties. We then illustrate its finite-sample performance using simulation studies and demonstrate its potential usefulness in a real data application.
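
As background, a standard way to pose lasso regression with compositional covariates is the log-contrast model with a zero-sum constraint (shown below; this is the common baseline formulation, not necessarily the paper's exact estimator, which additionally corrects for measurement error):

```latex
% Standard log-contrast lasso for compositional covariates (background;
% Eric Lasso additionally handles measurement error corrupting X).
% With compositions X in the simplex and Z = \log X,
\[
  \hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}\;
  \frac{1}{2n} \lVert y - Z\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
  \quad \text{subject to} \quad \textstyle\sum_{j=1}^{p} \beta_j = 0,
\]
% where the zero-sum constraint makes the fit invariant to rescaling of
% the composition. Measurement error corrupts Z, which is the difficulty
% Eric Lasso is designed to address.
```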

In toxicology, the validation of the concurrent control by historical control data (HCD) has become a requirement. This validation is usually done via historical control limits (HCL), which in practice are often displayed graphically in the manner of a Shewhart control chart. In many applications, HCL are applied to dichotomous data, e.g. the number of rats with a tumor vs. the number of rats without a tumor (carcinogenicity studies), or the number of cells with a micronucleus out of a total number of cells. Dichotomous HCD may be overdispersed and can be heavily right- (or left-) skewed, which is usually not taken into account in practical applications of HCL. To overcome this problem, four different prediction intervals (two frequentist, two Bayesian) that can be applied to such data are proposed. Comprehensive Monte Carlo simulations assessing the coverage probabilities of seven different methods for HCL calculation reveal that frequentist bootstrap-calibrated prediction intervals control the type I error best. Heuristics traditionally used in control charts (e.g. the limits of Shewhart np-charts or the mean plus or minus 2 SD), as well as the historical range, fail to control a pre-specified coverage probability. The application of HCL is demonstrated on a real-life data set containing historical controls from long-term carcinogenicity studies run on behalf of the U.S. National Toxicology Program. The proposed frequentist prediction intervals are publicly available from the R package predint, whereas R code for the computation of the Bayesian prediction intervals is provided via GitHub.
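
A minimal sketch of the bootstrap-calibration idea for a binomial prediction interval, written in Python rather than R and ignoring overdispersion for brevity; the predint package implements more refined versions of this idea:

```python
# Minimal sketch of a bootstrap-calibrated prediction interval for
# dichotomous historical control data (plain binomial, no overdispersion;
# the predint R package implements more refined versions of this idea).
import numpy as np

rng = np.random.default_rng(1)

def pi_limits(x, n, n_star, q):
    """Wald-style interval for a future count out of n_star trials, width q."""
    p_hat = x.sum() / n.sum()
    mu = n_star * p_hat
    # predictive variance: sampling variance of the future count plus
    # estimation variance of its expectation
    var = n_star * p_hat * (1 - p_hat) + n_star**2 * p_hat * (1 - p_hat) / n.sum()
    half = q * np.sqrt(var)
    return mu - half, mu + half

def calibrate_q(x, n, n_star, level=0.95, n_boot=2000):
    """Search for the smallest q whose bootstrap coverage reaches `level`."""
    p_hat = x.sum() / n.sum()
    for q in np.arange(0.5, 5.0, 0.05):
        hits = 0
        for _ in range(n_boot):
            x_b = rng.binomial(n, p_hat)          # re-simulated historical data
            y_new = rng.binomial(n_star, p_hat)   # simulated future observation
            lo, hi = pi_limits(x_b, n, n_star, q)
            hits += lo <= y_new <= hi
        if hits / n_boot >= level:
            return q
    return 5.0

# Historical tumor counts x out of n animals per study; future study size 50.
x = np.array([2, 4, 1, 3, 5, 2, 3, 4])
n = np.full(8, 50)
q = calibrate_q(x, n, n_star=50)
print(pi_limits(x, n, n_star=50, q=q))
```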

We present a generalization of the discrete Lehmann representation (DLR) to three-point correlation and vertex functions in imaginary time and Matsubara frequency. The representation takes the form of a linear combination of judiciously chosen exponentials in imaginary time, and products of simple poles in Matsubara frequency, which are universal for a given temperature and energy cutoff. We present a systematic algorithm to generate compact sampling grids, from which the coefficients of such an expansion can be obtained by solving a linear system. We show that the explicit form of the representation can be used to evaluate diagrammatic expressions involving infinite Matsubara sums, such as polarization functions or self-energies, with controllable, high-order accuracy. This collection of techniques establishes a framework through which methods involving three-point objects can be implemented robustly, with a substantially reduced computational cost and memory footprint.
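
Concretely, the two-point DLR expands a correlator in a fixed pole basis, and the three-point generalization described above takes a product-pole form (shown schematically below; the precise construction of the basis and sampling grids is the paper's contribution):

```latex
% Two-point discrete Lehmann representation (standard form): a fixed set
% of poles {omega_l}, universal for a given temperature and energy cutoff.
\[
  G(i\nu_n) \approx \sum_{l=1}^{r} g_l \, \frac{1}{i\nu_n - \omega_l},
  \qquad
  G(\tau) \approx \sum_{l=1}^{r} g_l \, K(\tau, \omega_l),
  \quad
  K(\tau,\omega) = -\frac{e^{-\omega\tau}}{1 + e^{-\beta\omega}}.
\]
% Schematic product-pole form of the three-point generalization:
\[
  \chi(i\nu_n, i\nu_m) \approx \sum_{k,l} c_{kl} \,
  \frac{1}{i\nu_n - \omega_k} \, \frac{1}{i\nu_m - \omega_l}.
\]
```

Because every term is a simple pole, infinite Matsubara sums over such expansions reduce to closed-form sums over residues, which is what enables the controllable, high-order evaluation of polarization functions and self-energies mentioned above.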

Current approaches to identifying driving heterogeneity face challenges in capturing fundamental patterns from the perspective of underlying driving behavior mechanisms. The concept of Action phases was proposed in our previous work, capturing the diversity of driving characteristics with physical meanings. This study presents a novel framework that further interprets driving patterns by classifying Action phases in an unsupervised manner. In this framework, a Resampling and Downsampling Method (RDM) is first applied to standardize the length of Action phases. Then a clustering calibration procedure, comprising "Feature Selection", "Clustering Analysis", "Difference/Similarity Evaluation", and "Action phases Re-extraction", is applied iteratively until all differences among clusters and similarities within clusters reach the pre-determined criteria. Applying the framework to real-world datasets revealed six driving patterns in the I80 dataset, labeled "Catch up", "Keep away", and "Maintain distance", each in both "Stable" and "Unstable" states. Notably, Unstable patterns are more numerous than Stable ones, and "Maintain distance" is the most common Stable pattern; these observations align with the dynamic nature of driving. Two patterns, "Stable keep away" and "Unstable catch up", are missing in the US101 dataset, which is in line with our expectations, as this dataset was previously shown to exhibit less heterogeneity. This demonstrates the potential of driving patterns for describing driving heterogeneity. The proposed framework promises advantages in addressing label scarcity in supervised learning and in enhancing tasks such as driving behavior modeling and driving trajectory prediction.
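
A minimal sketch of length standardization in the spirit of the RDM step, assuming Action phases are variable-length multivariate time series; the interpolation scheme and target length here are illustrative assumptions, not the paper's exact method:

```python
# Sketch of standardizing variable-length Action phases to a common length
# via linear-interpolation resampling (illustrative; the paper's RDM may
# differ in its resampling/downsampling details).
import numpy as np

def standardize_length(phase: np.ndarray, target_len: int = 50) -> np.ndarray:
    """Resample a (T, d) multivariate time series to (target_len, d)."""
    t_src = np.linspace(0.0, 1.0, num=phase.shape[0])
    t_dst = np.linspace(0.0, 1.0, num=target_len)
    return np.column_stack(
        [np.interp(t_dst, t_src, phase[:, j]) for j in range(phase.shape[1])]
    )

# Toy usage: phases of different lengths become directly comparable,
# e.g. as fixed-size feature vectors for clustering.
rng = np.random.default_rng(0)
phases = [rng.normal(size=(T, 3)) for T in (37, 82, 140)]  # e.g. speed, gap, accel
features = np.stack([standardize_length(p).ravel() for p in phases])
print(features.shape)  # (3, 150)
```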
