亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Sentence embeddings induced with various transformer architectures encode much semantic and syntactic information in a distributed manner in a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations encoded as one-dimensional arrays do not easily support extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these two-dimensional reshaped sentence embeddings and successfully learn a model based on smaller amounts of simpler training data, which performs well on more complex test data. This indicates that current sentence embeddings contain information that is regularly distributed, and which can be captured when the embeddings are reshaped into higher dimensional arrays. Our results cast light on representations produced by language models and help move towards developing few-shot learning approaches.

相關內容

《計算機信息》雜志發表高質量的論文,擴大了運籌學和計算的范圍,尋求有關理論、方法、實驗、系統和應用方面的原創研究論文、新穎的調查和教程論文,以及描述新的和有用的軟件工具的論文。官網鏈接: · · 情景 · 代碼 · 論文 ·
2024 年 2 月 5 日

Using dominating sets to separate vertices of graphs is a well-studied problem in the larger domain of identification problems. In such problems, the objective is typically to separate any two vertices of a graph by their unique neighbourhoods in a suitably chosen dominating set of the graph. Such a dominating and separating set is often referred to as a \emph{code} in the literature. Depending on the types of dominating and separating sets used, various problems arise under various names in the literature. In this paper, we introduce a new problem in the same realm of identification problems whereby the code, called the \emph{open-separating dominating code}, or the \emph{OSD-code} for short, is a dominating set and uses open neighbourhoods for separating vertices. The paper studies the fundamental properties concerning the existence, hardness and minimality of OSD-codes. Due to the emergence of a close and yet difficult to establish relation of the OSD-codes with another well-studied code in the literature called the open locating dominating codes, or OLD-codes for short, we compare the two on various graph classes. Finally, we also provide an equivalent reformulation of the problem of finding OSD-codes of a graph as a covering problem in a suitable hypergraph and discuss the polyhedra associated with OSD-codes, again in relation to OLD-codes of some graph classes already studied in this context.

When extending inferences from a randomized trial to a new target population, an assumption of transportability of difference effect measures (e.g., conditional average treatment effects) -- or even stronger assumptions of transportability in expectation or distribution of potential outcomes -- is invoked to identify the marginal causal mean difference in the target population. However, many clinical investigators believe that relative effect measures conditional on covariates, such as conditional risk ratios and mean ratios, are more likely to be ``transportable'' across populations compared with difference effect measures. Here, we examine the identification and estimation of the marginal counterfactual mean difference and ratio under a transportability assumption for conditional relative effect measures. We obtain identification results for two scenarios that often arise in practice when individuals in the target population (1) only have access to the control treatment, or (2) have access to the control and other treatments but not necessarily the experimental treatment evaluated in the trial. We then propose multiply robust and nonparametric efficient estimators that allow for the use of data-adaptive methods (e.g., machine learning techniques) to model the nuisance parameters. We examine the performance of the methods in simulation studies and illustrate their use with data from two trials of paliperidone for patients with schizophrenia. We conclude that the proposed methods are attractive when background knowledge suggests that the transportability assumption for conditional relative effect measures is more plausible than alternative assumptions.

This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model called SIGTRON-imbalanced classification (SIC) model that employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model does not have an external $\pi$-weight on the loss function but has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition, we show that the proposed SIC model is more adaptive to variations of the dataset, such as the inconsistency of the scale-class-imbalance ratio between the training and test datasets. This adaptation is achieved by creating a skewed hyperplane equation. Additionally, we present a quasi-Newton optimization(L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we have observed that the proposed approach outperforms $\pi$-weighted convex focal loss and balanced classifier LIBLINEAR(logistic regression, SVM, and L2SVM) in terms of test classification accuracy with $51$ two-class and $67$ multi-class datasets. In binary classification problems, where the scale-class-imbalance ratio of the training dataset is not significant but the inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM(C-SVC with RBF kernel), a well-known kernel-based classifier.

This paper introduces three key initiatives in the pursuit of a hybrid decoding framework characterized by superior decoding performance, high throughput, low complexity, and independence from channel noise variance. Firstly, adopting a graphical neural network perspective, we propose a design methodology for a family of neural min-sum variants. Our exploration delves into the frame error rates associated with different decoding variants and the consequential impact of decoding failures on subsequent ordered statistics decoding. Notably, these neural min-sum variants exhibit generally indistinguishable performance, hence the simplest member is chosen as the constituent of the hybrid decoding. Secondly, to address computational complexities arising from exhaustive searches for authentic error patterns in cases of decoding failure, two alternatives for ordered statistics decoding implementation are proposed. The first approach involves uniformly grouping test error patterns, while the second scheme dynamically generates qualified searching test error patterns with varied sizes for each group. In both methods, group priorities are determined empirically. Thirdly, iteration diversity is highlighted in the case of LDPC codes requiring high maximum iterations of decoding. This is achieved by segmenting the long iterative decoding trajectory of a decoding failure into shorter segments, which are then independently fed to small models to enhance the chances of acquiring the authentic error pattern. These ideas are substantiated through extensive simulation results covering the codes with block lengths ranging from one hundred to several hundreds.

Automatic Speech Recognition (ASR) systems are used in the financial domain to enhance the caller experience by enabling natural language understanding and facilitating efficient and intuitive interactions. Increasing use of ASR systems requires that such systems exhibit very low error rates. The predominant ASR models to collect numeric data are large, general-purpose commercial models -- Google Speech-to-text (STT), or Amazon Transcribe -- or open source (OpenAI's Whisper). Such ASR models are trained on hundreds of thousands of hours of audio data and require considerable resources to run. Despite recent progress large speech recognition models, we highlight the potential of smaller, specialized "micro" models. Such light models can be trained perform well on number recognition specific tasks, competing with general models like Whisper or Google STT while using less than 80 minutes of training time and occupying at least an order of less memory resources. Also, unlike larger speech recognition models, micro-models are trained on carefully selected and curated datasets, which makes them highly accurate, agile, and easy to retrain, while using low compute resources. We present our work on creating micro models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. Our work contributes to domain-specific ASR models, improving digit recognition accuracy, and privacy of data. An added advantage, their low resource consumption allows them to be hosted on-premise, keeping private data local instead uploading to an external cloud. Our results indicate that our micro-model makes less errors than the best-of-breed commercial or open-source ASRs in recognizing digits (1.8% error rate of our best micro-model versus 5.8% error rate of Whisper), and has a low memory footprint (0.66 GB VRAM for our model versus 11 GB VRAM for Whisper).

A rectangulation is a decomposition of a rectangle into finitely many rectangles. Via natural equivalence relations, rectangulations can be seen as combinatorial objects with a rich structure, with links to lattice congruences, flip graphs, polytopes, lattice paths, Hopf algebras, etc. In this paper, we first revisit the structure of the respective equivalence classes: weak rectangulations that preserve rectangle-segment adjacencies, and strong rectangulations that preserve rectangle-rectangle adjacencies. We thoroughly investigate posets defined by adjacency in rectangulations of both kinds, and unify and simplify known bijections between rectangulations and permutation classes. This yields a uniform treatment of mappings between permutations and rectangulations that unifies the results from earlier contributions, and emphasizes parallelism and differences between the weak and the strong cases. Then, we consider the special case of guillotine rectangulations, and prove that they can be characterized - under all known mappings between permutations and rectangulations - by avoidance of two mesh patterns that correspond to "windmills" in rectangulations. This yields new permutation classes in bijection with weak guillotine rectangulations, and the first known permutation class in bijection with strong guillotine rectangulations. Finally, we address enumerative issues and prove asymptotic bounds for several families of strong rectangulations.

Neural operators (NO) are discretization invariant deep learning methods with functional output and can approximate any continuous operator. NO have demonstrated the superiority of solving partial differential equations (PDEs) over other deep learning methods. However, the spatial domain of its input function needs to be identical to its output, which limits its applicability. For instance, the widely used Fourier neural operator (FNO) fails to approximate the operator that maps the boundary condition to the PDE solution. To address this issue, we propose a novel framework called resolution-invariant deep operator (RDO) that decouples the spatial domain of the input and output. RDO is motivated by the Deep operator network (DeepONet) and it does not require retraining the network when the input/output is changed compared with DeepONet. RDO takes functional input and its output is also functional so that it keeps the resolution invariant property of NO. It can also resolve PDEs with complex geometries whereas NO fail. Various numerical experiments demonstrate the advantage of our method over DeepONet and FNO.

This article presents factor copula approaches to model temporal dependency of non- Gaussian (continuous/discrete) longitudinal data. Factor copula models are canonical vine copulas which explain the underlying dependence structure of a multivariate data through latent variables, and therefore can be easily interpreted and implemented to unbalanced longitudinal data. We develop regression models for continuous, binary and ordinal longitudinal data including covariates, by using factor copula constructions with subject-specific latent variables. Considering homogeneous within-subject dependence, our proposed models allow for feasible parametric inference in moderate to high dimensional situations, using two-stage (IFM) estimation method. We assess the finite sample performance of the proposed models with extensive simulation studies. In the empirical analysis, the proposed models are applied for analysing different longitudinal responses of two real world data sets. Moreover, we compare the performances of these models with some widely used random effect models using standard model selection techniques and find substantial improvements. Our studies suggest that factor copula models can be good alternatives to random effect models and can provide better insights to temporal dependency of longitudinal data of arbitrary nature.

Random probabilities are a key component to many nonparametric methods in Statistics and Machine Learning. To quantify comparisons between different laws of random probabilities several works are starting to use the elegant Wasserstein over Wasserstein distance. In this paper we prove that the infinite-dimensionality of the space of probabilities drastically deteriorates its sample complexity, which is slower than any polynomial rate in the sample size. We thus propose a new distance that preserves many desirable properties of the former while achieving a parametric rate of convergence. In particular, our distance 1) metrizes weak convergence; 2) can be estimated numerically through samples with low complexity; 3) can be bounded analytically from above and below. The main ingredient are integral probability metrics, which lead to the name hierarchical IPM.

AI-enabled synthetic biology has tremendous potential but also significantly increases biorisks and brings about a new set of dual use concerns. The picture is complicated given the vast innovations envisioned to emerge by combining emerging technologies, as AI-enabled synthetic biology potentially scales up bioengineering into industrial biomanufacturing. However, the literature review indicates that goals such as maintaining a reasonable scope for innovation, or more ambitiously to foster a huge bioeconomy don't necessarily contrast with biosafety, but need to go hand in hand. This paper presents a literature review of the issues and describes emerging frameworks for policy and practice that transverse the options of command-and control, stewardship, bottom-up, and laissez-faire governance. How to achieve early warning systems that enable prevention and mitigation of future AI-enabled biohazards from the lab, from deliberate misuse, or from the public realm, will constantly need to evolve, and adaptive, interactive approaches should emerge. Although biorisk is subject to an established governance regime, and scientists generally adhere to biosafety protocols, even experimental, but legitimate use by scientists could lead to unexpected developments. Recent advances in chatbots enabled by generative AI have revived fears that advanced biological insight can more easily get into the hands of malignant individuals or organizations. Given these sets of issues, society needs to rethink how AI-enabled synthetic biology should be governed. The suggested way to visualize the challenge at hand is whack-a-mole governance, although the emerging solutions are perhaps not so different either.

北京阿比特科技有限公司