
Stencil computation is an extensively used class of scientific-computing applications that can be efficiently accelerated by graphics processing units (GPUs). Out-of-core approaches enable a GPU to handle large stencil codes whose data size exceeds the memory capacity of the GPU. However, current research on out-of-core stencil computation primarily focuses on minimizing the amount of data transferred between the CPU and GPU; few studies consider optimizing data transfer and kernel execution simultaneously. To fill this gap, this work presents a synergy between on- and off-chip data reuse for out-of-core stencil codes, termed SO2DR. First, overlapping regions between data chunks are shared in the off-chip memory to eliminate redundant CPU-GPU data transfer. Second, redundant computation at the off-chip memory level is intentionally introduced to decouple kernel execution from region sharing, thereby enabling data reuse in the on-chip memory. Experimental results demonstrate that SO2DR significantly enhances kernel-execution performance while reducing the CPU-GPU data-transfer time. Specifically, SO2DR achieves average speedups of 2.78x and 1.14x on five stencil benchmarks, compared to an out-of-core stencil code that is free of redundant transfer and computation, and to an in-core stencil code that is free of data transfer, respectively.
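
As a rough illustration of the chunking scheme described above, the following host-side sketch (not taken from the paper) splits a padded 1D grid into fixed-size chunks whose halo cells overlap with neighbouring chunks; transferring each halo region once and recomputing halo cells per chunk mirrors the transfer/compute trade-off that SO2DR exploits. Names such as `stencil_kernel`, `CHUNK`, and `RADIUS` are illustrative assumptions.

```python
import numpy as np

RADIUS, CHUNK, NCHUNKS = 1, 1024, 8
# Pad the global grid so every chunk, including the first and last,
# has a full halo of width RADIUS on each side.
grid = np.pad(np.random.rand(NCHUNKS * CHUNK), RADIUS)

def stencil_kernel(block):
    # 3-point average stencil; stands in for the GPU kernel.
    return (block[:-2] + block[1:-1] + block[2:]) / 3.0

out = np.empty(NCHUNKS * CHUNK)
for c in range(NCHUNKS):
    start = c * CHUNK
    # The slice includes 2*RADIUS halo cells that overlap neighbouring chunks;
    # sharing them off-chip avoids transferring the same cells twice, while
    # recomputing the halo per chunk keeps each kernel launch independent.
    block = grid[start : start + CHUNK + 2 * RADIUS]
    out[start : start + CHUNK] = stencil_kernel(block)
```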

Related content

Reversible computation is an emerging computing paradigm that allows any sequence of operations to be executed in reverse order at any point during computation. Its appeal lies in its potential for low-power computation and its relevance to a wide array of applications such as chemical reactions, quantum computation, robotics, and distributed systems. Reversing Petri nets (RPNs) are a recently proposed extension of Petri nets that implements the three main forms of reversibility, namely backtracking, causal reversing, and out-of-causal-order reversing. Their distinguishing feature is the use of named tokens that can be combined to form bonds. Named tokens, along with a history function, constitute the means of remembering past behaviour, thus enabling reversal. In recent work, we proposed a structural translation from a subclass of RPNs to the model of Coloured Petri Nets (CPNs), an extension of traditional Petri nets where tokens carry data values. In this paper, we extend the translation to handle RPNs with token multiplicity under the individual-token interpretation, a model which allows multiple tokens of the same type to exist in a system. To support the three types of reversibility, tokens are associated with their causal history and, while tokens of the same type are equally eligible to fire a transition when going forward, when going backwards they are able to reverse only the transitions they have previously fired. The new translation, in addition to lifting the restriction on token uniqueness, presents a refined, unifying construction that allows instantiating each of the three types of reversibility. The paper also reports on a tool that implements this translation, paving the way for automated translation and analysis of reversible systems using CPN Tools.

It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by machine size limits, the complex structure of the shape, and similar constraints. To make the final object easy to assemble, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study dovetail-joint shape optimization for stiffness using gradient-based optimization. This optimization requires a differentiable simulator capable of modeling the contact between the two parts of a joint, making it possible to reason about the gradient of the stiffness with respect to shape parameters. Our simulation approach uses a penalty method that alternates between optimizing each side of the joint, using the adjoint method to compute gradients. We test our method by optimizing joint shapes in three different shape spaces, and we evaluate the optimized shapes in both simulation and real-world tests. The experiments show that the optimized joint shapes achieve higher stiffness, both in simulation and in physical tests.

Eigenvector continuation is a computational method for parametric eigenvalue problems that uses subspace projection with a basis derived from eigenvector snapshots computed at different parameter values. It is part of a broader class of subspace-projection techniques called reduced-basis methods. In this colloquium article, we present the development, theory, and applications of eigenvector continuation and projection-based emulators. We introduce the basic concepts, discuss the underlying theory and convergence properties, and present recent applications to quantum systems and future prospects.
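
The projection step behind eigenvector continuation can be sketched in a few lines of NumPy/SciPy. The one-parameter Hamiltonian H(c) = H0 + c*H1, the snapshot parameters, and the matrix sizes below are illustrative assumptions rather than an example from the article.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 200
H0 = rng.standard_normal((n, n)); H0 = (H0 + H0.T) / 2
H1 = rng.standard_normal((n, n)); H1 = (H1 + H1.T) / 2

def ground_state(c):
    # Exact ground-state eigenvector of H(c) = H0 + c * H1.
    return eigh(H0 + c * H1)[1][:, 0]

# 1. Collect eigenvector snapshots at a few training parameter values.
train_cs = [0.0, 0.5, 1.0]
basis = np.column_stack([ground_state(c) for c in train_cs])

# 2. Project H(c*) onto the snapshot subspace and solve the small
#    generalized eigenvalue problem (the snapshots are not orthogonal).
c_star = 0.75
H_small = basis.T @ (H0 + c_star * H1) @ basis
N_small = basis.T @ basis
emulated_energy = eigh(H_small, N_small)[0][0]
exact_energy = eigh(H0 + c_star * H1)[0][0]  # should be close to emulated_energy
```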

Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied to different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover results similar to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalized to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds -- which might not be achievable for convex or non-convex losses unless additional noise is injected into the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.
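
One plausible formalization of the Wasserstein stability notion used in such analyses (our paraphrase, not a statement copied from the paper) is the following.

```latex
% Epsilon-stability in 1-Wasserstein distance (paraphrased): for all
% neighbouring datasets S \simeq S' differing in a single example,
\[
  \sup_{S \simeq S'} \mathcal{W}_1\!\left(\mathrm{Law}\big(\mathcal{A}(S)\big),\,
  \mathrm{Law}\big(\mathcal{A}(S')\big)\right) \le \varepsilon ,
\]
% where \mathcal{A}(S) denotes the (random) output of the optimizer on S.
% For an L-Lipschitz loss, such a bound controls the expected generalization
% gap by L\varepsilon; "time-uniform" means \varepsilon does not grow with
% the number of iterations.
```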

Goal-directed manipulation of representations is a key element of human flexible behaviour, while consciousness is often related to several aspects of higher-order cognition and human flexibility. Currently these two phenomena are only partially integrated (e.g., see Neurorepresentationalism) and this (a) limits our understanding of the neuro-computational processes that lead conscious states to produce flexible goal-directed behaviours, (b) prevents a computational formalisation of the conscious goal-directed manipulation of representations occurring in the brain, and (c) inhibits the exploitation of this knowledge for modelling and technological purposes. Addressing these issues, here we extend our `three-component theory of flexible cognition' by proposing the `Goal-Aligning Representations Internal Manipulation' (GARIM) theory of conscious and flexible goal-directed cognition. The central idea of the theory is that conscious states support the active manipulation of goal-relevant internal representations (e.g., of world states, objects, and action sequences) to make them more aligned with the pursued goals. This leads to the generation of the knowledge necessary to face novel situations and goals, thus increasing the flexibility of goal-directed behaviours. The GARIM theory integrates key aspects of the main theories of consciousness into the functional neuro-computational framework of goal-directed behaviour. Moreover, it takes into account the subjective sensation of agency that accompanies conscious goal-directed processes (`GARIM agency'). The proposal also has implications for experimental studies on consciousness and clinical aspects of conscious goal-directed behaviour. Finally, the GARIM theory benefits technological fields such as autonomous robotics and machine learning (e.g., the manipulation process may describe the operations performed by systems based on transformers).

Plug-and-play (PnP) priors are a well-known class of methods for solving imaging inverse problems by computing fixed points of operators that combine physical measurement models and learned image denoisers. While PnP methods have been extensively used for image recovery with known measurement operators, there is little work on PnP for solving blind inverse problems. We address this gap by presenting a new block-coordinate PnP (BC-PnP) method that efficiently solves this joint estimation problem by introducing learned denoisers as priors on both the unknown image and the unknown measurement operator. We present a new convergence theory for BC-PnP compatible with blind inverse problems by considering nonconvex data-fidelity terms and expansive denoisers. Our theory analyzes the convergence of BC-PnP to a stationary point of an implicit function associated with an approximate minimum mean-squared error (MMSE) denoiser. We numerically validate our method on two blind inverse problems: automatic coil sensitivity estimation in magnetic resonance imaging (MRI) and blind image deblurring. Our results show that BC-PnP provides an efficient and principled framework for using denoisers as PnP priors for jointly estimating measurement operators and images.
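
A minimal sketch of the alternating block-coordinate structure described above, under our own interpretation: a data-fidelity gradient step followed by a denoiser is applied in turn to the image block and to the operator-parameter block. The step size, iteration count, and the denoiser callables are placeholders, not the paper's trained models or exact update rule.

```python
import numpy as np

def bc_pnp(y, grad_x, grad_theta, denoise_x, denoise_theta,
           x0, theta0, gamma=0.1, iters=100):
    """Block-coordinate PnP sketch for a blind problem y ~ A(theta) x."""
    x, theta = x0.copy(), theta0.copy()
    for _ in range(iters):
        # Block 1: gradient step on the data fidelity, then the image denoiser.
        x = denoise_x(x - gamma * grad_x(x, theta, y))
        # Block 2: gradient step on the operator parameters, then the
        # denoiser acting as a prior on the measurement operator.
        theta = denoise_theta(theta - gamma * grad_theta(x, theta, y))
    return x, theta
```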

Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so while requiring less training data than finetuning or distillation. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of the available data on a benchmark task.
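
A rough sketch of the multi-task objective described above, assuming a Hugging Face-style seq2seq model (e.g., T5) whose forward pass returns a loss when labels are provided. The batch field names and the rationale weight are hypothetical, and the exact task formatting in the paper may differ.

```python
import torch

def distill_step_by_step_loss(model, batch, rationale_weight=1.0):
    # Task 1: predict the task label from the input.
    label_out = model(input_ids=batch["input_ids"],
                      labels=batch["label_ids"])
    # Task 2: generate the LLM-extracted rationale from the same input
    # (with a different task prefix prepended upstream).
    rationale_out = model(input_ids=batch["rationale_input_ids"],
                          labels=batch["rationale_ids"])
    # Combined multi-task loss over both objectives.
    return label_out.loss + rationale_weight * rationale_out.loss
```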

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, remains a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotation and is therefore labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any extra annotation or inference-time cost. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among instances of the same category. We then design a spatial context learning module for modeling the internal structure of the object by predicting the relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and it turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
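
For the negative-entropy regularizer, the smoothed max reduces to log-sum-exp with softmax as its gradient; the short sketch below illustrates this standard instantiation. The parameter gamma (the regularization strength) and the function names are ours, not identifiers from the paper.

```python
import numpy as np

def smoothed_max(x, gamma=1.0):
    # max_Omega(x) = gamma * log(sum_i exp(x_i / gamma)); recovers max(x) as gamma -> 0.
    m = x.max()
    return m + gamma * np.log(np.exp((x - m) / gamma).sum())

def smoothed_argmax(x, gamma=1.0):
    # Gradient of smoothed_max w.r.t. x: a softmax, i.e., a differentiable "argmax".
    z = np.exp((x - x.max()) / gamma)
    return z / z.sum()
```

Replacing the hard max in a DP recursion (e.g., Viterbi or DTW) with such an operator is what makes the whole recursion differentiable end to end.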
