
Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function extracting a \emph{key} per record, and reorders the records so that those with equal keys are contiguous. Since many applications only require collecting equal values, rather than fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms choose to implement semisort by using comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach easily extends to two related problems, \emph{histogram} and \emph{collect-reduce}. Our algorithms achieve strong speedups in practice and, importantly, outperform state-of-the-art parallel sorting and semisorting methods in almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications on real-world data and show that our algorithms improve the performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
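For readers unfamiliar with the primitive, here is a minimal sequential sketch of the semisort contract (it is not the paper's parallel algorithm; the function and variable names are illustrative): records are grouped by a user-supplied key function so that equal keys become contiguous, but no order is imposed across different keys.

```python
from collections import defaultdict

def semisort(records, key):
    """Reorder `records` so that records with equal keys are contiguous.

    Sequential sketch of the semisort contract only: keys are grouped
    (here via hashing), but groups appear in no particular order,
    unlike a full sort.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[key(rec)].append(rec)
    out = []
    for group in buckets.values():
        out.extend(group)
    return out

# Example: group edges by their source vertex.
edges = [(2, 5), (0, 1), (2, 3), (1, 4), (0, 7)]
print(semisort(edges, key=lambda e: e[0]))
# e.g. [(2, 5), (2, 3), (0, 1), (0, 7), (1, 4)] -- equal keys are contiguous
```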

Related content

We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian 2001. The previous best algorithms for this model succeed if the planted clique has size at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, Mckenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching $n^{1/2}$ -- the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige 2019 and Steinhardt 2017. Our algorithms are based on higher constant-degree sum-of-squares relaxations and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erd\H{o}s--R\'enyi random graphs into algorithms for semi-random planted clique. The use of higher constant-degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques, which shows that the basic SDP cannot succeed for planted cliques of size $k = o(n^{2/3})$. We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent, by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.

Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency. With the introduction of the backpropagation algorithm and surrogate gradients, the structure of spiking neural networks has become more complex, and the performance gap with artificial neural networks has gradually decreased. However, most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements, which significantly restricts the development of SNNs. They either do not delve into the arithmetic operations between binary spikes and synaptic weights, or they assume unlimited on-chip RAM resources by using overly expensive devices on small tasks. To improve arithmetic efficiency, we analyze the neural dynamics of spiking neurons, generalize the SNN arithmetic operation to the multiplex-accumulate operation, and propose a high-performance implementation of such an operation by utilizing the DSP48E2 hard block in Xilinx Ultrascale FPGAs. To improve memory efficiency, we design a memory system to enable efficient synaptic weight and membrane voltage memory access with reasonable on-chip RAM consumption. Combining the above two improvements, we propose an FPGA accelerator that can process spikes generated by the firing neurons on-the-fly (FireFly). FireFly is the first SNN accelerator that incorporates DSP optimization techniques into SNN synaptic operations. FireFly is implemented on several FPGA edge devices with limited resources but still guarantees a peak performance of 5.53 TOP/s at 300 MHz. As a lightweight accelerator, FireFly achieves the highest computational density efficiency compared with existing research using large FPGA devices.
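The arithmetic observation behind the accelerator can be illustrated independently of the hardware: since spikes are binary, each weight-spike product reduces to selecting either the weight or zero (a multiplex) followed by accumulation. The toy NumPy sketch below shows only this arithmetic equivalence; the mapping onto DSP48E2 blocks is the paper's contribution and is not reproduced here.

```python
import numpy as np

def multiply_accumulate(weights, spikes):
    """Reference: a standard MAC, treating spikes as numbers."""
    return float(np.dot(weights, spikes))

def multiplex_accumulate(weights, spikes):
    """Binary spikes turn each product into a select (mux): keep the
    weight where the spike fired, contribute 0 otherwise, then sum."""
    spikes = spikes.astype(bool)
    return float(np.where(spikes, weights, 0.0).sum())

rng = np.random.default_rng(0)
w = rng.normal(size=16)           # synaptic weights
s = rng.integers(0, 2, size=16)   # binary spike vector

assert np.isclose(multiply_accumulate(w, s), multiplex_accumulate(w, s))
```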

Orienting the edges of an undirected graph such that the resulting digraph satisfies some given constraints is a classical problem in graph theory, with multiple algorithmic applications. In particular, an $st$-orientation orients each edge of the input graph such that the resulting digraph is acyclic, and it contains a single source $s$ and a single sink $t$. Computing an $st$-orientation of a graph can be done efficiently, and it finds notable applications in graph algorithms and in particular in graph drawing. On the other hand, finding an $st$-orientation with at most $k$ transitive edges is more challenging and it was recently proven to be NP-hard already when $k=0$. We strengthen this result by showing that the problem remains NP-hard even for graphs of bounded diameter, and for graphs of bounded vertex degree. These computational lower bounds naturally raise the question about which structural parameters can lead to tractable parameterizations of the problem. Our main result is a fixed-parameter tractable algorithm parameterized by treewidth.
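To make the objects concrete: given an st-numbering of the vertices, orienting every edge from its lower- to its higher-numbered endpoint yields an acyclic digraph with single source $s$ and single sink $t$, and a transitive edge is one whose endpoints are also joined by a longer directed path. The sketch below (illustrative names; it assumes an st-numbering is already available and makes no attempt to minimize transitive edges, which is the hard problem studied here) orients the edges and counts the transitive ones.

```python
from collections import defaultdict

def orient_by_st_numbering(edges, number):
    """Orient each undirected edge from its lower- to its higher-numbered
    endpoint (given an st-numbering `number`: vertex -> int)."""
    return [(u, v) if number[u] < number[v] else (v, u) for u, v in edges]

def count_transitive_edges(arcs):
    """An arc (u, v) is transitive if v is still reachable from u after
    removing that arc, i.e. a longer directed u -> v path exists."""
    adj = defaultdict(set)
    for u, v in arcs:
        adj[u].add(v)

    def reachable(src, dst, skip):
        stack, seen = [src], {src}
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if (x, y) == skip or y in seen:
                    continue
                if y == dst:
                    return True
                seen.add(y)
                stack.append(y)
        return False

    return sum(reachable(u, v, skip=(u, v)) for u, v in arcs)

# Example: a 4-cycle plus a chord, st-numbered from s=0 to t=3.
edges = [(0, 1), (1, 3), (0, 2), (2, 3), (1, 2)]
numbering = {0: 0, 1: 1, 2: 2, 3: 3}
arcs = orient_by_st_numbering(edges, numbering)
print(arcs, count_transitive_edges(arcs))   # 2 transitive edges here
```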

The discrete $\alpha$-neighbor $p$-center problem (d-$\alpha$-$p$CP) is an emerging variant of the classical $p$-center problem which has recently received attention in the literature. In this problem, we are given a discrete set of points and we need to locate $p$ facilities on these points in such a way that the maximum distance between each point where no facility is located and its $\alpha$-closest facility is minimized. The only existing algorithms in the literature for solving the d-$\alpha$-$p$CP are approximation algorithms and two recently proposed heuristics. In this work, we present two integer programming formulations for the d-$\alpha$-$p$CP, together with lifting of inequalities, valid inequalities, inequalities that do not change the optimal objective function value, and variable fixing procedures. We provide theoretical results on the strength of the formulations and convergence results for the lower bounds obtained after applying the lifting procedures or the variable fixing procedures in an iterative fashion. Based on our formulations and theoretical results, we develop branch-and-cut (B&C) algorithms, which are further enhanced with a starting heuristic and a primal heuristic. We evaluate the effectiveness of our B&C algorithms using instances from the literature. Our algorithms are able to solve 116 out of 194 instances from the literature to proven optimality, with a runtime of under a minute for most of them. By doing so, we also provide improved solution values for 116 instances.
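The objective itself is easy to evaluate for a fixed candidate solution, which is handy when checking heuristic or B&C output on small instances. The sketch below is not one of the paper's formulations; it simply evaluates the d-$\alpha$-$p$CP objective by brute force on a tiny example, with illustrative function names.

```python
import itertools

def alpha_p_center_objective(dist, facilities, alpha):
    """d-alpha-pCP objective for a fixed facility set: the maximum, over
    points with no facility, of the distance to their alpha-closest
    open facility.  `dist[i][j]` is the distance between points i and j."""
    n = len(dist)
    worst = 0.0
    for i in range(n):
        if i in facilities:
            continue
        d_sorted = sorted(dist[i][j] for j in facilities)
        worst = max(worst, d_sorted[alpha - 1])   # alpha-closest facility
    return worst

def brute_force(dist, p, alpha):
    """Exhaustive baseline for tiny instances (exponential in p)."""
    n = len(dist)
    return min(
        (alpha_p_center_objective(dist, F, alpha), F)
        for F in itertools.combinations(range(n), p)
    )

# Tiny example: 5 points on a line at coordinates 0..4.
coords = [0, 1, 2, 3, 4]
dist = [[abs(a - b) for b in coords] for a in coords]
print(brute_force(dist, p=3, alpha=2))   # -> (1, (0, 2, 4))
```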

A stable and high-order accurate solver for linear and nonlinear parabolic equations is presented. An additive Runge-Kutta method is used for the time stepping, which integrates the linear stiff terms by a singly diagonally implicit Runge-Kutta method with an explicit first stage (ESDIRK) and the nonlinear terms by an explicit Runge-Kutta (ERK) method. In each time step, the implicit solve is performed by the recently developed Hierarchical Poincar\'e-Steklov (HPS) method. This is a fast direct solver for elliptic equations that decomposes the spatial domain into a hierarchical tree of subdomains and builds spectral collocation solvers locally on the subdomains. These ideas are naturally combined in the presented method, since the singly diagonal coefficient in ESDIRK and a fixed time step ensure that the coefficient matrix in the implicit solve of HPS remains the same for all time stages. This means that the precomputed inverse can be efficiently reused, leading to a scheme with complexity (in two dimensions) $\mathcal{O}(N^{1.5})$ for the precomputation where the solution operator to the elliptic problems is built, and then $\mathcal{O}(N)$ for each time step. The stability of the method is proved for first order in time and any order in space, and numerical evidence substantiates a claim of stability for a much broader class of time discretization methods. Numerical experiments supporting the accuracy and efficiency of the method in one and two dimensions are presented.
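The reuse argument can be illustrated with a deliberately simplified stand-in: below, backward Euler for the 1D heat equation with a dense LU factorization plays the role of the fixed implicit operator (the paper's ESDIRK stages and HPS solver are not reproduced). Because the coefficient matrix never changes, the factorization is computed once and only cheap solves are performed per step.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# 1D heat equation u_t = u_xx on (0, 1), homogeneous Dirichlet BCs,
# discretized with second-order finite differences.
n, dt, steps = 200, 1e-4, 500
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

L = (np.diag(-2 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

# Backward Euler: (I - dt*L) u^{k+1} = u^k.  Because dt and L are fixed,
# the coefficient matrix is the same at every step, so it is factored once
# (the analogue of a precomputed solution operator) and only cheap
# triangular solves are performed afterwards.
A = np.eye(n) - dt * L
lu, piv = lu_factor(A)          # precomputation, done once

u = np.sin(np.pi * x)           # initial condition
for _ in range(steps):
    u = lu_solve((lu, piv), u)  # reused at every time step

# The continuous solution decays like exp(-pi^2 t); compare peak values.
print(u.max(), np.exp(-np.pi**2 * dt * steps))
```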

Experiential AI is an emerging research field that addresses the challenge of making AI tangible and explicit, both to fuel cultural experiences for audiences, and to make AI systems more accessible to human understanding. The central theme is how artists, scientists and other interdisciplinary actors can come together to understand and communicate the functionality of AI, ML and intelligent robots, their limitations, and consequences, through informative and compelling experiences. It provides an approach and methodology for the arts and tangible experiences to mediate between impenetrable computer code and human understanding, making not just AI systems but also their values and implications more transparent, and therefore accountable. In this paper, we report on an empirical case study of an experiential AI system designed for creative data exploration of a user-defined dimension, to enable creators to gain more creative control over the AI process. We discuss how experiential AI can increase legibility and agency for artists, and how the arts can provide creative strategies and methods which can add to the toolbox for human-centred XAI.

The coefficient of variation is a useful indicator for comparing the spread of values between datasets with different units or widely different means. In this paper we address the problem of investigating the equality of the coefficients of variation from two independent populations. In order to do this we rely on the Bayesian Discrepancy Measure recently introduced in the literature. Computing this Bayesian measure of evidence is straightforward when the coefficient of variation is a function of a single parameter of the distribution. In contrast, it becomes difficult when it is a function of more than one parameter, often requiring the use of MCMC methods. We calculate the Bayesian Discrepancy Measure for a variety of distributions whose coefficients of variation depend on more than one parameter. We also consider applications to real data. As far as we know, some of the examined problems have not yet been covered in the literature.
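As a point of reference for the single-parameter versus multi-parameter distinction, the sketch below performs a generic Monte Carlo comparison of two coefficients of variation under a normal model with the standard noninformative prior; it is not the Bayesian Discrepancy Measure itself, and the reported probability is just one possible summary of the posterior draws.

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior_cv_samples(data, draws=20_000):
    """Posterior draws of the coefficient of variation sigma/mu for a
    normal model with the standard noninformative prior
    p(mu, sigma^2) proportional to 1/sigma^2, so that
    sigma^2 ~ scaled-inverse-chi^2 and mu | sigma^2 ~ normal."""
    data = np.asarray(data, dtype=float)
    n, xbar, s2 = len(data), data.mean(), data.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=draws)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))
    return np.sqrt(sigma2) / mu

# Two synthetic samples with the same true CV (0.2) but different means.
x = rng.normal(10.0, 2.0, size=50)
y = rng.normal(50.0, 10.0, size=50)

cv_x, cv_y = posterior_cv_samples(x), posterior_cv_samples(y)
print("P(CV_x > CV_y | data) ~", np.mean(cv_x > cv_y))
```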

The emergence of noisy intermediate-scale quantum (NISQ) computers has important consequences for cryptographic algorithms. It is theoretically well established that key algorithms used in cybersecurity are vulnerable to quantum computers, because theoretical security guarantees designed based on algorithmic complexity for classical computers are not sufficient for quantum circuits. Many different quantum algorithms have been developed, with potentially broad applications on future computing systems. However, this potential depends on the continued maturation of quantum hardware, which remains an area of active research and development. Theoretical limits provide an upper bound on the performance of algorithms. In practice, threats to encryption can only be accurately assessed in the context of the rapidly evolving hardware and software landscape. Software co-design refers to the concurrent design of software and hardware as a way to understand the limitations of current capabilities and develop effective strategies to advance the state of the art. Since the capabilities of classical computation currently exceed quantum capabilities, quantum emulation techniques can play an important role in the co-design process. In this paper, we describe how the {\em cuQuantum} environment can support quantum algorithm co-design activities using widely available commodity hardware. We describe how emulation techniques can be used to assess the impact of noise on algorithms of interest and identify limitations associated with current hardware. We present our analysis in the context of priority areas for cybersecurity, and cryptography in particular, since these algorithms are extraordinarily consequential for securing information in the digital world.
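Independent of any particular emulation stack (no cuQuantum API calls are shown here), the basic effect of noise on a qubit can be written in a few lines of linear algebra: a depolarizing channel mixes the state toward the maximally mixed state, eroding the coherence that quantum algorithms exploit. The sketch below is a toy NumPy illustration of that effect.

```python
import numpy as np

def depolarize(rho, p):
    """Single-qubit depolarizing channel: with probability p the state is
    replaced by the maximally mixed state I/2."""
    return (1 - p) * rho + p * np.eye(2) / 2

# |+> = (|0> + |1>)/sqrt(2), written as a density matrix.
plus = np.array([[1.0], [1.0]]) / np.sqrt(2)
rho = plus @ plus.conj().T

for p in (0.0, 0.05, 0.2, 0.5):
    noisy = depolarize(rho, p)
    fidelity = (plus.conj().T @ noisy @ plus).real.item()  # equals 1 - p/2
    print(f"p = {p:.2f}:  fidelity with |+> = {fidelity:.3f}")
```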

Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. A predefined distance measure is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with purely numerical attributes, and several metrics for ordered and unordered categorical data, an optimal distance for mixed-type data is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They then either handle the data points as a single attribute type, or calculate a distance for each attribute separately and add the distances up. We propose a metric that uses mixed kernels to measure dissimilarity, with cross-validated optimal kernel bandwidths. Our approach improves clustering accuracy when utilized in existing distance-based clustering algorithms on simulated and real-world datasets containing purely continuous, categorical, and mixed-type data.
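A hedged sketch of the general idea follows: a Gaussian kernel on numeric coordinates and a simple Li-Racine-style kernel on categorical ones (1 when categories match, $\lambda$ otherwise) are multiplied into a similarity, which is then turned into a dissimilarity. Kernel choices, the combination rule, and the bandwidth values are illustrative; the paper selects bandwidths by cross-validation.

```python
import numpy as np

def mixed_kernel_dissimilarity(a, b, num_idx, cat_idx, h, lam=0.5):
    """Toy kernel-based dissimilarity for mixed-type records.

    Numeric coordinates use Gaussian kernels with per-attribute bandwidths
    `h`; categorical coordinates use a simple Li-Racine-style kernel with
    smoothing parameter `lam` in [0, 1].  Per-attribute similarities are
    multiplied and converted to a dissimilarity as 1 - similarity."""
    sim = 1.0
    for i, hi in zip(num_idx, h):
        sim *= np.exp(-0.5 * ((a[i] - b[i]) / hi) ** 2)
    for i in cat_idx:
        sim *= 1.0 if a[i] == b[i] else lam
    return 1.0 - sim

# Records: (age, income, city, contract type)
r1 = (34, 52_000, "Oslo", "fixed")
r2 = (36, 49_000, "Oslo", "flexible")
print(mixed_kernel_dissimilarity(r1, r2, num_idx=[0, 1], cat_idx=[2, 3],
                                 h=[5.0, 10_000.0], lam=0.5))
```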

In this work, geometry optimization of a mechanical truss using computer-aided finite element analysis is presented. The shape of the truss is a dominant factor in determining the load it can bear. For a given parameter space, our goal is to find the parameters of a hull that maximize the load-bearing capacity while not yielding under the induced stress. We rely on finite element analysis, a computationally costly tool for design evaluation. For such expensive-to-evaluate functions, we choose Bayesian optimization as our optimization framework, which has empirically proven more sample-efficient than other simulation-based optimization methods. With Bayesian optimization, the truss design process iteratively evaluates a set of candidate truss designs and updates a probabilistic model of the design space based on the results. The model is used to predict the performance of each candidate design, and the next candidate design is selected based on the prediction and an acquisition function that balances exploration and exploitation of the design space. Our result can be used as a baseline for future studies on AI-based optimization in expensive engineering domains, especially in finite element analysis.
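A compact sketch of that loop is shown below on a cheap stand-in 1D objective rather than a finite element model (the FEA call would replace `objective`); it uses a Gaussian process surrogate and the expected improvement acquisition, with all names and settings illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    """Stand-in for the expensive FEA evaluation (to be maximized)."""
    return -(x - 0.6) ** 2 + 0.05 * np.sin(15 * x)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI acquisition: balances exploring uncertain regions (large sigma)
    and exploiting promising ones (mu above the incumbent best)."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(4, 1))          # initial designs
y = objective(X).ravel()
grid = np.linspace(0, 1, 501).reshape(-1, 1)

for _ in range(15):                          # iteration budget (FEA calls)
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).item())

print("best design:", X[np.argmax(y)].item(), "value:", y.max())
```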
