亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Models of parallel processing systems typically assume that one has $l$ workers and jobs are split into an equal number of $k=l$ tasks. Splitting jobs into $k > l$ smaller tasks, i.e. using ``tiny tasks'', can yield performance and stability improvements because it reduces the variance in the amount of work assigned to each worker, but as $k$ increases, the overhead involved in scheduling and managing the tasks begins to overtake the performance benefit. We perform extensive experiments on the effects of task granularity on an Apache Spark cluster, and based on these, developed a four-parameter model for task and job overhead that, in simulation, produces sojourn time distributions that match those of the real system. We also present analytical results which illustrate how using tiny tasks improves the stability region of split-merge systems, and analytical bounds on the sojourn and waiting time distributions of both split-merge and single-queue fork-join systems with tiny tasks. Finally we combine the overhead model with the analytical models to produce an analytical approximation to the sojourn and waiting time distributions of systems with tiny tasks which include overhead. Though no longer strict analytical bounds, these approximations matched the Spark experimental results very well in both the split-merge and fork-join cases.

相關內容

Two novel parallel Newton-Krylov Balancing Domain Decomposition by Constraints (BDDC) and Dual-Primal Finite Element Tearing and Interconnecting (FETI-DP) solvers are here constructed, analyzed and tested numerically for implicit time discretizations of the three-dimensional Bidomain system of equations. This model represents the most advanced mathematical description of the cardiac bioelectrical activity and it consists of a degenerate system of two non-linear reaction-diffusion partial differential equations (PDEs), coupled with a stiff system of ordinary differential equations (ODEs). A finite element discretization in space and a segregated implicit discretization in time, based on decoupling the PDEs from the ODEs, yields at each time step the solution of a non-linear algebraic system. The Jacobian linear system at each Newton iteration is solved by a Krylov method, accelerated by BDDC or FETI-DP preconditioners, both augmented with the recently introduced {\em deluxe} scaling of the dual variables. A polylogarithmic convergence rate bound is proven for the resulting parallel Bidomain solvers. Extensive numerical experiments on linux clusters up to two thousands processors confirm the theoretical estimates, showing that the proposed parallel solvers are scalable and quasi-optimal.

While algorithms for planar graphs have received a lot of attention, few papers have focused on the additional power that one gets from assuming an embedding of the graph is available. While in the classic sequential setting, this assumption gives no additional power (as a planar graph can be embedded in linear time), we show that this is far from being the case in other settings. We assume that the embedding is straight-line, but our methods also generalize to non-straight-line embeddings. Specifically, we focus on sublinear-time computation and massively parallel computation (MPC). Our main technical contribution is a sublinear-time algorithm for computing a relaxed version of an $r$-division. We then show how this can be used to estimate Lipschitz additive graph parameters. This includes, for example, the maximum matching, maximum independent set, or the minimum dominating set. We also show how this can be used to solve some property testing problems with respect to the vertex edit distance. In the second part of our paper, we show an MPC algorithm that computes an $r$-division of the input graph. We show how this can be used to solve various classical graph problems with space per machine of $O(n^{2/3+\epsilon})$ for some $\epsilon>0$, and while performing $O(1)$ rounds. This includes for example approximate shortest paths or the minimum spanning tree. Our results also imply an improved MPC algorithm for Euclidean minimum spanning tree.

We study the distributed minimum spanning tree (MST) problem, a fundamental problem in distributed computing. It is well-known that distributed MST can be solved in $\tilde{O}(D+\sqrt{n})$ rounds in the standard CONGEST model (where $n$ is the network size and $D$ is the network diameter) and this is essentially the best possible round complexity (up to logarithmic factors). However, in resource-constrained networks such as ad hoc wireless and sensor networks, nodes spending so much time can lead to significant spending of resources such as energy. Motivated by the above consideration, we study distributed algorithms for MST under the \emph{sleeping model} [Chatterjee et al., PODC 2020], a model for design and analysis of resource-efficient distributed algorithms. In the sleeping model, a node can be in one of two modes in any round -- \emph{sleeping} or \emph{awake} (unlike the traditional model where nodes are always awake). Only the rounds in which a node is \emph{awake} are counted, while \emph{sleeping} rounds are ignored. A node spends resources only in the awake rounds and hence the main goal is to minimize the \emph{awake complexity} of a distributed algorithm, the worst-case number of rounds any node is awake. We present deterministic and randomized distributed MST algorithms that have an \emph{optimal} awake complexity of $O(\log n)$ time with a matching lower bound. We also show that our randomized awake-optimal algorithm has essentially the best possible round complexity by presenting a lower bound of $\tilde{\Omega}(n)$ on the product of the awake and round complexity of any distributed algorithm (including randomized) that outputs an MST, where $\tilde{\Omega}$ hides a $1/(\text{polylog } n)$ factor.

Maximal Independent Set (MIS) is one of the central and most well-studied problems in distributed computing. Even after four decades of intensive research, the best-known (randomized) MIS algorithms take $O(\log{n})$ worst-case rounds on general graphs (where $n$ is the number of nodes), while the best-known lower bound is $\Omega\left(\sqrt{\frac{\log{n}}{\log{\log{n}}}}\right)$ rounds. Breaking past the $O(\log{n})$ worst-case bound or showing stronger lower bounds have been longstanding open problems. Our main contribution is that we show that MIS can be computed in (worst-case) awake complexity of $O(\log \log n)$ rounds that is (essentially) exponentially better compared to the (traditional) round complexity lower bound of $\Omega\left(\sqrt{\frac{\log{n}}{\log{\log{n}}}}\right)$. Specifically, we present the following results. (1) We present a randomized distributed (Monte Carlo) algorithm for MIS that with high probability computes an MIS and has $O(\log\log{n})$-rounds awake complexity. This algorithm has (traditional) {\em round complexity} that is $O(poly(n))$. Our bounds hold in the $CONGEST(O(polylog n))$ model where only $O(polylog n)$ (specifically $O(\log^3 n)$) bits are allowed to be sent per edge per round. (2) We also show that we can drastically reduce the round complexity at the cost of a slight increase in awake complexity by presenting a randomized MIS algorithm with $O(\log \log n \log^* n )$ awake complexity and $O(\log^3 n \log \log n \log^*n)$ round complexity in the $CONGEST(O(polylog n))$ model.

We provide a decision theoretic analysis of bandit experiments. The setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for bandit experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distribution of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit experiment with a PDE which can be efficiently solved using sparse matrix routines. We derive the optimal Bayes and minimax policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations to the bandit problem such as time discounting and pure exploration motives.

Split learning (SL) is a collaborative learning framework, which can train an artificial intelligence (AI) model between a device and an edge server by splitting the AI model into a device-side model and a server-side model at a cut layer. The existing SL approach conducts the training process sequentially across devices, which incurs significant training latency especially when the number of devices is large. In this paper, we design a novel SL scheme to reduce the training latency, named Cluster-based Parallel SL (CPSL) which conducts model training in a "first-parallel-then-sequential" manner. Specifically, the CPSL is to partition devices into several clusters, parallelly train device-side models in each cluster and aggregate them, and then sequentially train the whole AI model across clusters, thereby parallelizing the training process and reducing training latency. Furthermore, we propose a resource management algorithm to minimize the training latency of CPSL considering device heterogeneity and network dynamics in wireless networks. This is achieved by stochastically optimizing the cut layer selection, real-time device clustering, and radio spectrum allocation. The proposed two-timescale algorithm can jointly make the cut layer selection decision in a large timescale and device clustering and radio spectrum allocation decisions in a small timescale. Extensive simulation results on non-independent and identically distributed data demonstrate that the proposed solutions can greatly reduce the training latency as compared with the existing SL benchmarks, while adapting to network dynamics.

The metriplectic formalism is useful for describing complete dynamical systems which conserve energy and produce entropy. This creates challenges for model reduction, as the elimination of high-frequency information will generally not preserve the metriplectic structure which governs long-term stability of the system. Based on proper orthogonal decomposition, a provably convergent metriplectic reduced-order model is formulated which is guaranteed to maintain the algebraic structure necessary for energy conservation and entropy formation. Numerical results on benchmark problems show that the proposed method is remarkably stable, leading to improved accuracy over long time scales at a moderate increase in cost over naive methods.

While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.

Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.

As a crucial component of most modern deep recommender systems, feature embedding maps high-dimensional sparse user/item features into low-dimensional dense embeddings. However, these embeddings are usually assigned a unified dimension, which suffers from the following issues: (1) high memory usage and computation cost. (2) sub-optimal performance due to inferior dimension assignments. In order to alleviate the above issues, some works focus on automated embedding dimension search by formulating it as hyper-parameter optimization or embedding pruning problems. However, they either require well-designed search space for hyperparameters or need time-consuming optimization procedures. In this paper, we propose a Single-Shot Embedding Dimension Search method, called SSEDS, which can efficiently assign dimensions for each feature field via a single-shot embedding pruning operation while maintaining the recommendation accuracy of the model. Specifically, it introduces a criterion for identifying the importance of each embedding dimension for each feature field. As a result, SSEDS could automatically obtain mixed-dimensional embeddings by explicitly reducing redundant embedding dimensions based on the corresponding dimension importance ranking and the predefined parameter budget. Furthermore, the proposed SSEDS is model-agnostic, meaning that it could be integrated into different base recommendation models. The extensive offline experiments are conducted on two widely used public datasets for CTR prediction tasks, and the results demonstrate that SSEDS can still achieve strong recommendation performance even if it has reduced 90\% parameters. Moreover, SSEDS has also been deployed on the WeChat Subscription platform for practical recommendation services. The 7-day online A/B test results show that SSEDS can significantly improve the performance of the online recommendation model.

北京阿比特科技有限公司