亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This work is concerned with the classical wave equation with a high-contrast coefficient in the spatial derivative operator. We first treat the periodic case, where we derive a new limit in the one-dimensional case. The behavior is illustrated numerically and contrasted to the higher-dimensional case. For general unstructured high-contrast coefficients, we present the Localized Orthogonal Decomposition and show a priori error estimates in suitably weighted norms. Numerical experiments illustrate the convergence rates in various settings.

相關內容

In Federated Learning (FL) client devices connected over the internet collaboratively train a machine learning model without sharing their private data with a central server or with other clients. The seminal Federated Averaging (FedAvg) algorithm trains a single global model by performing rounds of local training on clients followed by model averaging. FedAvg can improve the communication-efficiency of training by performing more steps of Stochastic Gradient Descent (SGD) on clients in each round. However, client data in real-world FL is highly heterogeneous, which has been extensively shown to slow model convergence and harm final performance when $K > 1$ steps of SGD are performed on clients per round. In this work we propose decaying $K$ as training progresses, which can jointly improve the final performance of the FL model whilst reducing the wall-clock time and the total computational cost of training compared to using a fixed $K$. We analyse the convergence of FedAvg with decaying $K$ for strongly-convex objectives, providing novel insights into the convergence properties, and derive three theoretically-motivated decay schedules for $K$. We then perform thorough experiments on four benchmark FL datasets (FEMNIST, CIFAR100, Sentiment140, Shakespeare) to show the real-world benefit of our approaches in terms of real-world convergence time, computational cost, and generalisation performance.

Many convolutional neural networks (CNNs) rely on progressive downsampling of their feature maps to increase the network's receptive field and decrease computational cost. However, this comes at the price of losing granularity in the feature maps, limiting the ability to correctly understand images or recover fine detail in dense prediction tasks. To address this, common practice is to replace the last few downsampling operations in a CNN with dilated convolutions, allowing to retain the feature map resolution without reducing the receptive field, albeit increasing the computational cost. This allows to trade off predictive performance against cost, depending on the output feature resolution. By either regularly downsampling or not downsampling the entire feature map, existing work implicitly treats all regions of the input image and subsequent feature maps as equally important, which generally does not hold. We propose an adaptive downsampling scheme that generalizes the above idea by allowing to process informative regions at a higher resolution than less informative ones. In a variety of experiments, we demonstrate the versatility of our adaptive downsampling strategy and empirically show that it improves the cost-accuracy trade-off of various established CNNs.

The prediction of academic dropout, with the aim of preventing it, is one of the current challenges of higher education institutions. Machine learning techniques are a great ally in this task. However, attention is needed in the way that academic data are used by such methods, so that it reflects the reality of the prediction problem under study and allows achieving good results. In this paper, we study strategies for splitting and using academic data in order to create training and testing sets. Through a conceptual analysis and experiments with data from a public higher education institution, we show that a random proportional data splitting, and even a simple temporal splitting are not suitable for dropout prediction. The study indicates that a temporal splitting combined with a time-based selection of the students' incremental academic histories leads to the best strategy for the problem in question.

Integrating evolutionary partial differential equations (PDEs) is an essential ingredient for studying the dynamics of the solutions. Indeed, simulations are at the core of scientific computing, but their mathematical reliability is often difficult to quantify, especially when one is interested in the output of a given simulation, rather than in the asymptotic regime where the discretization parameter tends to zero. In this paper we present a computer-assisted proof methodology to perform rigorous time integration for scalar semilinear parabolic PDEs with periodic boundary conditions. We formulate an equivalent zero-finding problem based on a variations of constants formula in Fourier space. Using Chebyshev interpolation and domain decomposition, we then finish the proof with a Newton--Kantorovich type argument. The final output of this procedure is a proof of existence of an orbit, together with guaranteed error bounds between this orbit and a numerically computed approximation. We illustrate the versatility of the approach with results for the Fisher equation, the Swift--Hohenberg equation, the Ohta--Kawasaki equation and the Kuramoto--Sivashinsky equation. We expect that this rigorous integrator can form the basis for studying boundary value problems for connecting orbits in partial differential equations.

Modern numerical analysis is executed on discrete data, of which numerical difference computation is one of the cores and is indispensable. Nevertheless, difference algorithms have a critical weakness in their sensitivity to noise, which has long posed a challenge in various fields including signal processing. Difference is an extension or generalization of differential in the discrete domain. However, due to the finite interval in discrete calculation, there is a failure in meeting the most fundamental definition of differential, where dy and dx are both infinitesimal (Leibniz) or the limit of dx is 0 (Cauchy). In this regard, the generalization of differential to difference does not hold. To address this issue, we depart from the original derivative approach, construct a finite interval-based differential, and further generalize it to obtain the difference by convolution. Based on this theory, we present a variety of difference operators suitable for practical signal processing. Experimental results demonstrate that these difference operators possess exceptional signal processing capabilities, including high noise immunity.

A plethora of modern machine learning tasks require the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones: $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration to weak ones: $q$ randomly chosen adversaries with limited collusion abilities which only change every few iterations at a time. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. We also show the convergence of our method to the optimal point under common assumptions and settings considered in literature. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16%-99% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate 25% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.

Under some regularity assumptions, we report an a priori error analysis of a dG scheme for the Poisson and Stokes flow problem in their dual mixed formulation. Both formulations satisfy a Babu\v{s}ka-Brezzi type condition within the space H(div) x L2. It is well known that the lowest order Crouzeix-Raviart element paired with piecewise constants satisfies such a condition on (broken) H1 x L2 spaces. In the present article, we use this pair. The continuity of the normal component is weakly imposed by penalizing jumps of the broken H(div) component. For the resulting methods, we prove well-posedness and convergence with constants independent of data and mesh size. We report error estimates in the methods natural norms and optimal local error estimates for the divergence error. In fact, our finite element solution shares for each triangle one DOF with the CR interpolant and the divergence is locally the best-approximation for any regularity. Numerical experiments support the findings and suggest that the other errors converge optimally even for the lowest regularity solutions and a crack-problem, as long as the crack is resolved by the mesh.

Knowledge graphs (KGs) capture knowledge in the form of head--relation--tail triples and are a crucial component in many AI systems. There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2), multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query. Embedding-based methods solve both tasks by first computing an embedding for each entity and relation, then using them to form predictions. However, existing scalable KG embedding frameworks only support single-hop knowledge graph completion and cannot be applied to the more challenging multi-hop reasoning task. Here we present Scalable Multi-hOp REasoning (SMORE), the first general framework for both single-hop and multi-hop reasoning in KGs. Using a single machine SMORE can perform multi-hop reasoning in Freebase KG (86M entities, 338M edges), which is 1,500x larger than previously considered KGs. The key to SMORE's runtime performance is a novel bidirectional rejection sampling that achieves a square root reduction of the complexity of online training data generation. Furthermore, SMORE exploits asynchronous scheduling, overlapping CPU-based data sampling, GPU-based embedding computation, and frequent CPU--GPU IO. SMORE increases throughput (i.e., training speed) over prior multi-hop KG frameworks by 2.2x with minimal GPU memory requirements (2GB for training 400-dim embeddings on 86M-node Freebase) and achieves near linear speed-up with the number of GPUs. Moreover, on the simpler single-hop knowledge graph completion task SMORE achieves comparable or even better runtime performance to state-of-the-art frameworks on both single GPU and multi-GPU settings.

Graph Convolutional Network (GCN) has been widely applied in transportation demand prediction due to its excellent ability to capture non-Euclidean spatial dependence among station-level or regional transportation demands. However, in most of the existing research, the graph convolution was implemented on a heuristically generated adjacency matrix, which could neither reflect the real spatial relationships of stations accurately, nor capture the multi-level spatial dependence of demands adaptively. To cope with the above problems, this paper provides a novel graph convolutional network for transportation demand prediction. Firstly, a novel graph convolution architecture is proposed, which has different adjacency matrices in different layers and all the adjacency matrices are self-learned during the training process. Secondly, a layer-wise coupling mechanism is provided, which associates the upper-level adjacency matrix with the lower-level one. It also reduces the scale of parameters in our model. Lastly, a unitary network is constructed to give the final prediction result by integrating the hidden spatial states with gated recurrent unit, which could capture the multi-level spatial dependence and temporal dynamics simultaneously. Experiments have been conducted on two real-world datasets, NYC Citi Bike and NYC Taxi, and the results demonstrate the superiority of our model over the state-of-the-art ones.

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

北京阿比特科技有限公司