亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

\ac{fl} proposed a distributed \ac{ml} framework where every distributed worker owns a complete copy of global model and their own data. The training is occurred locally, which assures no direct transmission of training data. However, the recent work \citep{zhu2019deep} demonstrated that input data from a neural network may be reconstructed only using knowledge of gradients of that network, which completely breached the promise of \ac{fl} and sabotaged the user privacy. In this work, we aim to further explore the theoretical limits of reconstruction, speedup and stabilize the reconstruction procedure. We show that a single input may be reconstructed with the analytical form, regardless of network depth using a fully-connected neural network with one hidden node. Then we generalize this result to a gradient averaged over batches of size $B$. In this case, the full batch can be reconstructed if the number of hidden units exceeds $B$. For a \ac{cnn}, the number of required kernels in convolutional layers is decided by multiple factors, e.g., padding, kernel and stride size, etc. We require the number of kernels $h\geq (\frac{d}{d^{\prime}})^2C$, where we define $d$ as input width, $d^{\prime}$ as output width after convolutional layer, and $C$ as channel number of input. We validate our observation and demonstrate the improvements using bio-medical (fMRI, \ac{wbc}) and benchmark data (MNIST, Kuzushiji-MNIST, CIFAR100, ImageNet and face images).

相關內容

Networking:IFIP International Conferences on Networking。 Explanation:國際網絡會議。 Publisher:IFIP。 SIT:

Consider a random graph process with $n$ vertices corresponding to points $v_{i} \sim {Unif}[0,1]$ embedded randomly in the interval, and where edges are inserted between $v_{i}, v_{j}$ independently with probability given by the graphon $w(v_{i},v_{j}) \in [0,1]$. Following Chuangpishit et al. (2015), we call a graphon $w$ diagonally increasing if, for each $x$, $w(x,y)$ decreases as $y$ moves away from $x$. We call a permutation $\sigma \in S_{n}$ an ordering of these vertices if $v_{\sigma(i)} < v_{\sigma(j)}$ for all $i < j$, and ask: how can we accurately estimate $\sigma$ from an observed graph? We present a randomized algorithm with output $\hat{\sigma}$ that, for a large class of graphons, achieves error $\max_{1 \leq i \leq n} | \sigma(i) - \hat{\sigma}(i)| = O^{*}(\sqrt{n})$ with high probability; we also show that this is the best-possible convergence rate for a large class of algorithms and proof strategies. Under an additional assumption that is satisfied by some popular graphon models, we break this "barrier" at $\sqrt{n}$ and obtain the vastly better rate $O^{*}(n^{\epsilon})$ for any $\epsilon > 0$. These improved seriation bounds can be combined with previous work to give more efficient and accurate algorithms for related tasks, including: estimating diagonally increasing graphons, and testing whether a graphon is diagonally increasing.

In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $\epsilon$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, the convergence analysis of FedAve only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in parameter, $\eta$-H\"older smoothness in data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.

Implicit neural networks have been successfully used for surface reconstruction from point clouds. However, many of them face scalability issues as they encode the isosurface function of a whole object or scene into a single latent vector. To overcome this limitation, a few approaches infer latent vectors on a coarse regular 3D grid or on 3D patches, and interpolate them to answer occupancy queries. In doing so, they loose the direct connection with the input points sampled on the surface of objects, and they attach information uniformly in space rather than where it matters the most, i.e., near the surface. Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point. We then perform a learning-based interpolation on nearest neighbors using inferred weights. Experiments on both object and scene datasets show that our approach significantly outperforms other methods on most classical metrics, producing finer details and better reconstructing thinner volumes. The code is available at //github.com/valeoai/POCO.

Deep learning has achieved great success in many applications. However, its deployment in practice has been hurdled by two issues: the privacy of data that has to be aggregated centrally for model training and high communication overhead due to transmission of a large amount of data usually geographically distributed. Addressing both issues is challenging and most existing works could not provide an efficient solution. In this paper, we develop FedPC, a Federated Deep Learning Framework for Privacy Preservation and Communication Efficiency. The framework allows a model to be learned on multiple private datasets while not revealing any information of training data, even with intermediate data. The framework also minimizes the amount of data exchanged to update the model. We formally prove the convergence of the learning model when training with FedPC and its privacy-preserving property. We perform extensive experiments to evaluate the performance of FedPC in terms of the approximation to the upper-bound performance (when training centrally) and communication overhead. The results show that FedPC maintains the performance approximation of the models within $8.5\%$ of the centrally-trained models when data is distributed to 10 computing nodes. FedPC also reduces the communication overhead by up to $42.20\%$ compared to existing works.

Image reconstruction of low-count positron emission tomography (PET) data is challenging. Kernel methods address the challenge by incorporating image prior information in the forward model of iterative PET image reconstruction. The kernelized expectation-maximization (KEM) algorithm has been developed and demonstrated to be effective and easy to implement. A common approach for a further improvement of the kernel method would be adding an explicit regularization, which however leads to a complex optimization problem. In this paper, we propose an implicit regularization for the kernel method by using a deep coefficient prior, which represents the kernel coefficient image in the PET forward model using a convolutional neural-network. To solve the maximum-likelihood neural network-based reconstruction problem, we apply the principle of optimization transfer to derive a neural KEM algorithm. Each iteration of the algorithm consists of two separate steps: a KEM step for image update from the projection data and a deep-learning step in the image domain for updating the kernel coefficient image using the neural network. This optimization algorithm is guaranteed to monotonically increase the data likelihood. The results from computer simulations and real patient data have demonstrated that the neural KEM can outperform existing KEM and deep image prior methods.

Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.

Alternating Direction Method of Multipliers (ADMM) is a widely used tool for machine learning in distributed settings, where a machine learning model is trained over distributed data sources through an interactive process of local computation and message passing. Such an iterative process could cause privacy concerns of data owners. The goal of this paper is to provide differential privacy for ADMM-based distributed machine learning. Prior approaches on differentially private ADMM exhibit low utility under high privacy guarantee and often assume the objective functions of the learning problems to be smooth and strongly convex. To address these concerns, we propose a novel differentially private ADMM-based distributed learning algorithm called DP-ADMM, which combines an approximate augmented Lagrangian function with time-varying Gaussian noise addition in the iterative process to achieve higher utility for general objective functions under the same differential privacy guarantee. We also apply the moments accountant method to bound the end-to-end privacy loss. The theoretical analysis shows that DP-ADMM can be applied to a wider class of distributed learning problems, is provably convergent, and offers an explicit utility-privacy tradeoff. To our knowledge, this is the first paper to provide explicit convergence and utility properties for differentially private ADMM-based distributed learning algorithms. The evaluation results demonstrate that our approach can achieve good convergence and model accuracy under high end-to-end differential privacy guarantee.

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice. We investigate how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs. We confirm that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances. We further find it possible to train using batch sizes considerably larger than are standard, without negatively affecting sample complexity or final performance. We leverage these facts to build a unified framework for parallelization that dramatically hastens experiments in both classes of algorithm. All neural network computations use GPUs, accelerating both data collection and training. Our results include using an entire DGX-1 to learn successful strategies in Atari games in mere minutes, using both synchronous and asynchronous algorithms.

In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.

In this paper, a novel video classification methodology is presented that aims to recognize different categories of third-person videos efficiently. The idea is to keep track of motion in videos by following optical flow elements over time. To classify the resulted motion time series efficiently, the idea is letting the machine to learn temporal features along the time dimension. This is done by training a multi-channel one dimensional Convolutional Neural Network (1D-CNN). Since CNNs represent the input data hierarchically, high level features are obtained by further processing of features in lower level layers. As a result, in the case of time series, long-term temporal features are extracted from short-term ones. Besides, the superiority of the proposed method over most of the deep-learning based approaches is that we only try to learn representative temporal features along the time dimension. This reduces the number of learning parameters significantly which results in trainability of our method on even smaller datasets. It is illustrated that the proposed method could reach state-of-the-art results on two public datasets UCF11 and jHMDB with the aid of a more efficient feature vector representation.

北京阿比特科技有限公司