亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model and focuses on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for coordinates, which are challenging discrete optimization problems. To reduce computational complexity, we first transform each to an equivalent but much simpler discrete problem with N\llL variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This indicates an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions both with computational complexity O(N). For the resultant maximization of the completion probability, we develop an iterative algorithm of...

相關內容

Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance. In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Our key insight is that the discrepancy between the true objective (predictive performance on test data) and the computable target (validation performance) suggests stopping once the suboptimality in optimizing the target is dominated by the statistical estimation error. Across an extensive range of real-world HPO problems and baselines, we show that our termination criterion achieves a better trade-off between the test performance and optimization time. Additionally, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show how our termination criterion helps to mitigate this phenomenon on both small and large datasets.

Previous studies have shown that Automated Program Repair (APR) techniques suffer from the overfitting problem. Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Therefore, the patches generated by APR tools need to be validated by human programmers, which can be very costly, and prevents APR tools adoption in practice.Our work aims at increasing developer trust in automated patch generation by minimizing the number of plausible patches that they have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase in order to analyze the generated patches from one or more repair tools. The novelty of xTestCluster lies in using information from execution of newly generated test cases to cluster patches generated by multiple APR approaches. A cluster is formed with patches that fail on the same generated test cases. The output from xTestCluster gives developers a) a way of reducing the number of patches to analyze, as they can focus on analyzing a sample of patches from each cluster, b) additional information attached to each patch. After analyzing 1910 plausible patches from 25 Java APR tools, our results show that xTestCluster is able to reduce the number of patches to review and analyze with a median of 50%. xTestCluster can save a significant amount of time for developers that have to review the multitude of patches generated by APR tools, and provides them with new test cases that show the differences in behavior between generated patches.

We study the problem of providing channel state information (CSI) at the transmitter in multi-user massive MIMO systems operating in frequency division duplexing (FDD). The wideband MIMO channel is a vector-valued random process correlated in time, space (antennas), and frequency (subcarriers). The base station (BS) broadcasts periodically beta_tr pilot symbols from its M antenna ports to K single-antenna users (UEs). Correspondingly, the K UEs send feedback messages about their channel state using beta_fb symbols in the uplink (UL). Using results from remote rate-distortion theory, we show that, as snr reaches infty, the optimal feedback strategy achieves a channel state estimation mean squared error (MSE) that behaves as Theta(1) if beta_tr < r and as Theta(snr^(-alpha)) when beta_tr >=r, where alpha = min(beta_fb/r, 1), where r is the rank of the channel covariance matrix. The MSE-optimal rate-distortion strategy implies encoding of long sequences of channel states, which would yield completely stale CSI and therefore poor multiuser precoding performance. Hence, we consider three practical one-shot CSI strategies with minimum one-slot delay and analyze their large-SNR channel estimation MSE behavior. These are: (1) digital feedback via entropy-coded scalar quantization (ECSQ), (2) analog feedback (AF), and (3) local channel estimation at the UEs and digital feedback. These schemes have different requirements in terms of knowledge of the channel statistics at the UE and at the BS. In particular, the latter strategy requires no statistical knowledge and is closely inspired by a CSI feedback scheme currently proposed in 3GPP standardization.

It was observed in \citet{gupta2009differentially} that the Set Cover problem has strong impossibility results under differential privacy. In our work, we observe that these hardness results dissolve when we turn to the Partial Set Cover problem, where we only need to cover a $\rho$-fraction of the elements in the universe, for some $\rho\in(0,1)$. We show that this relaxation enables us to avoid the impossibility results: under loose conditions on the input set system, we give differentially private algorithms which output an explicit set cover with non-trivial approximation guarantees. In particular, this is the first differentially private algorithm which outputs an explicit set cover. Using our algorithm for Partial Set Cover as a subroutine, we give a differentially private (bicriteria) approximation algorithm for a facility location problem which generalizes $k$-center/$k$-supplier with outliers. Like with the Set Cover problem, no algorithm has been able to give non-trivial guarantees for $k$-center/$k$-supplier-type facility location problems due to the high sensitivity and impossibility results. Our algorithm shows that relaxing the covering requirement to serving only a $\rho$-fraction of the population, for $\rho\in(0,1)$, enables us to circumvent the inherent hardness. Overall, our work is an important step in tackling and understanding impossibility results in private combinatorial optimization.

Existing work on fairness modeling commonly assumes that sensitive attributes for all instances are fully available, which may not be true in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, it is needed to manually annotate a small part of the training data to mitigate bias. However, the skewed distribution across different sensitive groups preserves the skewness of the original dataset in the annotated subset, which leads to non-optimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proved to be capable of bounding the algorithmic bias. According to the evaluation on five benchmark datasets, APOD outperforms the state-of-the-arts baseline methods under the limited annotation budget, and shows comparable performance to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited.

Node classification on graphs is an important task in many practical domains. It usually requires labels for training, which can be difficult or expensive to obtain in practice. Given a budget for labelling, active learning aims to improve performance by carefully choosing which nodes to label. Previous graph active learning methods learn representations using labelled nodes and select some unlabelled nodes for label acquisition. However, they do not fully utilize the representation power present in unlabelled nodes. We argue that the representation power in unlabelled nodes can be useful for active learning and for further improving performance of active learning for node classification. In this paper, we propose a latent space clustering-based active learning framework for node classification (LSCALE), where we fully utilize the representation power in both labelled and unlabelled nodes. Specifically, to select nodes for labelling, our framework uses the K-Medoids clustering algorithm on a latent space based on a dynamic combination of both unsupervised features and supervised features. In addition, we design an incremental clustering module to avoid redundancy between nodes selected at different steps. Extensive experiments on five datasets show that our proposed framework LSCALE consistently and significantly outperforms the stateof-the-art approaches by a large margin.

The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, whereas most of them either fall into the stability-plasticity dilemma or take too much computation or storage overhead. Inspired by the gradient boosting algorithm to gradually fit the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain the single backbone model. We validate our method FOSTER on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: //github.com/G-U-N/ECCV22-FOSTER.

In this work, a method for unsupervised energy disaggregation in private households equipped with smart meters is proposed. This method aims to classify power consumption as active or passive, granting the ability to report on the residents' activity and presence without direct interaction. This lays the foundation for applications like non-intrusive health monitoring of private homes. The proposed method is based on minimizing a suitable energy functional, for which the iPALM (inertial proximal alternating linearized minimization) algorithm is employed, demonstrating that various conditions guaranteeing convergence are satisfied. In order to confirm feasibility of the proposed method, experiments on semi-synthetic test data sets and a comparison to existing, supervised methods are provided.

It has been shown that deep neural networks are prone to overfitting on biased training data. Towards addressing this issue, meta-learning employs a meta model for correcting the training bias. Despite the promising performances, super slow training is currently the bottleneck in the meta learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields not only a reasonably accurate but also a low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks. We show our method is able to save two-thirds of the training time while still maintaining the comparable or achieving even better generalization performance. In particular, our method achieves the state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising performance on long-tailed recognition on standard benchmarks.

The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.

北京阿比特科技有限公司