亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In machine learning, correlation clustering is an important problem whose goal is to partition the individuals into groups that correlate with their pairwise similarities as much as possible. In this work, we revisit the correlation clustering under the differential privacy constraints. Particularly, we improve previous results and achieve an $\Tilde{O}(n^{1.5})$ additive error compared to the optimal cost in expectation on general graphs. As for unweighted complete graphs, we improve the results further and propose a more involved algorithm which achieves $\Tilde{O}(n \sqrt{\Delta^*})$ additive error, where $\Delta^*$ is the maximum degrees of positive edges among all nodes.

相關內容

We introduce a new constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using the proximity of one single old policy as the first trust region as done by prior works, we propose to form a second trust region through the construction of another virtual policy that represents a wide range of past policies. We then enforce the new policy to stay closer to the virtual policy, which is beneficial in case the old policy performs badly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during the optimization process. Our proposed method, dubbed as Memory-Constrained Policy Optimization (MCPO), is examined on a diverse suite of environments including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.

We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves in the finite-horizon setting, where the transition kernel of the underlying Markov games can be parameterized by a linear function over the current state, both players' actions and the next state. In particular, we assume that we can control both players and aim to find the Nash Equilibrium by minimizing the duality gap. We propose an algorithm Nash-UCRL based on the principle "Optimism-in-Face-of-Uncertainty". Our algorithm only needs to find a Coarse Correlated Equilibrium (CCE), which is computationally efficient. Specifically, we show that Nash-UCRL can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension, $H$ is the length of the game and $T$ is the total number of steps in the game. To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret. Our upper bound matches the lower bound up to logarithmic factors, which suggests the optimality of our algorithm.

This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes we mean that "large" clusters and "small" clusters may be heterogeneous, and, in particular, the effects of the treatment may vary across clusters of differing sizes. In order to permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using simple random sampling. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study shows the practical relevance of our theoretical results.

Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide spectrum of fields. Since graphs in many applications are changing frequently over time, the problem of maintaining a MaxIS over dynamic graphs has attracted increasing attention over the past few years. Due to the intractability of maintaining an exact MaxIS, this paper aims to develop efficient algorithms that can maintain an approximate MaxIS with an accuracy guarantee theoretically. In particular, we propose a framework that maintains a $(\frac{\Delta}{2} + 1)$-approximate MaxIS over dynamic graphs and prove that it achieves a constant approximation ratio in many real-world networks. To the best of our knowledge, this is the first non-trivial approximability result for the dynamic MaxIS problem. Following the framework, we implement an efficient linear-time dynamic algorithm and a more effective dynamic algorithm with near-linear expected time complexity. Our thorough experiments over real and synthetic graphs demonstrate the effectiveness and efficiency of the proposed algorithms, especially when the graph is highly dynamic.

We provide a decision theoretic analysis of bandit experiments. The setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for bandit experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distribution of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit experiment with a PDE which can be efficiently solved using sparse matrix routines. We derive the optimal Bayes and minimax policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations to the bandit problem such as time discounting and pure exploration motives.

In the interdependent values (IDV) model introduced by Milgrom and Weber [1982], agents have private signals that capture their information about different social alternatives, and the valuation of every agent is a function of all agent signals. While interdependence has been mainly studied for auctions, it is extremely relevant for a large variety of social choice settings, including the canonical setting of public projects. The IDV model is very challenging relative to standard independent private values, and welfare guarantees have been achieved through two alternative conditions known as {\em single-crossing} and {\em submodularity over signals (SOS)}. In either case, the existing theory falls short of solving the public projects setting. Our contribution is twofold: (i) We give a workable characterization of truthfulness for IDV public projects for the largest class of valuations for which such a characterization exists, and term this class \emph{decomposable valuations}; (ii) We provide possibility and impossibility results for welfare approximation in public projects with SOS valuations. Our main impossibility result is that, in contrast to auctions, no universally truthful mechanism performs better for public projects with SOS valuations than choosing a project at random. Our main positive result applies to {\em excludable} public projects with SOS, for which we establish a constant factor approximation similar to auctions. Our results suggest that exclusion may be a key tool for achieving welfare guarantees in the IDV model.

We study the performance of a phase-noise impaired double reconfigurable intelligent surface (RIS)-aided multiuser (MU) multiple-input single-output (MISO) system under spatial correlation at both RISs and base-station (BS). The downlink achievable rate is derived in closed-form under maximum ratio transmission (MRT) precoding. In addition, we obtain the optimal phase-shift design at both RISs in closed-form for the considered channel and phase-noise models. Numerical results validate the analytical expressions, and highlight the effects of different system parameters on the achievable rate. Our analysis shows that phase-noise can severely degrade the performance when users do not have direct links to both RISs, and can only be served via the double-reflection link. Also, we show that high spatial correlation at RISs is essential for high achievable rates.

In the storied Colonel Blotto game, two colonels allocate $a$ and $b$ troops, respectively, to $k$ distinct battlefields. A colonel wins a battle if they assign more troops to that particular battle, and each colonel seeks to maximize their total number of victories. Despite the problem's formulation in 1921, the first polynomial-time algorithm to compute Nash equilibrium (NE) strategies for this game was discovered only quite recently. In 2016, \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} formulated a breakthrough algorithm to compute NE strategies for the Colonel Blotto game\footnote{To the best of our knowledge, the algorithm from \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} has computational complexity $O(k^{14}\max\{a,b\}^{13})$}, receiving substantial media coverage (e.g. \citep{Insider}, \citep{NSF}, \citep{ScienceDaily}). In this work, we present the first known $\epsilon$-approximation algorithm to compute NE strategies in the two-player Colonel Blotto game in runtime $\widetilde{O}(\epsilon^{-4} k^8 \max\{a,b\}^2)$ for arbitrary settings of these parameters. Moreover, this algorithm computes approximate coarse correlated equilibrium strategies in the multiplayer (continuous and discrete) Colonel Blotto game (when there are $\ell > 2$ colonels) with runtime $\widetilde{O}(\ell \epsilon^{-4} k^8 n^2 + \ell^2 \epsilon^{-2} k^3 n (n+k))$, where $n$ is the maximum troop count. Before this work, no polynomial-time algorithm was known to compute exact or approximate equilibrium (in any sense) strategies for multiplayer Colonel Blotto with arbitrary parameters. Our algorithm computes these approximate equilibria by a novel (to the author's knowledge) sampling technique with which we implicitly perform multiplicative weights update over the exponentially many strategies available to each player.

Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.

In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19\% (39\%) performance improvement compared with the best baseline.

北京阿比特科技有限公司