亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In machine learning and big data, the optimization objectives based on set-cover, entropy, diversity, influence, feature selection, etc. are commonly modeled as submodular functions. Submodular (function) maximization is generally NP-hard, even in the absence of constraints. Recently, submodular maximization has been successfully investigated for the settings where the objective function is monotone or the constraint is computation-tractable. However, maximizing nonmonotone submodular function with complex constraints is not yet well-understood. In this paper, we consider the nonmonotone submodular maximization with a cost budget or feasibility constraint (especially from route planning) that is generally NP-hard to evaluate. Such a problem is common for machine learning, big data, and robotics. This problem is NP-hard, and on top of that, its constraint evaluation is also NP-hard, which adds an additional layer of complexity. So far, few studies have been devoted to proposing effective solutions, making this problem currently unclear. In this paper, we first propose an iterated greedy algorithm, which yields an approximation solution. Then we develop the proof machinery that shows our algorithm is a bi-criterion approximation algorithm: it can achieve a constant-factor approximation to the optimal algorithm, while keeping the over-budget tightly bounded. We also explore practical considerations of achieving a trade-off between time complexity and over-budget. Finally, we conduct numeric experiments on two concrete examples, and show our design's efficacy in practical settings.

相關內容

We revisit the estimation bias in policy gradients for the discounted episodic Markov decision process (MDP) from Deep Reinforcement Learning (DRL) perspective. The objective is formulated theoretically as the expected returns discounted over the time horizon. One of the major policy gradient biases is the state distribution shift: the state distribution used to estimate the gradients differs from the theoretical formulation in that it does not take into account the discount factor. Existing discussion of the influence of this bias was limited to the tabular and softmax cases in the literature. Therefore, in this paper, we extend it to the DRL setting where the policy is parameterized and demonstrate how this bias can lead to suboptimal policies theoretically. We then discuss why the empirically inaccurate implementations with shifted state distribution can still be effective. We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization. Specifically, we show that a smaller learning rate, or, an adaptive learning rate, such as that used by Adam and RSMProp optimizers, makes the policy optimization robust to the bias. We further draw connections between optimizers and the optimization regularization to show that both the KL and the reverse KL regularization can significantly rectify this bias. Moreover, we provide extensive experiments on continuous control tasks to support our analysis. Our paper sheds light on how successful PG algorithms optimize policies in the DRL setting, and contributes insights into the practical issues in DRL.

MapReduce (MR) frameworks for maximizing monotone, submodular functions subject to a cardinality constraint (SMCC) have currently only been shown to work with linear-adaptive (non-parallelizable) algorithms, that require large number of distributions in order to utilize the available processors, thus resulting in severe restrictions on the cardinality constraint in addition to limited scalability. Low-adaptive algorithms do not currently satisfy the requirements of these distributed MR frameworks, thereby limiting their performance. We study the SMCC problem in a distributed setting and propose the first MR algorithms with sublinear adaptive complexity. Our algorithms, R-DASH, T-DASH and G-DASH provide $0.316-\varepsilon$, $3/8 -\varepsilon$, and $1 - 1/e -\varepsilon$ approximation ratios, respectively, with nearly optimal adaptive complexity and nearly linear time complexity. Additionally, we provide a framework to increase, under some mild assumptions, the maximum permissible cardinality constraint from $O( n / \ell^2)$ of prior MR algorithms to $O( n / \ell )$, where $n$ is the data size and $\ell$ is the number of machines; under a stronger condition on the objective function, we increase the maximum constraint value to $n$. Finally, we provide empirical evidence to demonstrate that our sublinear-adaptive, distributed algorithms provide orders of magnitude faster runtime compared to current state-of-the-art distributed algorithms.

Motivated by a real-world application, we model and solve a complex staff scheduling problem. Tasks are to be assigned to workers for supervision. Multiple tasks can be covered in parallel by a single worker, with worker shifts being flexible within availabilities. Each worker has a different skill set, enabling them to cover different tasks. Tasks require assignment according to priority and skill requirements. The objective is to maximize the number of assigned tasks weighted by their priorities, while minimizing assignment penalties. We develop an adaptive large neighborhood search (ALNS) algorithm, relying on tailored destroy and repair operators. It is tested on benchmark instances derived from real-world data and compared to optimal results obtained by means of a commercial MIP-solver. Furthermore, we analyze the impact of considering three additional alternative objective functions. When applied to large-scale company data, the developed ALNS outperforms the previously applied solution approach.

Online platforms often incentivize consumers to improve user engagement and platform revenue. Since different consumers might respond differently to incentives, individual-level budget allocation is an essential task in marketing campaigns. Recent advances in this field often address the budget allocation problem using a two-stage paradigm: the first stage estimates the individual-level treatment effects using causal inference algorithms, and the second stage invokes integer programming techniques to find the optimal budget allocation solution. Since the objectives of these two stages might not be perfectly aligned, such a two-stage paradigm could hurt the overall marketing effectiveness. In this paper, we propose a novel end-to-end framework to directly optimize the business goal under budget constraints. Our core idea is to construct a regularizer to represent the marketing goal and optimize it efficiently using gradient estimation techniques. As such, the obtained models can learn to maximize the marketing goal directly and precisely. We extensively evaluate our proposed method in both offline and online experiments, and experimental results demonstrate that our method outperforms current state-of-the-art methods. Our proposed method is currently deployed to allocate marketing budgets for hundreds of millions of users on a short video platform and achieves significant business goal improvements. Our code will be publicly available.

Motivated by the increasing need for fast processing of large-scale graphs, we study a number of fundamental graph problems in a message-passing model for distributed computing, called $k$-machine model, where we have $k$ machines that jointly perform computations on $n$-node graphs. The graph is assumed to be partitioned in a balanced fashion among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point via bandwidth-constrained links, and the goal is to minimize the round complexity, i.e., the number of communication rounds required to finish a computation. We present a generic methodology that allows to obtain efficient algorithms in the $k$-machine model using distributed algorithms for the classical CONGEST model of distributed computing. Using this methodology, we obtain algorithms for various fundamental graph problems such as connectivity, minimum spanning trees, shortest paths, maximal independent sets, and finding subgraphs, showing that many of these problems can be solved in $\tilde{O}(n/k)$ rounds; this shows that one can achieve speedup nearly linear in $k$. To complement our upper bounds, we present lower bounds on the round complexity that quantify the fundamental limitations of solving graph problems distributively. We first show a lower bound of $\Omega(n/k)$ rounds for computing a spanning tree of the input graph. This result implies the same bound for other fundamental problems such as computing a minimum spanning tree, breadth-first tree, or shortest paths tree. We also show a $\tilde \Omega(n/k^2)$ lower bound for connectivity, spanning tree verification and other related problems. The latter lower bounds follow from the development and application of novel results in a random-partition variant of the classical communication complexity model.

We consider a non-stationary Bandits with Knapsack problem. The outcome distribution at each time is scaled by a non-stationary quantity that signifies changing demand volumes. Instead of studying settings with limited non-stationarity, we investigate how online predictions on the total demand volume $Q$ allows us to improve our performance guarantees. We show that, without any prediction, any online algorithm incurs a linear-in-$T$ regret. In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions, and achieve regret bounds that depends on the accuracy of the predictions. These bounds are shown to be tight in settings when prediction accuracy improves across time. Our theoretical results are corroborated by our numerical findings.

In this paper, we develop two ``Nesterov's accelerated'' variants of the well-known extragradient method to approximate a solution of a co-hypomonotone inclusion constituted by the sum of two operators, where one is Lipschitz continuous and the other is possibly multivalued. The first scheme can be viewed as an accelerated variant of Tseng's forward-backward-forward splitting method, while the second one is a variant of the reflected forward-backward splitting method, which requires only one evaluation of the Lipschitz operator, and one resolvent of the multivalued operator. Under a proper choice of the algorithmic parameters and appropriate conditions on the co-hypomonotone parameter, we theoretically prove that both algorithms achieve $\mathcal{O}(1/k)$ convergence rates on the norm of the residual, where $k$ is the iteration counter. Our results can be viewed as alternatives of a recent class of Halpern-type schemes for root-finding problems.

Learning causal relationships between variables is a fundamental task in causal inference and directed acyclic graphs (DAGs) are a popular choice to represent the causal relationships. As one can recover a causal graph only up to its Markov equivalence class from observations, interventions are often used for the recovery task. Interventions are costly in general and it is important to design algorithms that minimize the number of interventions performed. In this work, we study the problem of identifying the smallest set of interventions required to learn the causal relationships between a subset of edges (target edges). Under the assumptions of faithfulness, causal sufficiency, and ideal interventions, we study this problem in two settings: when the underlying ground truth causal graph is known (subset verification) and when it is unknown (subset search). For the subset verification problem, we provide an efficient algorithm to compute a minimum sized interventional set; we further extend these results to bounded size non-atomic interventions and node-dependent interventional costs. For the subset search problem, in the worst case, we show that no algorithm (even with adaptivity or randomization) can achieve an approximation ratio that is asymptotically better than the vertex cover of the target edges when compared with the subset verification number. This result is surprising as there exists a logarithmic approximation algorithm for the search problem when we wish to recover the whole causal graph. To obtain our results, we prove several interesting structural properties of interventional causal graphs that we believe have applications beyond the subset verification/search problems studied here.

Estimation of heterogeneous causal effects - i.e., how effects of policies and treatments vary across subjects - is a fundamental task in causal inference, playing a crucial role in optimal treatment allocation, generalizability, subgroup effects, and more. Many flexible methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but questions surrounding optimality have remained largely unanswered. In particular, a minimax theory of optimality has yet to be developed, with the minimax rate of convergence and construction of rate-optimal estimators remaining open problems. In this paper we derive the minimax rate for CATE estimation, in a nonparametric model where distributional components are Holder-smooth, and present a new local polynomial estimator, giving high-level conditions under which it is minimax optimal. More specifically, our minimax lower bound is derived via a localized version of the method of fuzzy hypotheses, combining lower bound constructions for nonparametric regression and functional estimation. Our proposed estimator can be viewed as a local polynomial R-Learner, based on a localized modification of higher-order influence function methods; it is shown to be minimax optimal under a condition on how accurately the covariate distribution is estimated. The minimax rate we find exhibits several interesting features, including a non-standard elbow phenomenon and an unusual interpolation between nonparametric regression and functional estimation rates. The latter quantifies how the CATE, as an estimand, can be viewed as a regression/functional hybrid. We conclude with some discussion of a few remaining open problems.

The ongoing rise in cyberattacks and the lack of skilled professionals in the cybersecurity domain to combat these attacks show the need for automated tools capable of detecting an attack with good performance. Attackers disguise their actions and launch attacks that consist of multiple actions, which are difficult to detect. Therefore, improving defensive tools requires their calibration against a well-trained attacker. In this work, we propose a model of an attacking agent and environment and evaluate its performance using basic Q-Learning, Naive Q-learning, and DoubleQ-Learning, all of which are variants of Q-Learning. The attacking agent is trained with the goal of exfiltrating data whereby all the hosts in the network have a non-zero detection probability. Results show that the DoubleQ-Learning agent has the best overall performance rate by successfully achieving the goal in $70\%$ of the interactions.

北京阿比特科技有限公司