
In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, for which they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments often change {\bf smoothly}, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandit problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result with an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$.
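To make the setting concrete, here is a minimal sketch that simulates two smoothly varying mean rewards (infinitely differentiable, hence $\beta$-Hölder for $\beta=2$) and runs a simple restart-based explore-then-commit baseline. The policy, block length, and tuning constants are illustrative assumptions, not the paper's algorithm; a restart baseline of this kind is tuned for the non-smooth $T^{2/3}$ regime and does not exploit smoothness.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Two smoothly varying mean rewards over normalized time t/T.
def means(t):
    s = t / T
    return np.array([0.5 + 0.3 * np.sin(2 * np.pi * s),
                     0.5 + 0.3 * np.cos(2 * np.pi * s)])

# Baseline policy (NOT the paper's): restart explore-then-commit on
# blocks of length ~T^{2/3}, the classic non-smooth-drift tuning.
block = int(T ** (2 / 3))
explore = int(block ** 0.5)   # hypothetical tuning, for illustration only

regret, t = 0.0, 0
while t < T:
    est, cnt = np.zeros(2), np.zeros(2)
    end = min(t + block, T)
    for u in range(t, end):
        mu = means(u)
        if u - t < explore:                    # uniform exploration phase
            a = (u - t) % 2
        else:                                  # commit to empirical best arm
            a = int(np.argmax(est / np.maximum(cnt, 1)))
        r = rng.normal(mu[a], 1.0)             # Gaussian reward
        est[a] += r; cnt[a] += 1
        regret += mu.max() - mu[a]
    t = end

print(f"dynamic regret of restart-ETC over T={T}: {regret:.0f}")
```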

Related content

Preserving the topology from being inferred by external adversaries has become a paramount security issue for network systems (NSs), and adding random noise to the nodal states provides a promising way to do so. Nevertheless, recent works have revealed that the topology cannot be preserved under i.i.d. noise in the asymptotic sense. How to effectively characterize the non-asymptotic preservation performance remains an open issue. Inspired by the deviation quantification of concentration inequalities, this paper proposes a novel metric named the trace-based variance-expectation ratio. This metric effectively captures the decaying rate of the topology inference error, where a slower rate indicates better non-asymptotic preservation performance. We prove that the inference error always decays to zero asymptotically, as long as the added noises are non-increasing and independent (a condition milder than i.i.d.). Then, the optimal noise design that produces the slowest decaying rate for the error is obtained. More importantly, we amend the noise design by introducing one-lag time dependence, simultaneously achieving zero state deviation and non-zero topology inference error in the asymptotic sense. Extensions to a general class of noises with multi-lag time dependence are provided. Comprehensive simulations verify the theoretical findings.
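A minimal simulation of the threat model sketched above, under assumed linear dynamics $x_{t+1} = A x_t + w_t$: an adversary estimates the topology $A$ by ordinary least squares, and we compare i.i.d. noise with a non-increasing variance schedule to see how the design affects the decay of the inference error. The dynamics, estimator, and schedule are illustrative assumptions, not the paper's setup or its trace-based metric.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 4000

# Random stable interaction matrix (the topology the adversary wants).
A = rng.uniform(-0.3, 0.3, size=(n, n))
A /= max(1.0, 1.2 * np.max(np.abs(np.linalg.eigvals(A))))

def inference_error(noise_std):
    """Run x_{t+1} = A x_t + w_t; the adversary estimates A by least
    squares. Returns snapshots of ||A_hat - A||_F over time."""
    x = rng.normal(size=n)
    X, Y, errs = [], [], []
    for t in range(T):
        w = rng.normal(scale=noise_std(t), size=n)
        x_next = A @ x + w
        X.append(x); Y.append(x_next)
        if t > 2 * n and t % 200 == 0:
            # Y ~ X A^T, so lstsq(X, Y) recovers A^T.
            A_hat = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)[0].T
            errs.append(np.linalg.norm(A_hat - A))
        x = x_next
    return errs

iid  = inference_error(lambda t: 1.0)               # i.i.d. noise
decr = inference_error(lambda t: (t + 1) ** -0.3)   # non-increasing variance

print("error snapshots, i.i.d. noise:     ", [f"{e:.3f}" for e in iid[::4]])
print("error snapshots, decreasing noise: ", [f"{e:.3f}" for e in decr[::4]])
```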

Recent research has demonstrated the potential of reinforcement learning (RL) in enabling effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefit. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents a novel approach, PIMbot, for manipulating the reward function in multi-robot collaboration through two distinct forms of manipulation: policy and incentive manipulation. Our work introduces a new angle on manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing the proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. PIMbot can have both positive and negative impacts on the task outcome: positive impacts lead to faster convergence to the global optimum and maximized rewards for any chosen robot, whereas negative impacts can be detrimental to overall task performance. We present comprehensive experimental results that demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Our work provides insights into how inter-robot communication can be manipulated and has implications for various robotic applications.
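The idea of incentive manipulation can be illustrated in a much simpler setting than the paper's: below, a stateless Q-learning agent plays a prisoner's dilemma while a manipulator adds a bonus to the agent's observed reward whenever it cooperates, steering the learned behavior in either direction. This toy is an assumption for illustration only; it is not PIMbot or its Gazebo environment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Prisoner's-dilemma payoffs: actions 0 = cooperate, 1 = defect.
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])   # row player's payoff

def run(incentive):
    """Q-learning agent vs. a manipulator that adds `incentive` to the
    agent's observed reward when it cooperates (positive steers toward
    cooperation, negative toward defection)."""
    q = np.zeros(2)            # action values (stateless repeated game)
    coop = 0
    for t in range(5000):
        a = int(np.argmax(q)) if rng.random() > 0.1 else int(rng.integers(2))
        b = int(rng.integers(2))          # opponent plays randomly here
        r = R[a, b] + (incentive if a == 0 else 0.0)   # manipulated reward
        q[a] += 0.1 * (r - q[a])
        coop += (a == 0)
    return coop / 5000

print("cooperation rate, no manipulation:", run(0.0))
print("positive incentive manipulation:  ", run(3.0))
print("negative incentive manipulation:  ", run(-3.0))
```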

We propose an unconditionally energy-stable, orthonormality-preserving, component-wise splitting iterative scheme for the Kohn-Sham gradient flow based model in electronic structure calculation. We first study the scheme discretized in time but still continuous in space. The component-wise splitting iterative scheme changes one wave function at a time, similar to the Gauss-Seidel iteration for solving a linear system of equations. Rigorous mathematical derivations are presented to show that our proposed scheme indeed satisfies the desired properties. We then study the fully-discretized scheme, where the space is further approximated by a conforming finite element subspace. For the fully-discretized scheme, not only can the preservation of orthogonality and normalization (together referred to as orthonormality) be shown quickly using the same idea as for the semi-discretized scheme, but the highlight property of the scheme, namely its unconditional energy stability, can also be rigorously proven. The scheme allows us to use large time step sizes and to deal with small systems involving only a single wave function during each iteration step. Several numerical experiments are performed to verify the theoretical analysis, where the number of iterations is indeed greatly reduced compared to similar examples solved by the Kohn-Sham gradient flow based model in the literature.
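The Gauss-Seidel flavor of the component-wise splitting can be seen in a toy linear eigenvalue problem standing in for the Kohn-Sham energy: one wave function is updated at a time by a gradient-flow step, then orthonormality against the frozen columns is restored. The Hamiltonian, step size, and projection-based orthonormalization below are illustrative assumptions, not the paper's scheme (which is provably energy-stable for any step size).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 50, 4        # grid points, number of wave functions

# Toy "Hamiltonian": 1-D discrete Laplacian plus a random potential.
H = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) * n**2
H += np.diag(rng.uniform(0, 10, n))

Psi = np.linalg.qr(rng.normal(size=(n, m)))[0]   # orthonormal start
dt = 1e-5                                        # illustrative step size

for sweep in range(300):
    # Gauss-Seidel-style pass: update one wave function at a time,
    # keeping the others frozen, then restore orthonormality.
    for j in range(m):
        psi = Psi[:, j] - dt * (H @ Psi[:, j])   # gradient-flow step
        for i in range(m):                       # project out frozen columns
            if i != j:
                psi -= (Psi[:, i] @ psi) * Psi[:, i]
        Psi[:, j] = psi / np.linalg.norm(psi)    # renormalize
    if sweep % 100 == 0:
        print(f"sweep {sweep:3d}: energy = {np.trace(Psi.T @ H @ Psi):.2f}")

print("orthonormality error:", np.linalg.norm(Psi.T @ Psi - np.eye(m)))
```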

Optimization is offered as an objective approach to resolving complex, real-world decisions involving uncertainty and conflicting interests. It drives business strategies as well as public policies and, increasingly, lies at the heart of sophisticated machine learning systems. A paradigm used to approach potentially high-stakes decisions, optimization relies on abstracting the real world to a set of decision(s), objective(s) and constraint(s). Drawing from the modeling process and a range of actual cases, this paper describes the normative choices and assumptions that are necessarily part of using optimization. It then identifies six emergent problems that may be neglected: 1) Misspecified values can yield optimizations that omit certain imperatives altogether or incorporate them incorrectly as a constraint or as part of the objective, 2) Problematic decision boundaries can lead to faulty modularity assumptions and feedback loops, 3) Failing to account for multiple agents' divergent goals and decisions can lead to policies that serve only certain narrow interests, 4) Mislabeling and mismeasurement can introduce bias and imprecision, 5) Faulty use of relaxation and approximation methods, unaccompanied by formal characterizations and guarantees, can severely impede applicability, and 6) Treating optimization as a justification for action, without specifying the necessary contextual information, can lead to ethically dubious or faulty decisions. Suggestions are given to further understand and curb the harms that can arise when optimization is used wrongfully.

Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; this ambiguity arises in the existing definition's dependence on the latent sequence of reward distributions. Moreover, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. These notions are not indicative of qualitative agent performance in some bandits. Additionally, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Our new definition provides a unified approach, applicable seamlessly to both Bayesian and frequentist formulations of bandits. Furthermore, our definition ensures consistent classification of two bandits offering agents indistinguishable experiences, categorizing them as either both stationary or both non-stationary. This advancement provides a more robust framework for non-stationary bandit learning.
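The two regret notions mentioned above can be computed directly on a toy instance that shows why they may fail to reflect qualitative performance: a naive agent that always pulls arm 0 on a bandit whose means swap halfway through the horizon has zero weak regret (arm 0 ties the best fixed arm) but linearly growing dynamic regret. The instance and agent are illustrative assumptions.

```python
import numpy as np

T = 10_000

# Two arms whose means swap halfway through the horizon; shape (T, 2).
mu = np.where(np.arange(T)[:, None] < T // 2,
              np.array([0.9, 0.1]), np.array([0.1, 0.9]))

# A naive agent that always pulls arm 0.
actions = np.zeros(T, dtype=int)
pulled = mu[np.arange(T), actions]

dynamic_regret = np.sum(mu.max(axis=1) - pulled)      # vs per-round best arm
weak_regret = np.max(mu.sum(axis=0)) - pulled.sum()   # vs best fixed arm

print(f"dynamic regret: {dynamic_regret:.0f}")   # ~0.8 * T/2, grows linearly
print(f"weak regret:    {weak_regret:.0f}")      # 0: both fixed arms tie
```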

In this manuscript we derive the optimal out-of-sample causal predictor for a linear system that has been observed in $k+1$ within-sample environments. In this model we consider $k$ shifted environments and one observational environment. Each environment corresponds to a linear structural equation model (SEM) with its own shift and noise vector, both in $L^2$. The strengths of the shifts can be ordered, so we may speak of all shifts that are at most as strong as a given shift. We consider the space $C^\gamma$ of all shifts that are at most $\gamma$ times as strong as any weighted average of the observed shift vectors, with weights on the unit sphere. For each $\beta\in\mathbb{R}^p$ we show that the supremum of the risk functions $R_{\tilde{A}}(\beta)$ over $\tilde{A}\in C^\gamma$ admits a worst-risk decomposition into a (positive) linear combination of risk functions, depending on $\gamma$. We then define the causal regularizer, $\beta_\gamma$, as the argument $\beta$ that minimizes this worst risk. The main result of the paper is that this regularizer can be consistently estimated with a plug-in estimator outside a set of zero Lebesgue measure in the parameter space. A practical obstacle for such estimation is that it involves the solution of a general-degree polynomial, which cannot be done explicitly. We therefore also prove that an approximate plug-in estimator based on the bisection method is consistent. An interesting by-product of the proof of the main result is that the plug-in estimation of the argmin of the maximum of a finite set of quadratic risk functions is consistent outside a set of zero Lebesgue measure in the parameter space.
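The by-product result is easy to picture numerically: take a finite set of quadratic risk functions, minimize their pointwise maximum (a convex problem), then perturb the parameters as a stand-in for plug-in estimation and re-solve. The quadratics, the subgradient solver, and the perturbation are illustrative assumptions, not the paper's estimator or its bisection construction.

```python
import numpy as np

rng = np.random.default_rng(5)
p, k = 3, 4   # dimension, number of risk functions

# Population quadratic risks R_i(beta) = beta' A_i beta - 2 b_i' beta.
As = [np.eye(p) + 0.5 * (lambda M: M @ M.T)(rng.normal(size=(p, p)))
      for _ in range(k)]
bs = [rng.normal(size=p) for _ in range(k)]

def argmin_of_max(As, bs, steps=5000, lr=0.01):
    """Subgradient descent on beta -> max_i R_i(beta), which is convex."""
    beta = np.zeros(p)
    for _ in range(steps):
        vals = [beta @ A @ beta - 2 * b @ beta for A, b in zip(As, bs)]
        i = int(np.argmax(vals))                       # active risk function
        beta -= lr * (2 * As[i] @ beta - 2 * bs[i])    # its gradient
    return beta

beta_star = argmin_of_max(As, bs)

# "Plug-in": perturb each (A_i, b_i) as if estimated from data, re-solve.
As_hat = [(M + M.T) / 2 for M in
          (A + 0.01 * rng.normal(size=(p, p)) for A in As)]
bs_hat = [b + 0.01 * rng.normal(size=p) for b in bs]
beta_hat = argmin_of_max(As_hat, bs_hat)

print("population minimizer:", np.round(beta_star, 3))
print("plug-in estimate:    ", np.round(beta_hat, 3))
```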

We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}}x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T /\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2/ d \ll 1/ \rho^*$, we are motivated to propose a novel algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of $\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
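The robust baseline referenced above can be made concrete: the sketch below computes a G-optimal design over a finite arm set with classical Fedorov-Wynn (Frank-Wolfe) steps, samples arms non-adaptively from that design under a drifting parameter sequence, and picks the arm with the largest estimated average mean. The arm set, drift, and iteration counts are illustrative assumptions; this is the known G-optimal baseline, not the proposed $\mathsf{P1}$-$\mathsf{RAGE}$ algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, T = 3, 8, 5000
X = rng.normal(size=(n, d))          # finite arm set in R^d

# Frank-Wolfe iterations for the G-optimal design over X
# (minimizes the max predictive variance max_x x' A(lam)^{-1} x).
lam = np.ones(n) / n
for _ in range(200):
    Ainv = np.linalg.inv(X.T @ (lam[:, None] * X))
    g = np.einsum('ij,jk,ik->i', X, Ainv, X)   # per-arm leverage scores
    i = int(np.argmax(g))
    step = (g[i] / d - 1) / (g[i] - 1)         # classical FW step size
    lam = (1 - step) * lam + step * np.eye(n)[i]

# Non-adaptive identification under a drifting parameter sequence:
# sample arms i.i.d. from the design, regress, pick the estimated best.
theta_t = lambda t: np.array([1.0, 0.5, 0.2]) + 0.1 * np.sin(t / T)
arms = rng.choice(n, size=T, p=lam / lam.sum())
rewards = np.array([X[a] @ theta_t(t) + rng.normal()
                    for t, a in enumerate(arms)])
theta_hat = np.linalg.lstsq(X[arms], rewards, rcond=None)[0]

best_hat = int(np.argmax(X @ theta_hat))
best_true = int(np.argmax(X @ (sum(theta_t(t) for t in range(T)) / T)))
print("identified best arm:", best_hat, "| true average-best arm:", best_true)
```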

As the network scale increases, existing fully distributed solutions begin to fall short in the face of real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on a hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures and adversarial attacks. On network communication, we consider packet-dropping link failures. We first propose a hierarchical robust push-sum algorithm that can achieve average consensus despite frequent packet-dropping link failures. We provide a sparse information fusion rule between the parameter server and arbitrarily selected network representatives. Then, by interleaving the consensus update step with a dual averaging update using the Kullback-Leibler (KL) divergence as the proximal function, we obtain a packet-dropping fault-tolerant non-Bayesian learning algorithm with provable convergence guarantees. On external adversarial attacks, we consider Byzantine attacks in which compromised agents can send maliciously calibrated messages to others (including both the agents and the parameter server). To avoid the curse of dimensionality of Byzantine consensus, we solve the non-Bayesian learning problem by running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs. To facilitate resilient information propagation across sub-networks, we use a novel Byzantine-resilient gossiping-type rule at the parameter server.
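The robust push-sum building block tolerates dropped packets by exchanging cumulative mass counters rather than instantaneous values, so a later successful delivery recovers everything lost in between. Below is a minimal flat (non-hierarchical) sketch on a directed ring with Bernoulli packet drops; the topology, drop model, and constants are illustrative assumptions, not the paper's hierarchical algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)
n, T, p_drop = 6, 400, 0.3

out_nbrs = [[(i + 1) % n] for i in range(n)]   # directed ring
x = rng.uniform(0, 10, n)                      # initial values
target = x.mean()                              # consensus target
y = np.ones(n)                                 # push-sum weights

# Cumulative masses sent (sig) and last mass received (rho) make the
# protocol robust: a successful delivery recovers all dropped mass.
sig_x, sig_y = np.zeros(n), np.zeros(n)
rho_x, rho_y = np.zeros((n, n)), np.zeros((n, n))

for t in range(T):
    share_x, share_y = x / 2, y / 2            # keep half, send half
    sig_x += share_x; sig_y += share_y
    new_x, new_y = share_x.copy(), share_y.copy()
    for j in range(n):
        for i in out_nbrs[j]:
            if rng.random() > p_drop:          # packet delivered
                new_x[i] += sig_x[j] - rho_x[i, j]
                new_y[i] += sig_y[j] - rho_y[i, j]
                rho_x[i, j], rho_y[i, j] = sig_x[j], sig_y[j]
    x, y = new_x, new_y

print("push-sum ratios:", np.round(x / y, 4), "| true average:", round(target, 4))
```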

Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house. Efficient and accurate load monitoring facilitates user profile establishment, intelligent household energy management, and peak load shifting. This benefits both end-users and utilities by improving the overall efficiency of the power distribution network. Existing approaches mainly focus on developing an individual model for each appliance; such approaches typically rely on a large amount of household-labeled data, which is hard to collect. In this paper, we propose a multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that boosts disaggregation performance with limited labeled data. For each appliance, we develop a shared-hierarchical split structure for its regression and classification tasks. In addition, we propose a two-dimensional attention mechanism to capture spatio-temporal correlations among all appliances. With only one day of training data and limited appliance operation profiles, the proposed SA algorithm achieves test performance comparable to training on the full dataset. Finally, simulation results show that our proposed approach significantly outperforms many baseline models, reducing relative errors by more than 50\% on average. The code for this work is available at //github.com/jxiong22/MATNilm
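The shared-hierarchical split idea, where one trunk feeds separate regression (power) and classification (on/off) branches per appliance, can be sketched schematically as below. The layer sizes and architecture are assumptions for illustration; the actual MATNilm model in the linked repository additionally uses the two-dimensional attention mechanism and sample augmentation, which are not shown here.

```python
import torch
import torch.nn as nn

class ApplianceHead(nn.Module):
    """Per-appliance trunk that splits into a regression branch (power)
    and a classification branch (on/off) -- a schematic split structure."""
    def __init__(self, d_in, d_hid=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.power = nn.Linear(d_hid, 1)   # regression: power level
        self.state = nn.Linear(d_hid, 1)   # classification: on/off logit
    def forward(self, h):
        z = self.trunk(h)
        return self.power(z), self.state(z)

class ToyMultiApplianceNILM(nn.Module):
    """Shared encoder over the mains signal, one split head per appliance."""
    def __init__(self, window, n_appliances, d_hid=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, d_hid), nn.ReLU())
        self.heads = nn.ModuleList(
            ApplianceHead(d_hid, d_hid) for _ in range(n_appliances))
    def forward(self, x):                  # x: (batch, window) aggregate power
        h = self.encoder(x)
        return [head(h) for head in self.heads]

model = ToyMultiApplianceNILM(window=128, n_appliances=3)
mains = torch.randn(4, 128)                # toy aggregate-power windows
outs = model(mains)
loss = sum(nn.functional.mse_loss(p, torch.zeros_like(p)) +
           nn.functional.binary_cross_entropy_with_logits(s, torch.zeros_like(s))
           for p, s in outs)               # joint regression + classification loss
loss.backward()
print("per-appliance output shapes:", [(p.shape, s.shape) for p, s in outs])
```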

We study the reverse shortest path problem on disk graphs in the plane. We consider the proximity graph of a set of $n$ disks of arbitrary radii in the plane: in this graph, two disks are connected if the distance between them is at most some threshold parameter $r$; the intersection graph is the special case $r=0$. We give an algorithm that, given a target length $k$, computes the smallest value of $r$ for which there is a path of length at most $k$ between some given pair of disks in the proximity graph. Our algorithm runs in $O^*(n^{5/4})$ randomized expected time, which improves to $O^*(n^{6/5})$ for unit disk graphs, where all the disks have the same radius. Our technique is robust and can be applied to many variants of the problem. One significant variant is the case of weighted proximity graphs, where edges are assigned real weights equal to the distance between the disks or between their centers, and $k$ is replaced by a target weight $w$; that is, we seek a path whose length is at most $w$. In other variants, we want to optimize a parameter different from $r$, such as a scale factor of the radii of the disks. The main technique for the decision version of the problem (determining whether the graph with a given $r$ has the desired property) is based on efficient implementations of BFS (for the unweighted case) and of Dijkstra's algorithm (for the weighted case), using efficient data structures for maintaining the bichromatic closest pair for certain bicliques and several distance functions. The optimization problem is then solved by combining the resulting decision procedure with enhanced variants of the interval shrinking and bifurcation technique of [4].
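For intuition, here is the naive quadratic-time counterpart of the decision-plus-search structure: a BFS decision procedure for a given $r$, combined with a binary search over the pairwise disk distances (the only candidate optima). This is an assumption-laden toy; the paper's subquadratic algorithm replaces both ingredients with bichromatic-closest-pair data structures and interval shrinking/bifurcation, which are not shown.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(8)
n = 60
centers = rng.uniform(0, 10, size=(n, 2))
radii = rng.uniform(0.1, 0.6, size=n)

def gap(i, j):  # distance between disks (0 if they intersect)
    return max(0.0, np.linalg.norm(centers[i] - centers[j]) - radii[i] - radii[j])

def bfs_hops(r, s, t):
    """Decision procedure: fewest hops from s to t in the proximity
    graph where disks are adjacent iff their distance is <= r."""
    dist = {s: 0}
    dq = deque([s])
    while dq:
        u = dq.popleft()
        for v in range(n):
            if v not in dist and gap(u, v) <= r:
                dist[v] = dist[u] + 1
                dq.append(v)
    return dist.get(t, np.inf)

# Reverse shortest path: smallest r admitting a path of <= k hops from
# s to t. The optimum is a pairwise disk distance, so binary-search them.
s, t, k = 0, n - 1, 6
cands = sorted(gap(i, j) for i in range(n) for j in range(i + 1, n))
lo, hi = 0, len(cands) - 1
while lo < hi:                     # predicate is monotone in r
    mid = (lo + hi) // 2
    if bfs_hops(cands[mid], s, t) <= k:
        hi = mid
    else:
        lo = mid + 1
print(f"smallest r with a <= {k}-hop path: {cands[lo]:.4f}")
```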
