In this paper, we consider a Linear Program (LP)-based online resource allocation problem where a decision maker irrevocably accepts or rejects incoming customer requests in order to maximize expected revenue given limited resources. At each time, a new order/customer/bid is revealed, requesting some resource(s) and offering a reward. We consider a stochastic setting where all orders are sampled i.i.d. from an unknown distribution. This formulation gives rise to many classic applications such as the canonical (quantity-based) network revenue management problem and the Adwords problem. Instead of focusing only on regret minimization, this paper aims to provide fairness guarantees while maintaining low regret. Our notion of fairness is that a fair online algorithm should treat similar agents/customers similarly, and the decisions made for similar individuals should be consistent over time. We define the fair offline solution as the analytic center of the offline optimal solution set, and define \textit{cumulative unfairness} as the cumulative deviation of the online solutions from the fair offline solution. We propose a fair algorithm that uses an interior-point LP solver and dynamically detects unfair resource spending. Our algorithm controls cumulative unfairness at order $O(\log T)$ while keeping the regret bounded with no dependence on $T$. Moreover, we partially remove the nondegeneracy assumptions used in earlier results in the literature: this paper requires the nondegeneracy condition only for the binding constraints and allows the existence of multiple optimal solutions.
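As a rough illustration of these definitions (our notation; a sketch rather than the paper's exact formulas), the fair offline solution can be thought of as the analytic center of the offline optimal face, and cumulative unfairness as the summed deviation of the online decisions from it:
\[
x^{\mathrm{AC}} \in \arg\max_{x \in \mathcal{X}^{*}} \sum_{j} \log s_{j}(x), \qquad \mathrm{Unfairness}(T) \;=\; \sum_{t=1}^{T} \bigl\| x_{t} - x^{\mathrm{AC}} \bigr\|,
\]
where $\mathcal{X}^{*}$ is the offline optimal solution set, $s_{j}(x)$ are the slacks of the inequalities describing $\mathcal{X}^{*}$, and $x_{t}$ is the online decision at time $t$.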
Cell-free Massive MIMO systems consist of a large number of geographically distributed access points (APs) that serve users by coherent joint transmission. Downlink power allocation is important in these systems: it determines which APs should transmit to which users and with what power. If implemented correctly, such a system can deliver more uniform user performance than conventional cellular networks. To this end, previous works have shown how to perform system-wide max-min fairness power allocation when using maximum ratio precoding. In this paper, we first generalize this method to arbitrary precoding, and then train a neural network to perform approximately the same power allocation but with reduced computational complexity. Finally, we train one neural network per AP to mimic system-wide max-min fairness power allocation using only local information. By learning the structure of the local propagation environment, this method outperforms the state-of-the-art distributed power allocation method from the Cell-free Massive MIMO literature.
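For context, a generic system-wide max-min fairness power allocation problem (a textbook-style sketch, not necessarily the exact formulation solved here) maximizes the worst user's SINR subject to per-AP power budgets:
\[
\max_{\{\rho_{lk} \ge 0\}} \; \min_{k} \; \mathrm{SINR}_{k}(\boldsymbol{\rho}) \quad \text{s.t.} \quad \sum_{k} \rho_{lk} \le P_{\max}, \quad l = 1,\dots,L,
\]
where $\rho_{lk}$ is the power that AP $l$ assigns to user $k$, $L$ is the number of APs, and $P_{\max}$ is the per-AP power budget.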
This paper presents a control-theoretic framework that stably combines optimal feedback policies with online learning for the control of uncertain nonlinear systems. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. The proposed framework is able to employ the certainty equivalence principle when designing optimal policies and value functions by adjusting the learning rate online, a mechanism needed to guarantee stable learning and control. The approach is demonstrated on the familiar mountain car problem, where it is shown to yield near-optimal behavior despite the presence of parametric uncertainty.
We study a fair resource scheduling problem, where a set of interval jobs is to be allocated to heterogeneous machines controlled by agents. Each job is associated with a release time, a deadline, and a processing time, and it can be processed only if its entire processing period lies between its release time and deadline. The machines gain possibly different utilities from processing different jobs, and all jobs assigned to the same machine must be processed without overlap. We consider two widely studied solution concepts, namely maximin share fairness and envy-freeness. For both criteria, we discuss the extent to which fair allocations exist and present constant-approximation algorithms for various settings.
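For reference, the two criteria can be sketched as follows (standard definitions adapted to this scheduling setting; notation is ours): machine $i$'s maximin share is
\[
\mathrm{MMS}_{i} \;=\; \max_{(A_{1},\dots,A_{n})} \; \min_{j} \; u_{i}(A_{j}),
\]
where the maximum is over partitions of the jobs into $n$ feasible (non-overlapping) bundles and $u_{i}$ is machine $i$'s utility; an $\alpha$-approximate MMS allocation guarantees each machine utility at least $\alpha \cdot \mathrm{MMS}_{i}$, and an allocation is envy-free if $u_{i}(A_{i}) \ge u_{i}(A_{j})$ for all machines $i, j$.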
Incorporating mobile edge computing (MEC) into the Internet of Things (IoT) enables resource-limited IoT devices to offload their computation tasks to a nearby edge server. In this paper, we investigate an MEC-assisted IoT system whose computation tasks are subject to sequential task dependency, which is critical for video stream processing and other intelligent applications. To minimize the energy consumption per IoT device while limiting the task processing delay, the task offloading strategy, communication resources, and computation resources are jointly optimized under both slow and fast fading channels. For slow fading channels, we formulate an optimization problem that is non-convex and involves one integer variable. To solve this challenging problem, we decompose it into a one-dimensional search over the task offloading decision and a non-convex optimization problem with the offloading decision fixed. Through mathematical manipulations, the non-convex problem is transformed into a convex one, which is shown to be solvable using only the simple golden-section search method. For fast fading channels, optimal online policies depending on the instantaneous channel state are derived, even though the decisions are coupled. In addition, the derived online policy is proved to converge to the offline policy when the channel coherence time is short, which can help to avoid extra computational complexity. Numerical results verify the correctness of our analysis and the effectiveness of our proposed strategies over existing methods.
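To illustrate only the golden-section component mentioned above (a minimal generic sketch, not the paper's full resource allocation algorithm; the objective f is a placeholder), one-dimensional minimization of a unimodal function can be carried out as follows:

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):        # the minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                  # the minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example with a placeholder convex objective
x_star = golden_section_search(lambda x: (x - 0.3) ** 2 + 1.0, 0.0, 1.0)
```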
We extend and combine several tools from the literature to design fast, adaptive, anytime, and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially achieve constant regret. We seek to develop fast algorithms that depend on as few parameters as possible; in particular, they should be anytime and thus not depend on the time horizon. Our first and main tool, isotuning, is a generalization of the idea of balancing the trade-off in the regret bound. We develop a set of tools to design and analyze such learning rates easily and show that they adapt automatically to the rate of the regret (whether constant, $O(\log T)$, $O(\sqrt{T})$, etc.), within a factor of 2 of the optimal learning rate in hindsight for the same observed quantities. The second tool is an online correction, which allows us to obtain centered bounds for many algorithms and prevents the regret bounds from being vacuous when the domain is overly large or only partially constrained. The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret, or even invalid updates. We develop a general theory using these tools and apply it to several standard algorithms. In particular, we (almost entirely) restore the adaptivity to small losses of FTRL for unbounded domains, design and prove scale-free adaptive guarantees for a variant of Mirror Descent (at least when the Bregman divergence is convex in its second argument), extend Adapt-ML-Prod to scale-free guarantees, and provide several other minor contributions on Prod, AdaHedge, BOA and Soft-Bayes.
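As a hedged illustration of the balancing idea that isotuning generalizes (a standard derivation, not a result of this paper): for a regret bound of the form $R_{T} \le D/\eta + \eta V_{T}$, equating the two terms gives the optimal fixed learning rate in hindsight,
\[
\eta^{*} = \sqrt{D / V_{T}}, \qquad R_{T} \le 2\sqrt{D\, V_{T}},
\]
and an anytime algorithm must instead track this choice online, e.g., with $\eta_{t} \approx \sqrt{D / V_{t}}$, which is roughly the adaptation that isotuning automates within the stated factor of 2.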
The multi-armed bandit (MAB) problem is a classic online learning framework for studying sequential decision-making in an uncertain environment. The MAB framework, however, overlooks the scenario where the decision-maker cannot take actions (e.g., pull arms) directly. This scenario is practically important in many applications such as spectrum sharing, crowdsensing, and edge computing. In these applications, the decision-maker incentivizes other selfish agents to carry out the desired actions (i.e., pull arms on the decision-maker's behalf). This paper establishes the incentivized online learning (IOL) framework for this scenario. The key challenge in designing the IOL framework lies in the tight coupling between learning the unknown environment and the asymmetric revelation of information. To address this, we construct a special Lagrangian function, based on which we propose a socially optimal mechanism for the IOL framework. Our mechanism satisfies various desirable properties such as agent fairness, incentive compatibility, and voluntary participation. It achieves the same asymptotic performance as the state-of-the-art benchmark that requires extra information. Our analysis also unveils the power of the crowd in the IOL framework: a larger agent crowd enables our mechanism to approach the theoretical upper bound of social performance more closely. Numerical results demonstrate the advantages of our mechanism in large-scale edge computing.
Motivated by many interesting real-world applications in logistics and online advertising, we consider an online allocation problem subject to lower and upper resource constraints, where requests arrive sequentially, sampled i.i.d. from an unknown distribution, and we must make decisions promptly given limited resources and lower-bound requirements. First, given knowledge of the measure of feasibility $\alpha$, we propose a new algorithm that obtains a $1-O(\frac{\epsilon}{\alpha-\epsilon})$ competitive ratio against the offline problem that knows all requests ahead of time. Inspired by previous studies, this algorithm adopts an innovative technique to dynamically update a threshold price vector for making decisions. Moreover, at the end of the paper we propose an optimization method, with a theoretical guarantee, for estimating the optimal measure of feasibility. Based on this method, if we tolerate a slight violation of the lower-bound constraints, parameterized by $\eta$, the proposed algorithm naturally extends to settings without the strong feasibility assumption, which cover the largely unexplored infeasible scenarios.
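As a hedged sketch of the threshold-price idea (a generic dual-price decision rule; the exact update in the paper may differ): given a current price vector $p_{t}$, an arriving request with reward $r_{t}$ and resource demand $a_{t}$ is accepted when its reward exceeds its priced resource consumption,
\[
x_{t} = \mathbb{1}\{\, r_{t} > p_{t}^{\top} a_{t} \,\},
\]
after which $p_{t}$ is updated from the remaining resource budgets and the lower-bound requirements.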
Rankings, especially those in search and recommendation systems, often determine how people access information and how information is exposed to people. Therefore, balancing the relevance and fairness of information exposure is considered one of the key problems for modern IR systems. Since conventional ranking frameworks that myopically sort documents by relevance inevitably introduce unfair result exposure, recent studies on ranking fairness mostly focus on dynamic ranking paradigms, where result rankings can be adapted in real time to support fairness across groups (e.g., races, genders, etc.). Existing studies on fairness in dynamic learning to rank, however, often achieve overall fairness of document exposure in ranked lists by significantly sacrificing result relevance and fairness on the top results. To address this problem, we propose a fair and unbiased ranking method named Maximal Marginal Fairness (MMF). The algorithm integrates unbiased estimators for both relevance and merit-based fairness, while providing an explicit controller that balances the selection of documents to maximize marginal relevance and fairness in the top-k results. Theoretical and empirical analysis shows that, with small compromises on long-list fairness, our method achieves superior efficiency and effectiveness compared with state-of-the-art algorithms in both relevance and fairness for top-k rankings.
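As a rough sketch of the marginal selection idea (analogous to maximal marginal relevance; our notation, not necessarily the paper's exact criterion), the next document for the top-$k$ list could be chosen as
\[
d^{*} \;=\; \arg\max_{d \notin S} \;\Bigl[\, \lambda \,\widehat{rel}(d) \;+\; (1-\lambda)\, \Delta\mathrm{fair}(d \mid S) \,\Bigr],
\]
where $S$ is the set of documents already selected, $\widehat{rel}$ is an unbiased relevance estimate, $\Delta\mathrm{fair}(d \mid S)$ is the marginal gain in merit-based fairness, and $\lambda$ is the explicit relevance-fairness controller.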
This paper presents our semantic parsing system for the open domain semantic parsing evaluation task in NLPCC 2019. Many previous works formulate semantic parsing as a sequence-to-sequence (seq2seq) problem. Instead, we treat the task as a sketch-based problem in a coarse-to-fine (coarse2fine) fashion. The sketch is a high-level structure of the logical form that excludes low-level details such as entities and predicates. In this way, we are able to optimize each part individually. Specifically, we decompose the process into three stages: sketch classification determines the high-level structure, while entity labeling and a matching network fill in the missing details. Moreover, we adopt the seq2seq method to evaluate logical form candidates from an overall perspective. The co-occurrence relationship between predicates and entities contributes to the reranking as well. Our submitted system achieves an exact-match accuracy of 82.53% on the full test set and 47.83% on the hard test subset, ranking 3rd in NLPCC 2019 Shared Task 2. After optimizing the parameters, network structure, and sampling, the accuracy reaches 84.47% on the full test set and 63.08% on the hard test subset (our code and data are available at //github.com/zechagl/NLPCC2019-Semantic-Parsing).
We propose a new estimation method for topic models that is not a variation of existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p), and number of topics (K), and both p and K are allowed to grow with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than current ones, even though it starts with the computational and theoretical disadvantage of not knowing the correct number of topics K, while the competing methods are provided with the correct value in our simulations.
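For context, the notation follows the usual topic-model convention (a generic description; the paper's exact assumptions may differ): the expected word frequencies of the $n$ documents factor as
\[
\Pi \;=\; A\,W, \qquad A \in \mathbb{R}^{p \times K},\;\; W \in \mathbb{R}^{K \times n},
\]
where each column of $\Pi$ collects the word probabilities of one document, the columns of $A$ are the $K$ topic (word-probability) vectors over the $p$-word dictionary, and the columns of $W$ are the per-document topic weights; the observed counts are multinomial draws of size $N_{i}$ from the corresponding column of $\Pi$.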