In this paper, we propose the first optimal process scheduling algorithm for an increasingly prevalent type of heterogeneous multicore (HEMC) system that combines high-performance big cores and energy-efficient small cores sharing the same instruction-set architecture (ISA). Existing algorithms are all heuristic-based; the well-known IPC-driven approach essentially tries to schedule processes with high scaling factors on big cores. Our analysis shows that, for optimal solutions, it is also critical to place long-running processes on big cores. Tests of SPEC 2006 cases on various big-small core combinations show that our proposed optimal approach is up to 34% faster than the IPC-driven heuristic in terms of total workload completion time. The complexity of our algorithm is O(N log N), where N is the number of processes, so the proposed algorithm is practical to use.
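To make the contrast concrete, here is a hypothetical sketch (with illustrative names; it is not the paper's algorithm) of the IPC-driven heuristic described above, which ranks processes by their big-core speedup:

```python
# Hypothetical sketch of the IPC-driven heuristic described in the abstract:
# give the big cores to the processes with the highest big-core speedup
# ("scaling factor"). Field names and structure are illustrative only.
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    scaling_factor: float   # speedup of a big core over a small core for this process
    remaining_time: float   # remaining execution time on a small core

def ipc_driven_assignment(processes, num_big_cores):
    """Greedy heuristic: assign the highest-scaling processes to big cores."""
    ranked = sorted(processes, key=lambda p: p.scaling_factor, reverse=True)
    return ranked[:num_big_cores], ranked[num_big_cores:]

# The analysis in the paper suggests that run length also matters; e.g. a rule
# that ranks by scaling_factor * remaining_time would favour long-running,
# high-speedup processes for big cores. The optimal algorithm itself runs in
# O(N log N) time.
```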
Many real-world optimization problems, such as engineering design, can be modeled as multiobjective optimization problems (MOPs) that must be solved to obtain approximations of the Pareto optimal front. The multiobjective evolutionary algorithm based on decomposition (MOEA/D) is regarded as a very promising approach for solving MOPs. Recent studies have shown that MOEA/D with uniform weight vectors is well suited to MOPs with regular Pareto optimal fronts, but its diversity deteriorates on MOPs with irregular Pareto optimal fronts, such as highly nonlinear or convex ones. As a result, the obtained solution set cannot provide reasonable choices for decision makers. To overcome this drawback efficiently, in this paper we propose an improved MOEA/D algorithm based on the well-known Pascoletti-Serafini scalarization method and a new multi-reference-point strategy. Specifically, this strategy consists of the setting and adaptation of reference points generated by equidistant partition and projection techniques. For performance assessment, the proposed algorithm is compared with four existing state-of-the-art multiobjective evolutionary algorithms on benchmark test problems with various types of Pareto optimal fronts and on two real-world MOPs from engineering optimization, namely the hatch cover design and the rocket injector design. The experimental results show that the proposed algorithm exhibits better diversity than the compared algorithms.
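For reference, the Pascoletti-Serafini scalarization that the proposed method builds on can be stated as follows (the notation here is ours, not the paper's): given a reference point $a \in \mathbb{R}^m$ and a direction $r$ in the ordering cone $C$ (typically $C = \mathbb{R}^m_{+}$), one solves

```latex
\min_{t \in \mathbb{R},\; x \in X} \; t
\quad \text{subject to} \quad
a + t\,r - F(x) \in C ,
```

where $F = (f_1, \dots, f_m)$ collects the objectives. The multi-reference-point strategy described above concerns the setting and adaptation of such reference points $a$.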
We propose a kernel-based partial permutation test for checking the equality of the functional relationship between response and covariates across different groups. The main idea, which is intuitive and easy to implement, is to keep the projections of the response vector $\boldsymbol{Y}$ on the leading principal components of a kernel matrix fixed and permute $\boldsymbol{Y}$'s projections on the remaining principal components. The proposed test allows for different choices of kernels, corresponding to different classes of functions under the null hypothesis. First, using linear or polynomial kernels, our partial permutation tests are exactly valid in finite samples for linear or polynomial regression models with Gaussian noise; similar results extend straightforwardly to kernels with finite feature spaces. Second, by allowing the kernel feature space to diverge with the sample size, the test can be large-sample valid for a wider class of functions. Third, for general kernels with possibly infinite-dimensional feature spaces, the partial permutation test is exactly valid when the covariates are exactly balanced across all groups, or asymptotically valid when the underlying function follows certain regularized Gaussian processes. We further suggest test statistics based on the likelihood ratio between two (nested) Gaussian process regression (GPR) models, and propose computationally efficient algorithms utilizing the EM algorithm and Newton's method, where the latter also involves Fisher scoring and quadratic programming and is particularly useful when EM suffers from slow convergence. Extensions to correlated and non-Gaussian noise are also investigated theoretically or numerically. Furthermore, the test can be extended to use multiple kernels together and can thus enjoy properties of each kernel. Both simulation studies and applications illustrate the properties of the proposed test.
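The core permutation step is simple enough to sketch directly; the following is our own illustrative implementation of the idea stated above (the paper's actual procedure and test statistics are more involved):

```python
# Illustrative sketch (ours) of the partial permutation step: keep the
# projections of Y on the k leading principal components of the kernel matrix
# fixed, and permute its projections on the remaining components.
import numpy as np

def partial_permute(Y, K, k, rng=None):
    rng = rng or np.random.default_rng()
    eigvals, U = np.linalg.eigh(K)            # eigenvalues in ascending order
    U = U[:, np.argsort(eigvals)[::-1]]       # columns sorted by decreasing eigenvalue
    coefs = U.T @ Y                           # coordinates of Y in the eigenbasis
    coefs[k:] = rng.permutation(coefs[k:])    # shuffle the trailing projections only
    return U @ coefs                          # partially permuted pseudo-response

# A permutation p-value would then compare a chosen test statistic on Y with
# its values on many such partially permuted copies of Y.
```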
We consider an information update system consisting of $N$ sources sending status packets at random instants, according to a Poisson process, to a remote monitor through a single server. We assume a heterogeneous server with exponentially distributed service times, equipped with a waiting room holding the freshest packet from each source, referred to as Single Buffer Per-Source Queueing (SBPSQ). The sources are assumed to be equally important, i.e., the non-weighted average Age of Information (AoI) is used as the information freshness metric, and subsequently two symmetric scheduling policies are studied in this paper, namely the First Source First Serve (FSFS) and the Earliest Served First Serve (ESFS) policies, the latter of which is, to the best of our knowledge, proposed for the first time in this paper. By employing the theory of Markov Fluid Queues (MFQ), an analytical model is proposed to obtain the exact distribution of the AoI for each source when the FSFS and ESFS policies are employed at the server. Subsequently, a benchmark scheduling-free scheme named Single Buffer with Replacement (SBR), which uses a single one-packet buffer shared by all sources, is also studied with a similar but less complex analytical model. We comparatively study the performance of the three schemes through numerical examples and show that the proposed ESFS policy outperforms the other two schemes in terms of the average AoI and the age violation probability averaged across all sources, in a scenario where the sources have different traffic intensities but share a common service time.
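The abstract does not spell out the ESFS rule, so the following is only our reading of the policy name, given as a minimal sketch; the paper defines the policy and its MFQ-based analysis precisely:

```python
# Minimal sketch (our interpretation, not the paper's definition): among the
# sources that currently have a packet in their per-source buffer, serve the
# one whose last service completed earliest (i.e., the least recently served).
def esfs_pick(waiting_sources, last_served_time):
    return min(waiting_sources,
               key=lambda s: last_served_time.get(s, float("-inf")))

# Example: sources 1 and 3 have packets waiting; source 3 was served longer
# ago than source 1, so it is selected next.
print(esfs_pick({1, 3}, {1: 10.2, 2: 9.0, 3: 4.7}))   # -> 3
```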
Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data with statistical models that have continuous dynamics is not only intuitively appealing but also has practical benefits, including the ability to generate continuous trajectories and to perform inference at previously unseen time stamps. Despite exciting progress in this area, existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture that decodes continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions on various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our model's favourable performance on both synthetic and real-world time-series data.
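As a schematic of the architecture described above (the notation is ours, not the paper's), the latent process follows a stochastic differential equation and the observable process is obtained by decoding it through a time-dependent normalizing flow:

```latex
\mathrm{d}Z_t = \mu_\theta(Z_t, t)\,\mathrm{d}t + \sigma_\theta(Z_t, t)\,\mathrm{d}W_t ,
\qquad
X_t = f_\theta(Z_t ; t),
```

where $W_t$ is a Brownian motion and $f_\theta(\cdot\,; t)$ is an invertible, time-indexed normalizing flow, so that continuous latent trajectories induce continuous observable trajectories.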
We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Like column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads, which, however, have fundamentally different data access patterns than traditional analytical workloads. We first derive a set of desiderata for optimizing the storage and query processors of GDBMSs based on these access patterns. We then present the design of columnar storage, compression, and query processing techniques based on these desiderata. In addition to showing direct integration of existing techniques from columnar RDBMSs, we propose several novel techniques optimized for GDBMSs: a list-based query processor, which avoids the expensive data copies of traditional block-based processors under many-to-many joins; a new data structure we call single-indexed edge property pages, with an accompanying edge ID scheme; and a new application of Jacobson's bit vector index for compressing NULL values and empty lists. We integrated our techniques into GraphflowDB, an in-memory GDBMS. Through extensive experiments, we demonstrate the scalability and query performance benefits of our techniques.
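To illustrate the NULL-compression idea, here is a simplified, self-contained sketch (ours) of storing a nullable column as a bit vector plus a dense value array; Jacobson's index answers the rank query in constant time with o(n) extra bits via two-level blocks, whereas this sketch simply precomputes prefix sums:

```python
# Simplified illustration of compressing a column with NULLs: materialize only
# the non-NULL values and use rank queries over a bit vector to map positions
# to offsets in the dense array.
from itertools import accumulate

class NullCompressedColumn:
    def __init__(self, values):                        # `values` may contain None
        self.bits = [v is not None for v in values]
        self.rank = [0] + list(accumulate(self.bits))  # rank[i] = set bits in bits[:i]
        self.data = [v for v in values if v is not None]

    def get(self, i):
        return self.data[self.rank[i]] if self.bits[i] else None

col = NullCompressedColumn([None, "a", None, None, "b"])
print(col.get(1), col.get(3), col.get(4))              # a None b
```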
Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modelled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel planning method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions.
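As a purely illustrative sketch of turning noise samples into probability intervals, the code below uses Hoeffding's inequality; the paper instead derives PAC bounds from the scenario approach, which is its key technical contribution, so this is not the paper's bound:

```python
# Illustration only: a generic sample-based confidence interval for a
# transition probability P(successor state lands in a given region), computed
# from Monte Carlo samples of the noise. The scenario-approach PAC bounds used
# in the paper are tighter and come with different guarantees.
import math

def hoeffding_interval(num_hits, num_samples, confidence=0.99):
    p_hat = num_hits / num_samples
    eps = math.sqrt(math.log(2 / (1 - confidence)) / (2 * num_samples))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)

print(hoeffding_interval(num_hits=780, num_samples=1000))  # approx. (0.729, 0.831)
```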
Recent work has proposed stochastic Plackett-Luce (PL) ranking models as a robust choice for optimizing relevance and fairness metrics. Unlike their deterministic counterparts, which require heuristic optimization algorithms, PL models are fully differentiable. In theory, they can be used to optimize ranking metrics via stochastic gradient descent. In practice, however, computing the gradient exactly is infeasible because it requires iterating over all possible permutations of items; actual applications therefore rely on approximating the gradient via sampling techniques. In this paper, we introduce a novel algorithm, PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Our experimental analysis shows that PL-Rank has greater sample efficiency and is computationally less costly than existing policy gradients, resulting in faster convergence at higher performance. PL-Rank further enables practitioners to apply PL models to build more relevant and fairer real-world ranking systems.
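For background, drawing a ranking from a PL model over item scores can be done with the Gumbel trick, as sketched below; this illustrates the model being optimized, not PL-Rank's gradient estimator itself:

```python
# Sampling a full ranking from a Plackett-Luce model: adding independent
# Gumbel noise to the log-scores and sorting is equivalent to sequentially
# sampling items without replacement with softmax(log_scores) probabilities.
import numpy as np

def sample_pl_ranking(log_scores, rng=None):
    rng = rng or np.random.default_rng()
    gumbel = rng.gumbel(size=len(log_scores))
    return np.argsort(-(log_scores + gumbel))   # item indices, best rank first

print(sample_pl_ranking(np.array([2.0, 0.5, 1.0, -1.0])))
```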
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of an undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
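As one concrete instance of the "careful initialization" methods the overview discusses (a minimal sketch, not tied to any particular result in the article), He-style scaling keeps activation variance roughly constant across ReLU layers and thereby mitigates exploding/vanishing gradients:

```python
# He (Kaiming) initialization: draw weights with variance 2 / fan_in so that
# the forward-pass variance is approximately preserved through ReLU layers.
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(1024, 512)
print(W.std())   # close to sqrt(2 / 1024) ≈ 0.044
```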
Model-based methods for recommender systems have been studied extensively in recent years. In systems with a large corpus, however, the computational cost of using the learnt model to predict all user-item preferences is tremendous, which makes full-corpus retrieval extremely difficult. To overcome this computational barrier, models such as matrix factorization resort to an inner-product form (i.e., modeling user-item preference as the inner product of user and item latent factors) and to indexes that facilitate efficient approximate k-nearest-neighbor search. However, it remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the computational cost. In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with a large corpus. We propose a novel tree-based method that provides logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making decisions for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution, and hence facilitates both training and prediction. Experimental evaluations with two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results on the Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.
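The coarse-to-fine retrieval idea can be sketched as a beam search over the tree; the code below is a hypothetical illustration (the node layout, the `score` model, and all names are ours, not the paper's):

```python
# Hypothetical sketch: traverse the tree top-down, keeping the top-`beam_size`
# children per level according to an arbitrary user-node preference model
# `score` (e.g., a deep network), and return the surviving leaves.
# Assumes a complete tree in which all leaves are at the same depth.
def retrieve(root, user, score, beam_size=2):
    frontier = [root]
    while True:
        children = [c for node in frontier for c in node.get("children", [])]
        if not children:                    # frontier nodes are leaves
            return frontier
        children.sort(key=lambda n: score(user, n), reverse=True)
        frontier = children[:beam_size]     # expand only the most promising nodes
```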
Robust estimation is much more challenging in high dimensions than in one dimension: most techniques either lead to intractable optimization problems or to estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms are prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and make high-dimensional robust estimation a realistic possibility.
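The following is a highly simplified sketch (ours) of the spectral-filtering idea underlying this line of work: repeatedly examine the top eigenvector of the empirical covariance and discard the points that deviate most along it. The paper's algorithms, thresholds, and guarantees are considerably more refined:

```python
# Simplified spectral filtering for robust mean estimation: outliers that
# shift the mean must create excess variance in some direction, so we trim
# the most extreme points along the top principal direction and repeat.
import numpy as np

def filtered_mean(X, rounds=10, drop_frac=0.02):
    X = np.asarray(X, dtype=float)
    for _ in range(rounds):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        top_dir = np.linalg.eigh(cov)[1][:, -1]   # eigenvector of the largest eigenvalue
        scores = np.abs((X - mu) @ top_dir)       # deviation along that direction
        X = X[scores <= np.quantile(scores, 1.0 - drop_frac)]
    return X.mean(axis=0)
```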