Cloud technology is adopted to process video streams because of the features it offers video stream providers, such as the flexibility of using virtual machines and storage servers at low rates. Video stream providers prepare several formats of the same video to satisfy the specifications of all users' devices. Video streams in the cloud are either transcoded or stored. However, storing all formats of videos is still costly. In this research, we develop an approach that optimizes cloud storage. In particular, we propose a method that decides which video should be stored in which cloud storage to minimize the overall cost of cloud services. The results of the proposed approach are promising: it is effective when the number of frequently accessed videos in a repository grows and when the views of videos increase. The proposed method decreases the cost of using cloud services by up to 22%.
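The store-or-transcode decision described above can be illustrated with a minimal sketch. It is not the method of the paper: the `StorageTier` class, the `cheapest_option` function, the pricing model (a monthly storage fee plus a per-GB read fee), and all prices are hypothetical assumptions chosen only to show how the cheapest option can shift with the number of views.

```python
from dataclasses import dataclass


@dataclass
class StorageTier:
    # Hypothetical pricing model, not taken from any real provider.
    name: str
    price_per_gb_month: float  # fee for keeping 1 GB stored for a month
    price_per_gb_read: float   # fee for serving 1 GB from this tier


def cheapest_option(size_gb, views_per_month, tiers, transcode_cost_per_view):
    """Return (label, monthly_cost) of the cheapest way to serve one video format.

    Options are: keep the format in one of the storage tiers, or do not
    store it at all and transcode it again on every view.
    """
    options = {"transcode-on-demand": views_per_month * transcode_cost_per_view}
    for tier in tiers:
        storage_fee = size_gb * tier.price_per_gb_month
        read_fee = size_gb * views_per_month * tier.price_per_gb_read
        options[tier.name] = storage_fee + read_fee
    label = min(options, key=options.get)
    return label, options[label]


tiers = [
    StorageTier("hot", price_per_gb_month=0.02, price_per_gb_read=0.0),
    StorageTier("cold", price_per_gb_month=0.004, price_per_gb_read=0.01),
]
for views in (0, 1, 1000):
    print(views, cheapest_option(1.0, views, tiers, transcode_cost_per_view=0.05))
```

Under these assumed prices, a format that is never viewed is cheapest to transcode on demand, a rarely viewed one belongs in the cold tier, and a frequently viewed one belongs in the hot tier.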
Due to its communication efficiency and privacy-preserving capability, federated learning (FL) has emerged as a promising framework for machine learning in 5G-and-beyond wireless networks. Of great interest is the design and optimization of new wireless network structures that support the stable and fast operation of FL. Cell-free massive multiple-input multiple-output (CFmMIMO) turns out to be a suitable candidate, which allows each communication round in the iterative FL process to be stably executed within a large-scale coherence time. Aiming to reduce the total execution time of the FL process in CFmMIMO, this paper proposes choosing only a subset of available users to participate in FL. An optimal selection of users with favorable link conditions would minimize the execution time of each communication round, while limiting the total number of communication rounds required. Toward this end, we formulate a joint optimization problem of user selection, transmit power, and processing frequency, subject to a predefined minimum number of participating users to guarantee the quality of learning. We then develop a new algorithm that is proven to converge to the neighbourhood of the stationary points of the formulated problem. Numerical results confirm that our proposed approach significantly reduces the FL total execution time over baseline schemes. The time reduction is more pronounced when the density of access point deployments is moderately low.
Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process that alternates between learning a reward function and a policy, and they tend to suffer from long training times and high variance. In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, i.e., Imitation Learning via Differentiable Physics (ILD), which gets rid of the double-loop design and achieves significant improvements in final performance, convergence speed, and stability. The proposed ILD incorporates the differentiable physics simulator as a physics prior into its computational graph for policy learning. It unrolls the dynamics by sampling actions from a parameterized policy, simply minimizing the distance between the expert trajectory and the agent trajectory, and back-propagating the gradient into the policy via temporal physics operators. With the physics prior, ILD policies are not only transferable to unseen environment specifications but also yield higher final performance on a variety of tasks. In addition, ILD naturally forms a single-loop structure, which significantly improves the stability and training speed. To simplify the complex optimization landscape induced by temporal physics operations, ILD dynamically selects the learning objectives for each state during optimization. In our experiments, we show that ILD outperforms state-of-the-art methods in a variety of continuous control tasks with Brax, requiring only one expert demonstration. In addition, ILD can be applied to challenging deformable object manipulation tasks and can be generalized to unseen configurations.
Mobile crowd sensing and computing (MCSC) enables heterogeneous users (workers) to contribute real-time sensed, generated, and pre-processed data from their mobile devices to the MCSC platform, for intelligent service provisioning. This paper investigates a novel hybrid worker recruitment problem where the MCSC platform employs workers to serve MCSC tasks with diverse quality requirements and budget constraints, while considering uncertainties in workers' participation and their local workloads. We propose a hybrid worker recruitment framework consisting of offline and online trading modes. The former enables the platform to overbook long-term workers (services) to cope with dynamic service supply via signing contracts in advance, which is formulated as 0-1 integer linear programming (ILP) with probabilistic constraints related to service quality and budget. In addition, because the existing uncertainties may cause long-term workers to fail to meet the service quality requirement of each task, we augment our methodology with an online temporary worker recruitment scheme as a backup (Plan B) to support seamless service provisioning for MCSC tasks, which is also a 0-1 ILP problem. To tackle these problems, which are proved to be NP-hard, we develop three algorithms, namely, i) exhaustive searching, ii) unique index-based stochastic searching with risk-aware filter constraint, and iii) geometric programming-based successive convex algorithm, which achieve the optimal (with high computational complexity) or sub-optimal (with low complexity) solutions. Experimental results demonstrate the effectiveness of our proposed hybrid worker recruitment mechanism in terms of service quality, time efficiency, etc.
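To make the exhaustive-search idea for the offline overbooking problem concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: each worker is a hypothetical `(quality, participation_probability, cost)` tuple, workers show up independently, the budget constraint is deterministic rather than probabilistic, and the objective is to minimize total cost. The function names are illustrative only.

```python
from itertools import combinations, product


def prob_quality_met(workers, required):
    """Probability that the workers who actually show up deliver enough quality.

    Each worker is (quality, participation_probability, cost); participation
    is assumed to be independent across workers.
    """
    total = 0.0
    for outcome in product([0, 1], repeat=len(workers)):
        prob, quality = 1.0, 0.0
        for shows_up, (q, p, _cost) in zip(outcome, workers):
            prob *= p if shows_up else 1.0 - p
            quality += q * shows_up
        if quality >= required:
            total += prob
    return total


def exhaustive_recruit(workers, required, min_prob, budget):
    """Try every subset (every 0-1 assignment) and keep the cheapest feasible one.

    Returns (cost, indices_of_selected_workers), or None if no subset is feasible.
    The running time is exponential in the number of workers.
    """
    best = None
    for k in range(len(workers) + 1):
        for subset in combinations(range(len(workers)), k):
            chosen = [workers[i] for i in subset]
            cost = sum(c for _q, _p, c in chosen)
            if cost > budget:
                continue
            if prob_quality_met(chosen, required) < min_prob:
                continue
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best


workers = [(5, 0.9, 3), (5, 0.9, 3), (10, 0.5, 4)]
print(exhaustive_recruit(workers, required=5, min_prob=0.95, budget=10))
```

In this toy instance no single worker is reliable enough, so the search overbooks two workers whose combined chance of delivering the required quality is about 0.99. The exponential enumeration also shows why lower-complexity heuristics are needed for realistic numbers of workers.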
The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image. However, due to the large number of parameters, it is challenging to fit the free-form deformation model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape. In this paper, we cast the registration task as a multi-objective optimization problem (MOP) based on the fact that regions affected by each control point overlap with each other. Specifically, by partitioning the template image into several regions and measuring the similarity of each region independently, multiple objectives are built and deformation estimation can thus be realized by solving the MOP with off-the-shelf multi-objective evolutionary algorithms (MOEAs). In addition, a coarse-to-fine strategy is realized by an image pyramid combined with control point mesh subdivision. Specifically, the optimized candidate solutions of the current image level are inherited by the next level, which increases the ability to deal with large deformation. Also, a post-processing procedure is proposed to generate a single output utilizing the Pareto optimal solutions. Comparative experiments on both synthetic and real-world images show the effectiveness and usefulness of our deformation estimation method.
In this paper, we provide a novel and simple algorithm, Clairvoyant Multiplicative Weights Updates (CMWU), for regret minimization in general games. CMWU effectively corresponds to the standard MWU algorithm but where all agents, when updating their mixed strategies, use the payoff profiles based on tomorrow's behavior, i.e., the agents are clairvoyant. CMWU achieves constant regret of $\ln(m)/\eta$ in all normal-form games with $m$ actions and fixed step-sizes $\eta$. Although CMWU encodes in its definition a fixed point computation, which in principle could result in dynamics that are neither computationally efficient nor uncoupled, we show that both of these issues can be largely circumvented. Specifically, as long as the step-size $\eta$ is upper bounded by $\frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0,V]$ is the payoff range, then the CMWU updates can be computed linearly fast via a contraction map. This implementation results in an uncoupled online learning dynamic that admits a $o(\log T)$-sparse sub-sequence where each agent experiences at most $O(nV\log m)$ regret. This implies that the CMWU dynamics converge with rate $O(nV \log m \cdot W(T) / T)$ to a Coarse Correlated Equilibrium, where $W(T)$ is the inverse of the function $g(t):=t\cdot 2^t$. The latter improves on the current state-of-the-art convergence rate of uncoupled online learning dynamics.
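The fixed-point computation behind a clairvoyant update can be sketched for a two-player game as follows. This is an illustrative reading of the description above, not the paper's implementation: the function names are hypothetical, payoffs are assumed to lie in $[0,1]$ (so $V=1$ and, with $n=2$, the step-size bound is $1$), and a fixed number of iterations stands in for a proper convergence test.

```python
import math


def mwu_update(strategy, payoffs, eta):
    """Standard multiplicative weights step: reweight by exp(eta * payoff), then normalize."""
    weights = [p * math.exp(eta * u) for p, u in zip(strategy, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoffs_row(A, y):
    """Expected payoff of each row action against the column player's mixed strategy y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]


def expected_payoffs_col(B, x):
    """Expected payoff of each column action against the row player's mixed strategy x."""
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]


def cmwu_step(x, y, A, B, eta, iters=60):
    """Approximate one clairvoyant step by iterating the update map.

    The target is a pair (x', y') such that x' is the MWU update of x using
    payoffs against y', and y' is the MWU update of y using payoffs against x'.
    """
    x_next, y_next = list(x), list(y)
    for _ in range(iters):
        x_new = mwu_update(x, expected_payoffs_row(A, y_next), eta)
        y_new = mwu_update(y, expected_payoffs_col(B, x_next), eta)
        x_next, y_next = x_new, y_new
    return x_next, y_next


# Matching-pennies-style game with payoffs in [0, 1].
A = [[1.0, 0.0], [0.0, 1.0]]  # row player's payoffs
B = [[0.0, 1.0], [1.0, 0.0]]  # column player's payoffs
x1, y1 = cmwu_step([0.7, 0.3], [0.4, 0.6], A, B, eta=0.5)
print(x1, y1)
```

Each iteration needs only the opponents' current candidate strategies, and the step size of $0.5$ is chosen to sit below the assumed bound of $1$ so that the iteration settles on a fixed point quickly.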
Due to the scarcity of video processing methodologies, image processing operations are naively extended to the video domain by processing each frame independently. This disregard for the temporal connection in video processing often leads to severe temporal inconsistencies. State-of-the-art techniques that address these inconsistencies rely on the availability of unprocessed videos to siphon consistent video dynamics to restore the temporal consistency of frame-wise processed videos. We propose a novel general framework for this task that learns to infer consistent motion dynamics from inconsistent videos to mitigate the temporal flicker while preserving the perceptual quality for both the temporally neighboring and relatively distant frames. The proposed framework produces state-of-the-art results on two large-scale datasets, DAVIS and videvo.net, processed by numerous image processing tasks in a feed-forward manner. The code and the trained models will be released upon acceptance.
Landscape-aware algorithm selection approaches have so far mostly relied on landscape feature extraction as a preprocessing step, independent of the execution of optimization algorithms in the portfolio. This introduces a significant overhead in computational cost for many practical applications, as features are extracted and computed via sampling and evaluating the problem instance at hand, similarly to what the optimization algorithm would perform anyway within its search trajectory. As suggested in Jankovic et al. (EvoAPPs 2021), trajectory-based algorithm selection circumvents the problem of costly feature extraction by computing landscape features from points that a solver sampled and evaluated during the optimization process. Features computed in this manner are used to train algorithm performance regression models, upon which a per-run algorithm selector is then built. In this work, we apply the trajectory-based approach to a portfolio of five algorithms. We study the quality and accuracy of performance regression and algorithm selection models in the scenario of predicting different algorithm performances after a fixed budget of function evaluations. We rely on landscape features of the problem instance computed using one portion of the aforementioned budget of the same function evaluations. Moreover, we consider the possibility of switching between the solvers once, which requires them to be warm-started, i.e., when we switch, the second solver continues the optimization process after being initialized appropriately using the information collected by the first solver. In this new context, we show promising performance of the trajectory-based per-run algorithm selection with warm-starting.
Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback, however, is that these methods are hardly able to recognize visual features of the same object if it is simply rotated or the perspective of the camera changes. To overcome this limitation and at the same time exploit a useful source of supervision, we take into account video object tracks. Following the intuition that two patches in a track should have similar visual representations in a learned feature space, we adopt an unsupervised clustering-based approach and constrain such representations to be labeled as the same category since they likely belong to the same object or object part. Experimental results on two downstream tasks on different datasets demonstrate the effectiveness of our Online Deep Clustering with Video Track Consistency (ODCT) approach compared to prior work, which did not leverage temporal information. In addition, we show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in locations close to where data is captured based on artificial intelligence. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although recently emerged, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey on the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.
Image segmentation is an important component of many image understanding systems. It aims to group pixels in a spatially and perceptually coherent manner. Typically, these algorithms have a collection of parameters that control the degree of over-segmentation produced. It still remains a challenge to properly select such parameters for human-like perceptual grouping. In this work, we exploit the diversity of segments produced by different choices of parameters. We scan the segmentation parameter space and generate a collection of image segmentation hypotheses (from highly over-segmented to under-segmented). These are fed into a cost minimization framework that produces the final segmentation by selecting segments that: (1) better describe the natural contours of the image, and (2) are more stable and persistent among all the segmentation hypotheses. We compare our algorithm's performance with state-of-the-art algorithms, showing that we can achieve improved results. We also show that our framework is robust to the choice of segmentation kernel that produces the initial set of hypotheses.