With the rapid popularization of big data, the dichotomy between tractable and intractable problems in big data computing has been shifted. Sublinear time, rather than polynomial time, has recently been regarded as the new standard of tractability in big data computing. This change brings the demand for new methodologies in computational complexity theory in the context of big data. Based on the prior work for sublinear-time complexity classes \cite{DBLP:journals/tcs/GaoLML20}, this paper focuses on sublinear-time reductions specialized for problems in big data computing. First, the pseudo-sublinear-time reduction is proposed and the complexity classes \Pproblem and \PsT are proved to be closed under it. To establish \PsT-intractability for certain problems in \Pproblem, we find the first problem in $\Pproblem \setminus \PsT$. Using the pseudo-sublinear-time reduction, we prove that the nearest edge query is in \PsT but the algebraic equation root problem is not. Then, the pseudo-polylog-time reduction is introduced and the complexity class \PsPL is proved to be closed under it. The \PsT-completeness under it is regarded as an evidence that some problems can not be solved in polylogarithmic time after a polynomial-time preprocessing, unless \PsT = \PsPL. We prove that all \PsT-complete problems are also \Pproblem-complete, which gives a further direction for identifying \PsT-complete problems.
In this paper, we aim at unifying, simplifying and improving the convergence rate analysis of Lagrangian-based methods for convex optimization problems. We first introduce the notion of nice primal algorithmic map, which plays a central role in the unification and in the simplification of the analysis of most Lagrangian-based methods. Equipped with a nice primal algorithmic map, we then introduce a versatile generic scheme, which allows for the design and analysis of Faster LAGrangian (FLAG) methods with new provably sublinear rate of convergence expressed in terms of function values and feasibility violation of the original (non-ergodic) generated sequence. To demonstrate the power and versatility of our approach and results, we show that most well-known iconic Lagrangian-based schemes admit a nice primal algorithmic map, and hence share the new faster rate of convergence results within their corresponding FLAG.
Time series (TS) are present in many fields of knowledge, research, and engineering. The processing and analysis of TS are essential in order to extract knowledge from the data and to tackle forecasting or predictive maintenance tasks among others The modeling of TS is a challenging task, requiring high statistical expertise as well as outstanding knowledge about the application of Data Mining(DM) and Machine Learning (ML) methods. The overall work with TS is not limited to the linear application of several techniques, but is composed of an open workflow of methods and tests. These workflow, developed mainly on programming languages, are complicated to execute and run effectively on different systems, including Cloud Computing (CC) environments. The adoption of CC can facilitate the integration and portability of services allowing to adopt solutions towards services Internet Technologies (IT) industrialization. The definition and description of workflow services for TS open up a new set of possibilities regarding the reduction of complexity in the deployment of this type of issues in CC environments. In this sense, we have designed an effective proposal based on semantic modeling (or vocabulary) that provides the full description of workflow for Time Series modeling as a CC service. Our proposal includes a broad spectrum of the most extended operations, accommodating any workflow applied to classification, regression, or clustering problems for Time Series, as well as including evaluation measures, information, tests, or machine learning algorithms among others.
Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-Lojasciewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
In this work we demonstrate that SVD-based model reduction techniques known for ordinary differential equations, such as the proper orthogonal decomposition, can be extended to stochastic differential equations in order to reduce the computational cost arising from both the high dimension of the considered stochastic system and the large number of independent Monte Carlo runs. We also extend the proper symplectic decomposition method to stochastic Hamiltonian systems, both with and without external forcing, and argue that preserving the underlying symplectic or variational structures results in more accurate and stable solutions that conserve energy better than when the non-geometric approach is used. We validate our proposed techniques with numerical experiments for a semi-discretization of the stochastic nonlinear Schr\"odinger equation and the Kubo oscillator.
Detailed dynamical systems models used in life sciences may include dozens or even hundreds of state variables. Models of large dimension are not only harder from the numerical perspective (e.g., for parameter estimation or simulation), but it is also becoming challenging to derive mechanistic insights from such models. Exact model reduction is a way to address this issue by finding a self-consistent lower-dimensional projection of the corresponding dynamical system. A recent algorithm CLUE allows one to construct an exact linear reduction of the smallest possible dimension such that the fixed variables of interest are preserved. However, CLUE is restricted to systems with polynomial dynamics. Since rational dynamics occurs frequently in the life sciences (e.g., Michaelis-Menten or Hill kinetics), it is desirable to extend CLUE to the models with rational dynamics. In this paper, we present an extension of CLUE to the case of rational dynamics and demonstrate its applicability on examples from literature. Our implementation is available in version 1.5 of CLUE at //github.com/pogudingleb/CLUE.
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (aka bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower rate of convergence $O((G/T)^{2/3})$ compared with their uncompressed counterpart, where $G$ measures the data heterogeneity across different clients, and $T$ is the number of communication rounds. This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of $O(1/T)$. This significantly improves over the state-of-the-art rate, by matching the rate without compression even under arbitrary data heterogeneity. Numerical experiments are also provided to corroborate our theory and confirm the practical superiority of BEER in the data heterogeneous regime.
We consider the model selection problem for a large class of time series models, including, multivariate count processes, causal processes with exogenous covariates. A procedure based on a general penalized contrast is proposed. Some asymptotic results for weak and strong consistency are established. The non consistency issue is addressed, and a class of penalty term, that does not ensure consistency is provided. Examples of continuous valued and multivariate count autoregressive time series are considered.
We present an algorithm for the maximum matching problem in dynamic (insertion-deletions) streams with *asymptotically optimal* space complexity: for any $n$-vertex graph, our algorithm with high probability outputs an $\alpha$-approximate matching in a single pass using $O(n^2/\alpha^3)$ bits of space. A long line of work on the dynamic streaming matching problem has reduced the gap between space upper and lower bounds first to $n^{o(1)}$ factors [Assadi-Khanna-Li-Yaroslavtsev; SODA 2016] and subsequently to $\text{polylog}{(n)}$ factors [Dark-Konrad; CCC 2020]. Our upper bound now matches the Dark-Konrad lower bound up to $O(1)$ factors, thus completing this research direction. Our approach consists of two main steps: we first (provably) identify a family of graphs, similar to the instances used in prior work to establish the lower bounds for this problem, as the only "hard" instances to focus on. These graphs include an induced subgraph which is both sparse and contains a large matching. We then design a dynamic streaming algorithm for this family of graphs which is more efficient than prior work. The key to this efficiency is a novel sketching method, which bypasses the typical loss of $\text{polylog}{(n)}$-factors in space compared to standard $L_0$-sampling primitives, and can be of independent interest in designing optimal algorithms for other streaming problems.
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
This work considers the problem of provably optimal reinforcement learning for episodic finite horizon MDPs, i.e. how an agent learns to maximize his/her long term reward in an uncertain environment. The main contribution is in providing a novel algorithm --- Variance-reduced Upper Confidence Q-learning (vUCQ) --- which enjoys a regret bound of $\widetilde{O}(\sqrt{HSAT} + H^5SA)$, where the $T$ is the number of time steps the agent acts in the MDP, $S$ is the number of states, $A$ is the number of actions, and $H$ is the (episodic) horizon time. This is the first regret bound that is both sub-linear in the model size and asymptotically optimal. The algorithm is sub-linear in that the time to achieve $\epsilon$-average regret for any constant $\epsilon$ is $O(SA)$, which is a number of samples that is far less than that required to learn any non-trivial estimate of the transition model (the transition model is specified by $O(S^2A)$ parameters). The importance of sub-linear algorithms is largely the motivation for algorithms such as $Q$-learning and other "model free" approaches. vUCQ algorithm also enjoys minimax optimal regret in the long run, matching the $\Omega(\sqrt{HSAT})$ lower bound. Variance-reduced Upper Confidence Q-learning (vUCQ) is a successive refinement method in which the algorithm reduces the variance in $Q$-value estimates and couples this estimation scheme with an upper confidence based algorithm. Technically, the coupling of both of these techniques is what leads to the algorithm enjoying both the sub-linear regret property and the asymptotically optimal regret.