We consider the point-to-point lossy coding for computing and channel coding problems with two-sided information. We first unify these problems by considering a new generalized problem. Then we develop graph-based characterizations and derive interesting reductions through explicit graph operations, which reduce the number of decision variables. After that, we design alternating optimization algorithms for the unified problems, so that numerical computations for both the source and channel problems are covered. With the help of extra root-finding techniques, proper multiplier update strategies are developed. Thus our algorithms can compute the problems for a given distortion or cost constraint and the convergence can be proved. Also, extra heuristic deflation techniques are introduced which largely reduce the computational time. Numerical results show the accuracy and efficiency of our algorithms.
We study the {PAC} learnability of multiwinner voting, focusing on the class of approval-based committee scoring (ABCS) rules. These are voting rules applied on profiles with approval ballots, where each voter approves some of the candidates. According to ABCS rules, each committee of $k$ candidates collects from each voter a score, which depends on the size of the voter's ballot and on the size of its intersection with the committee. Then, committees of maximum score are the winning ones. Our goal is to learn a target rule (i.e., to learn the corresponding scoring function) using information about the winning committees of a small number of sampled profiles. Despite the existence of exponentially many outcomes compared to single-winner elections, we show that the sample complexity is still low: a polynomial number of samples carries enough information for learning the target rule with high confidence and accuracy. Unfortunately, even simple tasks that need to be solved for learning from these samples are intractable. We prove that deciding whether there exists some ABCS rule that makes a given committee winning in a given profile is a computationally hard problem. Our results extend to the class of sequential Thiele rules, which have received attention recently due to their simplicity.
The emerging theory of graph limits exhibits an analytic perspective on graphs, showing that many important concepts and tools in graph theory and its applications can be described more naturally (and sometimes proved more easily) in analytic language. We extend the theory of graph limits to the ordered setting, presenting a limit object for dense vertex-ordered graphs, which we call an orderon. As a special case, this yields limit objects for matrices whose rows and columns are ordered, and for dynamic graphs that expand (via vertex insertions) over time. Along the way, we devise an ordered locality-preserving variant of the cut distance between ordered graphs, showing that two graphs are close with respect to this distance if and only if they are similar in terms of their ordered subgraph frequencies. We show that the space of orderons is compact with respect to this distance notion, which is key to a successful analysis of combinatorial objects through their limits. We derive several applications of the ordered limit theory in extremal combinatorics, sampling, and property testing in ordered graphs. In particular, we prove a new ordered analogue of the well-known result by Alon and Stav [RS\&A'08] on the furthest graph from a hereditary property; this is the first known result of this type in the ordered setting. Unlike the unordered regime, here the random graph model $G(n, p)$ with an ordering over the vertices is not always asymptotically the furthest from the property for some $p$. However, using our ordered limit theory, we show that random graphs generated by a stochastic block model, where the blocks are consecutive in the vertex ordering, are (approximately) the furthest. Additionally, we describe an alternative analytic proof of the ordered graph removal lemma [Alon et al., FOCS'17].
Whereas Laplacian and modularity based spectral clustering is apt to dense graphs, recent results show that for sparse ones, the non-backtracking spectrum is the best candidate to find assortative clusters of nodes. Here belief propagation in the sparse stochastic block model is derived with arbitrary given model parameters that results in a non-linear system of equations; with linear approximation, the spectrum of the non-backtracking matrix is able to specify the number $k$ of clusters. Then the model parameters themselves can be estimated by the EM algorithm. Bond percolation in the assortative model is considered in the following two senses: the within- and between-cluster edge probabilities decrease with the number of nodes and edges coming into existence in this way are retained with probability $\beta$. As a consequence, the optimal $k$ is the number of the structural real eigenvalues (greater than $\sqrt{c}$, where $c$ is the average degree) of the non-backtracking matrix of the graph. Assuming, these eigenvalues $\mu_1 >\dots > \mu_k$ are distinct, the multiple phase transitions obtained for $\beta$ are $\beta_i =\frac{c}{\mu_i^2}$; further, at $\beta_i$ the number of detectable clusters is $i$, for $i=1,\dots ,k$. Inflation-deflation techniques are also discussed to classify the nodes themselves, which can be the base of the sparse spectral clustering.
Video stabilization refers to the problem of transforming a shaky video into a visually pleasing one. The question of how to strike a good trade-off between visual quality and computational speed has remained one of the open challenges in video stabilization. Inspired by the analogy between wobbly frames and jigsaw puzzles, we propose an iterative optimization-based learning approach using synthetic datasets for video stabilization, which consists of two interacting submodules: motion trajectory smoothing and full-frame outpainting. First, we develop a two-level (coarse-to-fine) stabilizing algorithm based on the probabilistic flow field. The confidence map associated with the estimated optical flow is exploited to guide the search for shared regions through backpropagation. Second, we take a divide-and-conquer approach and propose a novel multiframe fusion strategy to render full-frame stabilized views. An important new insight brought about by our iterative optimization approach is that the target video can be interpreted as the fixed point of nonlinear mapping for video stabilization. We formulate video stabilization as a problem of minimizing the amount of jerkiness in motion trajectories, which guarantees convergence with the help of fixed-point theory. Extensive experimental results are reported to demonstrate the superiority of the proposed approach in terms of computational speed and visual quality. The code will be available on GitHub.
Densest Subgraph Problem (DSP) is an important primitive problem with a wide range of applications, including fraud detection, community detection and DNA motif discovery. Edge-based density is one of the most common metrics in DSP. Although a maximum flow algorithm can exactly solve it in polynomial time, the increasing amount of data and the high complexity of algorithms motivate scientists to find approximation algorithms. Among these, its duality of linear programming derives several iterative algorithms including Greedy++, Frank-Wolfe and FISTA which redistribute edge weights to find the densest subgraph, however, these iterative algorithms vibrate around the optimal solution, which are not satisfactory for fast convergence. We propose our main algorithm Locally Optimal Weight Distribution (LOWD) to distribute the remaining edge weights in a locally optimal operation to converge to the optimal solution monotonically. Theoretically, we show that it will reach the optimal state of a specific linear programming which is called locally-dense decomposition. Besides, we show that it is not necessary to consider most of the edges in the original graph. Therefore, we develop a pruning algorithm using a modified Counting Sort to prune graphs by removing unnecessary edges and nodes, and then we can search the densest subgraph in a much smaller graph.
In a decentralized machine learning system, data is typically partitioned among multiple devices or nodes, each of which trains a local model using its own data. These local models are then shared and combined to create a global model that can make accurate predictions on new data. In this paper, we start exploring the role of the network topology connecting nodes on the performance of a Machine Learning model trained through direct collaboration between nodes. We investigate how different types of topologies impact the "spreading of knowledge", i.e., the ability of nodes to incorporate in their local model the knowledge derived by learning patterns in data available in other nodes across the networks. Specifically, we highlight the different roles in this process of more or less connected nodes (hubs and leaves), as well as that of macroscopic network properties (primarily, degree distribution and modularity). Among others, we show that, while it is known that even weak connectivity among network components is sufficient for information spread, it may not be sufficient for knowledge spread. More intuitively, we also find that hubs have a more significant role than leaves in spreading knowledge, although this manifests itself not only for heavy-tailed distributions but also when "hubs" have only moderately more connections than leaves. Finally, we show that tightly knit communities severely hinder knowledge spread.
We study a portioning setting in which a public resource such as time or money is to be divided among a given set of candidates, and each agent proposes a division of the resource. We consider two families of aggregation rules for this setting - those based on coordinate-wise aggregation and those that optimize some notion of welfare - as well as the recently proposed Independent Markets mechanism. We provide a detailed analysis of these rules from an axiomatic perspective, both for classic axioms, such as strategyproofness and Pareto optimality, and for novel axioms, which aim to capture proportionality in this setting. Our results indicate that a simple rule that computes the average of all proposals satisfies many of our axioms, including some that are violated by more sophisticated rules.
Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neural ODE models that have piecewise affine dynamics, and which can be equivalently interpreted as linear hybrid automata. In this work, we observe that the utility of an abstraction depends on its use: some scenarios might require coarse abstractions that are easier to analyse, whereas others might require more complex, refined abstractions. We therefore consider neural abstractions of alternative shapes, namely either piecewise constant or nonlinear non-polynomial (specifically, obtained via sigmoidal activations). We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with these semantics. Empirically, we demonstrate the trade-off that these different neural abstraction templates have vis-a-vis their precision and synthesis time, as well as the time required for their safety verification (done via reachability computation). We improve existing synthesis techniques to enable abstraction of higher-dimensional models, and additionally discuss the abstraction of complex neural ODEs to improve the efficiency of reachability analysis for these models.
We develop synthetic notions of oracle computability and Turing reducibility in the Calculus of Inductive Constructions (CIC), the constructive type theory underlying the Coq proof assistant. As usual in synthetic approaches, we employ a definition of oracle computations based on meta-level functions rather than object-level models of computation, relying on the fact that in constructive systems such as CIC all definable functions are computable by construction. Such an approach lends itself well to machine-checked proofs, which we carry out in Coq. There is a tension in finding a good synthetic rendering of the higher-order notion of oracle computability. On the one hand, it has to be informative enough to prove central results, ensuring that all notions are faithfully captured. On the other hand, it has to be restricted enough to benefit from axioms for synthetic computability, which usually concern first-order objects. Drawing inspiration from a definition by Andrej Bauer based on continuous functions in the effective topos, we use a notion of sequential continuity to characterise valid oracle computations. As main technical results, we show that Turing reducibility forms an upper semilattice, transports decidability, and is strictly more expressive than truth-table reducibility, and prove that whenever both a predicate $p$ and its complement are semi-decidable relative to an oracle $q$, then $p$ Turing-reduces to $q$.
The crossed random-effects model is widely used in applied statistics, finding applications in various fields such as longitudinal studies, e-commerce, and recommender systems, among others. However, these models encounter scalability challenges, as the computational time grows disproportionately with the number of data points, typically following a cubic root relationship $(N^{(3/2)}$ or worse) with $N$. Our inspiration for addressing this issue comes from observing the recommender system employed by an online clothing retailer. Our dataset comprises over 700,000 clients, 5,000 items, and 5,000,000 measurements. When applying the maximum likelihood approach to fit crossed random effects, computational inefficiency becomes a significant concern, limiting the applicability of this approach in large-scale settings. To tackle the scalability issues, previous research by Ghosh et al. (2022a) and Ghosh et al. (2022b) has explored linear and logistic regression models utilizing fixed-effect features based on client and item variables, while incorporating random intercept terms for clients and items. In this study, we present a more generalized version of the problem, allowing random effect sizes/slopes. This extension enables us to capture the variability in effect size among both clients and items. Importantly, we have developed a scalable solution to address the aforementioned problem and have empirically demonstrated the consistency of our estimates. Specifically, as the number of data points increases, our estimates converge towards the true parameters. To validate our approach, we implement the proposed algorithm using Stitch Fix data.