
We study the problem of efficiently computing on encoded data. More specifically, we study the question of low-bandwidth computation of functions $F:\mathbb{F}^k \to \mathbb{F}$ of some data $x \in \mathbb{F}^k$, given access to an encoding $c \in \mathbb{F}^n$ of $x$ under an error correcting code. In our model -- relevant in distributed storage, distributed computation and secret sharing -- each symbol of $c$ is held by a different party, and we aim to minimize the total amount of information downloaded from each party in order to compute $F(x)$. Special cases of this problem have arisen in several domains, and we believe that it is fruitful to study this problem in generality. Our main result is a low-bandwidth scheme to compute linear functions for Reed-Solomon codes, even in the presence of erasures. More precisely, let $\epsilon > 0$ and let $\mathcal{C}: \mathbb{F}^k \to \mathbb{F}^n$ be a full-length Reed-Solomon code of rate $1 - \epsilon$ over a field $\mathbb{F}$ with constant characteristic. For any $\gamma \in [0, \epsilon)$, our scheme can compute any linear function $F(x)$ given access to any $(1 - \gamma)$-fraction of the symbols of $\mathcal{C}(x)$, with download bandwidth $O(n/(\epsilon - \gamma))$ bits. In contrast, the naive scheme that involves reconstructing the data $x$ and then computing $F(x)$ uses $\Theta(n \log n)$ bits. Our scheme has applications in distributed storage, coded computation, and homomorphic secret sharing.
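
To make the bandwidth comparison concrete, here is a minimal sketch of the naive baseline only, over a small prime field for illustration (the result above concerns full-length codes over fields of constant characteristic, and the low-bandwidth scheme itself is not reproduced here): download any $k$ full symbols, reconstruct $x$ by interpolation, and then evaluate the linear function. Each symbol costs about $\log |\mathbb{F}|$ bits, which is where the $\Theta(n \log n)$ figure comes from at constant rate. All parameter choices and function names below are illustrative.

    p = 257                       # small prime field, for illustration only
    k, n = 4, 8                   # toy parameters; a full-length rate-(1 - eps) code has n close to |F|

    def rs_encode(x, n, p):
        """Evaluate the message polynomial x[0] + x[1] t + ... at t = 0..n-1 (mod p)."""
        return [sum(xi * pow(t, i, p) for i, xi in enumerate(x)) % p for t in range(n)]

    def naive_linear_function(a, symbols, p):
        """Naive baseline: download any k = len(a) full symbols, reconstruct x by solving
        the k x k Vandermonde system over GF(p), then return <a, x> mod p.
        Download cost: k symbols of log2(p) bits each, i.e. Theta(n log n) bits at constant rate."""
        k = len(a)
        pts = sorted(symbols)[:k]                                  # any k surviving evaluation points
        M = [[pow(t, j, p) for j in range(k)] + [symbols[t]] for t in pts]
        for col in range(k):                                       # Gauss-Jordan elimination mod p
            piv = next(r for r in range(col, k) if M[r][col] % p)
            M[col], M[piv] = M[piv], M[col]
            inv = pow(M[col][col], -1, p)
            M[col] = [v * inv % p for v in M[col]]
            for r in range(k):
                if r != col and M[r][col]:
                    f = M[r][col]
                    M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(k + 1)]
        x = [M[r][k] for r in range(k)]
        return sum(ai * xi for ai, xi in zip(a, x)) % p

    x = [3, 1, 4, 1]                                               # the data
    received = {t: c_t for t, c_t in enumerate(rs_encode(x, n, p)) if t != 5}   # one erased symbol
    a = [2, 0, 5, 7]                                               # coefficients of the linear function F
    assert naive_linear_function(a, received, p) == sum(ai * xi for ai, xi in zip(a, x)) % p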

Related content

Synthetic polymer-based storage seems to be a particularly promising candidate that could help to cope with the ever-increasing demand for archival storage. It involves designing molecules of distinct masses to represent the respective bits $\{0,1\}$, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. Reading out the stored data requires a tandem mass spectrometer, which fragments the polymer into shorter substrings and provides their corresponding masses, from which the \emph{composition}, i.e., the number of $1$s and $0$s in the concerned substring, can be inferred. Prior works have dealt with the problem of unique string reconstruction from the set of all possible compositions, called the \emph{composition multiset}. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints that facilitate it for all string lengths. Additionally, error-correcting schemes to deal with substitution errors caused by imprecise fragmentation during the readout process have also been suggested. This work builds on this research by generalizing previously considered error models, which were mainly confined to substitutions of compositions. To this end, we define new error models that consider insertions of spurious compositions and deletions of existing ones, thereby corrupting the composition multiset. We analyze whether the reconstruction codebook proposed by Pattabiraman \emph{et al.} is indeed robust to such errors, and if not, propose new coding constraints to remedy this.
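
As a point of reference, a minimal sketch of the object being reconstructed, assuming the composition multiset collects the $(\#0,\#1)$ composition of every contiguous substring (the exact convention varies across this line of work); it also checks the easy fact that a string and its reversal always produce the same multiset, so any reconstruction guarantee can only hold up to reversal.

    from collections import Counter
    from itertools import combinations

    def composition_multiset(s):
        """Multiset of substring compositions: for every contiguous substring of the
        binary string s, record (number of 0s, number of 1s) -- the information a
        tandem mass spectrometer readout would ideally provide (masses -> compositions)."""
        comps = Counter()
        for i, j in combinations(range(len(s) + 1), 2):   # all substrings s[i:j]
            sub = s[i:j]
            comps[(sub.count('0'), sub.count('1'))] += 1
        return comps

    s = '01101'
    print(composition_multiset(s))
    assert composition_multiset(s) == composition_multiset(s[::-1])   # invariant under reversal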

We establish several Schur-convexity type results under fixed variance for weighted sums of independent gamma random variables and obtain nonasymptotic bounds on their R\'enyi entropies. In particular, this pertains to the recent results by Bartczak-Nayar-Zwara as well as Bobkov-Naumov-Ulyanov, offering simple proofs of the former and extending the latter.
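
For reference, the quantities involved are standard (these definitions are not specific to the paper): the R\'enyi entropy of order $\alpha$ of a random variable $X$ with density $f$ is
\[
h_\alpha(X) = \frac{1}{1-\alpha}\,\log \int_{\mathbb{R}} f(x)^{\alpha}\,dx , \qquad \alpha \in (0,1)\cup(1,\infty),
\]
recovering the Shannon entropy $h(X) = -\int f \log f$ as $\alpha \to 1$; a function $\Phi$ of the weight vector is Schur-convex if $\Phi(a) \ge \Phi(b)$ whenever $a$ majorizes $b$, and here the comparisons are made among weight vectors for which the weighted sum has a fixed variance.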

In many real-life situations, we are faced with combinatorial problems under linear cost functions. In this paper, we propose a fast method for exactly enumerating all lower-cost solutions of various combinatorial problems, even when their number is very large. Our method is based on backtracking over a given decision diagram that represents the feasible solutions of a problem. The main idea is to memoize intervals of cost bounds to avoid duplicate search during backtracking. Although existing state-of-the-art methods based on dynamic programming run in pseudo-polynomial time in the total cost values, they may take exponential time when the cost values become large. In contrast, the computation time of the proposed method does not directly depend on the total cost values, but is bounded by the input and output sizes of the decision diagrams. Therefore, it can be much faster than existing methods when the cost values are large but the output decision diagrams are well compressed. Our experimental results show that, for some practical instances of the Hamiltonian path problem, we succeeded in exactly enumerating billions of lower-cost solutions in a few seconds, which is a hundred or more times faster than existing methods. Our method can be regarded as a novel search algorithm that integrates two classical techniques, branch-and-bound and dynamic programming. It should find applications in various fields, including operations research, data mining, statistical testing, and hardware/software system design.
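
As a simplified illustration of the idea (not the interval-memoization technique of the paper), the sketch below enumerates all solutions of a toy binary decision diagram whose total cost stays within a budget, pruning the backtracking with a memoized minimum-completion cost per node; this is the basic branch-and-bound-plus-dynamic-programming combination alluded to above. The node layout and field names are illustrative assumptions.

    # A decision-diagram node: following `hi` takes variable `var`, following `lo` skips it.
    # Terminals: 'T' (accept) and 'F' (reject).
    class Node:
        def __init__(self, var, lo, hi):
            self.var, self.lo, self.hi = var, lo, hi

    T, F = 'T', 'F'
    cost = {'a': 5, 'b': 3, 'c': 4}        # cost incurred when a variable is taken (hi edge)

    # Tiny diagram whose accepted sets are {a, b}, {a, c} and {b, c}.
    n_c = Node('c', F, T)
    n_b1 = Node('b', n_c, T)               # a was taken: need b, or else c
    n_b2 = Node('b', F, n_c)               # a was skipped: need both b and c
    root = Node('a', n_b2, n_b1)

    memo_min = {}                          # dynamic programming: cheapest completion below a node

    def min_cost(node):
        if node == T:
            return 0
        if node == F:
            return float('inf')
        if node not in memo_min:
            memo_min[node] = min(min_cost(node.lo), cost[node.var] + min_cost(node.hi))
        return memo_min[node]

    def enumerate_within(node, budget, chosen, out):
        """Backtracking with pruning: collect every accepted set whose total cost is <= budget."""
        if min_cost(node) > budget:        # branch-and-bound style pruning
            return
        if node == T:
            out.append(list(chosen))
            return
        if node == F:
            return
        enumerate_within(node.lo, budget, chosen, out)
        chosen.append(node.var)
        enumerate_within(node.hi, budget - cost[node.var], chosen, out)
        chosen.pop()

    solutions = []
    enumerate_within(root, 8, [], solutions)
    print(solutions)                       # [['b', 'c'], ['a', 'b']]: only the sets of cost <= 8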

The $k$-truss model is a typical cohesive subgraph model and has received considerable attention recently. However, the $k$-truss model only considers the direct common neighbors of an edge, which restricts its ability to reveal fine-grained structural information of the graph. Motivated by this, in this paper, we propose a new model named $(k, \tau)$-truss that considers the higher-order neighborhood ($\tau$-hop) information of an edge. Based on the $(k, \tau)$-truss model, we study the higher-order truss decomposition problem, which computes the $(k, \tau)$-trusses for all possible $k$ values with respect to a given $\tau$. Higher-order truss decomposition can be used in applications such as community detection and search, hierarchical structure analysis, and graph visualization. To address this problem, we first propose a bottom-up decomposition paradigm that computes the corresponding $(k, \tau)$-trusses in increasing order of $k$. Based on this paradigm, we further devise three optimization strategies to reduce unnecessary computation. We evaluate our algorithms on real and synthetic datasets; the experimental results demonstrate the efficiency, effectiveness, and scalability of the proposed algorithms.
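
For orientation, a minimal sketch of the classical peeling-style truss decomposition for the ordinary case $\tau = 1$ (direct common neighbors only); the $(k, \tau)$ generalization and the paper's optimization strategies are not reproduced here.

    from collections import defaultdict

    def truss_decomposition(edges):
        """Classical k-truss decomposition (tau = 1): the trussness of an edge is the
        largest k such that the edge survives when we repeatedly delete every edge
        contained in fewer than k - 2 triangles."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)
        key = lambda u, v: (u, v) if u < v else (v, u)
        support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}   # triangles per edge
        trussness = {}
        while support:
            e, s = min(support.items(), key=lambda kv: kv[1])           # peel lowest-support edge
            u, v = e
            trussness[e] = s + 2
            for w in adj[u] & adj[v]:                                   # triangles through e lose support
                for f in (key(u, w), key(v, w)):
                    support[f] = max(support[f] - 1, s)                 # trussness never drops below the current level
            del support[e]
            adj[u].discard(v); adj[v].discard(u)
        return trussness

    # Example: a 4-clique plus a pendant edge.
    edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
    print(truss_decomposition(edges))   # clique edges get trussness 4, edge (4, 5) gets 2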

Empirical observations of high-dimensional phenomena, such as double descent, have attracted a lot of interest in understanding classical techniques such as kernel methods and their implications for explaining the generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e., proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model in which the relationship between the input and the response can be highly nonlinear, we show that linear models are in fact optimal, i.e., linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex data models than independent features are needed for high-dimensional analysis.
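
A small simulation one could run to probe this claim numerically (it is not an experiment from the paper): compare kernel ridge regression against plain ridge regression while $n$ and $d$ grow at a fixed ratio. The data model, kernel, and hyperparameters below are illustrative assumptions.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    def test_risk(n, d, n_test=2000, noise=0.5):
        """Nonlinear target on independent covariates; compare kernel ridge vs ridge."""
        def target(X):                       # illustrative nonlinear link
            z = X @ np.ones(d) / np.sqrt(d)
            return z + 0.5 * np.tanh(z)
        Xtr = rng.standard_normal((n, d)); ytr = target(Xtr) + noise * rng.standard_normal(n)
        Xte = rng.standard_normal((n_test, d)); yte = target(Xte)
        krr = KernelRidge(kernel='rbf', gamma=1.0 / d, alpha=1e-2).fit(Xtr, ytr)
        lin = Ridge(alpha=1e-2).fit(Xtr, ytr)
        return (np.mean((krr.predict(Xte) - yte) ** 2),
                np.mean((lin.predict(Xte) - yte) ** 2))

    for n in (200, 400, 800):                # proportional regime: d / n fixed at 1/2
        print(n, test_risk(n, n // 2))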

We study the problem of {\sl certification}: given queries to a function $f : \{0,1\}^n \to \{0,1\}$ with certificate complexity $\le k$ and an input $x^\star$, output a size-$k$ certificate for $f$'s value on $x^\star$. This abstractly models a central problem in explainable machine learning, where we think of $f$ as a black-box model whose predictions we seek to explain. For monotone functions, a classic local search algorithm of Angluin accomplishes this task with $n$ queries, which we show is optimal for local search algorithms. Our main result is a new algorithm for certifying monotone functions with $O(k^8 \log n)$ queries, which comes close to matching the information-theoretic lower bound of $\Omega(k \log n)$. The design and analysis of our algorithm are based on a new connection to threshold phenomena in monotone functions. We further prove exponential-in-$k$ lower bounds when $f$ is non-monotone, and when $f$ is monotone but the algorithm is only given random examples of $f$. These lower bounds show that assumptions on the structure of $f$ and query access to it are both necessary for the polynomial dependence on $k$ that we achieve.
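
A minimal sketch of the Angluin-style local search baseline mentioned above (roughly $n$ queries), not the $O(k^8 \log n)$-query algorithm of the paper: for a monotone $f$ with $f(x^\star) = 1$, greedily flip $1$-coordinates of $x^\star$ to $0$ as long as the value stays $1$; by monotonicity, the surviving $1$-coordinates form a certificate.

    def certify_monotone(f, x):
        """Local search certification: for monotone f: {0,1}^n -> {0,1} with f(x) = 1,
        return a set S of coordinates (all equal to 1 in x) such that every input that
        agrees with x on S is a 1-input. Uses at most n queries to f.
        (The f(x) = 0 case is symmetric: flip 0-coordinates to 1 instead.)"""
        y = list(x)
        assert f(y) == 1
        for i in range(len(y)):
            if y[i] == 1:
                y[i] = 0
                if f(y) == 0:          # coordinate i is needed: restore it
                    y[i] = 1
        return {i for i, b in enumerate(y) if b == 1}

    # Example: a monotone 2-out-of-4 threshold function.
    f = lambda z: int(sum(z) >= 2)
    print(certify_monotone(f, [1, 1, 0, 1]))   # a size-2 certificate: {1, 3}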

A \emph{general branch-and-bound tree} is a branch-and-bound tree which is allowed to use general disjunctions of the form $\pi^{\top} x \leq \pi_0 \,\vee\, \pi^{\top}x \geq \pi_0 + 1$, where $\pi$ is an integer vector and $\pi_0$ is an integer scalar, to create child nodes. We construct a packing instance, a set covering instance, and a Traveling Salesman Problem instance such that any general branch-and-bound tree that solves these instances must have exponential size. We also verify that an exponential lower bound on the size of general branch-and-bound trees persists when Gaussian noise is added to the coefficients of the cross polytope, thus showing that a polynomial-size "smoothed analysis" upper bound is not possible. The results in this paper can be viewed as the branch-and-bound analog of the seminal paper by Chv\'atal et al. \cite{chvatal1989cutting}, who proved lower bounds for the Chv\'atal-Gomory rank.
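
For concreteness (an illustrative example, not one of the paper's constructions), a single general disjunction with $\pi = (1, 1, -1)$ and $\pi_0 = 0$ splits the relaxation $P$ at a node into the two children
\[
P \cap \{x : x_1 + x_2 - x_3 \le 0\} \qquad\text{and}\qquad P \cap \{x : x_1 + x_2 - x_3 \ge 1\},
\]
which together contain every integer point of $P$, since $\pi^{\top}x$ is an integer whenever $x$ is; ordinary variable branching is the special case $\pi = e_i$.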

We present a fast direct solver for boundary integral equations on complex surfaces in three dimensions, using an extension of the recently introduced strong recursive skeletonization scheme. For problems that are not highly oscillatory, our algorithm computes an ${LU}$-like hierarchical factorization of the dense system matrix, permitting application of the inverse in $O(N)$ time, where $N$ is the number of unknowns on the surface. The factorization itself also scales linearly with the system size, albeit with a somewhat larger constant. The scheme is built on a level-restricted, adaptive octree data structure and therefore it is compatible with highly nonuniform discretizations. Furthermore, the scheme is coupled with high-order accurate locally-corrected Nystr\"om quadrature methods to integrate the singular and weakly-singular Green's functions used in the integral representations. Our method has immediate application to a variety of problems in computational physics. We concentrate here on studying its performance in acoustic scattering (governed by the Helmholtz equation) at low to moderate frequencies.
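
A minimal illustration of the basic building block of such solvers (compressing a well-separated off-diagonal interaction block with an interpolative decomposition), not the strong recursive skeletonization factorization itself; the Laplace-type kernel, geometry, and tolerance are illustrative assumptions, used in place of the Helmholtz kernel purely to keep the sketch real-valued.

    import numpy as np
    import scipy.linalg.interpolative as sli

    rng = np.random.default_rng(0)

    # Two well-separated point clusters (stand-ins for distant surface patches).
    src = 0.1 * rng.standard_normal((400, 3))
    trg = 0.1 * rng.standard_normal((300, 3)) + np.array([5.0, 0.0, 0.0])

    def laplace_kernel(X, Y):
        """Free-space Laplace Green's function 1 / (4 pi r) between two point sets."""
        r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        return 1.0 / (4.0 * np.pi * r)

    A = laplace_kernel(trg, src)                       # off-diagonal interaction block
    rank, idx, proj = sli.interp_decomp(A, 1e-8)       # interpolative decomposition to tolerance 1e-8
    skel = sli.reconstruct_skel_matrix(A, rank, idx)   # A restricted to its skeleton columns
    interp = sli.reconstruct_interp_matrix(idx, proj)  # maps skeleton columns back to all columns
    print(rank, np.linalg.norm(A - skel @ interp) / np.linalg.norm(A))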

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
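
For orientation (this is not EB-SSP, which additionally skews the empirical transitions and perturbs the empirical costs with exploration bonuses), a minimal sketch of the value-iteration scheme such algorithms run on a known or estimated SSP model: the goal state's value is pinned to zero and every other state takes the cost-minimizing action.

    import numpy as np

    def ssp_value_iteration(P, c, goal, iters=1000, tol=1e-8):
        """Value iteration for a stochastic shortest path instance.
        P[s, a, s'] : transition probabilities,  c[s, a] : nonnegative immediate costs,
        goal        : index of the absorbing zero-cost goal state.
        Returns V (expected cost-to-go) and a greedy policy."""
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(iters):
            Q = c + P @ V                 # Q[s, a] = c(s, a) + sum_s' P(s'|s, a) V(s')
            V_new = Q.min(axis=1)
            V_new[goal] = 0.0             # reaching the goal ends cost accumulation
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        return V, Q.argmin(axis=1)

    # Toy 3-state chain: state 2 is the goal; the second action moves toward it more reliably.
    P = np.array([[[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]],
                  [[0.1, 0.6, 0.3], [0.0, 0.3, 0.7]],
                  [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
    c = np.array([[1.0, 1.5], [1.0, 1.5], [0.0, 0.0]])
    V, pi = ssp_value_iteration(P, c, goal=2)
    print(V, pi)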

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
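
The weighting itself is a one-liner; below is a minimal sketch of the class-balanced weighting applied to a softmax cross-entropy in plain numpy. The normalization convention and hyperparameter values are illustrative, and the paper pairs these weights with several loss variants beyond cross-entropy.

    import numpy as np

    def class_balanced_weights(samples_per_class, beta=0.9999):
        """Weights proportional to the inverse effective number of samples,
        E_n = (1 - beta**n) / (1 - beta), normalized here to sum to the number of classes."""
        samples_per_class = np.asarray(samples_per_class, dtype=float)
        effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
        weights = 1.0 / effective_num
        return weights * len(samples_per_class) / weights.sum()

    def class_balanced_cross_entropy(logits, labels, weights):
        """Per-class-weighted softmax cross-entropy."""
        logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -(weights[labels] * log_probs[np.arange(len(labels)), labels]).mean()

    # Long-tailed toy example: class 0 has 10,000 samples, class 2 only 10.
    w = class_balanced_weights([10000, 500, 10], beta=0.999)
    print(w)                                                      # rare classes get larger weights
    logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 1.5]])
    labels = np.array([0, 2])
    print(class_balanced_cross_entropy(logits, labels, w))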
