Gap-Hamming problem

From KYNNpedia
Revision as of 01:36, 1 February 2023 by imported>Giftlite (→‎History: +early 2000's)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

In communication complexity, the gap-Hamming problem asks, if Alice and Bob are each given a (potentially different) string, what is the minimal number of bits that they need to exchange in order for Alice to approximately compute the Hamming distance between their strings. The solution to the problem roughly states that, if Alice and Bob are each given a string, then any communication protocol used to compute the Hamming distance between their strings does (asymptotically) no better than Bob sending his whole string to Alice. More specifically, if Alice and Bob are each given <math>n</math>-bit strings, there exists no communication protocol that lets Alice compute the hamming distance between their strings to within <math>\pm\sqrt{n}</math> using less than <math>\Omega(n)</math> bits.

The gap-Hamming problem has applications to proving lower bounds for many streaming algorithms, including moment frequency estimation<ref name=":0">Indyk, Piotr; Woodruff, David (2005). "Optimal approximations of the frequency moments of data streams". Proceedings of the thirty-seventh annual ACM symposium on Theory of computing - STOC '05. p. 202. doi:10.1145/1060590.1060621. ISBN 9781581139600. S2CID 7911758.</ref> and entropy estimation.<ref>Chakrabarti, Amit; Cormode, Graham; Mcgregor, Andrew (2010). "A Near-optimal Algorithm for Estimating the Entropy of a Stream". ACM Transactions on Algorithms. 6 (3): 1–21. CiteSeerX 10.1.1.190.5419. doi:10.1145/1798596.1798604. ISSN 1549-6325. S2CID 6733816.</ref>

Formal statement

In this problem, Alice and Bob each receive a string, <math>x \in \{\pm 1\}^n</math> and <math>y \in \{\pm 1\}^n</math>, respectively, while Alice is required to compute the (partial) function,

<math>\operatorname{GHD}(x, y) = \begin{cases}

+1 & D_H(x, y) \ge \frac{n}{2} + \sqrt{n}\\ -1 & D_H(x, y) \le \frac{n}{2} - \sqrt{n}\\

  • & \text{otherwise},

\end{cases}</math>

using the least amount of communication possible. Here, <math>*</math> indicates that Alice can return either of <math>\pm 1</math> and <math>D_H(x, y)</math> is the Hamming distance between <math>x</math> and <math>y</math>. In other words, Alice needs to return whether Bob's string is significantly similar or significantly different from hers while minimizing the number of bits she exchanges with Bob.

The problem's solution states that computing <math>\operatorname{GHD}</math> requires at least <math>\Omega(n)</math> communication. In particular, it requires <math>\Omega(n)</math> communication even when <math>x</math> and <math>y</math> are chosen uniformly at random from <math>\{\pm 1\}^n</math>.

History

The gap-Hamming problem was originally proposed by Indyk and Woodruff in the early 2000's, who initially proved a linear lower bound on the one-way communication complexity of the problem (where Alice is only allowed to receive data from Bob) and conjectured a linear lower bound in the general case.<ref>Indyk, P.; Woodruff, D. (2003). "Tight lower bounds for the distinct elements problem". 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. pp. 283–288. doi:10.1109/SFCS.2003.1238202. ISBN 9780769520407. S2CID 7648045.</ref> The question of the infinite-round case (in which Alice and Bob are allowed to exchange as many messages as desired) remained open until Chakrabarti and Regev proved, via an anti-concentration argument, that the general problem also has linear lower bound complexity, thus settling the original question completely.<ref>Chakrabarti, Amit; Regev, Oded (2011). "An optimal lower bound on the communication complexity of gap-hamming-distance". Proceedings of the 43rd annual ACM symposium on Theory of computing - STOC '11. p. 51. arXiv:1009.3460. doi:10.1145/1993636.1993644. ISBN 9781450306911. S2CID 10274326.</ref> This result was followed by a series of other papers that sought to simplify or find new approaches to proving the desired lower bound, notably first by Vidick<ref>Vidick, Thomas (2012). "A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-Hamming-Distance problem". Chicago Journal of Theoretical Computer Science. 18: 1–12. doi:10.4086/cjtcs.2012.001.</ref> and later by Sherstov,<ref>Sherstov, Alexander A. (2012-05-17). "The Communication Complexity of Gap Hamming Distance". Theory of Computing. 8 (1): 197–208. doi:10.4086/toc.2012.v008a008. ISSN 1557-2862.</ref> and, recently, with an information-theoretic approach by Hadar, Liu, Polyanskiy, and Shayevitz.<ref>Shayevitz, Ofer; Polyanskiy, Yury; Liu, Jingbo; Hadar, Uri (2019-01-25). "Communication Complexity of Estimating Correlations". arXiv:1901.09100v2 [cs.IT].</ref>

References

<references group="" responsive="1"></references>