Here is an interesting problem given by Muthukrishnan during his
talk in the New Horizons workshop.
Start with an array A of n+1 entries each consisting of an integer
between 1 and n. By the pigeonhole principle there must be some i≠j
and a w such that A(i)=A(j)=w. The goal is to find w. Depending on A
there may be several such w, we want to find any one of them. The
catch is that you only get to use O(log n) bits of memory.
First a warm-up puzzle: Find w using only O(n) queries to entries of
A (remember you only get O(log n) space). Hint: Use pointer chasing.
Now suppose A is streamed, that is we get in order
A(1),A(2),…,A(n+1) and then get another pass etc. How many
passes do you need to find a w?
You can find w in n+1 passes just by trying each possible value for
w. With a little work you can use O(log n) passes doing a binary
search.
Muthukrishnan asks whether the number of passes needed is
Ω(log n) or O(1) or something in between.
"First a warm-up puzzle: Find w using only O(n) queries to entries of A (remember you only get O(log n) space). Hint: Use pointer chasing." Ans: create a linked list kind of strucutre where there are n+1 nodes, a) value at node i, v[i]=A[i] b) pointer from node i to node v[i]. check for loops in the linked list by using pointer chasing, starting from node n+1.
As an aside, note that if exactly one element is duplicated it can be done in one pass using O(log n) space: simply add all the numbers, substract n(n+1)/2 and that is the duplicated number.
you can sum all numbers since sum 1..n = n(n-1)/2 you need around 2 log n = O(log n) bits there the correct answer is sum - n(n-1)/2 Definitly O (log n) bits is enough
in case of stream one pass is enough just sum numbers again :)
just a joke
it would be interestig to solve a problem in general case assume k numbers out of n are duplicated So by suming you get sum a_1 + \cdot a_k = sum - n(n-1)/2
it may happen that you need small amount of memory/computations to find out which exactly numbers are duplivated if you know sum
Yes, but Lance also asked: "Now suppose A is streamed, that is we get in order A(1),A(2),�,A(n+1) and then get another pass etc. How many passes do you need to find a w?"
The binary search method solved this variant of the problem, unless I'm mistaken.
binsearch(a , b) begin -> choose mid = (a + b)/2 -> count no. of elements <, = and > mid, say n1, n2 and n3 -> if n2 >= 2 end -> else if n1 >= n3 binsearch(a , mid -1) -> else binsearch(mid + 1, b) end
Now, just call binsearch(1 , n).
We need to maintain the values of n1, n2 and n3 in each pass and the values of a and b all the time. All of these require O(log n) space.
The following is a partial answer to the problem above. [ I was attending the New Horizons workshop in Kyoto, Japan last week; after Muthu's talk I was discussing the following with Muthu. Jun Tarui (tarui+at-mark+ice.uec.ac.jp)]
Finding duplicates in the stream setting above requires Omega(log n/ loglogn) passes.
Proof Sketch:
Assume n is odd. Consider a restricted class of inputs such that A(1), A(2), ..., A((n+1)/2) are distinct and A((n+1)/2 + 1), ..., A(n+1) are distinct. A solution for this class of inputs with s bits of memory and r stream-passes implies a two-party protocol for the following communication game with r rounds and s communication bits in each round.
Alice and Bob get size-(n+1)/2 subsets A and B of {1,...,n} respectively and the task is to find and agree on some w that is in the intersection of A and B.
This task is equivalent to the monotone Karchmer-Wigderson game for the Majority funtion of n variables (Alice and Bob get a max term and a min term). A protocol with r rounds and s bits per round for this game correnponds to a monotone (i.e. AND/OR) circuit computing the Majority function with depth at most r and fan-in at most 2^s. Hastad's size lower bound for such circuits imply the claimed bound on the number of passes above for s=O(log n).
When n is even, consider the complexity of any function that outputs 1 if the number of 1's is more than n/2 and outputs 0 if it is less than n/2. (end-of-proof-sketch)