※ 引述《stimim (qqaa)》之銘言:
: 剛剛和 Leon 討論了一下,發現我們對題目的了解有一點不同
: 原題目:
: Given a document and a query of K words, how do u find the smallest window
: that covers all the words at least once in that document? (given you know the
: inverted lists of all K words, that is, for each word, you have a list of all
: its occurrrences). This one is really hard. Could someone propose an
: algorithm in O(n)?
: 我們可以確定:
: 1. document is given
: 2. 知道我們關心的是哪 K 個字
: 3. 這 K 個字的 occurrence list 也是給定的
: 問題是,n 指的是什麼?
: 1. n = document size (in number of words)
: 在這種情況下, O(n) 是可能的,方法基本上就是 Leon 提出來的。
恕刪.
我覺得那個 N 的定義比較有可能是 document length..
不過下面那個定義, 我想我的方法應該也可以啦.
Here is the modification.
: 2. n = 這 K 個字出現的總次數
: Ex:
: K = 3, 要找出包含 {a, b, c} 的最小 window
: occurrences:
: a = { 1, 1000, 2500}
: b = {400, 1001, 2000}
: c = {500, 1002, 1500}
: document size = 2500 甚至更高 (其他的位置是其他的字)
: n = 3 + 3 + 3 = 9 而不是 2500
: 在這種情況下,已知可以做到 O(n logK) (RockLee 寫的方法)
: 不過,在這種狀況下,給定原本的 document 似乎就沒什麼用了
: 而且在讀入 input 時,
: 讀入原本的 document 會花掉 O(document size) 的時間
: 所以在和 Leon 討論時,他對這種解釋方法有一點懷疑
So, when you build the occurance list from document,
actually you will have scaned this array first..
{1, 400, 500, 1000, .... 2500} which represents {a, b, c, a..}
We need to keep this too. Call it "Show_list"
Then, use my algorithm, the left boundary are choosen sequentally
from "Show_list".
The first step will be, choosing [1] as left,
and use max-min{b, c} as right.
The second step is, drop [1] and use "Show_list(2)" as a new
left boudary.
Now, you drop at most one element in the window.
The need O(1) to find the right boundary and get back this element.
Hope it helps.