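First of all, thanks to anyone reading this; the post may be a bit long.
I'm a complete Python beginner, so some of my wording may be imprecise. Sorry in advance if that causes any confusion.
The problem, basically, is this error: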
OverflowError: cannot serialize a bytes object larger than 4 GiB
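For context: as far as I know, this message comes from Python's pickle module, where protocols below 4 cannot store a single bytes object larger than 4 GiB. The sketch below only shows, on a small stand-in payload, how a protocol is passed explicitly. Whether the patents code exposes the pickle protocol at all is an assumption I have not checked.

```python
import pickle

# Small stand-in payload; the real failure needs a bytes object over 4 GiB.
payload = b"x" * 1024

# Protocol 4 (Python 3.4+) adds support for very large objects.
blob = pickle.dumps(payload, protocol=4)
restored = pickle.loads(blob)

# Both values should be True on Python 3.4 or later.
print(pickle.HIGHEST_PROTOCOL >= 4, restored == payload)
```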
*************From the GitHub author, explaining why this happens*****************
Hi, this is a common problem and stems from some of the patents
having a crazily large amount of text in them.
Reduce the size of the sample on which you're running inference.
E.g., instead of 20% (0.2), reduce it to 0.05 to start with and
try ratcheting it up slowly.
*********Conclusion: the patent data is too large
Reference:
https://github.com/google/patents-public-data/issues/16
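The advice in that reply (start at 0.05 instead of 0.2 and raise it slowly) could look roughly like this. It is only a sketch on a made-up list: subsample and patents are hypothetical names of mine, not part of the patents-public-data code, and the real sampling step in that project may work differently.

```python
import random

def subsample(items, fraction, seed=0):
    """Return a random subset holding roughly `fraction` of `items`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one item, but never more than the list holds.
    k = min(len(items), max(1, round(len(items) * fraction)))
    return random.Random(seed).sample(items, k)

# Hypothetical stand-in for the real patent records.
patents = [f"patent-{i}" for i in range(1000)]

# Start small and ratchet the fraction up while it still fits in memory.
for fraction in (0.05, 0.1, 0.2):
    subset = subsample(patents, fraction)
    print(fraction, len(subset))
```

A fixed seed keeps the subset reproducible between runs, which makes it easier to tell whether a failure comes from the sample size or from a particular record.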
*****So how do I split up the data?
He loads all the files into something called td. When I type td in Python, all it shows is:
<train_data.LandscapeTrainingDataUtil at 0x1369595c0>
I have no idea how to split it, and I don't even know what it looks like inside....
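One general way to see what is inside an object like td is to use the built-ins type(), dir() and vars(). The sketch below runs them on a made-up class, because I don't know what attributes the real LandscapeTrainingDataUtil has. FakeTrainingData, texts, labels and describe are all hypothetical names for illustration.

```python
class FakeTrainingData:
    """Hypothetical stand-in for td; the real class's attributes are unknown here."""

    def __init__(self):
        self.texts = ["claim one", "claim two", "claim three"]
        self.labels = [1, 0, 1]

def describe(obj):
    """Map each instance attribute name to (type name, length or None)."""
    summary = {}
    # vars() needs the object to have a __dict__, which ordinary classes do.
    for name, value in vars(obj).items():
        length = len(value) if hasattr(value, "__len__") else None
        summary[name] = (type(value).__name__, length)
    return summary

td = FakeTrainingData()
print(type(td).__name__)                                   # class name
print([n for n in dir(td) if not n.startswith("_")])       # public attributes and methods
print(describe(td))                                        # what each attribute holds
```

Once it is clear which attribute actually holds the patent records, that attribute is the thing to cut down, rather than td itself.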