[Question] Scraping data from a web page's source - iostream - PTT (批踢踢實業坊)

Author: iostream (徹底的覺醒) 2015-05-05 21:30:28

I want to grab a particular value from a web page,
but the page source contains many identical tags.
How can I get the value of the N-th tag?
For example:
<td align="center" bgcolor="#FFFfff" nowrap>100</td>
<td align="center" bgcolor="#FFFfff" nowrap>200</td>
<td align="center" bgcolor="#FFFfff" nowrap>300</td>
<td align="center" bgcolor="#FFFfff" nowrap>400</td>
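With search I only ever get the first value, "100":
import re
import urllib.request
opener = urllib.request.build_opener()
number = re.compile(r'nowrap>(.+)</td>.*', re.I | re.U | re.M)
content = opener.open('http://www.xxx.com.tw').read().decode('utf-8')  # assumes the page is UTF-8
value = number.search(content).groups()[0]  # search() stops at the first match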
What should I change? Or is there a better function for this?
Thanks

Author: dritchie (卍~邁斯納效應~卍) 2015-05-05 23:44:00

re.findall
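A minimal sketch of the re.findall suggestion: unlike search(), findall() returns every captured group as a list, so the N-th tag is just an index into that list. It assumes the page has already been downloaded and decoded into a string (the html sample is the snippet from the question), and nth_cell is a hypothetical helper. The group is made non-greedy, (.+?), so that if the cells happen to sit on a single line one match does not swallow several cells.

```python
import re

# The snippet from the question, standing in for the downloaded page.
html = """
<td align="center" bgcolor="#FFFfff" nowrap>100</td>
<td align="center" bgcolor="#FFFfff" nowrap>200</td>
<td align="center" bgcolor="#FFFfff" nowrap>300</td>
<td align="center" bgcolor="#FFFfff" nowrap>400</td>
"""

# Non-greedy group: stops at the first </td> after each "nowrap>".
pattern = re.compile(r'nowrap>(.+?)</td>', re.I)


def nth_cell(text, n):
    """Hypothetical helper: return the n-th (0-based) value, or None if absent."""
    values = pattern.findall(text)  # one string per match (single capture group)
    return values[n] if 0 <= n < len(values) else None


print(pattern.findall(html))  # ['100', '200', '300', '400']
print(nth_cell(html, 2))      # 300
```

If only one value is needed and the page is large, re.finditer can be used the same way without building the whole list.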

Author: phate334 (阿賢) 2015-05-06 14:26:00

Take a look at BeautifulSoup.
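A rough sketch of the BeautifulSoup route. It assumes the bs4 package is installed and uses Python's built-in "html.parser" backend. Instead of matching text with a regex, the page is parsed into tags; find_all returns every matching tag in document order, so the N-th tag is again a list index. The html string is the snippet from the question.

```python
from bs4 import BeautifulSoup

# The snippet from the question, standing in for the downloaded page.
html = """
<td align="center" bgcolor="#FFFfff" nowrap>100</td>
<td align="center" bgcolor="#FFFfff" nowrap>200</td>
<td align="center" bgcolor="#FFFfff" nowrap>300</td>
<td align="center" bgcolor="#FFFfff" nowrap>400</td>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <td> tag, in the order it appears in the source.
cells = soup.find_all("td")

# Text inside each cell, with surrounding whitespace stripped.
values = [td.get_text(strip=True) for td in cells]

print(values)     # ['100', '200', '300', '400']
print(values[2])  # 300  (0-based, so index 2 is the third cell)
```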

Author: ug945 (ug945) 2015-05-06 14:28:00

lxml
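A sketch with lxml, assuming the lxml package is installed. Here XPath can select the N-th cell directly. XPath positions start at 1, so (//td)[3] is the third cell in the document. For the example, the question's cells are wrapped in a minimal page with a table. That wrapper is an assumption, since a real page would already have its own structure.

```python
from lxml import html

# The question's cells wrapped in a minimal page (assumed structure).
page = """<html><body><table><tr>
<td align="center" bgcolor="#FFFfff" nowrap>100</td>
<td align="center" bgcolor="#FFFfff" nowrap>200</td>
<td align="center" bgcolor="#FFFfff" nowrap>300</td>
<td align="center" bgcolor="#FFFfff" nowrap>400</td>
</tr></table></body></html>"""

doc = html.fromstring(page)

# Text of every cell, in document order.
cells = [str(t) for t in doc.xpath('//td/text()')]

# XPath is 1-based: (//td)[3] is the third <td>; string() gives its text.
third = str(doc.xpath('string((//td)[3])'))

print(cells)  # ['100', '200', '300', '400']
print(third)  # 300
```

The parentheses in (//td)[3] matter. Without them, //td[3] means "every td that is the third td child of its parent," which only coincides with "the third td in the page" when all cells share one row.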
