[Question] How should lxml handle <br />? girl5566 PTT (批踢踢實業坊)

Author: girl5566 (5566520) 2016-03-14 23:06:54

Hi everyone, I've recently been trying to write a web crawler
and want to scrape this part of a web page.

Here is what I tried:
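# -*- coding: utf-8 -*-
# Python 2 script: fetch the recipe page and print the text of one <div>
from urllib2 import urlopen
from lxml import etree

url = "http://www.tham.com.tw/recipe6.php"
# Third <div> inside the second <div> under the element with id="left-inner"
path = "//*[@id=\"left-inner\"]/div[2]/div[3]"
html = urlopen(url).read()
tree = etree.HTML(html)  # parse the HTML into an element tree
data = tree.xpath(path)  # list of elements matched by the XPath
print data[0].text       # print the first match's .text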
Output:
>>> ================================ RESTART ================================
>>>
材料 2人份
>>>
Looking at the page source, my guess is that the <br /> tags are cutting the text off.
Is there a way around this?
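
A note on why the output stops after the first line: lxml follows the ElementTree data model, where an element's .text holds only the text before its first child element, and any text that comes after a child (here, each <br />) is stored in that child's .tail. The sketch below illustrates this with the standard library's xml.etree.ElementTree, which shares .text, .tail and itertext() with lxml. The markup is hypothetical: only the first line matches the output above, the other two lines are invented for illustration. It is written for Python 3, unlike the Python 2 script in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the thread does not show the real page source.
# Only "材料 2人份" comes from the output in the post.
snippet = "<div>材料 2人份<br />egg x2<br />milk 100ml</div>"
div = ET.fromstring(snippet)

# .text is the text before the first child element only
first_text = div.text

# Text following each <br /> lives in that <br /> element's .tail
tails = [child.tail for child in div]

# itertext() walks .text and every .tail in document order
all_text = list(div.itertext())

print(first_text)
print(tails)
print(all_text)
```

So the <br /> tags do not block parsing; the remaining text is still in the tree, just not in .text of the outer element.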

Author: ckc1ark (偽物) 2016-03-15 00:37:00

Try //*[@id=\"left-inner\"]/div[2]/div[3]//text()
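
For reference, appending //text() makes the XPath return every text node under the matched element instead of the element itself, so the text after each <br /> is included. A minimal sketch of the difference, assuming lxml is installed and using Python 3 syntax; the HTML string is a hypothetical stand-in for the recipe page, shaped only so that the same path matches.

```python
from lxml import etree

# Hypothetical stand-in for the recipe page; the real markup is not
# shown in the thread. Only "材料 2人份" comes from the posted output.
html = """
<html><body>
<div id="left-inner">
  <div>banner</div>
  <div>
    <div>title</div>
    <div>photo</div>
    <div>材料 2人份<br />egg x2<br />milk 100ml</div>
  </div>
</div>
</body></html>
"""

tree = etree.HTML(html)
path = '//*[@id="left-inner"]/div[2]/div[3]'

# Original approach: .text stops at the first <br />
element_text = tree.xpath(path)[0].text

# Suggested approach: collect all text nodes below the element
text_nodes = tree.xpath(path + "//text()")
lines = [t.strip() for t in text_nodes if t.strip()]

print(element_text)
print(lines)
```

The result of the //text() query is a list of strings, so it needs to be joined or looped over rather than accessed through .text.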

Author: girl5566 (5566520) 2016-03-15 19:43:00

Thanks, that solved it.

Author: aweimeow (喵喵喵喵ヽ( ・∀・)ノ) 2016-03-16 20:18:00

path = "//*[@itemprop=\"name\"]"
print title[0].text
Your XPath is selecting the wrong node.
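
The snippet above uses title without showing where it is assigned; presumably it is the result of tree.xpath(path). A small sketch under that assumption, selecting by attribute instead of by position. It assumes lxml is installed and uses Python 3 syntax, and the markup is hypothetical because the thread does not show which element on the real page carries itemprop="name".

```python
from lxml import etree

# Hypothetical markup; the element and its text are invented for illustration.
html = '<html><body><h1 itemprop="name">Sample recipe title</h1></body></html>'
tree = etree.HTML(html)

# Match any element whose itemprop attribute equals "name"
path = '//*[@itemprop="name"]'
title = tree.xpath(path)  # assumed assignment, not shown in the reply

# Guard against an empty result before indexing
title_text = title[0].text if title else None
print(title_text)
```

An attribute-based path like this keeps working if the page layout shifts, whereas a positional path such as div[2]/div[3] breaks as soon as a sibling <div> is added or removed.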
