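※ Quoting hanglong (小煥):
: [Question type]:
: Programming help (I want to do something in R, but I don't know how to write it)
: I want to get the seismic intensity at a fixed location for every earthquake in Taiwan.
: [Software familiarity]:
: Beginner (never programmed before; R is my first language)
: [Problem description]:
: Take the earthquake at 05:24 on 09-11 (Taiwan time) as an example. Its URL is:
: https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636
: From this URL I can extract the data for a fixed location, for example 玉山 (Yushan).
: But earthquakes happen often and each one has a different URL, so I can't look up every URL by hand
: and then use R to get the intensity at 玉山.
: On the earthquake data page, I can see the link to each earthquake in the page source:
: https://scweb.cwb.gov.tw/zh-tw/earthquake/data/
: So if I could extract every earthquake's URL from here,
: that should cover what I need, but I don't know how to extract those URLs.
: I'd appreciate any help from the experts on this board. Thanks.
: [Code example]:
:
: On https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636,
: the following code shows the intensity at 玉山: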
: data <-
: read_html("https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636")
: ths <- xml_find_all(data, "//div/ul/li")
: xml_text(ths)[substring(xml_text(ths),1,2) == "玉山"]
: [1] "玉山 1"
: But on https://scweb.cwb.gov.tw/zh-tw/earthquake/data/,
: I tried the same approach to at least extract the links, and got nothing...
: The code:
: data <-
: read_html("https://scweb.cwb.gov.tw/zh-tw/earthquake/data/")
: ths <- xml_find_all(data, "//div/table/tbody/tr/td/a")
: xml_text(ths)
: character(0)
: [Environment]:
:
: [Keywords]:
: 中央氣象局 (Central Weather Bureau), earthquake
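It looks like the original poster still hasn't solved this...
so here's how I did it.
Result screenshot first: https://i.imgur.com/0PWmvmf.png
How it works:
If you watch the Network tab in the browser's developer tools, you can find where the earthquake IDs come from.
The page pulls its earthquake data through the ajaxhandler API, which is presumably why read_html on the data page finds no links.
So this URL shows up in the Network tab:
https://scweb.cwb.gov.tw/zh-tw/earthquake/ajaxhandler
The Network tab also shows that the request is a POST carrying a very long form (the body of the POST call below).
Send exactly the same form and you get 10 records back.
Change the length parameter to get more (September currently has at most 25).
Change Search to switch months. That's all there is to it.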
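Since the form is mostly repetition, it can also be generated rather than typed out. The following is only a minimal sketch in base R: `build_eq_body` and its arguments `year`, `month` and `n_rows` are hypothetical names, the field names and values are copied from the request shown in this post, and the `Search` format for other months is assumed to follow the "2019年9月" pattern (no zero padding).

```r
# Column names as they appear in the request body used in this post.
eq_columns <- c("EventNo", "MaxIntensity", "OriginTime", "MagnitudeValue",
                "Depth", "Description", "Description")

# Filter fields that the request sends as empty strings.
empty_filters <- c("txtSDate", "txtEDate", "txtSscale", "txtEscale",
                   "txtSdepth", "txtEdepth", "txtLonS", "txtLonE",
                   "txtLatS", "txtLatE", "ddlCity", "ddlCitySta",
                   "txtIntensityB", "txtIntensityE", "txtLon", "txtLat",
                   "txtKM", "ddlStationName")

# Hypothetical helper: rebuild the form body for a given month and row count.
build_eq_body <- function(year, month, n_rows = 10L) {
  column_fields <- list()
  for (i in seq_along(eq_columns)) {
    prefix <- sprintf("columns[%d]", i - 1L)
    column_fields[[paste0(prefix, "[data]")]] <- as.character(i - 1L)
    column_fields[[paste0(prefix, "[name]")]] <- eq_columns[i]
    # Only the first column (EventNo) is sent as non-searchable.
    column_fields[[paste0(prefix, "[searchable]")]] <-
      if (i == 1L) "false" else "true"
    column_fields[[paste0(prefix, "[orderable]")]] <- "true"
    column_fields[[paste0(prefix, "[search][value]")]] <- ""
    column_fields[[paste0(prefix, "[search][regex]")]] <- "false"
  }
  c(
    list(draw = "1"),
    column_fields,
    list("order[0][column]" = "2",
         "order[0][dir]" = "desc",
         start = "0",
         length = as.character(n_rows),
         "search[value]" = "",
         "search[regex]" = "false",
         # Assumed format, based on the single example "2019年9月".
         Search = sprintf("%d年%d月", year, month)),
    setNames(as.list(rep("", length(empty_filters))), empty_filters)
  )
}

body_sep <- build_eq_body(2019, 9, n_rows = 25)
```

The result could then be passed as `POST(url, body = body_sep, encode = "form", ...)` in place of the hand-written list, with the same `url` and Referer header as in the code in this post.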
Code:
library(httr)
library(pipeR)
library(xml2)
library(stringr)
url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/ajaxhandler"
referer_url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/data/"
response_data <- POST(url, body = list("draw" = "1",
                                       "columns[0][data]" = "0",
                                       "columns[0][name]" = "EventNo",
                                       "columns[0][searchable]" = "false",
                                       "columns[0][orderable]" = "true",
                                       "columns[0][search][value]" = "",
                                       "columns[0][search][regex]" = "false",
                                       "columns[1][data]" = "1",
                                       "columns[1][name]" = "MaxIntensity",
                                       "columns[1][searchable]" = "true",
                                       "columns[1][orderable]" = "true",
                                       "columns[1][search][value]" = "",
                                       "columns[1][search][regex]" = "false",
                                       "columns[2][data]" = "2",
                                       "columns[2][name]" = "OriginTime",
                                       "columns[2][searchable]" = "true",
                                       "columns[2][orderable]" = "true",
                                       "columns[2][search][value]" = "",
                                       "columns[2][search][regex]" = "false",
                                       "columns[3][data]" = "3",
                                       "columns[3][name]" = "MagnitudeValue",
                                       "columns[3][searchable]" = "true",
                                       "columns[3][orderable]" = "true",
                                       "columns[3][search][value]" = "",
                                       "columns[3][search][regex]" = "false",
                                       "columns[4][data]" = "4",
                                       "columns[4][name]" = "Depth",
                                       "columns[4][searchable]" = "true",
                                       "columns[4][orderable]" = "true",
                                       "columns[4][search][value]" = "",
                                       "columns[4][search][regex]" = "false",
                                       "columns[5][data]" = "5",
                                       "columns[5][name]" = "Description",
                                       "columns[5][searchable]" = "true",
                                       "columns[5][orderable]" = "true",
                                       "columns[5][search][value]" = "",
                                       "columns[5][search][regex]" = "false",
                                       "columns[6][data]" = "6",
                                       "columns[6][name]" = "Description",
                                       "columns[6][searchable]" = "true",
                                       "columns[6][orderable]" = "true",
                                       "columns[6][search][value]" = "",
                                       "columns[6][search][regex]" = "false",
                                       "order[0][column]" = "2",
                                       "order[0][dir]" = "desc",
                                       "start" = "0",
                                       "length" = "10",
                                       "search[value]" = "",
                                       "search[regex]" = "false",
                                       "Search" = "2019年9月",
                                       "txtSDate" = "",
                                       "txtEDate" = "",
                                       "txtSscale" = "",
                                       "txtEscale" = "",
                                       "txtSdepth" = "",
                                       "txtEdepth" = "",
                                       "txtLonS" = "",
                                       "txtLonE" = "",
                                       "txtLatS" = "",
                                       "txtLatE" = "",
                                       "ddlCity" = "",
                                       "ddlCitySta" = "",
                                       "txtIntensityB" = "",
                                       "txtIntensityE" = "",
                                       "txtLon" = "",
                                       "txtLat" = "",
                                       "txtKM" = "",
                                       "ddlStationName" = ""),
                      encode = "form",
                      add_headers(Referer = referer_url)) %>>% content
# The first field of each returned row is the event ID used in the details URL.
data_ids <- sapply(response_data$data, `[[`, 1L)
details_url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/details/"
detail_urls <- paste0(details_url, data_ids)
# Fetch each details page and keep items 2 to 6 of the summary list,
# with all whitespace removed.
eq_details <- lapply(detail_urls, function(url){
  GET(url) %>>% content %>>%
    xml_find_all("//ul[@class='eqResultBoxRight BulSet BkeyinList']") %>>%
    xml_find_all("li") %>>%
    xml_text %>>%
    str_replace_all("[\\s]", "") %>>%
    `[`(2L:6L)
})
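To get back to the original question (the intensity at one station such as 玉山), the list items from each details page can be filtered by station name, the same way the quoted code does with substring. This is a rough sketch, not something tested against the live site: `station_intensity` is a hypothetical helper, the "玉山 1" text format is taken from the output quoted above, and the "阿里山 2" entry is made-up sample data.

```r
# Hypothetical helper: given the text of the <li> items from a details page,
# return whatever follows the station name in the first matching item,
# or NA if the station is not listed for that earthquake.
station_intensity <- function(li_text, station) {
  li_text <- trimws(li_text)
  hit <- li_text[startsWith(li_text, station)]
  if (length(hit) == 0L) {
    return(NA_character_)
  }
  # Drop the station name and surrounding whitespace, keep the rest as text.
  trimws(sub(station, "", hit[1], fixed = TRUE))
}

# "玉山 1" comes from the quoted output; "阿里山 2" is invented for illustration.
sample_li <- c("玉山 1", "阿里山 2")
yushan <- station_intensity(sample_li, "玉山")
missing_station <- station_intensity(sample_li, "台北")

# Possible use with the detail URLs, assuming the "//div/ul/li" XPath from
# the quoted question still matches the station list:
# intensities <- sapply(detail_urls, function(u) {
#   li_text <- xml_text(xml_find_all(read_html(u), "//div/ul/li"))
#   station_intensity(li_text, "玉山")
# })
```

The value is kept as a character string rather than converted to a number, since nothing in this thread confirms that every intensity is a plain integer.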
Leave a comment if you have any questions.