[Notes] sqldf performance

Author: kenshin528 (成立奧凶帝國!!)   2014-07-24 13:30:54
[Keywords]: sqldf tapply
[Summary]:
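When I first started learning R I did not know its functions well, so I got used to writing most of my queries with sqldf.
As I have become more familiar with R, I have recently been trying R's built-in functions for querying data instead,
so I wanted to compare the performance of the two approaches.
The test data is simple (it looks roughly like this; the experiment just increases the number of rows):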
Category FREQ
T 0.2
T 0.3
T 0.4
F 0.5
F 0.6
F 0.7
The goal is to sum FREQ for each Category.
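As a small worked example of that goal, the six sample rows above can be summed by group with tapply. This is only a sketch; the name sample_df is made up for illustration and is not from the original post.

```r
# Sample data copied from the six rows shown above.
# sample_df is a hypothetical name used only for this illustration.
sample_df <- data.frame(
  Category = c("T", "T", "T", "F", "F", "F"),
  FREQ = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7)
)

# Sum FREQ within each Category; the result is a named array
# with one entry per category.
sums <- tapply(sample_df$FREQ, sample_df$Category, FUN = sum)
print(sums)  # F is about 1.8, T is about 0.9
```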
Source code
library(sqldf)
# Generate the dataset (change the first argument of runif to vary the row count)
x <- data.frame(Freq=runif(1000000,0,1),Category=c("T","F"))
## Test sqldf
ptm_sql <- proc.time()
result<-sqldf("SELECT Category, sum(Freq)
FROM x
GROUP BY Category
")
ptm_sql <- proc.time() - ptm_sql
ptm_sql
## Test tapply
ptm_tapply <- proc.time()
result<-tapply(x$Freq, x$Category, FUN=sum)
ptm_tapply <- proc.time() - ptm_tapply
ptm_tapply
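A single proc.time() difference can be noisy for a query this fast. One possible variation, shown here for the tapply case only, is to use system.time() and take the median of several runs. The seed and the number of runs are arbitrary choices of mine, not part of the original experiment.

```r
# Fixed seed so the generated data is reproducible (an addition, not in the post).
set.seed(1)
x <- data.frame(Freq = runif(1000000, 0, 1), Category = c("T", "F"))

# Run the tapply query n_runs times and collect the elapsed seconds of each run.
n_runs <- 5  # arbitrary choice for illustration
elapsed <- replicate(n_runs, {
  system.time(tapply(x$Freq, x$Category, FUN = sum))[["elapsed"]]
})

# The median is less sensitive to one slow run than a single measurement.
print(median(elapsed))
```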
Test results:
With rows = 10,000 (times in seconds):
         user  system  elapsed
SQLDF    0.05    0.00     0.94
TAPPLY   0.00    0.00     0.34
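Since the experiment consists of increasing the number of rows, the measurement could be wrapped in a small helper and repeated for several sizes. The sketch below is my own illustration: time_query, tapply_sum and the row counts are hypothetical, and it produces no figures beyond what you measure on your own machine.

```r
# Hypothetical helper: build a data set with n_rows rows and return the
# elapsed seconds taken by query_fn on it.
time_query <- function(n_rows, query_fn) {
  # n_rows should be even so that c("T", "F") recycles cleanly.
  x <- data.frame(Freq = runif(n_rows, 0, 1), Category = c("T", "F"))
  system.time(query_fn(x))[["elapsed"]]
}

# The tapply query from the post, wrapped as a function of the data frame.
tapply_sum <- function(x) tapply(x$Freq, x$Category, FUN = sum)

# Row counts chosen for illustration only.
row_counts <- c(10000, 100000, 1000000)
elapsed <- sapply(row_counts, time_query, query_fn = tapply_sum)
print(data.frame(rows = row_counts, tapply_elapsed = elapsed))
```

The sqldf query from the post could presumably be timed the same way by passing a wrapper function that runs it on x as query_fn, assuming the sqldf package is loaded.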
