[問題] 製作dummy variable矩陣效能問題

作者: mowgur (PINNNNN)   2017-12-29 14:01:27
*[m- 問題: 當你想要問問題時,請使用這個類別。
建議先到 http://tinyurl.com/mnerchs 搜尋本板舊文。
[問題類型]:
效能諮詢(我想讓R 跑更快)
[軟體熟悉度]:
使用者(已經有用R 做過不少作品)
[問題敘述]:
大家好 我的資料是紀錄籃球比賽每個play是哪5個進攻及防守球員在場上
想做的事情是: 假設總共有500位球員 做出一個n(750000) x p(1000)的矩陣
前500欄為進攻 後500欄為防守
矩陣內的元素為1代表球員在場上進攻(防守為-1) 不在場上為0
所以每列會有5個1及5個-1還有很多個0
資料大概長這樣
data$p.combination data$p.com.allowed
1 A, B, C, D, E J, K, L, M, N
2 A, C, F, H, I K, L, M, N, O
3 C, D, X, Y, Z K, M, O, Q, R
... ... ...
人名之間是用逗號和一個空格分開
用我自己寫的已經跑了快12小時還沒跑完
想請教版上各位大大有沒有更好的寫法
[程式範例]:
https://ideone.com/PaBtM4
library(magrittr)
p.combination = character(1000)
for(i in 1:length(p.combination)){
p.combination[i] = LETTERS[sample(1:26,5)] %>% paste0(collapse = ", ")
}
p.com.allowed = character(1000)
for(i in 1:length(p.com.allowed)){
p.com.allowed[i] = LETTERS[sample(1:26,5)] %>% paste0(collapse = ", ")
}
data = data.frame(p.combination = p.combination,
p.com.allowed = p.com.allowed)
player = LETTERS[1:26]
input.matrix0 = function(data, player, off){
X = matrix(ncol = length(player), nrow = dim(data)[1])
for(i in 1:dim(data)[1]){
if(off) {
colnames(X) = paste0("O_",player)
coding = 1
pp = data$p.combination
} else {
colnames(X) = paste0("D_",player)
coding = -1
pp = data$p.com.allowed
}
player.temp = pp[i] %>% gsub(", ", "|",.)
index = grep(player.temp, player)
X[i,index] = coding
X[i,-index] = 0
}
return(X)
}
input.matrix = function(data, player){
X.off = input.matrix0(data, player, T)
X.def = input.matrix0(data, player, F)
return(cbind(X.off, X.def))
}
out = input.matrix(data,player)

Links booklink

Contact Us: admin [ a t ] ucptt.com