[Question] dataframe includes date with caret

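Author: babysian7 (Babysian)   2015-11-03 04:18:09
Post category hint:
- Question: use this category when you want to ask a question
[Question type]:
Programming help (I want to do something in R, but I don't know how to write it in R)
[Software familiarity]:
Beginner
[Problem description]:
I have a dataframe that contains date variables: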
'data.frame': 1000 obs. of 49 variables:
$ estate_Post : int 10069 10065 10044 10044 10044 10045 10044 10045 10044 10045 ...
$ estate_TransType : int 3 1 4 2 4 4 4 4 4 4 ...
$ estate_LandArea : num 15.54 47.3 20.89 1.99 23.98 ...
$ estate_ZoneUse : int 2 2 3 3 3 3 3 3 3 3 ...
$ estate_TransDate : Date, format: "1989-03-01" "1998-01-01" "2015-01-01" "2015-01-01" ...
$ estate_Land : int 1 1 1 0 1 1 1 1 1 1 ...
$ estate_House : int 1 0 1 0 1 1 1 1 1 1 ...
$ estate_ParkingLot : int 0 0 2 2 2 1 3 3 4 3 ...
$ estate_TransFloor : int 5 -99 17 -4 11 6 6 5 15 5 ...
$ estate_TotalFloor : int 5 -99 31 31 31 31 31 31 31 31 ...
$ estate_HouseType : int 1 12 2 12 2 2 2 2 2 2 ...
$ estate_HouseUse : int 1 -99 1 3 1 1 1 1 1 1 ...
$ estate_HouseMaterials: int 5 -99 13 13 13 13 13 13 13 13 ...
$ estate_HouseDate : Date, format: "1967-05-19" NA "2013-11-29" "2013-11-29" ...
$ estate_HouseArea : num 35.1 0 442.7 62.1 507.1 ...
$ estate_HouseRoom_1 : int 1 0 5 0 5 4 4 4 3 4 ...
$ estate_HouseRoom_2 : int 1 0 2 0 2 2 2 2 2 2 ...
$ estate_HouseRoom_3 : int 1 0 6 0 6 3 3 3 3 3 ...
$ estate_HouseRoom_4 : int 1 1 1 1 1 1 1 1 1 1 ...
$ estate_Guards : int 2 2 2 2 2 2 2 2 2 2 ...
$ estate_Price : int 3535 54299 164882 -99 195808 181428 174799 175356 190717 165250 ...
$ estate_ParkingType : int -99 -99 3 4 3 4 4 4 4 4 ...
$ estate_ParkingArea : num 0 0 13.2 32.2 27.5 ...
$ estate_ParkingPrice : int 0 0 0 5600000 0 0 0 0 8400000 0 ...
$ estate_Lng : num 122 122 122 122 122 ...
$ estate_Lat : num 25 25 25 25 25 ...
$ Aport_Distance : num 7.3 6.7 5.3 5.3 5.3 5.3 5.3 5.3 5.3 5.3 ...
$ ParkB_Distance : num 0.29 0.785 0.214 0.217 0.215 ...
$ Univ_Distance : num 1.7 1 1 1 1 1 1 1 1 1 ...
$ ParkR_Distance : num 1.4 2 1.7 1.7 1.7 1.6 1.7 1.7 1.7 1.6 ...
$ MRT_StationDistance : num 0.914 0.327 0.403 0.401 0.402 ...
$ MRT_LineDistance : num 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_EntranceDistance: int 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_LineDistance : int 999 999 999 999 999 999 999 999 999 999 ...
$ TRA_StationDistance : num 1 1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ THSR_StationDistance : num 3.1 2.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ River_Distance : num 999 1.84 1.49 1.48 1.49 ...
$ Schools_Distance : num 0.2 0.2 0.7 0.7 0.7 0.8 0.7 0.7 0.7 0.8 ...
$ Lib_Distance : num 0.8 0.9 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 ...
$ Sport_Distance : num 2.4 1.8 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ ParkS_Distance : num 0.6 1 0.6 0.6 0.6 0.7 0.6 0.6 0.6 0.7 ...
$ Hyper_Distance : num 1.3 0.6 1.2 1.2 1.2 1.1 1.2 1.2 1.2 1.1 ...
$ Shop_Distance : num 1.7 1 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Post_Distance : num 0.5 0.2 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Hosp_Distance : num 0.7 0.4 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ Gas_Distance : num 0.5 0.4 1.4 1.4 1.4 1.4 1.4 1.5 1.4 1.4 ...
$ Incin_Distance : num 10.9 10.2 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 ...
$ Mort_Distance : num 6.3 5.7 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3 ...
$ estate_TotalPrice : num 124117 2568347 73000000 5600000 99300000 ...
After I convert the date variables with as.Date, I get the following error message when running variable selection:
Error in { :
task 1 failed - "rfe is expecting 48 importance values but only has 46"
In addition: Warning messages:
1: In predict.lm(object, x) :
prediction from a rank-deficient fit may be misleading
How should I change my code to fix this?
[Code example]:
library(mlbench)
library(caret)
library(maps)
library(rgdal)
library(raster)
library(sp)
library(spdep)
library(GWmodel)
library(e1071)
library(plyr)
library(kernlab)
library(zoo)
mydata <- read.csv("E:/SupportVectorRegression/Realestatedata_1000_delete_date.csv",
                   header=TRUE)
mydata$estate_TransDate<-as.Date(paste(mydata$estate_TransDate,1,sep="-"),format="%Y-%m-%d")
mydata$estate_HouseDate<-as.Date(mydata$estate_HouseDate,format="%Y-%m-%d")
rfectrl <- rfeControl(functions=lmFuncs, method="cv", number=10,
                      verbose=TRUE, returnResamp="final")
results <- rfe(mydata[,1:4], mydata[,49], sizes=c(1:49),
               rfeControl=rfectrl, method="svmRadial")
#metric = "Rsquared"
print(results)
predictors(results)
plot(results, type=c("g", "o"))
[Environment]:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
[Keywords]:
caret, dataframe, date
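Author: celestialgod (天)   2015-11-03 08:40:00
Compute the correlations and check whether two of the variables are highly correlated with the others. This really looks like the actual-price registration (實價登錄) data. It feels like the date input is what is going wrong. Is date one of your variables?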
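The correlation check suggested here can be sketched as follows. This is a minimal illustration on made-up data, not the poster's file: the columns `a`, `b`, `d`, `c` and the 0.95 cutoff are assumptions chosen for the example. `cor()` only accepts numeric input, so the Date column is converted with `as.numeric()` first (days since 1970-01-01).

```r
set.seed(1)
n <- 100

# Hypothetical data: two independent predictors and one Date column
df <- data.frame(
  a = rnorm(n),
  b = rnorm(n),
  d = as.Date("2015-01-01") + 0:(n - 1)
)
# A fourth column that is almost an exact multiple of `a`
df$c <- 2 * df$a + rnorm(n, sd = 0.01)

# Convert every column to numeric so cor() accepts the Date column
num <- data.frame(lapply(df, as.numeric))

cm <- cor(num, use = "pairwise.complete.obs")
cm[upper.tri(cm, diag = TRUE)] <- NA      # keep each pair only once

# Row/column positions of pairs whose |r| exceeds the (assumed) cutoff
high <- which(abs(cm) > 0.95, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
print(high_pairs)
```

Pairwise correlation only catches near-duplicates of a single column. A column that is a combination of several others can still slip through, so a rank check on the whole predictor matrix is a useful complement.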
Author: babysian7 (Babysian)   2015-11-03 13:42:00
Hello, two of the variables are of Date type and I want to use them as inputs, but I don't know what is going wrong.
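Author: celestialgod (天)   2015-11-03 14:08:00
http://tinyurl.com/p6hbvjy agrees with what I was thinking XDD. I generated some dates myself and it ran without any problem; the dates are treated as integers when it runs. Most likely part of your data is linearly dependent. I also tried it with NA values and had no problem.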
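The linear-dependence explanation can be reproduced on made-up data. In the sketch below all column names (`x1`, `x2`, `sold`, `x3`) are hypothetical and not taken from the poster's file. One column is the exact sum of two others, so `lm()` returns an NA coefficient for it and warns about a rank-deficient fit when predicting. Assuming `lmFuncs` ranks predictors from the fitted `lm` coefficients, two such NA coefficients would be consistent with "expecting 48 importance values but only has 46".

```r
set.seed(42)
n <- 50

# Hypothetical predictors, including a Date column stored as numeric days
x <- data.frame(
  x1   = rnorm(n),
  x2   = rnorm(n),
  sold = as.numeric(as.Date("2015-01-01") + sample(0:365, n, replace = TRUE))
)
x$x3 <- x$x1 + x$x2            # exact linear combination of x1 and x2

dat <- data.frame(y = rnorm(n), x)
fit <- lm(y ~ ., data = dat)

# Aliased (linearly dependent) predictors get an NA coefficient
aliased <- names(coef(fit))[is.na(coef(fit))]
n_na    <- length(aliased)
print(aliased)

# The same problem seen as a rank check on the design matrix:
# 5 columns (intercept + 4 predictors) but a lower rank
X      <- cbind(Intercept = 1, as.matrix(x))
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")
```

Dropping the aliased columns before calling `rfe()` removes the mismatch in this toy case. `caret::findLinearCombos()` reports this kind of dependency and suggests which columns to remove.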
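Author: babysian7 (Babysian)   2015-11-06 16:58:00
Hello: thank you for your answer. While making changes I ran into a new problem. I replaced all the NA values, and the error message is now "missing value where TRUE/FALSE needed. In addition: There were 20 warnings (use warnings() to see them)". I don't really understand it, because my data are all continuous numeric values, with no TRUE/FALSE...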
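This message does not mean the data must contain logical values. R raises it whenever an `if()` condition evaluates to NA, for example when a comparison involves an NA or NaN produced somewhere along the way. The sketch below reproduces the error and then counts non-finite entries per column. The helper `count_bad` and the `demo` data frame are hypothetical names made up for this illustration, and where the NA arises inside `rfe()` in the poster's run is not known.

```r
# Comparing NA with a number yields NA, which if() cannot use as a condition
v <- c(1.5, NA, 3)
err <- tryCatch(
  if (v[2] > 0) "positive" else "not positive",
  error = function(e) e
)
# In an English locale the message reads "missing value where TRUE/FALSE needed"
print(conditionMessage(err))

# Hypothetical helper: number of NA / NaN / Inf entries in each column
count_bad <- function(df) {
  vapply(df, function(col) sum(!is.finite(as.numeric(col))), integer(1))
}

# Made-up data with one NaN and one missing date
demo <- data.frame(
  price = c(100, NaN, 300),
  built = as.Date(c("2013-11-29", NA, "1967-05-19"))
)
bad <- count_bad(demo)
print(bad)
```

Running `warnings()` right after the failure, as the message suggests, usually shows which step first produced the NA values.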
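Author: celestialgod (天)   2015-11-07 11:25:00
Without seeing the code I can't diagnose this from a distance. If you can attach the data as well, I can reproduce the error and try to find a solution.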
Author: babysian7 (Babysian)   2015-11-11 13:35:00
Hello: I have put the data together here: https://www.dropbox.com/sh/u62abna1cp4fw8n/AAC9EXdhNN8GKdVqkgOM6OQ-a?dl=0 Thank you.
Author: celestialgod (天)   2015-11-12 21:45:00
I give up ~"~ I don't know what to do qq. Try emailing the package author QQ
Author: babysian7 (Babysian)   2015-11-13 13:00:00
Thank you anyway for taking the time to help :)
