Performing SVM regression with Test and training sets in R




I have a response variable with 100 observations, and I wish to estimate it from 8 independent variables using Support Vector Regression.



I have searched extensively for a template for implementing SVR with training and test sets in R, but I could not find one that does what I want.



I have used the following code to fit the model and calculate the RMSE, but I want to check my model on unseen data and I do not know how to do this in R.



My code is as follows:


library(e1071)  # provides svm()

data <- read.csv("Enzyme.csv", header = TRUE)
Testset <- data[c(11:30), ]
Trainset <- data[-c(11:30), ]
# attach the dependent variable
Y <- Trainset$Urease
Trainset <- Trainset[, -c(1)]
# epsilon is given as a single value here (0.1 is the e1071 default);
# trying a grid such as seq(0, 1, 0.1) is a job for tune(), not svm()
SVMUr <- svm(Urease ~ ., data = Trainset, kernel = "radial",
             gamma = 1, epsilon = 0.1, cost = 10)
summary(SVMUr)

################### RMSE SVMUr ##########################
RMSE <- function(observed, predicted)
  sqrt(mean((predicted - observed)^2, na.rm = TRUE))

predSVMUr <- predict(SVMUr, Trainset)  # fitted values on the training set
RMSE(observed = Y, predicted = predSVMUr)

######## Check the model on unseen data using the test set ######
predicted_test <- predict(SVMUr, Testset[, -1])
RMSE(Testset$Urease, predicted_test)




1 Answer



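The way you want to go about testing your model is to generate predictions for the test set with predict(SVMUr, Testset[,-1]), take the corresponding observed values Y, and pass both to RMSE().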



Additional Recommendation:
I would not split the data the way you do because, as you've pointed out, you end up with too little training data in relation to test data. If you want an 80%-20% split, you can adapt my code below:


data<-read.csv("Enzyme.csv",header = T)

split_data <- sample(nrow(data), nrow(data)*0.8)
Trainset <- data[split_data, ]
Testset <- data[-split_data, ]



That would put 80% of your data in the train set and 20% in the test set.
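
Because sample() draws rows at random, the split changes on every run unless a seed is fixed. A minimal sketch of a repeatable split follows; the seed value and the simulated data frame (a stand-in for Enzyme.csv, which is not available here) are assumptions for illustration only.

```r
set.seed(123)  # any fixed seed makes the random split repeatable

# Hypothetical stand-in for Enzyme.csv: 100 rows, a response plus 8 predictors
data <- data.frame(Urease = rnorm(100), matrix(rnorm(100 * 8), ncol = 8))

split_data <- sample(nrow(data), floor(nrow(data) * 0.8))
Trainset <- data[split_data, ]
Testset  <- data[-split_data, ]

nrow(Trainset)  # 80
nrow(Testset)   # 20
# No row should appear in both sets
length(intersect(rownames(Trainset), rownames(Testset)))  # 0
```

With the seed in place, the train and test RMSE values can be reproduced from one run to the next.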



The rest of the code:


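library(e1071)  # provides svm()

# epsilon is given as a single value here (0.1 is the e1071 default);
# trying a grid such as seq(0, 1, 0.1) is a job for tune(), not svm()
SVMUr <- svm(Urease ~ ., data = Trainset, kernel = "radial",
             gamma = 1, epsilon = 0.1, cost = 10)
summary(SVMUr)

################### RMSE SVMUr ##########################
RMSE <- function(observed, predicted)
  sqrt(mean((predicted - observed)^2, na.rm = TRUE))

# Training-set predictions (predSVMUr was used but never defined before)
predSVMUr <- predict(SVMUr, Trainset)
RMSE(observed = Trainset$Urease, predicted = predSVMUr)

######## Check the model on unseen data using the test set ######
predicted_test <- predict(SVMUr, Testset[, -1])
RMSE(Testset$Urease, predicted_test)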
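
If the intent behind epsilon = seq(0, 1, 0.1) was to try several values, one option is the tune() function from e1071, which runs a cross-validated grid search over the supplied ranges. The sketch below is only an illustration: the simulated Trainset and the cost grid are assumptions, not values taken from the question.

```r
library(e1071)

set.seed(42)
# Hypothetical stand-in for the training set: 80 rows, 8 predictors, response "Urease"
n <- 80
X <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
Trainset <- cbind(Urease = X$V1 - 0.5 * X$V2 + rnorm(n, sd = 0.3), X)

# Grid search over epsilon and cost; tune() uses 10-fold cross-validation by default
tuned <- tune(svm, Urease ~ ., data = Trainset,
              kernel = "radial",
              ranges = list(epsilon = seq(0, 1, 0.1), cost = c(1, 10, 100)))

tuned$best.parameters           # the epsilon/cost pair with the lowest CV error
best_model <- tuned$best.model  # svm refitted on all of Trainset with that pair
```

best_model can then be passed to predict() on the test set in the same way as SVMUr.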





I have edited the code as you suggested and rewritten the whole thing. Is everything fine with this code now? I think it is correct, but I may have written it badly. One question remains: if the result on the test set is not as good as on the training set, is there any way to improve it, e.g. by using several different test sets rather than only the one chosen?
– morteza
Jul 21 at 12:26






You shouldn't edit your question to include the answer, as that may confuse other StackOverflow users landing on this thread. :) You may want to refer to my code above - since you are doing svm (Urease~., data=Trainset), there is no need to drop the first variable anymore as you just did with Trainset<-Trainset[,-c(1)].
– onlyphantom
Jul 21 at 12:30








Oh sorry, I did not know that :(. My other question: I applied this, but the RMSE for the test set was much larger than for the training set, which is not acceptable in this case. Is there a way to improve the result on the test set, for example by using other rows of observations instead of the ones chosen with Testset <- data[c(11:30),]?
– morteza
Jul 21 at 12:37





No worries @morteza; happy to help. Additional questions should be posted as separate StackOverflow questions, but to answer succinctly: the general rule is to use 80% of the data for training and 20% for testing. It's only a rule of thumb, but more training data generally means a better model, so I would recommend that.
– onlyphantom
Jul 21 at 12:39





No problem - I edited the answer to help you partition your train and test set using 80%-20% split. But if you like, you can change it to 75-25 by replacing 0.8 with 0.75 for example. Feel free to accept the answer if that solves your problem.
– onlyphantom
Jul 21 at 12:44
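
On the question raised above of whether a different choice of test rows would change the result: a common way to make the estimate less dependent on one particular split is k-fold cross-validation, where every row serves as test data exactly once. The sketch below is a minimal illustration, not code from the thread. The helper cv_rmse, the simulated data, and the use of lm() as the model are all assumptions chosen so the example runs with base R only.

```r
# k-fold cross-validated RMSE for any model-fitting function (hypothetical helper)
cv_rmse <- function(data, response, fit_fun, k = 5) {
  # Assign each row a random fold label from 1..k
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_rmse <- sapply(seq_len(k), function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    model <- fit_fun(train)
    pred  <- predict(model, newdata = test)
    sqrt(mean((pred - test[[response]])^2))
  })
  mean(fold_rmse)  # average RMSE over the k held-out folds
}

set.seed(1)
# Simulated stand-in for the enzyme data: 8 predictors and a response
toy <- data.frame(matrix(rnorm(100 * 8), ncol = 8))
toy$Urease <- toy$X1 - 0.5 * toy$X2 + rnorm(100, sd = 0.3)

res <- cv_rmse(toy, "Urease", function(d) lm(Urease ~ ., data = d), k = 5)
res
```

With e1071 loaded, the same helper should accept function(d) svm(Urease ~ ., data = d) as fit_fun, since predict() for svm objects also takes a newdata argument.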







