Why does Random Forest variable importance not sum to 100%?
The randomForest package in R has the importance() function, which returns both the node impurity and the mean permutation importance of each variable. Why, when calculating mean permutation importance, do the results not sum to 100%?
Here's a simple reproducible example:
library(randomForest)
data(iris)
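# The argument is ntree; a misspelled ntrees is silently absorbed by ... and ignored
iris.rf <- randomForest(Species ~ ., importance = TRUE, data = iris, ntree = 1000)
imp <- importance(iris.rf, type = 1)  # type = 1: mean decrease in accuracy (permutation)
sum_imp <- sum(imp)
sum_imp # != 100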
Thanks
See Measures of variable importance in random forests
– Firebug
Sep 8 '18 at 21:22
1 Answer
As far as I can tell, variable importance is measuring either: a) how much the out-of-bag prediction error increases when the variable's values are randomly permuted (type = 1), or b) the total decrease in node impurity from splits on the variable (type = 2). (Both are averaged over all trees in the forest.) Neither of these is a probability or a share of some fixed total, so there's no reason they should add up to 100%.
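To see what scale the permutation measure is actually on, here is a minimal sketch on the iris data from the question. The seed and ntree = 500 are arbitrary choices of mine, not values from the question. By default importance() uses scale = TRUE, which, per the package documentation, divides each permutation importance by its standard error; scale = FALSE returns the raw mean decrease in accuracy.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, used only to make the run repeatable
rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 500)

# Permutation importance, divided by its standard error (the default)
perm_scaled <- importance(rf, type = 1)

# Raw permutation importance: mean decrease in OOB accuracy
perm_raw <- importance(rf, type = 1, scale = FALSE)

# Column sums of the two versions; neither is constrained to any fixed total
sums <- c(perm_scaled = sum(perm_scaled), perm_raw = sum(perm_raw))
print(sums)
```

The two sums differ from each other and from 100, which is consistent with the measure being a per-variable change in error rather than a split of a fixed budget.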
You can, of course, divide by the sum of all importances to get a percentage, but I think that would create confusion: you now have a percentage of what exactly?
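For completeness, a sketch of that rescaling, under the same assumptions as the question's example (iris data; the seed, ntree = 500, and the name imp_share are arbitrary choices of mine). The result sums to 100 by construction, but it is only a relative ranking, and it becomes hard to read if any permutation importance is negative, which can happen for uninformative variables.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, used only to make the run repeatable
rf <- randomForest(Species ~ ., data = iris, importance = TRUE, ntree = 500)

# importance() returns a one-column matrix; take the column as a named vector
imp <- importance(rf, type = 1)[, 1]

# Rescale so the values sum to 100 (a share of total importance, not of error)
imp_share <- 100 * imp / sum(imp)
print(round(imp_share, 1))
```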
(Welcome to the site!)
Why do you assume it should sum to 1? I see no reason for that belief.
– Firebug
Sep 8 '18 at 18:31