This set of Data Science Multiple Choice Questions & Answers focuses on “Caret”.
1. varImp is a wrapper around the evimp function in the _______ package.
a) numpy
b) earth
c) plot
d) none of the mentioned
Answer: b
Explanation: The earth package is an implementation of Jerome Friedman’s Multivariate Adaptive Regression Splines (MARS); its evimp function estimates variable importance for a fitted MARS model.
2. Point out the wrong statement.
a) The trapezoidal rule is used to compute the area under the ROC curve
b) For regression, the relationship between each predictor and the outcome is evaluated
c) An argument, para, is used to pick the model fitting technique
d) All of the mentioned
Answer: c
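Explanation: The argument that picks the model fitting technique is nonpara, not para. When nonpara = FALSE a linear model is fit; otherwise a loess smoother is fit between the outcome and the predictor.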
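To make the linear case concrete: with nonpara = FALSE, the absolute value of the t-value for the slope is used as the importance score. The sketch below computes that statistic in plain Python for one predictor. It is an illustration of the calculation, not caret's code, and the function name abs_t_value_of_slope is hypothetical.

```python
import math

def abs_t_value_of_slope(x, y):
    """Absolute t-statistic for the slope of a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    if sse == 0:
        return float("inf")  # perfect linear fit
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return abs(slope / std_err)
```

A predictor with a tight linear relationship to the outcome gets a larger score than a noisy one, which is what lets the statistic rank predictors.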
3. Which of the following curve analyses is conducted on each predictor for classification?
a) NOC
b) ROC
c) COC
d) All of the mentioned
Answer: b
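Explanation: For two-class problems, a series of cutoffs is applied to the predictor data to predict the class. The sensitivity and specificity at each cutoff give the ROC curve, and the area under that curve is used as the importance measure.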
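The cutoff procedure and the trapezoidal rule from Question 2 can be sketched in a few lines of plain Python. This is a minimal illustration for a single predictor, not caret's implementation. The function name roc_auc_trapezoid and the convention that values at or above the cutoff are predicted positive are assumptions made for the example.

```python
def roc_auc_trapezoid(values, labels):
    """Area under the ROC curve for one predictor, via the trapezoidal rule.

    values: numeric predictor values.
    labels: 1 for the positive class, 0 for the negative class.
    A sample is predicted positive when its value is >= the cutoff (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # A cutoff above every value predicts no positives: (FPR, TPR) = (0, 0).
    points = [(0.0, 0.0)]
    # Lower the cutoff through each distinct observed value.
    for cutoff in sorted(set(values), reverse=True):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 0)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        points.append((1 - specificity, sensitivity))
    # Trapezoidal rule over consecutive ROC points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A predictor that separates the two classes perfectly yields an area of 1.0. Running the function once per predictor gives one score per predictor.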
4. Which of the following functions tracks the changes in model statistics?
a) varImp
b) varImpTrack
c) findTrack
d) none of the mentioned
Answer: a
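Explanation: For MARS models, varImp tracks changes in model statistics such as the generalized cross-validation (GCV) estimate of error for each predictor, and accumulates the reduction in the statistic when that predictor's feature is added to the model.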
5. Point out the correct statement.
a) The difference between the class centroids and the overall centroid is used to measure the variable influence
b) The Bagged Trees output contains variable usage statistics
c) Boosted Trees use a different approach from a single tree
d) None of the mentioned
Answer: a
Explanation: The larger the difference between the class centroid and the overall center of the data, the larger the separation between the classes.
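The centroid comparison can be illustrated with a small Python sketch. It computes the raw difference between each class centroid and the overall centroid for every predictor. The nearest shrunken centroids method additionally standardizes and shrinks these differences, which is omitted here, and the function name centroid_differences is hypothetical.

```python
def centroid_differences(rows, classes):
    """Per-class, per-predictor difference between class centroid and overall centroid.

    rows: list of samples, each a list of predictor values.
    classes: class label for each sample.
    Returns {class_label: [difference for predictor 0, predictor 1, ...]}.
    """
    n_features = len(rows[0])
    overall = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    diffs = {}
    for cls in sorted(set(classes)):
        members = [r for r, c in zip(rows, classes) if c == cls]
        centroid = [sum(r[j] for r in members) / len(members)
                    for j in range(n_features)]
        diffs[cls] = [centroid[j] - overall[j] for j in range(n_features)]
    return diffs
```

A predictor whose class centroids all sit on the overall centroid gets a difference of zero for every class, so it contributes nothing to separating them.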
6. Which of the following models includes a backwards elimination feature selection routine?
a) MCV
b) MARS
c) MCRS
d) All of the mentioned
Answer: b
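Explanation: MARS stands for Multivariate Adaptive Regression Splines. Its backwards elimination routine looks at reductions in the generalized cross-validation (GCV) estimate of error.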
7. The advantage of using a model-based approach is that it is more closely tied to the model performance.
a) True
b) False
Answer: a
Explanation: A model-based approach may also be able to incorporate the correlation structure between the predictors into the importance calculation.
8. Which of the following models sums the importance over each boosting iteration?
a) Boosted trees
b) Bagged trees
c) Partial least squares
d) None of the mentioned
Answer: a
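Explanation: Boosted trees use the same approach as a single tree but sum the importances over each boosting iteration. The gbm package can be used to fit such models.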
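Summing importance over boosting iterations amounts to adding up each tree's per-predictor scores. The Python sketch below shows only that accumulation step. The per-tree dictionaries are made-up illustrative inputs, and the function name sum_importance is hypothetical, not part of gbm or caret.

```python
from collections import defaultdict

def sum_importance(per_tree_importance):
    """Add up per-predictor importance across all trees (boosting iterations).

    per_tree_importance: list of {predictor_name: importance} dicts, one per tree.
    A predictor missing from a tree simply contributes nothing for that tree.
    """
    total = defaultdict(float)
    for tree in per_tree_importance:
        for predictor, value in tree.items():
            total[predictor] += value
    return dict(total)

# Made-up scores for three boosting iterations.
trees = [{"x1": 2.0, "x2": 1.0}, {"x1": 0.5}, {"x2": 3.0, "x3": 1.0}]
totals = sum_importance(trees)
```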
9. Which of the following arguments controls whether importance values are scaled?
a) scale
b) set
c) value
d) all of the mentioned
Answer: a
Explanation: All measures of importance are scaled to have a maximum value of 100, unless the scale argument of varImp is set to FALSE.
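One way to obtain a maximum of 100 is min-max rescaling, sketched below in Python. Mapping the smallest raw score to 0 is an assumption of this sketch rather than something stated above, and the function name scale_importance is hypothetical.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the largest becomes 100.

    raw: {predictor_name: raw importance}.
    Assumption: scores are shifted so the smallest becomes 0 (min-max scaling).
    """
    lo = min(raw.values())
    hi = max(raw.values())
    if hi == lo:
        # All predictors tie; give each the maximum score.
        return {name: 100.0 for name in raw}
    return {name: (value - lo) / (hi - lo) * 100 for name, value in raw.items()}
```

With this scheme the scores are relative to one another within a model, so a value of 100 marks the top-ranked predictor rather than any absolute level of importance.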
10. For most classification models, each predictor will have a separate variable importance for each class.
a) True
b) False
Answer: a
Explanation: The exceptions are classification trees, bagged trees and boosted trees.