Data Science Multiple Choice Questions on “caret”.
1. Which of the following can be used to generate balanced cross-validation groupings from a set of data?
a) createFolds
b) createSample
c) createResample
d) none of the mentioned
Answer: a
Explanation: createFolds generates balanced cross-validation groupings from a set of data. createResample, by contrast, makes simple bootstrap samples.
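A minimal sketch of the two functions side by side, assuming the caret package is installed. The outcome vector y and its class sizes are made up for illustration.

```r
library(caret)

set.seed(1)
# Hypothetical two-class outcome: 60 "a" and 40 "b"
y <- factor(rep(c("a", "b"), times = c(60, 40)))

# Five cross-validation folds; each list element holds the held-out row indices
folds <- createFolds(y, k = 5)

# Class counts per fold, to check that the folds are balanced
fold_counts <- sapply(folds, function(idx) table(y[idx]))
print(fold_counts)

# Three simple bootstrap samples (row indices drawn with replacement)
boots <- createResample(y, times = 3)
```

Every row should land in exactly one fold, whereas a bootstrap sample can repeat rows.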
2. Point out the wrong statement.
a) Simple random sampling of time series is probably the best way to resample time series data.
b) Three parameters are used for time series splitting
c) Horizon parameter is the number of consecutive values in test set sample
d) All of the mentioned
Answer: a
Explanation: Simple random sampling of time series is probably not the best way to resample time series data.
3. Which of the following functions can be used to maximize the minimum dissimilarities?
a) sumDiss
b) minDiss
c) avgDiss
d) all of the mentioned
Answer: b
Explanation: minDiss can be used to maximize the minimum dissimilarities, whereas sumDiss can be used to maximize the total dissimilarities.
4. Which of the following functions can create the indices for time series type of splitting?
a) newTimeSlices
b) createTimeSlices
c) binTimeSlices
d) none of the mentioned
Answer: b
Explanation: createTimeSlices creates the indices for time series type of splitting, which is associated with rolling forecasting origin techniques.
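A minimal sketch of rolling forecasting origin indices, assuming the caret package is installed. The ten-point series and the window sizes are chosen only for illustration.

```r
library(caret)

# Hypothetical series of 10 ordered observations
y <- 1:10

# Training windows of 5 consecutive values, each followed by a test set of
# 2 consecutive values (the horizon); fixedWindow = TRUE slides the window
slices <- createTimeSlices(y, initialWindow = 5, horizon = 2, fixedWindow = TRUE)

# Row indices of the first training window and its test set
first_train <- slices$train[[1]]
first_test  <- slices$test[[1]]
print(first_train)
print(first_test)
```

With fixedWindow = FALSE, every training window would instead start at the first observation and grow over time.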
5. Point out the correct statement.
a) Asymptotics are used for inference usually
b) Caret includes several functions to pre-process the predictor data
c) The function dummyVars can be used to generate a complete set of dummy variables from one or more factors
d) All of the mentioned
Answer: d
Explanation: The function dummyVars takes a formula and a data set and outputs an object that can be used to create the dummy variables using the predict method.
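A minimal sketch of that workflow, assuming the caret package is installed. The data frame df and its color column are hypothetical.

```r
library(caret)

# Hypothetical data set with one three-level factor
df <- data.frame(
  color = factor(c("red", "green", "blue", "red"))
)

# dummyVars takes a formula and a data set and returns an encoding object
dummies <- dummyVars(~ color, data = df)

# The predict method creates the dummy variables: one column per factor level
encoded <- predict(dummies, newdata = df)
print(encoded)
```

Because the default is a complete set of dummy variables rather than a full-rank encoding, each row has exactly one indicator set to 1.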
6. Which of the following can be used to create sub-samples using a maximum dissimilarity approach?
a) minDissim
b) maxDissim
c) inmaxDissim
d) all of the mentioned
Answer: b
Explanation: maxDissim creates sub-samples using a maximum dissimilarity approach, so the splitting is based on the predictors.
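A minimal sketch of maximum dissimilarity sampling, assuming both the caret and proxy packages are installed. The predictor data, the starting rows and the pool rows are all made up for illustration.

```r
library(caret)

set.seed(2)
# Hypothetical predictors: 50 samples, 2 numeric columns
x <- data.frame(x1 = rnorm(50), x2 = rnorm(50))

start_set <- x[1:5, ]   # initial sub-sample
pool      <- x[6:50, ]  # candidate samples that may be added

# Pick 10 pool rows one at a time, each maximizing the minimum dissimilarity
# to the samples selected so far (obj = minDiss is the default objective)
picked_min <- maxDissim(start_set, pool, n = 10, obj = minDiss)

# Same procedure, but maximizing the total dissimilarity instead
picked_sum <- maxDissim(start_set, pool, n = 10, obj = sumDiss)

print(picked_min)
print(picked_sum)
```

The returned values are row positions within the pool, not within the original data.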
7. caret does not use the proxy package.
a) True
b) False
Answer: b
Explanation: caret uses the proxy package to calculate dissimilarities.
8. Which of the following functions can be used to create balanced splits of the data?
a) newDataPartition
b) createDataPartition
c) renameDataPartition
d) none of the mentioned
Answer: b
Explanation: If the y argument to this function is a factor, the random sampling occurs within each class and should preserve the overall class distribution of the data.
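A minimal sketch of a stratified split, assuming the caret package is installed. The imbalanced outcome y and the 75% training fraction are chosen only for illustration.

```r
library(caret)

set.seed(3)
# Hypothetical imbalanced outcome: 80 "yes" and 20 "no"
y <- factor(rep(c("yes", "no"), times = c(80, 20)))

# Sample 75% of the rows within each class; list = FALSE returns a
# one-column matrix of row indices
train_idx <- createDataPartition(y, p = 0.75, list = FALSE)[, 1]

# Class proportions in the full data and in the training split
full_prop  <- prop.table(table(y))
train_prop <- prop.table(table(y[train_idx]))
print(full_prop)
print(train_prop)
```

The two sets of proportions should match closely, which is what preserving the overall class distribution means in practice.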
9. Which of the following tools are present in the caret package?
a) pre-processing
b) feature selection
c) model tuning
d) all of the mentioned
Answer: d
Explanation: caret contains tools for data splitting, pre-processing, feature selection, model tuning using resampling, and variable importance estimation.
10. caret stands for classification and regression training.
a) True
b) False
Answer: a
Explanation: The caret package is a set of functions that attempt to streamline the process for creating predictive models.