Data Science Multiple Choice Questions on “Analysis and Experimental Design”.
1. If X predicts Y, then X causes Y.
a) True
b) False
Answer: b
Explanation: Prediction does not imply causation. X can predict Y because a third variable drives both, or because Y influences X.
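A minimal sketch of why prediction need not mean causation, assuming a hypothetical confounder Z that drives both X and Y. The variable names, noise levels, and sample size are illustrative assumptions, not part of the quiz. In the observational data X predicts Y well, but once X is assigned at random the association disappears.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder Z drives both X and Y; X has no effect on Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [zi + rng.gauss(0, 0.5) for zi in z]
y_observed = [zi + rng.gauss(0, 0.5) for zi in z]

# Intervention: X is set at random, independently of Z; Y is generated as before.
x_assigned = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.5) for zi in z]

r_observed = pearson(x_observed, y_observed)
r_intervened = pearson(x_assigned, y_after)
print(f"observational correlation:  {r_observed:.2f}")
print(f"interventional correlation: {r_intervened:.2f}")
```

The observational correlation is strong even though Y never depends on X in the simulation, which is the situation the question describes.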
2. Point out the correct statement.
a) If equations are known but the parameters are not, they may be inferred with data analysis
b) If equations are not known but the parameters are, they may be inferred with data analysis
c) If neither the equations nor the parameters are known, they may be inferred with data analysis
d) None of the mentioned
Answer: a
Explanation: When the form of the equations is known, the unknown parameters can be estimated from data. The random component of the data is usually measurement error.
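As an illustration of inferring parameters when the equation is known, the sketch below assumes the relationship is a straight line, y = a + b*x, observed with measurement error. The "true" values of a and b and the noise level are hypothetical, chosen only so the estimates can be compared against them. The parameters are estimated by ordinary least squares.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical true parameters; in practice these are unknown.
true_a, true_b = 2.0, 0.5

xs = [i / 10 for i in range(200)]
# The random component here plays the role of measurement error.
ys = [true_a + true_b * x + rng.gauss(0, 0.3) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
print(f"estimated intercept: {a_hat:.3f}")
print(f"estimated slope:     {b_hat:.3f}")
```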
3. Which of the following is the most important thing in data science?
a) answer
b) question
c) data
d) none of the mentioned
Answer: b
Explanation: The question comes first; the second most important thing is the data.
4. Which of the following approaches should be used if you can’t fix the variable?
a) randomize it
b) do not stratify it
c) generalize it
d) none of the mentioned
Answer: a
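Explanation: If you can’t fix a variable, stratify it; if you can’t stratify it, randomize it. Stratification is not among the options, so randomization is the correct choice.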
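One way to combine stratification and randomization is to randomize treatment within each stratum. The sketch below is a minimal illustration under assumed data: the subjects, the age-group strata, and the function name `stratified_randomization` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomization(subjects, stratum_of, rng):
    """Randomly assign 'treatment' or 'control' within each stratum, as evenly as possible."""
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    assignment = {}
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random order within the stratum
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical subjects as (id, age_group): 20 "old" and 40 "young".
subjects = [(i, "young" if i % 3 else "old") for i in range(60)]
assignment = stratified_randomization(subjects, lambda s: s[1], rng)

# Count subjects per (stratum, arm) to check the balance.
counts = defaultdict(int)
for (_, age_group), arm in assignment.items():
    counts[(age_group, arm)] += 1
for key in sorted(counts):
    print(key, counts[key])
```

Because the shuffle happens inside each stratum, both arms contain the same mix of age groups, so age cannot explain a difference between the arms.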
5. Point out the wrong statement.
a) Randomized studies are not used to identify causation
b) Complicated approaches exist for inferring causation
c) Causal relationships may not apply to every individual
d) All of the mentioned
Answer: a
Explanation: Randomized studies are usually used to identify causation.
6. Which of the following is a good way of performing experiments in data science?
a) Measure variability
b) Generalize to the problem
c) Have Replication
d) All of the mentioned
Answer: d
Explanation: Good experiments have replication, measure variability, and generalize to the problem of interest. Experiments on causal relationships investigate the effect of one or more variables on one or more outcome variables.
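A rough sketch of how replication makes it possible to measure variability. It assumes a hypothetical two-arm experiment with a made-up true effect of 1.0 and normally distributed outcomes; none of these numbers come from the quiz. Repeating the experiment gives a spread of effect estimates, from which a standard error follows.

```python
import random
import statistics

def run_experiment(rng, n_per_arm=50, true_effect=1.0):
    """One replicate: difference in mean outcome between treated and control units."""
    control = [rng.gauss(10.0, 2.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(10.0 + true_effect, 2.0) for _ in range(n_per_arm)]
    return statistics.mean(treated) - statistics.mean(control)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Replicate the hypothetical experiment 30 times.
replicates = [run_experiment(rng) for _ in range(30)]

mean_effect = statistics.mean(replicates)
spread = statistics.stdev(replicates)           # variability between replicates
std_error = spread / len(replicates) ** 0.5     # uncertainty of the mean effect

print(f"mean estimated effect: {mean_effect:.2f}")
print(f"spread across replicates: {spread:.2f}")
print(f"standard error of the mean: {std_error:.2f}")
```

A single run would give one number with no indication of how much it could move on a rerun; the replicates supply that information.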
7. Which of the following is commonly referred to as ‘data fishing’?
a) Data bagging
b) Data booting
c) Data merging
d) None of the mentioned
Answer: d
Explanation: Data dredging, which is not among the options, is sometimes referred to as “data fishing”.
8. Which of the following data mining techniques is used to uncover patterns in data?
a) Data bagging
b) Data booting
c) Data merging
d) Data Dredging
Answer: d
Explanation: Data dredging, also called data snooping, is the practice of misusing data mining techniques to uncover patterns that appear statistically significant and presenting them as misleading scientific ‘research’.
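The simulation below is a hedged sketch of how data dredging produces misleading findings. It assumes 200 hypothetical predictors that are pure noise and tests each against an unrelated outcome at the 5% level. The critical correlation uses the Fisher z-transformation, which is an approximation, and all sizes and the seed are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(3)  # fixed seed so the run is repeatable
n_samples, n_predictors, alpha = 30, 200, 0.05

# Approximate critical |r|: under no true correlation,
# atanh(r) * sqrt(n - 3) is roughly standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
r_crit = math.tanh(z_crit / math.sqrt(n_samples - 3))

# The outcome and every predictor are independent noise by construction.
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
hits = 0
for _ in range(n_predictors):
    predictor = [rng.gauss(0, 1) for _ in range(n_samples)]
    if abs(pearson(predictor, outcome)) > r_crit:
        hits += 1

print(f"critical |r| at alpha={alpha}: {r_crit:.3f}")
print(f"'significant' predictors found in pure noise: {hits} of {n_predictors}")
```

Every hit is a false positive, since no predictor is related to the outcome. Reporting only the hits, without mentioning how many tests were run, is what makes dredged results misleading.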