250+ TOP MCQs on Plotting in Python and Answers

Data Science Multiple Choice Questions on “Plotting in Python”.

1. The plot method on Series and DataFrame is just a simple wrapper around ____________
a) gplt.plot()
b) plt.plot()
c) plt.plotgraph()
d) none of the mentioned

Answer: b
Explanation: If the index consists of dates, it calls gcf().autofmt_xdate() to try to format the x-axis nicely.

2. Point out the correct combination with regards to kind keyword for graph plotting.
a) ‘hist’ for histogram
b) ‘box’ for boxplot
c) ‘area’ for area plots
d) all of the mentioned

Answer: d
Explanation: The kind keyword argument of plot() accepts a handful of values for plots other than the default line plot.

3. Which of the following values of the kind keyword produces a bar plot?
a) bar
b) kde
c) hexbin
d) none of the mentioned

Answer: a
Explanation: ‘bar’ draws vertical bars; ‘barh’ draws horizontal bars.

4. You can create a scatter plot matrix using the __________ method in pandas.plotting (formerly pandas.tools.plotting).
a) sca_matrix
b) scatter_matrix
c) DataFrame.plot
d) all of the mentioned

Answer: b
Explanation: scatter_matrix draws a grid of pairwise scatter plots for the columns of a DataFrame.

5. Point out the wrong combination with regards to kind keyword for graph plotting.
a) ‘scatter’ for scatter plots
b) ‘kde’ for hexagonal bin plots
c) ‘pie’ for pie plots
d) none of the mentioned

Answer: b
Explanation: kde is used for density plots.

6. Which of the following plots are used to check if a data set or time series is random?
a) Lag
b) Random
c) Lead
d) None of the mentioned

Answer: a
Explanation: Random data should not exhibit any structure in the lag plot.

7. Plots may also be adorned with error bars or tables.
a) True
b) False

Answer: a
Explanation: The plot method accepts the xerr and yerr keywords for error bars and the table keyword for drawing a table below the plot.

8. Which of the following plots are often used for checking randomness in time series?
a) Autocausation
b) Autorank
c) Autocorrelation
d) None of the mentioned

Answer: c
Explanation: If the time series is random, such autocorrelations should be near zero for any and all time-lag separations.
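
The claim that autocorrelations of a random series stay near zero can be checked numerically. The following is a minimal sketch using only the standard library; the function name autocorrelation and the two made-up series are assumptions for illustration, not part of pandas.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: products of deviations that are `lag` steps apart.
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.gauss(0, 1) for _ in range(1000)]  # no structure
trend = [float(t) for t in range(1000)]         # strong structure

noise_r1 = autocorrelation(noise, 1)
trend_r1 = autocorrelation(trend, 1)
print(f"lag-1 autocorrelation, noise: {noise_r1:.3f}")
print(f"lag-1 autocorrelation, trend: {trend_r1:.3f}")
```

The noise series should give a value close to zero, while the trending series gives a value close to one.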

9. __________ plots are used to visually assess the uncertainty of a statistic.
a) Lag
b) RadViz
c) Bootstrap
d) None of the mentioned

Answer: c
Explanation: Resulting plots and histograms are what constitutes the bootstrap plot.
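
The idea behind a bootstrap plot can be sketched without any plotting: recompute a statistic on many resamples and look at its spread. This is a rough illustration using the classic resample-with-replacement bootstrap, which may differ in detail from what a plotting helper does; the function name bootstrap_means and the sample data are made up.

```python
import random
import statistics

def bootstrap_means(data, n_resamples, rng):
    """Return the mean of each resample drawn with replacement from `data`."""
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # sampling WITH replacement
        means.append(statistics.fmean(resample))
    return means

rng = random.Random(42)
data = [rng.uniform(0, 10) for _ in range(50)]  # made-up sample
means = bootstrap_means(data, n_resamples=500, rng=rng)

# The spread of the resampled statistic approximates its uncertainty.
means.sort()
low, high = means[12], means[487]  # rough 2.5% and 97.5% percentiles of 500 values
print(f"sample mean: {statistics.fmean(data):.2f}")
print(f"approximate 95% interval: [{low:.2f}, {high:.2f}]")
```

A histogram of the values in means is essentially what a bootstrap plot displays.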

10. Andrews curves allow one to plot multivariate data.
a) True
b) False

Answer: a
Explanation: Curves belonging to samples of the same class will usually be closer together and form larger structures.

250+ TOP MCQs on Exploratory Graphs and Answers

Data Science Multiple Choice Questions on “Exploratory Graphs”.

1. Which of the following is also referred to as overlayed 1D plot?
a) lattice
b) barplot
c) gplot
d) all of the mentioned

Answer: a
Explanation: lattice is an add-on package that implements Trellis graphics.

2. Spinning plots can be used for two dimensional data.
a) True
b) False

Answer: a
Explanation: There are many ways to create a 3D spinning plot as well.

3. Which of the following gave rise to need of graphs in data analysis?
a) Data visualization
b) Communicating results
c) Decision making
d) All of the mentioned

Answer: d
Explanation: A picture can tell a better story than raw data.

4. Which of the following is characteristic of exploratory graph?
a) Made slowly
b) Axes are not cleaned up
c) Color is used for personal information
d) All of the mentioned

Answer: c
Explanation: A large number of exploratory graphs are made.

5. Point out the correct statement.
a) coplots are one dimensional data graph
b) Exploratory graphs are made quickly
c) Exploratory graphs are made relatively less in number
d) All of the mentioned

Answer: b
Explanation: Exploratory graphs are made quickly and in large numbers. A coplot (conditioning plot) shows more than one dimension of the data.

6. Which of the following graph can be used for simple summarization of data?
a) Scatterplot
b) Overlaying
c) Barplot
d) All of the mentioned

Answer: c
Explanation: A bar chart or bar graph is a chart that presents grouped data with rectangular bars whose lengths are proportional to the values they represent.

7. Color and shape are used to add dimensions to graph data.
a) True
b) False

Answer: a
Explanation: Graphs are commonly used by print and electronic media.

8. Which of the following information is not given by five-number summary?
a) Mean
b) Median
c) Mode
d) All of the mentioned

Answer: c
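Explanation: The five-number summary consists of the minimum, first quartile, median, third quartile, and maximum. The mode, the value that appears most often in a set of data, is not part of it. (The summary() function in R also reports the mean alongside these values.)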
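
As a small sketch of what a five-number summary contains, the following computes it with the standard library. The helper name five_number_summary is illustrative, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)
```

Neither the mode nor the mean appears in the returned tuple.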

250+ TOP MCQs on Prediction Motivation and Answers

Data Science Multiple Choice Questions on “Prediction Motivation”.

1. Which of the following is the valid component of the predictor?
a) data
b) question
c) algorithm
d) all of the mentioned

Answer: d
Explanation: A prediction is a statement about the future.

2. Point out the wrong statement.
a) In Sample Error is also called generalization error
b) Out of Sample Error is the error rate you get on the new dataset
c) In Sample Error is also called resubstitution error
d) All of the mentioned

Answer: a
Explanation: Out of Sample Error is also called generalization error.

3. Which of the following is correct order of working?
a) questions->input data ->algorithms
b) questions->evaluation ->algorithms
c) evaluation->input data ->algorithms
d) all of the mentioned

Answer: a
Explanation: Evaluation is the last step.

4. Which of the following shows correct relative order of importance?
a) question->features->data->algorithms
b) question->data->features->algorithms
c) algorithms->data->features->question
d) none of the mentioned

Answer: b
Explanation: Garbage in, garbage out: the question and the quality of the data matter more than the choice of algorithm.

5. Point out the correct statement.
a) In Sample Error is the error rate you get on the same dataset used to model a predictor
b) Data have two parts-signal and noise
c) The goal of predictor is to find signal
d) All of the mentioned

Answer: d
Explanation: A predictor can always be built that is perfect in sample, but it will not perform as well on new data.

6. Which of the following is characteristic of best machine learning method?
a) Fast
b) Accuracy
c) Scalable
d) All of the mentioned

Answer: d
Explanation: There is always a trade-off in prediction accuracy.

7. True positive means correctly rejected.
a) True
b) False

Answer: b
Explanation: True positive means correctly identified.

8. Which of the following trade-off occurs during prediction?
a) Speed vs Accuracy
b) Simplicity vs Accuracy
c) Scalability vs Accuracy
d) All of the mentioned

Answer: d
Explanation: Interpretability also matters during prediction.

9. Which of the following expression is true?
a) In sample error < out sample error
b) In sample error > out sample error
c) In sample error = out sample error
d) All of the mentioned

Answer: a
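Explanation: In-sample error is usually lower because the model is partly fitted to the noise in its own training data (overfitting). Out-of-sample error is the one that matters.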
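
The gap between the two error rates can be demonstrated with a deliberately overfitting predictor. This is a minimal sketch, assuming a one-nearest-neighbour rule and synthetic data (signal y = 2x plus noise); all names here are illustrative.

```python
import random

def nearest_neighbor_predict(train, x):
    """Predict y for x by copying the y of the closest training point."""
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y

def mean_squared_error(model_data, eval_data):
    """Mean squared error of the nearest-neighbour rule on eval_data."""
    errors = [(nearest_neighbor_predict(model_data, x) - y) ** 2
              for x, y in eval_data]
    return sum(errors) / len(errors)

rng = random.Random(1)

def make_data(n):
    """Generate n points with signal y = 2x and Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + rng.gauss(0, 1)))
    return data

train = make_data(100)
test = make_data(100)

in_sample_error = mean_squared_error(train, train)     # same data used to build the predictor
out_of_sample_error = mean_squared_error(train, test)  # new data
print(in_sample_error, out_of_sample_error)
```

The rule memorizes the training points, noise included, so its in-sample error is zero while its error on new data is not.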

10. Backtesting is a key component of effective trading-system development.
a) True
b) False

Answer: a
Explanation: Backtesting is the process of applying a trading strategy or analytical method to historical data to see how accurately the strategy or method would have predicted actual results.

250+ TOP MCQs on Computational tools and Answers

Data Science Multiple Choice Questions on “Computational tools”.

1. Which of the following is used to compute the percent change over a given number of periods?
a) pct_change
b) percent_change
c) per_change
d) none of the mentioned

Answer: a
Explanation: Series and DataFrame both have a pct_change method. Older pandas versions also provided it on Panel, which has since been removed.
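
The calculation behind a percent change over a number of periods can be sketched in plain Python. This is an illustration of the arithmetic only, not the pandas implementation: the function below is a stand-in that returns fractions (0.1 for a 10% rise) and uses None where pandas would produce a missing value.

```python
def pct_change(values, periods=1):
    """Fractional change between each element and the one `periods` steps earlier."""
    result = [None] * len(values)  # the first `periods` entries have no predecessor
    for i in range(periods, len(values)):
        previous = values[i - periods]
        result[i] = (values[i] - previous) / previous
    return result

prices = [100.0, 110.0, 99.0, 99.0]  # made-up values
changes = pct_change(prices)
changes_over_two = pct_change(prices, periods=2)
print(changes)
print(changes_over_two)
```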

2. Point out the correct statement.
a) Pandas represents timestamps in microsecond resolution
b) Pandas is 100% thread safe
c) For Series and DataFrame objects, var normalizes by N-1 to produce unbiased estimates
d) All of the mentioned

Answer: c
Explanation: Pandas represents timestamps in nanosecond resolution.
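
Option (c) refers to the difference between dividing by N-1 and dividing by N. A short sketch with made-up numbers shows both, checked against the standard library's statistics module:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)  # divides by N-1 (unbiased estimate)
population_variance = squared_deviations / n    # divides by N

print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```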

3. Which of the following object has a method cov to compute covariance between series?
a) Series
b) DataFrame
c) Panel
d) None of the mentioned

Answer: a
Explanation: DataFrame has a method cov to compute pairwise covariances among the series in the DataFrame, also excluding NA/null values.

4. Which of the following specifies the required minimum number of observations for each column pair in order to have a valid result?
a) min_periods
b) max_periods
c) minimum_periods
d) all of the mentioned

Answer: a
Explanation: DataFrame.cov also supports an optional min_periods.

5. Point out the wrong statement.
a) lxml is very fast
b) lxml requires Cython to install correctly
c) lxml does not make any guarantees about the results of its parse
d) none of the mentioned

Answer: c
Explanation: There are some versioning issues surrounding the libraries that are used to parse HTML tables in the top-level pandas io function read_html.

6. Which of the following is implemented on DataFrame to compute the correlation between like-labeled Series contained in different DataFrame objects?
a) corrwith
b) corwith
c) corwit
d) none of the mentioned

Answer: a
Explanation: corrwith computes the correlation between like-labeled rows or columns of two DataFrame objects.

7. rolling_count function gives the number of non-null observations.
a) True
b) False

Answer: a
Explanation: rolling_count returns the number of non-null observations in each window. In current pandas the same result comes from .rolling(window).count().

8. Which of the following method produces a data ranking with ties being assigned the mean of the ranks for the group?
a) rank
b) dense_rank
c) partition_rank
d) none of the mentioned

Answer: a
Explanation: rank is also a DataFrame method.
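
Ranking with ties assigned the mean of their ranks can be sketched in a few lines of plain Python. The function name average_rank is hypothetical and only illustrates the tie-handling rule described in the question.

```python
def average_rank(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of equal values beginning at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; average them.
        mean_rank = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

scores = [30, 10, 20, 20]
ranks = average_rank(scores)
print(ranks)
```

The two tied scores would occupy ranks 2 and 3, so each receives 2.5.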

9. Which of the following can potentially change the dtype of a series?
a) reindex_like
b) index_like
c) itime_like
d) none of the mentioned

Answer: a
Explanation: reindex_like silently inserts NaNs and the dtype changes accordingly.

10. cov and corr support the optional min_periods keyword.
a) True
b) False

Answer: a
Explanation: Non-numeric columns will be automatically excluded from the correlation calculation.

250+ TOP MCQs on Introduction to Reproducible Research and Answers

Data Science Multiple Choice Questions on “Introduction to Reproducible Research”.

1. Which of the following problem is solved by reproducibility?
a) Scalability
b) Data availability
c) Improved data analysis
d) None of the mentioned

Answer: b
Explanation: More transparency is achieved with reproducibility.

2. Point out the correct statement with respect to replication.
a) Focuses on the validity of the data analysis
b) Focuses on the validity of the scientific claim
c) Arguably a minimum standard for any scientific study
d) All of the mentioned

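Answer: b
Explanation: Replication focuses on the validity of the scientific claim. Reproducibility focuses on the validity of the data analysis and is arguably a minimum standard for any scientific study.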

3. Which of the following is effective way of checking validity of data analysis?
a) Re-run the analysis
b) Review the code
c) Check the sensitivity
d) All of the mentioned

Answer: d
Explanation: Reproducibility addresses the most “downstream” aspect of the research process.

4. Which of the following is similar to a pre-specified clinical trial protocol?
a) Caching-based Data Analysis
b) Evidence-based Data Analysis
c) Markdown-based Data Analysis
d) All of the mentioned

Answer: b
Explanation: Evidence-based data analysis works like a deterministic statistical machine: the analysis steps are fixed in advance, as in a clinical trial protocol.

5. Point out the wrong statement with respect to reproducibility.
a) Focuses on the validity of the data analysis
b) The ultimate standard for strengthening scientific evidence
c) Important when replication is impossible
d) None of the mentioned

Answer: b
Explanation: Replication is particularly important in studies that can impact broad policy or regulatory decisions.

6. Which of the following can be used for data analysis model?
a) CRAN
b) CPAN
c) CTAN
d) All of the mentioned

Answer: d
Explanation: Different problems require different approaches and expertise.

7. Reproducibility determines correctness of data analysis.
a) True
b) False

Answer: b
Explanation: Reproducibility makes an analysis transparent and checkable, but it does not guarantee that the analysis is correct.

8. Which of the following step is not required in data analysis?
a) Synthesize results
b) Create reproducible code
c) Interpret results
d) None of the mentioned

Answer: d
Explanation: The data set may depend on your goal.

9. Which of the following gives reviewers an important tool without dramatically increasing the burden?
a) Quality research
b) Replication research
c) Reproducible research
d) None of the mentioned

Answer: c
Explanation: Reproducible research is important, but does not necessarily solve the critical question of whether a data analysis is trustworthy.

10. The results of an analysis are relatively easy to replicate or reproduce.
a) True
b) False

Answer: b
Explanation: Complicated analyses should not be trusted.

250+ TOP MCQs on Cross Validation and Answers

Data Science Multiple Choice Questions on “Cross Validation”.

1. Which of the following is correct use of cross validation?
a) Selecting variables to include in a model
b) Comparing predictors
c) Selecting parameters in prediction function
d) All of the mentioned

Answer: d
Explanation: Cross-validation is also used to pick type of prediction function to be used.

2. Point out the wrong combination.
a) True negative=correctly rejected
b) False negative=incorrectly rejected
c) False positive=correctly identified
d) All of the mentioned

Answer: c
Explanation: False positive means incorrectly identified.

3. Which of the following is a common error measure?
a) Sensitivity
b) Median absolute deviation
c) Specificity
d) All of the mentioned

Answer: d
Explanation: Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as classification function.
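
Sensitivity and specificity follow directly from the four outcome counts. Here is a minimal sketch with made-up labels, where True marks the positive class; the function name confusion_counts is illustrative.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for boolean labels (True = positive)."""
    tp = sum(a and p for a, p in zip(actual, predicted))              # correctly identified
    fp = sum((not a) and p for a, p in zip(actual, predicted))        # incorrectly identified
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))  # correctly rejected
    fn = sum(a and (not p) for a, p in zip(actual, predicted))        # incorrectly rejected
    return tp, fp, tn, fn

actual    = [True, True,  True, False, False, False, False, True]
predicted = [True, False, True, False, True,  False, False, True]

tp, fp, tn, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)  # share of actual positives that were identified
specificity = tn / (tn + fp)  # share of actual negatives that were rejected
print(tp, fp, tn, fn, sensitivity, specificity)
```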

4. Which of the following is not a machine learning algorithm?
a) SVG
b) SVM
c) Random forest
d) None of the mentioned

Answer: a
Explanation: SVM stands for support vector machine. SVG (Scalable Vector Graphics) is an image format, not a learning algorithm.

5. Point out the wrong statement.
a) ROC curve stands for receiver operating characteristic
b) For time series, data must be in chunks
c) Random sampling must be done with replacement
d) None of the mentioned

Answer: c
Explanation: Random sampling must be done without replacement. Random sampling with replacement is the bootstrap.
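
The difference between the two sampling schemes is easy to see with the standard library's random module. This is a small sketch on a made-up population of ten items:

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable
population = list(range(10))

# Without replacement: each item can be drawn at most once.
without_replacement = rng.sample(population, k=10)

# With replacement (the bootstrap): items may repeat and others may be missing.
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```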

6. Which of the following is an error measure for categorical outcomes?
a) RMSE
b) RSquared
c) Accuracy
d) All of the mentioned

Answer: c
Explanation: RMSE (Root Mean Squared Error) and RSquared are measures for continuous outcomes; accuracy is used for categorical outcomes.
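
The two kinds of measure can be compared side by side. Here is a minimal sketch with made-up values; the function names rmse and accuracy are illustrative helpers, not a library API.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error for continuous outcomes."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def accuracy(actual, predicted):
    """Fraction of exactly matching labels for categorical outcomes."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

rmse_value = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
accuracy_value = accuracy(["cat", "dog", "dog", "cat"],
                          ["cat", "dog", "cat", "cat"])
print(rmse_value, accuracy_value)
```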

7. For k-fold cross-validation, a larger k value implies more bias.
a) True
b) False

Answer: b
Explanation: For k-fold cross-validation, a larger k value implies less bias, because each model is trained on a larger share of the data.
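
How the data is divided into k folds can be sketched as follows. This is an illustrative splitter that uses contiguous folds and does no shuffling; the name k_fold_indices is an assumption, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold. With a larger k, every training set covers more of the data, which is where the lower bias comes from.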

8. Which of the following method is used for trainControl resampling?
a) repeatedcv
b) svm
c) bag32
d) none of the mentioned

Answer: a
Explanation: repeatedcv stands for repeated cross-validation.

9. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned

Answer: a
Explanation: qplot() is short for quick plot; it is part of the R package ggplot2.

10. For k-fold cross-validation, a smaller k value implies less variance.
a) True
b) False

Answer: a
Explanation: A larger k value implies more variance; a smaller k value implies less variance but more bias.