250+ TOP MCQs on Pandas Data Structure and Answers

Data Science Multiple Choice Questions on “Pandas Data Structure”.

1. Which of the following things can be data in Pandas?
a) a Python dict
b) an ndarray
c) a scalar value
d) all of the mentioned

Answer: d
Explanation: Series data can be a Python dict, an ndarray, or a scalar value; the passed index is a list of axis labels.
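A minimal sketch of the three data forms, assuming a recent pandas and NumPy; the variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a Python dict: the keys become the index labels.
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From an ndarray: an explicit index must have the same length as the array.
s_array = pd.Series(np.array([10.0, 20.0, 30.0]), index=["x", "y", "z"])

# From a scalar: the value is repeated to match the length of the index.
s_scalar = pd.Series(5, index=["p", "q", "r"])

print(s_dict.index.tolist())  # ['a', 'b', 'c']
print(s_array.tolist())       # [10.0, 20.0, 30.0]
print(s_scalar.tolist())      # [5, 5, 5]
```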

2. Point out the correct statement.
a) If data is a list, if index is passed the values in data corresponding to the labels in the index will be pulled out
b) NaN is the standard missing data marker used in pandas
c) Series acts very similarly to an array
d) None of the mentioned

Answer: b
Explanation: Statement a is wrong because it describes the dict case: if data is a dict and an index is passed, the values in data corresponding to the labels in the index will be pulled out.
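A small sketch of the dict-plus-index behaviour and of NaN as the missing-data marker, assuming a recent pandas; the dict contents are illustrative only.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}

# Only the labels listed in `index` are pulled out of the dict;
# a label with no matching key ("d") becomes NaN, pandas' missing-data marker.
s = pd.Series(data, index=["b", "c", "d"])
print(s)
```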

3. The result of an operation between unaligned Series will have the ________ of the indexes involved.
a) intersection
b) union
c) total
d) all of the mentioned

Answer: b
Explanation: If a label is not found in one Series or the other, the result will be marked as missing (NaN).
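A minimal sketch of label alignment between two Series, assuming a recent pandas; the labels and values are made up.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition aligns on labels; the result index is the union {a, b, c, d}.
# "a" exists only in s1 and "d" only in s2, so both become NaN.
total = s1 + s2
print(total)
```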

4. Which of the following inputs can be accepted by DataFrame?
a) Structured ndarray
b) Series
c) DataFrame
d) All of the mentioned

Answer: d
Explanation: DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

5. Point out the wrong statement.
a) A DataFrame is like a fixed-size dict in that you can get and set values by index label
b) Series can be passed into most NumPy methods expecting an ndarray
c) A key difference between Series and ndarray is that operations between Series automatically align the data based on label
d) None of the mentioned

Answer: a
Explanation: A Series is like a fixed-size dict in that you can get and set values by index label.
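A short sketch of the dict-like get and set operations on a Series, assuming a recent pandas; the values are arbitrary.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

print(s["b"])                 # 2.5: get by label, like d[key]
s["b"] = 9.0                  # set by label, like d[key] = value
print("b" in s)               # True: `in` tests the index labels, like dict keys
print(s.get("z", "missing"))  # missing: .get() with a default, like dict.get
```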

6. Which of the following takes a dict of dicts or a dict of array-like sequences and returns a DataFrame?
a) DataFrame.from_items
b) DataFrame.from_records
c) DataFrame.from_dict
d) All of the mentioned

Answer: c
Explanation: DataFrame.from_dict takes a dict of dicts or a dict of array-like sequences and returns a DataFrame. It operates like the DataFrame constructor except for the orient parameter, which is ‘columns’ by default.
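A minimal sketch of DataFrame.from_dict and its orient parameter, assuming a recent pandas; the column names are made up.

```python
import pandas as pd

data = {"col1": [1, 2], "col2": [3, 4]}

# Default orient="columns": the dict keys become column labels.
by_columns = pd.DataFrame.from_dict(data)

# orient="index": the dict keys become row labels instead.
by_index = pd.DataFrame.from_dict(data, orient="index")

print(by_columns.columns.tolist())  # ['col1', 'col2']
print(by_index.index.tolist())      # ['col1', 'col2']
```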

7. Series is a one-dimensional labeled array capable of holding any data type.
a) True
b) False

Answer: a
Explanation: The axis labels are collectively referred to as the index.

8. Which of the following works analogously to the form of the dict constructor?
a) DataFrame.from_items
b) DataFrame.from_records
c) DataFrame.from_dict
d) All of the mentioned

Answer: a
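Explanation: DataFrame.from_items takes a sequence of (key, value) pairs, like the dict constructor. DataFrame.from_records, by contrast, takes a list of tuples or an ndarray with structured dtype. Note that from_items was deprecated in pandas 0.23 and removed in pandas 1.0.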
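Because from_items no longer exists in current pandas, this sketch builds the same result by passing the (key, value) pairs through dict(), and contrasts it with from_records, which reads each tuple as a row. It assumes pandas 1.0 or later; the data is illustrative.

```python
import pandas as pd

# A sequence of (key, value) pairs, the input form DataFrame.from_items accepted.
items = [("A", [1, 2, 3]), ("B", [4, 5, 6])]

# Building a dict from the pairs (as the dict constructor does) gives the
# same column-oriented DataFrame that from_items produced.
df_items = pd.DataFrame(dict(items))

# from_records, by contrast, treats each tuple as one row.
records = [(1, 4), (2, 5), (3, 6)]
df_records = pd.DataFrame.from_records(records, columns=["A", "B"])

print(df_items.equals(df_records))  # True
```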

9. Which of the following operations work with the same syntax as the analogous dict operations?
a) Getting columns
b) Setting columns
c) Deleting columns
d) All of the mentioned

Answer: d
Explanation: You can treat a DataFrame semantically like a dict of like-indexed Series objects.
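A minimal sketch of the dict-style column operations, assuming a recent pandas; the column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

col = df["one"]                      # get: returns a Series, like d[key]
df["three"] = df["one"] * df["two"]  # set: adds a new column, like d[key] = value
del df["two"]                        # delete: like del d[key]
popped = df.pop("three")             # pop: removes and returns the column, like d.pop(key)

print(df.columns.tolist())  # ['one']
print(popped.tolist())      # [4.0, 10.0, 18.0]
```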

10. If data is an ndarray, index must be the same length as data.
a) True
b) False

Answer: a
Explanation: If no index is passed, one will be created having values [0, …, len(data) – 1].
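A short sketch of both cases, assuming a recent pandas and NumPy: the default index when none is passed, and the ValueError raised when an explicit index has the wrong length. The flag name mismatch_raised is just for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([0.5, 1.5, 2.5])

# No index passed: pandas creates a default index 0 .. len(data) - 1.
default = pd.Series(arr)
print(default.index.tolist())  # [0, 1, 2]

# An index whose length differs from the ndarray is rejected.
mismatch_raised = False
try:
    pd.Series(arr, index=["a", "b"])
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```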

250+ TOP MCQs on knitr and Answers

Data Science Multiple Choice Questions on “knitr”.

1. Which of the following is suitable for knitr?
a) Reports
b) Data preprocessing documents
c) Technical manuals
d) All of the mentioned

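Answer: d
Explanation: knitr is well suited to manuals, short or medium-length technical documents, tutorials, reports (especially if generated periodically), and data preprocessing documents and summaries.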

2. Point out the correct combination related to output statements.
a) results: “asis”
b) echo: true
c) echo=false
d) none of the mentioned

Answer: a
Explanation: The global option echo takes the logical values TRUE and FALSE (in uppercase).

3. Which of the following is required for not echoing the code?
a) echo=TRUE
b) print=TRUE
c) echo=FALSE
d) all of the mentioned

Answer: c
Explanation: Setting echo=FALSE hides the code. It can be set on a single chunk, or globally with code such as opts_chunk$set(echo = FALSE).

4. Which of the following global options are available for figures in knitr?
a) fig.height
b) fig.size
c) fig.breadth
d) all of the mentioned

Answer: a
Explanation: fig.height takes a numeric value: the figure height in inches.

5. Which of the following global options can take the value “hide”?
a) results
b) fig.width
c) echo
d) none of the mentioned

Answer: a
Explanation: Setting results to "hide" runs the chunk but suppresses its printed output; other accepted values include "markup" (the default), "asis", and "hold".

6. Which of the following is the correct order of conversion?
a) .md->.Rmd->.html
b) .Rmd->.md->.html
c) .Rmd->.md->.xml
d) all of the mentioned

Answer: b
Explanation: knitr first converts the .Rmd file into a markdown (.md) document, which is then converted into HTML by default.

7. knitr is good for complex time-consuming computations.
a) True
b) False

Answer: b
Explanation: knitr is poor for complex time-consuming computations.

8. Which of the following statements is used for loading the knitr library?
a) library(knitr)
b) import knitr
c) lib(knitr)
d) none of the mentioned

Answer: a
Explanation: knitr is not good for documents that require precise formatting.

9. The document produced by knitr has which of the following extensions?
a) .md
b) .rmd
c) .html
d) none of the mentioned

Answer: a
Explanation: knitr produces a markdown (.md) document.

10. Code chunks begin with ```{r} and end with ```.
a) True
b) False

Answer: a
Explanation: Code chunks can have names.

250+ TOP MCQs on Predicting with Regression and Answers

Data Science Multiple Choice Questions on “Predicting with Regression”.

1. Predicting with trees evaluates _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned

Answer: b
Explanation: Predicting with trees is easy to interpret.

2. Point out the wrong statement.
a) Training and testing data must be processed in different ways
b) Test transformation would mostly be imperfect
c) The first goal is statistical and second is data compression in PCA
d) All of the mentioned

Answer: a
Explanation: Training and testing data must be processed in the same way.
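A minimal sketch of processing both sets the same way, assuming standardization as the preprocessing step and NumPy for the arithmetic; the data is randomly generated for illustration. The parameters are estimated on the training set only and then reused on the test set, which is also why the test transformation is usually imperfect.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Estimate the preprocessing parameters on the training set only ...
mean = train.mean(axis=0)
std = train.std(axis=0)

# ... then apply the same transformation to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

# The training columns now have mean 0 and standard deviation 1;
# the test columns are generally only close to those values.
print(train_scaled.mean(axis=0).round(6))
print(test_scaled.mean(axis=0))
```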

3. Which of the following method options are provided by caret's train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned

Answer: d
Explanation: Bagging can also be done with the bag function.

4. Which of the following is correct with respect to random forest?
a) Random forests are difficult to interpret but often very accurate
b) Random forests are easy to interpret but often very accurate
c) Random forests are difficult to interpret but very less accurate
d) None of the mentioned

Answer: a
Explanation: Random forests are among the top-performing algorithms in prediction.

5. Point out the correct statement.
a) Prediction with regression is easy to implement
b) Prediction with regression is easy to interpret
c) Prediction with regression performs well when the linear model is correct
d) All of the mentioned

Answer: d
Explanation: Prediction with regression gives poor performance in nonlinear settings.

6. Which of the following libraries is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned

Answer: a
Explanation: Boosting can be used with any subset of classifiers.

7. The principal components are equal to left singular values if you first scale the variables.
a) True
b) False

Answer: b
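Explanation: The principal components are equal to the right singular vectors (not the left ones) if you first scale the variables, that is, subtract each variable's mean and divide by its standard deviation.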
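A sketch of the relationship using NumPy, assuming made-up random data with one correlated column. After scaling, the SVD is written Z = U D V^T; the principal component directions (eigenvectors of the covariance of Z) match the right singular vectors, the rows of Vt, up to sign, and the eigenvalues equal d**2 / (n - 1).

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 4))
X[:, 1] += 2 * X[:, 0]  # add some correlation between two columns

# Scale: center each column and divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
n = Z.shape[0]

# SVD of the scaled data: Z = U @ diag(d) @ Vt.
U, d, Vt = np.linalg.svd(Z, full_matrices=False)

# PCA via eigen-decomposition of the covariance matrix of Z.
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]  # eigh returns ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each PC direction is parallel to a right singular vector (|dot| == 1).
matches = [bool(np.isclose(abs(eigvecs[:, k] @ Vt[k]), 1.0)) for k in range(Z.shape[1])]
print(matches)
print(np.allclose(eigvals, d**2 / (n - 1)))
```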

8. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost

Answer: c
Explanation: ada implements statistical boosting based on additive logistic regression; mboost is used for model-based boosting.

9. Which of the following is one of the largest subclasses of boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned

Answer: b
Explanation: R has multiple boosting libraries.

10. PCA is most useful for nonlinear models.
a) True
b) False

Answer: b
Explanation: PCA is most useful for linear models.

250+ TOP MCQs on Pandas and Answers

Data Science question bank focuses on “Pandas”.

1. All pandas data structures are _______ mutable but not always _______ mutable.
a) size, value
b) semantic, size
c) value, size
d) none of the mentioned

Answer: c
Explanation: The length of a Series cannot be changed.

2. Point out the correct statement.
a) Pandas consists of a set of labeled array data structures
b) Pandas consists of an integrated group-by engine for aggregating and transforming data sets
c) Pandas consists of moving window statistics
d) All of the mentioned

Answer: d
Explanation: pandas provides labeled array data structures, an integrated group-by engine for aggregating and transforming data sets, and moving window statistics, so all three statements are correct.
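A minimal sketch of the group-by engine and a moving window statistic, assuming a recent pandas; the key and value columns are made up.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Group-by engine: split on "key", then aggregate each group.
sums = df.groupby("key")["value"].sum()
print(sums.to_dict())  # {'a': 4, 'b': 6}

# Moving window statistic: mean over a window of 2 rows.
rolling = df["value"].rolling(window=2).mean()
print(rolling.tolist())  # [nan, 1.5, 2.5, 3.5]
```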

3. Which of the following statement will import pandas?
a) import pandas as pd
b) import panda as py
c) import pandaspy as pd
d) all of the mentioned

Answer: a
Explanation: You can read data from a CSV file using the read_csv function.

4. Which of the following objects do you get after reading a CSV file?
a) DataFrame
b) Character Vector
c) Panel
d) All of the mentioned

Answer: a
Explanation: You get columns out of a DataFrame the same way you get elements out of a dictionary.
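A short sketch of read_csv returning a DataFrame and of dict-style column access, assuming a recent pandas; an in-memory CSV (via io.StringIO) stands in for a file on disk, and its contents are invented.

```python
import io

import pandas as pd

# An in-memory CSV stands in for a file on disk.
csv_text = "name,score\nann,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
scores = df["score"]      # column access works like dictionary lookup
print(scores.tolist())    # [90, 85]
```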

5. Point out the wrong statement.
a) Series is 1D labeled homogeneously-typed array
b) DataFrame is general 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
c) Panel is generally 2D labeled, also size-mutable array
d) None of the mentioned

Answer: c
Explanation: Panel is a 3D labeled, size-mutable array. (Panel was deprecated and later removed in pandas 0.25.)

6. Which of the following libraries is similar to Pandas?
a) NumPy
b) RPy
c) OutPy
d) None of the mentioned

Answer: a
Explanation: NumPy is the fundamental package for scientific computing with Python.

7. Panel is a container for Series, and DataFrame is a container for DataFrame objects.
a) True
b) False

Answer: b
Explanation: DataFrame is a container for Series, and Panel is a container for DataFrame objects.
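Since Panel no longer exists in current pandas, this sketch only shows the DataFrame half: a dict of Series becomes a DataFrame, and each column comes back out as a Series. It assumes a recent pandas; the labels and values are made up.

```python
import pandas as pd

s_a = pd.Series([1, 2, 3], index=["x", "y", "z"])
s_b = pd.Series([4.0, 5.0], index=["x", "y"])

# Each Series becomes one column, aligned on the union of the indexes.
df = pd.DataFrame({"a": s_a, "b": s_b})

print(isinstance(df["a"], pd.Series))  # True: each column is a Series
print(df.loc["z", "b"])                # nan: "z" is missing from s_b
```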

8. Which of the following is a prominent Python “statistics and econometrics library”?
a) Bokeh
b) Seaborn
c) Statsmodels
d) None of the mentioned

Answer: c
Explanation: Statsmodels is the Python statistics and econometrics library; Bokeh, by contrast, is a Python interactive visualization library for large datasets that natively uses the latest web technologies.

9. Which of the following is a foundational exploratory visualization package for the R language in the pandas ecosystem?
a) yhat
b) Seaborn
c) Vincent
d) None of the mentioned

Answer: a
Explanation: It has great support for pandas data objects.

10. Pandas consists of static and moving window linear and panel regression.
a) True
b) False

Answer: a
Explanation: Time series and cross-sectional data are special cases of panel data.

250+ TOP MCQs on Literate Statistical Programming and Answers

Data Science Multiple Choice Questions on “Literate Statistical Programming”.

1. What is the role of processing code in the research pipeline?
a) Transforms the analytical results into figures and tables
b) Transforms the analytic data into measured data
c) Transforms the measured data into analytic data
d) All of the mentioned

Answer: c
Explanation: Data science workflow is a non-linear, iterative process.

2. Which of the following is a goal of literate statistical programming?
a) Combine explanatory text and data analysis code in a single document
b) Ensure that data analysis documents are always exported in JPEG format
c) Require that data analysis summaries always be written in R
d) None of the mentioned

Answer: a
Explanation: Literate Statistical Practice is a programming methodology.

3. What does it mean to weave a literate statistical program?
a) Convert a program from S to python
b) Convert the program into a human readable document
c) Convert a program to decompress it
d) All of the mentioned

Answer: b
Explanation: Literate Statistical Programming can be done with knitr.

4. Which of the following is required to implement a literate programming system?
a) A programming language like Perl
b) A programming language like Java
c) A programming language like R
d) All of the mentioned

Answer: c
Explanation: R is a language and environment for statistical computing and graphics.

5. What is one way in which the knitr system differs from Sweave?
a) knitr allows for the use of markdown instead of LaTeX
b) knitr is written in python instead of R
c) knitr lacks features like caching of code chunks
d) none of the mentioned

Answer: a
Explanation: knitr is an engine for dynamic report generation with R.

6. Which of the following is useful way to put text, code, data, output all in one document?
a) Literate statistical programming
b) Object oriented programming
c) Descriptive programming
d) All of the mentioned

Answer: a
Explanation: Object-oriented programming is a programming language model organized around objects rather than “actions” and data rather than logic.

7. Some chunks have to be re-computed every time you re-knit the file.
a) True
b) False

Answer: b
Explanation: Without caching, all chunks (not just some) have to be re-computed every time you re-knit the file.

8. Which of the following tools can be used for integrating text and code in one document?
a) knitr
b) ggplot2
c) NumPy
d) None of the mentioned

Answer: a
Explanation: knitr is a way to write LaTeX, HTML, and Markdown with R code interlaced.

9. Which of the following should be set on a chunk-by-chunk basis to store results of computation?
a) cache=TRUE
b) cache=FALSE
c) caching=TRUE
d) none of the mentioned

Answer: a
Explanation: After the first run, the results are loaded from the cache.

10. Dependencies are checked explicitly in caching caveats.
a) True
b) False

Answer: b
Explanation: Dependencies are not checked explicitly in caching caveats.

250+ TOP MCQs on Model Based Prediction and Answers

Data Science Multiple Choice Questions on “Model Based Prediction”.

1. Which of the following is correct about regularized regression?
a) Can help with the bias-variance trade-off
b) Cannot help with model selection
c) Cannot help with variance trade-off
d) All of the mentioned

Answer: a
Explanation: Regularized regression can help with the bias-variance trade-off and with model selection, but it generally does not perform as well as random forests.
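A sketch of how a penalty trades bias for variance, using ridge regression as the example (an assumption; the question does not name a specific method). The helper ridge_coefficients is hypothetical and uses the closed form (X^T X + lam I)^{-1} X^T y without an intercept; the data is randomly generated.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Hypothetical helper: closed-form ridge solution (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=40)

# lam = 0 reproduces ordinary least squares; larger penalties shrink the
# coefficients toward zero (more bias, less variance).
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in (0.0, 10.0, 100.0)]
print(norms)
```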

2. Point out the wrong statement.
a) Model-based approaches may be computationally convenient
b) Model-based approaches use Bayes' theorem
c) Model-based approaches are reasonably inaccurate on real problems
d) All of the mentioned

Answer: c
Explanation: Model-based approaches are reasonably accurate on real problems.

3. Which of the following methods are present in caret for regularized regression?
a) ridge
b) lasso
c) relaxo
d) all of the mentioned

Answer: d
Explanation: In caret, one can tune over the number of predictors to retain instead of defined values for the penalty.

4. Which of the following method can be used to combine different classifiers?
a) Model stacking
b) Model combining
c) Model structuring
d) None of the mentioned

Answer: a
Explanation: Model ensembling is also used for combining different classifiers.

5. Point out the correct statement.
a) Combining classifiers improves interpretability
b) Combining classifiers reduces accuracy
c) Combining classifiers improves accuracy
d) All of the mentioned

Answer: c
Explanation: You can combine classifiers by averaging.
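A minimal sketch of combining classifiers by averaging, using NumPy and made-up predicted probabilities from three hypothetical classifiers on the same five test cases; the 0.5 threshold is an assumption.

```python
import numpy as np

# Hypothetical predicted probabilities of the positive class (illustrative numbers).
p_model1 = np.array([0.9, 0.4, 0.7, 0.2, 0.6])
p_model2 = np.array([0.8, 0.6, 0.4, 0.1, 0.7])
p_model3 = np.array([0.7, 0.3, 0.8, 0.4, 0.4])

# Combine by simple averaging, then threshold the averaged probability at 0.5.
p_avg = (p_model1 + p_model2 + p_model3) / 3
labels = (p_avg >= 0.5).astype(int)
print(labels)  # [1 0 1 0 1]
```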

6. Which of the following functions provides unsupervised prediction?
a) cl_forecast
b) cl_nowcast
c) cl_precast
d) none of the mentioned

Answer: d
Explanation: The cl_predict function in the clue package provides unsupervised prediction.

7. Model-based prediction assumes relatively simple versions of the covariance matrix.
a) True
b) False

Answer: b
Explanation: Model-based prediction can assume more complicated versions of the covariance matrix.

8. Which of the following is used to assist the quantitative trader in the development, testing, and deployment of statistically based trading models?
a) quantmod
b) quantile
c) quantity
d) mboost

Answer: a
Explanation: Quandl package is similar to quantmod.

9. Which of the following functions can be used for forecasting?
a) predict
b) forecast
c) ets
d) all of the mentioned

Answer: b
Explanation: Forecasting is the process of making predictions of the future based on past and present data and analysis of trends.

10. Predictive analytics is the same as forecasting.
a) True
b) False

Answer: b
Explanation: Predictive analytics goes beyond forecasting.