250+ TOP MCQs on Regular Expressions and Text Variables and Answers

Data Science Multiple Choice Questions on “Regular Expressions and Text Variables”.

1. Which of the following functions is good for the automatic splitting of names?
a) split
b) strsplit
c) autsplit
d) none of the mentioned

Answer: b
Explanation: strsplit splits a character string or a vector of character strings using a regular expression or a literal string.
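The splitting idea is not specific to R. The following is a minimal Python sketch of the same technique, not R's strsplit itself. The column names and the helper `split_names` are hypothetical, and the pattern assumes that a literal dot separates the parts of each name.

```python
import re

# Hypothetical column names; a literal dot separates the base name from a suffix.
column_names = ["Location.1", "intersection", "street.name"]

def split_names(names, pattern=r"\."):
    """Split every name on a regular expression (the dot is escaped to be literal)."""
    return [re.split(pattern, name) for name in names]

parts = split_names(column_names)
# Keep only the first piece of each name, a common clean-up step.
first_elements = [pieces[0] for pieces in parts]

print(parts)
print(first_elements)
```

Escaping the dot matters here: an unescaped dot is a metacharacter that matches any character, so the split would occur at every position.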

2. Point out the correct statement.
a) gsub is used for fixing character vectors
b) sub is used for finding values like grep
c) grep is used for fixing character vectors
d) none of the mentioned

Answer: a
Explanation: sub and gsub are used for fixing character vectors; sub replaces the first match and gsub replaces every match.
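The difference between replacing one match and replacing all matches can be sketched in Python. This is an analogue rather than R code: the helpers `sub_first` and `sub_all` are hypothetical names that mimic the behavior of R's sub and gsub, and the sample names are made up.

```python
import re

# Hypothetical variable names containing underscores to remove.
names = ["test_data_set", "id_1"]

def sub_first(pattern, replacement, text):
    """Replace only the first match, as R's sub does."""
    return re.sub(pattern, replacement, text, count=1)

def sub_all(pattern, replacement, text):
    """Replace every match, as R's gsub does."""
    return re.sub(pattern, replacement, text)

first_only = [sub_first("_", "", name) for name in names]
every_match = [sub_all("_", "", name) for name in names]

print(first_only)
print(every_match)
```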

3. Which of the following functions is used for fixing character vectors?
a) tolower
b) toUPPER
c) toLOWER
d) all of the mentioned

Answer: a
Explanation: tolower translates characters to lowercase. toUPPER and toLOWER are not R functions; the uppercase counterpart is toupper.

4. Which of the following metacharacters is used to refer to any character?
a) %
b) @
c) .
d) All of the mentioned

Answer: c
Explanation: In a regular expression, the dot (.) matches any single character.
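A short Python sketch shows the dot acting as a wildcard inside a regular expression. The pattern and the candidate strings are illustrative assumptions; Python's `re` module is used here only as a stand-in for the regular expression syntax discussed in the question.

```python
import re

# "9.11" reads as: a 9, then any single character, then 11.
pattern = re.compile(r"9.11")

candidates = ["9-11", "9/11", "9:11", "911"]
matches = [bool(pattern.search(text)) for text in candidates]

print(matches)
```

The last candidate does not match because the dot must consume exactly one character between the 9 and the 11.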

5. Point out the wrong statement.
a) Variables with character values should be made less descriptive
b) Variables with character values should usually be made into factor variable
c) Common variables are used to apply transforms
d) All of the mentioned

Answer: a
Explanation: Variables with character values should be made more descriptive.

6. Which of the following metacharacters is used to specify a character class?
a) []
b) {}
c) /+
d) All of the mentioned

Answer: a
Explanation: Square brackets list a set of characters to accept at a given point in the match.
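Character classes can be tried out with a small Python sketch. Both patterns and the helper `matches` are illustrative assumptions, not taken from the quiz: the first accepts either case for each letter of a word, and the second uses ranges inside the brackets.

```python
import re

# Each bracket pair accepts one character from the listed set.
any_case_bush = re.compile(r"[Bb][Uu][Ss][Hh]")
# Ranges are allowed inside brackets: a digit at the start, then a letter.
digit_then_letter = re.compile(r"^[0-9][a-zA-Z]")

def matches(pattern, text):
    """Return True when the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None

print(matches(any_case_bush, "BUSH wins"))
print(matches(any_case_bush, "brush"))
print(matches(digit_then_letter, "7th inning"))
print(matches(digit_then_letter, "inning 7"))
```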

7. Regular expressions can be thought of as a combination of literals and metacharacters.
a) True
b) False

Answer: a
Explanation: Regular expressions have a rich set of metacharacters.

8. Which of the following signs are used to indicate repetition?
a) #
b) *
c) –
d) All of the mentioned

Answer: b
Explanation: * and + are metacharacters for repetition: * matches zero or more repetitions of the preceding item, and + matches one or more.
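The two repetition metacharacters differ only in whether zero occurrences are accepted. A minimal Python sketch, using made-up sample strings, makes the difference visible:

```python
import re

zero_or_more = re.compile(r"ab*c")  # "a", then any number of "b" (including none), then "c"
one_or_more = re.compile(r"ab+c")   # "a", then at least one "b", then "c"

samples = ["ac", "abc", "abbbc"]

# fullmatch requires the whole string to fit the pattern.
star_results = [zero_or_more.fullmatch(s) is not None for s in samples]
plus_results = [one_or_more.fullmatch(s) is not None for s in samples]

print(star_results)
print(plus_results)
```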

9. Which of the following functions is used for searching text strings by means of a regular expression?
a) grepd
b) grepl
c) gepexpr
d) all of the mentioned

Answer: b
Explanation: grep, grepl, regexpr, gregexpr and regexec search for matches to argument pattern within each element of a character vector.

10. The merge function is used for merging data frames.
a) True
b) False

Answer: a
Explanation: To merge two data frames horizontally, use the merge function.

250+ TOP MCQs on Residual Variation and Multivariate and Answers

Data Science Multiple Choice Questions on “Residual Variation and Multivariate”.

1. Which of the following is the correct formula for total variation?
a) Total Variation = Residual Variation – Regression Variation
b) Total Variation = Residual Variation + Regression Variation
c) Total Variation = Residual Variation * Regression Variation
d) All of the mentioned

Answer: b
Explanation: Total variation splits into the part explained by the regression and the complementary part, which is called unexplained or residual variation.
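The decomposition can be checked numerically. The sketch below fits a simple least squares line in plain Python and compares the three sums of squares; the data values and the helper `fit_line` are hypothetical, and the identity holds for a least squares fit that includes an intercept.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
y_bar = sum(y) / len(y)

total = sum((yi - y_bar) ** 2 for yi in y)                   # total variation
residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual variation
regression = sum((fi - y_bar) ** 2 for fi in fitted)         # regression variation
r_squared = regression / total                               # share explained by the line

print(total, residual + regression, r_squared)
```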

2. Point out the correct statement.
a) A standard error is needed to create a prediction interval
b) The prediction interval must incorporate the variability in the data around the line
c) Investors use the residual variance to measure the accuracy of their predictions on the value of an asset
d) All of the mentioned

Answer: d
Explanation: In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation of a given data set.

3. Which of the following things can be accomplished with linear models?
a) Flexibly fit complicated functions
b) Uncover complex multivariate relationships
c) Build accurate prediction models
d) All of the mentioned

Answer: d
Explanation: Linear models are the single most important applied statistical and machine learning technique.

4. Which of the following statements is incorrect with respect to outliers?
a) Outliers can have varying degrees of influence
b) Outliers can be the result of spurious or real processes
c) Outliers cannot conform to the regression relationship
d) None of the mentioned

Answer: c
Explanation: Outliers can conform to the regression relationship.

5. Point out the wrong statement.
a) The fraction of variance unexplained is an established concept in the context of linear regression
b) “Explained variance” is routinely used in principal component analysis
c) The general linear model extends simple linear regression (SLR) by adding terms linearly into the model
d) None of the mentioned

Answer: d
Explanation: Linearity refers to a mathematical relationship or function that can be graphically represented as a straight line.

6. Which of the following can be useful for diagnosing data entry errors?
a) hat values
b) dffit
c) resid
d) all of the mentioned

Answer: a
Explanation: Hat values measure leverage, so unusually large values can flag data entry errors. resid only returns the ordinary residuals.

7. Multivariate regression estimates are exactly those having removed the linear relationship of the other variables from both the regressor and response.
a) True
b) False

Answer: a
Explanation: Multivariate Data Analysis refers to any statistical technique used to analyze data that arises from more than one variable.
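The statement in this question can be verified on a small example. The sketch below, in plain Python with made-up data and hypothetical helper names, fits a two-regressor model without an intercept and compares its first coefficient with the slope obtained after removing the linear relationship with the other regressor from both the regressor and the response.

```python
def origin_slope(x, y):
    """Regression through the origin: slope = sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def residuals(x, y):
    """Residuals of y after removing its linear relationship with x (through the origin)."""
    b = origin_slope(x, y)
    return [yi - b * xi for xi, yi in zip(x, y)]

def two_regressor_fit(x1, x2, y):
    """Least squares for y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Made-up data for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1]

b1_direct, b2_direct = two_regressor_fit(x1, x2, y)

# Remove x2 from both the regressor x1 and the response y, then regress the residuals.
b1_from_residuals = origin_slope(residuals(x2, x1), residuals(x2, y))

print(b1_direct, b1_from_residuals)
```

The intercept is left out only to keep the algebra to a 2x2 system; the same result holds when a constant column is treated as one more regressor.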

8. Residual ______ plots investigate normality of the errors.
a) RR
b) PP
c) QQ
d) None of the mentioned

Answer: c
Explanation: Patterns in your residual plots generally indicate some poor aspect of model fit.

9. Which of the following shows residuals divided by their standard deviations?
a) rstudent
b) cooks.distance
c) rstandard
d) all of the mentioned

Answer: c
Explanation: rstandard stands for standardized residuals.
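What "residuals divided by their standard deviations" means can be sketched for simple linear regression in plain Python. This is an illustration of the standard textbook formula, not a reimplementation of R's rstandard; the data and the function name are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Return (residuals, hat values, standardized residuals) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage (hat value) of each observation.
    hat = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Each residual divided by its own estimated standard deviation.
    standardized = [e / (sigma * math.sqrt(1.0 - h)) for e, h in zip(resid, hat)]
    return resid, hat, standardized

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]

resid, hat, standardized = standardized_residuals(x, y)
print(standardized)
```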

10. The least squares estimate for the coefficient of a multivariate regression model is exactly regression through the origin with the linear relationships.
a) True
b) False

Answer: b
Explanation: Multivariate regression adjusts a coefficient for the linear impact of the other variables.

250+ TOP MCQs on Types of Questions and Answers

Data Science Questions and Answers for freshers on “Types of Questions”

1. Accurate prediction depends heavily on measuring the right variables.
a) True
b) False

Answer: a
Explanation: Prediction is very hard, especially about the future.

2. Point out the correct statement.
a) Descriptive analysis can be more useful for defining future studies
b) Correlation does imply causation
c) Inference is commonly the goal of statistical model
d) None of the mentioned

Answer: c
Explanation: Inference depends heavily on the sampling scheme.

3. Which of the following uses a relatively small amount of data to estimate something about a bigger population?
a) Inferential
b) Exploratory
c) Causal
d) None of the mentioned

Answer: a
Explanation: Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample.

4. Which of the following analyses helps to find the effect of changing a variable?
a) Inferential
b) Exploratory
c) Causal
d) None of the mentioned

Answer: c
Explanation: Causal Analysis provides the real reason why things happen and hence allows focused change activity.

5. Point out the correct statement.
a) Exploratory analyses are not usually the final way
b) Inferential models are useful for discovering new connection
c) Inference involves estimating uncertainty
d) All of the mentioned

Answer: c
Explanation: Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.

6. Which of the following relationships are usually identified as average effects?
a) Descriptive
b) Causal
c) Predictive
d) None of the mentioned

Answer: b
Explanation: Causal relationships are usually identified as average effects, which may not apply to every individual.

7. Which of the following analyses is usually modeled by a deterministic set of equations?
a) Predictive
b) Causal
c) Mechanistic
d) All of the mentioned

Answer: c
Explanation: Equations are based on physical/engineering science.

8. Which of the following analyses is incredibly hard to infer?
a) Inferential
b) Exploratory
c) Causal
d) Mechanistic

Answer: d
Explanation: Mechanistic analyses are hard to infer except in simple situations.

250+ TOP MCQs on Graphics Devices and Answers

Advanced Data Science Questions & Answers on “Graphics Devices”.

1. The most familiar place for a plot to be “sent” is the screen device.
a) True
b) False

Answer: a
Explanation: On Linux, the screen device is launched with the x11() function.

2. Point out the correct statement.
a) On Mac, the screen device is launched with quartz
b) On Windows, the screen device is launched with wind
c) On Unix, the screen device is launched with x12
d) All of the mentioned

Answer: a
Explanation: On Windows, the screen device is launched with the windows() function.

3. Which of the following is an example of a graphics device?
a) PDF
b) SVG
c) JPEG
d) All of the mentioned

Answer: d
Explanation: When the plot() function is invoked, R sends the data corresponding to the plot over, and the graphics device generates the plot.

4. Which of the following file formats is a graphics device available only on Windows?
a) pdf
b) svg
c) win.metafile
d) all of the mentioned

Answer: c
Explanation: Exporting graphics to a Windows metafile can be achieved via the win.metafile function.

5. Point out the wrong statement.
a) For quick visualizations and exploratory analysis, usually you want to use the screen device
b) Functions like xyplot in lattice will not default to sending a plot to the screen device
c) Not all graphics devices are available on all platforms
d) None of the mentioned

Answer: b
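Explanation: Plotting functions such as xyplot in lattice default to sending the plot to the screen device. Separately, the windows() function cannot be used on a Mac.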

6. Which of the following systems most often does not have a PostScript viewer?
a) Windows
b) Linux
c) Mac
d) All of the mentioned

Answer: a
Explanation: PostScript is an older format, but it resizes well.

7. There are mainly three types of file devices.
a) True
b) False

Answer: b
Explanation: There are two basic types of file devices: vector and bitmap.

8. Which of the following is a bitmap file type?
a) tiff
b) svg
c) pdf
d) none of the mentioned

Answer: a
Explanation: TIFF is a computer file format for storing raster graphics images.

9. Which of the following functions displays the currently active graphics device?
a) dev.present
b) dev.cur
c) pre.cur
d) all of the mentioned

Answer: b
Explanation: You can change the active graphics device with dev.set.

250+ TOP MCQs on Binary and Count Outcomes and Answers

Data Science Multiple Choice Questions on “Binary and Count Outcomes”.

1. How many components are present in generalized linear models?
a) 2
b) 4
c) 6
d) None of the mentioned

Answer: d
Explanation: Generalized linear models involve three components.

2. Point out the wrong statement.
a) Additive response models don’t make much sense if the response is discrete, or strictly positive
b) Transformations are often easy to interpret in a linear model
c) Regression models are used to predict one variable from one or more other variables
d) All of the mentioned

Answer: b
Explanation: Transformations are often hard to interpret in a linear model.

3. Which of the following components is involved in generalized linear models?
a) An exponential family model for the response
b) A systematic component via a linear predictor
c) A link function that connects the means of the response to the linear predictor
d) All of the mentioned

Answer: d
Explanation: GLM is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution.
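The three components listed in the options can be laid out side by side in a small Python sketch. It assumes a binary response with a logit link (logistic regression) as one concrete instance of a GLM; the coefficient values, covariates, and function names are hypothetical, and no model fitting is performed.

```python
import math

def linear_predictor(coefficients, covariates):
    """Systematic component: eta = sum of coefficient * covariate."""
    return sum(b * x for b, x in zip(coefficients, covariates))

def logit(mu):
    """Link function: maps the mean of the response to the linear predictor scale."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse link: maps the linear predictor back to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_log_likelihood(y, mu):
    """Exponential family model for the response: log-likelihood of one 0/1 outcome."""
    return y * math.log(mu) + (1 - y) * math.log(1.0 - mu)

# Hypothetical coefficients (intercept, slope) and covariates (constant, x).
coefficients = [-1.0, 0.5]
covariates = [1.0, 2.0]

eta = linear_predictor(coefficients, covariates)
mu = inverse_logit(eta)
log_lik_success = bernoulli_log_likelihood(1, mu)

print(eta, mu, log_lik_success)
```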

4. Collections of exchangeable binary outcomes for the same covariate data are called _______ outcomes.
a) random
b) direct
c) binomial
d) none of the mentioned

Answer: c
Explanation: The multivariate regression model for binary outcomes gives odds ratios, not risk ratios.

5. Point out the wrong statement.
a) Asymptotics are used for inference usually
b) Adding squared terms makes it continuously differentiable at the knot points
c) Adding squared terms makes it twice continuously differentiable at the knot points
d) None of the mentioned

Answer: c
Explanation: Adding cubic terms makes it twice continuously differentiable at the knot points.

6. Which of the following is an example use of the Poisson distribution?
a) Analyzing contingency table data
b) Modeling web traffic hits
c) Incidence rates
d) All of the mentioned

Answer: d
Explanation: The Poisson distribution is a useful model for counts and rates.
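To make the count model concrete, the sketch below evaluates the Poisson probability mass function in plain Python. The rate of 3 web hits per interval is an assumed value for illustration, and `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, rate):
    """Probability of observing exactly k events when the expected count is rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Assumed average number of web hits per interval (illustrative only).
rate = 3.0

# Probabilities for 0..59 hits; the tail beyond that is negligible for this rate.
probabilities = [poisson_pmf(k, rate) for k in range(60)]
total_probability = sum(probabilities)
mean_count = sum(k * p for k, p in enumerate(probabilities))

print(probabilities[0], total_probability, mean_count)
```

The computed mean matches the rate, which is the defining property that makes the distribution convenient for counts and rates.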

7. Principal components or factor analytic models on covariates are often useful for reducing complex covariate spaces.
a) True
b) False

Answer: a
Explanation: The space of models explodes quickly as you add interactions and polynomial terms.

8. How many outcomes are possible with a Bernoulli trial?
a) 2
b) 3
c) 4
d) None of the mentioned

Answer: a
Explanation: A Bernoulli trial is a random experiment with exactly two possible outcomes.

9. Which of the following analysis is a statistical process for estimating the relationships among variables?
a) Causal
b) Regression
c) Multivariate
d) All of the mentioned

Answer: b
Explanation: Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events.

10. Linear models are the most useful applied statistical technique.
a) True
b) False

Answer: b
Explanation: Linear models do have limitations.

250+ TOP MCQs on Big Data and Answers

Tough Data Science Questions on “Big Data”.

1. Beyond volume, variety and velocity, there is the issue of big data veracity.
a) True
b) False

Answer: a
Explanation: Data veracity concerns uncertain or imprecise data.

2. Point out the correct statement.
a) Machine learning focuses on prediction, based on known properties learned from the training data
b) Data Cleaning focuses on prediction, based on known properties learned from the training data
c) Representing data in a form that mere mortals can understand and draw valuable insights from is as much a science as it is an art
d) None of the mentioned

Answer: c
Explanation: Visualization is becoming a very important aspect.

3. Which of the following characteristics of big data is of relatively more concern to data science?
a) Velocity
b) Variety
c) Volume
d) None of the mentioned

Answer: b
Explanation: Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time.

4. Which of the following analytical capabilities are provided by an information management company?
a) Stream Computing
b) Content Management
c) Information Integration
d) All of the mentioned

Answer: d
Explanation: Stream computing lets you store less, analyze more, and make better decisions faster.

5. Point out the wrong statement.
a) The big volume indeed represents Big Data
b) The data growth and social media explosion have changed how we look at the data
c) Big Data is just about lots of data
d) All of the mentioned

Answer: c
Explanation: Big Data is actually a concept providing an opportunity to find new insight into your existing data, as well as guidelines to capture and analyze your future data.

6. Which of the following steps is performed by a data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned

Answer: a
Explanation: Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
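A minimal sketch of such a cleansing pass is shown below in plain Python. The records, the field names, the valid age range, and the `clean` helper are all hypothetical; real cleansing rules depend on the data set at hand.

```python
# Hypothetical raw records with typical problems.
raw_records = [
    {"id": 1, "name": " Alice ", "age": 34},  # stray whitespace to correct
    {"id": 2, "name": "Bob", "age": -5},      # impossible age, removed
    {"id": 3, "name": "", "age": 41},         # missing name, removed
    {"id": 1, "name": "Alice", "age": 34},    # duplicate id, removed
    {"id": 4, "name": "dana", "age": 29},
]

def clean(records):
    """Correct fixable fields and drop records that are corrupt or duplicated."""
    seen_ids = set()
    cleaned = []
    for record in records:
        name = record["name"].strip()       # correct: trim whitespace
        if not name:
            continue                        # remove: no usable name
        if not 0 <= record["age"] <= 120:
            continue                        # remove: age outside the assumed valid range
        if record["id"] in seen_ids:
            continue                        # remove: duplicate of an earlier record
        seen_ids.add(record["id"])
        cleaned.append({"id": record["id"], "name": name, "age": record["age"]})
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```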

7. 3V’s are not sufficient to describe big data.
a) True
b) False

Answer: a
Explanation: IBM data scientists break big data into four dimensions: volume, variety, velocity and veracity.

8. Which of the following focuses on the discovery of (previously) unknown properties on the data?
a) Data mining
b) Big Data
c) Data wrangling
d) Machine Learning

Answer: a
Explanation: Data mining focuses on discovering previously unknown properties of the data. Data munging or data wrangling, by contrast, is loosely the process of manually converting or mapping data from one “raw” form into another format that allows for more convenient consumption of the data with the help of semi-automated tools.