Tough Data Science Questions on “Big Data”.
1. Beyond volume, variety, and velocity lies the issue of big data veracity.
a) True
b) False
Answer: a
Explanation: Data veracity refers to the uncertainty or imprecision of data.
2. Point out the correct statement.
a) Machine learning focuses on prediction, based on known properties learned from the training data
b) Data Cleaning focuses on prediction, based on known properties learned from the training data
c) Representing data in a form that mere mortals can understand and draw valuable insights from is as much a science as it is an art
d) None of the mentioned
Answer: c
Explanation: Visualization, the subject of option c, is becoming a very important aspect of data science.
3. Which of the following characteristics of big data is of relatively greater concern to data science?
a) Velocity
b) Variety
c) Volume
d) None of the mentioned
Answer: b
Explanation: Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time.
4. Which of the following analytical capabilities are provided by an information management company?
a) Stream Computing
b) Content Management
c) Information Integration
d) All of the mentioned
Answer: d
Explanation: With stream computing, organizations can store less, analyze more, and make better decisions faster.
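As a rough illustration of the "store less, analyze more" idea, the sketch below keeps a running mean over a stream of readings without retaining the readings themselves. The function name running_mean and the sample values are hypothetical and chosen only for illustration; real stream computing platforms work at a very different scale.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one result per input value.

    Only two numbers (count and mean) are kept in memory, no matter how
    many values pass through the stream.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: shift the mean toward the new value.
        mean += (value - mean) / count
        yield mean


# Hypothetical sensor readings arriving one at a time.
readings = iter([10.0, 20.0, 30.0])
means = list(running_mean(readings))
print(means)
```

The design point is that each value is analyzed on arrival and then discarded, so memory use stays constant as the stream grows.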
5. Point out the wrong statement.
a) The big volume indeed represents Big Data
b) The data growth and social media explosion have changed how we look at the data
c) Big Data is just about lots of data
d) All of the mentioned
Answer: c
Explanation: Big Data is actually a concept that provides an opportunity to find new insight into your existing data, as well as guidelines for capturing and analyzing your future data.
6. Which of the following steps is performed by a data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Answer: a
Explanation: Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
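A minimal sketch of the cleansing step described above, assuming records arrive as dictionaries with "id" and "age" fields. Those field names, the clean_records helper, and the sample data are all hypothetical. Records that can be repaired are corrected; the rest are removed.

```python
def clean_records(records):
    """Return only usable records, correcting simple formatting problems."""
    cleaned = []
    for rec in records:
        # Remove: a record without an identifier cannot be used.
        if not rec.get("id"):
            continue
        # Correct: strip stray whitespace around the age value.
        age = str(rec.get("age", "")).strip()
        # Remove: an age that is not a whole number is treated as corrupt.
        if not age.isdigit():
            continue
        cleaned.append({"id": rec["id"], "age": int(age)})
    return cleaned


# Hypothetical raw record set with one good, one unidentified, one corrupt row.
raw = [
    {"id": "a1", "age": " 34 "},
    {"id": "", "age": "29"},
    {"id": "a3", "age": "unknown"},
]
result = clean_records(raw)
print(result)
```

Whether a bad record should be corrected or dropped depends on the data set; this sketch drops anything it cannot repair with certainty.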
7. The 3 V’s are not sufficient to describe big data.
a) True
b) False
Answer: a
Explanation: IBM data scientists break big data into four dimensions: volume, variety, velocity and veracity.
8. Which of the following focuses on the discovery of (previously) unknown properties in the data?
a) Data mining
b) Big Data
c) Data wrangling
d) Machine Learning
Answer: a
Explanation: Data mining focuses on discovering previously unknown properties in the data. By contrast, data munging or data wrangling is loosely the process of manually converting or mapping data from one “raw” form into another format that allows for more convenient consumption of the data with the help of semi-automated tools.
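To make the wrangling idea concrete, here is a small sketch that maps a "raw" pipe-delimited text export into a list of dictionaries that is easier to consume. The input layout, the field names, and the wrangle helper are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical raw export: pipe-delimited text with a header row.
RAW = "name|signup\nAda|2021-03-04\nLin|2022-11-30\n"


def wrangle(raw_text):
    """Map raw pipe-delimited rows into dictionaries with a typed year field."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="|")
    return [
        # Keep the name as-is and derive an integer year from the date string.
        {"name": row["name"], "signup_year": int(row["signup"][:4])}
        for row in reader
    ]


rows = wrangle(RAW)
print(rows)
```

Note that this only reshapes the data; discovering unknown properties in it would be a separate data mining step.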