DATA MINING Multiple Choice Questions (MCQs)
1. The problem of finding hidden structure in unlabeled data is called
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
Answer: B
2. The task of inferring a model from labeled training data is called
A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
Answer: B
3. A telecommunications company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning
Answer: D
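The segmentation scenario in question 3 can be sketched with a small hand-rolled k-means. This is only an illustration under stated assumptions: the customer values, the two features, and the choice of two starting centroids are all made up, and the features are left unscaled for brevity.

```python
# Hypothetical customers: (monthly call minutes, monthly data in GB).
# All values are invented for illustration; no labels are given to the algorithm.
customers = [(100, 1.0), (120, 1.5), (110, 0.8),
             (900, 20.0), (950, 22.0), (880, 18.5)]

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[k])
        centroids = new_centroids
    return labels, centroids

# Start from one light user and one heavy user (an arbitrary choice).
labels, centroids = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The groups emerge from the data alone, which is what makes the task unsupervised.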
4. Self-organizing maps are an example of
A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
D. Missing data imputation
Answer: A
5. You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Answer: A
6. Assume you want to perform supervised learning and predict the number of newborns according to the size of the storks’ population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
Answer: B
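Question 6 is a regression task because the outcome is a number rather than a class. A minimal sketch of ordinary least squares with one feature follows; the stork and birth figures are invented (they are not taken from the linked paper), and `fit_line` and `predict` are hypothetical helper names.

```python
# Hypothetical observations: stork pairs (feature) and births (outcome).
# The numbers are made up and lie exactly on births = 2 * storks + 1.
storks = [1.0, 2.0, 3.0, 4.0, 5.0]
births = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict the numeric outcome for a new feature value."""
    return slope * x + intercept

slope, intercept = fit_line(storks, births)
print(slope, intercept)              # 2.0 1.0
print(predict(6.0, slope, intercept))  # 13.0
```

A good fit here says nothing about causation, which is the point of the storks example.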
7. Discriminating between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Answer: A
8. In the example of predicting the number of babies based on the storks’ population size, the number of babies is
A. Outcome
B. Feature
C. Attribute
D. Observation
Answer: A
9. It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
A. True
B. False
Answer: B
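The accuracy paradox in question 9 affects plain accuracy on imbalanced classes, not the ROC curve. A minimal sketch, assuming a made-up data set with 95 negatives and 5 positives and a trivial classifier that always predicts the majority class:

```python
# Made-up imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall (true positive rate): fraction of actual positives that were found.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The classifier looks strong by accuracy yet finds no positives. Metrics built on the true positive rate and false positive rate, such as the ROC curve, expose this.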
10. Which of the following is not involved in data mining?
A. Knowledge extraction
B. Data archaeology
C. Data exploration
D. Data transformation
Answer: D
11. Which is the right approach to data mining?
A. Infrastructure, exploration, analysis, interpretation, exploitation
B. Infrastructure, exploration, analysis, exploitation, interpretation
C. Infrastructure, analysis, exploration, interpretation, exploitation
D. Infrastructure, analysis, exploration, exploitation, interpretation
Answer: A
12. Which of the following issues is considered before investing in data mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Answer: D
13. Adaptive system management is
A. It uses machine-learning techniques. Here programs can learn from past experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as output.
C. Science of making machines perform tasks that would require intelligence when performed by humans
D. None of these
Answer: A
14. A Bayesian classifier is
A. A class of learning algorithms that try to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Answer: A
15. An algorithm is
A. It uses machine-learning techniques. Here programs can learn from past experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as output
C. Science of making machines perform tasks that would require intelligence when performed by humans
D. None of these
Answer: B
16. Bias is
A. A class of learning algorithms that try to find an optimum classification of a set of examples using probability theory
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Answer: B
17. Background knowledge refers to
A. Additional knowledge used by a learning algorithm to facilitate the learning process
B. A neural network that makes use of a hidden layer
C. It is a form of automatic learning.
D. None of these
Answer: A
18. Case-based learning is
A. A class of learning algorithms that try to find an optimum classification of a set of examples using probability theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
Answer: C
19. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Answer: A
20. A binary attribute is
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Answer: A
21. Classification accuracy is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
Answer: B
22. A biotope is
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Answer: B
23. A cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be extracted
D. None of these
Answer: A
24. Black boxes are
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one bit.
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
Answer: C
25. A definition of a concept is ________ if it recognizes all the instances of that concept
A. Complete
B. Consistent
C. Constant
D. None of these
Answer: A
26. Data mining is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time variant non-volatile collection of data in support of management
D. None of these
Answer: A
27. A definition of a concept is ________ if it does not classify any negative examples as coming within the concept
A. Complete
B. Consistent
C. Constant
D. None of these
Answer: B
28. Data independence means
A. Data is defined separately and not included in programs
B. Programs are not dependent on the physical attributes of data.
C. Programs are not dependent on the logical attributes of data
D. Both (B) and (C).
Answer: D
29. Which symbol does the E-R model use to represent a weak entity set?
A. Dotted rectangle
B. Diamond
C. Doubly outlined rectangle
D. None of these
Answer: C
30. SET concept is used in
A. Network Model
B. Hierarchical Model
C. Relational Model
D. None of these
Answer: A
31. Relational Algebra is
A. Data Definition Language
B. Meta Language
C. Procedural query Language
D. None of the above
Answer: C
32. The key used to represent a relationship between tables is called
A. Primary key
B. Secondary Key
C. Foreign Key
D. None of these
Answer: C
33. ________ produces the relation that has attributes of R1 and R2
A. Cartesian product
B. Difference
C. Intersection
D. Product
Answer: A
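The Cartesian product in question 33 can be sketched by modelling each relation as a list of rows. The relations `r1` and `r2`, their attribute names, and the helper `cartesian_product` are hypothetical, and the sketch assumes the two relations share no attribute names.

```python
from itertools import product

# Hypothetical relations, each modelled as a list of rows (dicts).
# Attribute names are assumed to be disjoint between r1 and r2.
r1 = [{"emp_id": 1, "name": "Ann"}, {"emp_id": 2, "name": "Bob"}]
r2 = [{"dept": "Sales"}, {"dept": "IT"}, {"dept": "HR"}]

def cartesian_product(left, right):
    """Pair every row of `left` with every row of `right`, merging attributes."""
    return [{**a, **b} for a, b in product(left, right)]

result = cartesian_product(r1, r2)
print(len(result))        # 6
print(sorted(result[0]))  # ['dept', 'emp_id', 'name']
```

Every result row carries the attributes of both inputs, and the row count is the product of the two input sizes (2 x 3 = 6).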
34. Which of the following are the properties of entities?
A. Groups
B. Table
C. Attributes
D. Switchboards
Answer: C
35. In a relation
A. Ordering of rows is immaterial
B. No two rows are identical
C. (A) and (B) both are true
D. None of these
Answer: C
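Both properties in question 35 follow from treating a relation as a set of tuples. A minimal sketch, assuming made-up rows and using Python's `frozenset` as a stand-in for a relation:

```python
# Made-up rows; rows_a lists one row twice and in a different order from rows_b.
rows_a = [(1, "Ann"), (2, "Bob"), (1, "Ann")]
rows_b = [(2, "Bob"), (1, "Ann")]

# A set keeps no duplicates and has no ordering.
relation_a = frozenset(rows_a)
relation_b = frozenset(rows_b)

print(len(relation_a))           # 2
print(relation_a == relation_b)  # True
```

SQL tables are multisets by default, so this models the relational theory rather than any particular database engine.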
36. Inductive logic programming is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space
C. A prediction made using an extremely simple method, such as always predicting the same output
D. None of these
37. Machine learning is
A. An algorithm that can learn
B. A sub-discipline of computer science that deals with the design and implementation of learning algorithms
C. An approach that abstracts from the actual strategy of an individual algorithm and can therefore be applied to any other form of machine learning.
D. None of these
38. Projection pursuit is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces
D. None of these
39. A node is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
40. Statistical significance is
A. The science of collecting, organizing, and applying numerical facts
B. Measure of the probability that a certain hypothesis is incorrect given certain observations.
C. One of the defining aspects of a data warehouse, which is specially built around all the existing applications of the operational data
D. None of these
41. Multi-dimensional knowledge is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space
C. A prediction made using an extremely simple method, such as always predicting the same output.
D. None of these
42. Noise is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
43. Query tools are
A. A reference to the speed of an algorithm, which is quadratically dependent on the size of the data
B. Attributes of a database table that can take only numerical values.
C. Tools designed to query a database.
D. None of these
44. An operational database is
A. A measure of the desired maximal complexity of data mining algorithms
B. A database containing volatile data used for the daily operation of an organization
C. Relational database management system
D. None of these
45. Prediction is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible keys within a database table that is chosen by the designer as the primary means of accessing the data in the table.
C. Discipline in statistics that studies ways to find the most interesting projections of multi-dimensional spaces.
D. None of these