Data Analytics MCQs Question Bank

 


Question Bank with answers... 

-------------------------------------------

1. Business intelligence (BI) is a broad category of application programs which
includes __
1. Decision support
2. Data mining
3. OLAP
4. All of the mentioned
Show Answer
All of the mentioned
-------------------------------------------
2. BI can catalyze a business’s success in terms of _
1. Distinguish the products and services that drive revenues
2. Rank customers and locations based on profitability
3. Ranks customers and locations based on probability
4. All of the mentioned
Show Answer
All of the mentioned
-------------------------------------------
3. Which of the following areas are affected by BI?
1. Revenue
2. CRM
3. Sales
4. All of the mentioned
Show Answer
All of the mentioned
-------------------------------------------
4. ___ is a performance management tool that recapitulates an organization’s
performance from several standpoints on a single page
1. Balanced Scorecard
2. Data Cube
3. Dashboard
4. All of the mentioned
Show Answer
Balanced Scorecard
-------------------------------------------
5. __ is a system where operations like data extraction, transformation and loading
are executed.
1. Data staging
2. Data integration
3. ETL
4. None of the mentioned
Show Answer
Data staging
-------------------------------------------
6. ______ is a category of applications and technologies for presenting and analyzing
corporate and external data.
1. Data warehouse
2. MIS
3. EIS
4. All of the mentioned
Show Answer
EIS (Enterprise Information System)
-------------------------------------------
7. Which of the following is the process of basing an organization’s actions and
decisions on actual measured results of performance?
1. Institutional performance management
2. Gap analysis
3. Slice and Dice
4. None of the mentioned
Show Answer
Institutional performance management
-------------------------------------------
8. Which of the following does not form part of BI Stack in SQL Server?
1. SSRS
2. SSIS
3. SSAS
4. OBIEE
Show Answer
OBIEE
-------------------------------------------
9. BI can catalyze a business’s success in terms of ____
1. Distinguish the products and services that drive revenues
2. Rank customers and locations based on profitability
3. Ranks customers and locations based on probability
4. All of the mentioned
Show Answer
All of the mentioned
-------------------------------------------
10. This is an approach to selling goods and services in which a prospect explicitly
agrees in advance to receive marketing information
1. customer managed relationship
2. data mining
3. permission marketing
4. one-to-one marketing
Show Answer
permission marketing
-------------------------------------------
11. In an Internet context, this is the practice of tailoring Web pages to individual
users’ characteristics or preferences.
1. Web services
2. customer-facing
3. client/server
4. personalization
Show Answer
personalization
-------------------------------------------
12. This is the processing of data about customers and their relationship with the
enterprise in order to improve the enterprise’s future sales and service and lower
cost.
1. clickstream analysis
2. database marketing
3. customer relationship management
4. CRM analytics
Show Answer
CRM analytics
-------------------------------------------
13. This is a broad category of applications and technologies for gathering, storing,
analyzing, and providing access to data to help enterprise users make better
business decisions.
1. best practice
2. data mart
3. business information warehouse
4. business intelligence
Show Answer
business intelligence
-------------------------------------------
14. This is a systematic approach to the gathering, consolidation, and processing of
consumer data (both for customers and potential customers) that is maintained in
a company’s databases
1. database marketing
2. marketing encyclopedia
3. application integration
4. service oriented integration
Show Answer
database marketing
-------------------------------------------
15. This is an arrangement in which a company outsources some or all of its
customer relationship management functions to an application service provider
(ASP).
1. spend management
2. supplier relationship management
3. hosted CRM
4. Customer Information Control System
Show Answer
hosted CRM
-------------------------------------------
16. This is an XML-based metalanguage developed by the Business Process
Management Initiative (BPMI) as a means of modeling business processes, much as
XML is, itself, a metalanguage with the ability to model enterprise data.
1. BizTalk
2. BPML
3. e-biz
4. ebXML
Show Answer
BPML
-------------------------------------------
17. This is a central point in an enterprise from which all customer contacts are
managed.
1. contact center
2. help system
3. multichannel marketing
4. call center
Show Answer
contact center
-------------------------------------------
18. This is the practice of dividing a customer base into groups of individuals that
are similar in specific ways relevant to marketing, such as age, gender, interests,
spending habits, and so on.
1. customer service chat
2. customer managed relationship
3. customer life cycle
4. customer segmentation
Show Answer
customer segmentation
-------------------------------------------
19. In data mining, this is a technique used to predict future behavior and
anticipate the consequences of change.
1. predictive technology
2. disaster recovery
3. phase change
4. predictive modeling
Show Answer
predictive modeling
-------------------------------------------
20. According to analysts, for what can traditional IT systems provide a foundation
when they’re integrated with big data technologies like Hadoop?
1. Big data management and data mining
2. Data warehousing and business intelligence
3. Management of Hadoop clusters
4. Collecting and storing unstructured data
Show Answer
Big data management and data mining
-------------------------------------------
21. All of the following accurately describe Hadoop, EXCEPT:
1. Open source
2. Real-time
3. Java-based
4. Distributed computing approach
Show Answer
Real-time
-------------------------------------------
22. ____ has the world’s largest Hadoop cluster
1. Apple
2. Datamatics
3. Facebook
4. None of the mentioned
Show Answer
Facebook
-------------------------------------------
23. What are the five V’s of Big Data?
1. Volume
2. Velocity
3. Variety
4. All of the above
Show Answer
All of the above
-------------------------------------------
24. ____ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
1. Scalding
2. Cascalog
3. Hcatalog
4. Hcalding
Show Answer
Cascalog
-------------------------------------------
25. What are the main components of Big Data?
1. MapReduce
2. HDFS
3. YARN
4. All of these
Show Answer
All of these
-------------------------------------------
26. What are the different features of Big Data Analytics?
1. Open-Source
2. Scalability
3. Data Recovery
4. All the above
Show Answer
All the above
-------------------------------------------
27. Define the Port Numbers for NameNode, Task Tracker and Job Tracker
1. NameNode
2. Task Tracker
3. Job Tracker
4. All of the above
Show Answer
All of the above
-------------------------------------------
28. Facebook Tackles Big Data With ____ based on Hadoop
1. Project Prism
2. Prism
3. ProjectData
4. ProjectBid
Show Answer
Project Prism
-------------------------------------------
29. What is a unit of data that flows through a Flume agent?
1. Record
2. Event
3. Row
4. Log
Show Answer
Event
-------------------------------------------
30. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of
students from a college. Which of the following statements is true in this
case?
1. Feature F1 is an example of nominal variable
2. Feature F1 is an example of ordinal variable
3. It doesn’t belong to any of the above category
4. Both of these
Show Answer
Feature F1 is an example of ordinal variable
-------------------------------------------
31. Which of the following is an example of a deterministic algorithm?
1. PCA
2. K-Means
3. None of the above
4. all of the above
Show Answer
PCA
-------------------------------------------
32. What is the entropy of the target variable?
1. -(5/8 log(5/8) + 3/8 log(3/8))
2. 5/8 log(5/8) + 3/8 log(3/8)
3. 5/8 log(5/8) + 3/8 log(3/8)
4. 5/8 log(3/8) – 3/8 log(5/8)
Show Answer
-(5/8 log(5/8) + 3/8 log(3/8))
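The formula in the answer can be checked numerically. The sketch below assumes a binary target with class proportions 5/8 and 3/8 (read off the answer itself, since the underlying table is not shown) and a base-2 logarithm, which the question does not specify. The function name entropy is illustrative.

```python
import math

def entropy(probabilities, base=2):
    """Shannon entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# Class proportions taken from the answer above: 5 of 8 and 3 of 8.
target_entropy = entropy([5 / 8, 3 / 8])
print(round(target_entropy, 4))  # 0.9544
```

The leading minus sign matters: each log term is negative for a probability below 1, so the options without the minus sign give a negative number, which cannot be an entropy.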
-------------------------------------------
33. Point out the correct statement.
1. OLAP is an umbrella term that refers to an assortment of software
applications for analyzing an organization’s raw data for intelligent
decision making
2. Business intelligence equips enterprises to gain business advantage from
data
3. BI makes an organization agile thereby giving it a lower edge in today’s
evolving market condition
4. None of the mentioned
Show Answer
Business intelligence equips enterprises to gain business advantage from data
-------------------------------------------
34. BI can catalyze a business’s success in terms of ____
1. Distinguish the products and services that drive revenues
2. Rank customers and locations based on profitability
3. Ranks customers and locations based on probability
4. All of the mentioned
Show Answer
All of the mentioned
-------------------------------------------
35. Heuristic is
1. A set of databases from different vendors, possibly using different
database paradigms
2. An approach to a problem that is not guaranteed to work but performs
well in most cases
3. Information that is hidden in a database and that cannot be recovered by
a simple SQL query.
4. None of these
Show Answer
An approach to a problem that is not guaranteed to work but performs well in most
cases
-------------------------------------------
36. Heterogeneous databases refers to
1. A set of databases from different vendors, possibly using different
database paradigms
2. An approach to a problem that is not guaranteed to work but performs
well in most cases.
3. Information that is hidden in a database and that cannot be recovered by
a simple SQL query.
4. None of these
Show Answer
A set of databases from different vendors, possibly using different database paradigms
-------------------------------------------
 Data Analytics UNIT 2
-------------------------------------------
1. Is it possible that the assignment of observations to clusters does not change
between successive iterations in K-Means?
1. Yes
2. No
3. Can’t say
4. None of these
Show Answer
Yes
-------------------------------------------
2. Which of the following can act as possible termination conditions in K-Means?
1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between
iterations. Except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
5. All of the above
Show Answer
All of the above
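The termination conditions listed above can be seen side by side in a small sketch. This is a minimal 1-D version of Lloyd's algorithm written for illustration only; the function name kmeans_1d, the tolerance tol, and the sample data are assumptions, and the RSS-threshold condition is left out for brevity.

```python
def kmeans_1d(points, centroids, max_iters=100, tol=1e-9):
    """Lloyd's algorithm on 1-D data, returning (centroids, labels, reason)."""
    labels = None
    for _ in range(max_iters):  # condition: fixed number of iterations
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # condition: assignments did not change
            return centroids, labels, "assignments unchanged"
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if all(abs(a - b) <= tol for a, b in zip(new_centroids, centroids)):
            return new_centroids, labels, "centroids unchanged"  # condition: centroids stable
        centroids = new_centroids
    return centroids, labels, "iteration cap reached"

centroids, labels, reason = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
print(labels, reason)
```

On this toy data the second pass reproduces the first pass's assignments, so the loop stops on the "assignments unchanged" check.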
-------------------------------------------
3. Which of the following clustering algorithms suffers from the problem of
convergence at local optima?
1. K- Means clustering algorithm
2. Agglomerative clustering algorithm
3. Expectation-Maximization clustering algorithm
4. Diverse clustering algorithm
5. Both 1 and 3
Show Answer
Both 1 and 3
-------------------------------------------
4. How can Clustering (Unsupervised Learning) be used to improve the accuracy of
Linear Regression model (Supervised Learning):
1. Creating different models for different cluster groups.
2. Creating an input feature for cluster ids as an ordinal variable.
3. Creating an input feature for cluster centroids as a continuous variable.
4. Creating an input feature for cluster size as a continuous variable.
5. All of the above
Show Answer
All of the above
-------------------------------------------
5. What could be the possible reason(s) for producing two different dendrograms
using agglomerative clustering algorithm for the same dataset?
1. Proximity function used
2. No. of data points used
3. No. of variables used
4. All of the above
Show Answer
All of the above
-------------------------------------------
6. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
5. 1, 2 and 4
Show Answer
1, 2 and 4
-------------------------------------------
7. Which of the following is/are valid iterative strategy for treating missing values
before clustering analysis?
1. Imputation with mean
2. Nearest Neighbor assignment
3. Imputation with Expectation Maximization algorithm
4. All of the above
Show Answer
Imputation with Expectation Maximization algorithm
-------------------------------------------
8. Feature scaling is an important step before applying the K-Means algorithm. What is
the reason behind this?
1. In distance calculation it will give the same weights for all features
2. You always get the same clusters whether or not you use feature scaling
3. In Manhattan distance it is an important step but in Euclidean it is not
4. None of these
Show Answer
In distance calculation it will give the same weights for all features
-------------------------------------------
9. Which of the following methods is used for finding the optimal number of clusters in the K-Means
algorithm?
1. Elbow method
2. Manhattan method
3. Euclidean method
4. All of the above
Show Answer
Elbow method
-------------------------------------------
10. What is true about K-Means clustering?
1. K-means is extremely sensitive to cluster center initializations
2. Bad initialization can lead to Poor convergence speed
3. Bad initialization can lead to bad overall clustering
4. All of these
Show Answer
All of these
-------------------------------------------
11. Which of the following can be applied to get good results for K-means
algorithm corresponding to global minima?
1. Try to run algorithm for different centroid initialization
2. Adjust number of iterations
3. Find out the optimal number of clusters
4. All of the above
Show Answer
All of the above
-------------------------------------------
12. If you are using Multinomial mixture models with the expectation-maximization
algorithm for clustering a set of data points into two clusters, which
of the assumptions are important:
1. All the data points follow two Gaussian distribution
2. All the data points follow n Gaussian distribution (n >2)
3. All the data points follow two multinomial distribution
4. All the data points follow n multinomial distribution (n >2)
Show Answer
All the data points follow two multinomial distribution
-------------------------------------------
13. Which of the following is/are not true about Centroid based K-Means
clustering algorithm and Distribution based expectation-maximization clustering
algorithm:
1. Both start with random initializations
2. Both are iterative algorithms
3. Both have strong assumptions that the data points must fulfill
4. Expectation maximization algorithm is a special case of K-Means
Show Answer
Expectation maximization algorithm is a special case of K-Means
-------------------------------------------
14. Which of the following is/are not true about DBSCAN clustering algorithm:
1. For data points to be in a cluster, they must be in a distance threshold to a
core point
2. It has strong assumptions for the distribution of data points in dataspace
3. It has substantially high time complexity of order O(n^3)
4. It does not require prior knowledge of the no. of desired clusters
5. both 2 and 3
Show Answer
both 2 and 3
-------------------------------------------
15. Which of the following are the high and low bounds for the existence of F-Score?
1. [0,1]
2. (0,1)
3. [-1,1]
4. None of the above
Show Answer
[0,1]
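A quick way to see why the bounds are closed at both ends: the F-score is a weighted harmonic mean of precision and recall, each of which lies in [0, 1]. The sketch below is illustrative; the function name f_beta and the convention of returning 0 when both inputs are 0 are assumptions rather than part of the question.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; defined here as 0 when precision and recall are both 0."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(1.0, 1.0))  # upper bound: 1.0
print(f_beta(0.0, 0.5))  # lower bound: 0.0
```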
-------------------------------------------
16. All of the following increase the width of a confidence interval except:
1. Increased confidence level
2. Increased variability
3. Increased sample size
4. Decreased sample size
Show Answer
Increased sample size
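The effect of each factor can be checked with the usual width formula 2 * z * sigma / sqrt(n). This is a sketch under the assumption of a z-interval for a mean with known standard deviation; the function name ci_width and the sample numbers are illustrative.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence=0.95):
    """Full width of a two-sided z-interval for a mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_width(sigma=10, n=100)
print(ci_width(10, 100, confidence=0.99) > base)  # higher confidence level: wider
print(ci_width(20, 100) > base)                   # more variability: wider
print(ci_width(10, 400) < base)                   # larger sample: narrower
```

All three comparisons print True, which matches the answer: only a larger sample size shrinks the interval.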
-------------------------------------------
17. The p-value in hypothesis testing represents which of the following: Please
select the best answer of those provided below
1. The probability of failing to reject the null hypothesis, given the observed
results
2. The probability that the null hypothesis is true, given the observed results
3. The probability that the observed results are statistically significant, given
that the null hypothesis is true
4. The probability of observing results as extreme or more extreme than
currently observed, given that the null hypothesis is true
Show Answer
The probability of observing results as extreme or more extreme than currently observed,
given that the null hypothesis is true
-------------------------------------------
18. Assume that the difference between the observed, paired sample values is
defined in the same manner and that the specified significance level is the same for
both hypothesis tests. Using the same data, the statement that “a
paired/dependent two sample t-test is equivalent to a one sample t-test on the
paired differences, resulting in the same test statistic, same p-value, and same
conclusion” is: Please select the best answer of those provided below.
1. Always True
2. Never True
3. Sometimes True
4. Not Enough Information
Show Answer
Always True
-------------------------------------------
19. Green sea turtles have normally distributed weights, measured in kilograms,
with a mean of 134.5 and a variance of 49.0. A particular green sea turtle’s weight
has a z-score of -2.4. What is the weight of this green sea turtle? Round to the
nearest whole number.
1. 17 kg
2. 151 kg
3. 118 kg
4. 252 kg
Show Answer
118 kg
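The arithmetic behind this answer is x = mean + z * standard deviation, where the standard deviation is the square root of the variance. A minimal check, assuming the variance in the question is 49.0:

```python
import math

mean = 134.5
variance = 49.0
z = -2.4

std_dev = math.sqrt(variance)      # 7.0
weight = mean + z * std_dev        # x = mu + z * sigma
print(round(weight))               # 118
```

Using the variance (49.0) directly instead of the standard deviation (7.0) is the usual slip here, and it produces about 17 kg, which is the first distractor.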
-------------------------------------------
20. What percentage of measurements in a dataset fall above the median?
1. 49%
2. 50%
3. 51%
4. Cannot Be Determined
Show Answer
Cannot Be Determined
-------------------------------------------
21. The proportion of variation in 5k race times that can be explained by the
variation in the age of competitive male runners was approximately 0.663. What is
the value of the sample linear correlation coefficient? Round to 3 decimal places.
1. 0.663
2. 0.814
3. -0.814
4. 0.440
Show Answer
-0.814
-------------------------------------------
22. Using all of the results provided, is it reasonable to predict the 5k race time
(minutes) of a competitive male runner 73 years of age?
1. Yes; linear correlation between age and 5k race times is statistically
significant
2. Yes; both the sample linear regression equation and an age in years is
provided
3. No; linear correlation between age and 5k race times is not statistically
significant
4. No; the age provided is beyond the scope of our available sample data
Show Answer
No; the age provided is beyond the scope of our available sample data
-------------------------------------------
23. If an itemset is considered frequent, then any subset of the frequent itemset
must also be frequent.
1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2
Show Answer
Both 1 and 2
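The property can be checked by brute force on a toy dataset. The sketch below is illustrative only: the helper names support and subsets_are_frequent, the basket data, and the 0.5 threshold are all assumptions.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def subsets_are_frequent(transactions, itemset, min_support):
    """True if every non-empty proper subset of itemset meets min_support."""
    items = sorted(itemset)
    return all(
        support(transactions, subset) >= min_support
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )

baskets = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"C"}]
print(support(baskets, {"A", "B", "C"}))                    # 0.5
print(subsets_are_frequent(baskets, {"A", "B", "C"}, 0.5))  # True
```

The check always succeeds for a frequent itemset because any transaction containing the full itemset also contains each of its subsets, so a subset's support can never be lower.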
-------------------------------------------
24. Algorithm is
1. It uses machine-learning techniques. Here programs can learn from past
experience and adapt themselves to new situations
2. Computational procedure that takes some value as input and produces
some value as output
3. Science of making machines performs tasks that would require intelligence
when performed by humans
4. None of these
Show Answer
Computational procedure that takes some value as input and produces some value as
output
-------------------------------------------
25. Bias is
1. A class of learning algorithm that tries to find an optimum classification of
a set of examples using the probabilistic theory
2. Any mechanism employed by a learning system to constrain the search
space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the
fact that when people encounter new situations, they often explain them
by reference to familiar experiences, adapting the explanations to fit the
new situation.
4. None of these
Show Answer
Any mechanism employed by a learning system to constrain the search space of a
hypothesis
-------------------------------------------
26. Classification is
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given
by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Show Answer
A subdivision of a set of examples into a number of classes
-------------------------------------------
27. Binary attributes are
1. Attributes that take only two values. In general, these values will be 0 and 1, and
they can be coded as one bit
2. The natural environment of a certain species
3. Systems that can be used without knowledge of internal operations
4. None of these
Show Answer
Attributes that take only two values. In general, these values will be 0 and 1, and they can be
coded as one bit
-------------------------------------------
28. Cluster is
1. Group of similar objects that differ significantly from other objects
2. Operations on a database to transform or simplify data in order to prepare
it for a machine-learning algorithm
3. Symbolic representation of facts or ideas from which information can
potentially be extracted
4. None of these
Show Answer
Group of similar objects that differ significantly from other objects
-------------------------------------------
29. A definition of a concept is ______ if it recognizes all the instances of that
concept
1. Complete
2. Consistent
3. Constant
4. None of these
Show Answer
Complete
-------------------------------------------
30. A definition of a concept is _______ if it classifies any examples as coming within
the concept
1. Complete
2. Consistent
3. Constant
4. None of these
Show Answer
Consistent
-------------------------------------------
31. Data selection is
1. The actual discovery phase of a knowledge discovery process
2. The stage of selecting the right data for a KDD process
3. A subject-oriented integrated time variant non-volatile collection of data
in support of management
4. None of these
Show Answer
The stage of selecting the right data for a KDD process
-------------------------------------------
32. Classification task refers to
1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy, of the classification of a concept that is given
by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these
Show Answer
The task of assigning a classification to a set of examples
-------------------------------------------
 Data Analytics UNIT 3
-------------------------------------------
1. This clustering algorithm terminates when mean values computed for the
current iteration of the algorithm are identical to the computed mean values for
the previous iteration.
1. K-Means clustering
2. conceptual clustering
3. expectation maximization
4. agglomerative clustering
Show Answer
K-Means clustering
-------------------------------------------
2. The correlation coefficient for two real-valued attributes is –0.85. What does this
value tell you?
1. The attributes are not linearly related.
2. As the value of one attribute decreases the value of the second attribute
increases.
3. As the value of one attribute increases the value of the second attribute
also increases.
4. The attributes show a linear relationship
Show Answer
As the value of one attribute decreases the value of the second attribute increases.
-------------------------------------------
3. Given a rule of the form IF X THEN Y, rule confidence is defined as the
conditional probability that
1. Y is false when X is known to be false.
2. Y is true when X is known to be true.
3. X is true when Y is known to be true
4. X is false when Y is known to be false.
Show Answer
Y is true when X is known to be true.
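In practice this conditional probability is estimated from counts: the number of records where both X and Y hold, divided by the number where X holds. A minimal sketch, with the function name confidence and the basket data invented for illustration:

```python
def confidence(transactions, x, y):
    """Estimate P(Y | X) as count(X and Y) / count(X) over a list of item sets."""
    x, y = set(x), set(y)
    covers_x = [t for t in transactions if x <= t]
    if not covers_x:
        raise ValueError("antecedent never occurs")
    return sum(1 for t in covers_x if y <= t) / len(covers_x)

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets contain milk
```

Records where X is false (the lone milk basket) never enter the calculation, which is why the options conditioned on X being false do not describe confidence.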
-------------------------------------------
4. Chameleon is
1. Density based clustering algorithm
2. Partitioning based algorithm
3. Model based algorithm
4. Hierarchical clustering algorithm
Show Answer
Hierarchical clustering algorithm
-------------------------------------------
5. Find odd man out
1. DBSCAN
2. K-Means
3. PAM
4. None of above
Show Answer
DBSCAN
-------------------------------------------
6. The number of iterations in apriori _
1. increases with the size of the data
2. decreases with the increase in size of the data
3. increases with the size of the maximum frequent set
4. decreases with increase in size of the maximum frequent set
Show Answer
increases with the size of the maximum frequent set
-------------------------------------------
7. Which of the following are interestingness measures for association rules?
1. Recall
2. Lift
3. Accuracy
4. All of Above
Show Answer
Lift
-------------------------------------------
8. Given a frequent itemset L, If |L| = k, then there are
1. 2^k - 1 candidate association rules
2. 2^k candidate association rules
3. 2^k - 2 candidate association rules
4. 2k - 2 candidate association rules
Show Answer
2^k - 2 candidate association rules
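The count of 2 to the power k, minus 2, can be confirmed by enumeration. The sketch below assumes the usual counting convention: each rule splits the itemset L into a non-empty antecedent A and the non-empty remainder L minus A. The function name candidate_rules is illustrative.

```python
from itertools import combinations

def candidate_rules(itemset):
    """All rules A -> B where A and B are non-empty, disjoint, and together form the itemset."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):          # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules = candidate_rules({"A", "B", "C", "D"})
print(len(rules))  # 2**4 - 2 = 14
```

The two subtracted cases are the empty antecedent and the empty consequent, neither of which forms a rule.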
-------------------------------------------
9. _______ is an example for case based-learning
1. Decision trees
2. Neural networks
3. Genetic algorithm
4. K-nearest neighbor
Show Answer
K-nearest neighbor
-------------------------------------------
10. The average positive difference between computed and desired outcome
values.
1. mean positive error
2. mean squared error
3. mean absolute error
4. root mean squared error
Show Answer
mean absolute error
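The three standard error measures named in the options differ only in how each difference is treated before averaging. A short sketch, with the function name error_metrics and the sample values chosen for illustration:

```python
import math

def error_metrics(computed, desired):
    """Mean absolute, mean squared, and root mean squared error for paired values."""
    diffs = [c - d for c, d in zip(computed, desired)]
    mae = sum(abs(e) for e in diffs) / len(diffs)   # average positive difference
    mse = sum(e * e for e in diffs) / len(diffs)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}

print(error_metrics([2.0, 4.0, 7.0], [1.0, 5.0, 4.0]))
```

Taking the absolute value is what makes every difference positive, which is the wording the question uses.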
-------------------------------------------
11. Frequent item sets is
1. Superset of only closed frequent item sets
2. Superset of only maximal frequent item sets
3. Subset of maximal frequent item sets
4. Superset of both closed frequent item sets and maximal frequent item sets
Show Answer
Superset of both closed frequent item sets and maximal frequent item sets
-------------------------------------------
12. Assume that we have a dataset containing information about 200
individuals. A supervised data mining session has discovered the following rule:
IF age < 30 & credit card insurance = yes THEN life insurance = yes Rule
Accuracy: 70% and Rule Coverage: 63% How many individuals in the class life
insurance= no have credit card insurance and are less than 30 years old?
1. 63
2. 38
3. 40
4. 89
Show Answer
38
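One way to arrive at 38 is sketched below. It assumes rule coverage means the fraction of all 200 individuals who satisfy the IF part, and rule accuracy means the fraction of those for whom the THEN part holds; the question does not define either term, so treat these definitions as assumptions.

```python
total = 200
coverage = 0.63   # fraction of all individuals matching the IF part (assumed definition)
accuracy = 0.70   # fraction of those for which the THEN part holds (assumed definition)

matching = coverage * total                  # about 126 individuals satisfy the antecedent
misclassified = (1 - accuracy) * matching    # matched the IF part but life insurance = no
print(round(misclassified))                  # 38
```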
-------------------------------------------
13. Which of the following is cluster analysis?
1. Simple segmentation
2. Grouping similar objects
3. Labeled classification
4. Query results grouping
Show Answer
Grouping similar objects
-------------------------------------------
14. A good clustering method will produce high quality clusters with
1. high inter class similarity
2. high intra class similarity
3. low intra class similarity
4. None of above
Show Answer
high intra class similarity
-------------------------------------------
15. Which two parameters are needed for DBSCAN
1. Min threshold
2. Min points and eps
3. Min sup and min confidence
4. Number of centroids
Show Answer
Min points and eps
-------------------------------------------
16. Which statement is true about neural network and linear regression models?
1. Both techniques build models whose output is determined by a linear
sum of weighted input attribute values.
2. The output of both models is a categorical attribute value.
3. Both models require numeric attributes to range between 0 and 1.
4. Both models require input attributes to be numeric.
Show Answer
Both models require input attributes to be numeric.
-------------------------------------------
17. In Apriori algorithm, if 1 item-sets are 100, then the number of candidate 2
item-sets are
1. 100
2. 200
3. 4950
4. 5000
Show Answer
4950
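The figure is the number of unordered pairs that can be formed from 100 items, n * (n - 1) / 2. A one-line check, assuming every pair of frequent 1-itemsets becomes a candidate:

```python
import math

frequent_1_itemsets = 100
candidate_2_itemsets = math.comb(frequent_1_itemsets, 2)  # n * (n - 1) / 2
print(candidate_2_itemsets)  # 4950
```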
-------------------------------------------
18. Significant Bottleneck in the Apriori algorithm is
1. Finding frequent itemsets
2. Pruning
3. Candidate generation
4. Number of iterations
Show Answer
Candidate generation
-------------------------------------------
19. Machine learning techniques differ from statistical techniques in that machine
learning methods
1. are better able to deal with missing and noisy data
2. typically assume an underlying distribution for the data
3. have trouble with large-sized datasets
4. are not able to explain their behavior.
Show Answer
are better able to deal with missing and noisy data
-------------------------------------------
20. The probability of a hypothesis before the presentation of evidence.
1. a priori
2. posterior
3. conditional
4. subjective
Show Answer
a priori
-------------------------------------------
21. KDD represents extraction of
1. data
2. knowledge
3. rules
4. model
Show Answer
knowledge
-------------------------------------------
22. Which statement about outliers is true?
1. Outliers should be part of the training dataset but should not be present
in the test data.
2. Outliers should be identified and removed from a dataset.
3. The nature of the problem determines how outliers are used
4. Outliers should be part of the test dataset but should not be present in the
training data.
Show Answer
The nature of the problem determines how outliers are used
-------------------------------------------
23. The most general form of distance is
1. Manhattan
2. Euclidean
3. Mean
4. Minkowski
Show Answer
Minkowski
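Minkowski distance is the most general of the listed options because the others fall out of it for particular values of its order p. A small sketch, with the function name minkowski chosen for illustration:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski((0, 0), (3, 4), p=1))  # Manhattan: 7.0
print(minkowski((0, 0), (3, 4), p=2))  # Euclidean: 5.0
```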
-------------------------------------------
24. Which Association Rule would you prefer
1. High support and medium confidence
2. High support and low confidence
3. Low support and high confidence
4. Low support and low confidence
Show Answer
Low support and high confidence
-------------------------------------------
25. In a rule-based classifier, if there is a rule for each combination of attribute
values, what do you call that rule set R?
1. Exhaustive
2. Inclusive
3. Comprehensive
4. Mutually exclusive
Show Answer
Exhaustive
-------------------------------------------
26. The apriori property means
1. If a set cannot pass a test, its supersets will also fail the same test
2. To decrease the efficiency, do level-wise generation of frequent item sets
3. To improve the efficiency, do level-wise generation of frequent item sets
4. If a set can pass a test, its supersets will fail the same test
Show Answer
If a set cannot pass a test, its supersets will also fail the same test
-------------------------------------------
27. If an item set ‘XYZ’ is a frequent item set, then all subsets of that frequent item
set are
1. Undefined
2. Not frequent
3. Frequent
4. Can not say
Show Answer
Frequent
-------------------------------------------
28. The probability that a person owns a sports car given that they subscribe to
automotive magazine is 40%. We also know that 3% of the adult population
subscribes to automotive magazine. The probability of a person owning a sports
car given that they don't subscribe to automotive magazine is 30%. Use this
information to compute the probability that a person subscribes to automotive
magazine given that they own a sports car
1. 0.0368
2. 0.0396
3. 0.0389
4. 0.0398
Show Answer
0.0396
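
The answer follows from Bayes' rule combined with the law of total probability. A short Python check, with variable names chosen only for this illustration:

```python
# Quantities given in the question.
p_mag = 0.03               # P(subscribes)
p_car_given_mag = 0.40     # P(sports car | subscribes)
p_car_given_no_mag = 0.30  # P(sports car | does not subscribe)

# Law of total probability: P(sports car).
p_car = p_car_given_mag * p_mag + p_car_given_no_mag * (1 - p_mag)

# Bayes' rule: P(subscribes | sports car) = 0.012 / 0.303.
p_mag_given_car = p_car_given_mag * p_mag / p_car
```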
-------------------------------------------
29. Simple regression assumes a __ relationship between the input attribute and
output attribute.
1. quadratic
2. inverse
3. linear
4. reciprocal
Show Answer
linear
-------------------------------------------
30. To determine association rules from frequent item sets
1. Only minimum confidence needed
2. Neither support nor confidence needed
3. Both minimum support and confidence are needed
4. Minimum support is needed
Show Answer
Both minimum support and confidence are needed
-------------------------------------------
31. If {A,B,C,D} is a frequent itemset, which candidate rule is not possible?
1. C –> A
2. D –>ABCD
3. A –> BC
4. B –> ADC
Show Answer
D –>ABCD
-------------------------------------------
32. Classification rules are extracted from _
1. decision tree
2. root node
3. branches
4. siblings
Show Answer
decision tree
-------------------------------------------
33. What does K refer to in the K-Means algorithm, which is a non-hierarchical
clustering approach?
1. Complexity
2. Fixed value
3. No of iterations
4. number of clusters
Show Answer
number of clusters
-------------------------------------------
34. If a linear regression model fits perfectly, i.e., train error is zero, then _________
1. Test error is also always zero
2. Test error is non zero
3. Couldn’t comment on Test error
4. Test error is equal to Train error
Show Answer
Couldn’t comment on Test error
-------------------------------------------
35. How many coefficients do you need to estimate in a simple linear regression
model (One independent variable)?
1. 1
2. 2
3. 3
4. 4
Show Answer
2
-------------------------------------------
36. In a simple linear regression model (one independent variable), if we change
the input variable by 1 unit, how much will the output variable change?
1. by 1
2. no change
3. by intercept
4. by its slope
Show Answer
by its slope
-------------------------------------------
37. In syntax of linear model lm(formula,data,..), data refers to __
1. Matrix
2. array
3. vector
4. list
Show Answer
list
-------------------------------------------
38. In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2)
refers to __
1. (X-intercept, Slope)
2. (Slope, X-Intercept)
3. (Y-Intercept, Slope)
4. (slope, Y-Intercept)
Show Answer
(Y-Intercept, Slope)
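
Questions 35, 36 and 38 all rest on the same model, Y = β1 + β2X + ϵ. The following is a minimal least-squares sketch in plain Python. The helper `fit_simple_linear` and the toy data are assumptions made for illustration. It estimates exactly two coefficients and shows that a 1-unit change in X moves the prediction by the slope.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = beta1 + beta2 * x; returns (beta1, beta2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from y = 1 + 2x (no noise), invented for this example.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
beta1, beta2 = fit_simple_linear(xs, ys)  # (Y-intercept, slope)


def predict(x):
    return beta1 + beta2 * x


# Increasing the input by one unit changes the output by the slope.
change_per_unit = predict(11) - predict(10)
```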
-------------------------------------------
 UNIT 4
-------------------------------------------
1. A ____ is a decision support tool that uses a tree-like graph or model of decisions
and their possible consequences, including chance event outcomes, resource costs,
and utility.
1. Decision tree
2. Graphs
3. Trees
4. Neural Networks
Show Answer
Decision tree
-------------------------------------------
2. What is Decision Tree?
1. Flow-Chart
2. Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class
label
3. Flow-Chart & Structure in which internal node represents test on an
attribute, each branch represents outcome of test and each leaf node
represents class label
4. None of Above
Show Answer
Flow-Chart & Structure in which internal node represents test on an attribute, each
branch represents outcome of test and each leaf node represents class label
-------------------------------------------
3. Decision Trees can be used for Classification Tasks.
1. TRUE
2. FALSE
Show Answer
TRUE
-------------------------------------------
4. Choose from the following that are Decision Tree nodes?
1. Decision Nodes
2. End Nodes
3. Chance Nodes
4. All of Above
Show Answer
All of Above
-------------------------------------------
5. Decision Nodes are represented by __
1. Disks
2. Squares
3. Circles
4. Triangles
Show Answer
Squares
-------------------------------------------
6. Chance Nodes are represented by __
1. Disks
2. Squares
3. Circles
4. Triangles
Show Answer
Circles
-------------------------------------------
7. End Nodes are represented by __
1. Disks
2. Squares
3. Circles
4. Triangles
Show Answer
Triangles
-------------------------------------------
8. Which of the following are the advantage/s of Decision Trees?
1. Possible Scenarios can be added
2. Use a white box model, If given result is provided by a model
3. Worst, best and expected values can be determined for different scenarios
4. All of Above
Show Answer
All of Above
-------------------------------------------
9. Which of the following statements about Naive Bayes is incorrect?
1. Attributes are equally important.
2. Attributes are statistically dependent of one another given the class value.
3. Attributes are statistically independent of one another given the class
value.
4. Attributes can be nominal or numeric
Show Answer
Attributes are statistically dependent of one another given the class value.
-------------------------------------------
10. Which of the following is not supervised learning?
1. Clustering
2. Decision Tree
3. Linear Regression
4. Naive Bayesian
Show Answer
Clustering
-------------------------------------------
11. How many terms are required for building a bayes model?
1. 1
2. 2
3. 3
4. 4
Show Answer
3
-------------------------------------------
12. Where can the Bayes rule be used?
1. Solving queries
2. Increasing complexity
3. Decreasing complexity
4. Answering probabilistic query
Show Answer
Answering probabilistic query
-------------------------------------------
13. How can a Bayesian network be used to answer any query?
1. Full distribution
2. Joint distribution
3. Partial distribution
4. All of Above
Show Answer
Joint distribution
-------------------------------------------
14. What is the relationship between a node and its predecessors while creating
a Bayesian network?
1. Functionally dependent
2. Dependent
3. Conditionally independent
4. Both Conditionally dependent & Dependent
Show Answer
Conditionally independent
-------------------------------------------
15. A Bayesian classifier is
1. A class of learning algorithm that tries to find an optimum classification of
a set of examples using the probabilistic theory.
2. Any mechanism employed by a learning system to constrain the search
space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the
fact that when people encounter new situations, they often explain them
by reference to familiar experiences, adapting the explanations to fit the
new situation.
4. None of these
Show Answer
A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory.
-------------------------------------------
16. Bias is
1. A class of learning algorithm that tries to find an optimum classification of
a set of examples using the probabilistic theory
2. Any mechanism employed by a learning system to constrain the search
space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the
fact that when people encounter new situations, they often explain them
by reference to familiar experiences, adapting the explanations to fit the
new situation.
4. None of these
Show Answer
Any mechanism employed by a learning system to constrain the search space of a
hypothesis
-------------------------------------------
17. Background knowledge refers to
1. Additional acquaintance used by a learning algorithm to facilitate the
learning process
2. A neural network that makes use of a hidden layer
3. It is a form of automatic learning.
4. None of these
Show Answer
Additional acquaintance used by a learning algorithm to facilitate the learning process
-------------------------------------------
18. Discriminating between spam and ham e-mails is a classification task
1. TRUE
2. FALSE
Show Answer
TRUE
-------------------------------------------
19. Which of the following is not involved in data mining?
1. Knowledge extraction
2. Data archaeology
3. Data exploration
4. Data transformation
Show Answer
Data transformation
-------------------------------------------
20. Naive prediction is
1. A class of learning algorithms that try to derive a Prolog program from
examples
2. A table with n independent attributes can be seen as an n- dimensional
space.
3. A prediction made using an extremely simple method, such as always
predicting the same output.
4. None of these
Show Answer
A prediction made using an extremely simple method, such as always predicting the
same output.
-------------------------------------------
21. Node is ____
1. A component of a network
2. In the context of KDD and data mining, this refers to random errors in a
database table.
3. One of the defining aspects of a data warehouse
4. None of these
Show Answer
A component of a network
-------------------------------------------
22. Prediction is
1. The result of the application of a theory or a rule in a specific case
2. One of several possible enters within a database table that is chosen by
the designer as the primary means of accessing the data in the table.
3. Discipline in statistics that studies ways to find the most interesting
projections of multi-dimensional spaces.
4. None of these
Show Answer
The result of the application of a theory or a rule in a specific case
-------------------------------------------
23. What is the relation between the distance between clusters and the
corresponding class discriminability?
1. proportional
2. inversely-proportional
3. no-relation
4. None of these
Show Answer
proportional
-------------------------------------------
24. The classification method in which the upper limit of one class interval is the same as
the lower limit of the next class interval is called the
1. exclusive method
2. inclusive method
3. mid point method
4. None of these
Show Answer
exclusive method
-------------------------------------------
25. If the largest value is 60, the smallest value is 40, and the number of classes is 5,
then the class interval is
1. 20
2. 25
3. 4
4. 15
Show Answer
4
-------------------------------------------
26. The summary and presentation of data in tabular form with several
non-overlapping classes is referred to as a
1. nominal distribution
2. frequency distribution
3. ordinal distribution
4. None of these
Show Answer
frequency distribution
-------------------------------------------
27. The classification method in which both the upper and lower limits of an interval are
included in the class interval itself is called the
1. exclusive method
2. inclusive method
3. mid point method
4. None of these
Show Answer
inclusive method
-------------------------------------------
28. Suppose there are 25 base classifiers, each with an error rate of e = 0.35.
If their predictions are combined by averaging (majority vote), what is the probability
that the ensemble makes a wrong prediction? Note: all classifiers are independent of each other
1. 0.05
2. 0.06
3. 0.07
4. 0.08
Show Answer
0.06
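
The 0.06 figure can be reproduced under one assumption: "averaging" here means a majority vote, so the ensemble is wrong only when at least 13 of the 25 independent classifiers are wrong. That probability is a binomial tail sum. The function name `ensemble_error` is illustrative.

```python
from math import comb


def ensemble_error(n, eps):
    """P(more than half of n independent classifiers err), each with error rate eps."""
    k_min = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(
        comb(n, k) * eps ** k * (1 - eps) ** (n - k)
        for k in range(k_min, n + 1)
    )


p_wrong = ensemble_error(25, 0.35)  # compare with the answer options above
```

The result is well below the 0.35 error rate of any single classifier, which is the point of the question. The improvement depends on the independence assumption stated in the note.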
-------------------------------------------
29. The most widely used metrics and tools to assess a classification model are:
1. Confusion matrix
2. Cost-sensitive accuracy
3. Area under the ROC curve
4. All of Above
Show Answer
All of Above
-------------------------------------------
30. When performing regression or classification, which of the following is the
correct way to preprocess the data?
1. Normalize the data → PCA → training
2. PCA → normalize PCA output → training
3. Normalize the data → PCA → normalize PCA output → training
4. None of these
Show Answer
Normalize the data → PCA → training
-------------------------------------------
31. Which of the following is true about Naive Bayes?
1. Assumes that all the features in a dataset are equally important
2. Assumes that all the features in a dataset are independent
3. Both 1 and 2
4. None of these
Show Answer
Both 1 and 2
-------------------------------------------
32. In which of the following cases will K-means clustering fail to give good
results? 1) Data points with outliers 2) Data points with different densities 3) Data
points with nonconvex shapes
1. 1 and 2
2. 2 and 3
3. 1, 2, and 3
4. 1 and 3
Show Answer
1, 2, and 3
-------------------------------------------
 UNIT 5
-------------------------------------------
1. Data visualization is related to
1. Pictorial representations
2. numerical representation
3. numerical calculations
4. None of these
Show Answer
Pictorial representations
-------------------------------------------
2. Which of the following are uses of data visualization?
1. See context of data
2. Clear data understanding
3. finding pattern in data
4. all of above
Show Answer
all of above
-------------------------------------------
3. Which of the following statements are true about using visualizations to display
a dataset?
I. Visualizations are visually appealing, but don’t help the viewer understand relationships
that exist in the data
II. Visualizations like graphs, charts, or visualizations with pictures are useful for
conveying information, while tables just filled with text are not useful.
III. Patterns that exist in the data can be found more easily by using a visualization
1. I AND II
2. II AND III
3. I AND III
4. ONLY III
Show Answer
ONLY III
-------------------------------------------
4. The plot method on Series and DataFrame is just a simple wrapper around __
1. gplt.plot()
2. plt.plot()
3. plt.plotgraph()
4. none of the mentioned
Show Answer
plt.plot()
-------------------------------------------
5. Point out the correct combination with regards to kind keyword for graph
plotting.
1. ‘hist’ for histogram
2. ‘box’ for boxplot
3. ‘area’ for area plots
4. all of the mentioned
Show Answer
all of the mentioned
-------------------------------------------
6. Which of the following values of the kind keyword produces a bar plot?
1. bar
2. kde
3. hexbin
4. none of the mentioned
Show Answer
bar
-------------------------------------------
7. You can create a scatter plot matrix using the __ method in
pandas.plotting (pandas.tools.plotting in older pandas versions).
1. sca_matrix
2. scatter_matrix
3. DataFrame.plot
4. all of the mentioned
Show Answer
scatter_matrix
-------------------------------------------
8. Plots may also be adorned with error bars or tables.
1. True
2. FALSE
Show Answer
True
-------------------------------------------
9. Which of the following plots are often used for checking randomness in time
series?
1. Autocausation
2. Autorank
3. Autocorrelation
4. none of the mentioned
Show Answer
Autocorrelation
-------------------------------------------
10. __ plots are used to visually assess the uncertainty of a statistic
1. Lag
2. RadViz
3. Bootstrap
4. All Above
Show Answer
Bootstrap
-------------------------------------------
11. Which of the following is not a challenge in Big Data Visualization?
1. Velocity
2. Volume
3. Version
4. Variety
Show Answer
Version
-------------------------------------------
12. Which of the following is not a problem in Big Data Visualization
1. Visual Noise
2. Scaled Data
3. Large image perception
4. Information Loss
Show Answer
Scaled Data
-------------------------------------------
13. Which of the following is a problem in Big Data Visualization
1. Structured Data
2. Scaled Data
3. Visual Noise
4. Multiple valued Data
Show Answer
Visual Noise
-------------------------------------------
14. Which of the following candidates is suitable for interactive visualization?
1. Type of Visual
2. Cardinality
3. Size of data
4. all of above
Show Answer
all of above
-------------------------------------------
15. Which of the following follows interactive visualization approach?
1. Zoom+Pan
2. Focus+Context
3. Overview+Details
4. all of above
Show Answer
all of above
-------------------------------------------
16. Visual Mapping is important for_______
1. Remapping
2. Overview+Details
3. Focus
4. Context
Show Answer
Remapping
-------------------------------------------
17. Data visualization techniques are:
1. Scatter Plot
2. Line Chart
3. Pie Chart
4. all of above
Show Answer
all of above
-------------------------------------------
18. Information visualization techniques are
1. Flow Chart
2. Time Line
3. DFD
4. All of above
Show Answer
All of above
-------------------------------------------
19. Which of the following is a term related to correlation?
1. Exponential
2. U-Shape
3. Null
4. All of above
Show Answer
All of above
-------------------------------------------
20. Column graph is another name for _
1. Bar Chart
2. Scatterplot
3. Histogram
4. Area Chart
Show Answer
Bar Chart
-------------------------------------------
21. Which of the following is a category of timeline?
1. Linear Timeline
2. Modular Timeline
3. Variant Timeline
4. ER Timeline
Show Answer
Linear Timeline
-------------------------------------------
22. Which of the following specifies relationship amongst variables?
1. Scatter Plot
2. Line Chart
3. Area Chart
4. All of above
Show Answer
All of above
-------------------------------------------
23. Which of the following specifies category proportions?
1. Pie Chart
2. Histogram
3. Bar chart
4. All of above
Show Answer
All of above
-------------------------------------------
24. Which of the following is a category of timeline?
1. Variant Timeline
2. ER Timeline
3. Comparative Timeline
4. Modular Timeline
Show Answer
Comparative Timeline
-------------------------------------------
25. Information visualization techniques are
1. Semantic Network
2. Histogram
3. Area Chart
4. None of these
Show Answer
Semantic Network
-------------------------------------------
26. Information visualization techniques are
1. Scatter Plot
2. Time Line
3. Bubble Chart
4. None of these
Show Answer
Time Line
-------------------------------------------
27. Information visualization techniques are
1. Flow Chart
2. Line Chart
3. Pie Chart
4. None of these
Show Answer
Flow Chart
-------------------------------------------
28. Which of the following are uses of data visualization?
1. See context of data
2. Clear data understanding
3. finding pattern in data
4. all of above
Show Answer
all of above
-------------------------------------------
29. Which of the following specifies relationship amongst variables?
1. Pie Chart
2. Histogram
3. Area Chart
4. None of these
Show Answer
Area Chart
-------------------------------------------
30. Which of the following specifies category Proportions?
1. Pie Chart
2. Scatter Plot
3. Line Chart
4. None of these
Show Answer
Pie Chart
-------------------------------------------
1. Precise and steady format data is____
1. Structured Data
2. UnStructured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
Structured Data
-------------------------------------------
2. Inconsistent data is______
1. Structured Data
2. Un Structured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
Un Structured Data
-------------------------------------------
3. A self-describing format is________
1. Structured Data
2. Un Structured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
semi Structured Data
-------------------------------------------
4. Slightly inconsistent data is_______
1. Structured Data
2. Un Structured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
Quasi Structured Data
-------------------------------------------
5. XML is an example of_______
1. Structured Data
2. UnStructured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
semi Structured Data
-------------------------------------------
6. RDBMS follows__________
1. Structured Data
2. Un Structured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
Structured Data
-------------------------------------------
7. Watson is developed by____
1. IBM
2. Microsoft
3. AT&T
4. Google
Show Answer
IBM
-------------------------------------------
8. Hadoop is _ based Framework.
1. C++
2. Python
3. JAVA
4. C#
Show Answer
JAVA
-------------------------------------------
9. Which of the following are components of Hadoop?
1. MAPREDUCE
2. YARN
3. HDFS
4. All of Above
Show Answer
All of Above
-------------------------------------------
10. Which of the following are components of HIVE?
1. JDBC
2. Thrift Server
3. CLI
4. All of Above
Show Answer
All of Above
-------------------------------------------
11. Mahout provides__________
1. JAVA Executable Libraries
2. C# Executables
3. Mountable Image Format
4. All of Above
Show Answer
JAVA Executable Libraries
-------------------------------------------
12. Which of the following are components of HIVE?
1. FLATTEN
2. Thrift Server
3. Muster
4. None of these
Show Answer
Thrift Server
-------------------------------------------
13. Which of the following is a component of Hadoop?
1. Fork
2. YARN
3. CLI
4. Metadata
Show Answer
YARN
-------------------------------------------
14. Which of the following is a clustering technique?
1. Fuzzy K means
2. Canopy
3. K-Means
4. All of above
Show Answer
all of above
-------------------------------------------
15. Which of the following is HBASE Data Model Terminology?
1. Row
2. Table
3. Column
4. All of Above
Show Answer
all of above
-------------------------------------------
16. Which of the following is not a classification technique?
1. Logistic Regression
2. Random Forest
3. Recommender Algo
4. Naïve Bayes
Show Answer
Recommender Algo
-------------------------------------------
17. Which of the following is a classification technique?
1. Logistic Regression
2. Random Forest
3. Naïve Bayes
4. All of Above
Show Answer
all of above
-------------------------------------------
18. Which of the following is HBASE Data Model Terminology?
1. Column Family
2. Cell
3. Timestamp
4. All of Above
Show Answer
All of above
-------------------------------------------
19. Which of the following is a clustering technique?
1. Logistic Regression
2. Random Forest
3. K-Means
4. Naïve Bayes
Show Answer
K-Means
-------------------------------------------
20. Which of the following is HBASE Data Model Terminology?
1. Identifier
2. Variant
3. Timestamp
4. None of the above
Show Answer
Timestamp
-------------------------------------------
21. Which of the following is not a classification technique?
1. Logistic Regression
2. Random Forest
3. K-Means
4. Naïve Bayes
Show Answer
K-Means
-------------------------------------------
22. Which of the following is HBASE Data Model Terminology?
1. Identifier
2. Variant
3. Column Qualifier
4. None of the above
Show Answer
Column Qualifier
-------------------------------------------
23. Which of the following is not a clustering technique?
1. Logistic Regression
2. Canopy
3. K-Means
4. Fuzzy K means
Show Answer
Logistic Regression
-------------------------------------------
24. Point out the correct statement.
1. Hadoop does need specialized hardware to process the data
2. Hadoop 2.0 allows live stream processing of real-time data
3. In Hadoop programming framework output files are divided into lines or
records
4. None of the above
Show Answer
Hadoop 2.0 allows live stream processing of real-time data
-------------------------------------------
25. What was Hadoop named after?
1. Creator Doug Cutting’s favorite circus act
2. Cutting’s high school rock band
3. The toy elephant of Cutting’s son
4. A sound Cutting’s laptop made during Hadoop development
Show Answer
The toy elephant of Cutting’s son
-------------------------------------------
26. ___________ is a programming model used to develop Hadoop-based applications that
can process massive amounts of data.
1. MapReduce
2. Mahout
3. Oozie
4. None of the above
Show Answer
MapReduce
-------------------------------------------
27. Hadoop is a framework that works with a variety of related tools. Common
cohorts include __
1. MapReduce, Hive and HBase
2. MapReduce, MySQL and Google Apps
3. MapReduce, Hummer and Iguana
4. All of above
Show Answer
MapReduce, Hive and HBase
-------------------------------------------
28. NoSQL databases are used mainly for handling large volumes of __ data.
1. Structured Data
2. Un Structured Data
3. semi Structured Data
4. Quasi Structured Data
Show Answer
Un Structured Data
-------------------------------------------
29. Which of the following is not a phase of Data Analytics Life Cycle?
1. Communication
2. Recall
3. Data Preparation
4. Model Planning
Show Answer
Recall
-------------------------------------------
30. Which of the following is a NoSQL Database Type?
1. SQL
2. Document databases
3. JSON
4. All of above
Show Answer
Document databases
-------------------------------------------
31. Which of the following is not a NoSQL database?
1. SQL Server
2. MongoDB
3. Cassandra
4. None of the above
Show Answer
SQL Server
-------------------------------------------

1.  Which of the following analysis are incredibly hard to infer?
Post-Process
Pre-Process
Process
All of the Mentioned Above
---------------------------------------------------------- 
2. Which of the following characteristics of big data is relatively more concerned to data science? 
Raw data is original source of data
Pre-processed data is original source of data
Raw data is the data obtained after pre-processing step
None of the Mentioned
----------------------------------------------------------
3.  Which of the following step is performed by data scientist after acquiring the data?
RMSE
RSquared
Accuracy
All of the Mentioned Above
----------------------------------------------------------
4.Where can the Bayes rule be used?  (CO2, K3)
Answering probabilistic query
Solving queries
Decreasing complexity
Increasing complexity
----------------------------------------------------------
5.Which attribute is _not_ indicative for data streaming?
Limited amount of processing time
Limited amount of processing power
Limited amount of memory
Limited amount of input data
----------------------------------------------------------
6.The namenode knows that the datanode is active using a mechanism known as
Active Pulse
Data Pulse
h-signal
HeartBeats
----------------------------------------------------------
7.As part of the HDFS high availability a pair of primary namenodes are configured. What is true for them?
As part of the HDFS high availability a pair of primary namenodes are configured. What is true for them?
One of them is active while the other one remains powered off.
The standby node takes periodic checkpoints of active namenode’s namespace.
None
----------------------------------------------------------
8. Bias is
A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
None of these
Any mechanism employed by a learning system to constrain the search space of a hypothesis
----------------------------------------------------------
9.What do you mean by support(A)?
Number of transactions not containing A / Total number of transactions
Total number of transactions containing A
Number of transactions containing A / Total number of transactions
Total Number of transactions not containing A
----------------------------------------------------------
10.Suppose there are 25 base classifiers, each with an error rate of e = 0.35. If their predictions are combined by averaging (majority vote), what is the probability that the ensemble makes a wrong prediction? Note: all classifiers are independent of each other
0.05
0.07
0.08
0.06
----------------------------------------------------------
11.A Bloom filter guarantees no
false positives or false negatives, depending on the Bloom filter type
false positives and false negatives
false negatives
false positives
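
To make the guarantee concrete, here is a minimal Bloom filter sketch in Python. The class, its parameters (`size`, `num_hashes`) and the use of SHA-256 to derive bit positions are assumptions made for this illustration, not a reference implementation. Adding an item sets all of its bits, and bits are never cleared, so a lookup for an added item cannot fail. Bits shared with other items can still make an absent item look present.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array probed at num_hashes positions per item."""

    def __init__(self, size=64, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
added = ["hadoop", "hive", "pig", "hbase"]
for word in added:
    bf.add(word)
```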
----------------------------------------------------------
12.Back propagation is a learning technique that adjusts weights in the neural network by propagating weight changes.
Backward from sink to source
Forward from source to hidden nodes
Forward from source to sink
Backward from sink to hidden node
----------------------------------------------------------
13.What does FP growth algorithm do?
It mines all frequent patterns by constructing itemsets
It mines all frequent patterns through pruning rules with lesser support
It mines all frequent patterns through pruning rules with higher support
It mines all frequent patterns by constructing a FP tree
----------------------------------------------------------
14.The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities
False
True
----------------------------------------------------------
15.Data Analysis is a process of
inspecting data
cleaning data
transforming data
All of Above
----------------------------------------------------------
16.Clustering methods can be classified into
Partitioning Methods
Hierarchical methods
Density-based methods
All of these
----------------------------------------------------------
17.Pig Latin statements are generally organized in one of the following ways?
A LOAD statement to read data from the file system
A series of “transformation” statements to process the data
A DUMP statement to view results or a STORE statement to save the results
All of the Mentioned
----------------------------------------------------------
18. Which of the following is not supervised learning?
Linear Regression
Decision Tree
Naïve Bayesian
Clustering
----------------------------------------------------------
19.What techniques can be used to improve the efficiency of apriori algorithm?
Hash Based Techniques
Transaction Increases
Cleaning
Sampling
----------------------------------------------------------
20.Which is the correct statement.
MapReduce tries to place the data and the compute as close as possible
Map Task in MapReduce is performed using the Mapper() function
Reduce Task in MapReduce is performed using the Map() function
None
----------------------------------------------------------
21.What action to take when IF (temperature=Warm) AND (target=Warm) THEN?
Heat
No Change
Cool
None of the Above
----------------------------------------------------------
22.What is Decision Tree?  
Flow-Chart
Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
Flow-Chart & Structure in which internal node represents test on an attribute, each branch represents outcome of test and each leaf node represents class label
None of Above
----------------------------------------------------------
23._________ is a shell utility which can be used to run Hive queries in either interactive or batch mode.
$HIVE/bin/hive
$HIVE_HOME/hive
$HIVE_HOME/bin/hive
All of the Mentioned
----------------------------------------------------------
24.What do you mean by support(A)? 
Number of transactions containing A / Total number of transactions
Number of transactions not containing A / Total number of transactions
Total number of transactions containing A
Total Number of transactions not containing A
----------------------------------------------------------
25.Which of the following is not a phase of Data Analytics Life Cycle?
Model Planning
Communication
Data Preparation
Recall
----------------------------------------------------------
26. DBSCAN stands for:
Divisive Based Clustering Method
Density-Based Clustering Method
Both a & b
None of above
----------------------------------------------------------
27. The cost parameter in the SVM means:
The number of cross-validations to be made
The kernel to be used
The tradeoff between misclassification and simplicity of the model
None of the above
----------------------------------------------------------
28. What is the main difference between standard reservoir sampling and min-wise sampling?
For larger streams, reservoir sampling creates more accurate samples than min-wise sampling.
Reservoir sampling makes use of randomly generated numbers whereas min-wise sampling does not.
Reservoir sampling requires a stream to be processed sequentially, whereas min-wise does not.
Min-wise sampling makes use of randomly generated numbers whereas reservoir sampling does not.
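
To compare the two techniques, here is a hedged Python sketch of each. It assumes the common textbook formulations: reservoir sampling as Algorithm R, and min-wise sampling as tagging every item with a random number and keeping the items with the smallest tags. The function names and the rng parameter are choices made for this example.

```python
import heapq
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: keep the first k items, then let the i-th item (0-based)
    replace a random slot with probability k / (i + 1)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def min_wise_sample(stream, k, rng):
    """Tag each item with a random number and keep the k smallest tags."""
    tagged = ((rng.random(), item) for item in stream)
    return [item for _, item in heapq.nsmallest(k, tagged)]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
reservoir = reservoir_sample(range(100), 5, rng)
min_wise = min_wise_sample(range(100), 5, rng)
```

Both functions draw random numbers. In the reservoir version the replacement probability depends on the item's position in the stream, whereas min-wise tags do not depend on position, so samples taken from separate parts of a stream can be merged by keeping the smallest tags overall.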
----------------------------------------------------------
29. You can run Pig in batch mode using __________
Pig Shell Command
Pig Scripts
Pig Options
All of the Mentioned
----------------------------------------------------------
30. Two approaches to improving the quality of hierarchical clustering:
Perform careful analysis of object “linkages” at each hierarchical partitioning, such as in CURE and Chameleon
Integrate Hierarchical agglomeration and iterative relocation by first using a hierarchical agglomerative algorithm and refining the result using an iterative relocation
Both a & b
None of these
----------------------------------------------------------
31. ___ tables help and enable the end-users of the data mart to relate the data to its expanded version.
Data
Reference
Both a and b
None of the above
----------------------------------------------------------
32. XML is an example of ________
Structured Data
Semi-Structured
Unstructured Data
Quasi-Structured
----------------------------------------------------------
33. Data with a precise and consistent format is ____
Semi Structured Data
Unstructured Data
Structured Data
Quasi Structured Data
----------------------------------------------------------
34. What is Neuro software?
It is a powerful and easy neural network
A software used to analyze neurons
It is software used by Neurosurgeon
Designed to aid experts in real world
----------------------------------------------------------
35. Naive prediction is
A class of learning algorithms that try to derive a Prolog program from examples
A table with n independent attributes can be seen as an n- dimensional space.
A prediction made using an extremely simple method, such as always predicting the same output.
None of these
----------------------------------------------------------
36. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Suppose you want to build an SVM model with a quadratic kernel function (polynomial degree 2) that uses the slack variable C as one of its hyperparameters. Based on that, answer the following question.
What would happen when you use a very small C (C~0)?
Misclassification would happen
Data will be correctly classified
Can’t say
None of these
----------------------------------------------------------
37. Which of the following is the right approach to Data Mining?
Infrastructure, exploration, analysis, exploitation, interpretation
Infrastructure, exploration, analysis, interpretation, exploitation
Infrastructure, analysis, exploration, interpretation, exploitation
None of these
----------------------------------------------------------
38. Which of the following is the direct application of frequent itemset mining?
Social Network Analysis
Market Basket Analysis
Outlier Detection
Intrusion Detection
----------------------------------------------------------
39. Which of the following terms is used as a synonym for data mining?
knowledge discovery in databases
data warehousing
regression analysis
parallel processing in databases
----------------------------------------------------------
40. Which of the following statements about Naive Bayes is incorrect?
Attributes are statistically independent of one another given the class value.
Attributes are statistically dependent on one another given the class value.
Attributes can be nominal or numeric
Attributes are equally important.
----------------------------------------------------------
41. The number of maps is usually driven by the total size of ____________
Input
Output
Task
None
----------------------------------------------------------
42. Point out the correct statement.
Hive Commands are non-SQL statements such as setting a property or adding a resource
Set -v prints a list of configuration variables that are overridden by the user or Hive
Set sets a list of variables that are overridden by the user or Hive
None of the Mentioned
----------------------------------------------------------
43. Data generated from online transactions is one example of the volume of big data.
TRUE
FALSE
----------------------------------------------------------
44. What is a frequently occurring phenomenon when comparing simple/complex algorithms on small/big data?
On small data complex algorithms fail.
On large data complex algorithms perform much better than simple algorithms.
On large data simple algorithms work very well.
On small data simple algorithms work very well.
----------------------------------------------------------
45. What are the two types of Hierarchical Clustering?
Agglomerative Hierarchical Clustering and Density Hierarchical Clustering
Agglomerative Hierarchical Clustering and Divisive Hierarchical Clustering
Divisive Hierarchical Clustering and Density Hierarchical Clustering
None of the above
----------------------------------------------------------
46. The frequency of occurrence of an itemset is called _____
Support Count
Support
Confidence
Rules
----------------------------------------------------------
47. Which of the following is not a phase of the Data Analytics Life Cycle?
Operationalize
Model Planning
Data Preparation
Performance Metrics
----------------------------------------------------------
48. NoSQL databases are used mainly for handling large volumes of __ data.
Quasi-Structured Data
Structured Data
Semi-Structured Data
Unstructured Data
----------------------------------------------------------
49. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
Yes
No
Can’t say
None of these
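
One way to explore this question is to run a tiny K-Means by hand. The sketch below is a plain Lloyd-style loop on 1-D data, written for illustration; the function name kmeans_1d, the starting centroids and the sample points are all assumptions.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd-style K-Means on 1-D data; stops once assignments repeat."""
    centroids = list(centroids)  # work on a copy of the starting centroids
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            return assignments, centroids, iteration
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return assignments, centroids, max_iter


# Hypothetical data: two well-separated groups and a poor initial guess.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
assignments, centroids, iterations = kmeans_1d(points, [1.0, 2.0])
```

On this data the loop returns on its third pass, when the assignments computed in that pass match those of the second pass.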
----------------------------------------------------------
50. Which of the following is true of clusters and clustering?
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
A cluster of data objects can be treated collectively as one group in many applications
Cluster analysis is an important human activity.
All of the above
----------------------------------------------------------
51. Which one of the following is not true regarding Hadoop?
It is a Distributed Framework
The Main Algorithm Used is Map-Reduce
It Runs with Commodity Hardware
All of the Above
----------------------------------------------------------
52. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning)? (CO2, K3)
Creating different models for different cluster groups.
Creating an input feature for cluster ids as an ordinal variable
Creating an input feature for cluster centroids as a continuous variable.
Creating an input feature for cluster size as a continuous variable.
All of the above
----------------------------------------------------------
53. There are also other operators, more linguistic in nature, called __________ that can be applied to fuzzy set theory.
Hedges
Lingual Variable
Fuzz Variable
None of the mentioned
----------------------------------------------------------

3.  A self-describing format is ______
1.      Structured Data
2.      Unstructured Data
3.      Semi-Structured Data
4.      Quasi-Structured Data
----------------------------------------------------------
4.  Slightly inconsistent data is ______
1.      Structured Data
2.      Unstructured Data
3.      Semi-Structured Data
4.      Quasi-Structured Data
----------------------------------------------------------
6.  RDBMS follows ______
1.      Structured Data
2.      Unstructured Data
3.      Semi-Structured Data
4.      Quasi-Structured Data
----------------------------------------------------------
7.  Watson is developed by ______
1.      IBM
2.      Microsoft
3.      AT&T
4.      Google
----------------------------------------------------------
8.  Hadoop is a ___ based framework.
1.      C++
2.      Python
3.      Java
4.      C#
----------------------------------------------------------
9.     Which of the following are components of Hadoop?
1.      MAPREDUCE
2.      YARN
3.      HDFS
4.      All of Above
----------------------------------------------------------
14.       Which of the following is a clustering technique?
1.      Fuzzy K means
2.      Canopy
3.      K-Means
4.      All of above
----------------------------------------------------------
1.      Row
2.      Table
3.      Column
4.      All of Above
----------------------------------------------------------
1.      Logistic Regression
2.      Random Forest
3.      Recommender Algo
4.      Naïve Bayes
Recommender Algo
---------------------------------------------------------- 
17.       Which of the following is a classification technique?
1.      Logistic Regression
2.      Random Forest
3.      Naïve Bayes
4.      All of Above
All of Above
----------------------------------------------------------
1.      Column Family
2.      Cell
3.      Timestamp
4.      All of Above
All of above
 ----------------------------------------------------------
19.       Which of the following is a clustering technique?
1.      Logistic Regression
2.      Random Forest
3.      K-Means
4.      Naïve Bayes
----------------------------------------------------------
1.      Identifier
2.      Variant
3.      Timestamp
4.      None of the above
Timestamp
----------------------------------------------------------
21.       Which of the following is not a classification technique?
1.      Logistic Regression
2.      Random Forest
3.      K-Means
4.      Naïve Bayes
K-Means
----------------------------------------------------------
1.      Identifier
2.      Variant
3.      Column Qualifier
4.      None of the above
Column Qualifier
----------------------------------------------------------
 
23.       Which of the following is not a clustering technique?
1.      Logistic Regression
2.      Canopy
3.      K-Means
4.      Fuzzy K means
Logistic Regression
----------------------------------------------------------
1.      Hadoop does need specialized hardware to process the data
2.      Hadoop 2.0 allows live stream processing of real-time data
3.      In Hadoop programming framework output files are divided into lines or records
4.      None of the above
---------------------------------------------------------- 
25.       What was Hadoop named after?
1.      Creator Doug Cutting’s favorite circus act
2.      Cutting’s high school rock band
3.      The toy elephant of Cutting’s son
4.      A sound Cutting’s laptop made during Hadoop development
The toy elephant of Cutting’s son
---------------------------------------------------------- 
26.       ______ is a programming model used to develop Hadoop-based applications that can process massive amounts of data.
1.      MapReduce
2.      Mahout
3.      Oozie
4.      None of the above
MapReduce
 ----------------------------------------------------------
27.       Hadoop is a framework that works with a variety of related tools. Common cohorts include ______
1.      MapReduce, Hive and HBase
2.      MapReduce, MySQL and Google Apps
3.      MapReduce, Hummer and Iguana
4.      All of above
MapReduce, Hive and HBase
----------------------------------------------------------
29.       Which of the following is not a phase of Data Analytics Life Cycle?
1.      Communication
2.      Recall
3.      Data Preparation
4.      Model Planning
Recall
----------------------------------------------------------
30.  Which of the following is not a NoSQL database?
1.      SQL Server
2.      MongoDB
3.      Cassandra
4.      None of the above
SQL Server
----------------------------------------------------------
1.  Which of the following approaches should be used to ask a data analysis question?
1.      Data Integration
2.      Data Cleaning
3.      Data Replication
4.      All of the Mentioned Above
----------------------------------------------------------
2.   Which of the following models is usually the gold standard for data analysis?
1.      Summarizing
2.      Inference
3.      Sub-setting
4.      None of the Mentioned

---------------------------------------------------------- 
3.   Which of the following models includes a backwards elimination feature selection routine?
1.      Inferential
2.      Predictive
3.      Exploratory
4.      None of the Mentioned Above
----------------------------------------------------------
