Advanced Data Mining - Quiz 2
BITS WILP - M.Tech Software Systems - 2017
Question 1
Which of the following is expected to be the most compact in general?
Select one:
a. CAN-Tries
b. CATS-Tree
c. FP-Tree
d. CAN-Tree
Ans: a. CAN-Tries
Question 2
Consider a data mining operation on a database that keeps changing. Assume there are M items in the database and that, after some time, M1 items expire and M2 new items join the database. An incremental data mining algorithm would be called efficient if it takes
Select one:
a. Order of M1+M2 time to update its mining result
b. Order of M time to update its mining result
c. Order of M2 time to update its mining result
d. Order of M+M1+M2 time to update its mining result
e. Order of M1 time to update its mining result
Ans: a. Order of M1+M2 time to update its mining result
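The idea behind this answer can be illustrated with a minimal sketch. It assumes, purely for illustration, that the "mining result" is a table of per-item support counts; the function name update_counts and the sample data are hypothetical. Only the M1 expired and M2 newly arrived records are touched, never the M records already in the database.

```python
from collections import Counter

def update_counts(counts, expired, arrived):
    """Update per-item support counts in place using only the changed records."""
    for record in expired:           # the M1 records that expired
        for item in set(record):
            counts[item] -= 1
            if counts[item] == 0:
                del counts[item]     # drop items that no longer occur
    for record in arrived:           # the M2 records that joined
        for item in set(record):
            counts[item] += 1
    return counts

# Hypothetical existing result, then one expiry and one arrival.
counts = Counter({"a": 3, "b": 2, "c": 1})
update_counts(counts, expired=[{"a", "c"}], arrived=[{"b", "d"}])
print(dict(counts))
```

The work done is proportional to the number of changed records, which is what "Order of M1+M2" expresses.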
Question 3
Which of the following algorithms can produce clusters of arbitrary shape and size? (Hint: a half moon is an arbitrary shape.)
Select one or more:
a. k-Means
b. DBSCAN
c. PAM
d. Single Link
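Ans: b. DBSCAN, d. Single Link (PAM, like k-Means, assigns each point to its nearest representative and so tends to produce compact, roughly convex clusters)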
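To make the "arbitrary shape" property concrete, here is a minimal, unoptimized sketch of DBSCAN in pure Python, run on two concentric rings. The helper names (region_query, dbscan, ring) and the parameter values are assumptions chosen for this illustration; a point counts itself among its neighbors here, and neighborhood queries are brute force rather than index based.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)    # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE        # may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels

def ring(radius, n=24):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

points = ring(1.0) + ring(3.0)       # inner ring first, then outer ring
labels = dbscan(points, eps=1.0, min_pts=3)
print(set(labels[:24]), set(labels[24:]))
```

With these settings each ring forms its own cluster, because density-connected points are chained together regardless of the overall shape. A centroid-based method with k = 2 would be expected to cut across both rings instead.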
Question 4
Consider the paper entitled "Mining Frequent Patterns without Candidate Generation". This paper introduces which of the following algorithms?
Select one:
a. CATS-Tree
b. CP-Tree
c. FP-Tree
d. CAN-Tree
Ans: c. FP-Tree
Question 5
The Incremental DBSCAN algorithm does not update the clustering after a single update; instead, it waits until an adequate number of transactions (data points) arrive and depart. This step helps to minimize the sensitivity of the DBSCAN algorithm to parameters such as eps and min-pts.
Select one:
True
False
Ans: True
Question 6
Incremental DBSCAN uses a special data structure to answer neighborhood queries called
Select one:
a. Splay-Tree
b. B-Tree
c. AVL-Tree
d. None of the above
Ans: d. None of the above
Question 7
Consider association rule mining that involves the discovery of frequent itemsets based on support and confidence parameters. The negative border set can help in
Select one:
a. Reducing number of database scans
b. Reducing the number of candidate itemsets
c. Efficient computation of support of a candidate itemset
d. Speeding up the construction of (k+1)-itemsets from k-itemsets
Ans: a. Reducing number of database scans
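As a reminder of what the negative border is, the sketch below computes it under the usual definition: the itemsets that are not frequent themselves but whose immediate subsets are all frequent. The function name negative_border and the toy data are hypothetical, and the input set of frequent itemsets is assumed to be downward closed.

```python
from itertools import combinations

def negative_border(frequent, items):
    """Itemsets not in `frequent` whose (k-1)-subsets are all in `frequent`."""
    frequent = set(frequent)
    border = set()
    # Infrequent single items: their only proper subset is the empty set.
    for item in items:
        if frozenset([item]) not in frequent:
            border.add(frozenset([item]))
    # Extend each frequent k-itemset by one item and test the candidate.
    for itemset in frequent:
        for item in items:
            if item in itemset:
                continue
            candidate = itemset | {item}
            if candidate in frequent:
                continue
            subsets = combinations(candidate, len(candidate) - 1)
            if all(frozenset(s) in frequent for s in subsets):
                border.add(candidate)
    return border

items = ["a", "b", "c", "d"]
frequent = {frozenset(s) for s in (["a"], ["b"], ["c"], ["a", "b"], ["a", "c"])}
border = negative_border(frequent, items)
print(sorted(sorted(s) for s in border))  # [['b', 'c'], ['d']]
```

In incremental settings the supports of the border itemsets are typically kept alongside the frequent ones. As long as no border itemset becomes frequent after an update, the set of frequent itemsets cannot have grown, so a full rescan of the old database can be skipped.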
Question 8
Consider the difference estimation for large itemsets (DELI) algorithm. State which of the following is NOT true.
Select one:
a. It does not process all the items.
b. Its result could sometimes be wrong, and the probability of a mistake could NOT be bounded
c. It uses a bell curve to build a confidence interval
d. This algorithm uses a statistical technique
Ans: b. Its result could sometimes be wrong, and the probability of a mistake could NOT be bounded
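The statistical idea mentioned in options c and d can be sketched as follows. This is not the DELI algorithm itself, only an illustration of estimating an itemset's support from a random sample and attaching a normal-approximation (bell curve) confidence interval; the function name support_interval, the z value of 1.96 (roughly 95% confidence), and the synthetic data are all assumptions.

```python
import math
import random

def support_interval(transactions, itemset, sample_size, z=1.96, seed=0):
    """Estimate the support of `itemset` from a sample, with a normal-approximation interval."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)   # sampling without replacement
    hits = sum(1 for t in sample if itemset <= t)    # transactions containing the itemset
    p = hits / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Synthetic database: 300 of 1000 transactions contain {a, b}, so true support is 0.3.
transactions = [{"a", "b"}] * 300 + [{"c"}] * 700
low, high = support_interval(transactions, {"a", "b"}, sample_size=200)
print(low, high)
```

Because the interval comes from a known distribution, the chance that the true support falls outside it is controlled by z, which is why the probability of a mistake can be bounded.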
Question 9
Consider a data stream of N-3 positive integers where all the items are distinct and the maximum integer value in the stream is N. Assume that the stream is not sorted. Suppose your task is to devise an algorithm to determine the integers missing from the stream.
Select one:
a. Any such algorithm would need to store N-3 numbers in the memory
b. At least four integers need to be stored in the memory
c. It is sufficient to have storage of two items
d. None of the above
Ans: a. Any such algorithm would need to store N-3 numbers in the memory
Question 10
Which of the following statements about the k-Means clustering algorithm is FALSE?
Select one:
a. It is suited for a wide variety of problems and can be applied to evolving databases
b. It is used for clustering
c. The value of k is a very important parameter and is supplied by the user only.
d. The cluster label for a data point is determined by its centroid.
Ans: a. It is suited for a wide variety of problems and can be applied to evolving databases
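The assignment step behind option d can be sketched in a few lines: each point takes the label of its nearest centroid. The function name assign_labels and the sample points are hypothetical, and Euclidean distance is assumed.

```python
import math

def assign_labels(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

points = [(0, 0), (1, 0), (9, 9), (10, 10)]
centroids = [(0.5, 0.0), (9.5, 9.5)]     # k = 2, supplied by the user
print(assign_labels(points, centroids))  # [0, 0, 1, 1]
```

Every point is compared against all k centroids, and the centroids themselves depend on every assigned point, which hints at why plain k-Means is awkward to maintain on a database that keeps changing.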