Advanced Data Mining - Quiz 1 BITS WILP - Mtec Software Systems - 2017

1. Identify the most appropriate statement about evolutionary streams:
Select one:
a. Number of clusters can be fixed
b. Data come from one side and exit from the other side
c. Role of outliers and clusters may change
d. Data may come from many channels

Answer: Role of outliers and clusters may change

2. Identify the FALSE statement about point-wise and batch-wise incremental DBSCAN:
Select one:
a. Both produce the same set of clusters, identical to the clusters produced by DBSCAN
b. Batch-wise incremental DBSCAN is faster than point-wise incremental DBSCAN if the new data contain a larger number of overlapping clusters
c. The process of adding points to the R-tree is the same in both
d. Neither is suitable for mining data streams

Answer : Batch-wise incremental DBSCAN is faster than point-wise incremental DBSCAN if the new data contain a larger number of overlapping clusters

3. Consider the FM-Sketch algorithm discussed in class. It determines the number of distinct items in a data stream. Assume that only sublinear space is available for the computation and that the hash function is as given below

                 h(x) = (5 * x * x + 6) mod 53

Determine the bit values of the FM-Sketch after processing the following data stream:

      83, 63, 36, 14, 24, 57, 78, 57, 57, 24, 14, 36, 14, 36, 14, 36, 57, 57, 14, 23, 57, 36, 23, 24, 57, 14, 78, 57, 78, 83, 63, 36, 23, 14, 24

Assume the FM-Sketch is 8 bits wide, with the least significant bit at the extreme right.
Select one:
a. 00101101
b. 00101011
c. 00101111
d. 00110101
e. 01100111

Answer : 00101101
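
The answer to question 3 can be checked with a short sketch. It assumes the usual FM-Sketch rule: for each item, hash it, find the position of the lowest set bit of the hash value, and set that position in the sketch. The function names (`h`, `trailing_zeros`, `fm_sketch`) are illustrative, and skipping a hash value of 0 is an assumption made here for completeness; no item in this stream hashes to 0.

```python
def h(x: int) -> int:
    """Hash function from the question: (5 * x * x + 6) mod 53."""
    return (5 * x * x + 6) % 53


def trailing_zeros(n: int) -> int:
    """Position of the lowest set bit of a positive integer n."""
    return (n & -n).bit_length() - 1


def fm_sketch(stream, width: int = 8) -> str:
    """Return the sketch as a bit string, least significant bit on the right."""
    bits = 0
    for x in stream:
        v = h(x)
        if v == 0:
            continue  # assumption: a zero hash has no set bit, so it is skipped
        r = trailing_zeros(v)
        if r < width:
            bits |= 1 << r
    return format(bits, f"0{width}b")


stream = [83, 63, 36, 14, 24, 57, 78, 57, 57, 24, 14, 36, 14, 36, 14, 36,
          57, 57, 14, 23, 57, 36, 23, 24, 57, 14, 78, 57, 78, 83, 63, 36,
          23, 14, 24]

print(fm_sketch(stream))  # 00101101
```

The eight distinct items hash to 1, 29, 20, 32, 24, 33, 4 and 1, whose lowest set bits are at positions 0, 0, 2, 5, 3, 0, 2 and 0. Repeated items set the same bit again, so only positions 0, 2, 3 and 5 end up set.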


4. The most important feature of SWF is
Select one:
a. Use of phases
b. Use of Partial_min_sup
c. Use of sliding window model
d. Use of progressive Candidate itemsets

Answer : Use of Partial_min_sup

5. Identify the correct statement about batch-wise incremental DBSCAN:
Select one:
a. It is equivalent to updating existing clusters by processing the new batch cluster by cluster
b. Finding the intersection is a very costly procedure
c. The cost of finding clusters in the new batch can always be compensated
d. It is equivalent to point-wise incremental DBSCAN if most of the points in the new batch are intersection points

Answer : It is equivalent to point-wise incremental DBSCAN if most of the points in the new batch are intersection points

6. Data Mining is a tool for knowledge discovery in databases (KDD). It is not related to
Select one:
a. Determining statistics about new data items
b. Highlighting outliers
c. Interpreting contents of data
d. Management of the data

Answer : Management of the data

7. Identify the statement which always holds true about incremental DBSCAN:
Select one:
a. A split case may not always split a cluster in two
b. Adding a point will change the density property of its neighboring points
c. A point added in a less dense region will be a noise point
d. Adding a point can cause two clusters to merge

Answer : A split case may not always split a cluster in two

8. Addition of a point in incremental DBSCAN:
Select one:
a. affects all density-reachable points
b. can change the core property of points in the 2-epsilon region of the point
c. affects all density-connected points
d. can change the core property of points in the epsilon region of the point

Answer : can change the core property of points in the epsilon region of the point
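
A small sketch illustrates the answer to question 8. It assumes the standard DBSCAN definition of a core point (at least MinPts points, counting the point itself, within distance epsilon). The coordinates, `eps`, `min_pts` and the helper names are hypothetical values chosen for illustration, and the brute-force neighbor search stands in for the R-tree a real implementation would use.

```python
import math


def eps_neighbors(points, p, eps):
    """All points within distance eps of p (p itself included if present)."""
    return [q for q in points if math.dist(p, q) <= eps]


def core_points(points, eps, min_pts):
    """Set of points whose eps-neighborhood holds at least min_pts points."""
    return {p for p in points
            if len(eps_neighbors(points, p, eps)) >= min_pts}


# Hypothetical data: three nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
eps, min_pts = 1.5, 4

before = core_points(points, eps, min_pts)

new_point = (1.0, 1.0)
after = core_points(points + [new_point], eps, min_pts)

# Points whose core property changed because of the insertion.
changed = after - before
print(sorted(changed))

# Every changed point lies inside the epsilon region of the new point.
print(all(math.dist(p, new_point) <= eps for p in changed))
```

Only points within epsilon of the inserted point gain a neighbor, so only they can switch from non-core to core. The far-away point (5.0, 5.0) keeps its neighbor count and its status.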

9. Identify the correct statement:
Select one:
a. Processing time for incremental updates should be proportional to the size of the increment
b. Incremental mining is easier than stream mining because it is just reapplying a mining algorithm on the whole dataset
c. There are not many applications where incremental updates are required
d. Bulk updates are always better than point-wise updates

Answer : Processing time for incremental updates should be proportional to the size of the increment

10. Identify the correct statement about the CATS tree:
Select one:
a. The tree is an optimally sized tree
b. Its construction cost is higher than that of the CAN tree
c. Siblings are ordered by global support
d. Items within paths from the root to the leaves are ordered by global support

Answer : Its construction cost is higher than that of the CAN tree