Coding Tech Life: Advanced Data Mining Quiz

Advanced Data Mining - Quiz 3 BITS WILP - Mtec Software Systems - 2017

Question 1

What are stop words in a text document?

Select one:

a. Less commonly used words that have high information

b. Punctuation and special symbols

c. Most commonly used words in a text that contribute no information

d. Designated keywords

answer : Most commonly used words in a text that contribute no information
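
As a small illustration of the answer above, the sketch below drops stop words from a sentence. The stop-word list is a hypothetical, hand-picked one chosen for this example; real systems typically rely on much longer lists.

```python
# Hypothetical, hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "to", "and"}

def remove_stop_words(text):
    """Return the lowercase tokens of `text` that are not stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

tokens = remove_stop_words("The cat is in the garden and the dog is asleep")
print(tokens)  # ['cat', 'garden', 'dog', 'asleep']
```

The remaining tokens are the ones that carry the content of the sentence.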

Question 2

Consider tree mining in a large tree database. Which of the following kinds of support does NOT maintain the anti-monotone property?

Select one:

a. Hybrid

b. Occurrence-Based

c. Transactional-Based

answer : Occurrence-Based

Question 3

Consider the Gibson, David, Kumar algorithm for determining dense subgraphs in a massive graph. Choose the most suitable statement about fingerprinting in this algorithm.

Select one:

a. Fingerprinting is parameter independent

b. Fingerprinting helps to compute Jaccard Coefficient

c. Fingerprinting can be avoided

d. Fingerprinting reduces comparison time

answer : Fingerprinting reduces comparison time

Question 4

Consider the representation of text data using features. Which one of the following is typically expected to work best?

Select one:

a. tfidf

b. term document index matrix

c. word wise sorted array

d. term document count matrix

answer : tfidf
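
To make the tf-idf weighting concrete, here is a minimal sketch. It assumes tf = raw count / document length and idf = log(N / df); other variants of the weighting exist, and the three toy documents are made up for this example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tfidf(docs)
# "the" occurs in every document, so its weight is 0;
# "dog" occurs in only one document, so it gets the highest weight there.
print(w[1])
```

Unlike a plain count matrix, the weighting pushes down terms that appear everywhere and highlights terms that distinguish a document.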

Question 5

Which of the following statements are true with respect to Hyperlink-Induced Topic Search (HITS)?

Select one or more:

a. Base set contains both the hub and authoritative pages.

b. Authoritative pages cannot point to other pages in the network

c. Hub pages contain data that is searched by the user

d. Page-rank can only efficiently be defined recursively.

answer : Page-rank can only efficiently be defined recursively., Base set contains both the hub and authoritative pages.
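
The hub and authority scores in HITS are defined in terms of each other, which a short iteration can illustrate. This is a minimal sketch, assuming a tiny made-up base set and L2 normalisation after each round; the node names are hypothetical.

```python
import math

def hits(graph, iterations=50):
    """graph: {node: [nodes it links to]}. Returns (hub, authority) score dicts."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to the node.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub score: sum of authority scores of the pages the node links to.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        # Normalise so the scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in hub.items()}
    return hub, auth

# Hypothetical base set containing both hub-like and authority-like pages.
base_set = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(base_set)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # h1 a1
```

In this toy base set, h1 ends up as the best hub because it points to both authorities, and a1 as the best authority because both hubs point to it.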

Question 6

Which one of the following is the worst representation of a tree database?

Select one:

a. DFS

b. Linked List

c. Flat File (Text form)

d. BFS

answer : Flat File (Text form)

Question 7

In a social network graph, nodes represent persons and directed edges represent "following" (A is following B). Which of the statements below is true in general?

Select one:

a. Higher out-degree of a node represents influential person

b. Higher in-degree of a node represents influential person

c. Both A and B

d. None of the above

answer : Higher in-degree of a node represents influential person
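
A quick way to see the difference between the two degrees is to count them on an edge list. The edges below are hypothetical; each pair (follower, followed) means "follower is following followed".

```python
from collections import Counter

# Hypothetical edge list: (follower, followed), i.e. A -> B means "A follows B".
follows = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("D", "A")]

in_degree = Counter(followed for _, followed in follows)   # number of followers
out_degree = Counter(follower for follower, _ in follows)  # number of accounts followed

most_followed = in_degree.most_common(1)[0][0]
print(most_followed, in_degree[most_followed])  # B 3
```

Here D has the highest out-degree (D follows two people) but nobody follows D, while B has the most followers.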

Question 8

Consider the following graph.

Determine similarity between node A and B using Jaccard Coefficient.

Select one:

a. 2/6

b. 2/7

c. 3/8

d. 2

answer : 2/7
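
The graph for this question is not reproduced here, so the sketch below only shows how the Jaccard coefficient of two nodes is computed from their neighbour sets. The neighbour sets are hypothetical and are not the ones from the quiz figure.

```python
from fractions import Fraction

def jaccard(neighbors_a, neighbors_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two neighbour sets."""
    union = neighbors_a | neighbors_b
    if not union:
        return Fraction(0)
    return Fraction(len(neighbors_a & neighbors_b), len(union))

# Hypothetical neighbour sets (not taken from the quiz graph).
neighbors_of_a = {"C", "D", "E"}
neighbors_of_b = {"D", "E", "F", "G"}
print(jaccard(neighbors_of_a, neighbors_of_b))  # 2/5
```

Two common neighbours out of five distinct neighbours give 2/5; the quiz answer of 2/7 corresponds to two common neighbours out of seven distinct ones.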

Question 9

Extensible Markup Language (XML) is a format that stores data in which of the following forms?

Select one:

a. Sequential

b. Relational

c. Unstructured

d. Hierarchical

answer : Hierarchical

Question 10

Consider a graph representing a social network, where nodes represent persons and edges represent friendships. Now consider a 2D matrix A of integers where A[i,j] represents the length of the path between nodes i and j. Let d be the maximum value in the matrix A.

Social networks are dynamic in nature. When the number of nodes in a social network graph increases, what is the expected effect on the value of d?

Select one:

a. Value of d is expected to increase

b. Value of d is expected to decrease

c. Value of d is independent of this change

d. None of the above.

answer : Value of d is expected to decrease
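
The value d can be computed with a breadth-first search from every node. The sketch below assumes that A[i,j] means the shortest path length and uses two hypothetical toy graphs; it only illustrates how a new, well-connected member can shorten the longest path and is not evidence for the general trend.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """BFS hop counts from `source` in an unweighted graph {node: set(neighbours)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Largest shortest-path length d over all reachable pairs."""
    return max(max(shortest_path_lengths(graph, s).values()) for s in graph)

# Hypothetical friendship graph: a chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
# The same graph after a new member E befriends both A and D.
grown = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"C", "E"}, "E": {"A", "D"}}
print(diameter(chain), diameter(grown))  # 3 2
```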

Question 11

A system has denied 5 out of 20 genuine authentication attempts (the right person wanting access) and has allowed 5 out of 15 impostor attempts (an attacker wanting access to the system). What is its accuracy?

Select one:

a. (20/35)*100 %

b. (10/35)*100 %

c. (15/35)*100 %

d. (25/35)*100 %

answer : (25/35)*100 %
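
The arithmetic behind this answer: accuracy is the share of all attempts that were handled correctly, that is, genuine attempts accepted plus impostor attempts rejected. The variable names below are chosen for this sketch only.

```python
genuine_attempts, genuine_denied = 20, 5      # 5 false rejections
impostor_attempts, impostor_allowed = 15, 5   # 5 false acceptances

# Correct decisions: genuine users accepted + impostors rejected.
correct = (genuine_attempts - genuine_denied) + (impostor_attempts - impostor_allowed)
total = genuine_attempts + impostor_attempts
accuracy = correct / total * 100
print(f"{correct}/{total} = {accuracy:.1f}%")  # 25/35 = 71.4%
```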

Question 12

The PK-Means algorithm can sometimes produce non-optimal clustering

Select one or more:

a. Because of the dependence on initial choice of centroids

b. Because of the dependence on processing done at combiner

c. Because of the dependence on distribution of data on map machines

d. Because of the dependence on processing done at reducer

answer : Because of the dependence on initial choice of centroids

Question 13

What is the support of the sequence <{1}{3,4}> in the following database?

D= <{2,3}{1,3}{3,4,5}>,<{2,5}{1}{3,4,5}>,<{1,5}{2,5}{3}{3,5}{4}>,<{1,5}{2,3,5}{3,4}{1,4,6}>

Select one:

a. 50%

b. 45%

c. 90%

d. 75%

answer : 75%
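
The support can be checked mechanically: a data sequence supports the pattern if the pattern's itemsets can be matched, in order, to supersets among the sequence's elements. The helper names below are chosen for this sketch.

```python
def contains(sequence, pattern):
    """True if `pattern` (list of sets) is a subsequence of `sequence` (list of sets)."""
    pos = 0
    for itemset in pattern:
        # Greedily match each pattern itemset to the earliest remaining element containing it.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(database, pattern):
    """Fraction of data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)

D = [
    [{2, 3}, {1, 3}, {3, 4, 5}],
    [{2, 5}, {1}, {3, 4, 5}],
    [{1, 5}, {2, 5}, {3}, {3, 5}, {4}],
    [{1, 5}, {2, 3, 5}, {3, 4}, {1, 4, 6}],
]
print(support(D, [{1}, {3, 4}]))  # 0.75
```

Only the third sequence fails: after {1,5}, no single element contains both 3 and 4, so 3 of the 4 sequences support the pattern.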

Question 14

Consider the following architecture of a parallel crawler.

Which part is responsible for implementing the freshness property?

Select one:

a. URL Frontier

b. Parse

c. URL Filter

d. Host Splitter

answer : URL Frontier

Question 15

Handling of Big Data is challenging because of

Select one:

a. Large number of data points or Volume

b. Data may be from various sources and could have different formatting

c. Data may be continuously arriving that makes processing difficult

d. All of the above

answer : All of the above

Advanced Data Mining - Quiz 2 BITS WILP - Mtec Software Systems - 2017

Question 1
Which of the following is expected to be more compact in general?
Select one:
a. CAN-Tries
b. CATS-Tree
c. FP-Tree
d. CAN-Tree

Ans: a. CAN-Tries

Question 2
Consider a data mining operation on a database that keeps changing. Assume there are M items in the database and that, after some time, M1 items expire and M2 new items join the database. An incremental data mining algorithm would be called efficient if it takes
Select one:
a. Order of M1+M2 time to update its mining result
b. Order of M time to update its mining result
c. Order of M2 time to update its mining result
d. Order of M+M1+M2 time to update its mining result
e. Order of M1 time to update its mining result

Ans: a. Order of M1+M2 time to update its mining result

Question 3
Which of the following algorithms can produce clusters of arbitrary shape and size? (Hint: a half moon is an arbitrary shape)
Select one or more:
a. k-Means
b. DBSCAN
c. PAM
d. Single Link

Ans: b. DBSCAN, c. PAM

Question 4
Consider the paper entitled "Mining Frequent Patterns without Candidate Generation". Which of the following algorithms does this paper introduce?
Select one:
a. CATS-Tree
b. CP-Tree
c. FP-Tree
d. CAN-Tree

Ans: c. FP-Tree

Question 5
The incremental DBSCAN algorithm does not update the clustering after a single update; instead, it waits until an adequate number of transactions (data points) arrive and depart. This step helps to minimize the sensitivity of the DBSCAN algorithm to parameters such as eps and min-pts.
Select one:
True
False

Ans: True

Question 6
Incremental DBSCAN uses a special data structure to answer neighborhood queries called
Select one:
a. Splay-Tree
b. B-Tree
c. AVL-Tree
d. None of the above

Ans: d. None of the above

Question 7
Consider association rule mining that involves the discovery of frequent itemsets based on support and confidence parameters. Negative border set can help in
a. Reducing number of database scans
b. Reducing the number of candidate itemsets
c. Efficient computation of support of a candidate itemset
d. Speed up in the process of construction of k+1 item sets from k itemsets

Ans: a. Reducing number of database scans

Question 8
Consider the difference estimation for large itemsets (DELI) algorithm. State which of the following is NOT true
Select one:
a. It does not process all the items.
b. Its result could sometimes be wrong and the probability of mistake could NOT be bounded
c. Uses bell curve to build confidence interval
d. This algorithm uses statistical technique

Ans: b. Its result could sometimes be wrong and the probability of mistake could NOT be bounded

Question 9
Consider a data stream of N-3 positive integers in which all the items are different and the maximum integer value is N. Assume that the stream is not sorted. Suppose your task is to devise an algorithm to determine the integers missing from the stream.
Select one:
a. Any such algorithm would need to store N-3 numbers in the memory
b. At least four integers need to be stored in the memory
c. It is sufficient to have storage of two items
d. None of the above

Ans: a. Any such algorithm would need to store N-3 numbers in the memory

Question 10
Which of the following statements about the k-Means clustering algorithm is FALSE?
a. It is suited for a wide variety of problems and can be applied to evolving databases
b. It is used for clustering
c. Value of k is a very important parameter and is supplied by the user only.
d. Cluster label for a data point is determined by its centroid.

Ans: a. It is suited for a wide variety of problems and can be applied to evolving databases

Advanced Data Mining - Quiz 1 BITS WILP - Mtec Software Systems - 2017

1. Identify the most appropriate statement about evolutionary streams:
Select one:
a. Number of clusters can be fixed
b. Data come from one side and exit from the other side
c. Role of outliers and clusters may change
d. Data may come from many channels

Answer: Role of outliers and clusters may change

2. Identify FALSE statement about point-wise and batch-wise incremental DBSCAN:
Select one:
a. Both produce the same set of clusters, identical to that of DBSCAN
b. Batch-wise incremental DBSCAN is faster than point-wise incremental DBSCAN if a larger number of overlapping clusters are present in the new data
c. The process of addition of points in R-trees is the same in both
d. Both are not suitable for mining data streams

Answer : Batch-wise incremental DBSCAN is faster than point-wise incremental DBSCAN if a larger number of overlapping clusters are present in the new data

3. Consider the FM-Sketch algorithm discussed in class. It estimates the number of distinct items in a data stream. Assume that only sublinear space is available for the computation and that the hash function used is

h(x) = (5*x*x + 6) mod 53

Determine the bit values of the FM-Sketch after processing the following data stream:

83, 63, 36, 14, 24, 57, 78, 57, 57, 24, 14, 36, 14, 36, 14, 36, 57, 57, 14, 23, 57, 36, 23, 24, 57, 14, 78, 57, 78, 83, 63, 36, 23, 14, 24

Assume the size of the FM-Sketch to be 8 bits, with the least significant position at the extreme right.
Select one:
a. 00101101
b. 00101011
c. 00101111
d. 00110101
e. 01100111

Answer : 00101101
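
The answer can be reproduced with a few lines of code. This sketch assumes the usual FM-Sketch convention, in which each item sets the bit whose index equals the number of trailing zeros in the binary form of its hash value; the treatment of a zero hash (skipped) is an additional assumption that does not arise for this stream.

```python
def h(x):
    """Hash function from the question."""
    return (5 * x * x + 6) % 53

def trailing_zeros(n):
    """Position of the least significant 1 bit (assumes n > 0)."""
    return (n & -n).bit_length() - 1

def fm_sketch(stream, size=8):
    """Return the sketch as a bit string, least significant bit on the right."""
    bits = 0
    for x in stream:
        value = h(x)
        if value == 0:
            continue  # assumption: a zero hash sets no bit
        r = trailing_zeros(value)
        if r < size:
            bits |= 1 << r
    return format(bits, f"0{size}b")

stream = [83, 63, 36, 14, 24, 57, 78, 57, 57, 24, 14, 36, 14, 36, 14, 36, 57, 57,
          14, 23, 57, 36, 23, 24, 57, 14, 78, 57, 78, 83, 63, 36, 23, 14, 24]
print(fm_sketch(stream))  # 00101101
```

The eight distinct items hash to 1, 29, 20, 32, 24, 33, 4 and 1, whose trailing-zero counts are 0, 0, 2, 5, 3, 0, 2 and 0, so bits 0, 2, 3 and 5 are set.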

4. The most important key feature of SWF is
Select one:
a. Use of phases
b. Use of Partial_min_sup
c. Use of sliding window model
d. Use of progressive Candidate itemsets

Answer : Use of Partial_min_sup

5. Identify correct statement about batch-wise incremental DBSCAN:
Select one:
a. It is equivalent to updating existing clusters by processing the new batch cluster by cluster
b. The process of finding intersections is a very costly procedure
c. Cost of finding clusters in new batch can always be compensated
d. It is equivalent to point-wise incremental DBSCAN if most of the points in the new batch are intersection points

Answer : It is equivalent to point-wise incremental DBSCAN if most of the points in the new batch are intersection points

6. Data Mining is a tool for knowledge discovery in databases (KDD). It is not related to
Select one:
a. Determining statistics about new data items
b. Highlighting outliers
c. Interpreting contents of data
d. Management of the data

Answer : Management of the data

7. Identify the statement which always holds true about incremental DBSCAN:
Select one:
a. A split case may not always split two clusters
b. An addition of a point will change the density property of the neighboring points
c. A point added in a less dense region will be a noise point
d. An addition of a point can cause merging of two clusters

Answer : A split case may not always split two clusters

8. Addition of a point in incremental DBSCAN:
Select one:
a. affects all density reachable points
b. can change core property of the points in 2-epsilon region of the point
c. affects all density connected points
d. can change core property of the points in an epsilon region of the point

Answer : can change core property of the points in an epsilon region of the point
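
The reason is that a new point only adds to the neighbour count of points within distance eps of it. The sketch below checks this on hypothetical 2-D data with a brute-force neighbourhood search; the parameter values are assumptions, and the R-tree used by incremental DBSCAN for neighbourhood queries is not modelled.

```python
import math

def neighbours(points, p, eps):
    """All points within distance eps of p (p itself included, if present)."""
    return [q for q in points if math.dist(p, q) <= eps]

def core_points(points, eps, min_pts):
    """Points whose eps-neighbourhood holds at least min_pts points."""
    return {p for p in points if len(neighbours(points, p, eps)) >= min_pts}

# Hypothetical 2-D data and parameters.
eps, min_pts = 1.0, 3
points = [(0.0, 0.0), (0.9, 0.0), (3.0, 0.0)]
new_point = (0.5, 0.5)

before = core_points(points, eps, min_pts)
after = core_points(points + [new_point], eps, min_pts)
# Existing points whose core property changed because of the insertion.
changed = after - before - {new_point}
print(sorted(changed))  # [(0.0, 0.0), (0.9, 0.0)]
```

Both points that became core lie within eps of the new point, while (3.0, 0.0), which is farther away, is unaffected.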

9. Identify correct statement:
Select one:
a. Processing time for incremental updates should be proportional to the size of the increment
b. Incremental mining is easier than stream mining because it is just reapplying a mining algorithm on the whole dataset
c. There are not many applications where incremental updates are required
d. Bulk updates are always better than point wise updates

Answer : Processing time for incremental updates should be proportional to the size of the increment

10. Identify correct statement about CATS tree:
Select one:
a. The tree is an optimally sized tree
b. Its construction cost is higher than CAN tree
c. Siblings are ordered by global support
d. Items within paths from the root to the leaves are ordered by global support

Answer : Its construction cost is higher than CAN tree