Advanced Data Mining - Quiz 3
BITS WILP - Mtec Software Systems - 2017


Question 1
What are stop words in a text document?
Select one:
a. Less commonly used words that have high information
b. Punctuation and special symbols
c. Most commonly used words in a text that contribute to no information
d. Designated keywords

answer : Most commonly used words in a text that contribute to no information
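
Stop-word removal can be illustrated with a minimal sketch. The `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names chosen for this example, and the stop list is a small hand-picked assumption rather than one taken from any library.

```python
# Hypothetical, hand-picked stop list; real toolkits ship much longer ones.
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "and", "to"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

tokens = remove_stop_words("The cat is in the garden")
print(tokens)  # ['cat', 'garden']
```

Only the words that carry information about the document ("cat", "garden") survive the filter.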

Question 2
Consider tree mining in a large tree database. Which of the following kinds of support does NOT maintain the anti-monotone property?
Select one:
a. Hybrid
b. Occurrence-Based
c. Transactional-Based

answer : Occurrence-Based

Question 3
Consider the Gibson, Kumar, and Tomkins algorithm for finding dense subgraphs in a massive graph. Choose the most suitable statement about fingerprinting in this algorithm.
Select one:
a. Fingerprinting is parameter independent
b. Fingerprinting helps to compute Jaccard Coefficient
c. Fingerprinting can be avoided
d. Fingerprinting reduces comparison time

answer : Fingerprinting reduces comparison time

Question 4
Consider representing text data using features. Which one of the following typically works best?
Select one:
a. tfidf
b. term document index matrix
c. word wise sorted array
d. term document count matrix

answer : tfidf
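
To make the tfidf weighting concrete, here is a small sketch. It assumes one common variant (tf = term count divided by document length, idf = log(N / document frequency)); other variants exist, and the `tf_idf` function name and the two sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency). Other variants exist.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical two-document corpus.
docs = ["data mining finds patterns", "text mining uses text data"]
w = tf_idf(docs)
print(w[0]["mining"], w[1]["text"])
```

Terms that occur in every document ("data", "mining") get weight zero, while a term that is frequent in one document but rare in the corpus ("text") gets the highest weight. A plain count matrix cannot make that distinction.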

Question 5
Which of the following statements are true with respect to Hyperlink-Induced Topic Search (HITS)?
Select one or more:
a. Base set contains both the hub and authoritative pages.
b. Authoritative pages cannot point to other pages in the network
c. HUB pages contain data that is searched by user
d. Page-rank can only efficiently be defined recursively.

answer : Page-rank can only efficiently be defined recursively; Base set contains both the hub and authoritative pages.
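
The mutual dependence between hub and authority scores in HITS can be sketched as a short iteration. This is a simplified illustration, not a reference implementation: the `hits` function, the fixed iteration count, and the three-edge toy graph are all assumptions made for the example.

```python
def hits(edges, iterations=50):
    """Iteratively compute hub and authority scores for a directed graph.

    edges: list of (source, target) pairs meaning source links to target.
    Assumes the edge list is non-empty.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        # Normalise so the scores do not grow without bound.
        a_norm = sum(v * v for v in auth.values()) ** 0.5
        h_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical toy graph: h1 and h2 both link to a1; h1 also links to a2.
edges = [("h1", "a1"), ("h2", "a1"), ("h1", "a2")]
hub, auth = hits(edges)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # h1 a1
```

In this toy graph the page with the most incoming links from good hubs (a1) ends up as the top authority, and the page linking to the most authorities (h1) as the top hub.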

Question 6
Which one of the following is the worst representation of a tree database?
Select one:
a. DFS
b. Linked List
c. Flat File (Text form)
d. BFS

answer : Flat File (Text form)

Question 7
Consider a social network graph in which each node represents a person and each directed edge represents "following" (A is following B). Which of the statements below is true in general?
Select one:
a. Higher out-degree of a node represents influential person
b. Higher in-degree of a node represents influential person
c. Both A and B
d. None of the above

answer : Higher in-degree of a node represents influential person
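
A quick sketch of the in-degree versus out-degree distinction, using a made-up list of "following" edges (the names and edges are purely illustrative):

```python
from collections import Counter

# Hypothetical edges: (follower, followed) means follower is following followed.
follows = [("ann", "cara"), ("bob", "cara"), ("dev", "cara"), ("cara", "ann")]

in_degree = Counter(followed for _, followed in follows)   # number of followers
out_degree = Counter(follower for follower, _ in follows)  # number of accounts followed

most_followed = in_degree.most_common(1)[0][0]
print(most_followed, in_degree[most_followed])  # cara 3
```

Here cara has the highest in-degree (three followers) even though her out-degree is only one, which matches the intuition that followers, not followings, indicate influence.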

Question 8
Consider the following graph.

Determine similarity between node A and B using Jaccard Coefficient.
Select one:
a. 2/6
b. 2/7
c. 3/8
d. 2

answer : 2/7
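
The graph for this question is not available here, so the following sketch uses hypothetical neighbour sets that merely reproduce the same ratio: two shared neighbours out of seven distinct neighbours. The node labels and the `jaccard` helper are assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical neighbour sets (NOT the quiz's figure, which is not available):
# two shared neighbours out of seven distinct neighbours overall.
neighbours_a = {"C", "D", "E", "F"}
neighbours_b = {"E", "F", "G", "H", "I"}

similarity = jaccard(neighbours_a, neighbours_b)
print(similarity)
```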

Question 9
Extensible Markup Language (XML) is a format that stores data in which of the following forms?
Select one:
a. Sequential
b. Relational
c. Unstructured
d. Hierarchical

answer : Hierarchical

Question 10
Consider a graph representing a social network, where nodes represent persons and edges represent friendships. Now consider a 2D integer matrix A, where A[i,j] is the length of the path between nodes i and j. Let d be the maximum value in the matrix A.

Social networks are dynamic in nature. When the number of nodes in a social network graph increases, what is the expected effect on the value of d?
Select one:
a. Value of d is expected to increase
b. Value of d is expected to decrease
c. Value of d is independent of this change
d. None of the above.

answer : Value of d is expected to decrease

Question 11
A system denied 5 out of 20 genuine authentication attempts (the right person wanting access) and allowed 5 out of 15 impostor attempts (an attacker wanting access to the system). What is its accuracy?
Select one:
a. (20/35)*100 %
b. (10/35)*100 %
c. (15/35)*100 %
d. (25/35)*100 %

answer : (25/35)*100 %
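
The arithmetic behind this answer, worked out as a short sketch (variable names are illustrative): accuracy counts every correct decision, that is, genuine attempts accepted plus impostor attempts rejected, over all attempts.

```python
genuine_total, genuine_denied = 20, 5     # 5 false rejections
impostor_total, impostor_allowed = 15, 5  # 5 false acceptances

# Correct decisions: genuine attempts accepted + impostor attempts rejected.
correct = (genuine_total - genuine_denied) + (impostor_total - impostor_allowed)
total = genuine_total + impostor_total
accuracy_pct = correct / total * 100

print(f"{correct}/{total} = {accuracy_pct:.1f}%")  # 25/35 = 71.4%
```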

Question 12
The PK-Means algorithm can sometimes produce a non-optimal clustering
Select one or more:
a. Because of the dependence on initial choice of centroids
b. Because of the dependence on processing done at combiner
c. Because of the dependence on distribution of data on map machines
d. Because of the dependence on processing done at reducer

answer : Because of the dependence on initial choice of centroids
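
The sensitivity to initial centroids can be demonstrated with a small sketch. Note that this is plain, single-machine Lloyd's k-means, not the MapReduce-based PK-Means itself; the assumption is that the parallel version performs the same assign-and-update iteration and so inherits the same sensitivity. The data points and the `kmeans` helper are made up for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, sse)."""
    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Hypothetical data: the four corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])  # start with a left/right split
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])       # start with a top/bottom split
print(sse_good, sse_bad)  # 1.0 16.0
```

Both runs converge, but the second start is stuck in a local optimum with a sum of squared errors sixteen times larger, purely because of where the centroids began.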

Question 13
What is the support of the sequence <{1}{3,4}> in the following database?
  D= <{2,3}{1,3}{3,4,5}>,<{2,5}{1}{3,4,5}>,<{1,5}{2,5}{3}{3,5}{4}>,<{1,5}{2,3,5}{3,4}{1,4,6}>
Select one:
a. 50%
b. 45%
c. 90%
d. 75%

answer : 75%
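
The 75% figure can be checked with a short sketch of subsequence containment: a data sequence supports the pattern if some element contains 1 and a later element contains both 3 and 4. The `contains` helper is a hypothetical name; the database D is the one given in the question.

```python
def contains(sequence, pattern):
    """True if the pattern's itemsets appear, in order, as subsets of
    distinct elements of the sequence."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest remaining element containing this itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later element
    return True

D = [
    [{2, 3}, {1, 3}, {3, 4, 5}],
    [{2, 5}, {1}, {3, 4, 5}],
    [{1, 5}, {2, 5}, {3}, {3, 5}, {4}],
    [{1, 5}, {2, 3, 5}, {3, 4}, {1, 4, 6}],
]
pattern = [{1}, {3, 4}]

support = sum(contains(s, pattern) for s in D) / len(D)
print(f"{support:.0%}")  # 75%
```

The third sequence is the only one that fails: it contains 3 and 4, but never together in a single element, so three of the four sequences support the pattern.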

Question 14
Consider the following architecture of a parallel crawler.

Which part is responsible for implementing the freshness property?
Select one:
a. URL Frontier
b. Parse
c. URL Filter
d. Host Splitter

answer : URL Frontier

Question 15
Handling Big Data is challenging because of
Select one:
a. Large number of data points or Volume
b. Data may be from various sources and could have different formatting
c. Data may be continuously arriving that makes processing difficult
d. All of the above

answer : All of the above
