Information Retrieval -SSZG537- Quiz 1 - BITS PILANI
Select one:
a. All of the above
b. Web-scale indexing
c. Google data centres
d. Parallel tasking
Ans: a. All of the above
2. Which is a good idea when using skip pointers?
Select one:
a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans
Ans: c. Depends upon the no. of comparisons needed
3. Edit distance (Levenshtein distance) is a way of:
Select one:
a. Context-sensitive spelling correction
b. Document correction
c. Isolated word correction
d. Phonetic correction
Ans: c. Isolated word correction
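As a rough illustration of question 3, the sketch below computes Levenshtein distance with unit costs and uses it to pick the nearest word from a small vocabulary. The function names `edit_distance` and `closest_word` and the sample vocabulary are made up for this example.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(word: str, vocabulary: list[str]) -> str:
    """Isolated-word correction: nearest vocabulary entry, ignoring context."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))


# Hypothetical vocabulary, for illustration only.
suggestion = closest_word("retreival", ["retrieval", "retrieve", "reveal"])
```

The correction looks at the misspelled word alone, which is why edit distance falls under isolated word correction and not context-sensitive correction.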
Select one:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search
Ans: d. Both proximity and ranked search
Select one:
a. None
b. Boolean queries
c. Phrase queries
d. Wildcard queries
Ans: d. Wildcard queries
6. A large repository of documents in IR is called a:
Select one:
a. Corpus
b. Database
c. Dictionary
d. Collection
Ans: a. Corpus
7. A benefit of using a hash table is:
Select one:
a. Do not need to rehash everything periodically if vocabulary keeps growing.
b. Lookup in a hash table is faster than lookup in a tree.
c. All of the above
d. No prefix search is required
Ans: b. Lookup in a hash table is faster than lookup in a tree.
8. Variable-size postings lists are used when:
Select one:
a. More seek time is desired and the corpus is dynamic
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is dynamic
Ans: d. More seek time is desired and the corpus is dynamic
9. An alternative to equivalence classing is to do:
Select one:
a. Asymmetric expansion
b. Symmetric expansion
c. Case folding
d. Normalization
Ans: a. Asymmetric expansion
10. We need external sorting algorithms to:
Select one:
a. Maximize the disk seek time.
b. Maintain constant disk seek time
c. Minimize the disk seek time.
d. None
Ans: c. Minimize the disk seek time.
11. Benefits of using B-trees:
Select one:
a. Re-balancing is cheap
b. Balanced trees allow efficient retrieval
c. Faster O(log M)
d. Solves the prefix problem.
Ans: d. Solves the prefix problem.
Select one:
a. Document Frequency
b. DocID
c. TermID
d. Term frequency
Ans: b. DocID
13. Key idea behind Single-pass in-memory indexing is:
Select one:
a. Don’t sort; accumulate postings in postings lists as they occur.
b. Generate separate dictionaries for each block.
c. All of the above
d. No need to maintain term-termID mapping across blocks.
Ans: c. All of the above
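The three ideas in question 13 can be sketched for a single block as follows. This is a simplified in-memory illustration, not a full SPIMI implementation: it assumes the block arrives as `(term, doc_id)` pairs with doc IDs in non-decreasing order, and it returns the sorted block instead of writing it to disk. The name `spimi_invert` is chosen for this example.

```python
from collections import defaultdict


def spimi_invert(token_stream):
    """Invert one block of (term, doc_id) pairs without sorting the pairs."""
    # A fresh dictionary per block, keyed by the term string itself,
    # so no term -> termID mapping has to be shared across blocks.
    dictionary = defaultdict(list)
    for term, doc_id in token_stream:
        postings = dictionary[term]
        # Accumulate postings as they occur; skip repeats within a document.
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    # Only the terms are sorted, once, when the block is emitted,
    # so that blocks can later be merged with a linear scan.
    return sorted(dictionary.items())


block = spimi_invert([("caesar", 1), ("brutus", 1), ("caesar", 2), ("caesar", 2)])
```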
14. For a postings list of length L, the number of skip pointers to use is:
Select one:
a. Use L evenly-spaced skip pointers.
b. Use L^2 evenly-spaced skip pointers.
c. Use L^(1/2) evenly-spaced skip pointers.
d. Use 2L evenly-spaced skip pointers.
Ans: c. Use L^(1/2) evenly-spaced skip pointers.
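A minimal sketch of how the square-root heuristic might be used during intersection. It assumes postings are sorted Python lists and that a skip pointer sits at every position that is a multiple of the span; the helper names `skip_span` and `advance` are invented here.

```python
import math


def skip_span(length: int) -> int:
    """Spacing that gives roughly sqrt(L) evenly spaced skip pointers."""
    return max(1, math.isqrt(length))


def advance(postings, pos, span, target):
    """Move past postings[pos], following skips that do not overshoot target."""
    has_skip = pos % span == 0
    if has_skip and pos + span < len(postings) and postings[pos + span] <= target:
        while pos + span < len(postings) and postings[pos + span] <= target:
            pos += span
        return pos
    return pos + 1


def intersect_with_skips(p1, p2):
    s1, s2 = skip_span(len(p1)), skip_span(len(p2))
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i = advance(p1, i, s1, p2[j])
        else:
            j = advance(p2, j, s2, p1[i])
    return answer
```

More skip pointers mean shorter spans and more pointer comparisons; fewer mean longer spans and fewer successful skips. The square root of L is a common compromise between the two.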
15. For query optimization while intersecting two postings list, we should:
Select one:
a. Process in the order of increasing document frequency
b. Process in any order
c. None of the above
d. Process in the order of decreasing document frequency
Ans: a. Process in the order of increasing document frequency
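The ordering heuristic in question 15 could be sketched like this. The document frequency of a term is taken to be the length of its postings list, and the toy index is invented for the example.

```python
def intersect(p1, p2):
    """Linear merge of two sorted postings lists."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


def intersect_terms(index, terms):
    """AND together several terms, rarest term first."""
    ordered = sorted(terms, key=lambda t: len(index[t]))  # increasing df
    result = list(index[ordered[0]])
    for term in ordered[1:]:
        if not result:  # nothing left to match, stop early
            break
        result = intersect(result, index[term])
    return ordered, result


# Hypothetical postings, for illustration only.
toy_index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
order, docs = intersect_terms(toy_index, ["brutus", "caesar", "calpurnia"])
```

Starting with the shortest list keeps every intermediate result no longer than that list, so later merges tend to touch fewer postings.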
16. The goal of IR is to:
Select one:
a. find documents relevant to an information need
b. find documents relevant to an information need from a given document set
c. find documents relevant to an information need from a large document set
d. find documents relevant to an information need from a small document set
Ans: c. find documents relevant to an information need from a large document set
17. Best implementation approach for dynamic indexing is:
Select one:
a. Periodic re-indexing
b. Using Invalidation bit-vector for deleted docs
c. None
d. Using logarithmic merge
Ans: d. Using logarithmic merge
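A simplified in-memory sketch of logarithmic merging, assuming each index is a dictionary from term to a sorted docID list and that the in-memory index Z0 holds at most `n` postings. The class name `LogMergeIndex` and its methods are invented for this example; a real system would keep the larger indexes on disk.

```python
def merge_indexes(old, new):
    """Merge two term -> sorted docID list dictionaries into a new one."""
    merged = {}
    for term in old.keys() | new.keys():
        merged[term] = sorted(set(old.get(term, [])) | set(new.get(term, [])))
    return merged


class LogMergeIndex:
    def __init__(self, n):
        self.n = n          # capacity of the in-memory index Z0, in postings
        self.z0 = {}
        self.z0_size = 0
        self.levels = []    # levels[i] is None or an index built from 2**i flushes

    def add(self, term, doc_id):
        postings = self.z0.setdefault(term, [])
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
            self.z0_size += 1
        if self.z0_size >= self.n:
            self._flush()

    def _flush(self):
        # Behaves like incrementing a binary counter: merge upward
        # until an empty level is found.
        carry, self.z0, self.z0_size = self.z0, {}, 0
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = carry
                return
            carry = merge_indexes(self.levels[level], carry)
            self.levels[level] = None
            level += 1

    def lookup(self, term):
        """A query has to consult Z0 and every occupied level."""
        doc_ids = set(self.z0.get(term, []))
        for index in self.levels:
            if index is not None:
                doc_ids.update(index.get(term, []))
        return sorted(doc_ids)
```

Each posting takes part in at most one merge per level, which is where the logarithmic cost comes from; the price is that a lookup must visit several indexes.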
Select one:
a. Any one
b. Index blowup due to bigger dictionary
c. Both
d. False positives
Ans: c. Both
19. Any string of terms of the following form is called an extended biword:
Select one:
a. NNX*
b. NXNN
c. *NNX
d. NX*N
Ans: d. NX*N
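To make the NX*N pattern concrete, the sketch below scans part-of-speech-tagged tokens for extended biwords. It assumes the tagging has already been done and that every tag is a single character: `N` for nouns, `X` for articles and prepositions, anything else for other words. The function name `extended_biwords` is made up for this example.

```python
import re


def extended_biwords(tagged_tokens):
    """Return (first noun, last noun) pairs for every NX*N run of tags."""
    tags = "".join(tag for _, tag in tagged_tokens)
    pairs = []
    # A lookahead is used so that overlapping runs (e.g. N N X N) are all found.
    for match in re.finditer(r"(?=(NX*N))", tags):
        start, end = match.start(1), match.end(1)
        pairs.append((tagged_tokens[start][0], tagged_tokens[end - 1][0]))
    return pairs


phrase = [("catcher", "N"), ("in", "X"), ("the", "X"), ("rye", "N")]
biwords = extended_biwords(phrase)
```

Each pair would then be indexed as a single dictionary term, so "catcher in the rye" is looked up as the extended biword "catcher rye".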
20. Structured data allows for:
Select one:
a. Does not depend on data complexity
b. Less complex queries
c. No relationship
d. More complex queries
Ans: d. More complex queries
21. Blocked sort-based Indexing is a method of:
Select one:
a. Sorting with more disk seeks.
b. Merging with fewer disk seeks.
c. Comparing with fewer disk seeks.
d. Sorting with fewer disk seeks.
Ans: d. Sorting with fewer disk seeks.
22. Term-document incidence matrix is:
Select one:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict
Ans: a. Sparse
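A back-of-the-envelope check of why the matrix in question 22 is sparse. The collection figures below are illustrative assumptions, not values taken from the quiz.

```python
def incidence_density_bound(num_docs, tokens_per_doc, vocab_size):
    """Upper bound on the fraction of 1s in a term-document incidence matrix."""
    cells = num_docs * vocab_size          # one cell per (term, document) pair
    max_ones = num_docs * tokens_per_doc   # a document cannot hold more distinct terms than tokens
    return min(1.0, max_ones / cells)


# Assumed figures: 1,000,000 documents of about 1,000 tokens each,
# and a vocabulary of 500,000 distinct terms.
density = incidence_density_bound(1_000_000, 1_000, 500_000)
```

Under these assumptions at most 0.2% of the cells can be 1, which is the motivation for storing only the non-zero entries as postings lists.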
23. Lemmatization is a technique for:
Select one:
a. Ranking documents
b. Case folding
c. Normalization
d. Tokenization
Ans: c. Normalization
24. If list lengths are x and y, merge takes:
Select one:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations
Ans: d. O(x+y) operations
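The O(x+y) bound in question 24 follows from the standard two-pointer merge, sketched here for two sorted postings lists. The sample docIDs are made up for illustration.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) steps."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # p1[i] cannot appear in p2 any more
        else:
            j += 1  # p2[j] cannot appear in p1 any more
    return answer


common = intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101])
```

Every iteration advances at least one pointer, so the loop runs at most x + y times. This relies on both lists being sorted by docID.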
25. Unstructured data tends to refer to information on the web and is processed using:
Select one:
a. Both
b. Database systems
c. IR systems
d. None
Ans: c. IR systems