Information Retrieval - SSZG537 - Quiz 1 - BITS PILANI

1. Distributed indexing is used in:

Select one:
a. All of the above
b. Web-scale indexing
c. Google data centres
d. Parallel tasking

Ans: a. All of the above
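Distributed indexing is often explained as a MapReduce-style job. In the map phase, parsers turn document splits into (term, docID) pairs that are routed to term partitions. In the reduce phase, one inverter per partition builds the postings lists. The sketch below runs all of this in a single process. The three letter-range partitions, the function names and the whitespace tokenizer are illustrative assumptions. In a real system each parser and inverter would run on its own machine.

```python
from collections import defaultdict

# Hypothetical term partitions by first letter.
PARTITIONS = [("a", "f"), ("g", "p"), ("q", "z")]


def partition_of(term):
    """Pick the partition whose letter range covers the term's first char."""
    for k, (lo, hi) in enumerate(PARTITIONS):
        if lo <= term[0] <= hi:
            return k
    return len(PARTITIONS) - 1  # anything else (digits, symbols) goes last


def parse(doc_id, text):
    """Map phase: turn one document into (term, doc_id) pairs,
    grouped by the partition each term belongs to."""
    segments = defaultdict(list)
    for term in text.lower().split():
        segments[partition_of(term)].append((term, doc_id))
    return segments


def invert(pairs):
    """Reduce phase: build sorted postings lists for one partition."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def distributed_index(docs):
    # Parsers could run on different machines, one per document split.
    shuffled = defaultdict(list)
    for doc_id, text in docs.items():
        for k, pairs in parse(doc_id, text).items():
            shuffled[k].extend(pairs)
    # One inverter per term partition, independent of the others.
    return {k: invert(pairs) for k, pairs in shuffled.items()}


index = distributed_index({1: "Caesar was ambitious", 2: "Brutus killed Caesar"})
print(index[1])  # {'killed': [2]}
```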


2. What is a good rule for placing skip pointers?

Select one:
a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans

Ans: c. Depends upon the no. of comparisons needed

3. Edit distance (Levenshtein distance) is a way of:


Select one:
a. Context-sensitive spelling correction
b. Document correction
c. Isolated word correction
d. Phonetic correction

Ans: c. Isolated word correction
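Levenshtein distance is usually computed with dynamic programming. The sketch below also shows why it counts as isolated-word correction: `correct` compares the misspelled word with each vocabulary term on its own and never looks at the surrounding words. The function names and the tiny vocabulary are illustrative.

```python
def edit_distance(s, t):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    m, n = len(s), len(t)
    # dp[i][j] holds the distance between s[:i] and t[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of s[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete s[i-1]
                dp[i][j - 1] + 1,         # insert t[j-1]
                dp[i - 1][j - 1] + cost,  # substitute (or keep a match)
            )
    return dp[m][n]


def correct(word, vocabulary):
    """Isolated-word correction: return the vocabulary term closest to
    word, judging the word on its own and ignoring its context."""
    return min(vocabulary, key=lambda term: edit_distance(word, term))


print(edit_distance("kitten", "sitting"))                 # 3
print(correct("graffe", ["giraffe", "graph", "grape"]))  # giraffe
```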

4. The Boolean retrieval model makes no provision for:

Select one:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search

Ans: d. Both proximity and ranked search
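The toy sketch below shows why a basic Boolean query cannot rank. An AND query returns the set of matching docIDs, and every match is treated as equally good. The index also stores only docIDs, not term positions, so a proximity condition such as "within 3 words" cannot be checked. The index layout and function names are illustrative assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to its postings list: the sorted docIDs containing it.
    Only docIDs are stored, no term positions."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, *terms):
    """A document either satisfies the AND or it does not; there is no
    score, so the result comes back in plain docID order, not ranked."""
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {1: "Brutus killed Caesar", 2: "Caesar was ambitious", 3: "Brutus and Caesar"}
index = build_index(docs)
print(boolean_and(index, "brutus", "caesar"))  # [1, 3]
```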

5. Permuterm indices are used for solving:


Select one:
a. None
b. Boolean queries
c. Phrase queries
d. Wildcard queries

Ans: d. Wildcard queries
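The sketch below builds a permuterm index for queries that contain a single `*`. Every rotation of `term$` is stored as a key that points back to the term. A query X*Y is rotated to Y$X so that the `*` falls at the end, which turns it into a prefix lookup. A real index would keep the rotations in a B-tree. The linear scan over keys is a simplification, and the function names are illustrative.

```python
def permuterm_rotations(term):
    """All rotations of term + '$'; each is stored as a key for term."""
    t = term + "$"
    return [t[i:] + t[:i] for i in range(len(t))]


def build_permuterm_index(vocabulary):
    index = {}
    for term in vocabulary:
        for rotation in permuterm_rotations(term):
            index[rotation] = term
    return index


def wildcard_lookup(index, query):
    """Answer a query containing exactly one '*'. Rotating query + '$'
    so that '*' ends up last turns it into a prefix lookup:
    X* -> $X,  *X -> X$,  X*Y -> Y$X."""
    q = query + "$"
    star = q.index("*")
    prefix = q[star + 1:] + q[:star]
    # Linear scan for clarity; a real index would use a B-tree range lookup.
    return {term for key, term in index.items() if key.startswith(prefix)}


index = build_permuterm_index(["man", "moron", "hello", "help", "lemon"])
print(sorted(wildcard_lookup(index, "m*n")))  # ['man', 'moron']
```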

6. A large repository of documents in IR is called a:


Select one:

a. Corpus
b. Database
c. Dictionary
d. Collection

Ans: a. Corpus

7. A benefit of using a hash table is:


Select one:

a. There is no need to rehash everything periodically if the vocabulary keeps growing.

b. Lookup in a hash table is faster than lookup in a tree.

c. All of the above

d. No prefix search is required

Ans: b. Lookup in a hash table is faster than lookup in a tree.

8. Variable-size postings lists are used when:


Select one:
a. More seek time is desired and the corpus is dynamic
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is dynamic

Ans: d. More seek time is desired and the corpus is dynamic

9. An alternative to equivalence classing is to do:


Select one:
a. Asymmetric expansion
b. Symmetric expansion
c. Case folding
d. Normalization

Ans: a. Asymmetric expansion
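Asymmetric expansion keeps unnormalized tokens in the index and expands the query instead. It uses a hand-built table in which the mapping need not be symmetric. The table below is modelled on the window / windows / Windows example in Introduction to Information Retrieval. The index contents and function names are illustrative assumptions.

```python
# Hypothetical expansion table: query term -> index terms to search.
# It is deliberately not symmetric.
EXPANSIONS = {
    "window": {"window", "windows"},
    "windows": {"Windows", "windows", "window"},
    "Windows": {"Windows"},
}


def expand(term):
    """Terms without an entry are searched as typed."""
    return EXPANSIONS.get(term, {term})


def search(index, query_terms):
    """index maps unnormalized tokens to sets of docIDs.
    OR over each term's expansions, AND across query terms."""
    result = None
    for term in query_terms:
        docs = set()
        for alternative in expand(term):
            docs |= index.get(alternative, set())
        result = docs if result is None else result & docs
    return sorted(result) if result else []


index = {"window": {1}, "windows": {2}, "Windows": {3}}
print(search(index, ["window"]))   # [1, 2]
print(search(index, ["Windows"]))  # [3]
```

A search for window matches documents 1 and 2, but a search for Windows matches only document 3. The expansion runs in one direction only.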

10. We need external sorting algorithms to:


Select one:

a. Maximize the disk seek time.
b. Maintain constant disk seek time
c. Minimize the disk seek time.
d. None

Ans: c. Minimize the disk seek time.

11. A benefit of using B-trees is:


Select one:
a. Re-balancing is cheap
b. Balanced trees allow efficient retrieval
c. Faster O(log M)
d. Solves the prefix problem.

Ans: d. Solves the prefix problem.
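An ordered dictionary solves the prefix problem because, when terms are kept in sorted order as in a B-tree, all terms that share a prefix such as hyp sit in one contiguous range. One search plus a short scan finds them all. A hash table scatters those terms, so it offers no comparable shortcut. In this sketch a sorted Python list searched with the standard `bisect` module stands in for the B-tree. The function name is illustrative.

```python
import bisect


def prefix_search(sorted_terms, prefix):
    """Return every term that starts with prefix. Because the vocabulary
    is sorted, the matches form one contiguous run beginning at the
    first term >= prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


vocabulary = sorted(["hyperbole", "hypnosis", "hybrid", "hypothesis", "lemon", "cat"])
print(prefix_search(vocabulary, "hyp"))  # ['hyperbole', 'hypnosis', 'hypothesis']

# A dict (hash table) gives fast exact lookup, but "hyp" is not a key,
# so prefix matching would mean scanning every term.
lookup = {term: i for i, term in enumerate(vocabulary)}
print("hyp" in lookup)  # False
```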

12. Postings lists should be sorted by:


Select one:
a. Document Frequency
b. DocID
c. TermID
d. Term frequency

Ans: b. DocID

13. The key idea behind single-pass in-memory indexing (SPIMI) is:


Select one:
a. Don’t sort; accumulate postings in postings lists as they occur.
b. Generate separate dictionaries for each block.
c. All of the above
d. No need to maintain term-termID mapping across blocks.

Ans: c. All of the above
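The sketch below covers the per-block step of SPIMI. It assumes the block's tokens arrive as (term, docID) pairs in docID order. Both key ideas are visible. First, terms are used directly as dictionary keys, so no term-termID mapping has to be kept across blocks. Second, postings are appended as they occur instead of sorting all pairs. Writing a block to disk when memory fills up, and merging the blocks afterwards, are left out. The function name is illustrative.

```python
from collections import defaultdict


def spimi_invert(token_stream):
    """Index one block. token_stream yields (term, doc_id) pairs in
    increasing doc_id order."""
    dictionary = defaultdict(list)  # term itself is the key: no termIDs
    for term, doc_id in token_stream:
        postings = dictionary[term]  # a new term gets an empty list
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)  # accumulate as it occurs, no sorting
    # Terms are sorted once, when the block is written out, so that the
    # blocks can later be merged term by term.
    return sorted(dictionary.items())


block = spimi_invert([("caesar", 1), ("brutus", 1), ("caesar", 1),
                      ("caesar", 2), ("noble", 2)])
print(block)  # [('brutus', [1]), ('caesar', [1, 2]), ('noble', [2])]
```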

14. For a postings list of length L, how many skip pointers should be used?

Select one:
a. Use L evenly spaced skip pointers.

b. Use L^2 evenly spaced skip pointers.

c. Use √L (that is, L^(1/2)) evenly spaced skip pointers.

d. Use 2L evenly spaced skip pointers.

Ans: c. Use √L (that is, L^(1/2)) evenly spaced skip pointers.
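The sketch below intersects postings lists using skip pointers placed every ⌊√L⌋ positions, following the usual heuristic. Skips are kept here in a side dictionary that maps a position to its target position, which is an illustrative choice. On disk, skips would be stored inside the postings list. Lists shorter than 4 get no skips.

```python
import math


def add_skips(postings):
    """Map position -> skip target, spaced floor(sqrt(L)) apart."""
    step = math.isqrt(len(postings))
    if step < 2:  # very short lists gain nothing from skips
        return {}
    return {i: i + step for i in range(0, len(postings) - step, step)}


def intersect_with_skips(p1, p2):
    """Intersect two docID-sorted postings lists, taking a skip whenever
    its target does not overshoot the other list's current docID."""
    s1, s2 = add_skips(p1), add_skips(p2)
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            if i in s1 and p1[s1[i]] <= p2[j]:
                while i in s1 and p1[s1[i]] <= p2[j]:
                    i = s1[i]
            else:
                i += 1
        else:
            if j in s2 and p2[s2[j]] <= p1[i]:
                while j in s2 and p2[s2[j]] <= p1[i]:
                    j = s2[j]
            else:
                j += 1
    return answer


print(intersect_with_skips([2, 4, 8, 16, 19, 23, 28, 43],
                           [1, 2, 3, 5, 8, 41, 51, 60, 71]))  # [2, 8]
```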

15. For query optimization when intersecting postings lists, we should:

Select one:
a. Process in the order of increasing document frequency
b. Process in any order
c. None of the above
d. Process in the order of decreasing document frequency

Ans: a. Process in the order of increasing document frequency
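The sketch below applies the rarest-first heuristic to AND queries. It assumes an index that maps each term to a sorted list of docIDs, so the list length is the term's document frequency. For brevity, each pairwise step uses Python set intersection rather than a linear merge of sorted lists. The sample postings are illustrative.

```python
def and_query(index, terms):
    """index maps term -> sorted list of docIDs; the list's length is the
    term's document frequency (df). Intersect rarest-first."""
    if not terms:
        return []
    # Increasing df: start from the shortest list, so the running result
    # is never larger than the smallest postings list.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = set(lists[0])
    for postings in lists[1:]:
        if not result:  # nothing left that could match; stop early
            break
        result &= set(postings)  # set intersection for brevity
    return sorted(result)


index = {
    "brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "calpurnia": [2, 31, 54, 101],
}
print(and_query(index, ["brutus", "caesar", "calpurnia"]))  # [2]
```

Here calpurnia has the smallest df, so its list is processed first. After that step, no intermediate result can be larger than 4 docIDs.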

16. The goal of IR is to:


Select one:
a. Find documents relevant to an information need
b. Find documents relevant to an information need from a given document set
c. Find documents relevant to an information need from a large document set
d. Find documents relevant to an information need from a small document set

Ans: c. Find documents relevant to an information need from a large document set

17. The best implementation approach for dynamic indexing is:


Select one:
a. Periodic re-indexing
b. Using Invalidation bit-vector for deleted docs
c. None
d. Using logarithmic merge

Ans: d. Using logarithmic merge
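The sketch below shows logarithmic merging under simplifying assumptions:

- The in-memory index Z0 holds up to n postings.
- Each index I_i holds 2^i × n postings.
- Every index is just a sorted list of (term, docID) pairs, all kept in memory.

Flushing Z0 works like incrementing a binary counter: occupied levels are merged upward until a free level is found. Each posting is therefore merged only O(log T) times, where T is the total number of postings. The cost is that a query has to consult several indexes. The class and method names are illustrative.

```python
import heapq


class LogarithmicIndex:
    """Logarithmic merging sketch. Every index is a sorted list of
    (term, doc_id) postings held in memory."""

    def __init__(self, n):
        self.n = n        # capacity of the in-memory index Z0
        self.z0 = []      # Z0: newest postings, unsorted
        self.levels = []  # levels[i]: None or I_i with 2**i * n postings

    def add(self, term, doc_id):
        self.z0.append((term, doc_id))
        if len(self.z0) == self.n:
            self._flush()

    def _flush(self):
        carry = sorted(self.z0)
        self.z0 = []
        i = 0
        # Like incrementing a binary counter: while level i is occupied,
        # merge it into the carry and move up one level.
        while i < len(self.levels) and self.levels[i] is not None:
            carry = list(heapq.merge(carry, self.levels[i]))
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = carry

    def search(self, term):
        # A query must consult Z0 and every occupied level.
        hits = [d for t, d in self.z0 if t == term]
        for level in self.levels:
            if level is not None:
                hits.extend(d for t, d in level if t == term)
        return sorted(hits)


index = LogarithmicIndex(n=2)
for doc_id in range(1, 8):
    index.add("odd" if doc_id % 2 else "even", doc_id)
print([len(level) if level else 0 for level in index.levels])  # [2, 4]
print(index.search("odd"))  # [1, 3, 5, 7]
```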

18. Issues in biword indexes are:


Select one:
a. Any one
b. Index blowup due to bigger dictionary
c. Both
d. False positives

Ans: c. Both
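The small biword index below illustrates both issues. Every pair of consecutive words becomes its own dictionary entry, which is what enlarges the dictionary. A phrase longer than two words is answered as an AND of its biwords, which can match documents where those biwords are not consecutive. The documents and function names are made up for illustration.

```python
def biwords(text):
    """Every pair of consecutive words becomes one dictionary term."""
    tokens = text.lower().split()
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def build_biword_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for biword in biwords(text):
            index.setdefault(biword, set()).add(doc_id)
    return index


def phrase_query(index, phrase):
    """Answer a phrase as the AND of its biwords. The biwords need not be
    adjacent in a matching document, so false positives are possible."""
    result = None
    for biword in biwords(phrase):
        docs = index.get(biword, set())
        result = docs if result is None else result & docs
    return sorted(result) if result else []


docs = {
    1: "stanford university palo alto california",
    2: "university palo alto and stanford university",
}
index = build_biword_index(docs)
print(phrase_query(index, "stanford university palo alto"))  # [1, 2]
```

Document 2 matches even though it does not contain the phrase "stanford university palo alto". It is a false positive.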

19. Any string of terms of the following form is called an extended biword:

Select one:
a. NNX*
b. NXNN
c. *NNX
d. NX*N

Ans: d. NX*N
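This sketch assumes a part-of-speech tagger has already labelled the text, with N for nouns and X for function words; the tagging step itself is not shown. Under that assumption, NX*N extended biwords can be formed by pairing each noun with the next noun and dropping the X words in between. The input format and function name are assumptions.

```python
def extended_biwords(tagged):
    """tagged: list of (word, tag) pairs, tag 'N' (noun) or 'X'
    (function word). Each NX*N span yields one term: a noun joined to
    the next noun, with the X words in between dropped."""
    nouns = [word.lower() for word, tag in tagged if tag == "N"]
    return [f"{a} {b}" for a, b in zip(nouns, nouns[1:])]


print(extended_biwords([("renegotiation", "N"), ("of", "X"),
                        ("the", "X"), ("constitution", "N")]))
# ['renegotiation constitution']
print(extended_biwords([("cost", "N"), ("overruns", "N"), ("on", "X"),
                        ("a", "X"), ("power", "N"), ("plant", "N")]))
# ['cost overruns', 'overruns power', 'power plant']
```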

20. Structured data allows for:


Select one:

a. Does not depend on data complexity

b. Less complex queries

c. No relationship

d. More complex queries

Ans: d. More complex queries

21. Blocked sort-based indexing (BSBI) is a method of:


Select one:
a. Sorting with more disk seeks.
b. Merging with fewer disk seeks.
c. Comparing with fewer disk seeks.
d. Sorting with fewer disk seeks.

Ans: d. Sorting with fewer disk seeks.
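A compact sketch of the BSBI idea works in three steps:

1. Collect (termID, docID) pairs into fixed-size blocks.
2. Sort each block in memory. In a real system, the sorted block would then be written to disk as one run.
3. Merge all runs in a single pass.

Reading each run sequentially during the merge, instead of sorting the whole collection on disk, is what keeps disk seeks low. Everything stays in memory here, and the block size is an arbitrary illustrative parameter. The standard `heapq.merge` performs the k-way merge.

```python
import heapq
from itertools import groupby


def bsbi(pairs, block_size):
    """pairs: iterable of (term_id, doc_id). Returns term_id -> postings."""
    runs, block = [], []
    for pair in pairs:
        block.append(pair)
        if len(block) == block_size:
            runs.append(sorted(block))  # sort in memory; would go to disk
            block = []
    if block:
        runs.append(sorted(block))
    # k-way merge: each run is consumed front to back, i.e. sequentially.
    merged = heapq.merge(*runs)
    return {
        term_id: sorted({doc_id for _, doc_id in group})
        for term_id, group in groupby(merged, key=lambda pair: pair[0])
    }


pairs = [(2, 1), (1, 1), (3, 2), (1, 2), (2, 3), (1, 3), (3, 3)]
print(bsbi(pairs, block_size=3))  # {1: [1, 2, 3], 2: [1, 3], 3: [2, 3]}
```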

22. A term-document incidence matrix is:


Select one:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict

Ans: a. Sparse
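The example sizes in Introduction to Information Retrieval are about 1 million documents of roughly 1,000 words each, with 500,000 distinct terms. Using those sizes, the share of zeros in the incidence matrix can be bounded as follows. Each word occurrence can set at most one cell to 1, so the number of 1s cannot exceed the total number of tokens:

```latex
\begin{aligned}
\text{cells} &= 500{,}000 \times 1{,}000{,}000 = 5 \times 10^{11},\\
\text{ones} &\le 10^{6}\ \text{documents} \times 10^{3}\ \text{tokens per document} = 10^{9},\\
\text{fraction of zeros} &\ge 1 - \frac{10^{9}}{5 \times 10^{11}} = 1 - 0.002 = 0.998.
\end{aligned}
```

At least 99.8% of the cells are therefore 0. This is why an inverted index records only the 1 entries, as postings.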

23. Lemmatization is a technique for:


Select one:
a. Ranking documents
b. Case folding
c. Normalization
d. Tokenization

Ans: c. Normalization

24. If two postings lists have lengths x and y, merging them takes:


Select one:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations

Ans: d. O(x+y) operations
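The O(x+y) bound comes from the standard linear merge. Each comparison advances at least one of the two pointers, so the loop runs at most x+y times. This relies on both postings lists being sorted by docID. The sketch below counts loop iterations to make the bound visible; the function and variable names are illustrative.

```python
def merge_intersect(p1, p2):
    """Intersect two postings lists sorted by docID.
    Returns (answer, steps), where steps counts loop iterations."""
    i = j = steps = 0
    answer = []
    while i < len(p1) and j < len(p2):
        steps += 1
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    # Every iteration advanced i or j (or both), so steps <= x + y.
    return answer, steps


brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
answer, steps = merge_intersect(brutus, calpurnia)
print(answer, steps, len(brutus) + len(calpurnia))  # [2, 31] 8 12
```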

25. Unstructured data tends to refer to information on the web and is processed using:

Select one:
a. Both
b. Database systems
c. IR systems
d. None

Ans: c. IR systems
