Machine Learning (ISZC464) Quiz 2
BITS PILANI WILP - 2017

1. A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How many possible different examples are there?
Select one:
a. 12
b. 48
c. 24
d. 72

Ans: d. 72
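
Each example is one combination of attribute values plus a class value, so the count is simply the product of the cardinalities. A throwaway Python check (not part of the quiz):

    # 4 attributes with 3, 2, 2, 2 possible values, and a class with 3 values
    attribute_cardinalities = [3, 2, 2, 2]
    class_cardinality = 3

    count = class_cardinality
    for c in attribute_cardinalities:
        count *= c
    print(count)  # 3 * 2 * 2 * 2 * 3 = 72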

2. Which of the following statements is true for k-NN classifiers?
Select one:
a. The decision boundary is linear.
b. The decision boundary is smoother with smaller values of k.
c. k-NN does not require an explicit training step.
d. The classification accuracy is better with larger values of k.

Ans: c. k-NN does not require an explicit training step.
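
A minimal k-NN sketch (assuming NumPy and made-up toy data) illustrating option c: "training" amounts to storing the examples, and all the distance work happens at prediction time.

    import numpy as np

    # Made-up 2-D training points with binary labels; nothing is "learned" here
    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])

    def knn_predict(x, k=3):
        # All the work happens at query time: distances, sort, majority vote
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest_labels = y_train[np.argsort(dists)[:k]]
        return np.bincount(nearest_labels).argmax()

    print(knn_predict(np.array([0.95, 1.0])))  # -> 1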

3. Which of the following statements is false?
Select one:
a. Decision tree is learned by maximizing information gain
b. Density estimation (using say, the kernel density estimator) can be used to perform classification.
c. No classifier can do better than a naive Bayes classifier if the distribution of the data is known
d. The training error (error on training set) of 1-NN classifier is 0

Ans: c. No classifier can do better than a naive Bayes classifier if the distribution of the data is known
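
On option b, here is a hedged sketch (assuming SciPy and made-up 1-D data) of classification via kernel density estimation: estimate p(x | class) with one KDE per class and pick the class with the larger p(x | class) * P(class).

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    # Made-up 1-D data: class 0 centred near 0, class 1 centred near 3
    x0 = rng.normal(0.0, 1.0, 200)
    x1 = rng.normal(3.0, 1.0, 200)

    kde0, kde1 = gaussian_kde(x0), gaussian_kde(x1)
    prior0 = prior1 = 0.5

    def classify(x):
        # Bayes' rule with KDE-estimated class-conditional densities
        return 0 if kde0(x)[0] * prior0 > kde1(x)[0] * prior1 else 1

    print(classify(0.5), classify(2.8))  # -> 0 1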

4. Suppose we wish to calculate P(H | E1, E2) and we have no conditional independence information. Which of the following sets are sufficient for computing this (minimal set)?
Select one:
a. P(E1, E2| H) , P(H), P(E1|H), P(E2|H)
b. P(E1, E2), P(H), P(E1, E2| H)
c. P(E1, E2) , P(H), P(E1|H), P(E2|H)
d. P(H), P(E1| H), P(E2|H)

Ans: b. P(E1, E2), P(H), P(E1, E2| H)
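
The set in (b) is exactly what Bayes' rule consumes: P(H | E1, E2) = P(E1, E2 | H) P(H) / P(E1, E2). A numeric check with made-up values:

    # Made-up numbers purely for illustration
    p_h = 0.3              # P(H)
    p_e_given_h = 0.5      # P(E1, E2 | H)
    p_e = 0.25             # P(E1, E2)

    p_h_given_e = p_e_given_h * p_h / p_e   # Bayes' rule
    print(p_h_given_e)                      # 0.6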

5. In neural networks, nonlinear activation functions such as sigmoid and tanh
Select one:
a. help to learn nonlinear decision boundaries
b. always output values between 0 and 1
c. speed up the gradient calculation in backpropagation, as compared to linear units
d. are applied only to the output units

Ans: a. help to learn nonlinear decision boundaries
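
A small NumPy check of the other options: tanh outputs lie in (-1, 1) rather than (0, 1), so option b fails, while both functions are nonlinear, which is what lets stacked hidden units represent nonlinear decision boundaries.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.linspace(-5, 5, 101)
    print(sigmoid(z).min(), sigmoid(z).max())   # stays inside (0, 1)
    print(np.tanh(z).min(), np.tanh(z).max())   # spans (-1, 1), so option b is wrong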

6. Which of the following statements about Naive Bayes is incorrect?
Select one:
a. Attributes are statistically independent of one another given the class value.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes can be nominal or numeric
d. Attributes are equally important.

Ans: b. Attributes are statistically dependent on one another given the class value.
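
The defining assumption is that the attributes are conditionally independent given the class, so the likelihood factorises. A sketch with made-up per-attribute probabilities:

    # Naive Bayes assumption: P(x1, ..., xn | C) = P(x1 | C) * ... * P(xn | C)
    p_class = 0.4                          # P(C) for one class value
    p_attr_given_class = [0.8, 0.1, 0.6]   # made-up P(x_i | C) for three attributes

    score = p_class
    for p in p_attr_given_class:
        score *= p                         # multiply thanks to the independence assumption

    print(score)   # unnormalised P(C | x); compare across class values to classify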

7. As the number of training examples goes to infinity, your model trained on that data will have:
Select one:
a. Lower variance
b. None of the other options
c. Higher Variance
d. Does not affect variance

Ans: a. Lower variance
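
A hedged simulation (made-up data-generating process, NumPy assumed) of why variance drops: slopes fitted on many independent small training sets scatter far more than slopes fitted on large ones.

    import numpy as np

    rng = np.random.default_rng(1)

    def fitted_slope(n):
        # Made-up process: y = 2x + noise, fitted with a degree-1 polynomial
        x = rng.uniform(-1, 1, n)
        y = 2 * x + rng.normal(0, 1, n)
        return np.polyfit(x, y, 1)[0]

    for n in (20, 2000):
        slopes = [fitted_slope(n) for _ in range(500)]
        print(n, np.std(slopes))   # the spread of the estimates shrinks as n grows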

8. Which of the following statements is true?
Select one:
a. The depth of a learned decision tree can be larger than the number of training examples used to create the tree.
b. Suppose the data has R records; the maximum depth of the decision tree must be less than 1 + log2(R).
c. Cross-validation can be used to detect and reduce overfitting.
d. As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.

Ans: c. Cross-validation can be used to detect and reduce overfitting.
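
One common way to use cross-validation for this (a sketch assuming scikit-learn and its bundled iris data): compare training accuracy with cross-validated accuracy across tree depths; a widening gap signals overfitting.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for depth in (1, 3, None):                  # None lets the tree grow to full depth
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        train_acc = tree.fit(X, y).score(X, y)
        cv_acc = cross_val_score(tree, X, y, cv=5).mean()
        print(depth, round(train_acc, 3), round(cv_acc, 3))
    # A large gap between training and cross-validated accuracy flags overfitting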

9. Which of the following strategies cannot help reduce overfitting in decision trees?
Select one:
a. Make sure each leaf node is one pure class
b. Enforce a maximum depth for the tree
c. Enforce a minimum number of samples in leaf nodes
d. Pruning

Ans: a. Make sure each leaf node is one pure class
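
The three helpful strategies map onto standard tree hyperparameters; the names below are scikit-learn's (an assumption, not something the quiz specifies; ccp_alpha needs scikit-learn 0.22+). Growing every leaf to purity is the opposite of regularisation: it memorises noise.

    from sklearn.tree import DecisionTreeClassifier

    # Options b, c and d correspond to standard regularisation knobs
    regularised_tree = DecisionTreeClassifier(
        max_depth=4,           # b: cap the depth
        min_samples_leaf=5,    # c: require a minimum number of samples per leaf
        ccp_alpha=0.01,        # d: cost-complexity pruning
    )

    # Option a is the default "grow until every leaf is pure" behaviour,
    # which tends to fit noise rather than reduce overfitting
    fully_grown_tree = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1)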

10. If A and B are conditionally independent given C, which of the following is not necessarily true?
Select one:
a. P(B|A, C) = P(B|C)
b. P(A,B| C) = P(A) P(B)
c. P(A,B,C) = P(C) P(A|C) P(B|C)
d. P(A|B, C) = P(A|C)

Ans: b. P(A,B| C) = P(A) P(B)
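
A numeric check on a small made-up distribution built so that A and B are conditionally independent given C: identities a, c and d hold by construction, but b compares P(A,B | C) with the unconditional product P(A)P(B), which need not match.

    # Made-up distribution satisfying A ⊥ B | C:  P(A,B,C) = P(C) P(A|C) P(B|C)
    p_c = {0: 0.5, 1: 0.5}
    p_a1_given_c = {0: 0.9, 1: 0.1}   # P(A=1 | C=c)
    p_b1_given_c = {0: 0.9, 1: 0.1}   # P(B=1 | C=c)

    def joint(a, b, c):
        pa = p_a1_given_c[c] if a == 1 else 1 - p_a1_given_c[c]
        pb = p_b1_given_c[c] if b == 1 else 1 - p_b1_given_c[c]
        return p_c[c] * pa * pb

    p_a1 = sum(joint(1, b, c) for b in (0, 1) for c in (0, 1))   # P(A=1) = 0.5
    p_b1 = sum(joint(a, 1, c) for a in (0, 1) for c in (0, 1))   # P(B=1) = 0.5
    p_a1_b1_given_c0 = joint(1, 1, 0) / p_c[0]                   # P(A=1, B=1 | C=0) = 0.81

    print(p_a1_b1_given_c0, p_a1 * p_b1)   # 0.81 vs 0.25: the identity in b fails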

11. Which of the following statements is false?
Select one:
a. We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.
b. When a decision tree is grown to full depth, it is more likely to fit the noise in the data
c. When the hypothesis space is richer, overfitting is more likely
d. We can use gradient descent to learn a Gaussian Mixture Model.

Ans: a. We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.
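
Option a is false because the sum-of-squared-errors objective for linear regression is convex, so gradient descent cannot get trapped in a spurious local optimum. A sketch (made-up data, NumPy assumed) comparing gradient descent with the closed-form least-squares solution:

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])   # bias column + one feature
    y = 1.5 + 3.0 * X[:, 1] + rng.normal(0, 0.1, 100)

    # Closed-form least-squares solution: the unique global optimum of the convex SSE
    w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Gradient descent on the mean squared error converges to the same point
    w = np.zeros(2)
    for _ in range(5000):
        w -= 0.05 * (2 / len(y)) * X.T @ (X @ w - y)

    print(np.round(w_closed, 3), np.round(w, 3))   # essentially identical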

12. Suppose we wish to calculate P(H | E1, E2) and we know that P(E1| H, E2) = P(E1|H) for all the values of H, E1, E2. Now which of the following sets are sufficient?
Select one:
a. P(E1, E2) , P(H), P(E1|H), P(E2|H)
b. P(E1, E2), P(H), P(E1, E2| H)
c. P(H), P(E1| H), P(E2|H)
d. P(E1, E2| H) , P(H), P(E1|H), P(E2|H)

Ans: b. P(E1, E2), P(H), P(E1, E2| H)
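
The given condition says E1 is conditionally independent of E2 given H, so P(E1, E2 | H) factors as P(E1 | H) P(E2 | H); Bayes' rule then proceeds exactly as in question 4 with the quantities in option b. A quick illustration with made-up values:

    # Made-up values consistent with P(E1 | H, E2) = P(E1 | H)
    p_h = 0.4               # P(H)
    p_e1_given_h = 0.7      # P(E1 | H)
    p_e2_given_h = 0.5      # P(E2 | H)
    p_e1e2 = 0.3            # P(E1, E2)

    p_e1e2_given_h = p_e1_given_h * p_e2_given_h   # factorisation from the independence
    p_h_given_e = p_e1e2_given_h * p_h / p_e1e2    # Bayes' rule, as in question 4
    print(round(p_h_given_e, 3))                   # 0.467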

13. As the number of training examples goes to infinity, your model trained on that data will have:
Select one:
a. Lower Bias
b. Same Bias
c. Higher Bias
d. None of the other options

Ans: b. Same Bias
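
More data shrinks variance (question 7) but cannot remove the bias baked into the model class. A hedged sketch (made-up quadratic process, NumPy assumed): a straight-line fit keeps roughly the same error no matter how large the training set gets.

    import numpy as np

    rng = np.random.default_rng(3)

    def straight_line_mse(n):
        # Made-up quadratic process approximated by a (biased) degree-1 model
        x = rng.uniform(-1, 1, n)
        y = x ** 2 + rng.normal(0, 0.05, n)
        coeffs = np.polyfit(x, y, 1)
        return np.mean((np.polyval(coeffs, x) - y) ** 2)

    for n in (100, 10_000, 1_000_000):
        print(n, round(straight_line_mse(n), 4))   # the error plateaus: the bias stays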

14. For polynomial regression, which one of these structural assumptions most affects the trade-off between underfitting and overfitting?
Select one:
a. The assumed variance of the Gaussian noise
b. Whether we learn the weights by gradient descent
c. The use of a constant-term unit input
d. The polynomial degree

Ans: d. The polynomial degree
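
A minimal sketch (made-up data, NumPy assumed) of how the degree moves the fit between underfitting and overfitting on a small training set with held-out test points:

    import numpy as np

    rng = np.random.default_rng(4)
    x_train = rng.uniform(-1, 1, 15)
    y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 15)
    x_test = rng.uniform(-1, 1, 200)
    y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 200)

    for degree in (1, 3, 12):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(degree, round(train_mse, 3), round(test_mse, 3))
    # Degree 1 typically underfits, degree 12 fits the 15 training points almost
    # perfectly but does worse on the test points, and degree 3 sits in between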

15. High entropy means that the partitions in decision tree classification are
Select one:
a. Not pure
b. Pure
c. Useful
d. Useless

Ans: a. Not pure
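
Entropy of a partition with class proportions p_i is H = -sum_i p_i log2(p_i): 0 for a pure partition and maximal for an evenly mixed one. A tiny check:

    import math

    def entropy(class_counts):
        total = sum(class_counts)
        probs = [c / total for c in class_counts if c > 0]
        return -sum(p * math.log2(p) for p in probs)

    print(entropy([10, 0]))   # 0.0 -> pure partition, low entropy
    print(entropy([5, 5]))    # 1.0 -> evenly mixed partition, high entropy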

