
Machine Learning (ISZC464) Quiz 2
BITS PILANI WILP - 2017


1. A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How many possible different examples are there?
Select one:
a. 12
b. 48
c. 24
d. 72

Ans: d. 72
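
Each example is one combination of attribute values together with a class value: 3 × 2 × 2 × 2 = 24 attribute combinations, and 24 × 3 = 72 possible examples.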

2. Which of the following statements is true for k-NN classifiers?
Select one:
a. The decision boundary is linear.
b. The decision boundary is smoother with smaller values of k.
c. k-NN does not require an explicit training step.
d. The classification accuracy is better with larger values of k.

Ans: c. k-NN does not require an explicit training step.
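
Why option c holds: a k-NN "model" is just the stored training set, and all the work happens at query time. A minimal sketch (function and variable names are illustrative):

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Illustrative sketch. No training step: the "model" is the stored data itself.
    dists = np.linalg.norm(X_train - x, axis=1)   # distance from x to every stored example
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority vote among the k neighbours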

3. Which of the following statements are false?
Select one:
a. Decision tree is learned by maximizing information gain
b. Density estimation (using say, the kernel density estimator) can be used to perform classification.
c. No classifier can do better than a naive Bayes classifier if the distribution of the data is known
d. The training error (error on training set) of 1-NN classifier is 0

Ans: c. No classifier can do better than a naive Bayes classifier if the distribution of the data is known
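
Statement c is false because the benchmark is the Bayes-optimal classifier built from the true distribution; naive Bayes matches it only when its independence assumption actually holds. Statement b is true: as a rough sketch (SciPy, with illustrative names and one-dimensional features), density estimation becomes a classifier by applying Bayes' rule to per-class density estimates:

import numpy as np
from scipy.stats import gaussian_kde

def kde_classify(samples_by_class, priors, x):
    # Illustrative sketch; samples_by_class is a list of 1-D sample arrays, one per class.
    # Fit one kernel density estimate per class, then pick the class with the
    # largest estimated p(x | class) * P(class), i.e. apply Bayes' rule.
    scores = [p * gaussian_kde(s)(x)[0] for s, p in zip(samples_by_class, priors)]
    return int(np.argmax(scores))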

4. Suppose we wish to calculate P(H | E1, E2) and we have no conditional independence information. Which of the following sets are sufficient for computing this (minimal set)?
Select one:
a. P(E1, E2 | H), P(H), P(E1 | H), P(E2 | H)
b. P(E1, E2), P(H), P(E1, E2 | H)
c. P(E1, E2), P(H), P(E1 | H), P(E2 | H)
d. P(H), P(E1 | H), P(E2 | H)

Ans: b. P(E1, E2), P(H), P(E1, E2 | H)
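
This is Bayes' rule: P(H | E1, E2) = P(E1, E2 | H) P(H) / P(E1, E2), which uses exactly the three quantities in option b. The sets built from P(E1 | H) and P(E2 | H) do not suffice here, because without independence information the product P(E1 | H) P(E2 | H) need not equal P(E1, E2 | H).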

5. In neural networks, nonlinear activation functions such as sigmoid and tanh
Select one:
a. help to learn nonlinear decision boundaries
b. always output values between 0 and 1
c. speed up the gradient calculation in backpropagation, as compared to linear units
d. are applied only to the output units

Ans: a. help to learn nonlinear decision boundaries
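
A quick check on option b (a NumPy sketch for illustration):

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))  # outputs lie in (0, 1)
print(sigmoid(0.0), np.tanh(0.0))             # 0.5 0.0 -- tanh outputs lie in (-1, 1),
                                              # so "always between 0 and 1" fails for tanh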

6. Which of the following statements about Naive Bayes is incorrect?
Select one:
a. Attributes are statistically independent of one another given the class value.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes can be nominal or numeric
d. Attributes are equally important.

Ans: b. Attributes are statistically dependent on one another given the class value.
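
The defining "naive" assumption is the opposite of option b: given the class C, the attributes a1, ..., an are treated as independent, so P(C | a1, ..., an) is proportional to P(C) P(a1 | C) ... P(an | C).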

7. As the number of training examples goes to infinity, your model trained on that data will have:
Select one:
a. Lower variance
b. None of the other options
c. Higher Variance
d. Does not affect variance

Ans: a. Lower variance

8. Which of the following statements are true?
Select one:
a. The depth of a learned decision tree can be larger than the number of training examples used to create the tree.
b. Suppose the data has R records; then the maximum depth of the decision tree must be less than 1 + log2(R)
c. Cross validation can be used to detect and reduce overfitting
d. As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant.

Ans: c. Cross validation can be used to detect and reduce overfitting
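
A minimal sketch of option c (scikit-learn; X and y are assumed to be an already-loaded feature matrix and label vector): cross-validation scores each model on held-out folds, which exposes overfitting and guides the choice of model complexity.

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# X, y are assumed to be an already-loaded dataset (not shown here).
for depth in (2, 4, 8, None):                  # None grows the tree to full depth
    clf = DecisionTreeClassifier(max_depth=depth)
    scores = cross_val_score(clf, X, y, cv=5)  # accuracy on 5 held-out folds
    print(depth, scores.mean())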

9. Which of the following strategies cannot help reduce overfitting in decision trees?
Select one:
a. Make sure each leaf node is one pure class
b. Enforce a maximum depth for the tree
c. Enforce a minimum number of samples in leaf nodes
d. Pruning

Ans: a. Make sure each leaf node is one pure class
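
Options b, c and d map directly onto standard regularization knobs, while growing every leaf to a single pure class is the overfit extreme. A sketch with scikit-learn (the parameter values are illustrative):

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(  # illustrative values, to be tuned e.g. by cross-validation
    max_depth=5,               # option b: cap the tree depth
    min_samples_leaf=10,       # option c: minimum samples per leaf
    ccp_alpha=0.01,            # option d: cost-complexity (post-)pruning
)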

10. If A and B are conditionally independent given C, which of the following is not true?
Select one:
a. P(B | A, C) = P(B | C)
b. P(A, B | C) = P(A) P(B)
c. P(A, B, C) = P(C) P(A | C) P(B | C)
d. P(A | B, C) = P(A | C)

Ans: b. P(A, B | C) = P(A) P(B)
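
Conditional independence given C means P(A, B | C) = P(A | C) P(B | C). Options a, c and d all follow from this via the chain rule; option b drops the conditioning on C, and unconditional independence does not follow from conditional independence.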

11. Which of the following statements are false?
Select one:
a. We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.
b. When a decision tree is grown to full depth, it is more likely to fit the noise in the data
c. When the hypothesis space is richer, overfitting is more likely
d. We can use gradient descent to learn a Gaussian Mixture Model.

Ans: a. We can get multiple local optimum solutions if we solve a linear regression problem by minimizing the sum of squared errors using gradient descent.
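
Statement a is false because the sum of squared errors E(w) = Σ (yi - w·xi)² is a convex quadratic in the weights w: it has a single global minimum, so gradient descent cannot get trapped in other local optima.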

12. Suppose we wish to calculate P(H | E1, E2) and we know that P(E1 | H, E2) = P(E1 | H) for all the values of H, E1, E2. Now which of the following sets are sufficient?
Select one:
a. P(E1, E2), P(H), P(E1 | H), P(E2 | H)
b. P(E1, E2), P(H), P(E1, E2 | H)
c. P(H), P(E1 | H), P(E2 | H)
d. P(E1, E2 | H), P(H), P(E1 | H), P(E2 | H)

Ans: b. P(E1, E2), P(H), P(E1, E2 | H)
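
Given P(E1 | H, E2) = P(E1 | H), the chain rule gives P(E1, E2 | H) = P(E2 | H) P(E1 | H, E2) = P(E1 | H) P(E2 | H), so Bayes' rule P(H | E1, E2) = P(E1, E2 | H) P(H) / P(E1, E2) can be evaluated with the likelihood in either joint or factored form.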

13. As the number of training examples goes to infinity, your model trained on that data will have:
Select one:
a. Lower Bias
b. Same Bias
c. Higher Bias
d. None of the other options

Ans: b. Same Bias

14. For polynomial regression, which one of these structural assumptions most affects the trade-off between underfitting and overfitting?
Select one:
a. The assumed variance of the Gaussian noise
b. Whether we learn the weights by gradient descent
c. The use of a constant-term unit input
d. The polynomial degree

Ans: d. The polynomial degree
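
A rough illustration of the trade-off (NumPy; the data and degrees are made up for the sketch): low degrees underfit, while a degree close to the number of points starts fitting the noise, driving the training error toward zero.

import numpy as np

# Illustrative data: 20 noisy samples of a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)                      # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(train_err, 4))                     # training error shrinks as degree grows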

15. High entropy means that the partitions in decision tree classification are
Select one:
a. Not pure
b. Pure
c. Useful
d. Useless

Ans: a. Not pure
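
For class proportions p1, ..., pk, entropy is H = -Σ pi log2(pi). A pure partition (one class with p = 1) has H = 0, while a 50/50 split has H = 1 bit, so high entropy means the classes are mixed, i.e. the partition is not pure.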


Machine Learning (ISZC464) Quiz 1
BITS PILANI WILP - 2017

1. Averaging the output of multiple decision trees helps
Select one:
a. Increase bias
b. Increase variance
c. Decrease bias
d. Decrease variance

Ans: d. Decrease variance
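
The mechanism is bagging: train many trees on bootstrap resamples and average their votes, which keeps roughly the bias of a single deep tree while averaging away much of its variance. A sketch with scikit-learn (hyperparameter values are illustrative):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 trees (an illustrative count), each fit on a bootstrap resample of the
# training data; the ensemble prediction averages (votes over) the trees.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)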

2. Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years. What kind of learning problem is this?
Select one:
a. None of the given answers
b. Unsupervised Learning
c. Supervised Learning
d. Reinforcement Learning

Ans: c. Supervised Learning

3. Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments. What kind of learning problem is this?
Select one:
a. Supervised Learning
b. Unsupervised Learning
c. None of the given answers
d. Reinforcement Learning

Ans: b. Unsupervised Learning

4. In farming, given data on crop yields over the last 50 years, learn to predict next year's crop yields. What kind of learning problem is this?
Select one:
a. None of the given answers
b. Unsupervised Learning
c. Reinforcement Learning
d. Supervised Learning

Ans: d. Supervised Learning

5. Suppose we wish to calculate P(H | E1, E2) and we have no conditional independence information. Which of the following sets are sufficient for computing this (minimal set)?
Select one:
a. P(E1, E2 | H), P(H), P(E1 | H), P(E2 | H)
b. P(H), P(E1 | H), P(E2 | H)
c. P(E1, E2), P(H), P(E1 | H), P(E2 | H)
d. P(E1, E2), P(H), P(E1, E2 | H)

Ans: d. P(E1, E2), P(H), P(E1, E2 | H) (without independence information, P(E1 | H) and P(E2 | H) do not determine P(E1, E2 | H), so option c is not sufficient; this matches the same question in Quiz 2 above)

6. Which of the following strategies cannot help reduce overfitting in decision trees?
Select one:
a. Enforce a maximum depth for the tree
b. Enforce a minimum number of samples in leaf nodes
c. Make sure each leaf node is one pure class
d. Pruning

Ans: c. Make sure each leaf node is one pure class

7. Suppose we wish to calculate P(H | E1, E2) and we know that P(E1 | H, E2) = P(E1 | H) for all the values of H, E1, E2. Now which of the following sets are sufficient?
Select one:
a. P(E1, E2 | H), P(H), P(E1 | H), P(E2 | H)
b. P(E1, E2), P(H), P(E1 | H), P(E2 | H)
c. P(E1, E2), P(H), P(E1, E2 | H)
d. P(H), P(E1 | H), P(E2 | H)

Ans: b. P(E1, E2), P(H), P(E1 | H), P(E2 | H)

8. Take a collection of 1000 essays written on the US Economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow "similar" or "related". What kind of learning problem is this?
Select one:
a. None of the given answers
b. Reinforcement Learning
c. Unsupervised Learning
d. Supervised Learning

Ans: c. Unsupervised Learning

9. Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. What machine learning task is this?
Select one:
a. Clustering
b. Classification
c. None of the given answers
d. Regression

Ans: b. Classification

10. You’ve just finished training a decision tree for spam classification, and it is getting abnormally bad performance on both your training and test sets. You know that your implementation has no bugs, so what could be causing the problem?
Select one:
a. You need to increase the learning rate.
b. Your decision trees are too shallow.
c. All of the other given options.
d. You are overfitting.

Ans: b. Your decision trees are too shallow. (Bad accuracy on both the training and the test set indicates underfitting, so the trees need more capacity; decision trees have no learning rate to increase.)

11. Which of the following statements about classification is true?
Select one:
a. As the number of data points grows to infinity, the MAP estimate approaches the MLE estimate for all possible priors. In other words, given enough data, the choice of prior is irrelevant
b. No classifier can do better than a naive Bayes classifier if the distribution of the data is known
c. Density estimation (using say, the kernel density estimator) can be used to perform classification
d. The depth of a learned decision tree can be larger than the number of training examples used to create the tree

Ans: c. Density estimation (using say, the kernel density estimator) can be used to perform classification

12. Consider the task of examining a large collection of emails that are known to be spam email, to discover if there are sub-types of spam mail. What kind of learning problem is this?
Select one:
a. Reinforcement Learning
b. Supervised Learning
c. Unsupervised Learning
d. None of the given answers

Ans: c. Unsupervised Learning

13. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this. What machine learning task is this?
Select one:
a. Classification
b. Clustering
c. Regression
d. None of the given answers

Ans: c. Regression

14. For polynomial regression, which one of these structural assumptions most affects the trade-off between underfitting and overfitting?
Select one:
a. The assumed variance of the Gaussian noise
b. The use of a constant-term unit input
c. Whether we learn the weights by gradient descent
d. The polynomial degree

Ans: d. The polynomial degree

15. Which of the following statements is true?
Select one:
a. Given m data points, the training error converges to the true error as m → ∞
b. Decision tree is learned by minimizing information gain
c. Linear regression estimator has the smallest variance among all unbiased estimators
d. A classifier trained on less training data is less likely to overfit

Ans: a. Given m data points, the training error converges to the true error as m → ∞

16. Given 50 articles written by male authors, and 50 articles written by female authors, learn to predict the gender of a new manuscript's author (when the identity of this author is unknown). What kind of learning problem is this?
Select one:
a. None of the given answers
b. Supervised Learning
c. Reinforcement Learning
d. Unsupervised Learning

Ans: b. Supervised Learning