###### Three high-quality journal sources related to my thesis above that effectively support my argument for my intended audience.
###### Research Design is due at the end of this week: describe how you will test the hypothesis and carry out your analysis. This section describes the data to be used to test the hypothesis, how the student will operationalize and collect data on his/her variables, and the analytic methods to be used, noting potential biases and limitations of the research approach.
1. Short-answer questions (10 points each)

a. Briefly describe why clustering is a kind of unsupervised learning.

c. Briefly describe the main difference between the K-means and K-medoids methods.
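The difference shows up clearly on a single cluster: K-means represents a cluster by the mean of its points, while K-medoids uses the medoid, an actual member point with the smallest total distance to the other members. The following is a minimal Python sketch; the function names and the sample values (including the outlier `100.0`) are hypothetical, and absolute difference is assumed as the 1-D distance.

```python
# Minimal sketch: the representative point each method would choose for one cluster.
# Function names and sample values are hypothetical illustrations.

def mean_center(points):
    """K-means representative: the arithmetic mean (need not be a data point)."""
    return sum(points) / len(points)


def medoid_center(points):
    """K-medoids representative: the member point that minimizes the
    total absolute distance to all points in the cluster."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))


cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical cluster with one outlier
print(mean_center(cluster))    # 22.0
print(medoid_center(cluster))  # 3.0
```

The mean is pulled toward the outlier, while the medoid stays inside the bulk of the data, which is why K-medoids is generally less sensitive to outliers (at a higher computational cost).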

d. In data mining, one of the fields is outlier analysis. Explain what an outlier is. Are outliers the same as noise?

b. Briefly describe how K-means clustering works.
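A minimal sketch of the standard (Lloyd's) K-means loop: choose k initial centers, assign each point to its nearest center, recompute each center as the mean of its points, and repeat until the centers stop moving. It assumes 2-D points stored as tuples and Euclidean distance; the function name `kmeans`, the random initialization from the data, and the sample points are illustrative choices rather than part of the assignment.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on a list of (x, y) tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: pick k initial centers from the data
    for _ in range(max_iter):
        # step 2: assign every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # step 3: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the old center if a cluster empties
        # step 4: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]  # hypothetical data
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Different `seed` values give different starting centers; on less well-separated data they can also lead to different final clusters.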

e. A good clustering method will produce high-quality clusters. What criteria can we use to judge whether clusters are high-quality?
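One common way to make "high quality" concrete is high intra-cluster similarity (compact clusters) together with low inter-cluster similarity (well-separated clusters). The following sketch measures both as average Euclidean distances. The helper names and the two example groupings are hypothetical, and real evaluations often use other indices, such as the silhouette coefficient.

```python
import math
from itertools import combinations


def avg_intra_distance(clusters):
    """Mean distance over pairs of points in the SAME cluster (lower = more compact)."""
    pairs = [(p, q) for c in clusters for p, q in combinations(c, 2)]
    if not pairs:  # every cluster is a singleton
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def avg_inter_distance(clusters):
    """Mean distance over pairs of points in DIFFERENT clusters (higher = better separated)."""
    pairs = [(p, q) for a, b in combinations(clusters, 2) for p in a for q in b]
    if not pairs:  # fewer than two clusters
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Two hypothetical ways of grouping the same four points
good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
for label, grouping in (("good", good), ("bad", bad)):
    print(label, avg_intra_distance(grouping), round(avg_inter_distance(grouping), 2))
```

The first grouping has both the smaller within-cluster average and the larger between-cluster average, so it is the higher-quality clustering under this criterion.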

f. List at least two drawbacks of the K-means clustering approach.

g. In hierarchical clustering, there are different ways to measure the distance between clusters, e.g., single linkage, complete linkage, and average linkage. Briefly describe the differences among these three distance measures.
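The three linkage measures differ only in how they summarize the point-to-point distances between two clusters: single linkage takes the minimum, complete linkage the maximum, and average linkage the mean over all cross-cluster pairs. A minimal Python sketch, assuming points are coordinate tuples and Euclidean distance by default; the function names and example clusters are illustrative.

```python
import math


def single_link(a, b, dist=math.dist):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b, dist=math.dist):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p in a for q in b)


def average_link(a, b, dist=math.dist):
    """Mean distance over all |a| * |b| cross-cluster pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


A = [(0, 0), (2, 0)]  # hypothetical cluster A
B = [(5, 0), (9, 0)]  # hypothetical cluster B (pair distances: 5, 9, 3, 7)
print(single_link(A, B))    # 3.0
print(complete_link(A, B))  # 9.0
print(average_link(A, B))   # 6.0
```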

2. Given the following distance matrix of four data points 1, 2, 3, and 4: (Requirement: Report all the partial trees and matrices for the intermediate steps.)

Perform hierarchical clustering using the single-linkage, complete-linkage, and average-linkage distance measures (30 points).
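A hand-worked solution can be checked with a short script that repeatedly merges the two closest clusters and prints the cluster distance matrix before each merge. This is a minimal sketch with several assumptions. The matrix `D` is HYPOTHETICAL, because the assignment's own matrix is not reproduced in this text. Ties go to the first closest pair found. Average linkage is computed as the mean of all point-to-point distances between the two clusters. Replace `D` with the actual matrix to compare against the partial trees and matrices worked by hand.

```python
def agglomerative(dist, labels, linkage="single"):
    """Bottom-up hierarchical clustering on a symmetric distance matrix.

    dist: list of lists of pairwise point distances
    labels: display names for the points, e.g. ["1", "2", "3", "4"]
    linkage: "single", "complete", or "average"
    Prints the cluster distance matrix before every merge and returns the
    merges as (cluster_a, cluster_b, merge_distance) tuples.
    """
    clusters = [(i,) for i in range(len(labels))]  # each cluster = tuple of point indices

    def cluster_dist(a, b):
        ds = [dist[i][j] for i in a for j in b]  # all point-to-point distances
        if linkage == "single":
            return min(ds)  # closest pair
        if linkage == "complete":
            return max(ds)  # farthest pair
        if linkage == "average":
            return sum(ds) / len(ds)  # mean over all pairs
        raise ValueError(f"unknown linkage: {linkage}")

    def name(c):
        return "{" + ",".join(labels[i] for i in c) + "}"

    merges = []
    while len(clusters) > 1:
        # intermediate matrix for the current set of clusters
        names = [name(c) for c in clusters]
        print("\t" + "\t".join(names))
        for a, na in zip(clusters, names):
            cells = [0.0 if a == b else cluster_dist(a, b) for b in clusters]
            print(na + "\t" + "\t".join(f"{v:.2f}" for v in cells))
        # merge the closest pair of clusters (ties: first pair found)
        n = len(clusters)
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]))
        d = cluster_dist(clusters[i], clusters[j])
        merges.append((name(clusters[i]), name(clusters[j]), d))
        print(f"merge {name(clusters[i])} and {name(clusters[j])} at {d:.2f}\n")
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# HYPOTHETICAL 4-point distance matrix, for illustration only
D = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
for method in ("single", "complete", "average"):
    print(f"=== {method} linkage ===")
    agglomerative(D, ["1", "2", "3", "4"], method)
```

Each returned merge distance is the height at which that merge appears in the dendrogram, so the list also gives what is needed to draw the partial trees.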

