Cluster Analysis: Partition Methods
Stata offers two commands for partitioning observations into k clusters: cluster kmeans and cluster kmedians.
They use means and medians, respectively, to create the partitions. Both require the k(number of groups) option.
Beyond that, the options you specify will depend on the details of your situation.
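As a minimal sketch of the two commands, the lines below cluster the auto dataset that ships with Stata into three groups. The choice of dataset, variables, and k = 3 is an assumption made for illustration only.

```stata
* Load an example dataset shipped with Stata (illustrative choice)
sysuse auto, clear

* Partition the observations into 3 groups around group means
cluster kmeans mpg weight length, k(3)

* Same idea, but the group centers are medians instead of means
cluster kmedians mpg weight length, k(3)
```

Because no name is given here, Stata picks a name for each analysis itself and reports it.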
In general, the measure option determines which similarity (or dissimilarity) measure is used (see help measure_option
within Stata or the measure_option entry in Stata's Multivariate Statistics manual).
This is particularly relevant for continuous versus binary data.
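The sketch below shows how the measure might be set for continuous and for binary variables. It assumes the auto dataset; the indicator variables heavy, thirsty, and pricey, their cutoffs, and the cluster names are all hypothetical and exist only to give the binary example something to work on.

```stata
sysuse auto, clear

* Continuous data: Euclidean distance (L2), stated explicitly
cluster kmeans mpg weight length, k(3) measure(L2) name(cont_l2)

* Continuous data: absolute-value (city-block) distance instead
cluster kmeans mpg weight length, k(3) measure(absolute) name(cont_l1)

* Binary data: build some 0/1 indicators (hypothetical cutoffs)
generate byte heavy   = weight > 3000
generate byte thirsty = mpg < 20
generate byte pricey  = price > 6000

* Use a binary similarity measure (simple matching) with medians
cluster kmedians foreign heavy thirsty pricey, k(2) measure(matching) name(bin2)
```

Which measure is appropriate depends on the data; the full list is in the measure_option help entry.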
You can also give the cluster analysis a name via the name([name of cluster]) option. This is
a good way to tell runs of the command apart if you try multiple k values.
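One way to use names when comparing several k values is sketched below. It assumes the auto dataset, and the names km2, km3, and km4 are illustrative.

```stata
sysuse auto, clear

* Run k-means for k = 2, 3, 4, giving each analysis its own name
forvalues k = 2/4 {
    cluster kmeans mpg weight length, k(`k') name(km`k')
}

* List the cluster analyses currently stored with the data
cluster dir

* Compare how observations move between the 2- and 3-group solutions
tabulate km2 km3
```

By default the grouping variable takes the same name as the analysis, which is why km2 and km3 can be tabulated directly.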
Additionally, you can choose how the initial group centers are determined using the start([option]) option.
There are eight start options. Three of these (krandom, random, and prandom) choose the initial k groups by various random methods.
One (firstk) makes initial groups using the first k observations and one (lastk) makes initial groups using the last k observations.
The other three (everykth, segments, and group(varname)) build the initial groups from every kth observation, from k nearly equal
contiguous segments of the data, or from a grouping variable you supply.
For more information see help cluster kmeans, which includes an explanation of the various start options.
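A few of the start options are sketched below on the auto dataset. The seed value 12345 and the cluster names are arbitrary choices for illustration.

```stata
sysuse auto, clear

* k observations chosen at random as starting centers; a seed makes it repeatable
cluster kmeans mpg weight length, k(3) start(krandom(12345)) name(s_krandom)

* First k observations as starting centers
cluster kmeans mpg weight length, k(3) start(firstk) name(s_firstk)

* Last k observations as starting centers
cluster kmeans mpg weight length, k(3) start(lastk) name(s_lastk)

* Initial groups formed by assigning every kth observation to the same group
cluster kmeans mpg weight length, k(3) start(everykth) name(s_everyk)

* Different starts can lead to different final partitions
tabulate s_firstk s_lastk
```

Note that firstk, lastk, everykth, and segments depend on the sort order of the data, so results may change if the data are re-sorted.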
The keepcenters option tells Stata to retain the group means (or medians, depending on which command you use)
and append them to the data set (i.e., the last k observations in the data set are now the means or medians of your k groups).
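The effect of keepcenters can be checked by counting observations before and after, as in this sketch. It assumes the auto dataset; the name kc3 and the seed are illustrative.

```stata
sysuse auto, clear

* Number of observations before clustering
count

* Cluster into 3 groups and append the 3 group means to the data
cluster kmeans mpg weight length, k(3) name(kc3) start(krandom(12345)) keepcenters

* The count should now be k = 3 higher than before
count

* The appended centers are the last 3 observations
list mpg weight length kc3 in -3/l
```

Since the appended rows are not real observations, it is usually worth dropping them or working on a copy before running further analyses.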
There are two advanced options as well. The first is generate([groupvar]), which names the new variable in the
data set that records the group each observation was assigned to by the cluster analysis (by default, the variable
takes the name given in name()). The second option is iterate([value]), which limits the number of iterations
the clustering algorithm is allowed to perform. The default is 10,000.
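The two advanced options might be combined as in the sketch below, again assuming the auto dataset. The variable name grp3 and the iteration cap of 100 are arbitrary illustrative values.

```stata
sysuse auto, clear

* Store group membership in grp3 and stop after at most 100 iterations
cluster kmeans mpg weight length, k(3) name(km3) generate(grp3) iterate(100)

* Group sizes
tabulate grp3

* Group means of the clustering variables
tabstat mpg weight length, by(grp3) statistics(mean)
```

The grouping variable holds the group number of each observation, so it can be used like any other categorical variable afterwards.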
The basic syntax is simply: cluster kmeans [variables for clustering], k([# of groups]) [additional options]
For examples, see help cluster kmeans or [MV] cluster kmeans and kmedians.