What Are Robust Statistics?

Robust statistics are measures on which extreme observations have little effect.

For example:

Take the data set 1, 2, 3, 4, 5, 6, whose mean and median are both 3.5. If we change one of the values to something much larger, say replacing 6 with 1000, the mean increases greatly to about 169, but the median stays the same at 3.5. In other words, the median is robust to the extreme observation. The reason is that the mean depends on the value of every observation in the data set, while the median depends only on the middle of the ordered data.

Data                   Mean    Median
1, 2, 3, 4, 5, 6       3.5     3.5
1, 2, 3, 4, 5, 1000    169     3.5
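
The comparison above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the variable names original and with_outlier are chosen here for illustration and do not come from the example itself.

```python
from statistics import mean, median

# The two data sets from the table: the second replaces 6 with 1000.
original = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 2, 3, 4, 5, 1000]

for data in (original, with_outlier):
    # The mean uses every value; the median only looks at the middle
    # of the sorted data, so the single extreme value does not move it.
    print(data, "mean =", round(mean(data), 1), "median =", median(data))
```

The median is 3.5 in both cases, while the mean jumps from 3.5 to roughly 169.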

Findings

For measures of center and measures of spread:

  • We just established that the median is a more robust statistic than the mean.
  • Going along with this, the IQR, which like the median is based on the position of observations in the ordered data (the quartiles), is a more robust statistic than the standard deviation, which is calculated using the mean. The range, which relies solely on the most extreme observations, is also non-robust.

          Robust    Non-robust
Center    median    mean
Spread    IQR       standard deviation, range
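
The same experiment can be repeated for the measures of spread. The sketch below is illustrative rather than definitive: the helper names iqr and value_range are made up for this example, and it assumes the "inclusive" quartile method of Python's statistics.quantiles. Quartile conventions differ between tools, and on a sample this small the choice of convention can change the result.

```python
from statistics import quantiles, stdev

original = [1, 2, 3, 4, 5, 6]
with_outlier = [1, 2, 3, 4, 5, 1000]

def iqr(data):
    # Interquartile range: Q3 - Q1.
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

def value_range(data):
    # Range: largest value minus smallest value.
    return max(data) - min(data)

for data in (original, with_outlier):
    print(data)
    print("  IQR   =", iqr(data))          # 2.5 for both data sets
    print("  stdev =", round(stdev(data), 2))
    print("  range =", value_range(data))  # 5, then 999
```

Under this convention the IQR is unchanged by the extreme value, while the sample standard deviation and the range both grow sharply.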

Robust statistics are most useful for describing skewed distributions or those with extreme observations, while non-robust statistics, like the mean and standard deviation, are useful for describing symmetric distributions.