Module 9 Lesson 1 - Read

ReadMeasures of Spread

Overview

People waiting in line

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean. The standard deviation provides a numerical measure of the overall amount of variation in a data set and can be used to determine whether a particular data value is close to or far from the mean.

Standard Deviation

The standard deviation is always positive or zero. It is small when the data are all concentrated close to the mean, exhibiting little variation or spread. It is larger when the data values are more spread out from the mean, exhibiting more variation.

Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the standard deviation for the wait time is two minutes. At supermarket B, the standard deviation for the wait time is four minutes.

Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the average. Wait times at supermarket A are more concentrated near the average.

Suppose that Rosa and Ben both shop at supermarket A. Rosa waits at the checkout counter for seven minutes, and Ben waits for one minute. At supermarket A, the mean waiting time is five minutes and the standard deviation is two minutes. The standard deviation can be used to determine whether a data value is close to or far from the mean.

Rosa waits for seven minutes:

  • Rosa's wait time of seven minutes is two minutes longer than the average of five minutes. Two minutes is equal to one standard deviation.
  • Rosa's wait time of seven minutes is one standard deviation above the average of five minutes.

Ben waits for one minute:

  • Ben's wait time of one minute is two standard deviations below the average of five minutes. Four minutes is equal to two standard deviations.market
  • A data value that is two standard deviations from the average is just on the borderline for what many would consider to be far from the average. Considering data to be far from the mean if it is more than two standard deviations away is more of an approximate "rule of thumb" than a rigid rule.

The number line may help you understand the standard deviation in our supermarket example. Rosa and Ben are both shopping at supermarket A where the standard deviation for wait times is two minutes, and the average wait time is five minutes.

Rosa waited seven minutes. If we look at five (the average wait time) and seven (the time she actually waited) on a number line, seven is two spaces to the right of five. We say that seven is one standard deviation to the right of five.

Ben waited one minute. If we look at five (the average wait time) and one (the time he actually waited) on a number line, one is four spaces to the left of five. We say that one is two standard deviations to the left of five.

Inches Measurements

Calculating the Standard Deviation

If x is a number, then the difference between x and the mean is called its deviation. Deviation just means how far from the normal. The deviation is the amount by which a single measurement differs from a fixed value such as the mean. In a data set, there are as many deviations as there are items in the data set. Every value is a different distance from the mean and therefore has its own deviation. The deviations are used to calculate the standard deviation. The standard deviation is a measure of how spread out numbers are. Its symbol is LaTeX: \sigmaσ (the Greek letter sigma).

The formula is the square root of the variance.

The variance is the average of the squared differences from the mean. To calculate the variance, follow these steps:

  1. Find the mean.
  2. For each value, subtract the mean and square the result. Then work out the average of the squared differences.

 

ExpandCalculating and Interpreting

Discover

Calculating standard deviation can be complicated and time consuming depending on the size of the data. There are two separate formulas, one for the standard deviation of a population, and another for the standard deviationcrowd of people of a sample. To find the standard deviation of a population:

  1. Find the mean of the data.
  2. Find the difference in each value and the mean.
  3. Square the result of each value minus the mean.
  4. Find the sum of the differences squared.
  5. Divide this sum by the number of values in the data set. (This number is the variance.)
  6. Find the square root of the variance. (This number is the standard deviation.)

The only difference when finding the standard deviation of a sample is in step 5 to divide by the number of values in the data set minus one.

Example

Consider the population:LaTeX: \left\lbrace{-5,1,8,7,2}\right\rbrace{5,1,8,7,2}. The mean is the sum of all values divided by the number of values. LaTeX: \frac{13}{5}=2.6135=2.6. The mean is 2.6. Next, take each value, subtract the mean, and square the result.

Difference in Value and Mean Difference in Value and Mean Squared
-5 -5 - 2.6 = -7.6 57.76
1 1 - 2.6 = -1.6 2.56
8 8 - 2.6 = 5.4 29.16
7 7 - 2.6 = 4.4 19.36
2 2 - 2.6 = -0.6 0.36

The sum of the differences squared is 109.2. The sum of this value is then divided by the number of values in the data population. LaTeX: \frac{109.2}{5}=21.84109.25=21.84. The variance is 21.84.

Standard deviation is the square root of the variance. The square root of 21.84 = 4.67. Standard deviation for this data set is 4.67.

If this data set were a sample instead of a population, there would be only one difference in the calculation. After finding the sum of the differences, divide by the number of values in the data minus 1. LaTeX: \frac{109.2}{4}=27.3109.24=27.3. This is the variance. The standard deviation is the square root of the variance. The square root of 27.3 = 5.22.

There are many standard deviation calculators available on the Internet. Numbers of data can be plugged in and calculations done quickly.

How much the statistic varies from one sample to another is known as the sampling variability of a statistic. The variability in data depends upon the method by which the outcomes are obtained, for example, by measuring or by random sampling.

When the standard deviation is zero, there is no spread; that is, all the data values are equal to each other. The standard deviation is small when the data are all concentrated close to the mean, and is larger when the data values show more variation from the mean. When the standard deviation is much larger than zero, the data values are very spread out around the mean. You typically measure the sampling variability of a statistic by its standard error.

The standard error of the mean is an example of a standard error. It is a special standard deviation and is known as the standard deviation of the sampling distribution of the mean.

Interpreting Standard Deviation

The standard deviation, when first presented, can seem unclear. By graphing your data, you can get a better feel for the deviations and the standard deviation. This visual model of the data will help you to interpret how the deviations are measured.

You will find that in symmetrical distributions, the standard deviation can be very helpful but in skewed distributions, the standard deviation may not be much help. The reason is that the two sides of a skewed distribution have different spreads. Because numbers can be confusing, always graph your data. Display your data in a histogram or a box plot.

For ANY data set, no matter what the distribution of the data is:

  • At least LaTeX: 75\%75% of the data is within two standard deviations of the mean.
  • At least LaTeX: 89\%89% of the data is within three standard deviations of the mean.
  • At least LaTeX: 95\%95% of the data is within 4.5 standard deviations of the mean.

 

 

Module 9 Lesson 1 of 4