How To Compute The Range Of A Data Set

Variability describes how far apart data points lie from each other and from the center of a distribution. Along with measures of central tendency, measures of variability give you descriptive statistics that summarize your data.

Variability is also referred to as spread, scatter or dispersion. It is most commonly measured with the following:

Range: the difference between the highest and lowest values
Interquartile range: the range of the middle half of a distribution
Standard deviation: average distance from the mean
Variance: average of squared distances from the mean

Why does variability matter?

While the central tendency, or average, tells you where most of your points lie, variability summarizes how far apart they are. This is important because the amount of variability determines how well you can generalize results from the sample to your population.

Low variability is ideal because it means that you can better predict information about the population based on sample data. High variability means that the values are less consistent, so it's harder to make predictions.

Data sets can have the same central tendency but different levels of variability or vice versa. If you know only the central tendency or the variability, you can't say anything about the other aspect. Both of them together give you a complete picture of your data.

Example: Variability in normal distributions (graph showing the distribution of 3 samples with the same average but different variability)
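As a rough numeric illustration of the same idea, the sketch below uses three made-up samples (hypothetical values, not taken from the figure) that share one mean but differ in spread. It relies on Python's standard `statistics` module.

```python
import statistics

# Hypothetical samples chosen so that all three share the same mean
samples = {
    "low spread": [48, 50, 52],
    "medium spread": [30, 50, 70],
    "high spread": [0, 50, 100],
}

for name, values in samples.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.1f}")
```

Knowing only that the mean is 50 tells you nothing about which of the three samples you are looking at.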

Range

The range tells you the spread of your data from the lowest to the highest value in the distribution. It's the easiest measure of variability to calculate.

To find the range, simply subtract the lowest value from the highest value in the data set.

Range example

You have 8 data points from Sample A.

Data (minutes)	72	110	134	190	238	287	305	324

The highest value (H) is 324 and the lowest (L) is 72.

R = H – L

R = 324 – 72 = 252

The range of your data is 252 minutes.
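The same calculation can be sketched in a few lines of Python. The variable names are illustrative only.

```python
# Sample A from the example above (minutes)
data = [72, 110, 134, 190, 238, 287, 305, 324]

highest = max(data)            # H
lowest = min(data)             # L
data_range = highest - lowest  # R = H - L

print(f"R = {highest} - {lowest} = {data_range}")
```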

Because only two numbers are used, the range is influenced by outliers and doesn't give you any information about the distribution of values. It's best used in combination with other measures.

Interquartile range

The interquartile range gives you the spread of the middle of your distribution.

For any distribution that's ordered from low to high, the interquartile range contains half of the values. While the first quartile (Q1) contains the first 25% of values, the fourth quartile (Q4) contains the last 25% of values.

The interquartile range on a normal distribution

The interquartile range is the third quartile (Q3) minus the first quartile (Q1). This gives us the range of the middle half of a data set.

Interquartile range example

To find the interquartile range of your 8 data points, you first find the values at Q1 and Q3.

Multiply the number of values in the data set (8) by 0.25 for the 25th percentile (Q1) and by 0.75 for the 75th percentile (Q3).

Q1 position: 0.25 x 8 = 2

Q3 position: 0.75 x 8 = 6

Q1 is the value in the 2nd position, which is 110. Q3 is the value in the 6th position, which is 287.

IQR = Q3 – Q1

IQR = 287 – 110 = 177

The interquartile range of your data is 177 minutes.
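A minimal Python sketch of this position rule follows. The helper `quartile_by_position` is a hypothetical name, and it assumes the position works out to a whole number, as it does for 8 values. Other quartile conventions, including those used by many statistics libraries, interpolate between values and can give a different IQR for the same data.

```python
def quartile_by_position(sorted_values, fraction):
    """Return the value at 1-based position fraction * n, as in the worked example.

    Assumption: fraction * n is a whole number (true for n = 8).
    """
    position = fraction * len(sorted_values)
    if position != int(position):
        raise ValueError("this simple rule needs a whole-number position")
    return sorted_values[int(position) - 1]  # 1-based position to 0-based index

data = sorted([72, 110, 134, 190, 238, 287, 305, 324])

q1 = quartile_by_position(data, 0.25)  # 2nd position
q3 = quartile_by_position(data, 0.75)  # 6th position
iqr = q3 - q1

print(f"IQR = {q3} - {q1} = {iqr}")
```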

Just like the range, the interquartile range uses only 2 values in its calculation. But the IQR is less affected by outliers: the 2 values come from the middle half of the data set, so they are unlikely to be extreme scores.

The IQR gives a consistent measure of variability for skewed as well as normal distributions.

Five-number summary

Every distribution can be organized using a five-number summary:

Lowest value
Q1: 25th percentile
Q2: the median
Q3: 75th percentile
Highest value (Q4)

These five-number summaries can be easily visualized using box and whisker plots.
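For the 8 data points used in this article, a five-number summary could be assembled as sketched below. The quartiles reuse the simple position rule from the interquartile range example (an assumption that only works when 0.25 x n and 0.75 x n are whole numbers), and the median comes from Python's standard `statistics` module.

```python
import statistics

data = sorted([72, 110, 134, 190, 238, 287, 305, 324])
n = len(data)

five_number_summary = {
    "lowest": data[0],
    "Q1": data[int(0.25 * n) - 1],      # value at position 0.25 * n
    "median": statistics.median(data),  # mean of the two middle values when n is even
    "Q3": data[int(0.75 * n) - 1],      # value at position 0.75 * n
    "highest": data[-1],
}

for label, value in five_number_summary.items():
    print(f"{label}: {value}")
```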

Box and whisker plot example (plot visualizing the five-number summary of the data)

Standard deviation

The standard deviation is the average amount of variability in your dataset.

It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation by hand:

List each score and find their mean.
Subtract the mean from each score to get the deviation from the mean.
Square each of these deviations.
Add up all of the squared deviations.
Divide the sum of the squared deviations by n – 1 (for a sample) or N (for a population).
Find the square root of the number you found.

Standard deviation example

Step 1: Data (minutes)	Step 2: Deviation from mean	Steps 3 + 4: Squared deviation
72	72 – 207.5 = -135.5	18360.25
110	110 – 207.5 = -97.5	9506.25
134	134 – 207.5 = -73.5	5402.25
190	190 – 207.5 = -17.5	306.25
238	238 – 207.5 = 30.5	930.25
287	287 – 207.5 = 79.5	6320.25
305	305 – 207.5 = 97.5	9506.25
324	324 – 207.5 = 116.5	13572.25
Mean = 207.5	Sum = 0	Sum of squares = 63904

Because you're dealing with a sample, you use n – 1.

n – 1 = 7

63904 / 7 = 9129.14

s = √9129.14 = 95.55

The standard deviation of your data is 95.55. This means that on average, each score deviates from the mean by 95.55 minutes.
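The six steps can be traced in Python as a sanity check on the hand calculation. This is a sketch using only the standard library, and the variable names are illustrative.

```python
import math

data = [72, 110, 134, 190, 238, 287, 305, 324]

mean = sum(data) / len(data)                        # Step 1: mean of the scores
deviations = [x - mean for x in data]               # Step 2: deviation from the mean
squared = [d ** 2 for d in deviations]              # Step 3: square each deviation
sum_of_squares = sum(squared)                       # Step 4: add them up
sample_variance = sum_of_squares / (len(data) - 1)  # Step 5: divide by n - 1 (sample)
sample_sd = math.sqrt(sample_variance)              # Step 6: square root

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"standard deviation = {sample_sd:.1f}")
```

The result, about 95.5 minutes, can be cross-checked against `statistics.stdev(data)`, which also divides by n – 1.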

Standard deviation formula for populations

If you have data from the entire population, use the population standard deviation formula:

Formula	Explanation
$\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$	σ = population standard deviation; ∑ = sum of…; X = each value; μ = population mean; N = number of values in the population

Standard deviation formula for samples

If you have data from a sample, use the sample standard deviation formula:

Formula	Explanation
$s = \sqrt{\dfrac{\sum (X - \bar{x})^2}{n - 1}}$	s = sample standard deviation; ∑ = sum of…; X = each value; x̅ = sample mean; n = number of values in the sample
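To see how the two formulas differ in practice, the sketch below applies both to the 8 data points from this article. Python's standard `statistics` module provides `pstdev` (divides by N) and `stdev` (divides by n – 1). Treating the same 8 values as a full population is purely for comparison.

```python
import statistics

data = [72, 110, 134, 190, 238, 287, 305, 324]

population_sd = statistics.pstdev(data)  # population formula: divides by N
sample_sd = statistics.stdev(data)       # sample formula: divides by n - 1

print(f"population formula: {population_sd:.1f}")
print(f"sample formula:     {sample_sd:.1f}")
```

The sample formula always gives the larger of the two values for the same data, because it divides the same sum of squares by a smaller number.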

Why use n – 1 for sample standard deviation?

Samples are used to make statistical inferences about the population that they came from.

When you have population data, you can get an exact value for the population standard deviation. Since you collect data from every population member, the standard deviation reflects the precise amount of variability in your distribution, the population.

But when you use sample data, your sample standard deviation is always used as an estimate of the population standard deviation. Using n in this formula tends to give you a biased estimate that consistently underestimates variability.

Reducing the sample n to n – 1 makes the standard deviation artificially large, giving you a conservative estimate of variability.

While this is not an unbiased estimate, it is a less biased estimate of standard deviation: it is better to overestimate rather than underestimate variability in samples.

The difference between biased and conservative estimates of standard deviation gets much smaller when you have a large sample size.
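One way to see this is that, for the same sum of squares, the n – 1 version is larger than the n version by a factor of √(n / (n – 1)). The sketch below prints that factor for a few sample sizes. The function name `inflation_factor` is made up for illustration.

```python
import math

def inflation_factor(n):
    """Ratio of the n - 1 standard deviation to the n standard deviation."""
    return math.sqrt(n / (n - 1))

for n in (8, 30, 100, 1000):
    print(f"n = {n}: factor = {inflation_factor(n):.4f}")
```

The factor moves toward 1 as n grows, so the two versions become nearly indistinguishable for large samples.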

Variance

The variance is the average of squared deviations from the mean. A deviation from the mean is how far a score lies from the mean.

Variance is the square of the standard deviation. This means that variance is expressed in squared units (here, minutes squared), so its values are typically much larger than a typical value of the data set.

While it's harder to interpret the variance number intuitively, it's important to calculate variance for comparing different data sets in statistical tests like ANOVAs.

Variance reflects the degree of spread in the data set. The more spread out the data, the larger the variance is in relation to the mean.

Variance example

To get the variance, square the standard deviation.

s = 95.five

south ²= 95.5 10 95.5 = 9129.14

The variance of your data is 9129.14.

To find the variance by hand, perform all of the steps for standard deviation except for the final step.
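In code, that amounts to stopping before the square root. The sketch below uses the same 8 data points and the sample (n – 1) version.

```python
data = [72, 110, 134, 190, 238, 287, 305, 324]

mean = sum(data) / len(data)
sum_of_squares = sum((x - mean) ** 2 for x in data)
sample_variance = sum_of_squares / (len(data) - 1)  # no square root: this is the variance

print(f"sample variance = {sample_variance:.2f}")
```

The standard library's `statistics.variance(data)` should agree with this value, since it also divides by n – 1.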

Variance formula for populations

Formula	Explanation
$\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$	σ² = population variance; Σ = sum of…; X = each value; μ = population mean; N = number of values in the population

Variance formula for samples

Formula	Explanation
$s^2 = \dfrac{\sum (X - \bar{x})^2}{n - 1}$	s² = sample variance; Σ = sum of…; X = each value; x̄ = sample mean; n = number of values in the sample

Biased versus unbiased estimates of variance

An unbiased estimate in statistics is one that doesn't consistently give you either high values or low values – it has no systematic bias.

Just like for standard deviation, there are different formulas for population and sample variance. But while there is no unbiased estimate for standard deviation, there is one for sample variance.

If the sample variance formula used the sample n, the sample variance would be biased towards lower numbers than expected. Reducing the sample n to n – 1 makes the variance artificially larger.

In this case, bias is not just lowered but totally removed. The sample variance formula gives completely unbiased estimates of variance.

So why isn't the sample standard deviation also an unbiased estimate?

That's because the sample standard deviation comes from finding the square root of the sample variance. Since a square root isn't a linear operation, like addition or subtraction, the unbiasedness of the sample variance formula doesn't carry over to the sample standard deviation formula.
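A small exhaustive check can make this concrete. The sketch below uses a hypothetical toy population of six values and lists every equally likely sample of size 2, assuming values are drawn independently with replacement. Averaging each estimator over all of those samples gives its expected value in this setting.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]  # hypothetical toy population
pop_variance = statistics.pvariance(population)
pop_sd = math.sqrt(pop_variance)

# Every equally likely ordered sample of size 2, drawn with replacement
samples = list(itertools.product(population, repeat=2))

mean_var_n_minus_1 = statistics.fmean([statistics.variance(s) for s in samples])
mean_var_n = statistics.fmean([statistics.pvariance(s) for s in samples])
mean_sd_n_minus_1 = statistics.fmean([statistics.stdev(s) for s in samples])

print(f"population variance:             {pop_variance:.4f}")
print(f"average variance using n - 1:    {mean_var_n_minus_1:.4f}")
print(f"average variance using n:        {mean_var_n:.4f}")
print(f"population standard deviation:   {pop_sd:.4f}")
print(f"average std. dev. using n - 1:   {mean_sd_n_minus_1:.4f}")
```

Under these assumptions, the n – 1 variance averages out to the population variance, the n version falls short of it, and the n – 1 standard deviation still comes out below the population standard deviation.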

What's the best measure of variability?

The best measure of variability depends on your level of measurement and distribution.

Level of measurement

For data measured at an ordinal level, the range and interquartile range are the only appropriate measures of variability.

For more complex interval and ratio levels, the standard deviation and variance are also applicable.

Distribution

For normal distributions, all measures can be used. The standard deviation and variance are preferred because they take your whole data set into account, but this also means that they are easily influenced by outliers.

For skewed distributions or data sets with outliers, the interquartile range is the best measure. It's least affected by extreme values because it focuses on the spread in the middle of the data set.
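To illustrate how differently the measures react to an extreme value, the sketch below replaces the largest value in this article's data set with a hypothetical outlier of 1000 minutes and recomputes each measure. The helper `spread` is a made-up name, and its quartiles use the simple position rule from the interquartile range example, which assumes whole-number positions.

```python
import statistics

def spread(values):
    """Range, IQR (position rule) and sample standard deviation of values."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[int(0.25 * n) - 1]
    q3 = ordered[int(0.75 * n) - 1]
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "sd": statistics.stdev(ordered),
    }

original = [72, 110, 134, 190, 238, 287, 305, 324]
with_outlier = original[:-1] + [1000]  # hypothetical extreme value replaces 324

before = spread(original)
after = spread(with_outlier)

for measure in ("range", "iqr", "sd"):
    print(f"{measure}: {before[measure]:.1f} -> {after[measure]:.1f}")
```

In this example the range and standard deviation both grow sharply, while the interquartile range stays the same.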

Frequently asked questions about variability

What is variability?: Variability tells you how far apart points lie from each other and from the center of a distribution or a data set.

Variability is also referred to as spread, scatter or dispersion.