In his book A Field Guide to Lies and Statistics, Daniel Levitin helps us to understand published figures without being misled by the way the writer presents them. This article deals with two topics that the book discusses in detail: how data is represented in, among other things, averages and graphs, and the quadrant for critical thinking that helps to interpret figures.
The internet has made information accessible to everyone. In addition, anyone can interpret information, rewrite it as new content and publish it on the internet again. Research results are churned out as if on a conveyor belt, and eager readers copy them without thinking or draw entirely different conclusions from them. Levitin describes how we, as individuals, can deal with this information better by placing research results in their proper context. It is important that we regularly ask ourselves how a figure has been generated, how it is presented, and what the figure actually means.
There are several ways to assess the data, starting with the context. Where did the research take place? How many people participated? The smaller the group, the more uncertain the result. When 20% of the children born in a hospital in one month are female, that seems low at first sight. However, if this is a small rural hospital where perhaps only 20 children are born per month, the sample of 20 children may be too small to draw any conclusion from the surprising outcome.
Terms like "average" also invite incorrect assumptions. When an average family has 3 children, how many brothers or sisters does an average child have? Contrary to what most people expect, the answer is not necessarily 2. When we count the number of children per family, we compare families, and every family counts once. When we count the average number of brothers and sisters, however, we compare children, and every child counts once. The larger the family, the more children it contributes to that second average. A family of 10 children counts once in the family comparison, but 10 times in the comparison of brothers and sisters. In addition, families who have no children at all are taken into account for the “how many kids” statistic, but not for the “how many siblings” statistic.
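The difference between the two averages can be checked with a few lines of Python. This is only a sketch: the list `family_sizes` is made-up illustration data, not taken from the book, chosen so that the average family has exactly 3 children.

```python
from statistics import mean

# Hypothetical data (an assumption for illustration):
# the number of children in each of six families.
family_sizes = [0, 1, 2, 3, 5, 7]

# Family view: every family counts once, including the childless one.
children_per_family = mean(family_sizes)

# Child view: every child counts once. A family with n children
# contributes n children who each have n - 1 siblings, and a
# childless family contributes nothing at all.
siblings_per_child = [n - 1 for n in family_sizes for _ in range(n)]
average_siblings = mean(siblings_per_child)

print(children_per_family)         # 3
print(round(average_siblings, 1))  # 3.9
```

With these invented numbers the average family has 3 children, yet the average child has almost 4 siblings, because the large families dominate the child view.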
The average is also a dangerous value, because a single extreme number can change it radically. That is why the median and the mode are often reported as well. The median is the middle value in a sorted row of numbers: half of the figures lie above it and half below it. The mode is the value that occurs most often in a series.
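A small sketch using Python's standard `statistics` module shows how one outlier affects the three values. The income figures are invented for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical yearly incomes in thousands (an assumption for illustration).
incomes = [30, 32, 32, 35, 36, 38, 40, 41, 45]

# The same group after one very high earner joins it.
with_outlier = incomes + [5000]

for label, values in (("without outlier", incomes), ("with outlier", with_outlier)):
    print(label, round(mean(values), 1), median(values), mode(values))
```

The mean jumps from roughly 36.6 to 532.9, while the median only moves from 36 to 37 and the mode stays at 32.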
In addition to the context in which data is collected and the way the values are interpreted, the representation of the data can also put you on the wrong track. When the axes are not labelled, for example, or, even sneakier, when there are 2 vertical axes with different scales, an increase of 10% in variable A can look just as steep as an increase of 5% in variable B. Axes can also be truncated. If you want a decrease from 90% to 85% to look dramatic, do not show the full 0-100% range on the vertical axis, but only the range 75-100%. The 5 percentage points now suddenly look four times as big!
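The "four times as big" claim follows from simple arithmetic: the same drop covers a larger share of a shorter axis. The helper `visual_drop` below is a hypothetical name introduced only for this sketch.

```python
def visual_drop(start, end, axis_min, axis_max):
    """Return the fraction of the axis height covered by a change from start to end."""
    return abs(start - end) / (axis_max - axis_min)

# The same drop from 90% to 85% on two different vertical axes.
full_axis = visual_drop(90, 85, axis_min=0, axis_max=100)       # 5 of 100 units
truncated_axis = visual_drop(90, 85, axis_min=75, axis_max=100)  # 5 of 25 units

print(full_axis, truncated_axis)
print(truncated_axis / full_axis)  # how many times larger the drop appears
```

On the full axis the drop covers 5% of the height, on the truncated axis 20%, so it appears four times as large although the data are identical.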
Finally, there is the trick of plotting cumulative figures. For example, if you have sales figures for 20 quarters, but sales halved in the 20th quarter, then show a cumulative graph over all 20 quarters. The halving now shows up only as a slightly less steep final stretch of the line, instead of a bar that is half as high as the one before it.
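The effect can be illustrated with invented sales figures: 19 steady quarters followed by a halving. The numbers are assumptions chosen for this sketch, not data from the book.

```python
from itertools import accumulate

# Hypothetical sales: 19 quarters of 100 units, then a halving to 50.
quarterly = [100] * 19 + [50]

# Running total, which is what a cumulative graph displays.
cumulative = list(accumulate(quarterly))

# Change in the last quarter as each kind of graph would show it.
bar_change = quarterly[-1] / quarterly[-2] - 1          # per-quarter bars
cumulative_change = cumulative[-1] / cumulative[-2] - 1  # cumulative line

print(f"bar chart: {bar_change:+.1%}")
print(f"cumulative chart: {cumulative_change:+.1%}")
```

The bar chart shows a drop of 50%, whereas the cumulative line still rises by about 2.6%, from 1900 to 1950 units.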
The second major theme that keeps coming back in this book is the understanding of probability. Levitin recommends drawing up a quadrant for critical thinking, with one variable on each axis. The following example shows how making this quadrant helps to think critically. Levitin describes the example of women and breast cancer. An article leads with an alarming statistic: 93% of breast cancer patients fall into the high-risk group. What does this statement mean? Counting high-risk women within a sample of breast cancer patients is not the same as counting breast cancer patients within the high-risk group. To calculate the latter probability, we need more information: the chance that a woman has breast cancer, which is 0.8%, and the fact that 57% of women fall into the high-risk group. We can use these three known values to fill in the quadrant of figure 1 below.
- Step 1: 0.8% of women have breast cancer, that is, 8 out of 1000 women (see the left table in figure 1).
- Step 2: 93% of the 8 women who have breast cancer, that is 7.44 (rounded to 7), fall into the high-risk group (see the middle table in figure 1).
- Step 3: 57% of women fall into the high-risk group, so the total number of women in column 1 is 570 (see the right table in figure 1).
- Step 4: the chance that a woman from the high-risk group has breast cancer is therefore 7/570 = 1.2%.
Figure 1: The quadrant for critical thinking
Filling in the quadrant by asking the right questions gives us more insight into a woman's chances of having breast cancer. A woman who falls into the high-risk group has a 1.2% chance of having breast cancer, compared with 0.8% for women in general. Although the difference between 0.8% and 1.2% is relatively large, it sounds a lot less alarming than the 93% that the article had published.
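The four steps can be sketched in Python as follows. The function `fill_quadrant` and its parameter names are hypothetical, introduced only for this illustration; the three input percentages are the ones quoted above.

```python
def fill_quadrant(population, p_cancer, p_high_given_cancer, p_high):
    """Fill a 2x2 table (cancer yes/no by high/low risk) from three known rates."""
    cancer = population * p_cancer              # step 1: women with breast cancer
    cancer_high = cancer * p_high_given_cancer  # step 2: of those, in the high-risk group
    high = population * p_high                  # step 3: all women in the high-risk group
    no_cancer_high = high - cancer_high
    return {
        "cancer_high": cancer_high,
        "cancer_low": cancer - cancer_high,
        "no_cancer_high": no_cancer_high,
        "no_cancer_low": (population - cancer) - no_cancer_high,
        # step 4: chance of breast cancer for a woman in the high-risk group
        "p_cancer_given_high": cancer_high / high,
    }

table = fill_quadrant(1000, p_cancer=0.008, p_high_given_cancer=0.93, p_high=0.57)
print(f"unrounded: {table['p_cancer_given_high']:.1%}")
print(f"rounded to whole women: {round(table['cancer_high']) / 570:.1%}")
```

Without rounding, the result is about 1.3%; rounding 7.44 down to 7 women, as in step 2, gives the 1.2% used in the text. Either way, the outcome is far from 93%.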
Statistics is an interesting field of study, and whoever compiles statistics can influence the audience by describing only part of the quadrant for critical thinking in an article, or by using graphical representations to make a problem seem larger (or smaller) than it actually is.
Levitin helps readers to ask the right questions about what they read. If we all learn to look critically at the headlines from time to time, we may be able to limit the amount of incomplete or even false information that is spread over the internet.
Levitin, D., 2016, A Field Guide to Lies and Statistics: A Neuroscientist on How to Make Sense of a Complex World, London: Viking