26 January 2022: Prevalence Rates Versus Raw Numbers
Today’s blog post covers a fundamental mistake in the visual representation of data, one unfortunately made by my local Santa Monica Malibu Unified School District. In an attempt to provide more transparency about case rates in its schools (note: the district performs weekly PCR surveillance testing, which has its own pitfalls but is outside the scope of this discussion), the district created a helpful dashboard, which can be accessed at https://www.smmusd.org/Dashboard.
The primary piece of visual data presented is a bar graph of case numbers stratified by school and by week. This is a screenshot from this morning (data current as of 1/21/2022).
Looking at these data, one would reasonably assume that cases are decreasing rather dramatically in the district with each subsequent week, and that this holds for nearly every school. One would also conclude that Samohi (Santa Monica High School) is the most affected.
Unfortunately, the school district has fallen into an Epidemiology 101 pitfall: looking at unadjusted (raw) counts rather than accounting for the size of the population they come from. Samohi has a much larger student population than, say, the middle schools or the elementary schools in the area. A more accurate way to look at these data is to divide each school's case count by its enrollment, thereby arriving at a prevalence rate.
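For readers who want to see the adjustment spelled out, here is a minimal sketch in Python. The school names, case counts and enrollments are hypothetical numbers chosen for illustration, not the district's data, and `prevalence_pct` is simply a name I made up for the helper.

```python
def prevalence_pct(cases: int, enrollment: int) -> float:
    """Return cases as a percentage of enrollment."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return 100.0 * cases / enrollment


# Hypothetical figures for illustration only (NOT the district's data).
schools = {
    "Large high school": {"cases": 60, "enrollment": 3000},
    "Small elementary": {"cases": 20, "enrollment": 400},
}

# Adjust each raw count by the size of the school it came from.
rates = {
    name: prevalence_pct(d["cases"], d["enrollment"])
    for name, d in schools.items()
}

# Rank by prevalence rather than by raw case count.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {schools[name]['cases']} cases, {rate:.1f}% of enrollment")
```

In this made-up example the large school has three times as many cases, yet the small school's prevalence is more than double. That reversal is exactly what a raw-count bar graph hides.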
[Note: In the graphic below, I have removed the District/Itinerant category because it significantly skews the graph axis, with prevalence rates of 41.7%, 16.7% and 25.0% among its 12 students across the weeks studied.]
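The reason such a small group swings so wildly is simple arithmetic, sketched below in Python. The group size of 12 and the three percentages come from the note above. The implied case counts are my own back-calculation from those rounded percentages, not figures reported by the district.

```python
GROUP_SIZE = 12                     # District/Itinerant students, per the note above
reported_pct = [41.7, 16.7, 25.0]   # prevalence in each week, per the note above

# Back-calculate the whole-number case counts the percentages imply (an inference).
implied_cases = [round(p / 100 * GROUP_SIZE) for p in reported_pct]

# With only 12 people, a single additional case moves prevalence by this many points.
points_per_case = 100 / GROUP_SIZE

print(implied_cases)                # [5, 2, 3]
print(f"{points_per_case:.1f} percentage points per case")
```

A difference of one or two cases is enough to move this category by more than the entire range seen at most schools, so it is best read separately from the rest.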
This graph looks quite different indeed, with John Adams Middle School far outpacing its counterparts, followed by McKinley Elementary School. Samohi still has high rates, but they are in line with the general population of schools. SMASH consistently has the lowest rates. Interestingly, Malibu Middle School went from a prevalence rate of 7.4% for January 1-7 to zero the following week and then 0.4% in the most recent week. That suggests either extraordinarily effective case identification, contact tracing and quarantine/isolation, or a problem with the testing. Either could be true.
This error is made all too commonly, and pointing it out is not meant as a critique of our local school district, which is trying to be transparent by making these data available. You will see similar errors in very reputable news outlets (CNN, MSNBC, NPR) as well as in less reputable ones. For those really fascinated by learning more, I highly recommend Edward Tufte’s book “The Visual Display of Quantitative Information.” Link: https://www.edwardtufte.com/tufte/books_vdqI