This freak accident is just one of the more than 500 causes of death catalogued in the Alberta death database. The database lists 94 different types of cancer alone. In fact, “non-Hodgkin’s lymphoma” and “non-Hodg-kins lymphoma” are recorded as two separate causes.
Any source of data that is entered manually by different people at different times with different perspectives, guidelines, judgments, attention to detail, and typing ability has this problem. It is never possible to take raw data at face value. Significant scrutiny and comparison of trends between years, age groups, genders, and causes are required. Outliers, gaps, inconsistencies, and other strange things will reveal themselves eventually, with enough patience and persistence. As we discuss in the BIG Media article AI: Where are we and where are we going?, this detailed inspection and data “cleaning” are crucial for effective statistical analysis.
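To illustrate the kind of inspection described above, here is a minimal sketch of one way to surface spelling variants of the same cause. It is not the method used for the article; the function names `normalize_label` and `find_variant_groups` are hypothetical, and the normalization rules (lowercasing, dropping punctuation, collapsing spaces) are an assumption that would need tuning against the real labels.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Reduce a cause-of-death label to a rough comparison key (assumed rules)."""
    text = unicodedata.normalize("NFKC", label).lower()
    # Drop everything except ASCII letters, digits, and whitespace.
    # This removes hyphens and both straight and curly apostrophes.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def find_variant_groups(labels):
    """Return {key: [labels]} for keys shared by more than one distinct label."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalize_label(label)].add(label)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


if __name__ == "__main__":
    labels = [
        "non-Hodgkin’s lymphoma",
        "non-Hodg-kins lymphoma",
        "malignant neoplasm of the ovary",
    ]
    for key, variants in find_variant_groups(labels).items():
        print(key, "<-", variants)
```

A key-based comparison like this only catches differences in case, punctuation, and spacing. Garbled labels, such as words fused together, still need a human eye.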
In order to be useful, the cleaned categories then need to be grouped into larger meaningful classes, such as “accidents”, “cancer”, “heart disease”, etc.
Grouping is itself a judgment call, and these choices will influence the interpretation and conclusions drawn from the data. One person might decide that a “malignant neoplasm of the ovary” should be grouped with cancers; another might combine it with other “female reproductive” afflictions. Whatever grouping is chosen, the most important thing is to apply it consistently if time periods or age groups are to be compared.
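One way to keep a grouping consistent is to write it down once as an explicit lookup table and apply the same table to every year. The sketch below assumes records arrive as `(year, cause, deaths)` tuples; that layout, the name `group_deaths`, and the two-entry `CAUSE_TO_CLASS` table are hypothetical illustrations, not the structure of the Alberta data.

```python
from collections import Counter

# Hypothetical lookup table: one grouping decision per cleaned label,
# shared by every year so comparisons stay consistent.
CAUSE_TO_CLASS = {
    "malignant neoplasm of the ovary": "cancer",
    "mental and behavioural disorders due to use of alcohol": "accidents",
}


def group_deaths(records, mapping):
    """Sum deaths per (year, class) and collect labels the mapping does not cover."""
    totals = Counter()
    unmapped = set()
    for year, cause, deaths in records:
        cause_class = mapping.get(cause)
        if cause_class is None:
            # Never guess a class: report the label for manual review instead.
            unmapped.add(cause)
        else:
            totals[(year, cause_class)] += deaths
    return totals, unmapped
```

Returning the unmapped labels rather than silently dropping or guessing them is a deliberate choice here: a label that falls outside the table is exactly the kind of oddity worth a second look.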
For the recent article showing cause-of-death comparisons in 2020 in Alberta – A look back at causes of death in the first year of COVID-19 – we classified causes of death from 2001 to 2020 logically and consistently. However, upon further routine checking and re-checking, a weird thing came to light after the article was published.
A relatively large number of people, in 2020 only, had died from “mental and behavioural disorders due to use usefectious and parasitic diseases” (transcribed directly), a category that had been grouped into the larger “bacterial” class. The mention of those pesky parasites made me suspicious, so I dug a little deeper. I found that while a large number of people died from “mental and behavioural disorders due to use of alcohol” (grouped into the general “accidents” class) from 2001 to 2019, that category did not exist in 2020 (see chart below).
Chart showing all the categories included in the (revised) sub-grouping of accidental death relating to drug poisoning.
The numbers for 2020’s “mental and behavioural disorders due to use usefectious and parasitic diseases” were similar to those for “mental and behavioural disorders due to use of alcohol” in the other years, so I was quite confident that they were one and the same. I reclassified “mental and behavioural disorders due to use usefectious and parasitic diseases” as “accidents” instead of “bacterial”.
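The check described above (a label that exists in only one year, paired with a label that is missing from that year alone) can be partly automated. The following is a rough sketch under assumptions: counts are held as a nested dictionary `{year: {label: deaths}}`, and the names `label_changes` and `mean_in_other_years` are hypothetical. The final judgment that two labels are "one and the same" still belongs to the analyst.

```python
def label_changes(counts_by_year, year):
    """Return (appeared, vanished) labels for one year.

    appeared: labels used in `year` and in no other year.
    vanished: labels used in every other year but absent in `year`.
    """
    this_year = set(counts_by_year[year])
    other_years = [set(labels)
                   for y, labels in counts_by_year.items() if y != year]
    if not other_years:
        return [], []
    seen_elsewhere = set().union(*other_years)
    in_every_other_year = set.intersection(*other_years)
    appeared = sorted(this_year - seen_elsewhere)
    vanished = sorted(in_every_other_year - this_year)
    return appeared, vanished


def mean_in_other_years(counts_by_year, label, year):
    """Average count for `label` over the years other than `year`, or None."""
    values = [labels[label]
              for y, labels in counts_by_year.items()
              if y != year and label in labels]
    return sum(values) / len(values) if values else None
```

If an "appeared" label has a count close to the historical average of a "vanished" label, the pair is a candidate for the same cause under two spellings, and worth checking by hand before any reclassification.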
Long story short, this reclassification had repercussions throughout the data that changed some of the charts in the article, including the one that sub-groups “accidental deaths due to drug poisonings” for comparison with COVID deaths. Just another day in the life of a data scientist.
Incidentally, it is unlikely that someone died from a vehicle collision with a parasite (specifically, “Other motor vehicle accident involving collision with withfectious and parasitic diseases”).