Johannes Ulander, CDISC SDTM Trainer and Subject Matter Expert, considers the nuances of SDTM implementation and finds himself scratching his head.
SDTM Implementation
As a CDISC Subject Matter Expert, I often review SDTM validation reports and help assess whether something is OK or not. As I’ve learnt over the years, working with guidance is a tricky business: there are often several ways to implement it, which makes me a frequent visitor to the SDTM and SDTMIG documents (sometimes I wish I had a photographic memory, but I’m not sure it would help).
Last week I came across something in an SDTM validation report that confused me.
The Elusive CARDIOLOGIST
I was 99% sure I had seen the term CARDIOLOGIST before and used it. (But as we all know, it is hard to remember what happened in earlier studies, and just because a term exists doesn’t mean you’re using it in the right place.) In this case, CARDIOLOGIST seemed like a perfect match for an Evaluator (EGEVAL) who has evaluated an ECG test result.
When something makes sense but is reported as an error, it gives me an itch that I need to scratch until I understand the root cause. I’m hoping that it is not my understanding of the standard that is incorrect, although I’m not really looking forward to the alternative either, which is adding another piece of information to my arsenal of expert advice: “Yeah, this is a known inconsistency. But just explain this in the SDRG and all will be fine.”
Search for an Answer
So, is my memory failing me, and the term CARDIOLOGIST does not exist? A quick search of the terminology gives the answer.
Cheerful scream. Yes, I was right: the term CARDIOLOGIST exists. So why is it reported as an issue? Well, of course, I was searching the latest version of CDISC CT (2020-03-27 at the time of writing), and the validation tool probably uses an older version. Correct again: the validation report states that it is using CDISC CT 2019-03-29, so CARDIOLOGIST must have been added after 2019-03-29. A quick search of the history of changes should show when it was introduced.
And the answer is 2016-03-25 and it has not changed since. Which is not the answer I was hoping for. (The itching gets a little more intense.)
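The check I was doing by hand can be written down: a term is part of a CT release if it was added on or before that release date. This is a minimal Python sketch, assuming a hypothetical `TERM_ADDED` lookup built from the CT change history (only the CARDIOLOGIST date, 2016-03-25, comes from this post); it ignores retirements and later changes to a term.

```python
from datetime import date
from typing import Dict

# Hypothetical lookup built from the CT change history: term -> date it was added.
# Only the CARDIOLOGIST date is taken from this post.
TERM_ADDED: Dict[str, date] = {"CARDIOLOGIST": date(2016, 3, 25)}


def term_in_ct_release(term: str, ct_release: date) -> bool:
    """True if the term had been added on or before the given CT release date.

    Simplification: retirements and later changes to a term are ignored.
    """
    added = TERM_ADDED.get(term)
    return added is not None and added <= ct_release


# The validation tool was set up with CDISC CT 2019-03-29.
print(term_in_ct_release("CARDIOLOGIST", date(2019, 3, 29)))  # True
```

So the CT version alone cannot explain the error: the term was already there.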
Checking the SDTMIG
If the term exists, the validation tool might be reporting a false positive. The next step is to check the SDTMIG. (Confession: I always keep a copy of the SDTMIG 3.2 no-portfolio version on every computer I work on, as it is usually the quickest way to get the information I need. I can’t risk being out of advice when the internet doesn’t work, because I don’t have a photographic memory.)
There is no controlled terminology for EGEVAL in SDTMIG 3.2, and the validation tool’s setup shows that it is using SDTMIG 3.2. I’m getting more confused and the itching intensifies. There is no alternative other than to go to SDTMIG 3.3 and hope that the answer is there.
Luckily, SDTMIG 3.3 did not disappoint me. Evaluator does have controlled terminology attached to it: the code list with submission value EVAL.
Still an Issue?
But why did the validator report this as an issue, when the setup used SDTMIG 3.2 and the controlled terminology for EGEVAL was introduced in SDTMIG 3.3? Well, remember that SDTMIG and controlled terminology have different release schedules. Controlled terminology is released four times a year, whereas SDTMIG is released when enough changes have accumulated to justify (the rather hard work of making) a new release. So controlled terminology actually takes precedence over SDTMIG: if terminology exists that supports a variable, that terminology should be used, even if it was released after the SDTMIG version you are using. (Thought: how are you supposed to know that?)
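A validator could encode the precedence rule along these lines. This is a sketch, not how any particular tool works: `resolve_codelist`, `IG_MAPPING` and `CT_MAPPING` are hypothetical names, and only the Evaluator codelist code C78735 is taken from this post.

```python
from typing import Dict, Optional

# Hypothetical mappings from variable to codelist code.
# SDTMIG 3.2 attaches no codelist to EGEVAL; controlled terminology supports it.
IG_MAPPING: Dict[str, str] = {}
CT_MAPPING: Dict[str, str] = {"EGEVAL": "C78735"}  # C78735 = Evaluator


def resolve_codelist(variable: str,
                     ig_mapping: Dict[str, str],
                     ct_mapping: Dict[str, str]) -> Optional[str]:
    """Pick the codelist to validate a variable against.

    Controlled terminology takes precedence over the SDTMIG: if CT supports
    the variable, use it even if that CT was released after the IG version.
    """
    if variable in ct_mapping:
        return ct_mapping[variable]
    return ig_mapping.get(variable)  # None means no codelist check


print(resolve_codelist("EGEVAL", IG_MAPPING, CT_MAPPING))  # C78735
```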
Hunt the variable
So, in this case the validator is correct. My next thought is that there is probably a blank space or special character hiding in the EGEVAL value, invisible to the human eye. Or maybe a spelling error. To make sure I’m not making a spelling error myself, I search for “cardi” in the EVAL code list.
Interesting: it does not yield any results. (Scratching.) So which code list was the term in, if not C78735 – Evaluator? Going back to my first search, I find that CARDIOLOGIST is in the C96777 – Medical Evaluator code list.
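Both suspicions, hidden characters and spelling, can be checked mechanically. The sketch below is an illustration only: `hidden_character_problems`, `search_codelist` and `EVAL_EXCERPT` are hypothetical names, the excerpt is not the full Evaluator codelist, and the choice of Unicode categories to flag (control, format and non-ASCII space characters) is an assumption about what counts as “invisible”.

```python
import unicodedata
from typing import Iterable, List


def hidden_character_problems(value: str) -> List[str]:
    """List invisible problems in a value: surrounding whitespace, control or
    format characters (e.g. zero-width space) and non-ASCII spaces."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    for ch in value:
        category = unicodedata.category(ch)
        # Cc = control, Cf = format (zero-width), Zs = space separator
        if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
            problems.append(f"hidden character U+{ord(ch):04X}")
    return problems


def search_codelist(terms: Iterable[str], fragment: str) -> List[str]:
    """Case-insensitive substring search, like typing 'cardi' into a browser."""
    needle = fragment.casefold()
    return sorted(t for t in terms if needle in t.casefold())


# Hypothetical excerpt; load the full Evaluator (EVAL) codelist from the CT package.
EVAL_EXCERPT = ["INVESTIGATOR", "INDEPENDENT ASSESSOR"]

print(hidden_character_problems("CARDIOLOGIST\u00a0"))
# ['leading or trailing whitespace', 'hidden character U+00A0']
print(search_codelist(EVAL_EXCERPT, "cardi"))  # []
```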
So, we have two code lists for evaluators (and I’m not really liking where this is heading. Scratch. Itching intensifies).
Back to the SDTMIG
Going back to SDTMIG 3.3, I see that the Medical Evaluator code list is attached to EGEVALID (as well as other ‑‑EVALID variables), which is a bit strange given that it contains CARDIOLOGIST and Evaluator doesn’t. ‑‑EVALID is supposed to qualify ‑‑EVAL: if, for example, multiple persons with the same role evaluate the same thing, you can identify each person’s assessment by using ‑‑EVALID. This is described in the CDISC Notes column in SDTMIG 3.3.
My itch forces me to compare the contents of both code lists, and I realise that some values work well for some ‑‑EVAL/‑‑EVALID pairs in the 12 domains that now have these variables (CO, EG, MS, CV, MK, NV, OE, RE, UR, RS, TU and TR), and not so well for others.
As EVAL contains 22 items and MEDEVAL 41, there are 22 × 41 = 902 possible combinations for every domain, and I’m guessing that at least 80% of them are incorrect for any specific domain. Below are some examples of valid EVAL/EVALID combinations that don’t make sense, at least to me.
And below are some examples that make sense but will yield errors in all domains because the values belong to the wrong code list (using terms from MEDEVAL in ‑‑EVAL).
(If you like, you can do the same for, for example, laboratory tests combined with unit, specimen, location etc. These four code lists yield 2069 × 750 × 386 × 107 = 64’090’378’500 possible combinations, and my guess is that 99% of them are invalid. I believe that anyone involved in diabetes research would like the units for glucose to be either mmol/L or mg/dL, and not one of the other 748 units.)
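The counts above are simply products of the codelist sizes, because nothing links a value in one codelist to a value in another. A short Python check of the arithmetic, using the sizes quoted in this post (`combination_count` is a hypothetical helper):

```python
import math


def combination_count(*codelist_sizes: int) -> int:
    """Number of technically valid value combinations when codelists are
    independent, i.e. nothing says which values belong together."""
    return math.prod(codelist_sizes)


# Sizes quoted in this post (CT current at the time of writing).
print(combination_count(22, 41))               # EVAL x MEDEVAL: 902
print(combination_count(2069, 750, 386, 107))  # test x unit x specimen x location: 64090378500
```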
The validation tool could be improved by providing more helpful error messages. In the case above, a more helpful error message could be: “EGEVAL value not found in ‘Evaluator’ extensible codelist. The term used exists in another code list, ‘Medical Evaluator’, used for EGEVALID.”
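A sketch of how such a message could be produced: when a value is missing from the assigned codelist, search the other codelists before reporting. The `Codelist` class, `check_value` and the term excerpts are hypothetical; only the facts that CARDIOLOGIST is in Medical Evaluator (used for EGEVALID) and not in Evaluator (used for EGEVAL) come from this post, and Medical Evaluator’s extensibility is assumed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Codelist:
    name: str
    used_for: str          # variable the codelist is attached to in the SDTMIG
    extensible: bool
    terms: FrozenSet[str]


# Hypothetical excerpts only; load the real terms from the CT package in practice.
EVALUATOR = Codelist("Evaluator", "EGEVAL", True, frozenset({"INVESTIGATOR"}))
MEDICAL_EVALUATOR = Codelist("Medical Evaluator", "EGEVALID", True,  # extensibility assumed
                             frozenset({"CARDIOLOGIST"}))
ALL_CODELISTS: List[Codelist] = [EVALUATOR, MEDICAL_EVALUATOR]


def check_value(variable: str, value: str, assigned: Codelist) -> Optional[str]:
    """Return None if value is in the assigned codelist, else an explanatory message."""
    if value in assigned.terms:
        return None
    kind = "extensible" if assigned.extensible else "non-extensible"
    message = f"{variable} value '{value}' not found in '{assigned.name}' {kind} codelist."
    # Point the user at any other codelist that does contain the term.
    for other in ALL_CODELISTS:
        if other is not assigned and value in other.terms:
            message += (f" The term exists in another codelist, '{other.name}',"
                        f" used for {other.used_for}.")
    return message


print(check_value("EGEVAL", "CARDIOLOGIST", EVALUATOR))
```

The extra lookup is cheap, and it turns a bare “not found” into a hint about the likely mix-up.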
Cure the rash?
The bottom line is that, with our current metadata and terminology setup, it is not possible to write a validator that checks the things that matter. Because the code lists are decoupled, we can construct records that have the wrong meaning but are technically valid. We need to make the relationships between the things we collect/observe explicit, in a format that both computers and humans can understand.
Unfortunately, I now have a rash, and I need to add another inconsistency to my arsenal of advice, although I would rather fix the root cause. Using study data reviewer’s guides to explain data that makes sense, because we cannot express those relationships properly for computers, is not a sustainable way forward. Perhaps my advice for the future would be:
Define the relationships for the data you collect so that computers also understand them – in other words, create Biomedical Concepts.
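To illustrate the idea only (this is not the CDISC Biomedical Concept model), here is a hypothetical, heavily simplified structure in which the meaningful values hang off the concept rather than off free-standing codelists. `BiomedicalConceptSketch` is an invented name, and the glucose units are the ones from the laboratory example earlier in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class BiomedicalConceptSketch:
    """Hypothetical stand-in for a Biomedical Concept: the concept plus,
    for each related variable, the values that make sense for it."""
    name: str
    allowed_values: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def violations(self, record: Dict[str, str]) -> List[str]:
        """Flag values that may be valid CT terms but are meaningless here."""
        problems = []
        for variable, allowed in self.allowed_values.items():
            value = record.get(variable)
            if value is not None and value not in allowed:
                problems.append(f"{variable}={value!r} is not meaningful for {self.name}")
        return problems


# The glucose example from this post: only two units make sense.
GLUCOSE = BiomedicalConceptSketch("Glucose", {"LBORRESU": frozenset({"mmol/L", "mg/dL"})})

print(GLUCOSE.violations({"LBTESTCD": "GLUC", "LBORRESU": "mg/dL"}))  # []
print(GLUCOSE.violations({"LBTESTCD": "GLUC", "LBORRESU": "g/L"}))
# ["LBORRESU='g/L' is not meaningful for Glucose"]
```

The same structure could, in principle, list the meaningful ‑‑EVAL/‑‑EVALID pairs per domain, which is exactly the relationship that separate codelists cannot express.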
Johannes is an authorised SDTM Trainer, part of the A3 Informatics software team and regularly consults on CDISC implementation. If you need help or have questions please get in touch.