We have reached the tipping point. After years of deliberation, discussion and delays, it’s time to organise the knowledge that can propel the clinical research industry towards high-quality, reusable metadata. Dave Iberson-Hurst reflects on the journey towards Biomedical Concepts.
This is a personal reflection on the things we refer to as Biomedical Concepts (BCs). As such, it may be a little biased and, as a recollection of many years of work, my memory may have failed a little. I thought it might be interesting to reflect because I am seeing a growing interest in BCs, not just in CDISC materials from the 360 project, but in places such as Transcelerate and their Digital Data Flow project.
I could start the story in December 2007, but when I scan some old slides I see some of the key ideas behind BCs in presentations I gave to the FDA in 2004 about using the CDISC Operational Data Model (ODM) for end-to-end traceability. I presented a vision of a reviewer being able to click on a link within an SDTM dataset and be presented with the corresponding CRF, the audit trail and the links from that source data to the SDTM tabulation. Looking back, I see a linked data world. In April 2007, at the CDISC European Interchange in Montreux, I helped organise a two-day end-to-end workshop using CDISC standards in an attempt to show that we could break down the silos that plague our work. A linked, silo-free world is a key theme in the BC discussion.
Wikinomics and Biomedical Concepts
But the first tangible memory is December 2007. I found myself travelling to Eli Lilly in Indianapolis for a CDISC board meeting. I remember taking my travel bike with me and cycling some of the bike routes the day before the meeting. There was a smattering of snow and ice on the ground that made riding somewhat amusing, but it is always good to cycle around somewhere new: you see so much more, things that you might ordinarily miss.
We were having the meeting at Eli Lilly because Steve Ruberg was standing down as CDISC board chair after his two years in office, and he came to the meeting highly enthused. He was enthused not because he was stepping down but because he had just read Wikinomics and saw in it a vision of mass collaboration as a means of speeding the development of the CDISC standards. Steve proposed that CDISC should pursue a similar strategy. I was there in my then role as head of CDISC’s technical work. At the end of the meeting, I was chatting with Steve and others about the notion. I commented that I saw the sense in having a set of definitions in a bucket in the centre of the room that we [industry] could share: a bucket of Lego bricks – standard components – that would always fit together and be consistent across the industry.
Don’t throw rocks
As an aside, the CDISC lead technical role is a fascinating one, but it can also be a lonely place. I would urge everyone to support Peter van Reusel, who currently occupies the role. It is a role in which you can never win, for there will always be someone telling you that you did the right thing, immediately followed by someone telling you that you got it totally wrong. The problem is that those who tell you that you are doing it wrong will outnumber those who support you. They say you should walk a mile in someone’s shoes. Well, I have walked that road in those shoes, and it can be a soul-destroying walk; you are trying to change the direction of a supertanker using a canoe paddle. So, to the community out there I would say, yes, it is OK to disagree. If you do, come with alternatives, come with solutions; don’t just throw rocks.
CDISC decided that the idea of shared definitions and mass collaboration was worth running with but, at the time, had little idea of what it meant. It also came at a time of many challenges for CDISC, with politics, as ever, slowing progress. Running such a project with volunteers was, shall we say, ‘interesting’; at the time CDISC had very few full-time technical staff. I remember attending a meeting in the SAS office in Arlington, one that very much sticks in my mind. I summarise that day with the phrase,
“we didn’t manage to get on the same page, we didn’t even manage to get on the same planet”.
On reflection, the mistake I made in those early days was not trying something, anything. Prototype, iterate, build something, clarify the problem. We didn’t understand the problem.
Metadata and the birth of the Biomedical Concept
By 2010 I had left CDISC and had become fascinated with the metadata issue. If my memory is not too jaded, I think it was February 2010. I was in a meeting discussing metadata and the light bulb came on. It wasn’t so much a light bulb as a blinding flash. Maybe I am just slow, but suddenly the pieces fell into place and I had a clear picture of what was needed. I knew the fundamentals behind what we do as an industry and the mechanics needed to achieve it. To me, that was the day the BC was born in my head.
I was now consulting and still thinking about BCs in the background, but by 2012 I had managed to implement some early BCs in MS Excel. OK, you can stop laughing now, given my view of Excel. I did what we all do when in doubt: click on the green X icon and fire up Excel. It was slightly better than that, as it was structured and organised, but it was still Excel with some (OK, a lot of) VBA behind it.
CDISC Therapeutic Areas & Machine Readable Support
In January 2012, I wrote a paper that I distributed to some leading thinkers in the industry. These were the early days of Therapeutic Area (TA) development. My concern was that, without solid machine-readable support, the outputs would not be as consistent or as usable as they could be. We were still in the world of PDF, manual transcription and incomplete definitions.
The note suggested developing the TA content using standard formats within Excel while, in parallel, building the CDISC Library tooling as an open-source project. As both developed, in an iterative manner, the two could be brought together, with the spreadsheet content loaded into the tooling. I stressed at the time that we should “not attempt to link this to other initiatives” and that the work should “be kept simple”, so that progress was not impeded.
By late 2014, my major takeaway from the Excel implementation was that the idea was sound and the general model was right. It worked: the tooling demonstrated that we could achieve what we wanted to. But it also failed, and the lesson was: don’t do it in Excel. It failed because it didn’t let users manipulate the various building blocks with ease. The human interaction did not meet the needs of the users.
The Arrival of Graph Technologies
Within a few years, we were seeing the emergence of graph technologies in a few large sponsor companies such as AZ and Roche. The combination of graph technology and the problem space caught my imagination. On July 23rd, 2015, I sat in my office with my feet on the desk watching stage 18 of the Tour de France while contemplating taking a year off work (well, not earning any money) to fund a year of research into BCs. I had long realised that you cannot convince an audience with PowerPoint; slides don’t cut it. You have to make it real, learn, make mistakes, and show people things working.
My old home office still makes me smile. It was a great place to work, but I like to have a TV to follow the cycling while I work. My excuse was that it would be great for online meetings. I bought a TV for the office and, as they say, ‘went large’. When it turned up, it was slightly bigger than I had expected, somewhat dominating the room. My wife has never let me forget it.
The second half of 2015 and all of 2016 were spent working through the issues: two steps forward, one step back, up some dark alleys and back out again, learning new skills and bringing back to life some old ones that had gone unused for a few years. By 2019, there was a basic product and a new company had been formed. As ever, there was more work to do. The user experience and user interface needed work, so we focused on that while taking the opportunity to incorporate lessons learned.
EHRs, FHIR & SDTM
Around 2017, I did some prototyping work to demonstrate some of the inherent power in BCs. This involved taking CRFs built using BCs and using the metadata from the BCs to automatically determine the LOINC codes needed to request data from an Electronic Health Record (EHR) using the HL7 FHIR standard. The returned data could then be used to populate the CRF and automatically generate an SDTM domain. In 2018, further prototyping work demonstrated mining BCs from existing resources, so that the necessary library of BCs could be built rapidly and without too much effort.
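The EHR-to-SDTM flow described above can be sketched in a few lines. This is a minimal illustration, not the prototype itself: it assumes a BC can be reduced to a name, a LOINC code and an SDTM target, and the `BiomedicalConcept` class, both helper functions and the EHR base URL are hypothetical. It uses LOINC 8480-6 (systolic blood pressure) and the FHIR `Observation` search with a `system|code` token; it builds the query rather than calling a server, and it skips controlled terminology, unit conversion and missing values.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class BiomedicalConcept:
    """Hypothetical, minimal BC: one concept plus the codes binding it to other standards."""
    name: str
    loinc_code: str   # code used to request the data from the EHR via FHIR
    sdtm_domain: str  # target SDTM domain, e.g. "VS"
    sdtm_testcd: str  # value for the domain's --TESTCD variable
    unit: str         # fallback unit if the EHR omits one


def fhir_search_url(base: str, patient_id: str, bc: BiomedicalConcept) -> str:
    """Build a FHIR Observation search from the BC's LOINC code (system|code token)."""
    params = {"patient": patient_id, "code": f"http://loinc.org|{bc.loinc_code}"}
    return f"{base}/Observation?{urlencode(params)}"


def observation_to_sdtm(obs: dict, usubjid: str, bc: BiomedicalConcept) -> dict:
    """Map one FHIR Observation (parsed JSON) to an SDTM-style record for the BC's domain."""
    quantity = obs["valueQuantity"]
    prefix = bc.sdtm_domain
    return {
        "DOMAIN": prefix,
        "USUBJID": usubjid,
        f"{prefix}TESTCD": bc.sdtm_testcd,
        f"{prefix}ORRES": str(quantity["value"]),
        f"{prefix}ORRESU": quantity.get("unit", bc.unit),
        f"{prefix}DTC": obs.get("effectiveDateTime", ""),
    }


# Example: one BC drives both the EHR query and the SDTM output.
sysbp = BiomedicalConcept("Systolic Blood Pressure", "8480-6", "VS", "SYSBP", "mmHg")
query = fhir_search_url("https://ehr.example.org/fhir", "123", sysbp)
sample_obs = {  # a hand-written stand-in for a server response
    "resourceType": "Observation",
    "valueQuantity": {"value": 120, "unit": "mmHg"},
    "effectiveDateTime": "2017-03-01T09:30:00Z",
}
record = observation_to_sdtm(sample_obs, "STUDY1-001", sysbp)
```

The design point is that the LOINC code and the SDTM target sit on the same concept, so the CRF, the EHR request and the tabulation all draw on one definition rather than three separately maintained mappings.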
Recently we have seen the CDISC 360 project running. I got involved early on, but I was not able to dedicate the necessary time to it. From an internal perspective, and that of those who pay the bills, the work on the Transcelerate DDF project came along at the same time, and it made more sense for us to address the issues raised there. It filled in the pieces we needed for our own internal work. Transcelerate ran a hackathon to allow vendors to present ideas and prototype implementations. These ideas included BCs, and it was statements in their vision document, such as the following, that grabbed my attention and propelled our work forward:
“The primary endpoints linked to this objective are limited to ‘absolute change in percent of predicted FEV1 from baseline to [Week X]’.”
You can see the work we covered in the presentation that Johannes Ulander and I gave at a PHUSE Bitesize event in June this year.
What Can Biomedical Concepts do for Me?
I believe BCs make our world more manageable and more flexible, and help make our data more useful. BCs are getting more attention now. We need to build the library to help the community take the next step forward. But I urge caution as well. We have a habit of trying to solve every small issue and make a solution perfect. Don’t. Follow the 80/20 principle: deliver something, gain experience, learn and understand the problem. Iterate, build some more, discuss and learn, reverse if necessary, repeat; keep your eye on the end point and maintain your vision.
Kirsten Langendorf and I presented at the CDISC French User Group on 26th November, explaining why we need Biomedical Concepts now and how we can put the data back at the centre of what we do. You can watch it here.
Ultimately, we need to focus on supporting high-quality data: consistent and complete, structured in its natural form, supporting current standards, and designed to maximise the utility of the data while preventing its decay. I have always described data decay as the inability to use data a few years after it has been archived.
From 2007 to today, there have been many people with whom I have discussed the topic. I am fortunate to work with two of the best, Kirsten Langendorf and Johannes Ulander. I have enjoyed many a good debate with both over the years, and more than the occasional beer. I have discussed this with many others, with a few robust conversations along the way, but some stand out: Diane Wold, Tim Williams, Jozef Aerts, Scott Bahlavooni, Armando Oliva, Peter van Reusel, Sam Hume and some at the FDA. There are many others, I am certain; apologies for not mentioning them.
CDISC and the community have an opportunity to accelerate progress on BCs and improve the standards that we use today. We must not lose it. We need to demonstrate the benefits of BCs, and we need a library of BCs. We can do that today.