Modeling rare diseases with AI for prediction | Volv

Working with electronic health records

inTrigue is a standardised, repeatable methodology that produces robust models despite the challenges of messy, gappy, sparse and unstructured real-world data, which is the form electronic health records (EHRs) typically take.

Developing computational models capable of detecting rare disease patients in population-scale databases such as EHRs is challenging for several reasons, perhaps the most daunting of which is the limited number of already-diagnosed, ‘labelled’ patients from which to learn.

We overcome this obstacle with a novel lightly-supervised algorithm that leverages unlabelled and/or unreliably-labelled patient data – which is typically plentiful – to facilitate model induction. Importantly, we can prove the algorithm is safe. Adding unlabelled/unreliably-labelled data to the learning procedure produces models which are usually more accurate, and guaranteed never to be less accurate, than models learned from reliably-labelled data alone.
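The page does not describe how the lightly-supervised algorithm works. As a rough illustration of the general idea only, the Python sketch below uses a different, well-known technique: self-training with pseudo-labels, plus a simple safety check that keeps the semi-supervised model only if it scores at least as well as the labelled-only model on held-out labelled data. This is not inTrigue. Unlike the guarantee claimed above, the check only protects accuracy on that validation set, and the sketch handles unlabelled data but not unreliably-labelled data. The nearest-centroid classifier, the function names and the `min_margin` threshold are all hypothetical choices made for the example.

```python
# Hypothetical sketch: safe self-training with a nearest-centroid classifier.
# This is NOT Volv's inTrigue algorithm; all names here are illustrative.
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Dict[int, List[float]]


def fit_centroids(X: List[Vector], y: List[int]) -> Model:
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(model: Model, x: Vector) -> Tuple[int, float]:
    """Return (label, margin); margin is the gap between the two nearest centroid distances."""
    dists = sorted((sq_dist(x, c), label) for label, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def accuracy(model: Model, X: List[Vector], y: List[int]) -> float:
    return sum(predict(model, x)[0] == t for x, t in zip(X, y)) / len(y)


def safe_self_train(X_lab, y_lab, X_unlab, X_val, y_val, min_margin: float = 1.0):
    """Self-training with a validation-based fallback to the labelled-only model."""
    baseline = fit_centroids(X_lab, y_lab)

    # Pseudo-label only the unlabelled points the baseline is confident about.
    X_aug, y_aug = list(X_lab), list(y_lab)
    for x in X_unlab:
        label, margin = predict(baseline, x)
        if margin >= min_margin:
            X_aug.append(x)
            y_aug.append(label)
    candidate = fit_centroids(X_aug, y_aug)

    # Safety check: keep the semi-supervised model only if it is no worse than the
    # baseline on held-out labelled data. This bounds validation accuracy only; it
    # is not a proof that accuracy on new patients can never drop.
    if accuracy(candidate, X_val, y_val) >= accuracy(baseline, X_val, y_val):
        return candidate, "semi-supervised"
    return baseline, "supervised-only"


# Toy usage on made-up one-dimensional data.
model, mode = safe_self_train(
    X_lab=[[0.0], [1.0], [9.0], [10.0]], y_lab=[0, 0, 1, 1],
    X_unlab=[[0.5], [2.0], [8.0], [9.5]],
    X_val=[[0.2], [9.8]], y_val=[0, 1],
)
print(mode)  # semi-supervised
```

The fallback is the design choice that matters here: if pseudo-labelling drags a class centroid in the wrong direction and hurts validation accuracy, the sketch returns the labelled-only model unchanged.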

Volv's methods substantially outperform state-of-the-art patient-finding models, as the results below show.

State-of-the-Art Performance

Compare our results:

inTrigue: accuracy = 90.8% (standard error = 0.4%) and AUC = 93.0% (standard error = 0.1%)

Kopcke (L2-regularised logistic regression on the full feature set): accuracy = 73.1% (standard error = 1.1%) and AUC = 74.9% (standard error = 1.0%)

Miotto (classification based on relevance scoring on the full feature set): accuracy = 74.2% (standard error = 1.0%) and AUC = 75.8% (standard error = 1.0%)
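The page reports accuracy and AUC with standard errors but does not state how they were computed. As a hedged sketch only, the Python below shows one common way to compute each quantity: accuracy as the fraction of correct labels, ROC AUC through its pairwise (Mann-Whitney) interpretation, and the standard error as the sample standard deviation over repeated evaluation runs divided by the square root of the number of runs. The use of per-fold runs and every function name here are assumptions, not Volv's evaluation code.

```python
# Hypothetical sketch of how accuracy, ROC AUC and a standard error could be
# computed; the page does not say how its figures were produced.
import math
from statistics import mean, stdev
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive patient is
    scored above a randomly chosen negative one (ties count as half).
    This pairwise form is O(n_pos * n_neg), fine for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def standard_error(per_run: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(number of runs).
    Assumes the runs (e.g. cross-validation folds) are treated as independent."""
    return stdev(per_run) / math.sqrt(len(per_run))


# Toy usage on made-up numbers (not the results reported above).
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))          # 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
fold_aucs = [0.90, 0.92, 0.91]
print(mean(fold_aucs), standard_error(fold_aucs))
```

If the reported standard errors were computed this way, a small value such as 0.1% would indicate that the per-run AUCs varied little around their mean; the number of runs behind the figures is not stated on the page.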

This means we identify cohorts more accurately, eliminating inappropriate candidates and finding the true candidates more successfully. For clinical trials, for example, this delivers better-quality recruitment and retention.
