Working with electronic health records
Developing computational models capable of detecting rare-disease patients in population-scale databases such as electronic health records (EHRs) is challenging for several reasons. Perhaps the most daunting is the limited number of already-diagnosed, ‘labelled’ patients from which to learn.
We overcome this obstacle with a novel lightly supervised algorithm that leverages unlabelled and/or unreliably labelled patient data, which is typically plentiful, to facilitate model induction. Importantly, we can prove that the algorithm is safe: adding unlabelled or unreliably labelled data to the learning procedure produces models that are usually more accurate than, and guaranteed never to be less accurate than, models learned from reliably labelled data alone.
As the results below show, Volv's methods substantially outperform state-of-the-art patient-finding models.
State-of-the-Art Performance
Compare our results with those of competing approaches:
inTrigue: accuracy = 90.8% (standard error 0.4%); AUC = 93.0% (standard error 0.1%)
kopcke (L2-regularised logistic regression on the full feature set): accuracy = 73.1% (standard error 1.1%); AUC = 74.9% (standard error 1.0%)
miotto (classification based on relevance scoring on the full feature set): accuracy = 74.2% (standard error 1.0%); AUC = 75.8% (standard error 1.0%)
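The page does not state how these figures were computed. As a hedged illustration only, assuming each metric was averaged over k cross-validation folds and that the standard error is the sample standard deviation divided by √k, the sketch below computes accuracy and AUC per fold and then summarises them. The fold data and the 0.5 decision threshold are hypothetical and are not Volv's data or method; AUC is computed with the standard Mann-Whitney (rank-comparison) formulation.

```python
import math
from statistics import mean, stdev


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of patients correctly classified when score >= threshold means 'positive'.
    The 0.5 threshold is an assumption for illustration."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_and_se(values):
    """Mean and standard error of the mean (sample std / sqrt(k)) over k folds."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical per-fold (labels, scores) pairs, for illustration only.
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
    ([0, 1, 0, 1], [0.3, 0.7, 0.6, 0.9]),
]

acc_mean, acc_se = mean_and_se([accuracy(y, s) for y, s in folds])
auc_mean, auc_se = mean_and_se([auc(y, s) for y, s in folds])

print(f"accuracy = {acc_mean:.1%} (standard error {acc_se:.1%})")
print(f"AUC = {auc_mean:.1%} (standard error {auc_se:.1%})")
```

With only three toy folds the standard errors are large; a smaller standard error, as in the reported results, reflects more consistent performance across folds or more evaluation data.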
This means we identify cohorts more accurately, eliminating inappropriate candidates and finding true candidates more reliably. In clinical trials, for example, this delivers better-quality recruitment and retention.