Working with electronic health records
Developing computational models that can detect rare-disease patients in population-scale databases, such as electronic health records (EHRs), is challenging for several reasons. Perhaps the most daunting is the small number of already-diagnosed, ‘labelled’ patients from which to learn.
We overcome this obstacle with a novel, lightly supervised algorithm that leverages unlabelled and/or unreliably labelled patient data (which is typically plentiful) to facilitate model induction. Importantly, we can prove that the algorithm is safe: adding unlabelled or unreliably labelled data to the learning procedure produces models that are usually more accurate, and guaranteed never to be less accurate, than models learned from reliably labelled data alone.
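The algorithm itself is not described here, so the following is only a generic illustration of the underlying idea, not Volv's method. It is a minimal sketch of self-training with a "safety guard", assuming a toy nearest-centroid classifier and a small reliably labelled validation set. The model trained on labelled data alone is used to pseudo-label the unlabelled patients, a second model is fitted on the combined data, and the augmented model is kept only if it is at least as accurate on the held-out set. All function names are hypothetical. Note that this guard only protects against losing accuracy on that validation set; it is not a proof-level guarantee of the kind claimed above.

```python
from statistics import mean

# Hypothetical toy example: patients are feature tuples, labels are 1 (diagnosed) or 0 (not).

def fit_centroids(points, labels):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    centroids = {}
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        centroids[cls] = tuple(mean(coord) for coord in zip(*members))
    return centroids

def predict(centroids, point):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(point, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

def accuracy(centroids, points, labels):
    hits = sum(predict(centroids, p) == y for p, y in zip(points, labels))
    return hits / len(labels)

def safe_self_train(labelled, labels, unlabelled, val_points, val_labels):
    """Self-training with a fallback: never return a model that is worse on the validation set."""
    baseline = fit_centroids(labelled, labels)
    # Pseudo-label the unlabelled patients with the supervised-only model.
    pseudo = [predict(baseline, p) for p in unlabelled]
    augmented = fit_centroids(labelled + unlabelled, labels + pseudo)
    # Safety guard: keep the augmented model only if held-out accuracy does not drop.
    if accuracy(augmented, val_points, val_labels) >= accuracy(baseline, val_points, val_labels):
        return augmented
    return baseline
```

For example, with two labelled patients and four unlabelled ones that cluster cleanly around them, the augmented model passes the guard; if pseudo-labels drag a centroid in the wrong direction, the supervised-only model is returned instead. A validation-based check like this is the simplest form of "safe" semi-supervised learning; methods with formal guarantees rely on stronger constructions.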
Volv's methods have been shown to substantially outperform state-of-the-art models at patient finding.