Meeting the challenge
This pharmaceutical company faced four difficulties:
- the disease had a prevalence of one in a million of the population
- specialist clinicians were able to diagnose the disease with no more than 76% accuracy
- only one in four patients was ever identified
- those who were identified were generally diagnosed only after six years of misdiagnoses.
To compound these four difficulties, the pharmaceutical company was unable to provide a single medical record for an already diagnosed patient. In the absence of any examples, Volv undertook to build a methodology that would learn about the disease automatically from publicly available data.
We stated three intentions: to deliver patient cohort selection performance above current, manual best practice; to derive new insights into diagnosing the disease; and to become better at diagnosis than experienced specialist clinicians.
Deriving diagnostic models for rare diseases manually, based on input from experts, is problematic for several reasons, including the evolving nature of experts’ understanding, the complexity of disease processes, and cognitive biases associated with human decision-making [Elstein, Schwarz 2002].
Testing & Results
The Volv approach lifts clinical accuracy for the detection of at-risk patients above that of both specialist clinicians and the published prediction models it was tested against.
It outperformed the existing state-of-the-art prediction models, with a two-thirds reduction in error rate. How well a prediction model separates true cases from non-cases is measured using the Area Under the ROC Curve (‘AUC’), which summarises the trade-off between true positives and false positives across all decision thresholds. An AUC of 1.00 indicates perfect discrimination, and 0.50 is no better than chance. Volv’s prediction model achieved an AUC of 0.935, which is well above the human performance level of 0.76 and the best alternative models (Kopcke and Miotto).
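As an illustration of what an AUC figure measures, the sketch below computes it on a small, made-up set of labels and scores (not data from this study). It relies on the rank interpretation of AUC: the probability that a randomly chosen true case receives a higher score than a randomly chosen non-case. The function name `auc` and the toy values are assumptions for illustration only and are not part of Volv's system.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = patient has the disease, 0 = does not.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auc = auc(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

In this toy set, one true case (score 0.4) is ranked below one non-case (score 0.7), so 11 of the 12 possible pairs are ordered correctly. A model that ranked every true case above every non-case would score 1.00.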
- Volv model:
  - accuracy = 90.8% (standard error = 0.4%)
  - AUC = 93.5% (standard error = 0.1%)
- Kopcke model (L2-regularised logistic regression on full feature set [Kopcke et al. 2013]):
  - accuracy = 73.1% (standard error = 1.1%)
  - AUC = 74.9% (standard error = 1.0%)
- Miotto model (classification based on relevance scoring on full feature set [Miotto, Weng 2015]):
  - accuracy = 74.2% (standard error = 1.0%)
  - AUC = 75.8% (standard error = 1.0%)
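The two-thirds reduction in error rate can be checked against the accuracies reported above. The sketch below assumes that error rate means 1 minus accuracy, which is our reading of how the figure was derived rather than something the study states. The helper name `relative_error_reduction` is illustrative.

```python
# Accuracies as reported in the results above.
reported_accuracy = {"Volv": 0.908, "Kopcke": 0.731, "Miotto": 0.742}


def relative_error_reduction(new_accuracy, baseline_accuracy):
    """Fraction by which the error rate falls relative to the baseline.

    Assumes error rate = 1 - accuracy.
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


reductions = {
    name: relative_error_reduction(reported_accuracy["Volv"], accuracy)
    for name, accuracy in reported_accuracy.items()
    if name != "Volv"
}

for name, reduction in reductions.items():
    print(f"Error reduction vs {name}: {reduction:.1%}")
```

Under that assumption, the reduction is roughly 64% to 66% against either baseline, which is consistent with the two-thirds figure.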
When the system and methodology were used to augment the physician, accuracy rose to 0.975.