[Figure: synthetic patients, contrast]

In some applications, labeled training data are extremely limited and/or class-imbalanced, which is an obstacle to effective learning. This situation arises, for example, because it is often difficult to collect smartphone data traces from individuals with a confirmed diagnosis of the Target Disease (TD) of interest, and because typical datasets contain far fewer ‘cases’ (patients with the TD) than ‘controls’ (unaffected patients).

We hypothesize that, if sufficiently realistic ‘synthetic’ patient vectors could be obtained, especially for the minority class, adding these vectors to the training set could improve the predictive modeling process. Specifically, it may be possible to increase the predictive accuracy of our algorithms by generating synthetic patient vectors, combining these vectors with real labeled examples, and training our algorithms on this larger ‘hybrid’ dataset. The basic aspects of this learning from synthetic patients (LSP) method are now described. (See [1,2] for a more thorough exposition.)
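The ‘hybrid’ dataset construction can be illustrated with a minimal Python sketch. It is not the LSP implementation from [1,2]: the function name `build_hybrid_dataset`, the toy array sizes, and the Gaussian stand-in for generated synthetic patients are all assumptions made for illustration. The sketch stacks real and synthetic vectors, shuffles them, and keeps a mask of synthetic rows so that evaluation can later be restricted to real patients.

```python
import numpy as np


def build_hybrid_dataset(X_real, y_real, X_synth, y_synth, seed=0):
    """Stack real and synthetic patient vectors and shuffle them together.

    Returns features, labels, and a boolean mask marking synthetic rows.
    (Hypothetical helper, not part of the original method description.)
    """
    X = np.vstack([X_real, X_synth])
    y = np.concatenate([y_real, y_synth])
    is_synth = np.concatenate([
        np.zeros(len(y_real), dtype=bool),
        np.ones(len(y_synth), dtype=bool),
    ])
    # Shuffle so that synthetic rows are not grouped at the end.
    order = np.random.default_rng(seed).permutation(len(y))
    return X[order], y[order], is_synth[order]


# Toy data: 20 controls and 3 cases, each with 5 features.
rng = np.random.default_rng(42)
X_real = rng.normal(size=(23, 5))
y_real = np.array([0] * 20 + [1] * 3)

# Placeholder for generated minority-class synthetic patients (assumption:
# a real generator would replace this random draw).
X_synth = rng.normal(loc=1.0, size=(17, 5))
y_synth = np.ones(17, dtype=int)

X_hyb, y_hyb, is_synth = build_hybrid_dataset(X_real, y_real, X_synth, y_synth)
```

Any classifier can then be trained on `X_hyb` and `y_hyb`; held-out performance should still be measured on real patients only, which is why the mask is returned.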

Previous studies have found that, to be useful for improving learning performance, synthetic examples should be realistic and diverse [2,3,4]. We integrate two complementary mechanisms to create such synthetic patients.

  • Realism is achieved by using adversarial learning to produce synthetic patient feature vectors (SPs) that are indistinguishable from those of real patients (RPs) [1,2] and by working directly in the original patient feature space.
  • Diversity is obtained by appropriately injecting randomness into SP construction, with an emphasis on outputting SPs that are well separated in feature space [3,1,2].
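One simple way to picture the "well separated in feature space" requirement is a greedy farthest-point filter over candidate vectors. The Python sketch below is an illustrative stand-in and not the diversity mechanism of [1,2,3]; the function name `select_diverse` and the toy points are assumptions. At each step it keeps the candidate whose distance to its nearest already-selected vector is largest.

```python
import numpy as np


def select_diverse(candidates, k):
    """Greedy farthest-point selection of k well-separated rows.

    Returns the indices of the selected candidates. (Hypothetical helper
    used only to illustrate the idea of diversity in feature space.)
    """
    candidates = np.asarray(candidates, dtype=float)
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    chosen = [0]  # start from the first candidate (arbitrary choice)
    # Distance from every candidate to its nearest already-chosen vector.
    min_dist = np.linalg.norm(candidates - candidates[0], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(candidates - candidates[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(chosen)


# Toy candidates: two near-duplicate pairs and one isolated point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 0.1], [0.0, 5.0]])
picked = select_diverse(points, 3)
```

On the toy points, the filter keeps one vector from each near-duplicate pair plus the isolated point, which is the behavior the diversity requirement asks for.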

As demonstrated empirically in subsequent sections, the resulting SPs are useful and convenient for downstream analysis (e.g., predictive modeling) and are also intuitively interpretable by clinicians.

[Figure: synthetic patients]

This basic idea is formalized with the SP generation and learning algorithms. The main focus is applications with very few positive-class training examples, as this is common in smartphone-based health monitoring.

However, the method as described can create synthetic patients corresponding to both positive-class and negative-class RPs if needed.