Migraine Classification: Building a Synthetic Minority Oversampling Technique (SMOTE) Migraine Model
Migraine is not just a bad headache. It is a disabling neurological disorder with different symptoms and different treatment approaches compared to other headache disorders. The American Migraine Foundation estimates that at least 39 million Americans live with migraine, but because many people do not get a diagnosis or the treatment they need, the actual number is probably higher.
Although migraine is a common disease with substantial impact, it is underdiagnosed and undertreated.
A paucity of relevant literature on initiating a migraine diagnosis and properly classifying the type of migraine often leads to underdiagnosis and undertreatment, or even a delayed diagnosis.
Recently I stumbled upon some interesting migraine data from the American Migraine Foundation, which records millions of migraine ‘episodes’ along with their presenting ‘features’, or symptoms. Because classification of such a disorder is highly imbalanced in nature, the Synthetic Minority Oversampling Technique (SMOTE) seemed the way to go.
What is the Synthetic Minority Oversampling Technique (SMOTE)?
Imbalanced classification involves developing predictive models on classification datasets that have a severe class imbalance.
The challenge of working with imbalanced datasets is that most machine learning techniques will ignore, and in turn have poor performance on, the minority class, although typically it is performance on the minority class that is most important.
One approach to addressing imbalanced datasets is to oversample the minority class. The simplest approach involves duplicating examples in the minority class, although these examples don’t add any new information to the model. Instead, new examples can be synthesized from the existing examples. This is a type of data augmentation for the minority class and is referred to as the Synthetic Minority Oversampling Technique, or SMOTE for short.
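The simplest duplication approach described above can be sketched in a few lines. This is a minimal toy example (the dataset here is random stand-in data, not the migraine data): it balances the classes by copying randomly chosen minority rows, which is exactly the approach that adds no new information.

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy imbalanced dataset: 20 majority samples (class 0), 4 minority (class 1).
X = rng.normal(size=(24, 2))
y = np.array([0] * 20 + [1] * 4)

# Naive oversampling: duplicate randomly chosen minority rows until balanced.
minority_idx = np.flatnonzero(y == 1)
n_needed = (y == 0).sum() - (y == 1).sum()
dup_idx = rng.choice(minority_idx, size=n_needed, replace=True)

X_balanced = np.vstack([X, X[dup_idx]])
y_balanced = np.concatenate([y, y[dup_idx]])

print(X_balanced.shape)        # (40, 2) -- 20 samples per class
print(np.bincount(y_balanced)) # [20 20]
```

Every added row is an exact copy of an existing minority example, which is why SMOTE instead synthesizes new, slightly different points.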
SMOTE works by selecting examples that are close in the feature space, drawing a line between the examples in the feature space and drawing a new sample at a point along that line.
Specifically, a random example from the minority class is first chosen. Then k of the nearest neighbors for that example are found (typically k=5). A randomly selected neighbor is chosen and a synthetic example is created at a randomly selected point between the two examples in feature space.
The approach is effective because new synthetic examples from the minority class are created that are plausible, that is, are relatively close in feature space to existing examples from the minority class.
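The interpolation step just described can be sketched directly with NumPy. This is an illustrative implementation of the core SMOTE idea (pick a minority sample, pick one of its k nearest minority neighbors, synthesize a point along the line between them), not the full production algorithm; the toy data and the `smote_samples` helper name are my own.

```python
import numpy as np

def smote_samples(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class only.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # For each sample, the indices of its k nearest neighbors (self excluded).
    neighbors = np.argsort(dists, axis=1)[:, 1:k + 1]

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)            # random minority sample
        j = rng.choice(neighbors[i])   # one of its k nearest neighbors
        gap = rng.random()             # random position along the line segment
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic

# Toy minority class: 6 points in a 2-D feature space.
rng = np.random.default_rng(1)
X_min = rng.normal(size=(6, 2))
X_new = smote_samples(X_min, n_synthetic=10)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point is a convex combination of two real minority examples, it necessarily lies inside the region already occupied by the minority class, which is what makes the new examples plausible.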
SMOTE Migraine Classifier Model:
Utilizing the nearest-neighbors school of thought, SMOTE forms links between relatively close records.
This study attempted to understand the relevance and the strength of certain migraine associated ‘features’ or symptoms and their relevance in classification.
Although the location of the migraine around the head is a strong indicator, the medical literature also describes other closely associated features that carry high significance for proper classification of the subject.
We can see that the SMOTE Migraine Model closely follows the classification trend previously established by medical experts and performs on par with expert diagnosis.
With an accuracy of ~91%, the SMOTE Migraine Model outperforms trivial baseline algorithms, and even some trained models, in properly classifying the subject.
The SMOTE Migraine Model is just a small step towards an era of ‘Intelligent Diagnosis’. Similar understandings can be utilised to build models that predict the onset of migraine in patients even before their first episode occurs. Knowing the exact type of migraine a particular subject might be afflicted with, early diagnosis enables effective prophylaxis and less exposure to abortive medicines such as opioids.