A Data-Mining Approach for Investigating Social and Economic Geographical Dynamics of beta-Thalassemia's Spread

Akay A., Dragomir A., Yardimci A., Canatan D., Yesilipek A., Pogue B. W.

IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, vol.13, no.5, pp.774-780, 2009 (SCI-Expanded)


beta-Thalassemia is a genetic disorder that causes anemia and remains a major global health issue, especially in a globalized era in which public health, economics, and education are tightly interwoven. Previous studies have examined the disease's rate and heredity. This study analyzed beta-thalassemia's socioeconomic geography and how it affects the afflicted population. We processed survey data and mined it with self-organizing maps to identify the underlying data structure. We hypothesized that certain variables mark subgroups within the affected population. We aimed to identify these subgroups and used a correlation-based measure to assess each variable's importance in distinguishing them. The population's education level was one of the major factors dividing it into subgroups. Our study showed that recurring patterns of specific variables separated the affected population into disparate subgroups based on their questionnaire responses. Future studies can use such tools to examine how other variables (e.g., socioeconomic and genomic) identify subgroups within larger affected populations.
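The general workflow described in the abstract (train a self-organizing map, read off subgroups, then score each variable by correlation with subgroup membership) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' map size, training schedule, survey variables, or the exact correlation-based measure, so the tiny two-node map, the synthetic data, the absolute Pearson correlation used as the importance score, and the function names `train_som`, `assign_nodes`, and `variable_importance` are all hypothetical stand-ins.

```python
import numpy as np

def train_som(data, n_nodes=2, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1-D self-organizing map with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialise node weights from randomly chosen samples.
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    positions = np.arange(n_nodes)  # node coordinates on the 1-D grid
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01   # shrinking neighbourhood width
        x = data[rng.integers(len(data))]      # pick one random sample
        # Best-matching unit: the node whose weight vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighbourhood around the BMU on the grid.
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Map every sample to the index of its best-matching node."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def variable_importance(data, labels):
    """Absolute Pearson correlation of each variable with a binary subgroup label."""
    return np.array([abs(np.corrcoef(data[:, j], labels)[0, 1])
                     for j in range(data.shape[1])])

# Synthetic stand-in for survey data (NOT the study's data):
# column 0 differs between two hidden groups, columns 1 and 2 are noise.
rng = np.random.default_rng(42)
n = 100
informative = np.concatenate([rng.normal(0.0, 0.5, n), rng.normal(4.0, 0.5, n)])
noise_a = rng.normal(0.0, 0.5, 2 * n)
noise_b = rng.normal(0.0, 0.5, 2 * n)
data = np.column_stack([informative, noise_a, noise_b])

weights = train_som(data)
labels = assign_nodes(data, weights)          # subgroup = best-matching node
importance = variable_importance(data, labels)
print(importance)
```

On data like this, the variable that actually separates the two groups should receive a much higher score than the noise columns. A larger two-dimensional map followed by a grouping of its nodes would be closer to typical SOM practice; the two-node map is used here only to keep the sketch short.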