Computational Program for Racial Health Disparities

Studies have shown that race and socioeconomic status are important determinants of overall health. The Computational Program for Racial Health Disparities (CPRHD) uses computational approaches to explore and analyze data in order to identify and predict the socioeconomic and racial factors that contribute to health disparities.

Projects under CPRHD

1. West Nile Virus outbreak prediction in the Chicago area
This project focuses on understanding the effects of socioeconomic factors on the West Nile Virus outbreak in the Chicago area (Cook and DuPage counties). The project uses mosquito and socioeconomic data to build machine learning models that predict the Mosquito Infection Rate (MIR).
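
As a point of reference for the prediction target, the sketch below computes an infection rate per 1,000 mosquitoes tested. It assumes that MIR here follows the conventional minimum infection rate calculation used in mosquito surveillance (positive pools divided by mosquitoes tested, times 1,000); the project's own definition may differ. The function name and the example counts are hypothetical.

```python
def infection_rate_per_1000(positive_pools, mosquitoes_tested):
    """Infection rate per 1,000 mosquitoes tested.

    Assumption: follows the conventional minimum infection rate formula,
    1000 * positive pools / total mosquitoes tested.
    """
    if positive_pools < 0:
        raise ValueError("positive_pools must be non-negative")
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    return 1000.0 * positive_pools / mosquitoes_tested


# Made-up example counts, for illustration only.
example_rate = infection_rate_per_1000(positive_pools=3, mosquitoes_tested=1500)
print(example_rate)
```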

2. Storm Water Management Model to evaluate Green Infrastructure's effect on WNV

This project uses stormwater modeling to predict the effects of green infrastructure on the population of disease-carrying mosquitoes. The primary goals of the project are:

To develop Python functions for running SWMM simulations and extracting data
To develop a Python function to create input files for simulation
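
The second goal can be illustrated with a minimal sketch of an input-file writer. It assumes the bracketed-section layout of SWMM 5 `.inp` files; the function names `build_inp_text` and `write_inp`, the scenario title, and the option values are hypothetical, and the output is only a fragment, not a complete model that SWMM could run.

```python
from pathlib import Path


def build_inp_text(sections):
    """Render a mapping of section name -> rows as SWMM-style .inp text.

    Each row is a sequence of fields written on one line in padded columns.
    """
    lines = []
    for name, rows in sections.items():
        lines.append(f"[{name.upper()}]")
        for row in rows:
            # Pad each field to a fixed width, then trim trailing spaces.
            lines.append(" ".join(f"{str(field):<16}" for field in row).rstrip())
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_inp(path, sections):
    """Write the rendered text to disk and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(build_inp_text(sections), encoding="utf-8")
    return path


# Hypothetical scenario: only a title and a few simulation options.
sections = {
    "TITLE": [("Hypothetical green infrastructure scenario",)],
    "OPTIONS": [
        ("FLOW_UNITS", "CMS"),
        ("START_DATE", "06/01/2020"),
        ("END_DATE", "06/02/2020"),
    ],
}
inp_text = build_inp_text(sections)
print(inp_text)
```

Keeping the rendering step separate from the file write makes it possible to check the generated text before any simulation is run.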

3. Modeling West Nile Virus Neuroinvasive incidence rate in Urban counties
The goal of this project is to build machine learning models to predict the West Nile Virus neuroinvasive incidence rate using physical and socioeconomic factors that affect the spread of the virus.

4. Design and development of standardized R package for time-series data

This project focuses on building a standard pipeline for researchers who work with time-series data. The main goals of the project are:

Quality Control
Data Visualization
Feature Extraction
Clustering

5. Generic data cleaning pipeline

Public health problems are hard to study because they require large, meaningful datasets, and the data often cannot be used directly in the form in which it was collected because of privacy concerns. Data collection is planned through a mobile application that gives the general population an easy way to report information about their health. A user enters the details in the mobile application, and the data is stored in a cloud-based backend.

To tackle the problem of analyzing heterogeneous datasets, a statistical pipeline is built to harmonize data across various cohorts. The code analyzes the data and applies statistical standardization techniques such as re-normalization, covariate identification, and dimensionality reduction. Geospatial features further help in analyzing the collected data.

The correlation between variables is calculated and, because multicollinearity in the dataset is high, principal component analysis (PCA) is performed. The multidimensional dataset is then analyzed to cross-correlate the blood metabolite measurements in racially diverse women to find factors that contribute to breast cancer risk disparities. This will support clinical translation by targeting novel biomarkers and pathways, and will facilitate the development of biosensor-based companion diagnostic tools for early detection and individualized treatment.
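
The correlation check and the PCA step can be sketched as follows. This is a standard-library illustration, not the project's code: it uses power iteration on the correlation matrix in place of a library PCA routine and recovers only the first principal component. The variable names and the measurements are made up, and the 0.9 threshold is an assumed cutoff.

```python
import math


def standardize(columns):
    """Z-score each column using the sample standard deviation (n - 1)."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def correlation_matrix(columns):
    """Pearson correlation matrix of the given columns."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def highly_correlated_pairs(corr, names, threshold=0.9):
    """Variable pairs whose absolute correlation reaches the (assumed) threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) >= threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


def first_principal_component(corr, iterations=500):
    """Leading eigenvector and eigenvalue of corr, found by power iteration.

    Starts from a uniform vector, so it assumes the leading component is
    not orthogonal to that vector.
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return v, eigenvalue


# Made-up measurements: the first two variables are nearly collinear.
names = ["metabolite_a", "metabolite_b", "metabolite_c"]
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1],
    [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 8.0, 4.0],
]
corr = correlation_matrix(columns)
pairs = highly_correlated_pairs(corr, names)
loadings, eigenvalue = first_principal_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained = eigenvalue / len(names)
print(pairs)
print(f"PC1 explains {explained:.0%} of the total variance")
```

Working from the correlation matrix rather than the covariance matrix puts variables measured on different scales on an equal footing before the components are extracted.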

This project also aims to:

Code an R package to implement the data harmonization pipeline in a flexible and adaptive manner.
Scale, automate, and containerize the package to deploy on the cloud.
Publish the package on CRAN and make it open-source.