Vaccine Data Dive using AI and ML

Exploring AI and machine learning for COVID-19 vaccination insights

A public health authority sought a proof of concept to demonstrate how AI and machine learning techniques could be applied to their stored COVID-19 vaccination datasets. The primary goal was to extract data insights that would aid in effectively targeting vaccines to communities in greatest need. Additionally, they aimed to identify areas with high vaccine hesitancy to enhance community advocacy efforts.

The Butterfly Challenge

The available COVID-19 vaccination data within the authority’s existing Azure environment included demographic details of vaccinated individuals, vaccination center codes, and records of ‘no vaccination given.’ However, the raw data required significant cleaning and wrangling to ensure accuracy, reduce bias, and prevent false positives in the analysis. Proper data preparation was essential to generate reliable insights.
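
As a rough illustration of the kind of cleaning involved, the sketch below normalises a few fields, drops rows that fail basic checks, and removes duplicates, using only the Python standard library. The field names (`person_id`, `age`, `centre_code`, `outcome`) and the validity rules are assumptions made for this example; the case study does not describe the authority's actual schema.

```python
from typing import Iterable

# Hypothetical outcome labels; the real coding scheme is not given in the case study.
VALID_OUTCOMES = {"vaccinated", "no vaccination given"}

def clean_records(rows: Iterable[dict]) -> list[dict]:
    """Normalise text fields, drop invalid rows, and remove duplicate records."""
    seen = set()
    cleaned = []
    for row in rows:
        outcome = str(row.get("outcome", "")).strip().lower()
        centre = str(row.get("centre_code", "")).strip().upper()
        try:
            age = int(row.get("age", ""))
        except (TypeError, ValueError):
            continue  # unparseable age: skip the row rather than guess a value
        if outcome not in VALID_OUTCOMES or not centre or not 0 <= age <= 120:
            continue
        # Assumed duplicate key; a real key would likely also include dose number or date.
        key = (row.get("person_id"), centre, outcome)
        if key in seen:
            continue  # a repeated record would inflate counts
        seen.add(key)
        cleaned.append({"person_id": row.get("person_id"), "age": age,
                        "centre_code": centre, "outcome": outcome})
    return cleaned

raw = [
    {"person_id": "p1", "age": "34", "centre_code": " ab12 ", "outcome": "Vaccinated"},
    {"person_id": "p1", "age": "34", "centre_code": "AB12", "outcome": "vaccinated"},   # duplicate
    {"person_id": "p2", "age": "n/a", "centre_code": "CD34", "outcome": "vaccinated"},  # bad age
    {"person_id": "p3", "age": "71", "centre_code": "CD34", "outcome": "No vaccination given"},
]
cleaned = clean_records(raw)
```

Dropping rows silently is a simplification. In practice, rejected rows would usually be logged and reviewed, since systematically discarding one group's records is itself a source of bias.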

To better understand vaccine hesitancy, it was necessary to explore external data sources. Social media was identified as a valuable supplement to the datasets, providing real-time insights into public concerns. A stream of social media data spanning the vaccination timeframe was accessed, allowing the team to analyse sentiment and identify key themes related to hesitancy.

Building a Model Pipeline with Iterative Feature Engineering

We partnered with the authority’s internal consultants to access their Azure platform and datasets. The provided COVID-19 datasets encompassed metrics such as vaccine usage (doses administered), categories of need based on Joint Committee on Vaccination and Immunisation (JCVI) guidelines, and population health factors, including ethnicity, age, and underlying health conditions like diabetes mellitus.

We built a model pipeline to integrate and cluster these datasets. Through iterative feature engineering, we refined demographic criteria within the clusters.
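
The case study does not name the clustering algorithm used, so the sketch below uses plain k-means as a stand-in to show the general shape of such a step: scale the features, then group similar areas. The feature columns and values are hypothetical, and a production pipeline would more likely rely on an established machine learning library than on hand-written code.

```python
import random
from math import dist

def scale(rows):
    """Min-max scale each column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

# Hypothetical features per area: (median age, doses per 100 people, % with diabetes).
areas = [(29, 55, 3.1), (31, 58, 3.4), (30, 52, 2.9),
         (67, 91, 9.8), (70, 94, 10.5), (66, 89, 9.1)]
labels = kmeans(scale(areas), k=2)
```

Iterative feature engineering then amounts to changing which columns go into `areas` (for example, adding or re-banding a demographic field), re-running the clustering, and checking whether the resulting groups are easier to interpret.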

The project resulted in the development of Jupyter notebooks containing replicable code to extract and analyse the datasets. Additionally, LITI syntax rules were created for focused text extraction and analysis using SAS Visual Text Analytics (VTA).
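
LITI has its own rule syntax, which is not reproduced here. As a loose Python analogue of rule-based text extraction, the sketch below tags short posts with themes using regular expressions. The theme names, patterns, and example posts are all invented for illustration and are not the rules or data from the project.

```python
import re
from collections import Counter

# Hypothetical themes and keyword patterns, chosen only for illustration.
THEME_PATTERNS = {
    "side_effects": re.compile(r"\bside[- ]effects?\b|\badverse reactions?\b", re.I),
    "safety": re.compile(r"\b(un)?safe(ty)?\b|\brushed\b", re.I),
    "access": re.compile(r"\bappointments?\b|\bbooking\b|\btoo far\b", re.I),
}

def tag_themes(text):
    """Return the sorted names of every theme whose pattern matches the text."""
    return sorted(name for name, pattern in THEME_PATTERNS.items() if pattern.search(text))

# Invented example posts, not real social media data.
posts = [
    "Worried about side effects after the second dose",
    "Is it safe? Feels rushed to me",
    "Could not get an appointment anywhere near me",
]
theme_counts = Counter(theme for post in posts for theme in tag_themes(post))
```

Keyword rules like these are easy to audit and adjust, which suits an iterative project, but they miss paraphrase and sarcasm, so their output is best treated as a starting point for analysis.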

A detailed audience report was generated from Twitter data, identifying key demographic factors associated with COVID-19 vaccine hesitancy among social media users. The resulting clusters could be used for targeted campaigns, either through social media marketing or personalised community advocacy. No significant trends in vaccine uptake were found by ethnicity, gender, or area deprivation, but vaccine hesitancy was associated with both age and family status.
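
The case study does not say which statistical test established these associations. One common way to check whether a categorical factor such as age band is associated with hesitancy is a Pearson chi-square test on a contingency table, sketched below. The counts are invented purely to make the example runnable and are not figures from the project.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented counts for illustration only: rows are age bands,
# columns are (hesitant, not hesitant). NOT data from the project.
counts = [[40, 60],
          [25, 75],
          [10, 90]]
stat, dof = chi_square(counts)

# The 5% critical value of the chi-square distribution with 2 degrees of freedom is about 5.991.
associated = stat > 5.991
```

A statistic above the critical value suggests the factor and hesitancy are not independent in the sample. It does not show the direction of the effect or that the sample represents the wider population, which matters for social media data.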

The solution was delivered iteratively through weekly sprints, which allowed the customer to refine their requirements and explore new questions as the project progressed.
