Healthy Data: Using SQL to Analyze Hospital Data
- Scott C
- May 13, 2023
- 3 min read
Updated: Jun 6, 2023

Introduction:
The goal of this data set was to use SQL to analyze healthcare data. The focus was on looking at the following questions.
Do the majority of patients stay in the hospital for less than 7 days?
Which medical specialties are doing the most average number of procedure?
Is the hospital treating patients of different races differently related to the number of lab procedures.
Is there a correlation between number of days stayed in the hospital vs the number of lab procedures?
Provide a list of all patients who had an emergency but stayed less than the average time in the hospital
Provide a summary of the top 50 medication patients
Data:
MySQL was used to analyze healthcare data that includes 10 years of data from 1999-2008. This data set includes the clinical care of 130 US hospitals. The data includes hospital admission data, length of stay, laboratory tests performed, medications administered, patient demographic data, etc. The data can be found on Kaggle.
Analysis:
Hospital Stays
The SQL statement below was used to produce a histogram.


By looking at the histogram, there were a total of 17,756 patients that were in the hospital for 3 days, followed by 2 days, 1 day, and then 4 days. It looks like the majority of hospital stays are between 1-4 days. This indicates that most hospital stays are less than a week.
Medical Specialties and Number of Average Procedures
The SQL statement below was used to determine which medical specialties had the highest number of procedures.


While looking at the average procedures it appears that Surgery-Thoracic, Surgery-Cardiovascular, and Radiologist are the highest. The top procedures mainly involve the circulatory system and the heart. This makes sense since the most common cause of death is related to cardiovascular disease(heart disease) [1]
Average Number of Lab Procedures and Race


The average number of laboratory procedures range between 41-44. There is not a huge gap in number of procedures, so there is likely not a significant difference between each race and getting proper treatment.
Correlation Between Number of Days Stayed and Number of Procedures
The SQL code below was used to see the correlation between the number of stays stayed compared to the number of procedures. Data was separated into three catagories.
0-24 = few lab procedures
25-54 = average number of procedures
55+ = many lab procedures


Looking at the results it does appear that the more procedures there are, the longer the amount of time the patient stays in the hospital.
All Emergency Patients that Stayed Less than the Average Time in the Hospital
The purpose of the SQL code below was to produce a table showing time in hospital who came in for emergence services.


Top 50 Medication Patients
The following SQL code was used to determine which patients received the highest number of medications and laboratory procedures. It provided the following summary table below.


Overview:
From using SQL, I was able to summarize the data in order to gather a few insights. It appears that on average most patients stay in the hospital for only 1-4 days. This is important because it can be used by the hospital to determine when another bed will be available. The top 5 medical specialties that are doing the majority of the procedures are mostly those that are related to the cardiovascular system. This makes sense due to heart disease being very common in the United States.
What to do with this information?
There does not appear to be much difference between the number of procedures and race. Depending on the procedure though, there might be more to these number. It would be of interest to look deeper into this because this dataset is related to patients who have diabetes. [2] There have been showing higher rates of both heart disease [3] and diabetes [4] for particular races.
The average time for emergency patients and the top 50 medicated patients were also looked at. This information could help for both understanding the amount of beds that will likely be available for incoming patients. The summary could be used to understand which types of patients that you may need to focus more time on.
Sources
[1] https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm
[3] https://www.cdc.gov/nchs/hus/spotlight/HeartDiseaseSpotlight_2019_0404.pdf
[4] https://www.cdc.gov/diabetes/health-equity/diabetes-by-the-numbers.html



Comments