SELECT
MIN(age) AS min_age,
MAX(age) AS max_age,
AVG(age) AS avg_age
FROM ; heart_disease_prediction_dataset
Gender Analysis
Gen AI - Query Databases
Processed data saved in a database table can be accessed, based on your requirements, using queries. Because queries are an essential part of any data professional’s workflow, writing efficient queries is a necessary skillset. We will learn how to leverage Generative AI platforms to create optimized queries for data, provided we can give the model enough context.
llama is the setup we will be using
Objective
By the end of this lab, you’ll be able to prompt a Generative AI model to create efficient queries for your data set.
Data Set
For the purpose of this lab, you are making use of the Heart Disease Data set from the UCI ML library, available publically under the CCA 4.0 International license.
You can download the data set and run the queries generated in this lab using any SQL querying system.
You might note that there is a section named Prompt Instructions on the Generative AI interface.
Note: The prompt instructions are editable only until the first prompt is executed. Thereafter, you cannot change the instructions. If a mistake occurs and you want to start over, you can generate a fresh chat and add the correct prompt instructions before generating the SQL queries.
Prompt Instructions
Paste the following text in the prompt instructions to give the model the appropriate context for the data.
Input 1
We have a Heart Disease prediction dataset with a single table which has the following attributes.
1. age - age in years
2. gender- gender (1 = male; 0 = female)
3. cp - chest pain type
-- Value 1: typical angina
-- Value 2: atypical angina
-- Value 3: non-anginal pain
-- Value 4: asymptomatic
4. trestbps - resting blood pressure (in mm Hg on admission to the hospital)
5. chol - serum cholestoral in mg/dl
6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg - resting electrocardiographic results
-- Value 0: normal
-- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
-- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach - maximum heart rate achieved
9. exang - exercise induced angina (1 = yes; 0 = no)
10. oldpeak - ST depression induced by exercise relative to rest
11. slope - the slope of the peak exercise ST segment
-- Value 1: upsloping
-- Value 2: flat
-- Value 3: downsloping
12. ca - number of major vessels (0-3) colored by flourosopy
13. thal - 3 = normal; 6 = fixed defect; 7 = reversable defect
14. num (the predicted attribute) - diagnosis of heart disease (angiographic disease status)
-- Value 0: < 50% diameter narrowing
-- Value 1: > 50% diameter narrowing
Querying the Data
Age Distribution
Input 2
Write an SQL query to find the minimum, maximum, and average age of patients in the dataset.
Response 2
Input 3
Write and SQL query to count the number of male and female patients in the dataset.
Response 3
SELECT
gender,*) AS patient_count
COUNT(
FROM
heart_disease_prediction_dataset
GROUP BY ; gender
Chest Pain Frequency
Input 4
Write an SQL query to determine the frequency of each type of chest pain (typical angina, atypical angina, non-anginal pain, asymptomatic) among patients.”
Response 4
SELECT
cp,*) AS pain_frequency
COUNT(
FROM
heart_disease_prediction_dataset
GROUP BY ; cp
Age Group Analysis & Target Var
Input 5
Write an SQL query to investigate the distribution of the target variable (presence or absence of heart disease) within different age groups (e.g., 20-30, 30-40, etc.).
Response 5
SELECT
CASE20 AND 30 THEN '20-30'
WHEN age BETWEEN 31 AND 40 THEN '31-40'
WHEN age BETWEEN 41 AND 50 THEN '41-50'
WHEN age BETWEEN 51 AND 60 THEN '51-60'
WHEN age BETWEEN 61 AND 70 THEN '61-70'
WHEN age BETWEEN 'Above 70'
ELSE
END AS age_group,= 1 THEN 1 ELSE 0 END) AS heart_disease_count,
SUM(CASE WHEN num = 0 THEN 1 ELSE 0 END) AS no_heart_disease_count
SUM(CASE WHEN num
FROM
heart_disease_prediction_dataset
GROUP BY
age_group
ORDER BY ; age_group
Other Queries
Cholesterol Range
Find the range of cholesterol levels among patients (minimum, maximum).
Age Range and Gender Analysis
Determine the age range (youngest and oldest) for male and female patients separately.
Age Group Analysis and Target Variable
Investigate the distribution of the target variable (presence or absence of heart disease) within different age groups (e.g., 20-30, 30-40, etc.).
Maximum Heart Rate by Age Group
Find the maximum heart rate achieved during exercise for different age groups (e.g., 30-40, 40-50, etc.).
Percentage of Patients with High Blood Sugar
Calculate the percentage of patients with fasting blood sugar greater than 120 mg/dl.
Ratio of Patients with Resting Electrocardiographic Abnormality
Find the ratio of patients with abnormal resting electrocardiographic results to those with normal results.
Number of Patients with Reversible Thalassemia
Count the number of patients with reversible thalassemia detected by thallium stress testing.
Average Age of Patients with Chest Pain
Calculate the average age of patients who experienced chest pain during diagnosis.
Distribution of Patients by Number of Major Vessels:
Investigate the distribution of patients based on the number of major vessels colored by flurroscopy (0-3)