Published May 8, 2025 | Version v3
Dataset Open

BRFSS 2020 Heart Disease Dataset (Cleaned Version)

Description

The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions put to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes are administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction, such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. We kept a subset of 18 attributes (the HeartDisease target plus 17 predictors) that are directly linked to medical, behavioral, and demographic factors known to influence heart health. This reduces computational cost and is intended to improve model interpretability and performance by removing noise and irrelevant information. The predictor variables fall into 4 broad categories:

  1. Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)

  2. Diseases: whether the respondent has ever had asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)

  3. Unhealthy habits:

    • Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
    • Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
  4. General Health:

    • Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
    • Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
    • Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
    • Physical Health - number of days being physically ill or injured (0-30 days)
    • Mental Health - number of days having bad mental health (0-30 days)
    • General Health - respondents rated their health as 'Excellent', 'Very good', 'Good', 'Fair' or 'Poor'
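
The heavy-drinking rule and the 0-30 day ranges in the list above can be written as simple checks. The following is a minimal sketch, not the code used to build the dataset: the function names, the `"Male"`/`"Female"` labels, and the weekly drink count as an input are assumptions made for illustration.

```python
# Weekly drink limits from the heavy-drinker definition in the list above.
# The "Male"/"Female" keys are assumed labels, not confirmed dataset values.
HEAVY_DRINKING_LIMIT = {"Male": 14, "Female": 7}


def is_heavy_drinker(sex: str, drinks_per_week: float) -> bool:
    """Return True when weekly drinks exceed the sex-specific limit."""
    return drinks_per_week > HEAVY_DRINKING_LIMIT[sex]


def is_valid_day_count(days: float) -> bool:
    """Physical Health and Mental Health are day counts between 0 and 30."""
    return 0 <= days <= 30
```

Note that the rule uses a strict "more than" comparison, so a man reporting exactly 14 drinks per week is not classified as a heavy drinker.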

Below is a description of the features collected for each respondent:

| S. No. | Original Variable/Attribute | Coded Variable/Attribute | Interpretation |
|---|---|---|---|
| 1 | CVDINFR4 | HeartDisease | Those who have ever had CHD or myocardial infarction |
| 2 | _BMI5CAT | BMI | Body Mass Index |
| 3 | _SMOKER3 | Smoking | Have you smoked at least 100 cigarettes in your entire life? (The answer is either yes or no) |
| 4 | _RFDRHV7 | AlcoholDrinking | Adult men who have more than 14 drinks per week and adult women who have more than 7 drinks per week are considered heavy drinkers |
| 5 | CVDSTRK3 | Stroke | (Ever told) (you had) a stroke? |
| 6 | PHYSHLTH | PhysicalHealth | Number of days of physical illness or injury during the past 30 days |
| 7 | MENTHLTH | MentalHealth | How many days in the last 30 days have you had poor mental health? |
| 8 | DIFFWALK | DiffWalking | Do you have serious difficulty walking or climbing stairs? |
| 9 | SEXVAR | Sex | Are you male or female? |
| 10 | _AGE_G | AgeCategory | Out of the given fourteen age groups, which group do you fall into? |
| 11 | _IMPRACE | Race | Imputed race/ethnicity value |
| 12 | DIABETE4 | Diabetic | (Ever told) (you had) diabetes? |
| 13 | EXERANY2 | PhysicalActivity | Adults who reported engaging in physical activity or exercise outside their regular work in the previous 30 days |
| 14 | GENHLTH | GenHealth | How would you rate your overall health? |
| 15 | SLEPTIM1 | SleepTime | On average, how many hours of sleep do you get in a 24-hour period? |
| 16 | CHASTHMA | Asthma | (Ever told) (you had) asthma? |
| 17 | CHCKDNY2 | KidneyDisease | Were you ever told you had kidney disease, other than stones, bladder infection, or incontinence? |
| 18 | CHCSCNCR | SkinCancer | (Ever told) (you had) skin cancer? |
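
The variable mapping in the table above can be applied mechanically to a raw BRFSS record. The sketch below is illustrative only and is not the authors' cleaning script: `select_and_rename` is a hypothetical helper, and it only selects and renames columns without recoding any values.

```python
# Original BRFSS variable -> coded column name, copied from the table above.
BRFSS_TO_CODED = {
    "CVDINFR4": "HeartDisease",
    "_BMI5CAT": "BMI",
    "_SMOKER3": "Smoking",
    "_RFDRHV7": "AlcoholDrinking",
    "CVDSTRK3": "Stroke",
    "PHYSHLTH": "PhysicalHealth",
    "MENTHLTH": "MentalHealth",
    "DIFFWALK": "DiffWalking",
    "SEXVAR": "Sex",
    "_AGE_G": "AgeCategory",
    "_IMPRACE": "Race",
    "DIABETE4": "Diabetic",
    "EXERANY2": "PhysicalActivity",
    "GENHLTH": "GenHealth",
    "SLEPTIM1": "SleepTime",
    "CHASTHMA": "Asthma",
    "CHCKDNY2": "KidneyDisease",
    "CHCSCNCR": "SkinCancer",
}


def select_and_rename(row: dict) -> dict:
    """Keep only the mapped variables present in `row`, under their coded names."""
    return {new: row[old] for old, new in BRFSS_TO_CODED.items() if old in row}


# Made-up record for illustration; "RESPONDENT_ID" is a hypothetical column
# standing in for the administrative fields that are dropped.
raw = {"SEXVAR": "1", "SLEPTIM1": "7", "RESPONDENT_ID": "abc"}
print(select_and_rename(raw))
```

Any column that is not a key of `BRFSS_TO_CODED` is discarded, which mirrors the reduction from 279 columns to 18.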

 

Technical info

https://www.cdc.gov/brfss/annual_data/2020/pdf/codebook20_llcp-v2-508.pdf

Files

heart_2020_cleaned.csv

Size: 23.3 MB
md5:f5a885ff39a113e797af67367b6bc2cf
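
After downloading, the file can be checked against the MD5 checksum listed above. This is a minimal sketch using Python's standard `hashlib` module; the helper names and the assumption that the file sits in the current directory under its published name are illustrative.

```python
import hashlib

# Checksum published with the file listing above.
EXPECTED_MD5 = "f5a885ff39a113e797af67367b6bc2cf"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Return True when the file's MD5 equals the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_published_checksum("heart_2020_cleaned.csv")` should return `True` for an intact download; a `False` result suggests a truncated or modified file.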