6.7. JNB Lab: The Framingham Heart Study

The Framingham Heart Study is an ongoing study of cardiovascular health in the United States. The initial study followed over 5,000 volunteers from Framingham, Massachusetts, USA over the course of several decades, and it still continues today. The study led to important findings in many areas, including a link between cholesterol and heart disease. (These exercises are inspired by an assignment from UC Berkeley’s Data 8.)

We load the data below from the file framingham.csv. The data set contains several interesting variables, but we will focus on total cholesterol level (TOTCHOL) versus the occurrence of heart disease (ANYCHD). The variable ANYCHD takes the value 0 if the patient does not have heart disease and the value 1 if they do.

import pandas as pd

framingham = pd.read_csv("framingham.csv")
framingham.head()

	AGE	SYSBP	DIABP	TOTCHOL	CURSMOKE	GLUCOSE	DEATH	ANYCHD
0	39	106.0	70.0	195.0	0	77.0	0	1
1	46	121.0	81.0	250.0	0	76.0	0	0
2	48	127.5	80.0	245.0	1	70.0	0	0
3	61	150.0	95.0	225.0	1	103.0	1	0
4	46	130.0	84.0	285.0	1	85.0	0	0

6.7.1. Part 1: Explore the data

As always, it’s important to look at your data before diving into any kind of inference. Note that the data frame contains both categorical variables (such as ANYCHD) and numerical variables (such as TOTCHOL), which affects which kinds of visualization are appropriate. We start with the size of the data frame, which shows that we have 3842 observations of 9 variables.

framingham.shape

(3842, 9)

Since we are interested in how cholesterol relates to the occurrence of heart disease, we explore those variables next. As in the penguin example in the text, we separate the data into two samples, one for each value of ANYCHD. Then we calculate the mean total cholesterol level for each group.

chd = framingham[framingham['ANYCHD'] == 1]
nochd = framingham[framingham['ANYCHD'] == 0]

print(f"The mean total cholesterol for those with an occurrence of CHD is {chd['TOTCHOL'].mean():.3f}.")
print(f"The mean total cholesterol for those without occurrence of CHD is {nochd['TOTCHOL'].mean():.3f}.")

The mean total cholesterol for those with an occurrence of CHD is 249.482.
The mean total cholesterol for those without occurrence of CHD is 232.846.

The means are different, but it’s not yet clear whether the difference is large or small relative to the variability in the data. That depends on the sample sizes and the distribution of cholesterol levels!
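To see why sample size matters, the sketch below computes the standard error of a difference in means, \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\), for two hypothetical group sizes. The standard deviation of 45 and the group sizes are made-up illustration values, not numbers from the Framingham data; only the two means come from the output above.

```python
import math

def se_diff(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Difference in the sample means reported above
diff = 249.482 - 232.846

# Hypothetical standard deviation and group sizes (NOT taken from the data)
s = 45.0
se_small = se_diff(s, 20, s, 20)
se_large = se_diff(s, 1000, s, 1000)

print(f"difference in means: {diff:.3f}")
print(f"n = 20 per group:   SE = {se_small:.2f}, difference / SE = {diff / se_small:.2f}")
print(f"n = 1000 per group: SE = {se_large:.2f}, difference / SE = {diff / se_large:.2f}")
```

Under these assumed values, the same difference is roughly one standard error with 20 people per group but several standard errors with 1000 per group, which is why the group sizes need to be checked before drawing conclusions.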

Write a line or two of code to figure out how many people in the study have an occurrence of CHD and how many do not.
Make a histogram of the cholesterol levels for each sample. Describe the center and shape of each distribution, and compare the two.
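As a starting point, here is a minimal sketch of the pandas calls that are useful for this kind of exploration. It runs on a tiny made-up data frame named toy (a hypothetical stand-in that only borrows the column names TOTCHOL and ANYCHD), so the numbers it prints say nothing about the real study.

```python
import pandas as pd

# Tiny made-up frame with the same column names as framingham (values are NOT real data)
toy = pd.DataFrame({
    "TOTCHOL": [200.0, 220.0, 240.0, 260.0, 280.0, 300.0],
    "ANYCHD":  [0, 0, 0, 0, 1, 1],
})

# Number of rows in each ANYCHD group
counts = toy["ANYCHD"].value_counts()
print(counts)

# Per-group size, mean, and standard deviation of cholesterol
summary = toy.groupby("ANYCHD")["TOTCHOL"].agg(["count", "mean", "std"])
print(summary)
```

For the histograms, one option is to call Series.plot.hist(ax=ax, alpha=0.5) once per group on a shared matplotlib axes object ax, so that the two distributions overlap on the same scale.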

6.7.2. Part 2: Two Sample T-tests

We want to determine whether there is a true difference in average cholesterol levels between people with heart disease and those without. The sample means differ, but we need to check whether that evidence is strong enough to support a claim about the population. We are testing the hypotheses:

\( H_0: \mu_1 = \mu_2\)
\( H_1: \mu_1 \neq \mu_2\)

where \(\mu_1\) and \(\mu_2\) represent the average total cholesterol levels of the CHD and No CHD populations, respectively. But first we need to know whether using a T-test is valid!

Describe the assumptions of the T-test and comment on whether they hold for this example. Your work in Part 1 should be enough.
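For reference when checking the assumptions, the pooled form of the two-sample statistic is written out below. This is the form that ttest_ind in statsmodels uses by default (usevar='pooled'). The symbols \(\bar{x}_i\), \(s_i\), and \(n_i\) are introduced here for illustration and denote the sample mean, sample standard deviation, and size of group \(i\).

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
\]
```

When \(H_0\) is true and the assumptions hold, \(t\) follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom.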

Once we know that the test is appropriate for our problem, we can perform it. Recall that to do this, we set up a CompareMeans object from our two samples, just as in the chapter.

import statsmodels.api as sm

# total cholesterol for each group
sample1 = chd['TOTCHOL']
sample2 = nochd['TOTCHOL']

# create a CompareMeans object from the two samples
cm = sm.stats.weightstats.CompareMeans.from_data(sample1, sample2)

Now we are ready to perform the test!

Compute the test statistic and \(P\)-value using cm.ttest_ind.
Write your conclusion in a complete sentence.
Give a confidence interval for the difference in average total cholesterol.
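The calls involved in these last steps can be sketched on synthetic data. The two arrays below are randomly generated stand-ins with assumed means, spread, and sizes; they are not the Framingham samples, so the printed values carry no meaning for the study. The sketch assumes numpy and statsmodels are installed and imports CompareMeans directly from statsmodels.stats.weightstats.

```python
import numpy as np
from statsmodels.stats.weightstats import CompareMeans

# Synthetic stand-ins for the two cholesterol samples (NOT the Framingham data)
rng = np.random.default_rng(0)
sample1 = rng.normal(loc=250, scale=45, size=200)
sample2 = rng.normal(loc=235, scale=45, size=600)

cm = CompareMeans.from_data(sample1, sample2)

# ttest_ind returns the test statistic, the P-value, and the degrees of freedom
tstat, pvalue, df = cm.ttest_ind(alternative="two-sided", usevar="pooled")
print(f"t = {tstat:.3f}, P-value = {pvalue:.4g}, df = {df:.0f}")

# 95% confidence interval for mu_1 - mu_2 (alpha is 1 minus the confidence level)
lower, upper = cm.tconfint_diff(alpha=0.05, usevar="pooled")
print(f"95% CI for the difference in means: ({lower:.2f}, {upper:.2f})")
```

In the lab itself, sample1 and sample2 are chd['TOTCHOL'] and nochd['TOTCHOL'] as defined above. With the pooled two-sided test, the 95% interval excludes 0 exactly when the P-value is below 0.05, which is a useful consistency check on your conclusion.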
