# Chi-Squared Goodness of Fit Test

One categorical variable with counts (or proportions) in each category

We have seen: products are produced by two machines, machine A produced 15 defective parts in a run of 280, while machine B produced 10 defective parts in a run of 200. Is there a difference in the reliability of these two machines?

New question type: products are produced by three machines, machine A produced 15 defective parts in a run of 280, while machine B produced 10 defective parts in a run of 200. Is there a difference in the reliability of these machines?

# Practice Question 1

In the past, for a large introductory statistics course, the proportions of students that received grades of A, B, C, D, or F have been, respectively, 0.15, 0.35, 0.30, 0.10, and 0.10.

This year, there were 200 students in the class, and the following grades were given:

Grade | A | B | C | D | F |
---|---|---|---|---|---|
Number | 51 | 79 | 61 | 8 | 1 |

Test whether the distribution of grades this year differs from the distribution in the past.

Hypothesis

H0: PA = 0.15, PB = 0.35, PC = 0.30, PD = 0.10, PF = 0.10

H1: at least one p does not fit the distribution

Calculate expected values

Grade | A | B | C | D | F |
---|---|---|---|---|---|
Observed | 51 | 79 | 61 | 8 | 1 |
Expected | 30 | 70 | 60 | 20 | 20 |

Conditions

Random

Independent

Count: At least 80% of the expected counts are greater than 5 and none are less than 1
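The expected counts and the count condition can be checked with a few lines of Python. This is a minimal sketch using only the numbers from this question; the variable names (`observed`, `historical`, `count_condition_ok`) are illustrative and not part of the original notes.

```python
# Practice Question 1: expected counts and the count condition.
observed = {"A": 51, "B": 79, "C": 61, "D": 8, "F": 1}
historical = {"A": 0.15, "B": 0.35, "C": 0.30, "D": 0.10, "F": 0.10}

n = sum(observed.values())  # 200 students this year

# Expected count = sample size * hypothesized proportion under H0.
expected = {grade: n * p for grade, p in historical.items()}

# Count condition as stated in the notes: at least 80% of expected
# counts greater than 5, and none less than 1.
share_above_5 = sum(e > 5 for e in expected.values()) / len(expected)
count_condition_ok = share_above_5 >= 0.8 and min(expected.values()) >= 1

for grade, e in expected.items():
    print(f"{grade}: expected {e:.0f}, observed {observed[grade]}")
print("Count condition met:", count_condition_ok)
```

Note that the condition is checked on the expected counts, not the observed ones, so the single observed F does not violate it.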

Calculate

Calculate by calculator
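For reference, the quantity the calculator reports is the chi-squared statistic, the sum over categories of (observed − expected)² / expected. The symbols $O_i$ and $E_i$ are notation introduced here for the observed and expected counts. With the counts from this question, the arithmetic works out roughly as follows (terms rounded to three decimals):

$$
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
= \frac{(51-30)^2}{30} + \frac{(79-70)^2}{70} + \frac{(61-60)^2}{60} + \frac{(8-20)^2}{20} + \frac{(1-20)^2}{20}
$$

$$
\chi^2 \approx 14.700 + 1.157 + 0.017 + 7.200 + 18.050 \approx 41.124
$$

The A and F categories contribute most of the total, which is where this year's grades depart most from the historical distribution.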

P-value

df = k - 1 (k: number of categories)
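As a cross-check on the calculator, the statistic and p-value for this question can be computed with the Python standard library. This is a sketch under two assumptions not in the notes: the helper `chi2_sf_even_df` is a hypothetical name, and it relies on a closed form for the chi-squared upper tail that is valid only for even df (here df = 4). The significance level 0.05 is also assumed, since the notes do not state one.

```python
import math

observed = [51, 79, 61, 8, 1]    # grades A, B, C, D, F
expected = [30, 70, 60, 20, 20]  # 200 * historical proportions

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1  # k - 1 = 4


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable.

    Closed form that holds only when df is even:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


p_value = chi2_sf_even_df(chi2, df)
alpha = 0.05  # assumed significance level; the notes do not state one

print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value is far below any conventional α, which agrees with the decision to reject the null hypothesis.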

Interpret

P < α

So we reject the null hypothesis and have evidence to support the claim that at least one grade proportion does not fit the expected distribution

# Chi-Squared Test of Homogeneity or Independence/Association

Two categorical variables

Homogeneity

Do two or more sub-groups of a population share the same distribution of a categorical variable? (Each group has its own sample.)

Do people of different races have the same proportion of smokers to non-smokers?

Do different education levels have different proportions of Democrats, Republicans, and Independents?

Independence/Association

Determining whether two categorical variables are associated (variables from a single SRS)

Is there an association between race and smoking status?

Is there an association between education and voting preference?

# Practice Question 2

Girls and boys at an elementary school were sampled and asked about their favorite subject

- Does favorite subject differ by gender?

Favorite subject | Boys | Girls | Total |
---|---|---|---|
Math | 96 | 295 | 391 |
English | 32 | 45 | 77 |
Social Studies | 94 | 40 | 134 |
Total | 222 | 380 | 602 |

Hypothesis

H0: favorite subject does not differ by gender

H1: favorite subject does differ by gender

Expected

- Expected count = (Row Total * Column Total) / Grand Total
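The expected-count rule can be applied to every cell of the favorite-subject table at once. The following is a minimal sketch; the dictionary layout and names such as `row_totals` and `col_totals` are illustrative choices, not something from the notes.

```python
# Observed counts from the table: each row is a subject, columns are (Boys, Girls).
observed = {
    "Math": [96, 295],
    "English": [32, 45],
    "Social Studies": [94, 40],
}

row_totals = {subject: sum(counts) for subject, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]  # [boys, girls]
grand_total = sum(row_totals.values())

# Expected count for each cell = row total * column total / grand total.
expected = {
    subject: [row_totals[subject] * c / grand_total for c in col_totals]
    for subject in observed
}

for subject, row in expected.items():
    print(f"{subject}: boys {row[0]:.2f}, girls {row[1]:.2f}")
```

For example, the expected count for boys who prefer Math is 391 * 222 / 602, which is about 144.19, well above the observed 96.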

Conditions

For each sub-group, the sample is an SRS

No expected cell counts are less than 5

Calculate

Calculate by calculator

![TI-84 Plus Silver Edition calculator (Texas Instruments)](./media/image303.png)

![TI-84 Plus Silver Edition calculator (Texas Instruments)](./media/image305.png)

P-value

- df = (r-1)*(c-1)
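The full two-way calculation can also be sketched in Python as a check on the calculator output. The code below assumes nothing beyond the table in this question. The shortcut `exp(-chi2 / 2)` for the p-value is valid only because df = 2 here; it is not a general formula.

```python
import math

observed = [
    [96, 295],  # Math: boys, girls
    [32, 45],   # English
    [94, 40],   # Social Studies
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (3-1)*(2-1) = 2

# For df = 2 only, the chi-squared upper tail is exactly exp(-x/2).
if df != 2:
    raise ValueError("the exp(-x/2) shortcut applies only when df = 2")
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.2e}")
```

The statistic comes out near 90, so the p-value is effectively zero.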

Interpret

P < α

So we reject the null hypothesis and have evidence to support the claim that favorite subject differs between boys and girls

- Is favorite subject associated with gender?

H0: There is no association between favorite subject and gender

H1: There is an association between favorite subject and gender

# Practice Question 3

You are playing a dice game with a friend. They brought a 6-sided die that you think may not be fair. You conduct an experiment to determine whether it is fair. You roll the die 100 times and get the following results:

Side | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Frequency | 17 | 24 | 15 | 22 | 12 | 10 |

Hypothesis

H0: P1 = P2 = P3 = P4 = P5 = P6 = 1/6

H1: at least one proportion is not equal to 1/6

Expected

- Expected proportion for each side: 1/6 ≈ 0.1667, so each expected count is 100 * 1/6 ≈ 16.67

Conditions

Random

Independent

Expected counts are greater than 5

Calculate

Calculate by calculator
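The die calculation can be reproduced in Python as well. This is a sketch, not the method used in the notes. The helper `chi2_sf_odd_df` is a hypothetical name and uses a series for the chi-squared upper tail that applies only when df is odd (here df = 5). The significance level 0.05 is an assumption, since the notes do not state one.

```python
import math

observed = [17, 24, 15, 22, 12, 10]  # sides 1 through 6
n = sum(observed)                    # 100 rolls
expected = [n / 6] * 6               # fair die: about 16.67 per side

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # k - 1 = 5


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, odd df only.

    With s = sqrt(x) and phi the standard normal density:
    erfc(s / sqrt(2)) + 2 * phi(s) * sum_{r=1}^{(df-1)/2} s^(2r-1) / (1*3*...*(2r-1))
    """
    if df % 2 != 1:
        raise ValueError("this series requires an odd df")
    s = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    total, odd_product = 0.0, 1
    for r in range(1, (df - 1) // 2 + 1):
        odd_product *= 2 * r - 1  # 1, then 1*3, then 1*3*5, ...
        total += s ** (2 * r - 1) / odd_product
    return math.erfc(s / math.sqrt(2)) + 2 * phi * total


p_value = chi2_sf_odd_df(chi2, df)
alpha = 0.05  # assumed significance level; the notes do not state one

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The statistic is about 9.08, below the df = 5 critical value of roughly 11.07 at α = 0.05, so the p-value (about 0.11) exceeds α.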

Interpret

P > α

So we fail to reject the null hypothesis and do not have evidence to support the claim that the die is unfair