Chi-Squared Goodness of Fit Test

One categorical variable with counts (or proportion) in each category
We have seen: products are produced by two machines, machine A produced 15 defective parts in a run of 280, while machine B produced 10 defective parts in a run of 200. Is there a difference in the reliability of these two machines?
New question type: products are produced by three machines, machine A produced 15 defective parts in a run of 280, while machine B produced 10 defective parts in a run of 200. Is there a difference in the reliability of these machines?

Practice Question 1

In the past, for a large introductory statistics course, the proportions of students that received grades of A, B, C, D, or F have been, respectively, 0.15, 0.35, 0.30, 0.10, and 0.10

This year, there were 200 students in the class, and following grades were given:

Grade	A	B	C	D	F
Number	51	79	61	8	1

Test to see whether the distribution of grades this year was different from the distribution in the past?

Hypothesis
- H0: PA = 0.15, PB = 0.35, PC = 0.30, PD = 0.10, PF = 0.10
- H1: at least one p does not fit the distribution
Calculate expected values

Grade	A	B	C	D	F
Observed	51	79	61	8	1
Expected	30	70	60	20	20

Conditions
- Random
- Independent
- Count: At least 80% of the expected counts are greater than 5 and none are less than 1
Calculate

$C:\\6432CA65\\FE01530B-89BD-4F8B-A3E1-55F12080AD12\_files\\image296.png$

$计算机生成了可选文字: ． rl ． 84 mus Silver E 确 t 1 STATPLOTFiTSL\*TF2 FOR 众 T CALC 1 禹$
Calculate by calculator

$计算机生成了可选文字: Tl-B4 Pius Silver E 确 t 1 STATPLOT\*iTgLS\&TF2 FORMAT CALC 10$

$计算机生成了可选文字: ． rx ． 84 mus & 、吧． E t 1 STATPLOTFITgLS\&TF2 FORMAT F3 CALC F4$
P-value

df = k - 1 (k: number of categories)
Interpret
- P < α
- So we reject the null hypothesis and have evidence to support the claim that at least one grade proportion does not fit the expected distribution

Chi-Squared Test of Homogeneity or Independence/Association

Two categorical variables
Homogeneity
- Do two or more sub-groups of a population share the same distribution of a categorical variable (each group has its own sample)
- Do people of different races have the same proportion of smokers to non-smokers.
- Do different education levels have different proportions of Democrats, Republicans, and Independent
Independence/Association
- Determining whether two categorical variables are associated (variables from a single SRS)
- Is there an association between race and smoking status
- Is there an association between education and voting preference

Practice Question 2

Girls and boys at an elementary school were sampled and asked about their favorite subject

Does favorite subject differ by gender?

Favorite subject	Boys	Girls	Total
Math	96	295	391
English	32	45	77
Social Studies	94	40	134
Total	222	380	602

Hypothesis
- H0: favorite subject does not differ by gender
- H1: favorite subject does differ by gender
Expected
- Row Total * Colum Total / Total
Conditions
- For each sub group, the sample is a SRS
- NO expected cell counts are < 5
Calculate

$C:\\6432CA65\\FE01530B-89BD-4F8B-A3E1-55F12080AD12\_files\\image302.png$
Calculate by calculator

![ТЬИ PIus Stver Editj(T ф TEns [NSTRUMENTS гоямдта CALCF4 ](./media/image303.png)

![П-84 PIus Stver Editj(n ф TEn.s [NSTRUMENTS гоямдтп CALCF4 п ](./media/image305.png)
P-value
- df = (r-1)*(c-1)
Interpret
- P < α
- So we reject the null hypothesis and have evidence to support the claim that favourite subject is different between boys and girls

Is favourite subject associated with gender?

H0: There is no association between favorite subject and gender
H1: There is an association between favorite subject and gender

Practice Question 3

You are playing a dice game with a friend. They brought a 6 sided die that you think may not be fair. You conduct an experiment to determine if it is fair. You roll the die 100 times and get following:

Side	1	2	3	4	5	6
Frequency	17	24	15	22	12	10

Hypothesis
- H0: P1 = P2 = P3 = P4 = P5 = P6
- H1: at least one is not equal
Expected
- 1/6 = 0.1667
Conditions
- Random
- Independent
- Expected counts are greater than 5
Calculate

$C:\\6432CA65\\FE01530B-89BD-4F8B-A3E1-55F12080AD12\_files\\image296.png$
Calculate by calculator
Interpret
- P > α
- So we fail to reject the null hypothesis and do not have evidence to support the claim that the die is unfair

6.6 Hypothesis Tests for Categorical Data (Chi-Squared Test)

Chi-Squared Goodness of Fit Test

Practice Question 1

Chi-Squared Test of Homogeneity or Independence/Association

Practice Question 2

Practice Question 3

results matching ""

No results matching ""