Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 - Biostatistics
Purpose
• To analyze discrete categorical/binned data, in which each subject is counted in one of a number of categories
• We want to compare our observed data to what we expect to see. Are the differences due to chance, or due to an association?
• When can we use the Chi-Square Test?
  • Testing the outcome of Mendelian crosses
  • Testing independence: is one factor associated with another?
  • Testing a population for expected proportions
Assumptions:
• 2 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency in each category should be at least 5
Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: $(O - E)^2 / E$
4) Find the degrees of freedom: (c-1)(r-1) for a table with r rows and c columns, or c-1 for a single row of c categories
5) Find the critical chi-square value for your degrees of freedom in the Chi-Square Distribution table
6) If the critical value > your calculated chi-square value, you do not reject your null hypothesis; if your calculated value is larger, you reject it
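The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original tutorial (which uses SAS); the function names are hypothetical and the counts in the usage lines are made up for demonstration.

```python
def chi_square_terms(observed, expected):
    """Return the per-category contributions (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def chi_square_statistic(observed, expected):
    """Sum the per-category contributions into the chi-square value."""
    return sum(chi_square_terms(observed, expected))


def degrees_of_freedom(n_rows, n_cols):
    """(c-1)(r-1) for a two-way table, c-1 for a single row of categories."""
    if n_rows == 1:
        return n_cols - 1
    return (n_cols - 1) * (n_rows - 1)


# Hypothetical counts, used only to show the call pattern.
example_observed = [10, 20]
example_expected = [15, 15]
print(chi_square_statistic(example_observed, example_expected))
print(degrees_of_freedom(1, len(example_observed)))
```

The calculated value is then compared with the critical value from the distribution table, as in step 6.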
Example 1: Testing for Proportions

| | Leaf Cutter Ants | Carpenter Ants | Black Ants | Total |
|---|---|---|---|---|
| Observed | 25 | 18 | 17 | 60 |
| Expected | 20 | 20 | 20 | 60 |
| O-E | 5 | -2 | -3 | 0 |
| $(O-E)^2 / E$ | 1.25 | 0.2 | 0.45 | $\chi^2 = 1.90$ |

H0: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of some ant species than of others.
$\chi^2 = \sum (O - E)^2 / E$
Calculate degrees of freedom: with a single row of c = 3 categories, df = c-1 = 3-1 = 2
At a significance level of your choice (e.g. α = 0.05, or 95% confidence), look up the critical chi-square value on a Chi-square distribution table.
Critical value from the table: $\chi^2_{\alpha = 0.05,\ df = 2} = 5.991$
Our calculated value: $\chi^2 = 1.90$
*If the critical value > your calculated value, then you do not reject your null hypothesis: the differences between the observed and expected counts are small enough to be due to chance.
5.991 > 1.90, so we do not reject our null hypothesis.
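The ant example can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than the tutorial's own SAS program: the variable names are hypothetical, the critical value is the tabulated 5.991, and the p-value line relies on the fact that for exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution is exp(-x/2).

```python
import math

# Observed counts from the example table.
observed = {"Leaf Cutter Ants": 25, "Carpenter Ants": 18, "Black Ants": 17}

total = sum(observed.values())
expected = total / len(observed)  # H0: equal proportions, so 20 per species

# Per-species contribution (O - E)^2 / E.
contributions = {
    name: (count - expected) ** 2 / expected for name, count in observed.items()
}
chi_square = sum(contributions.values())

df = len(observed) - 1   # single row of categories: c - 1
critical_value = 5.991   # tabulated value for alpha = 0.05, df = 2

# Closed form valid only for df = 2.
p_value = math.exp(-chi_square / 2)

reject_null = chi_square > critical_value
print(contributions)
print(round(chi_square, 2), df, round(p_value, 4), reject_null)
```

Because the calculated value is below the critical value, `reject_null` is false, which matches the conclusion above.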
SAS: Example 1
[SAS program screenshot not preserved. Its callouts read: "Included to format the table"; "Define your data"; "Indicate what you want in your output".]
SAS: What does the p-value mean?
The exact p-value for a nondirectional test is the sum of the probabilities of all tables whose test statistic is greater than or equal to the value of the observed test statistic.
High p-value: if the null hypothesis is true, there is a high probability of a test statistic ≥ the observed test statistic. Do not reject null hypothesis.
Low p-value: if the null hypothesis is true, there is a low probability of a test statistic ≥ the observed test statistic. Reject null hypothesis.
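For the usual (asymptotic) chi-square test, the p-value is the upper-tail probability of the chi-square distribution at the calculated statistic. The sketch below computes it with the Python standard library only, using the standard closed-form series for integer degrees of freedom. The function name is hypothetical, and this is not the exact-test computation that SAS performs on small tables.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(df).

    Works for integer df >= 1 using the closed-form series for
    even and odd degrees of freedom.
    """
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(statistic)
    term = root
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    tail = math.erfc(root / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-half) * total


# The tabulated critical values at alpha = 0.05 should give p close to 0.05.
print(chi_square_p_value(5.991, 2))
print(chi_square_p_value(3.841, 1))
```

A p-value above the chosen α leads to the same decision as a calculated statistic below the critical value.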
SAS: Example 1
[SAS output screenshot not preserved.] The p-value is high: if the null hypothesis is true, there is a high probability of a Chi-Square statistic ≥ our calculated chi-square statistic.
We do not reject our null hypothesis.
Example 2: Testing Association
cellchi2 = displays how much each cell contributes to the overall chi-squared value
nocol = do not display column percentages
norow = do not display row percentages
chisq = display chi-square statistics
H0: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
Example 2: More SAS Examples
Degrees of freedom: (2-1)(3-1) = 1*2 = 2
The p-value is high (0.7825): if the null hypothesis is true, there is a 78.25% probability of a Chi-Square statistic ≥ our calculated chi-square statistic.
We do not reject our null hypothesis.
If there were an association, we could check which cells account for it by looking at how much each cell contributes to the overall Chi-square value.
Limitations
• No expected frequency should be less than 1
• No more than 1/5 of the expected frequencies should be less than 5
  • To correct for this, collect larger samples or combine the categories with small expected frequencies until their combined value is 5 or more
• Yates Correction*
  • When there is only 1 degree of freedom, the regular chi-square test should not be used
  • Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term before squaring, then continue as usual with the new corrected values
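The two expected-frequency rules above are easy to check before running a test. The following Python sketch is one possible way to do it; the function name and the warning messages are hypothetical, and the thresholds are simply the ones stated in the list.

```python
def check_expected_frequencies(expected):
    """Return warnings for the expected-frequency rules of thumb.

    Rule 1: no expected frequency below 1.
    Rule 2: no more than 1/5 of the expected frequencies below 5.
    """
    warnings = []
    if any(e < 1 for e in expected):
        warnings.append("an expected frequency is below 1")
    n_small = sum(1 for e in expected if e < 5)
    if n_small > len(expected) / 5:
        warnings.append("more than 1/5 of the expected frequencies are below 5")
    return warnings


# Expected frequencies from the ant example: no warnings.
print(check_expected_frequencies([20, 20, 20]))
# Hypothetical small-sample case that triggers the second rule.
print(check_expected_frequencies([3, 4, 20, 20]))
```

An empty list means neither rule is violated; otherwise, collecting more data or pooling categories is the remedy described above.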
What do these mean?
Likelihood Ratio Chi-Square
Continuity-Adjusted Chi-Square Test
Mantel-Haenszel Chi-Square Test
$Q_{MH} = (n-1) r^2$
• r is the Pearson correlation coefficient between the row and column variables (which also measures the linear association between row and column); $r^2$ is its square
• http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
• Tests the alternative hypothesis that there is a linear association between the row and column variables
• Follows a Chi-square distribution with 1 degree of freedom
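As a rough illustration of the formula, the sketch below computes r and Q_MH in Python for the 2 x 2 heart disease counts used in the Yates example of this tutorial. It assumes integer scores 1, 2, ... for the rows and columns; SAS may assign scores differently depending on the variable type, and the function name is hypothetical.

```python
import math


def mantel_haenszel_chi_square(table):
    """Return (Q_MH, r) with Q_MH = (n - 1) * r^2.

    r is the Pearson correlation between row and column scores,
    assuming integer scores 1, 2, ... for rows and columns.
    """
    n = sum(sum(row) for row in table)
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for i, row in enumerate(table, start=1):
        for j, count in enumerate(row, start=1):
            # Each cell contributes `count` observations with scores (i, j).
            sum_x += count * i
            sum_y += count * j
            sum_xx += count * i * i
            sum_yy += count * j * j
            sum_xy += count * i * j
    cov = sum_xy - sum_x * sum_y / n
    var_x = sum_xx - sum_x ** 2 / n
    var_y = sum_yy - sum_y ** 2 / n
    r = cov / math.sqrt(var_x * var_y)
    return (n - 1) * r ** 2, r


# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
q_mh, r = mantel_haenszel_chi_square([[15, 7], [8, 10]])
print(round(q_mh, 4), round(r, 4))
```

For a 2 x 2 table this value equals the uncorrected Pearson chi-square multiplied by (n-1)/n, which is a convenient sanity check.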
Phi Coefficient
Contingency Coefficient
Cramer's V
Yates & 2 x 2 Contingency Tables
H0: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1
We need to use the YATES CORRECTION

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.44 | 0.59 | 1.03 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.53 | 0.72 | 1.25 |
| TOTAL | 23 | 17 | 40 |
| Chi-Square Total | | | 2.28 |

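The expected frequencies in this table come from the marginal totals: each expected count is (row total × column total) / grand total. A minimal Python sketch of the uncorrected calculation, with hypothetical variable names, might look like this:

```python
# Rows: heart disease, no heart disease. Columns: high, low cholesterol.
observed = [[15, 7], [8, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Per-cell contribution (O - E)^2 / E, without the Yates correction.
cell_chi2 = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in cell_chi2)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(expected)
print([[round(v, 2) for v in row] for row in cell_chi2])
print(round(chi_square, 2), df)
```

The per-cell values are the same quantities that the SAS option cellchi2 reports.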
Yates & 2 x 2 Contingency Tables
With the Yates correction, each cell's chi-square value is $(|O - E| - 0.5)^2 / E$. For the first cell: $(|15 - 12.65| - 0.5)^2 / 12.65 = 0.27$

| | High Cholesterol | Low Cholesterol | Total |
|---|---|---|---|
| Heart Disease | 15 | 7 | 22 |
| Expected | 12.65 | 9.35 | 22 |
| Chi-Square | 0.27 | 0.37 | 0.64 |
| No Heart Disease | 8 | 10 | 18 |
| Expected | 10.35 | 7.65 | 18 |
| Chi-Square | 0.33 | 0.45 | 0.78 |
| TOTAL | 23 | 17 | 40 |
| Chi-Square Total | | | 1.42 |

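The Yates-corrected calculation can be sketched in Python as follows. The function name is hypothetical, and the p-value line uses the fact that for 1 degree of freedom the chi-square upper-tail probability equals erfc(sqrt(x/2)). Note that summing unrounded cell values gives about 1.41, while summing the rounded cell values in the table gives 1.42.

```python
import math


def yates_chi_square(observed):
    """Chi-square for a 2 x 2 table with the Yates continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Subtract 0.5 from |O - E| before squaring.
            total += (abs(o - e) - 0.5) ** 2 / e
    return total


corrected = yates_chi_square([[15, 7], [8, 10]])
critical_value = 3.841  # tabulated value for alpha = 0.05, df = 1
p_value = math.erfc(math.sqrt(corrected / 2))  # valid for df = 1 only

print(round(corrected, 4), round(p_value, 4), corrected > critical_value)
```

The corrected statistic is smaller than the uncorrected one, so the correction makes the test more conservative.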
Critical value from the Chi-square distribution table: $\chi^2_{\alpha = 0.05,\ df = 1} = 3.841$
3.841 > 1.42, so we do not reject our null hypothesis.
Fisher's Exact Test
• Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = Likely negative association.
• Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = Likely positive association.
• Two-Tail: Use this when there is no prior alternative.
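The three p-values can be illustrated with a small Python sketch based on the hypergeometric distribution of the upper-left cell with all margins fixed. It is a sketch under stated assumptions, not the SAS implementation: the function name is hypothetical, and the two-tail value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(table):
    """Left, right and two-tail Fisher exact p-values for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the upper-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {x: prob(x) for x in range(lowest, highest + 1)}
    p_obs = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Small relative tolerance guards against floating-point ties.
    two_tail = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return {
        "left": left,
        "right": right,
        "two_tail": min(1.0, two_tail),
        "table_probability": p_obs,
    }


# Heart disease example: rows are disease status, columns cholesterol level.
result = fisher_exact_2x2([[15, 7], [8, 10]])
print(result)
```

For the heart disease hypothesis (more disease with high cholesterol, i.e. a large upper-left cell), the right-sided p-value is the relevant one.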
Conclusion
• The Chi-square test is important in testing the association between variables and/or checking whether one's expected proportions match the observed results of one's experiment
• There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories
• We can use SAS to conduct Chi-square tests on our data by using the procedure proc freq
References
Chi-Square Test Descriptions:
http://www.enviroliteracy.org/pdf/materials/1210.pdf
http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
SAS Support website: http://www.sas.com/index.html (FREQ procedure)
YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k

Editor's Notes

  • #10: Remember to note that expected frequencies are pre-assigned by SAS to make sure each group has equal proportions
  • #11: OPTIONAL: Could say this with last slide
  • #12: Remember to note that expected frequencies are pre-assigned by SAS to make sure each group has equal proportions