Introduction to
Biostatistics
Lecture Objectives
• Overall: To give a basic understanding of descriptive statistics
• Specific:
– understand the branches of statistics
– understand the different types of data that can be collected
Statistics
• The science of collecting, monitoring, analyzing, summarizing, and interpreting data.
– This includes design issues as well.
Branches of Statistics
• Descriptive statistics
– Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way.
– Provides summary indices for a given data set, e.g. arithmetic mean, median, standard deviation, coefficient of variation, etc.
• Inductive (inferential) statistics
– Provides procedures to draw inferences about a population from a sample
[Figure: a sample drawn from a population; population values are estimated from sample values]
Why do we need biostatistics?
• Main reason: handling variation
– Biological variation
  • Attributes differ not only among individuals but also within the same individual over time
  • Example: height, weight, blood pressure, eye color ...
– Sample variation
  • Biomedical research projects are usually carried out on small numbers of study subjects
Role of biostatistics in Epidemiology
• Epidemiology is the study of the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems.
• Essential for the scientific method of investigation
– Formulate hypothesis
– Design study to objectively test hypothesis
– Collect reliable and unbiased data
– Process and evaluate data rigorously
– Interpret and draw appropriate conclusions
• Essential for understanding, appraisal and critique of scientific literature.
What is Data?
Variable
• Any measurable characteristic that assumes different values for different subjects, e.g., age, height, hair colour, gender
• Observation of variables on different subjects gives rise to data
Types of data
• Qualitative (categorical) data
– Gender, disease severity
• Quantitative (measurement) data
– Age, blood pressure (BP), weight
Categorical Data
• The variables being studied are grouped into categories based on some qualitative trait.
• The resulting data are merely labels or categories.
Examples: Categorical Data
• Hair color
– blonde, brown, red, black, etc.
• Opinion of students about riots
– ticked off, neutral, happy
• Smoking status
– smoker, non-smoker
Categorical data classified as Nominal, Ordinal, and/or Binary
[Diagram: categorical data splits into nominal data and ordinal data; each of these can be binary or not binary]
Nominal Data
• A type of categorical data in which objects fall into unordered categories. E.g.
– Hair color: blonde, brown, red, black, etc.
– Race: Caucasian, African-American, Asian, etc.
– Smoking status: smoker, non-smoker
Ordinal Data
• A type of categorical data in which order is important. Examples
– Education level: None, Primary, Post primary
– Degree of illness: none, mild, moderate, severe
Binary Data
• A type of categorical data in which there are only two categories.
• Binary data can be either nominal or ordinal. Examples
– Smoking status: smoker, non-smoker
– Education: Primary, Post primary
Measurement Data
• The variables being studied are “measured” based on some quantitative trait.
• The resulting data are a set of numbers.
Measurement data classified as Discrete or Continuous
[Diagram: measurement data splits into discrete and continuous]
Discrete Measurement Data
Only certain values are possible (there
are gaps between the possible values).
Continuous Measurement Data
Theoretically, any value within an interval is possible
with a fine enough measuring device.
Discrete data: gaps between possible values
[Number line marked only at 0, 1, 2, 3, 4, 5, 6, 7]
Continuous data: theoretically, no gaps between possible values
[Number line running continuously from 0 to 1000]
Discrete Measurement Data
Examples
• Number of pregnancies
• Number of students late for class
• Number of crimes reported
• Number of huts in a sampled rural home
• CD4 counts
Generally, discrete data are counts.
Continuous Measurement Data
Examples
• Cholesterol level
• Height
• Body weight
• Blood pressure (BP)
Generally, continuous data come from measurements.
Descriptive Statistics
A first step to summarizing
or describing raw data
What to describe?
• What is the “location” or “center” of the data? (“measures of location”)
• How do the data vary? (“measures of variability”)
Measures of Location
Measures of location indicate where on the
number line the data are to be found.
Common measures of location are:
• Mean
• Median
• Mode
Mean
• Another name for average.
• Let X1, X2, X3, …, Xn be the realised values of a variable X, from a sample of size n. Then the mean is
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
That is, add up all of the data points and divide by the number of data points.
Median
• Another name for the 50th percentile (the middle value).
• Appropriate for describing measurement data.
• “Robust to outliers,” that is, not affected much by unusual values.
Example
The systolic blood pressures of seven middle-aged men were as follows:
151, 124, 132, 170, 146, 124 and 113.
The mean is
$$\bar{X} = \frac{151 + 124 + 132 + 170 + 146 + 124 + 113}{7} = 137.14$$
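The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the names `sbp` and `mean` are chosen here for illustration and do not come from the slides.

```python
# Systolic blood pressure readings from the example slide.
sbp = [151, 124, 132, 170, 146, 124, 113]

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

sbp_mean = mean(sbp)
print(sum(sbp))            # 960
print(round(sbp_mean, 2))  # 137.14
```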
Median
• Also known as the 50th percentile or simply the middle value
• If the sample data are arranged in increasing order, the median is
(i) the middle value if n is an odd number, or
(ii) midway between the two middle values if n is an even number
Example 1. Median when n is odd
The reordered systolic blood pressure data seen earlier are:
113, 124, 124, 132, 146, 151, and 170.
Median = 132
Example 2. Median when n is even
Six men with high cholesterol participated in a study to investigate the effects of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL) were as follows:
366, 327, 274, 292, 274 and 230.
Rearrange the data in numerical order as follows:
230, 274, 274, 292, 327 and 366.
The median is halfway between the middle two readings, i.e. (274 + 292) ÷ 2 = 283.
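Both cases of the median rule can be sketched in Python as follows. The function name `median` and the variable names are illustrative choices, not part of the lecture; the data are the two examples above.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value exists.
        return ordered[mid]
    # Even n: go midway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

sbp = [151, 124, 132, 170, 146, 124, 113]      # n = 7 (odd)
cholesterol = [366, 327, 274, 292, 274, 230]   # n = 6 (even)

print(median(sbp))          # 132
print(median(cholesterol))  # 283.0
```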
Quartiles
• Quartiles: dividing the distribution of ordered values into 4 equal-sized parts

First 25% | Second 25% | Third 25% | Fourth 25%
          Q1           Q2          Q3

Q1: first quartile
Q2: second quartile = median
Q3: third quartile
Mode
• The value that occurs most frequently.
• One data set can have many modes.
• Appropriate for all types of data, but most useful for categorical data or discrete data with only a few possible values.
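As a small illustration, Python's standard `statistics.multimode` returns every most-frequent value, which matches the point that a data set can have more than one mode. The blood pressure readings are from the earlier example; the `hair` list is hypothetical data made up for this sketch.

```python
from statistics import multimode

# Measurement data from the earlier example: 124 occurs twice.
sbp = [151, 124, 132, 170, 146, 124, 113]

# Hypothetical categorical data (not from the slides): two values tie.
hair = ["brown", "black", "brown", "red", "black"]

print(multimode(sbp))   # [124]
print(multimode(hair))  # ['brown', 'black']
```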
The most appropriate measure
of location depends on …
the shape of the data’s
distribution.
Most appropriate measure of
location
• Depends on whether the data are “symmetric” or “skewed”.
• Depends on whether the data have one (“unimodal”) or more (“multimodal”) modes.
Choosing Appropriate Measure of
Location
• If data are symmetric, the mean, median, and mode will be approximately the same.
• If data are multimodal, report the mean, median and/or mode for each subgroup.
• If data are skewed, report the median.
Mean versus Median
• Large sample values tend to inflate the mean. This will happen if the histogram of the data is right-skewed.
• The median is not influenced by large sample values and is a better measure of centrality if the distribution is skewed.
Mean versus Median
• Median is less sensitive to extreme values

      Original   With extreme value
x1    87         87
x2    95         95
x3    98         98
x4    101        101
x5    105        1050

Median is unchanged
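The effect can be reproduced with the five values from the slide. This is a sketch using Python's standard `statistics` module; the list names `original` and `with_outlier` are labels introduced here.

```python
from statistics import mean, median

original = [87, 95, 98, 101, 105]
with_outlier = [87, 95, 98, 101, 1050]  # x5 replaced by an extreme value

# The mean is pulled upward by the single extreme value ...
print(mean(original), mean(with_outlier))      # 97.2 286.2
# ... while the median stays at the middle observation.
print(median(original), median(with_outlier))  # 98 98
```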
Measures of Variation
• Summarize the dispersion of individual values from some central value like the mean
• Measures of dispersion characterise how spread out the distribution is, i.e., how variable the data are.
[Figure: data points (x) scattered around the mean]
Indices of Variation
• Commonly used measures of dispersion include:
– Range
– Variance & standard deviation
– Inter-quartile range (IQR)
– Coefficient of Variation (or relative standard deviation)
Range
• R = largest observation − smallest observation
or, equivalently,
$$R = x_{\max} - x_{\min}$$
or, at times, presented as the pair
$$R = (x_{\min}, x_{\max})$$
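Both ways of reporting the range can be sketched with the blood pressure readings used earlier in the lecture. The variable names are illustrative only.

```python
sbp = [151, 124, 132, 170, 146, 124, 113]

# Range as a single number: largest minus smallest observation.
range_width = max(sbp) - min(sbp)

# Range reported as the pair (minimum, maximum).
range_pair = (min(sbp), max(sbp))

print(range_width)  # 57
print(range_pair)   # (113, 170)
```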
Inter-quartile Range
• IQR = third quartile − first quartile
or, equivalently,
$$IQR = Q_3 - Q_1$$
Q1 = lower quartile (has 25% of data below and 75% above)
Q3 = upper quartile (has 75% of data below and 25% above)
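IQR: Example
• Consider the ages of 8 patients
18, 21, 23, 24, 24, 32, 42, 59
Taking each quartile as the median of its half of the ordered data:
Q1 = (21 + 23) ÷ 2 = 22, Q3 = (32 + 42) ÷ 2 = 37
IQR = 37 − 22 = 15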
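The quartiles on this slide are consistent with taking the median of the lower and upper halves of the ordered data, so the sketch below assumes that convention. The helper name `quartiles_by_halves` is hypothetical, and for an odd n it assumes the overall median is excluded from both halves. Other quartile conventions, including the defaults of some statistics libraries, can give slightly different values for the same data.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the overall median is left out of both halves (an assumption).
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

ages = [18, 21, 23, 24, 24, 32, 42, 59]
q1, q3 = quartiles_by_halves(ages)
iqr = q3 - q1

print(q1, q3, iqr)  # 22.0 37.0 15.0
```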
Variance
• Variance of a population: average of the squared deviations from the mean
$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$
• Variance of a sample: usually subtract 1 from n in the denominator
$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where n − 1 is the effective sample size, also called the degrees of freedom
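The two denominators can be compared directly in code. This sketch applies both formulas to the blood pressure readings from the earlier example and cross-checks them against `statistics.pvariance` and `statistics.variance` from the Python standard library. The helper name `sum_of_squared_deviations` is an illustrative choice.

```python
from statistics import pvariance, variance

sbp = [151, 124, 132, 170, 146, 124, 113]
n = len(sbp)

def sum_of_squared_deviations(values):
    """Sum over all observations of (x_i - mean) squared."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

ss = sum_of_squared_deviations(sbp)
population_variance = ss / n    # divide by n
sample_variance = ss / (n - 1)  # divide by n - 1 (degrees of freedom)

print(round(population_variance, 2))  # 329.27
print(round(sample_variance, 2))      # 384.14

# The standard-library functions use the same two denominators.
assert abs(population_variance - pvariance(sbp)) < 1e-9
assert abs(sample_variance - variance(sbp)) < 1e-9
```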
Standard deviation
• Problem with variance: its unit of measurement is awkward, as values are squared
• Solution: take the square root of the variance => standard deviation
• Sample standard deviation (s or sd)
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
What is a standard deviation?
• It is the typical (standard) difference (deviation) of an observation from the mean
• Think of it as the average distance a data point is from the mean, although this is not strictly true
Example
Data    Deviation    Deviation²
151      13.86        192.02
124     -13.14        172.73
132      -5.14         26.45
170      32.86       1079.59
146       8.86         78.45
124     -13.14        172.73
113     -24.14        582.88
Sum = 960.0   Sum = 0.00   Sum = 2304.86

$$\bar{x} = 137.14$$
Example (contd.)
Therefore,
$$\sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86$$
$$s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6$$
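The worked example can be reproduced step by step. This is a minimal sketch that rebuilds the deviation and squared-deviation columns for the seven blood pressure readings and then takes the square root of the sample variance. The variable names are chosen for illustration.

```python
import math

sbp = [151, 124, 132, 170, 146, 124, 113]
n = len(sbp)
x_bar = sum(sbp) / n

# Deviation of each observation from the mean, and its square.
deviations = [x - x_bar for x in sbp]
squared = [d ** 2 for d in deviations]

for x, d, d2 in zip(sbp, deviations, squared):
    print(f"{x:>5} {d:>8.2f} {d2:>9.2f}")

sum_sq = sum(squared)
s = math.sqrt(sum_sq / (n - 1))  # sample standard deviation

print(f"sum of squared deviations = {sum_sq:.2f}")  # 2304.86
print(f"s = {s:.1f}")                               # 19.6
```

The deviations themselves sum to zero (up to floating-point rounding), which is why they are squared before being averaged.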
Standard deviation
• Caution must be exercised when using standard deviation as a comparative index of dispersion

Weights of newborn elephants (kg)
929   853
878   939
895   972
937   841
801   826
n = 10, mean = 887.1, sd = 56.50

Weights of newborn mice (kg)
0.72   0.42
0.63   0.31
0.59   0.38
0.79   0.96
1.06   0.89
n = 10, mean = 0.68, sd = 0.255

It is incorrect to say that elephants show greater variation in birth weight than mice merely because their standard deviation is higher.
Coefficient of variation
• The coefficient of variation expresses the standard deviation relative to its mean
$$cv = \frac{s}{\bar{X}}$$
$$cv_{\text{elephants}} = 0.0637$$
$$cv_{\text{mice}} = 0.375$$
Mice show greater birth-weight variation
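The comparison can be sketched in Python with the two weight tables from the previous slide. The function name `cv` is an illustrative choice. Because this code divides by the unrounded mean, the value for mice comes out near 0.378 rather than the 0.375 shown on the slide, which is what 0.255 ÷ 0.68 gives with the rounded figures.

```python
from statistics import mean, stdev

elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

print(round(stdev(elephants), 2), round(stdev(mice), 3))
print(round(cv(elephants), 4))  # 0.0637
print(round(cv(mice), 3))

# Elephants have the larger standard deviation, mice the larger relative variation.
print(stdev(elephants) > stdev(mice), cv(mice) > cv(elephants))  # True True
```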
Measures of Variation -
Some Comments
• When comparison groups have very different means, CV is suitable, as it expresses the standard deviation relative to its corresponding mean
• When different units of measurement are involved, e.g. group 1 is measured in mm and group 2 in g, CV is suitable for comparison, as it is unit-free
• In such cases, standard deviation should not be used for comparison.
Measures of Variation -
Some Comments
• Range is the simplest, but is very sensitive to outliers
• Variance units are the square of the original units
• Interquartile range is mainly used with skewed data (or data with outliers)
• Standard deviation is the most commonly used measure of variation.
Outliers
• An outlier is an observation which does not appear to belong with the other data
• Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc.
• An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient.
Q & A
• Thank you for your attention!

Introduction to Biostatistics_20_4_17.ppt

  • 2. Lecture Objectives ? Overall: To give a basic understanding of descriptive statistics ? Specific: – understand the branches of statistics – understand the different types of data that can be collected
  • 3. Statistics ? The science of collecting, monitoring, analyzing, summarizing, and interpreting data. – This includes design issues as well.
  • 4. Branches of Statistics 4 ? Descriptive statistics – Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way. – Provide summary indices for a given data, e.g. arithmetic mean, median, standard deviation, coefficient of variation, etc. ? Inductive (inferential) statistics – Provides procedures to draw inferences about a population from a sample Population sample Estimating population values from sample values
  • 5. Why need biostatistics? ? Main reason: handling variations –Biological variation ? Attribute differ not only among individuals but also within same individual over time ? Example: height, weight, blood pressure, eye color ... –Sample variation ? Biomedical research projects are usually carried out on small numbers of study subjects 5
  • 6. Role of biostatistics in Epidemiology 6 ? Epidemiology is the study of the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. ? Essential for scientific method of investigation – Formulate hypothesis – Design study to objectively test hypothesis – Collect reliable and unbiased data – Process and evaluate data rigorously – Interpret and draw appropriate conclusions ? Essential for understanding, appraisal and critique of scientific literature.
  • 8. Variable ? Any measurable characteristic that assumes different values for different subjects, e.g., age, height, hair colour, gender ? Observation of variables on different subjects gives rise to data
  • 9. Types of data ? Qualitative (Categorical data) – Gender, disease severity ? Quantitative (Measurement) data – Age, BP,Weight
  • 10. Categorical Data ? The variable being studied are grouped into categories based on some qualitative trait. ? The resulting data are merely labels or categories.
  • 11. Examples: Categorical Data ? Hair color – blonde, brown, red, black, etc. ? Opinion of students about riots – ticked off, neutral, happy ? Smoking status – smoker, non-smoker
  • 12. Categorical data classified as Nominal, Ordinal, and/or Binary Categorical data Not binary Binary Ordinal data Nominal data Binary Not binary
  • 13. Nominal Data ? A type of categorical data in which objects fall into unordered categories. E.g. ? Hair color – blonde, brown, red, black, etc. ? Race – Caucasian, African-American, Asian, etc. ? Smoking status – smoker, non-smoker
  • 14. Ordinal Data ? A type of categorical data in which order is important. Examples ? Education level – None, Primary, Post primary ? Degree of illness – none, mild, moderate, severe
  • 15. Binary Data ? A type of categorical data in which there are only two categories. ? Binary data can either be nominal or ordinal. Examples ? Smoking status – smoker, non-smoker ? Education – Primary, Post primary
  • 16. Measurement Data ? The variables being studied are “measured” based on some quantitative trait. ? The resulting data are set of numbers.
  • 17. Measurement data classified as Discrete or Continuous Measurement data Continuous Discrete
  • 18. Discrete Measurement Data Only certain values are possible (there are gaps between the possible values). Continuous Measurement Data Theoretically, any value within an interval is possible with a fine enough measuring device.
  • 19. Discrete data -- Gaps between possible values 0 1 2 3 4 5 6 7 Continuous data -- Theoretically, no gaps between possible values 0 1000
  • 20. Discrete Measurement Data Examples ? Number of pregnancies ? Number of students late for class ? Number of crimes reported ? Number of huts in a sampled rural home ? CD4 counts Generally, discrete data are counts.
  • 21. Continuous Measurement Data Examples ? Cholesterol level ? Height ? Body weight ? BP Generally, continuous data come from measurements.
  • 22. Descriptive Statistics A first step to summarizing or describing raw data
  • 23. What to describe? ? What is the “location” or “center” of the data? (“measures of location”) ? How do the data vary? (“measures of variability”)
  • 24. Measures of Location Measures of location indicate where on the number line the data are to be found. Common measures of location are: ? Mean ? Median ? Mode
  • 25. Mean ? Another name for average. ? Let X1,X2,X3,…,Xn be the realised values of a variable X, from a sample of size n. Then the mean is Formula: n i X X ? ? That is, add up all of the data points and divide by the number of data points.
  • 26. Median ? Another name for 50th percentile ? ( Middle value). ? Appropriate for describing measurement data. ? “Robust to outliers,” that is, not affected much by unusual values.
  • 27. Example ? ? 14 . 137 7 113 124 146 170 132 124 151 ? ? ? ? ? ? ? ? X The systolic blood pressure of seven middle aged men were as follows: 151, 124, 132, 170, 146, 124 and 113. The mean is
  • 28. Median ? Also known as the 50th percentile or simply the middle value ? If the sample data are arranged in increasing order, the median is (i) the middle value if n is an odd number, or (ii) midway between the two middle values if n is an even number
  • 29. Example 1. Median- n is odd The reordered systolic blood pressure data seen earlier are: 113, 124, 124, 132, 146, 151, and 170. Median=132
  • 30. Example 2. Median if– n is even Six men with high cholesterol participated in a study to investigate the effects of diet on cholesterol level. At the beginning of the study, their cholesterol levels (mg/dL) were as follows: 366, 327, 274, 292, 274 and 230. Rearrange the data in numerical order as follows: 230, 274, 274, 292, 327 and 366. The Median is half way between the middle two readings, i.e. (274+292) ? 2 = 283.
  • 31. Quartiles 31 ? Quantiles: dividing the distribution of ordered values into 4 equal-sized parts First 25% Second 25% Third 25% Fourth 25% Q1 Q2 Q3 Q1: first quartile Q2 : second quartile = median Q3: third quartile
  • 32. Mode ? The value that occurs most frequently. ? One data set can have many modes. ? Appropriate for all types of data, but most useful for categorical data or discrete data with only a few number of possible values.
  • 33. The most appropriate measure of location depends on … the shape of the data’s distribution.
  • 34. Most appropriate measure of location ? Depends on whether or not data are “symmetric” or “skewed”. ? Depends on whether or not data have one (“unimodal”) or more (“multimodal”) modes.
  • 35. Choosing Appropriate Measure of Location ? If data are symmetric, the mean, median, and mode will be approximately the same. ? If data are multimodal, report the mean, median and/or mode for each subgroup. ? If data are skewed, report the median.
  • 36. Mean versus Median ? Large sample values tend to inflate the mean. This will happen if the histogram of the data is right-skewed. ? The median is not influenced by large sample values and is a better measure of centrality if the distribution is skewed.
  • 37. Mean versus Median • Median is less sensitive to extreme values • First data set (x1 to x5): 87, 95, 98, 101, 105.0 • Second data set (x1 to x5): 87, 95, 98, 101, 1050 • Only x5 differs, and the median is unchanged
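A minimal Python sketch of the comparison above, using the two five-value data sets from the slide. The variable names are illustrative assumptions; the slide itself reports only that the median is unchanged, so the means are computed here to show the contrast.

```python
from statistics import mean, median

original = [87, 95, 98, 101, 105.0]
with_extreme = [87, 95, 98, 101, 1050]   # only x5 differs

mean_original = mean(original)
mean_extreme = mean(with_extreme)

median_original = median(original)
median_extreme = median(with_extreme)

# The single extreme value pulls the mean upward,
# while the median stays at the middle value in both data sets.
mean_shift = mean_extreme - mean_original
median_shift = median_extreme - median_original
```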
  • 38. Measures of Variation • Summarize the dispersion of individual values from some central value like the mean • Measures of dispersion characterise how spread out the distribution is, i.e., how variable the data are.
  • 39. Indices of Variation • Commonly used measures of dispersion include: – Range – Variance & standard deviation – Inter-quartile range (IQR) – Coefficient of Variation (or relative standard deviation)
  • 40. Range • R = largest observation - smallest observation or, equivalently, $R = x_{\max} - x_{\min}$ • At times it is presented as the pair $(x_{\min}, x_{\max})$
  • 41. Inter-quartile Range • IQR = third quartile - first quartile or, equivalently, IQR = Q3 - Q1 • Q1 = lower quartile (has 25% of data below and 75% above) • Q3 = upper quartile (has 75% of data below and 25% above)
  • 42. IQR: Example • Consider the ages of 8 patients: 18, 21, 23, 24, 24, 32, 42, 59 • Q1 = 22, Q3 = 37 • IQR = 37 - 22 = 15
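The quartiles in the example above match the "median of each half" convention, sketched below in Python. This is an assumption about the method behind the slide's numbers: several quartile conventions exist, and other software defaults may give slightly different values for the same data. The function name is hypothetical, and for odd n this sketch leaves the overall median out of both halves.

```python
from statistics import median


def quartiles_median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so both halves have equal size.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)


ages = [18, 21, 23, 24, 24, 32, 42, 59]
q1, q3 = quartiles_median_of_halves(ages)   # 22.0 and 37.0
iqr = q3 - q1                               # 15.0
```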
  • 43. Variance • Variance of a population: the average of the squared deviations from the mean, $\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}$ • Variance of a sample: usually subtract 1 from n in the denominator, $\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$, where $n - 1$ is the effective sample size, also called the degrees of freedom
  • 44. Standard deviation • Problem with variance: its awkward unit of measurement, as the values are squared • Solution: take the square root of the variance to obtain the standard deviation • Sample standard deviation (s or sd): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
  • 45. What is a standard deviation? • It is the typical (standard) difference (deviation) of an observation from the mean • Think of it as the average distance a data point is from the mean, although this is not strictly true
  • 46. Example
      Data     Deviation    Deviation²
      151        13.86        192.02
      124       -13.14        172.73
      132        -5.14         26.45
      170        32.86       1079.59
      146         8.86         78.45
      124       -13.14        172.73
      113       -24.14        582.88
      Sum = 960.0   Sum = 0.00   Sum = 2304.86
      $\bar{x} = 137.14$
  • 47. Example (contd.) • Therefore, $\sum_{i=1}^{7}(x_i - \bar{x})^2 = 2304.86$ and $s = \sqrt{\frac{2304.86}{7-1}} = 19.6$
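The worked example above can be reproduced step by step with a short Python sketch. The variable names are illustrative assumptions; the data are the seven systolic blood pressure readings from the example, and the divisor n - 1 follows the sample variance definition given earlier.

```python
from math import sqrt
from statistics import stdev

sbp = [151, 124, 132, 170, 146, 124, 113]

n = len(sbp)
x_bar = sum(sbp) / n                          # sample mean, about 137.14

deviations = [x - x_bar for x in sbp]         # these sum to zero
sum_sq = sum(d * d for d in deviations)       # about 2304.86

sample_variance = sum_sq / (n - 1)            # divide by n - 1, not n
s = sqrt(sample_variance)                     # about 19.6

# Cross-check against the standard library's sample standard deviation.
matches_stdlib = abs(s - stdev(sbp)) < 1e-9
```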
  • 48. Standard deviation • Caution must be exercised when using standard deviation as a comparative index of dispersion • Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, $\bar{X}$ = 887.1, sd = 56.50 • Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, $\bar{X}$ = 0.68, sd = 0.255 • It is incorrect to say that elephants show greater variation in birth weight than mice just because their standard deviation is higher
  • 49. Coefficient of variation • The coefficient of variation expresses the standard deviation relative to its mean: $cv = \frac{s}{\bar{X}}$ • $cv_{elephants} = 0.0637$ • $cv_{mice} = 0.375$ • Mice show greater birth-weight variation
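A minimal Python sketch of the elephant and mouse comparison, using the birth weights listed on the standard deviation slide. The function name `coefficient_of_variation` is an illustrative assumption. The mouse value comes out near 0.378 rather than 0.375 when the unrounded mean (0.675) is used; the slide appears to divide by the rounded mean 0.68.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (unit-free)."""
    return stdev(values) / mean(values)


elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]

cv_elephants = coefficient_of_variation(elephants)   # about 0.0637
cv_mice = coefficient_of_variation(mice)             # about 0.378

# Elephants have the larger standard deviation,
# but mice have the larger variation relative to their mean.
sd_larger_for_elephants = stdev(elephants) > stdev(mice)
cv_larger_for_mice = cv_mice > cv_elephants
```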
  • 50. Measures of Variation - Some Comments • When comparison groups have very different means, CV is suitable because it expresses the standard deviation relative to its corresponding mean • When different units of measurement are involved, e.g. group 1 is measured in mm and group 2 in g, CV is suitable for comparison because it is unit-free • In such cases, standard deviation should not be used for comparison.
  • 51. Measures of Variation - Some Comments • Range is the simplest, but is very sensitive to outliers • Variance units are the square of the original units • Interquartile range is mainly used with skewed data (or data with outliers) • Standard deviation is the most commonly used measure of variation.
  • 52. Outliers • An outlier is an observation which does not appear to belong with the other data • Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc. • An outlier might be indicative of a subpopulation, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient.
  • 53. Q & A • Thank you for your attention!