1) The document discusses descriptive statistics such as percentiles, quartiles, measures of center, and measures of spread. It provides examples and explanations of how to calculate and interpret these statistical concepts.
2) Percentiles such as the 25th and 75th percentiles are used to describe the quartiles of a data distribution. The median and mean are common measures of center. Standard deviation and variance are frequently used to quantify the spread of values around the mean.
3) Worked examples demonstrate how to find percentiles, quartiles, measures of center and spread, and outliers using calculations and technology, and how to interpret these statistical results in the context of a problem.
The document discusses various statistical measures used to describe data, including measures of central tendency (mean, median, mode) and measures of variability (range, variance, standard deviation, percentiles, quartiles). It provides examples of calculating each measure for sample data sets. It also discusses how data can be organized and displayed graphically using histograms, bar graphs, and other visualizations. The goal of descriptive statistics is to summarize key aspects of a data set, such as its central tendency and variability, which provides critical information for understanding the data.
The document provides an overview of descriptive statistics. It discusses how data can be qualitative (categorical) or quantitative and organized graphically or numerically. For qualitative data, common graphs are bar graphs and pie charts. These show the relative frequency of outcomes in different categories. For quantitative data, histograms are often used to show the frequency or relative frequency distribution of numeric values. The document gives examples of organizing both types of data into tables and converting them into graphical representations.
The document provides information about measures of central tendency (mean, median, mode) and measures of dispersion (range, quartiles, variance, standard deviation) using examples of data distributions. It defines key terms like mean, median, mode, range, quartiles, variance and standard deviation. It also shows how to calculate and interpret these measures of central tendency and dispersion using sample data sets.
Stat 130: chi-square goodness-of-fit test (Aldrin Lozano)
- The chi-square goodness-of-fit test can be used to determine if a frequency distribution fits a specific pattern or theoretical distribution. It compares observed frequencies to expected frequencies.
- To perform the test, the chi-square statistic is calculated as the sum of (O-E)^2/E over all categories, where O is the observed frequency and E is the expected frequency. This value is then compared to a critical value from the chi-square distribution based on the degrees of freedom (see the sketch after this summary).
- If the chi-square statistic exceeds the critical value, the null hypothesis that the observed and expected frequencies are the same is rejected, indicating a poor fit between the observed and expected distributions.
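The summary above describes the test in words; the following is a minimal sketch of the same calculation in Python using hypothetical die-roll counts (the data, and the use of scipy, are assumptions, not part of the original document).

```python
# Minimal chi-square goodness-of-fit sketch with made-up die-roll counts.
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 24, 15]   # observed frequencies (n = 120 rolls)
expected = [20] * 6                    # expected frequencies under a fair die

# chi-square statistic: sum over categories of (O - E)^2 / E
stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"chi-square = {stat:.3f}, p-value = {p_value:.3f}")
# The null hypothesis of a good fit is rejected when the statistic exceeds the
# critical value (equivalently, when the p-value falls below the chosen level).
```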
This document provides examples and explanations of various graphical methods for describing data, including frequency distributions, bar charts, pie charts, stem-and-leaf diagrams, histograms, and cumulative relative frequency plots. It demonstrates how to construct these graphs using sample data on student weights, grades, ages, and other examples. The goal is to help readers understand different ways to visually represent data distributions and patterns.
This document outlines key concepts related to constructing confidence intervals for estimating population means and proportions. It discusses how to calculate confidence intervals when the population standard deviation is known or unknown. Specifically, it provides the formulas and assumptions for constructing confidence intervals for a population mean using the normal and t-distributions. It also outlines how to calculate confidence intervals for a population proportion using the normal approximation. Examples are provided to demonstrate how to construct 95% confidence intervals for a mean and proportion based on sample data.
TSTD 6251, Fall 2014
SPSS Exercise and Assignment 1
20 Points
In this class, we are going to study descriptive summary statistics and learn how to construct a box plot. We are still working with a single (univariate) variable for this exercise.
Practice Example:
Admission receipts (in millions of dollars) for a recent season are given below for the n = 30 major league baseball teams:
19.4  26.6  22.9  44.5  24.4  19.0  27.5  19.9  22.8  19.0  16.9  15.2  25.7  19.0  15.5
17.1  15.6  10.6  16.2  15.6  15.4  18.2  15.5  14.2   9.5   9.9  10.7  11.9  26.7  17.5
Required (a code-based cross-check of parts a and b follows this list):
a. Compute the mean, variance, and standard deviation.
b. Find the sample median, first quartile, and third quartile.
c. Construct a box plot and interpret the distribution of the data.
d. Discuss the distribution of this data set by examining the kurtosis and skewness statistics: is the distribution skewed to one side, and does it show a peaked/skinny curve or a spread-out/flat curve?
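The exercise itself is meant to be done in SPSS, but the same sample statistics for parts (a) and (b) can be cross-checked with Python's standard library. This is only a sketch: the quartile convention used by statistics.quantiles differs from SPSS's default percentile method, so small discrepancies in the quartiles are expected.

```python
# Cross-check of parts (a) and (b) outside SPSS.
import statistics

receipts = [19.4, 26.6, 22.9, 44.5, 24.4, 19.0, 27.5, 19.9, 22.8, 19.0,
            16.9, 15.2, 25.7, 19.0, 15.5, 17.1, 15.6, 10.6, 16.2, 15.6,
            15.4, 18.2, 15.5, 14.2,  9.5,  9.9, 10.7, 11.9, 26.7, 17.5]

mean = statistics.mean(receipts)                   # sample mean
var = statistics.variance(receipts)                # sample variance (n - 1 denominator)
sd = statistics.stdev(receipts)                    # sample standard deviation
q1, med, q3 = statistics.quantiles(receipts, n=4)  # Q1, median, Q3

print(f"mean = {mean:.3f}, variance = {var:.3f}, sd = {sd:.3f}")
print(f"Q1 = {q1:.3f}, median = {med:.3f}, Q3 = {q3:.3f}")
```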
SPSS Procedures for Computing Summary Statistics:
- Enter the 30 data values in the first column of SPSS (Data View tab). Switch to Variable View and name this variable receipts. Adjust Decimals to 3 decimal points and type Admission Receipts ($ mn) in the Label column for the output viewer.
- Return to Data View and click Analyze on the menu bar.
- Click the second menu, Descriptive Statistics.
- Click Frequencies ….
- Move Admission Receipts to the Variable(s) list by clicking the arrow button.
- Click the Statistics … button at the top of the dialog box.
- Now you can select the descriptive statistics according to what the question requires. For this practice question, it requires central tendency, dispersion, percentile, and distribution statistics, so we check all the boxes except for Percentile(s): and Values are group midpoints.
- Click Continue to return to the Frequencies dialog box.
- Click OK to generate the descriptive statistic output, which is pasted below:
The first table provides summary statistics and the second table lists frequencies, relative frequencies and cumulative frequencies. The statistics required for solving this problem are highlighted in red.
Statistics: Admission Receipts
N (Valid): 30
N (Missing): 0
Mean: 18.76333
Std. Error of Mean: 1.278590
Median: 17.30000
Mode: 19.000
Std. Deviation: 7.003127
Variance: 49.043782
Skewness: 1.734
Std. Error of Skewness: .427
Kurtosis: 5.160
Std. Error of Kurtosis: .833
Range: 35.000
Minimum: 9.500
Maximum: 44.500
Sum: 562.900
Percentiles:
  10th: 10.61000
  20th: 14.40000
  25th: 15.35000
  30th: 15.50000
  40th: 15.84000
  50th: 17.30000
  60th: 19.00000
  70th: 19.75000
  75th: 22.82500
  80th: 24.10000
  90th: 26.69000
Admission Receipts (Frequency | Percent | Valid Percent | Cumulative Percent)
9.500: 1 | 3.3 | 3.3 | 3.3
9.900: 1 | 3.3 | 3.3 | 6.7
10.600: 1 | 3.3 | 3.3 | 10.0
10.700: 1 | 3.3 | 3.3 | 13.3
11.900: 1 | 3.3 | 3.3 | 16.7
14.200: 1 | 3.3 | 3.3 | 20.0
15.2 …
PREDICTION OF TOOL WEAR USING ARTIFICIAL NEURAL NETWORK IN TURNING OF MILD STEEL (Sourav Samanta)
1) The document describes using an artificial neural network (ANN) to predict tool wear in turning of mild steel.
2) Experimental data on tool wear was collected for 27 combinations of cutting speed, feed rate, and depth of cut.
3) An ANN model was developed with the three machining parameters as inputs and tool wear as the output. The model was trained on 70% of the experimental data.
Chapter 9: Inferences from Two Samples
9.3 Two Means, Two Dependent Samples, Matched Pairs
Evolutionary generation and degeneration of randomness to assess the independ... (David F. Barrero)
The document examines the independence of tests in the Ent randomness test battery. An evolutionary algorithm was used to generate random numbers to reduce bias. The Ent tests were run on the generated numbers and statistics were stored. Analysis found several tests were correlated, including entropy, compression, chi-square and excess. While the tests provide useful information, using more than five statistics may overestimate results. Future work aims to focus on the uncorrelated tests and explore non-linear relationships.
Comparing Machine Learning Algorithms in Text Mining (Andrea Gigli)
In this project I compare different machine learning algorithms on different text mining tasks.
ML algorithms: Naive Bayes, Support Vector Machine, Decision Trees, Random Forest, and Ordinal Regression.
Tasks considered: Classifying Positive and Negative Reviews, Predicting Review Stars, Quantifying Sentiment Over Time, Detecting Fake Reviews
Credit card fraud is a growing problem that affects card holders around the world. Fraud detection has been an interesting topic in machine learning. Nevertheless, current state-of-the-art credit card fraud detection algorithms fail to include the real costs of credit card fraud as a measure to evaluate algorithms. In this paper a new comparison measure that realistically represents the monetary gains and losses due to fraud detection is proposed. Moreover, using the proposed cost measure, a cost-sensitive method based on Bayes minimum risk is presented. This method is compared with state-of-the-art algorithms and shows improvements up to 23% measured by cost. The results of this paper are based on real-life transactional data provided by a large European card processing company.
The aim of this report is to use eigenvectors, eigenvalues, and orthogonality to understand the concept of Principal Component Analysis (PCA) and to show why PCA is useful.
The document describes various variable selection methods applied to predict violent crime rates using socioeconomic data from US cities. It analyzes a dataset with 95 variables and 807 observations on income, family structure, ethnicity, and other factors to predict violent crime rates. Several variable selection techniques are applied, including forward selection, backward elimination, lasso, elastic net, best random subset selection (BRSS), decision trees, and random forests. BRSS, which approximates best subset selection, identified 15 variables as most predictive of violent crime and had strong out-of-sample performance. Analysis of 1000 training and test splits found that BRSS, random forests, and decision trees consistently outperformed other techniques in terms of out-of-sample predictive accuracy.
The document describes various variable selection methods applied to predict violent crime rates using socioeconomic data from US cities. It analyzes a dataset with 95 variables and 807 observations, using several variable selection techniques to determine the most predictive factors of violent crime. These include best random subset selection (BRSS), which approximates best subset selection by randomly selecting variable combinations. BRSS identified factors like immigration, ethnicity, family structure, and income as best predicting violent crime rates. Model performance was evaluated using metrics like R2, and BRSS had strong out-of-sample prediction, outperforming some other common techniques.
This document provides an overview of risk management and quality control using statistical process control charts. It discusses [1] managing quality risk through control charts, [2] different types of risks including material, consequential, social, legal, and political risks, and [3] best practices for risk management including policies, methodologies, and resources. The document also covers control chart fundamentals, calculating control limits, identifying assignable causes, and process improvement.
This document provides lecture notes on statistics for analytical chemistry. It begins by recommending textbooks on the subject. It then discusses various applications of analytical chemistry and outlines the general analytical problem of selecting a sample, extracting and detecting analytes, and determining the reliability of results. The notes explain the concepts of errors, precision, accuracy, and how to quantify them using statistical measures like the mean, median, and standard deviation. It provides examples of how to calculate these measures and illustrates the differences between random and systematic errors. Finally, it discusses pooling data from multiple samples to better approximate the population standard deviation.
This document discusses descriptive statistics and numerical measures used to describe data sets. It introduces measures of central tendency including the mean, median, and mode. The mean is the average value calculated by summing all values and dividing by the number of values. The median is the middle value when values are arranged in order. The mode is the most frequently occurring value. The document also discusses measures of dispersion like range and standard deviation which describe how spread out the data is. Examples are provided to demonstrate calculating the mean, median and other descriptive statistics.
This document provides an introduction to business intelligence and data analytics. It discusses key concepts such as data sources, data warehouses, data marts, data mining, and data analytics. It also covers topics like univariate analysis, measures of dispersion, heterogeneity measures, confidence intervals, cross validation, and ROC curves. The document aims to introduce fundamental techniques and metrics used in business intelligence and data mining.
This document summarizes a study on the impact of scrambling techniques on the entropy of barcodes. The study tested barcodes with and without error correcting codes (ECC) using four scrambling methods and three entropy measures. Results showed that scrambling increased the entropy and randomness of barcodes that originally contained ECC, making it harder to detect the presence of ECC. However, the difference in entropy between scrambled ECC and non-ECC barcodes was small and not statistically significant. The study concluded that while entropy analysis can detect the presence of structure in barcodes, the methods tested were not effective at distinguishing scrambled ECC barcodes from purely random barcodes.
This document discusses evaluating meter test data that does not follow a normal distribution. It provides an overview of ANSI/ASQ Z1.9 sampling procedures and requirements for normal data. Non-normal data distributions are common for electronic and digital meter test results. Tools for assessing normality include Anderson-Darling tests and normal probability plots. If data is non-normal, transformations like Box-Cox and Johnson may be applied, but often do not work for meter data. Alternative statistical analyses may be needed for non-normal data.
This document discusses various statistical analysis concepts including error bars, mean, standard deviation, t-tests, and correlation. It provides definitions and formulas for calculating these values. For example, it defines error bars as showing the range or standard deviation of a data point, and standard deviation as a measure of how spread out values are from the mean. It also gives an example of using a t-test to determine if differences between two means are statistically significant or not. In the last section, it defines correlation as a measure of association between two variables but notes it does not imply one causes the other.
An Analysis of the Accuracies of the AC50 Estimates of Dose-Response Curve Mo... (raym2sigmaxi)
The document analyzes the accuracy of two equations - an alternative model and a standard logistic 4-parameter model - in estimating the AC50 parameter from dose-response curve data simulated using a Hill equation. Simulations were run with increasing levels of normal error added to the Hill equation data. The average AC50 standard error from the simulations showed that at low levels of normal error, the standard logistic model was more accurate, but the alternative model became more accurate at higher error levels. A linear regression of the results indicated the point at which the alternative model would surpass the standard model in accuracy.
Power point chapter 2 sections 6 through 9
1. Chapter 2: Descriptive Statistics, Sections 6-9
2. 2.6 Percentiles
- Quartiles are specific examples of percentiles. The first quartile is the same as the 25th percentile and the third quartile is the same as the 75th percentile.
- The nth percentile represents the value that is greater than or equal to n% of the data.
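That definition translates almost directly into code. The sketch below uses the simple nearest-rank reading of the definition with illustrative scores that are not from the slides; statistical software typically interpolates between neighboring values, so its answers can differ slightly.

```python
# Nearest-rank percentile: the smallest value that is >= n% of the data.
import math

def percentile(data, n):
    ordered = sorted(data)
    k = max(1, math.ceil(n / 100 * len(ordered)))  # rank of the percentile
    return ordered[k - 1]

scores = [52, 55, 61, 64, 68, 71, 75, 78, 83, 90]  # illustrative data
print(percentile(scores, 25))   # first quartile under this convention
print(percentile(scores, 75))   # third quartile under this convention
```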
3. EXAMPLE: Consider each of the following statements about percentiles.
- Jennifer just received the results of her SAT exams. Her SAT Composite of 1710 is at the 73rd percentile. What does this mean?
- Suppose you received the highest score on an exam. Your friend scored the second-highest score, yet you both were in the 99th percentile. How can this be?
4. EXAMPLE: The following data set shows the number of parking tickets received.
Number of Tickets | Frequency | RF | CRF
0 | 6 | 0.08 | 0.08
1 | 18 | 0.24 | 0.32
2 | 12 | 0.16 | 0.48
3 | 11 | 0.15 | 0.63
4 | 9 | 0.12 | 0.75
5 | 6 | 0.08 | 0.83
6 | 5 | 0.07 | 0.90
7 | 4 | 0.05 | 0.95
8 | 2 | 0.03 | 0.98
9 | 1 | 0.01 | 0.99
10 | 1 | 0.01 | 1.00
5. EXAMPLE: For the parking-ticket data set above:
- Find and interpret the 90th percentile.
- Find and interpret the 20th percentile.
- Find the first quartile, the median, and the third quartile.
- Construct a box plot.
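One way to read these percentiles is straight off the CRF column: the pth percentile is the first ticket count whose cumulative relative frequency reaches p/100. The sketch below follows that reading; it is only one convention, and boundary cases (a CRF exactly equal to the target, as at 0.75) may be handled differently by other methods or by a calculator.

```python
# Percentiles read directly from the cumulative relative frequency (CRF) column.
table = [(0, 0.08), (1, 0.32), (2, 0.48), (3, 0.63), (4, 0.75), (5, 0.83),
         (6, 0.90), (7, 0.95), (8, 0.98), (9, 0.99), (10, 1.00)]  # (tickets, CRF)

def percentile_from_crf(table, p):
    for value, crf in table:
        if crf >= p / 100:
            return value
    return table[-1][0]

print(percentile_from_crf(table, 90))   # 90th percentile of ticket counts
print(percentile_from_crf(table, 20))   # 20th percentile
print(percentile_from_crf(table, 25),   # first quartile
      percentile_from_crf(table, 50),   # median
      percentile_from_crf(table, 75))   # third quartile
```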
7. EXAMPLE: The following data set shows the number of parking tickets received.
Number of Tickets | Frequency | RF | CRF
0 | 6 | 0.08 | 0.08
1 | 18 | 0.24 | 0.32
2 | 12 | 0.16 | 0.48
3 | 11 | 0.15 | 0.63
4 | 9 | 0.12 | 0.75
5 | 6 | 0.08 | 0.83
6 | 5 | 0.07 | 0.90
7 | 4 | 0.05 | 0.95
8 | 2 | 0.03 | 0.98
9 | 1 | 0.01 | 0.99
10 | 1 | 0.01 | 1.00
8. EXAMPLE: For the parking-ticket data set above:
- Find the interquartile range of the data set.
- Do any of the data values appear to be outliers?
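A sketch of one way to answer this: expand the frequency table into raw observations, compute the interquartile range, and screen for outliers with the common 1.5 × IQR fences. The slide itself does not name an outlier rule, so the 1.5 × IQR fence is an assumption here (the usual textbook convention).

```python
# IQR and 1.5*IQR outlier fences for the ticket data expanded from its frequency table.
import statistics

freq = {0: 6, 1: 18, 2: 12, 3: 11, 4: 9, 5: 6, 6: 5, 7: 4, 8: 2, 9: 1, 10: 1}
tickets = [value for value, count in freq.items() for _ in range(count)]

q1, _, q3 = statistics.quantiles(tickets, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sorted({x for x in tickets if x < lower_fence or x > upper_fence})
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence}), possible outliers = {outliers}")
```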
10. EXAMPLE: Find the mean, median, and mode of the following data set. Use technology to find statistical information.
1. 4.5, 10, 1, 1, 9, 14, 4, 8.5, 6, 1, 9
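"Use technology" can mean a graphing calculator or any software. As one option, the three measures of center for this data set can be computed with Python's standard library:

```python
# Measures of center for the slide's data set.
import statistics

data = [4.5, 10, 1, 1, 9, 14, 4, 8.5, 6, 1, 9]

print("mean   =", statistics.mean(data))
print("median =", statistics.median(data))
print("mode   =", statistics.mode(data))   # most frequently occurring value
```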
11. EXAMPLE: The following data set shows the number of parking tickets received. Find the mean, median, and mode. Use technology to find statistical information.
Number of Tickets | Frequency | RF | CRF
0 | 6 | 0.08 | 0.08
1 | 18 | 0.24 | 0.32
2 | 12 | 0.16 | 0.48
3 | 11 | 0.15 | 0.63
4 | 9 | 0.12 | 0.75
5 | 6 | 0.08 | 0.83
6 | 5 | 0.07 | 0.90
7 | 4 | 0.05 | 0.95
8 | 2 | 0.03 | 0.98
9 | 1 | 0.01 | 0.99
10 | 1 | 0.01 | 1.00
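For data given as a frequency table, the measures of center can be found either by expanding the counts back into a raw list or by weighting each value by its frequency. A minimal sketch of both ideas:

```python
# Measures of center for grouped (frequency-table) data.
import statistics

freq = {0: 6, 1: 18, 2: 12, 3: 11, 4: 9, 5: 6, 6: 5, 7: 4, 8: 2, 9: 1, 10: 1}

tickets = [value for value, count in freq.items() for _ in range(count)]  # expand
n = len(tickets)

weighted_mean = sum(value * count for value, count in freq.items()) / n
print("mean   =", weighted_mean)                 # same as statistics.mean(tickets)
print("median =", statistics.median(tickets))
print("mode   =", max(freq, key=freq.get))       # value with the largest frequency
```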
12. 2.9 Measures of Spread
- The final statistics we would like to be able to find are measures that tell us how spread out the data is about the mean.
- The two statistics most commonly used to measure spread are the standard deviation and the variance.
- Standard deviation gives us another way to identify possible outliers: a data value might be an outlier if it is more than two standard deviations from the mean.
14. EXAMPLE: Find the standard deviation and variance of the data set below, assuming that it is a sample. Use the standard deviation to determine if any values are possible outliers. Use technology to find statistical values.
1. 4.5, 10, 1, 1, 9, 17, 4, 8.5, 5, 1, 9
15. EXAMPLE: The following data set shows the number of parking tickets received. Find the standard deviation and variance of the data set, assuming that it is a sample. Use the standard deviation to determine if any values are possible outliers.
Number of Tickets | Frequency | RF | CRF
0 | 6 | 0.08 | 0.08
1 | 18 | 0.24 | 0.32
2 | 12 | 0.16 | 0.48
3 | 11 | 0.15 | 0.63
4 | 9 | 0.12 | 0.75
5 | 6 | 0.08 | 0.83
6 | 5 | 0.07 | 0.90
7 | 4 | 0.05 | 0.95
8 | 2 | 0.03 | 0.98
9 | 1 | 0.01 | 0.99
10 | 1 | 0.01 | 1.00
16. EXAMPLE: In 2000 the mean age of a sample of females in the U.S. population was 37.8 years with a standard deviation of 21.8 years, and the mean age of a sample of males was 35.3 years with a standard deviation of 18.4 years. In relation to the rest of their sex, which is older: a 48-year-old woman or a 45-year-old man?
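The slide does not show the work, but the standard approach to this kind of comparison is a z-score: how many standard deviations each age lies above the mean of its own group. A minimal sketch:

```python
# Comparing ages across groups with z-scores: z = (x - mean) / sd.
# Whichever z-score is larger is older relative to his or her own group.
def z_score(x, mean, sd):
    return (x - mean) / sd

z_woman = z_score(48, 37.8, 21.8)
z_man = z_score(45, 35.3, 18.4)

print(f"woman: z = {z_woman:.2f}")
print(f"man:   z = {z_man:.2f}")
```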
27. Characterizing a Data Distribution
Example: For each distribution described below, discuss the number of peaks, symmetry, and amount of variation you would expect to find.
- The salaries of actors/actresses.
- The number of vacations taken each year.
- The weights of calculators stored in the math library – half are graphing calculators and half are scientific calculators.
28. HOMEWORK
2.13 #s 4a, b, c, 7, 10, 12, 13a, b, d, e, f (also construct a line graph for the data from Publisher A and Publisher B), 16a parts i and iii, 16b, 21, 29, 30, 31