This chapter introduces key concepts in pattern classification and probability. It discusses how probability is used to model randomness and uncertainty. A random experiment is a process whose outcome is uncertain. The sample space contains all possible outcomes, and events are sets of outcomes. Probability is assigned to events according to axioms; for example, the probabilities of the events in a partition of the sample space sum to 1. Conditional probability and Bayes' theorem are introduced. Random variables map outcomes to real numbers and have probability distributions such as the normal distribution. The chapter concludes with an example of classifying fish by visual features, motivating subsequent material on pattern recognition systems, learning, and adaptation.
Introduction
Probability is the study of randomness and uncertainty.
In the early days, probability was associated with games
of chance (gambling).
Simple Games Involving Probability
Game: A fair die is rolled. If the result is 2, 3, or 4, you win
$1; if it is 5, you win $2; but if it is 1 or 6, you lose $3.
Should you play this game?
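A worked check using expected value (my addition; it anticipates the decision-theoretic ideas later in the chapter): each face of a fair die has probability 1/6, so the expected winnings per play are

$$E[W] = \frac{3}{6}(1) + \frac{1}{6}(2) + \frac{2}{6}(-3) = -\frac{1}{6} \approx -\$0.17,$$

so on average you lose about 17 cents per game and should not play.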
Random Experiment
A random experiment is a process whose outcome is uncertain.
Examples:
Tossing a coin once or several times
Picking a card or cards from a deck
Measuring temperature of patients
...
Events & Sample Spaces
Sample Space
The sample space is the set of all possible outcomes.
Simple Events
The individual outcomes are called simple events.
Event
An event is any collection of one or more simple events.
Example
Experiment: Toss a coin 3 times.
Sample space
= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Examples of events include
A = {HHH, HHT, HTH, THH} = {at least two heads}
B = {HTT, THT, TTH} = {exactly two tails}
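A minimal sketch in Python (my addition, not from the slides) that enumerates this sample space and recovers the two events by counting heads and tails:

```python
from itertools import product

# Sample space: all 2^3 = 8 sequences of three coin tosses.
sample_space = [''.join(t) for t in product('HT', repeat=3)]

# Events defined by their descriptions.
A = {s for s in sample_space if s.count('H') >= 2}  # at least two heads
B = {s for s in sample_space if s.count('T') == 2}  # exactly two tails

print(sorted(A))  # ['HHH', 'HHT', 'HTH', 'THH']
print(sorted(B))  # ['HTT', 'THT', 'TTH']
```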
Basic Concepts (from Set Theory)
The union of two events A and B, A ∪ B, is the event consisting of all outcomes that are either in A or in B or in both events.
The complement of an event A, Aᶜ, is the set of all outcomes in Ω that are not in A.
The intersection of two events A and B, A ∩ B, is the event consisting of all outcomes that are in both events.
When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
Example
Experiment: toss a coin 10 times and observe the number of heads.
Let A = {0, 2, 4, 6, 8, 10}, B = {1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.
A ∪ B = {0, 1, ..., 10} = Ω.
A ∩ B contains no outcomes, so A and B are mutually exclusive.
Cᶜ = {6, 7, 8, 9, 10}, A ∩ C = {0, 2, 4}.
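These operations map directly onto Python's set type; a small sketch (mine, not the slides') reproducing the example:

```python
omega = set(range(11))   # sample space: 0..10 heads
A = {0, 2, 4, 6, 8, 10}
B = {1, 3, 5, 7, 9}
C = {0, 1, 2, 3, 4, 5}

print(A | B == omega)    # True: A union B is the whole sample space
print(A & B)             # set(): A and B are mutually exclusive
print(omega - C)         # {6, 7, 8, 9, 10}: complement of C
print(A & C)             # {0, 2, 4}: intersection
```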
Rules
Commutative Laws:
A ∪ B = B ∪ A, A ∩ B = B ∩ A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
De Morgan's Laws:
$$\left(\bigcup_{i=1}^{n} A_i\right)^c = \bigcap_{i=1}^{n} A_i^c, \qquad \left(\bigcap_{i=1}^{n} A_i\right)^c = \bigcup_{i=1}^{n} A_i^c$$
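A quick finite check in Python (my addition): taking complements relative to a universe set, both identities hold on an arbitrary example:

```python
from functools import reduce

universe = set(range(10))
sets = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]   # arbitrary A_1, ..., A_n

def comp(s):
    """Complement relative to the universe."""
    return universe - s

# Complement of the union equals the intersection of the complements,
# and complement of the intersection equals the union of the complements.
assert comp(reduce(set.union, sets)) == reduce(set.intersection, [comp(s) for s in sets])
assert comp(reduce(set.intersection, sets)) == reduce(set.union, [comp(s) for s in sets])
print("De Morgan's laws hold on this example.")
```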
Probability
A probability is a number assigned to each subset (event) of a sample space Ω.
Probability distributions satisfy the following rules:
Axioms of Probability
For any event A, 0 ≤ P(A) ≤ 1.
P(Ω) = 1.
If A1, A2, ..., An is a partition of A, then
P(A) = P(A1) + P(A2) + ... + P(An).
(A1, A2, ..., An is called a partition of A if A1 ∪ A2 ∪ ... ∪ An = A and A1, A2, ..., An are mutually exclusive.)
Properties of Probability
For any event A, P(Aᶜ) = 1 − P(A).
If A ⊂ B, then P(A) ≤ P(B).
For any two events A and B,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
For three events A, B, and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Example
In a certain population, 10% of the people are rich, 5% are famous,
and 3% are both rich and famous. A person is randomly selected
from this population. What is the chance that the person is
not rich?
rich but not famous?
either rich or famous?
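Worked answers (my addition, applying the properties above) with P(R) = 0.10, P(F) = 0.05, and P(R ∩ F) = 0.03:

$$P(R^c) = 1 - 0.10 = 0.90, \quad P(R \cap F^c) = 0.10 - 0.03 = 0.07, \quad P(R \cup F) = 0.10 + 0.05 - 0.03 = 0.12.$$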
Intuitive Development (agrees with axioms)
Intuitively, the probability of an event a could be defined as
$$P(a) = \lim_{n \to \infty} \frac{N(a)}{n},$$
where N(a) is the number of times event a happens in n trials.
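A simulation sketch in Python (my addition, not from the slides): the relative frequency N(a)/n approaches the true probability 1/6 of rolling a particular face of a fair die as n grows:

```python
import random

n = 100_000
# Count how often event a = "roll a 5" occurs in n trials.
count = sum(1 for _ in range(n) if random.randint(1, 6) == 5)
print(count / n)  # relative frequency, close to 1/6 ≈ 0.1667
```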
More Formal:
Ω is the Sample Space:
Contains all possible outcomes of an experiment
ω ∈ Ω is a single outcome
A ⊆ Ω is a set of outcomes of interest
Independence
The probability of independent events A, B, and C is given by:
P(A, B, C) = P(A) P(B) P(C)
A and B are independent if knowing that A has happened does not say anything about B happening, i.e., P(A | B) = P(A).
Bayes' Theorem
Provides a way to convert a priori probabilities into a posteriori probabilities:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
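Reusing the rich-and-famous example above (a worked illustration, not on the slide): given that a person is famous, the posterior probability that they are rich is

$$P(R \mid F) = \frac{P(F \mid R)\,P(R)}{P(F)} = \frac{(0.03/0.10)(0.10)}{0.05} = 0.6,$$

well above the prior P(R) = 0.10.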
Random Variables
A (scalar) random variable X is a function that maps the outcome of a random event into real scalar values:
X: Ω → ℝ, ω ↦ X(ω)
Random Variable Distributions
Cumulative Distribution Function (CDF):
$$F_X(x) = P(X \le x)$$
Probability Density Function (PDF):
$$p_X(x) = \frac{dF_X(x)}{dx}$$
Mean Estimation from Samples
Given a set of N samples {x_i} from a distribution, we can estimate the mean of the distribution by:
$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Variance Estimation from Samples
Given a set of N samples {x_i} from a distribution, we can estimate the variance of the distribution by:
$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2$$
(Dividing by N − 1 instead of N gives the unbiased estimate.)
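A minimal sketch of the two estimators in Python (my addition):

```python
def estimate_mean(samples):
    """Sample mean: (1/N) * sum of the samples."""
    return sum(samples) / len(samples)

def estimate_variance(samples):
    """Sample variance: (1/N) * sum of squared deviations from the mean."""
    mu = estimate_mean(samples)
    return sum((x - mu) ** 2 for x in samples) / len(samples)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(estimate_mean(data))      # 5.0
print(estimate_variance(data))  # 4.0
```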
Chapter 1: Introduction to Pattern Recognition (Sections 1.1-1.6)
Machine Perception
An Example
Pattern Recognition Systems
The Design Cycle
Learning and Adaptation
Conclusion
Machine Perception
Build a machine that can recognize patterns:
Speech recognition
Fingerprint identification
OCR (Optical Character Recognition)
DNA sequence identification
An Example
Sorting incoming fish on a conveyor according to species using optical sensing
Two species: sea bass and salmon
Problem Analysis
Set up a camera and take some sample images to extract
features
Length
Lightness
Width
Number and shape of fins
Position of the mouth, etc.
This is the set of all suggested features to explore for use in our
classifier!
Preprocessing
Use a segmentation operation to isolate individual fish from one another and from the background
Information from a single fish is sent to a feature
extractor whose purpose is to reduce the data by
measuring certain features
The features are passed to a classifier
Threshold decision boundary and cost relationship
Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
Task of decision theory
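A toy sketch in Python (my construction; the lightness values, the decision rule, and the costs are invented for illustration) showing how raising the cost of classifying a sea bass as salmon pushes the cost-minimizing threshold toward smaller lightness values:

```python
# Hypothetical lightness values: salmon tend to be darker than sea bass here.
salmon   = [2.0, 2.4, 2.8, 3.1, 3.4, 3.7, 4.0, 4.3]
sea_bass = [3.5, 3.9, 4.4, 4.8, 5.2, 5.6, 6.0, 6.4]

def total_cost(theta, cost_bass_as_salmon, cost_salmon_as_bass=1.0):
    """Decision rule: lightness < theta -> salmon, otherwise sea bass."""
    bass_errors   = sum(x < theta for x in sea_bass)   # sea bass called salmon
    salmon_errors = sum(x >= theta for x in salmon)    # salmon called sea bass
    return cost_bass_as_salmon * bass_errors + cost_salmon_as_bass * salmon_errors

thetas = [t / 10 for t in range(20, 70)]               # candidate thresholds 2.0..6.9
for c in (1.0, 5.0):
    best = min(thetas, key=lambda t: total_cost(t, c))
    print(f"cost of bass-as-salmon = {c}: best threshold = {best}")
# The optimal threshold moves from about 4.4 down to about 3.5 as the cost rises.
```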
We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding noisy features.
Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure:
However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input.
The issue of generalization!