Basic Probability
Introduction
• Probability is the study of randomness and uncertainty.
• In the early days, probability was associated with games of chance (gambling).
Simple Games Involving Probability
Game: A fair die is rolled. If the result is 2, 3, or 4, you win
$1; if it is 5, you win $2; but if it is 1 or 6, you lose $3.
Should you play this game?
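One way to decide, anticipating the expectation material later in these slides, is to weigh each payoff by its probability. A minimal sketch in Python for the game as stated:

```python
# Expected winnings per play: each face of a fair die has probability 1/6.
payoff = {1: -3, 2: 1, 3: 1, 4: 1, 5: 2, 6: -3}
expected = sum(p / 6 for p in payoff.values())
print(f"expected winnings per play: ${expected:+.3f}")  # about -$0.167
```

Since the expected winnings are negative, on average you lose by playing.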
Random Experiment
• A random experiment is a process whose outcome is uncertain.
Examples:
• Tossing a coin once or several times
• Picking a card or cards from a deck
• Measuring the temperature of patients
• ...
Events & Sample Spaces
Sample Space
The sample space is the set of all possible outcomes.
Simple Events
The individual outcomes are called simple events.
Event
An event is any collection of one or more simple events.
Example
Experiment: Toss a coin 3 times.
• Sample space Ω:
  Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
• Examples of events include:
  A = {HHH, HHT, HTH, THH} = {at least two heads}
  B = {HTT, THT, TTH} = {exactly two tails}.
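The sample space and both events are small enough to enumerate directly; a quick check in Python:

```python
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing a coin 3 times.
omega = ["".join(t) for t in product("HT", repeat=3)]

A = [s for s in omega if s.count("H") >= 2]  # at least two heads
B = [s for s in omega if s.count("T") == 2]  # exactly two tails

print(sorted(omega))
print("P(A) =", len(A) / len(omega))  # 4/8 = 0.5
print("P(B) =", len(B) / len(omega))  # 3/8 = 0.375
```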
Basic Concepts (from Set Theory)
• The union of two events A and B, A ∪ B, is the event consisting of all outcomes that are either in A or in B or in both events.
• The complement of an event A, A^c, is the set of all outcomes in Ω that are not in A.
• The intersection of two events A and B, A ∩ B, is the event consisting of all outcomes that are in both events.
• When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
Example
Experiment: toss a coin 10 times and observe the number of heads.
• Let A = {0, 2, 4, 6, 8, 10}, B = {1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.
• A ∪ B = {0, 1, …, 10} = Ω.
• A ∩ B contains no outcomes, so A and B are mutually exclusive.
• C^c = {6, 7, 8, 9, 10}, A ∩ C = {0, 2, 4}.
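Python's built-in set type mirrors these operations directly; reproducing the example:

```python
omega = set(range(11))  # possible head counts: 0 through 10
A = {0, 2, 4, 6, 8, 10}
B = {1, 3, 5, 7, 9}
C = {0, 1, 2, 3, 4, 5}

print(A | B == omega)  # True: the union is the whole sample space
print(A & B)           # set(): A and B are mutually exclusive
print(omega - C)       # complement of C: {6, 7, 8, 9, 10}
print(A & C)           # {0, 2, 4}
```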
Rules
• Commutative Laws:
  A ∪ B = B ∪ A, A ∩ B = B ∩ A
• Associative Laws:
  (A ∪ B) ∪ C = A ∪ (B ∪ C)
  (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distributive Laws:
  (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
  (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
• De Morgan's Laws:
$$\left(\bigcup_{i=1}^{n} A_i\right)^c = \bigcap_{i=1}^{n} A_i^c, \qquad \left(\bigcap_{i=1}^{n} A_i\right)^c = \bigcup_{i=1}^{n} A_i^c$$
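A spot-check of De Morgan's laws on the sets from the previous example (a sanity check, not a proof):

```python
omega = set(range(11))
sets = [{0, 2, 4, 6, 8, 10}, {1, 3, 5, 7, 9}, {0, 1, 2, 3, 4, 5}]

# Complement of a union equals intersection of complements, and vice versa.
assert omega - set().union(*sets) == set.intersection(*(omega - s for s in sets))
assert omega - set.intersection(*sets) == set().union(*(omega - s for s in sets))
print("De Morgan's laws hold on this example")
```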
Venn Diagram
[Figure: Venn diagram of events A and B inside the sample space Ω, with the overlap A ∩ B shaded]
Probability
• A probability is a number assigned to each subset (event) of a sample space Ω.
• Probability distributions satisfy the following rules:
Axioms of Probability
• For any event A, 0 ≤ P(A) ≤ 1.
• P(Ω) = 1.
• If A1, A2, …, An is a partition of A, then
  P(A) = P(A1) + P(A2) + ... + P(An).
(A1, A2, …, An is called a partition of A if A1 ∪ A2 ∪ … ∪ An = A and A1, A2, …, An are mutually exclusive.)
Properties of Probability
• For any event A, P(A^c) = 1 − P(A).
• If A ⊂ B, then P(A) ≤ P(B).
• For any two events A and B,
  P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
• For three events A, B, and C,
  P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Example
• In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is
  • not rich?
  • rich but not famous?
  • either rich or famous?
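A worked solution using the complement and inclusion-exclusion rules from the previous slide:

```python
p_rich, p_famous, p_both = 0.10, 0.05, 0.03

print("not rich:", 1 - p_rich)                                  # 0.90
print("rich but not famous:", round(p_rich - p_both, 2))        # 0.07
print("rich or famous:", round(p_rich + p_famous - p_both, 2))  # 0.12
```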
Intuitive Development (agrees with axioms)
• Intuitively, the probability of an event a could be defined as the limiting relative frequency
$$P(a) = \lim_{n \to \infty} \frac{N(a)}{n}$$
where N(a) is the number of times event a happens in n trials.
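This relative-frequency view is easy to watch converge in simulation; a sketch estimating P(rolling a 6) with a fair die:

```python
import random

random.seed(0)
for n in (100, 10_000, 1_000_000):
    count = sum(random.randint(1, 6) == 6 for _ in range(n))
    print(n, count / n)  # approaches 1/6 ≈ 0.1667 as n grows
```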
Here We Go Again: Not So Basic Probability
More Formal:
• Ω is the Sample Space:
  • Contains all possible outcomes of an experiment
• ω ∈ Ω is a single outcome
• A ⊆ Ω is a set of outcomes of interest
Independence
• The joint probability of independent events A, B and C is given by:
  P(A, B, C) = P(A) P(B) P(C)
A and B are independent if knowing that A has happened does not say anything about B happening.
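Independence can be checked empirically by comparing the joint frequency with the product of the marginal frequencies; a sketch for two fair coin tosses:

```python
import random

random.seed(0)
n = 100_000
a = b = ab = 0
for _ in range(n):
    x = random.random() < 0.5  # A: first toss is heads
    y = random.random() < 0.5  # B: second toss is heads
    a += x
    b += y
    ab += x and y

print(ab / n, (a / n) * (b / n))  # both close to 0.25, so P(A,B) ≈ P(A)P(B)
```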
Bayes' Theorem
• Provides a way to convert a priori probabilities into a posteriori probabilities:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Conditional Probability
• One of the most useful concepts! For events A and B with P(B) > 0:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
[Figure: Venn diagram of overlapping events A and B]
Using Partitions:
• If events A_i are mutually exclusive and partition Ω, then for any event B:
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i), \qquad P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)}$$
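A small numerical illustration, with all numbers invented for the sketch: a diagnostic test and the two-event partition {condition, no condition}:

```python
# Invented numbers: P(cond) = 0.01, P(pos | cond) = 0.99, P(pos | no cond) = 0.05.
p_cond = 0.01
p_pos_given_cond = 0.99
p_pos_given_not = 0.05

# Total probability of a positive test over the partition {cond, not cond}:
p_pos = p_pos_given_cond * p_cond + p_pos_given_not * (1 - p_cond)

# Bayes' theorem: posterior probability of the condition given a positive test.
p_cond_given_pos = p_pos_given_cond * p_cond / p_pos
print(round(p_cond_given_pos, 3))  # ≈ 0.167, far above the 1% prior
```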
Random Variables
• A (scalar) random variable X is a function that maps the outcome of a random event into real scalar values:
  X: Ω → ℝ, ω ↦ X(ω)
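For example, "number of heads in three tosses" is a random variable on the sample space from the earlier coin example:

```python
from itertools import product

# X maps each outcome omega to a real number: the number of heads.
omega = ["".join(t) for t in product("HT", repeat=3)]
X = {w: w.count("H") for w in omega}

print(X["HHT"])  # 2
print({x: sum(v == x for v in X.values()) / len(omega) for x in range(4)})
# P(X=0)=1/8, P(X=1)=3/8, P(X=2)=3/8, P(X=3)=1/8
```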
Random Variable Distributions
• Cumulative Distribution Function (CDF): F_X(x) = P(X ≤ x)
• Probability Density Function (PDF): f_X(x) = dF_X(x)/dx
Random Distributions:
• From the two previous equations:
$$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\, dx$$
Uniform Distribution
• A R.V. X that is uniformly distributed between x_1 and x_2 has density function:
$$f_X(x) = \begin{cases} \dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2 \\ 0, & \text{otherwise} \end{cases}$$
[Figure: rectangular density of height 1/(x_2 − x_1) between x_1 and x_2]
Gaussian (Normal) Distribution
• A R.V. X that is normally distributed has density function:
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
[Figure: bell-shaped curve centered at μ]
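The density is straightforward to evaluate directly; a sketch using only the standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0))            # ≈ 0.3989, the peak of the standard normal
print(normal_pdf(1.0, 1.0, 2.0))  # density at the mean of N(1, 4)
```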
Statistical Characterizations
• Expectation (Mean Value, First Moment):
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
• Second Moment:
$$E[X^2] = \int_{-\infty}^{\infty} x^2\, f_X(x)\, dx$$
Statistical Characterizations
• Variance of X:
$$\sigma_X^2 = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2$$
• Standard Deviation of X:
$$\sigma_X = \sqrt{\sigma_X^2}$$
Mean Estimation from Samples
• Given a set of N samples {x_i} from a distribution, we can estimate the mean of the distribution by:
$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
Variance Estimation from Samples
• Given a set of N samples {x_i} from a distribution, we can estimate the variance of the distribution by:
$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2$$
(dividing by N − 1 instead of N gives the unbiased variant)
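Both estimators in code, applied to simulated winnings from the die game on the earlier slide (the variance uses the 1/N form shown above):

```python
import random

random.seed(0)
payoff = {1: -3, 2: 1, 3: 1, 4: 1, 5: 2, 6: -3}
samples = [payoff[random.randint(1, 6)] for _ in range(10_000)]

N = len(samples)
mean_hat = sum(samples) / N
var_hat = sum((x - mean_hat) ** 2 for x in samples) / N
print(mean_hat, var_hat)  # sample mean near the true value of -1/6
```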
Pattern Classification
Chapter 1: Introduction to Pattern Recognition (Sections 1.1-1.6)
• Machine Perception
• An Example
• Pattern Recognition Systems
• The Design Cycle
• Learning and Adaptation
• Conclusion
Machine Perception
• Build a machine that can recognize patterns:
  • Speech recognition
  • Fingerprint identification
  • OCR (Optical Character Recognition)
  • DNA sequence identification
An Example
• Sorting incoming fish on a conveyor according to species using optical sensing
[Figure: Species → {Sea bass, Salmon}]
• Problem Analysis
  • Set up a camera and take some sample images to extract features:
    • Length
    • Lightness
    • Width
    • Number and shape of fins
    • Position of the mouth, etc.
  • This is the set of all suggested features to explore for use in our classifier!
• Preprocessing
  • Use a segmentation operation to isolate fishes from one another and from the background
  • Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features
  • The features are passed to a classifier (a minimal sketch of the full pipeline follows below)
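A toy end-to-end version of this segmentation, feature extraction, and classification pipeline; every function, value, and threshold here is illustrative, not the book's implementation:

```python
def segment(image):
    # Hypothetical stand-in: treat each row of the "image" as one fish region.
    return image

def extract_features(region):
    # Reduce a region to one measurement, e.g. mean brightness (lightness).
    return sum(region) / len(region)

def classify(lightness, threshold=0.5):
    # Toy decision rule on the extracted feature.
    return "sea bass" if lightness > threshold else "salmon"

toy_image = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]]
print([classify(extract_features(r)) for r in segment(toy_image)])
# ['sea bass', 'salmon']
```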
• Classification
  • Select the length of the fish as a possible feature for discrimination
The length is a poor feature alone!
Select the lightness as a possible feature.
• Threshold decision boundary and cost relationship
  • Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!); a toy threshold rule is sketched below
  • Task of decision theory
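A toy version of the threshold rule (all numbers invented): lowering the threshold labels more fish as sea bass, trading extra salmon errors for fewer of the costlier sea-bass-as-salmon errors:

```python
def classify(lightness, threshold):
    # Assumption for the sketch: sea bass tend to be lighter than salmon.
    return "sea bass" if lightness > threshold else "salmon"

fish_lightness = [2.1, 3.8, 4.6, 5.9]  # invented measurements
for threshold in (5.0, 4.0, 3.0):
    print(threshold, [classify(x, threshold) for x in fish_lightness])
```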
• Adopt the lightness and add the width of the fish:
  Fish x^T = [x_1, x_2], where x_1 = lightness and x_2 = width
• We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding noisy features.
• Ideally, the best decision boundary should be the one which provides optimal performance; a two-feature linear rule is sketched below:
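With two features, a linear decision boundary is the simplest choice; a sketch with invented weights:

```python
def classify(x, w=(1.0, 0.8), b=-5.0):
    # Decide by the sign of a linear score w1*x1 + w2*x2 + b
    # over x = (lightness, width); weights are invented for the sketch.
    score = w[0] * x[0] + w[1] * x[1] + b
    return "sea bass" if score > 0 else "salmon"

print(classify((4.6, 2.0)))  # score = 4.6 + 1.6 - 5 = 1.2 -> 'sea bass'
print(classify((2.1, 1.5)))  # score = 2.1 + 1.2 - 5 = -1.7 -> 'salmon'
```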
• However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input
• Issue of generalization!