This document describes an algorithm for rough entropy based gene selection. It uses rough entropy to measure uncertainty in rough sets. The algorithm takes as input gene expression data containing genes and a class variable. It outputs a gene subset of size r. It works by first calculating a significance value for each gene based on rough entropy. Genes are then ranked by significance and added one by one to the subset, calculating rough entropy each time to minimize information loss until the subset reaches size r. An example is provided to demonstrate the step-by-step process.
Rough Entropy-Based Gene Selection
1. Rough Entropy Based
Gene Selection
Dr. E. N. Sathishkumar,
Guest Lecturer
Department of Computer Science,
Periyar University,
Salem 11.
2. Definition - Rough Entropy
Rough entropy is an extension of entropy that measures the
uncertainty in rough sets.
Information system IS = (U, A, V, f)
U - non-empty finite set of objects
A - non-empty finite set of attributes
For any B ⊆ A, let IND(B) be the equivalence relation that
partitions U as U/IND(B) = {B1, B2, ..., Bm}
The rough entropy E(B) of the equivalence relation
IND(B) is defined by
$$E(B) = -\sum_{i=1}^{m} \frac{|B_i|}{|U|} \log \frac{1}{|B_i|}$$
3. |Bi|/|U| - probability of any element x ∈ U being in
equivalence class Bi; 1 <= i <= m.
|M| - the cardinality of set M.
In the above definition, for any B ⊆ A, if U/IND(B) = {U},
then the rough entropy E(B) of equivalence relation IND(B)
achieves the maximum value log|U|.
If U/IND(B) = {{x} : x ∈ U}, then the rough entropy E(B) of
equivalence relation IND(B) achieves the minimum value 0.
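The two extreme cases can be checked numerically. The sketch below assumes the rough entropy has the form E(B) = sum over i of (|Bi|/|U|) log|Bi|, which is consistent with the maximum log|U| and minimum 0 stated above. The function name `rough_entropy` and the four-object universe are illustrative and not taken from the slides.

```python
import math

def rough_entropy(partition):
    """Rough entropy of a partition U/IND(B), given as a list of blocks (sets).

    Assumed form: E(B) = sum(|B_i| / |U| * log|B_i|).
    """
    n = sum(len(block) for block in partition)  # |U|
    return sum(len(block) / n * math.log(len(block)) for block in partition)

U = {1, 2, 3, 4}            # hypothetical universe of four objects
coarsest = [U]              # U/IND(B) = {U}
finest = [{x} for x in U]   # U/IND(B) = {{x} : x in U}

print(rough_entropy(coarsest), math.log(len(U)))  # both equal log|U|
print(rough_entropy(finest))                      # 0.0
```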
4. ALGORITHM
Rough Maximum Significance Minimum Entropy
Input
Gene expression data containing n genes and a class
variable,
Gene = (gene1, gene2, ..., genen)
D = (D1, D2, ..., Dm)
Output
A gene subset with r genes, denoted by
S = (gene1, gene2, ..., gener)
5. Steps
Step 1 : S ← ∅
Step 2 : For i = 1 to n do
Calculate SGene(genei) according to the formula,
$$S_{Gene}(gene_i) = \frac{|pos_{Gene}(D)| - |pos_{Gene - \{gene_i\}}(D)|}{|U|}$$
Step 3 : Rank the values in descending order
S1 = {SGene(gene1), SGene(gene2), ..., SGene(genen)}
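Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration that assumes discretized expression values and the usual rough-set positive region (objects whose equivalence class maps to a single decision value). The helper names `positive_region` and `significance`, and the four-row table, are hypothetical.

```python
from collections import defaultdict

def positive_region(rows, genes, decision):
    """Indices of objects whose equivalence class under `genes` has one decision value."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[g] for g in genes)].append(idx)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def significance(rows, genes, decision, gene):
    """(|pos_Gene(D)| - |pos_{Gene - {gene}}(D)|) / |U|."""
    rest = [g for g in genes if g != gene]
    full = positive_region(rows, genes, decision)
    reduced = positive_region(rows, rest, decision)
    return (len(full) - len(reduced)) / len(rows)

# Hypothetical discretized data: the class D depends only on g2.
rows = [
    {"g1": 0, "g2": 0, "D": "a"},
    {"g1": 0, "g2": 1, "D": "b"},
    {"g1": 1, "g2": 0, "D": "a"},
    {"g1": 1, "g2": 1, "D": "b"},
]
genes = ["g1", "g2"]
scores = {g: significance(rows, genes, "D", g) for g in genes}
s1 = sorted(scores, key=scores.get, reverse=True)  # Step 3: descending order
print(scores)  # {'g1': 0.0, 'g2': 1.0}
print(s1)      # ['g2', 'g1']
```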
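6. Step 4 : Choose genes from S1, from the top
one to the last one, and calculate Hs(D | Gene)
according to the formula,
$$H_s(D \mid Gene) = 1 - \frac{H(D \mid Gene)}{\max H(D \mid Gene)}$$
where
$$H(D \mid Gene) = -\sum_{j=1}^{m} P(D_j \mid \{gene_i\}) \log P(D_j \mid \{gene_i\})$$
$$\max H(D \mid Gene) = \frac{|\{U\}|}{|U|} \sum_{j=1}^{m} \frac{|\{U_j\}|}{|\{U\}|} \log \frac{|\{U_j\}|}{|\{U\}|}$$
Here, {U} and {U_j} are the subsets of U obtained from the
partitions induced by {gene_i} and by D.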
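The Step 4 quantities can be written as three small functions. This is a sketch under assumptions: the natural logarithm is used, the "max" term is taken without a leading minus sign (so it is negative), and the counts in the usage lines are hypothetical. The function names are illustrative.

```python
import math

def entropy(probs):
    """H = -sum(p * ln p), skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def max_entropy_term(block_size, universe_size, counts):
    """(|{U}| / |U|) * sum((c / |{U}|) * ln(c / |{U}|)) with no leading minus.

    `counts` holds the number of objects of the block in each decision class.
    """
    return block_size / universe_size * sum(
        c / block_size * math.log(c / block_size) for c in counts if c > 0
    )

def scaled_entropy(h, h_max):
    """H_s = 1 - H / max H; undefined when h_max is 0 (a pure block)."""
    return 1 - h / h_max

# Hypothetical numbers: a block of 4 objects out of 6, split 2/2 over two classes.
h = entropy([0.5, 0.5])
h_max = max_entropy_term(4, 6, [2, 2])
print(scaled_entropy(h, h_max))
```

Because the "max" term is negative under this reading, the scaled value exceeds 1.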
7. Step 5 : Rank the n values of Hs(D | Gene) in S2 in
increasing order
Step 6 : While |S| < r do
For i = 1 to n do
If SGene(genei) selected from S1 and
Hs(D | Gene) selected from S2
satisfy
RE(genei) = (1 - α) SGene(genei) + α Hs(D | Gene)
= Max RE(genei)
then
S ← S ∪ {genei};
Gene ← Gene - {genei};
|S| ← |S| + 1
end
end
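One possible reading of Step 6 is a greedy loop that repeatedly moves the gene with the largest RE value from Gene to S until S holds r genes. The sketch below follows that reading. The function name `select_genes` and the default α = 0.5 are assumptions, and the significance values are hypothetical; only the Hs values are the ones quoted later in the deck (slides 18 and 19).

```python
def select_genes(significance, scaled_entropy, r, alpha=0.5):
    """Greedy selection by RE(gene) = (1 - alpha) * SGene(gene) + alpha * Hs(D | gene)."""
    remaining = list(significance)  # the candidate set "Gene"
    selected = []                   # the output subset "S"
    while len(selected) < r and remaining:
        best = max(
            remaining,
            key=lambda g: (1 - alpha) * significance[g] + alpha * scaled_entropy[g],
        )
        selected.append(best)       # S <- S U {gene}
        remaining.remove(best)      # Gene <- Gene - {gene}
    return selected

h_s = {"gene1": 2.2726, "gene2": 3.3528, "gene3": 2.5730, "gene4": 2.5730}
sig = {"gene1": 0.0, "gene2": 0.25, "gene3": 0.0, "gene4": 0.125}  # hypothetical values
print(select_genes(sig, h_s, r=2))  # ['gene2', 'gene4']
```

Since both inputs are fixed before the loop starts, this reading amounts to taking the top r genes by RE.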
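18. max H(D | gene4) = 5/8 (3/5 log 3/5 + 2/5 log 2/5)
= -0.4206
Hs(D | gene4) = 1 - 0.6616 / (-0.4206)
= 2.5730
(log is the natural logarithm), using
$$H_s(D \mid Gene) = 1 - \frac{H(D \mid Gene)}{\max H(D \mid Gene)}$$
$$\max H(D \mid Gene) = \frac{|\{U\}|}{|U|} \sum_{j=1}^{m} \frac{|\{U_j\}|}{|\{U\}|} \log \frac{|\{U_j\}|}{|\{U\}|}$$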
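The arithmetic on this slide can be reproduced in a few lines. The sketch assumes the natural logarithm, which is what yields -0.4206, and takes H(D | gene4) = 0.6616 as quoted on the slide rather than recomputing it, since the underlying data table is not part of this excerpt.

```python
import math

# max H(D | gene4) = 5/8 * (3/5 ln 3/5 + 2/5 ln 2/5)
h_max = 5 / 8 * (3 / 5 * math.log(3 / 5) + 2 / 5 * math.log(2 / 5))
print(round(h_max, 4))  # -0.4206

h = 0.6616  # H(D | gene4), value quoted on the slide
h_s = 1 - h / round(h_max, 4)  # the slide divides by the rounded value
print(round(h_s, 4))  # 2.573
```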
19. Similarly for gene1, gene2, gene3
Hs(D | gene1) = 2.2726
Hs(D | gene2) = 3.3528
Hs(D | gene3) = 2.5730
20. Step 5 :
Rank the n values of Hs(D | Gene) in S2 in
increasing order
S2 = {2.2726, 2.5730, 2.5730, 3.3528}
S2 = {gene1, gene4, gene3, gene2}
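The ranking can be reproduced with a plain sort over the four values from slides 18 and 19. This is only an illustration; the dictionary name `h_s` is an assumption. Note that gene3 and gene4 tie at 2.5730, so their relative order depends on tie-breaking: Python's stable sort keeps insertion order (gene3 before gene4), while the slide lists gene4 first. Both orders are valid increasing rankings.

```python
h_s = {"gene1": 2.2726, "gene2": 3.3528, "gene3": 2.5730, "gene4": 2.5730}

ranked_genes = sorted(h_s, key=h_s.get)         # increasing order of Hs
ranked_values = [h_s[g] for g in ranked_genes]

print(ranked_values)  # [2.2726, 2.573, 2.573, 3.3528]
print(ranked_genes)   # ['gene1', 'gene3', 'gene4', 'gene2']
```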