Rough Entropy Based
Gene Selection
Dr. E. N. Sathishkumar,
Guest Lecturer
Department of Computer Science,
Periyar University,
Salem - 11.
Definition - Rough Entropy
Rough entropy is an extension of entropy that measures the uncertainty in rough sets.
Information system IS = (U, A, V, f)
U - non-empty finite set of objects
A - non-empty finite set of attributes
For any B ⊆ A, let IND(B) be the equivalence relation that partitions U as U/IND(B) = {B1, B2, ..., Bm}.
The rough entropy E(B) of the equivalence relation IND(B) is defined by
$$ E(B) = -\sum_{i=1}^{m} \frac{|B_i|}{|U|} \log \frac{1}{|B_i|} $$
|Bi|/|U| - probability of any element x ∈ U being in equivalence class Bi, 1 <= i <= m.
|M| - the cardinality of set M.
In the above definition, for any B ⊆ A, if U/IND(B) = {U}, then the rough entropy E(B) of the equivalence relation IND(B) achieves its maximum value log|U|.
If U/IND(B) = {{x} : x ∈ U}, then the rough entropy E(B) of the equivalence relation IND(B) achieves its minimum value 0.
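The two extreme cases can be checked numerically. The following is a minimal sketch, assuming the rough entropy has the form -Σ (|Bi|/|U|) log(1/|Bi|) with the natural logarithm. The function name `rough_entropy` and the 8-object universe are illustrative choices, not part of the slides.

```python
import math

def rough_entropy(partition, universe_size):
    """E(B) for a partition U/IND(B), given as a list of equivalence classes.

    Uses -log(1/|B_i|) = log|B_i|, so each class contributes (|B_i|/|U|) * log|B_i|.
    """
    return sum((len(block) / universe_size) * math.log(len(block))
               for block in partition)

U = list(range(8))            # hypothetical universe of 8 objects
coarsest = [U]                # U/IND(B) = {U}
finest = [[x] for x in U]     # U/IND(B) = {{x} : x in U}

print(rough_entropy(coarsest, len(U)))  # same value as math.log(8)
print(rough_entropy(finest, len(U)))    # every class has size 1, so the sum is 0
```

A partition between these two extremes gives a value strictly between 0 and log|U|.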
ALGORITHM
Rough Maximum Significance Minimum Entropy
Input
Gene expression data containing n genes and a class variable:
Gene = (gene_1, gene_2, ..., gene_n)
D = (D_1, D_2, ..., D_m)
Output
A gene subset with r genes, denoted by
S = (gene_1, gene_2, ..., gene_r)
Steps
Step 1 : S ← ∅
Step 2 : For i = 1 to n do
Calculate S_Gene(gene_i) according to the formula
$$ S_{Gene}(gene_i) = \frac{|POS_{Gene}(D)| - |POS_{Gene - \{gene_i\}}(D)|}{|U|} $$
Step 3 : Rank by descending order
S1 = {S_Gene(gene_1), S_Gene(gene_2), ..., S_Gene(gene_n)}
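Step 4 : Choose genes from S1, from the top one to the last one, and calculate Hs(D | Gene) according to the formula
$$ H_s(D \mid Gene) = 1 - \frac{H(D \mid Gene)}{\max H(D \mid Gene)} $$
where
$$ H(D \mid Gene) = -\sum_{j=1}^{m} P(D_j \mid \{gene_i\}) \log P(D_j \mid \{gene_i\}) $$
$$ \max H(D \mid Gene) = \frac{|\{U\}|}{|U|} \sum_{j=1}^{m} \frac{|\{U_j\}|}{|\{U\}|} \log \frac{|\{U_j\}|}{|\{U\}|} $$
Here, {U} = U / {D ∪ gene_i}, the partition of U induced by D together with gene_i, and {U_j} is the set of classes in {U} whose objects take the decision value D_j.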
Step 5 : Rank the n values of Hs(D | Gene) in S2 by increasing order
Step 6 : While |S| < r do
For i = 1 to n do
If S_Gene(gene_i) selected from S1 and
Hs(D | Gene) selected from S2
satisfy
RE(gene_i) = (1 - α) S_Gene(gene_i) + α Hs(D | Gene)
Max RE(gene_i)
then
S ← S + {gene_i};
Gene ← Gene - {gene_i}
|S| ← |S| + 1
end
end
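One possible reading of the Step 6 loop is a greedy selection: at each pass, the remaining gene with the largest RE value is moved from Gene into S until S holds r genes. The sketch below assumes the RE values have already been computed and do not change between passes. The function name `select_genes` and the sample scores are hypothetical and only illustrate the control flow.

```python
def select_genes(re_score, r):
    """Greedy reading of Step 6: repeatedly move the gene with the largest RE into S."""
    remaining = dict(re_score)   # plays the role of Gene
    selected = []                # plays the role of S
    while len(selected) < r and remaining:
        best = max(remaining, key=remaining.get)
        selected.append(best)    # S <- S + {gene_i}
        del remaining[best]      # Gene <- Gene - {gene_i}
    return selected

# Hypothetical RE scores, for illustration only
scores = {"geneA": 0.4, "geneB": 0.9, "geneC": 0.1}
print(select_genes(scores, 2))
```

The `and remaining` guard is an addition so that the loop also terminates when r exceeds the number of available genes.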
Example
U gene1 gene2 gene3 gene4 D
0 1 0 2 2 0
1 0 1 1 1 1
2 2 0 0 1 0
3 1 1 0 2 1
4 1 0 2 0 0
5 2 2 0 1 0
6 2 1 1 1 1
7 0 1 1 0 0
Step 1 : S ← ∅
Step 2 : for i = 1 to 4
$$ S_{Gene}(gene_i) = \frac{|POS_{Gene}(D)| - |POS_{Gene - \{gene_i\}}(D)|}{|U|} $$
Here,
|U| = |{0,1,2,3,4,5,6,7}| = 8
INDISCERNIBILITY
 IND(gene1) = {{0,3,4}, {1,7}, {2,5,6}}
 IND(gene2) = {{0,2,4}, {1,3,6,7}, {5}}
 IND(gene3) = {{2,3,5}, {1,6,7}, {0,4}}
 IND(gene4) = {{4,7}, {1,2,5,6}, {0,3}}

U gene1 gene2 gene3 gene4
0 1 0 2 2
1 0 1 1 1
2 2 0 0 1
3 1 1 0 2
4 1 0 2 0
5 2 2 0 1
6 2 1 1 1
7 0 1 1 0
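The indiscernibility classes listed on this slide can be reproduced by grouping objects that share the same values on the chosen columns. This is a small sketch; the names `TABLE` and `ind` are illustrative, and the data are copied from the example table.

```python
from collections import defaultdict

# Example table from the slides: object -> (gene1, gene2, gene3, gene4)
TABLE = {
    0: (1, 0, 2, 2), 1: (0, 1, 1, 1), 2: (2, 0, 0, 1), 3: (1, 1, 0, 2),
    4: (1, 0, 2, 0), 5: (2, 2, 0, 1), 6: (2, 1, 1, 1), 7: (0, 1, 1, 0),
}

def ind(columns):
    """Partition U into classes of objects that agree on the given column indices."""
    classes = defaultdict(list)
    for obj, row in TABLE.items():
        classes[tuple(row[c] for c in columns)].append(obj)
    return sorted(classes.values())

for g in range(4):
    print(f"IND(gene{g + 1}) =", ind([g]))
```

Passing several column indices at once, for example `ind([0, 1, 2, 3])`, gives the partition induced by a set of genes.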
Find |POS_Gene(D)|
Gene = {gene1, gene2, gene3, gene4}
IND(gene1, gene2, gene3, gene4) = {{0},{4},{1},{7},{3},{2},{5},{6}}
IND(D) = {{0,2,4,5,7},{1,3,6}}
POS_Gene(D) = {{0},{1},{2},{3},{4},{5},{6},{7}}
|POS_Gene(D)| = 8
U D
0 0
1 1
2 0
3 1
4 0
5 0
6 1
7 0
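i=1, gene1
Find |POS_Gene-{gene1}(D)|
IND(gene2, gene3, gene4) = {{0},{2},{4},{1,6},{7},{3},{5}}
POS_Gene-{gene1}(D) = {{0},{2},{4},{1,6},{7},{3},{5}}
|POS_Gene-{gene1}(D)| = 7
Note that the cardinality here counts the equivalence classes in the positive region ({1,6} counts once), not the individual objects.
$$ S_{Gene}(gene_i) = \frac{|POS_{Gene}(D)| - |POS_{Gene - \{gene_i\}}(D)|}{|U|} $$
SGene(gene1) = (8 - 7)/8 = 1/8 = 0.125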
SGene(gene1) = 0.125
Similarly for
 i=2, gene2
SGene(gene2) = 0.125
 i=3, gene3
SGene(gene3) =0
 i=4, gene4
SGene(gene4) =0.375
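The four significance values can be recomputed from the example table. The sketch below assumes the counting convention that appears in the worked example, where the size of the positive region is the number of equivalence classes whose objects all share one decision value (this is how the slide arrives at 7 after removing gene1). The names `ROWS`, `partition`, `positive_classes` and `significance` are illustrative.

```python
from collections import defaultdict

# object -> (gene1, gene2, gene3, gene4, D), copied from the example table
ROWS = {
    0: (1, 0, 2, 2, 0), 1: (0, 1, 1, 1, 1), 2: (2, 0, 0, 1, 0), 3: (1, 1, 0, 2, 1),
    4: (1, 0, 2, 0, 0), 5: (2, 2, 0, 1, 0), 6: (2, 1, 1, 1, 1), 7: (0, 1, 1, 0, 0),
}
D = 4  # column index of the decision attribute

def partition(columns):
    """Equivalence classes of objects that agree on the given columns."""
    classes = defaultdict(list)
    for obj, row in ROWS.items():
        classes[tuple(row[c] for c in columns)].append(obj)
    return list(classes.values())

def positive_classes(genes):
    """Classes of IND(genes) whose objects all share a single decision value."""
    return [c for c in partition(genes) if len({ROWS[o][D] for o in c}) == 1]

def significance(gene, genes=(0, 1, 2, 3)):
    rest = [g for g in genes if g != gene]
    # Assumption: cardinalities count classes, as in the slides' 8 and 7.
    return (len(positive_classes(genes)) - len(positive_classes(rest))) / len(ROWS)

for g in range(4):
    print(f"SGene(gene{g + 1}) = {significance(g)}")
```

Removing gene4 leaves the class {1,7}, whose objects have different decision values, so that class drops out of the positive region and gene4 receives the largest significance.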
Step 3 : Rank S1 = {SGene(gene1), SGene(gene2), ..., SGene(gene4)} by descending order
 Sgene(gene1)= 0.125
 Sgene(gene2)=0.125
 Sgene(gene3)=0
 Sgene(gene4)=0.375
S1={gene4, gene1,gene2, gene3}
Step 4 : Choose genes from S1, from the top one to the last one
$$ H(D \mid Gene) = -\sum_{j=1}^{m} P(D_j \mid \{gene_i\}) \log P(D_j \mid \{gene_i\}) $$
H(D | Gene) = -(5/8 log 5/8 + 3/8 log 3/8) = 0.6616 (natural logarithm)
U gene1 gene2 gene3 gene4 D
0 1 0 2 2 0
1 0 1 1 1 1
2 2 0 0 1 0
3 1 1 0 2 1
4 1 0 2 0 0
5 2 2 0 1 0
6 2 1 1 1 1
7 0 1 1 0 0
S1 top = gene4
U / {D ∪ gene4} = {U}
{U} = {{0},{2,5},{1,6},{3},{4,7}}
|{U}| / |U| = 5/8
$$ \max H(D \mid Gene) = \frac{|\{U\}|}{|U|} \sum_{j=1}^{m} \frac{|\{U_j\}|}{|\{U\}|} \log \frac{|\{U_j\}|}{|\{U\}|} $$
U gene4 D
0 2 0
1 1 1
2 1 0
3 2 1
4 0 0
5 1 0
6 1 1
7 0 0
Find |{Uj}| where j = 1 to m
D = 0
|{U0}| = |{{0},{2,5},{4,7}}| = 3
D = 1
|{U1}| = |{{1,6},{3}}| = 2
max H(D | gene4) = 5/8 (3/5 log 3/5 + 2/5 log 2/5)
= -0.4206
$$ H_s(D \mid Gene) = 1 - \frac{H(D \mid Gene)}{\max H(D \mid Gene)} $$
$$ H_s(D \mid gene4) = 1 - \frac{0.6616}{-0.4206} = 2.5730 $$
Similarly for gene1, gene2, gene3
Hs(D | gene1) = 2.2726
Hs(D | gene2) = 3.3528
Hs(D | gene3) = 2.5730
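The Hs values can be recomputed from the example table. The sketch below follows the worked example as written: H(D | Gene) is evaluated as the natural-log entropy of the D column, and max H(D | Gene) is taken without a leading minus sign, so it comes out negative. The names `ROWS`, `h_decision`, `max_h` and `hs` are illustrative. Because the slides round intermediate values to four decimals, the printed results can differ from them in the last digit.

```python
import math
from collections import Counter

# Rows of the example table: (gene1, gene2, gene3, gene4, D)
ROWS = [
    (1, 0, 2, 2, 0), (0, 1, 1, 1, 1), (2, 0, 0, 1, 0), (1, 1, 0, 2, 1),
    (1, 0, 2, 0, 0), (2, 2, 0, 1, 0), (2, 1, 1, 1, 1), (0, 1, 1, 0, 0),
]
D = 4  # column index of the decision attribute

def h_decision():
    """H(D | Gene) as evaluated on the slides: natural-log entropy of the D column."""
    n = len(ROWS)
    counts = Counter(r[D] for r in ROWS)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def max_h(gene):
    """max H(D | Gene) following the worked example (no leading minus sign)."""
    classes = {(r[gene], r[D]) for r in ROWS}      # classes of U / {D ∪ gene_i}
    k = len(classes)                               # |{U}|
    per_decision = Counter(d for _, d in classes)  # |{U_j}| for each decision value
    return k / len(ROWS) * sum(c / k * math.log(c / k) for c in per_decision.values())

def hs(gene):
    # Undefined if max_h is 0, which does not occur in this example.
    return 1 - h_decision() / max_h(gene)

for g in range(4):
    print(f"Hs(D | gene{g + 1}) = {hs(g):.4f}")
```

gene3 and gene4 give the same result because both split U / {D ∪ gene_i} into five classes, three with D = 0 and two with D = 1.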
Step 5 :
Rank the n values of Hs(D | Gene) in S2 by increasing order
S2 = {2.2726, 2.5730, 2.5730, 3.3528}
S2 = {gene1, gene4, gene3, gene2}
Step 6 :
SGene(genei) from S1
S1 = {gene4, gene1, gene2, gene3}
Hs(D | Gene) from S2
S2 = {gene1, gene4, gene3, gene2}
if they satisfy
RE(genei) = (1 - α) SGene(genei) + α Hs(D | Gene)
With α = 0.95 and the smallest value in S2, Hs(D | Gene) = 2.2726:
RE(gene4) = (1 - 0.95) * 0.375 + 0.95 * 2.2726 = 2.17772
RE(gene1) = (1 - 0.95) * 0.125 + 0.95 * 2.2726 = 2.16522
RE(gene2) = (1 - 0.95) * 0.125 + 0.95 * 2.2726 = 2.16522
RE(gene3) = (1 - 0.95) * 0 + 0.95 * 2.2726 = 2.15897
Ranked by RE(genei): {gene4, gene1, gene2, gene3}
Here,
Max RE(genei) = gene4
then
S ← S + {gene4};
Now the selected gene is
S = {gene4}
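The RE arithmetic can be checked with a few lines. The sketch below mirrors the worked example, which plugs the smallest Hs value (the first entry of S2) into every RE computation rather than each gene's own Hs. The variable names are illustrative, and the input numbers are the rounded values from the slides.

```python
ALPHA = 0.95  # weight used in the worked example

# Values taken from Steps 2 to 4 of the example
significance = {"gene1": 0.125, "gene2": 0.125, "gene3": 0.0, "gene4": 0.375}
hs = {"gene1": 2.2726, "gene2": 3.3528, "gene3": 2.5730, "gene4": 2.5730}

# As in the slides, the minimum Hs (top of S2) is used for every gene.
hs_min = min(hs.values())

re_score = {g: (1 - ALPHA) * s + ALPHA * hs_min for g, s in significance.items()}
selected = max(re_score, key=re_score.get)

for gene, score in sorted(re_score.items(), key=lambda kv: -kv[1]):
    print(f"RE({gene}) = {score:.5f}")
print("selected:", selected)
```

With a shared Hs term the ranking is decided by the significance values alone, so gene4 comes out on top for any α below 1.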
