Rough K-Means DR. E. N. SATHISHKUMAR
Department of Computer Science, Periyar University, Salem - 11
ROUGH K-MEANS CLUSTERING ALGORITHM
Algorithm: Rough K-Means
Input: Dataset of n objects with d features, the number of clusters K, and values
of the parameters Wlower, Wupper, and epsilon.
Output: Lower approximation A(Kj) and upper approximation Ā(Kj) of each of the K
clusters.
Step 1: Randomly assign each data object to exactly one lower approximation
A(Kj). By definition (property 2) the object then also belongs to the upper
approximation Ā(Kj) of the same cluster.
Step 2: Compute the cluster centroids Cj.
If A(Kj) ≠ ∅ and Ā(Kj) − A(Kj) = ∅:
    Cj = ( Σx∈A(Kj) x ) / |A(Kj)|
Else if A(Kj) = ∅ and Ā(Kj) − A(Kj) ≠ ∅:
    Cj = ( Σx∈Ā(Kj)−A(Kj) x ) / |Ā(Kj) − A(Kj)|
Else:
    Cj = Wlower · ( Σx∈A(Kj) x ) / |A(Kj)| + Wupper · ( Σx∈Ā(Kj)−A(Kj) x ) / |Ā(Kj) − A(Kj)|
Step 3: Assign each object to the lower approximation A(Ki) or the upper
approximation Ā(Ki) of cluster i as follows. For each object vector x, let
d(x, Cj) be the distance between x and the centroid Cj, and let Ci be the
nearest centroid, i.e. d(x, Ci) = min 1≤j≤K d(x, Cj). The ratios
d(x, Cj) / d(x, Ci), 1 ≤ i, j ≤ K, determine the membership of x as follows:
if d(x, Cj) / d(x, Ci) ≤ epsilon for some j ≠ i, then x ∈ Ā(Ki) and x ∈ Ā(Kj),
and x is not part of any lower approximation. Otherwise x ∈ A(Ki), and by
property 2 also x ∈ Ā(Ki).
Step 4: Repeat Steps 2 and 3 until convergence.
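The four steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `rough_kmeans` and the `init` argument (which fixes the Step 1 assignment so runs are reproducible) are my own naming, and the sketch assumes every cluster keeps a non-empty lower approximation or boundary.

```python
import numpy as np

def rough_kmeans(X, k, w_lower=0.7, w_upper=0.3, epsilon=2.0,
                 max_iter=100, init=None, seed=0):
    """Sketch of rough k-means; returns index sets for the lower and
    upper approximation of each of the k clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Step 1: each object starts in exactly one lower approximation
    labels = np.asarray(init) if init is not None else rng.integers(0, k, size=n)
    lower = [set(np.where(labels == j)[0]) for j in range(k)]
    upper = [set(s) for s in lower]          # property 2: lower ⊆ upper

    for _ in range(max_iter):
        # Step 2: centroids from lower approximation and boundary region
        centroids = np.empty((k, X.shape[1]))
        for j in range(k):
            boundary = upper[j] - lower[j]
            if lower[j] and not boundary:
                centroids[j] = X[list(lower[j])].mean(axis=0)
            elif boundary and not lower[j]:
                centroids[j] = X[list(boundary)].mean(axis=0)
            else:
                centroids[j] = (w_lower * X[list(lower[j])].mean(axis=0)
                                + w_upper * X[list(boundary)].mean(axis=0))
        # Step 3: reassign by the distance-ratio test
        new_lower = [set() for _ in range(k)]
        new_upper = [set() for _ in range(k)]
        for i in range(n):
            d = np.linalg.norm(X[i] - centroids, axis=1)
            nearest = int(np.argmin(d))
            close = [j for j in range(k)
                     if j != nearest and d[j] / d[nearest] <= epsilon]
            if close:    # ambiguous: upper approximations of all close clusters
                for j in [nearest] + close:
                    new_upper[j].add(i)
            else:        # clear winner: lower (and upper) of the nearest cluster
                new_lower[nearest].add(i)
                new_upper[nearest].add(i)
        # Step 4: stop when the approximations no longer change
        if new_lower == lower and new_upper == upper:
            break
        lower, upper = new_lower, new_upper
    return lower, upper
```

Called on the six-point dataset of the worked example below with `init=[0, 0, 0, 1, 1, 1]`, it converges to the same lower and upper approximations derived by hand.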
Illustrative Example
Table 1 shows an example information system with real-valued conditional
attributes. It consists of six objects/genes and two features/samples. The
number of clusters is k = 2, the weight of the lower approximation is
Wlower = 0.7, the weight of the upper approximation is Wupper = 0.3, and the
relative threshold is epsilon = 2.
Table 1 Example dataset for Rough K-Means
U X Y
1 0 3
2 1 3
3 3 1
4 3 0.5
5 5 0
6 6 0
Step 1: Randomly assign each data object to exactly one lower approximation:
A(K1) = {(0, 3), (1, 3), (3, 1)}
A(K2) = {(3, 0.5), (5, 0), (6, 0)}
Step 2: In this case A(Kj) ≠ ∅ and Ā(Kj) − A(Kj) = ∅, so we compute each
centroid using Cj = ( Σx∈A(Kj) x ) / |A(Kj)|:
C1 = ( (0+1+3)/3, (3+3+1)/3 ) = (1.33, 2.33)
C2 = ( (3+5+6)/3, (0.5+0+0)/3 ) = (4.67, 0.17)
Find the distance from each centroid to each point using the Euclidean distance,
d = √((x2 − x1)² + (y2 − y1)²)
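As a sanity check on the arithmetic, the centroids and both distance columns can be recomputed with NumPy. This is a sketch using the exact (unrounded) centroids, so intermediate values differ slightly from the hand-rounded 1.33/2.33 above, but all distances agree to two decimal places.

```python
import numpy as np

# Data from Table 1 and the initial split K1 = first three rows, K2 = last three
X = np.array([[0, 3], [1, 3], [3, 1], [3, 0.5], [5, 0], [6, 0]])
C1 = X[:3].mean(axis=0)   # ≈ (1.33, 2.33)
C2 = X[3:].mean(axis=0)   # ≈ (4.67, 0.17)

# Row-wise Euclidean distances from every point to each centroid
d1 = np.linalg.norm(X - C1, axis=1)   # ≈ [1.49, 0.75, 2.13, 2.48, 4.35, 5.22]
d2 = np.linalg.norm(X - C2, axis=1)   # ≈ [5.46, 4.63, 1.86, 1.70, 0.37, 1.34]
```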
d1(X, C1):
(0, 3)–(1.33, 2.33) → √((1.33 − 0)² + (2.33 − 3)²) = 1.49
(1, 3)–(1.33, 2.33) → √((1.33 − 1)² + (2.33 − 3)²) = 0.75
(3, 1)–(1.33, 2.33) → √((1.33 − 3)² + (2.33 − 1)²) = 2.13
(3, 0.5)–(1.33, 2.33) → √((1.33 − 3)² + (2.33 − 0.5)²) = 2.48
(5, 0)–(1.33, 2.33) → √((1.33 − 5)² + (2.33 − 0)²) = 4.35
(6, 0)–(1.33, 2.33) → √((1.33 − 6)² + (2.33 − 0)²) = 5.22
d2(X, C2):
(0, 3)–(4.67, 0.17) → √((4.67 − 0)² + (0.17 − 3)²) = 5.46
(1, 3)–(4.67, 0.17) → √((4.67 − 1)² + (0.17 − 3)²) = 4.63
(3, 1)–(4.67, 0.17) → √((4.67 − 3)² + (0.17 − 1)²) = 1.86
(3, 0.5)–(4.67, 0.17) → √((4.67 − 3)² + (0.17 − 0.5)²) = 1.70
(5, 0)–(4.67, 0.17) → √((4.67 − 5)² + (0.17 − 0)²) = 0.37
(6, 0)–(4.67, 0.17) → √((4.67 − 6)² + (0.17 − 0)²) = 1.34
Step 3: Assign each object to the lower approximation A(Ki) or the upper
approximation Ā(Ki) of cluster i. For each object, check whether the ratio of
the larger distance to the smaller distance reaches epsilon = 2.
1. (0, 3) → d2/d1 = 5.46/1.49 = 3.664 ≥ 2, so x1 is part of A(K1)
2. (1, 3) → d2/d1 = 4.63/0.75 = 6.173 ≥ 2, so x2 is part of A(K1)
3. (3, 1) → d1/d2 = 2.13/1.86 = 1.145 < 2, so x3 is not part of A(K1) or A(K2): it belongs to both Ā(K1) and Ā(K2)
4. (3, 0.5) → d1/d2 = 2.48/1.70 = 1.459 < 2, so x4 is not part of A(K1) or A(K2): it belongs to both Ā(K1) and Ā(K2)
5. (5, 0) → d1/d2 = 4.35/0.37 = 11.757 ≥ 2, so x5 is part of A(K2)
6. (6, 0) → d1/d2 = 5.22/1.34 = 3.896 ≥ 2, so x6 is part of A(K2)
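The six membership decisions above can be reproduced directly from the two distance columns; a minimal sketch, where `assignments` is an illustrative name:

```python
import numpy as np

d1 = np.array([1.49, 0.75, 2.13, 2.48, 4.35, 5.22])  # distances to C1
d2 = np.array([5.46, 4.63, 1.86, 1.70, 0.37, 1.34])  # distances to C2
epsilon = 2.0

assignments = []
for a, b in zip(d1, d2):
    ratio = max(a, b) / min(a, b)      # larger distance over smaller
    if ratio >= epsilon:
        assignments.append("K1" if a < b else "K2")  # clear member of nearest
    else:
        assignments.append("boundary")               # in both upper approximations

print(assignments)   # ['K1', 'K1', 'boundary', 'boundary', 'K2', 'K2']
```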
Now we have the clusters
A(K1) = {(0, 3), (1, 3)}     Ā(K1) = {(0, 3), (1, 3), (3, 1), (3, 0.5)}
A(K2) = {(5, 0), (6, 0)}     Ā(K2) = {(5, 0), (6, 0), (3, 1), (3, 0.5)}
Here A(Kj) ≠ ∅ and Ā(Kj) − A(Kj) ≠ ∅, so we find the new centroids using the
weighted equation
Cj = Wlower · ( Σx∈A(Kj) x ) / |A(Kj)| + Wupper · ( Σx∈Ā(Kj)−A(Kj) x ) / |Ā(Kj) − A(Kj)|
C1 = 0.7 · ( (0+1)/2, (3+3)/2 ) + 0.3 · ( (3+3)/2, (1+0.5)/2 ) = (1.25, 2.325)
C2 = 0.7 · ( (5+6)/2, (0+0)/2 ) + 0.3 · ( (3+3)/2, (1+0.5)/2 ) = (4.75, 0.225)
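The weighted centroid update can also be checked numerically; a small sketch using the lower approximations and the shared boundary region found above:

```python
import numpy as np

w_lower, w_upper = 0.7, 0.3
lower1 = np.array([[0, 3], [1, 3]])       # A(K1)
lower2 = np.array([[5, 0], [6, 0]])       # A(K2)
boundary = np.array([[3, 1], [3, 0.5]])   # boundary region shared by both clusters

# Weighted combination of the lower-approximation mean and the boundary mean
C1 = w_lower * lower1.mean(axis=0) + w_upper * boundary.mean(axis=0)
C2 = w_lower * lower2.mean(axis=0) + w_upper * boundary.mean(axis=0)
print(C1)   # ≈ [1.25, 2.325]
print(C2)   # ≈ [4.75, 0.225]
```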
Step 4: Repeat Steps 2 and 3 until convergence (Old Centroid = New Centroid).