Seminar in Statistics: Survival Analysis

Chapter 2

Kaplan-Meier Survival Curves and the Log-Rank Test
Linda Staub & Alexandros Gekenidis
            March 7th, 2011
1 Review
•   Outcome variable of interest: time until an event occurs
•   Time = survival time
    Event = failure
•   Censoring: we don't know the survival time exactly

[Figure: time line of a right-censored observation; the observed survival time is shorter than the true survival time.]
1 Review
•   $X$ = failure time with distribution $F$, density $f$
•   $C$ = censoring time with distribution $G$, density $g$
•   Assume that the censoring time $C$ is independent of the variable of interest $X$
•   $T = \min(X, C)$, $\Delta = 1\{X \le C\}$
•   We observe $n$ i.i.d. copies of $(T, \Delta)$
•   Survivor function

                    $S(t) = \Pr(X > t)$
•   Alternative (Ordered) Data Layout

Risk set: collection of individuals who have survived at least to time $t_{(j)}$
2 Kaplan-Meier Curves
•   Example
    The data: remission times (weeks) for two groups of leukemia patients

    Group 1 (n=21), treatment:
        6, 6, 6, 7, 10, 13, 16, 22, 23,
        6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 34+, 35+
    Group 2 (n=21), placebo:
        1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8,
        11, 11, 12, 12, 15, 17, 22, 23
    + denotes censored

                # failed    # censored    Total
    Group 1         9           12          21
    Group 2        21            0          21

    Descriptive statistic: $\bar{T}_1$ (ignoring +'s) = 17.1, $\bar{T}_2$ = 8.6
•   Table of ordered failure times
    ($n_j$ = # in the risk set at $t_{(j)}$, $m_j$ = # failures at $t_{(j)}$, $q_j$ = # censored in $[t_{(j)}, t_{(j+1)})$)

    Group 1 (treatment)                  Group 2 (placebo)
    t(j)    nj    mj    qj               t(j)    nj    mj    qj
      0     21     0     0                 0     21     0     0
      6     21     3     1                 1     21     2     0
      7     17     1     1                 2     19     2     0
     10     15     1     2                 3     17     1     0
     13     12     1     0                 4     16     2     0
     16     11     1     3                 5     14     2     0
     22      7     1     0                 8     12     4     0
     23      6     1     5                11      8     2     0
    >23      -     -     -                12      6     2     0
                                          15      4     1     0
                                          17      3     1     0
                                          22      2     1     0
                                          23      1     1     0

    → Remark: no censoring in group 2
•   Computation of KM-curve for group 2 (no censoring)

    $\hat{S}(t_{(j)}) = \dfrac{\#\,\text{surviving past } t_{(j)}}{21}$

    t(j)   nj   mj   qj   $\hat{S}(t_{(j)})$
     0     21   0    0    1
     1     21   2    0    19/21 = .90
     2     19   2    0    17/21 = .81
     3     17   1    0    16/21 = .76
     4     16   2    0    14/21 = .67
     5     14   2    0    12/21 = .57
     8     12   4    0     8/21 = .38
    11      8   2    0     6/21 = .29
    12      6   2    0     4/21 = .19
    15      4   1    0     3/21 = .14
    17      3   1    0     2/21 = .10
    22      2   1    0     1/21 = .05
    23      1   1    0     0/21 = .00
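The no-censoring computation above can be reproduced in a few lines of base R. This is a minimal sketch, not code from the slides; the names `t_j`, `surv2` and `km2` are chosen here for illustration.

```r
# Group 2 (placebo) remission times in weeks; every time is an observed failure.
time2 <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)

# Distinct ordered failure times t_(j).
t_j <- sort(unique(time2))

# Without censoring: S_hat(t_(j)) = (# surviving past t_(j)) / n.
surv2 <- sapply(t_j, function(t) sum(time2 > t) / length(time2))

# Collect the result in a small table, rounded as on the slide.
km2 <- data.frame(t_j = t_j, S_hat = round(surv2, 2))
print(km2)
```

Because nobody is censored, the estimate is simply the empirical proportion of patients still in remission after each failure time.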
KM Curve for Group 2 (Placebo)

> library(survival)
> # Group 2: all 21 remission times are observed failures (status = 1)
> time2 <- c(1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
> status2 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

> fit2 <- survfit(Surv(time2, status2) ~ 1)

> plot(fit2, conf.int = FALSE, col = 'red', xlab = 'Time (weeks)', ylab = 'Survival Probability')
> title(main = 'KM Curve for Group 2 (placebo)')
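General KM formula
•   Alternative way to calculate the survival probabilities
•   KM formula = product limit formula

    $$\hat{S}(t_{(j)}) = \prod_{i=1}^{j} \widehat{\Pr}\bigl(X > t_{(i)} \mid X \ge t_{(i)}\bigr)
                       = \hat{S}(t_{(j-1)}) \times \widehat{\Pr}\bigl(X > t_{(j)} \mid X \ge t_{(j)}\bigr)$$

    Proof: blackboard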
Computation of KM-curve for group 1
(treatment)
t(j)   nj   mj   qj   $\hat{S}(t_{(j)})$
0      21   0    0    1
6      21   3    1    1×18/21=.8571
7      17   1    1    .8571×16/17=.8067
10     15   1    2    .8067×14/15=.7529
13     12   1    0    .7529×11/12=.6902
16     11   1    3    .6902×10/11=.6275
22     7    1    0    .6275×6/7=.5378
23     6    1    5    .5378×5/6=.4482

Fraction at $t_{(j)}$: $\widehat{\Pr}(X > t_{(j)} \mid X \ge t_{(j)}) = \dfrac{n_j - m_j}{n_j}$
Not available at $t_{(j)}$: subjects who failed prior to $t_{(j)}$ or were censored prior to $t_{(j)}$
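The product-limit computation for group 1 can be checked by hand in base R. The following is a sketch under the assumption that a subject censored at a failure time still counts as at risk at that time; the variable names are illustrative and not from the slides.

```r
# Group 1 (treatment): remission times and status (1 = failure, 0 = censored).
time1   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35)
status1 <- c(rep(1, 9), rep(0, 12))

# Ordered distinct failure times t_(j).
t_j <- sort(unique(time1[status1 == 1]))

# n_j: size of the risk set at t_(j); m_j: number of failures at t_(j).
n_j <- sapply(t_j, function(t) sum(time1 >= t))
m_j <- sapply(t_j, function(t) sum(time1 == t & status1 == 1))

# Product-limit estimate: cumulative product of the fractions (n_j - m_j) / n_j.
S_hat <- cumprod((n_j - m_j) / n_j)
print(data.frame(t_j, n_j, m_j, S_hat = round(S_hat, 4)))
```

The censored subjects never cause a drop in the curve; they only shrink the risk set `n_j` for later failure times.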
KM-curve for group 1 (treatment)

> library(survival)
> # Group 1: the first 9 times are failures (status = 1), the last 12 are censored (status = 0)
> time1 <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20,25,32,32,34,35)
> status1 <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0)

> fit1 <- survfit(Surv(time1, status1) ~ 1)

> plot(fit1, conf.int = FALSE, col = 'red', xlab = 'Time (weeks)', ylab = 'Survival Probability')
> title(main = 'KM Curve for Group 1 (treatment)')
KM-estimator = Nonparametric MLE
Model
 $X$ = failure time                distr. function $F$, density $f$
 $C$ = censoring time              distr. function $G$, density $g$
Assume that $C$ is independent of $X$
 $T = \min(X, C)$                  $\Delta = 1\{X \le C\}$
We observe $n$ i.i.d. copies of $(T, \Delta)$
Derivation of the likelihood for $F$
Claim
The density of observing $(t, 1)$ is:        $f(t)(1 - G(t))$
The density of observing $(t, 0)$ is:        $g(t)(1 - F(t))$
Proof of the Claim: Blackboard
 ⇒ Density of observing $(t, \delta)$ is:
   $$\bigl[f(t)(1 - G(t))\bigr]^{\delta}\,\bigl[g(t)(1 - F(t))\bigr]^{1-\delta}
     = f(t)^{\delta}\,(1 - F(t))^{1-\delta}\,(1 - G(t))^{\delta}\,g(t)^{1-\delta}$$
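The claim was proved on the blackboard. One possible heuristic sketch, written with $X$ for the failure time (density $f$, distribution $F$) and $C$ for the censoring time (density $g$, distribution $G$), and assuming both distributions are continuous, runs as follows. It is an illustration and not necessarily the argument given in the seminar.

```latex
% Heuristic sketch: uses independence of X and C and continuity of F and G.
\begin{align*}
\Pr\bigl(T \in [t, t+dt),\ \Delta = 1\bigr)
  &= \Pr\bigl(X \in [t, t+dt),\ X \le C\bigr) \\
  &\approx \Pr\bigl(X \in [t, t+dt)\bigr)\,\Pr(C > t)
   = f(t)\,\bigl(1 - G(t)\bigr)\,dt, \\[4pt]
\Pr\bigl(T \in [t, t+dt),\ \Delta = 0\bigr)
  &= \Pr\bigl(C \in [t, t+dt),\ C < X\bigr) \\
  &\approx \Pr\bigl(C \in [t, t+dt)\bigr)\,\Pr(X > t)
   = g(t)\,\bigl(1 - F(t)\bigr)\,dt.
\end{align*}
```

Dividing by $dt$ gives the two densities in the claim.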
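⇒ The likelihood for $F$ and $G$ of $n$ i.i.d. observations $(t_1, \delta_1), \dots, (t_n, \delta_n)$ is:
   $$\prod_{i=1}^{n} f(t_i)^{\delta_i}\,(1 - F(t_i))^{1-\delta_i}\,(1 - G(t_i))^{\delta_i}\,g(t_i)^{1-\delta_i}$$

$X$ and $C$ independent ⇒ Ignore the part that involves $G$
In order to find the nonparametric maximum likelihood estimator $\hat{F}_n$, we
need to maximize this expression over all possible distribution
functions $F$ (with corresponding density $f$).

Optimization problem
   $$\sup_{F \in \mathcal{F}} L_n(F)$$
where $\mathcal{F}$ is the class of all distribution functions on $\mathbb{R}$ and
   $$L_n(F) = \prod_{i=1}^{n} f(t_i)^{\delta_i}\,(1 - F(t_i))^{1-\delta_i}$$
But: Problem is not well-defined! The density values $f(t_i)$ at the observed failure times can be made arbitrarily large, so the supremum is infinite.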
Solution: Let $f$ be a density w.r.t. counting measure on the observed
failure times (instead of a density w.r.t. Lebesgue measure)
⇒ Replace $f(t_i)$ by $F\{t_i\} = S\{t_i\}$, the jump of the distribution /
survival function at $t_i$

Parametrizing everything in terms of the survival function $S = 1 - F$:
   $$L_n(S) = \prod_{i=1}^{n} S\{t_i\}^{\delta_i}\, S(t_i)^{1-\delta_i}$$

And $\hat{S}_n$ satisfies
   $$L_n(\hat{S}_n) = \max_{S \in \mathcal{S}} L_n(S),$$
where $\mathcal{S}$ is the space of all survival functions

One can show that the Kaplan-Meier estimator maximizes the likelihood
⇒ KM-estimator is the NPMLE
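One way to see why the Kaplan-Meier estimator maximizes this likelihood is to reparametrize by discrete hazards. The sketch below is a standard textbook argument, not taken from the slides. The symbol $\lambda_j$ is introduced here for illustration, $S\{t\}$ denotes the size of the jump of $S$ at $t$, $n_j$ and $m_j$ are the risk-set size and number of failures at $t_{(j)}$, and an observation censored at a failure time is assumed to be censored just after it.

```latex
% Assumption: S puts mass only at the ordered failure times t_(1) < ... < t_(k).
\begin{align*}
\lambda_j &:= \frac{S\{t_{(j)}\}}{S(t_{(j)}-)}, \qquad
S(t_{(j)}) = \prod_{i \le j} (1 - \lambda_i), \qquad
S\{t_{(j)}\} = \lambda_j \prod_{i < j} (1 - \lambda_i), \\
L_n(S) &= \prod_{j=1}^{k} \lambda_j^{\,m_j}\,(1 - \lambda_j)^{\,n_j - m_j}, \\
\frac{\partial}{\partial \lambda_j} \log L_n(S)
  &= \frac{m_j}{\lambda_j} - \frac{n_j - m_j}{1 - \lambda_j} = 0
  \;\Longrightarrow\; \hat{\lambda}_j = \frac{m_j}{n_j}, \\
\hat{S}_n(t_{(j)}) &= \prod_{i \le j} \bigl(1 - \hat{\lambda}_i\bigr)
  = \prod_{i \le j} \frac{n_i - m_i}{n_i}.
\end{align*}
```

The last line is the product-limit formula, so the maximizer coincides with the Kaplan-Meier estimator at the failure times.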
Comparison of KM Plots for Remission Data
> library(survival)
> time1 <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20,25,32,32,34,35)
> status1 <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0)

> time2 <- c(1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
> status2 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

> fit1 <- survfit(Surv(time1, status1) ~ 1)
> fit2 <- survfit(Surv(time2, status2) ~ 1)

> # Draw group 1 first, then overlay group 2 on the same axes
> plot(fit1, conf.int = FALSE, col = 'blue', xlab = 'Time (weeks)', ylab = 'Survival Probability')
> lines(fit2, col = 'red')
> legend(21, 1, c('Group 1 (treatment)', 'Group 2 (placebo)'), col = c('blue','red'), lty = 1)
> title(main = 'KM-Curves for Remission Data')

 → Question: Do we have any reason to claim that group 1 (treatment)
             has a better survival prognosis than group 2?
3 The Log-Rank Test
•   We look at 2 groups → extensions to several groups possible
•   When are two KM curves statistically equivalent?
    → testing procedure compares the two curves
    → "statistically equivalent": we don't have evidence to indicate that the true
        survival curves are different
•   Null hypothesis
    $H_0$: no difference between (true) survival curves
•   Goal: to find an expression (depending on the data)
    whose distribution we know (at least approximately)
    under the null hypothesis
Derivation of test statistic
Remission data: n=42

         # failures      # in risk set
t(j)     m1j    m2j      n1j    n2j
 1        0      2       21     21
 2        0      2       21     19
 3        0      1       21     17
 4        0      2       21     16
 5        0      2       21     14
 6        3      0       21     12
 7        1      0       17     12
 8        0      4       16     12
10        1      0       15      8
11        0      2       13      8
12        0      2       12      6
13        1      0       12      4
15        0      1       11      4
16        1      0       11      3
17        0      1       10      3
22        1      1        7      2
23        1      1        6      1

Expected cell counts:

   $$e_{1j} = \left(\frac{n_{1j}}{n_{1j} + n_{2j}}\right) \times (m_{1j} + m_{2j}), \qquad
     e_{2j} = \left(\frac{n_{2j}}{n_{1j} + n_{2j}}\right) \times (m_{1j} + m_{2j})$$

   (proportion in risk set) × (# of failures over both groups)
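As a worked instance of the expected cell counts, take the row for week 6, where $m_{1j} = 3$, $m_{2j} = 0$, $n_{1j} = 21$ and $n_{2j} = 12$. The numbers come from the table; only the arithmetic is added here.

```latex
\begin{align*}
e_{1j} &= \frac{21}{21 + 12} \times (3 + 0) = \frac{63}{33} \approx 1.91,
  & m_{1j} - e_{1j} &\approx 3 - 1.91 = 1.09, \\
e_{2j} &= \frac{12}{21 + 12} \times (3 + 0) = \frac{36}{33} \approx 1.09,
  & m_{2j} - e_{2j} &\approx 0 - 1.09 = -1.09.
\end{align*}
```

The two differences cancel at every failure time, because $e_{1j} + e_{2j} = m_{1j} + m_{2j}$.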
   $$O_i - E_i = \sum_{j=1}^{\#\,\text{failure times}} (m_{ij} - e_{ij})$$

   $O_1 - E_1 = -10.26$
   $O_2 - E_2 = 10.26$

   $$\text{Log-rank statistic} = \frac{(O_2 - E_2)^2}{\operatorname{Var}(O_2 - E_2)}$$

Remark: We could also work with $O_1 - E_1$ and would get the same statistic! Why?
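The whole statistic can be computed by hand from the risk-set table. The sketch below uses base R only. The variance formula is the usual hypergeometric variance of the log-rank test, which does not appear on the slides and is an assumption here, and all variable names are illustrative. The count `m2` at week 12 is 2, since two placebo patients failed at week 12.

```r
# Failure counts and risk-set sizes at each ordered failure time.
t_j <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 22, 23)
m1  <- c(0, 0, 0, 0, 0, 3, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1)
m2  <- c(2, 2, 1, 2, 2, 0, 0, 4, 0, 2, 2, 0, 1, 0, 1, 1, 1)
n1  <- c(21, 21, 21, 21, 21, 21, 17, 16, 15, 13, 12, 12, 11, 11, 10, 7, 6)
n2  <- c(21, 19, 17, 16, 14, 12, 12, 12, 8, 8, 6, 4, 4, 3, 3, 2, 1)

n <- n1 + n2   # total risk set at t_(j)
d <- m1 + m2   # total failures at t_(j)

# Expected cell counts and observed-minus-expected sums per group.
e1 <- n1 / n * d
e2 <- n2 / n * d
O_minus_E1 <- sum(m1 - e1)
O_minus_E2 <- sum(m2 - e2)

# Assumed variance: sum of hypergeometric variances over the failure times.
v <- n1 * n2 * d * (n - d) / (n^2 * (n - 1))
logrank <- O_minus_E2^2 / sum(v)

print(c(O1_E1 = O_minus_E1, O2_E2 = O_minus_E2, statistic = logrank))
```

Up to rounding, the sums agree with the values on the slide, and because `O_minus_E1` equals `-O_minus_E2`, squaring either one gives the same statistic.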
Distribution of log-rank statistic
$H_0$: no difference between survival curves

   $$\text{Log-rank statistic for two groups} = \frac{(O_2 - E_2)^2}{\operatorname{Var}(O_2 - E_2)} \sim \chi^2_1 \quad \text{(approximately, under } H_0\text{)}$$

Idea of the Proof:
 • If $Z$ is standard normally distributed, then $Z^2$ has a $\chi^2$ distribution with 1 df
   (assuming $Z$ to be one-dimensional)
 • Set $Z = \dfrac{O_2 - E_2}{\sqrt{\operatorname{Var}(O_2 - E_2)}}$
 • Then $Z$ is standardized and approximately normally distributed for large samples
 • Hence $Z^2$, which is exactly our statistic, has approximately a $\chi^2$ distribution with 1 df.
Log-Rank Test for Remission data
•   R-code
    > library(survival)
    > # Both groups pooled: group 1 (treatment) first, then group 2 (placebo)
    > time <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20,25,32,32,34,35,
    +           1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
    > status <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
    +             1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
    > treatment <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    +                2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
    > fit <- survdiff(Surv(time, status) ~ treatment)

•   Result
    > fit
    Call:
    survdiff(formula = Surv(time, status) ~ treatment)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
    treatment=1 21        9     19.3      5.46      16.8
    treatment=2 21       21     10.7      9.77      16.8

    Chisq = 16.8 on 1 degrees of freedom, p = 4.17e-05

    The p-value is the probability, under the null hypothesis, of obtaining a test
    statistic at least as extreme as the one that was actually observed!

    What does this tell us?
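The reported p-value is the upper tail of the chi-square distribution with 1 df at the observed statistic. A minimal base R sketch, taking the statistic 16.79 as given and using the illustrative names `chisq` and `p_value`:

```r
# Observed log-rank statistic for the remission data.
chisq <- 16.79

# Upper-tail probability of a chi-square distribution with 1 degree of freedom.
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(p_value)

# Decision at the 5% level: reject H0 when the p-value is below 0.05.
reject_H0 <- p_value < 0.05
print(reject_H0)
```

The result is far below 0.05, so at the usual significance levels the null hypothesis of equal survival curves is rejected for these data.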
The Log-Rank Test for Several Groups
•   $H_0$: All survival curves are the same
•   The log-rank statistic for > 2 groups involves variances
    and covariances of $O_i - E_i$
•   $G$ ($\ge 2$) groups:
    log-rank statistic $\sim \chi^2$ with $G - 1$ df
Remarks
•   Alternatives to the Log-Rank Test

    Wilcoxon
    Tarone-Ware
    Peto
    Fleming-Harrington

    These are variations of the log-rank test, derived by applying
    different weights at the jth failure time.

    Weighting the test statistic:

    $$\frac{\Bigl(\sum_j w(t_j)\,(m_{ij} - e_{ij})\Bigr)^2}{\operatorname{Var}\Bigl(\sum_j w(t_j)\,(m_{ij} - e_{ij})\Bigr)}$$

    where $w(t_j)$ is the weight at the jth failure time.
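For orientation, the weights commonly associated with these tests in textbooks are listed below. They are not given on the slide, so treat the list as a reference sketch; $n_j$ is the total number at risk at $t_{(j)}$, $\tilde{S}$ denotes a survival estimate close to Kaplan-Meier, and $p, q \ge 0$ are user-chosen constants.

```latex
\begin{align*}
\text{Log-rank:} \quad & w(t_{(j)}) = 1 \\
\text{Wilcoxon:} \quad & w(t_{(j)}) = n_j \\
\text{Tarone-Ware:} \quad & w(t_{(j)}) = \sqrt{n_j} \\
\text{Peto:} \quad & w(t_{(j)}) = \tilde{S}(t_{(j)}) \\
\text{Fleming-Harrington:} \quad & w(t_{(j)}) = \hat{S}(t_{(j-1)})^{p}\,\bigl[1 - \hat{S}(t_{(j-1)})\bigr]^{q}
\end{align*}
```

Weights that decrease over time, such as $n_j$, emphasize early failures, while the log-rank test weights all failure times equally.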
Remarks
•   Choosing a Test
    →   Results of different weightings usually
        lead to similar conclusions
    →   The best choice is the test with the most power
    →   There may be a clinical reason to choose a particular
        weighting
    →   The choice of weighting should be made a priori! Do not fish for a
        desired p-value!
Stratified log rank test
•   Variation of the log rank test
•   Allows controlling for an additional ("stratified") variable
•   Split the data into strata, depending on the value of the
    stratified variable
•   Calculate $O - E$ scores within strata
•   Sum $O - E$ across strata
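The steps above can be written as a single formula for two groups. The index $s$ for the strata and the notation $O_{2s}$, $E_{2s}$ are introduced here for illustration; the sketch assumes the strata are independent, so that the variance of the sum is the sum of the within-stratum variances.

```latex
% Assumption: strata s = 1, ..., K are independent.
\begin{equation*}
\text{Stratified log-rank statistic}
  = \frac{\Bigl(\sum_{s=1}^{K} (O_{2s} - E_{2s})\Bigr)^{2}}
         {\sum_{s=1}^{K} \operatorname{Var}(O_{2s} - E_{2s})}
  \;\sim\; \chi^2_1 \quad \text{(approximately, under } H_0\text{)}
\end{equation*}
```

With a single stratum this reduces to the unstratified log-rank statistic.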
Stratified log rank test - Example
•   Remission data
•   Stratified variable: 3-level variable (LWBC3) indicating
    low, medium, or high log white blood cell count (coded 1,
    2, and 3, respectively)

                      Treated Group: rx = 0
                      Placebo Group: rx = 1

                      Recall: Non-stratified test ⇒ $\chi^2$-value of 16.79
                      and corresponding p-value rounded to 0.0000
Stratified Log-Rank Test for Remission data
•   R-code
    > library(survival)
    > data <- read.table("http://www.sph.emory.edu/~dkleinb/surv2datasets/anderson.dat")
    > # LWBC3 stratum (1 = low, 2 = medium, 3 = high) for each of the 42 patients
    > lwbc3 <- c(1,1,1,2,1,2,2,1,1,1,3,2,2,2,2,2,3,3,2,3,3,
    +            1,2,2,1,1,3,3,1,3,3,2,3,3,3,3,2,3,3,3,2,3)
    > # V1 and V2 are passed to Surv() as time and status; V5 is the group indicator rx
    > fit <- survdiff(Surv(data$V1, data$V2) ~ data$V5 + strata(lwbc3))

•   Result
    > fit
    Call:
    survdiff(formula = Surv(data$V1, data$V2) ~ data$V5 + strata(lwbc3))

               N Observed Expected (O-E)^2/E (O-E)^2/V
    data$V5=0 21        9     16.4      3.33      10.1
    data$V5=1 21       21     13.6      4.00      10.1

    Chisq = 10.1 on 1 degrees of freedom, p = 0.00145
Stratified vs. unstratified approach

          Limitation: Sample size may be small within strata

          In next chapter: controlling for other explanatory variables!

More Related Content

Kaplan meier survival curves and the log-rank test

  • 1. Seminar in Statistics: Survival Analysis Chapter 2 Kaplan-Meier Survival Curves and the Log- Rank Test Linda Staub & Alexandros Gekenidis March 7th, 2011
  • 2. 1 Review ? Outcome variable of interest: time until an event occurs ? Time = survival time Event = failure ? Censoring: Don¡®t know survival time exactly True survival time observed survival time Right-censored
  • 3. 1 Review ? ? = failure time with distribution ?, density ? ? ? = censoring time with distribution ?, density ? ? Assume that the censoring time ? is independent of the variable of interest ? ? ? = min(?, ?), ¦¤ = 1*?¡Ü?+ ? We observe ? i.i.d. copies of (?, ¦¤)
  • 4. ? Survivor function ? ? = Pr(? > ?)
  • 5. ? Alternative (Ordered) Data Layout Risk set: collection of individuals who have survived at least to time ?(?)
  • 6. 2 Kaplan-Meier Curves ? Example The data: remission times (weeks) for two groups of leukemia patients Group 1 (n=21) Group 2 (n=21) treatment placebo # failed # censored Total 6, 6, 6, 7, 10, 1, 1, 2, 2, 3, Group 1 9 12 21 Group 2 21 0 21 13, 16, 22, 23, 4, 4, 5, 5, 6+, 9+, 10+, 11+, 8, 8, 8, 8, 17+, 19+, 20+, 11, 11, 12, 12, Descriptive statistic: 25+, 32+, 32+, 15, 17, 22, 23 T1 ?ignoring ? 's ? ? 17.1, T2 ? 8.6 34+, 25+ + denotes censored
  • 7. ? Table of ordered failure times Group 1 (treatment) Group 2 (placebo) t( j ) nj mj qj t( j ) nj mj qj 0 21 0 0 0 21 0 0 6 21 3 1 1 21 2 0 7 17 1 1 2 19 2 0 10 15 1 2 3 17 1 0 13 12 1 0 4 16 2 0 16 11 1 3 5 14 2 0 22 7 1 0 8 12 4 0 23 6 1 5 11 8 2 0 >23 - - - 12 6 2 0 15 4 1 0 Group 1 (treatment) Group 2 (placebo) 17 3 1 0 22 2 1 0 6, 6, 6, 7, 10, 1, 1, 2, 2, 3, 13, 16, 22, 23, 4, 4, 5, 5, 23 1 1 0 6+, 9+, 10+, 11+, 8, 8, 8, 8, 17+, 19+, 20+, 11, 11, 12, 12, 25+, 32+, 32+, 34+, 25+ 15, 17, 22, 23 ¡ú Remark: no censorship in group 2 + denotes censored
  • 8. ? Computation of KM-curve for group 2 (no censoring) t( j ) nj mj qj ? ? ? 0 21 0 0 1 1 21 2 0 19/21 = .90 2 19 2 0 17/21 = .81 3 17 1 0 16/21 = .76 4 16 2 0 14/21 = .67 5 14 2 0 12/21 = .57 # ????????? ???? ?(?) ? ? ? = 8 12 4 0 8/21 = .38 21 11 8 2 0 6/21 = .29 12 6 2 0 4/21 = .19 15 4 1 0 3/21 = .14 17 3 1 0 2/21 = .10 22 2 1 0 1/21 = .05 23 1 1 0 0/21 = .00
  • 9. ? Computation of KM-curve for group 2 (no censoring) t( j ) nj mj qj ? ? ? 0 21 0 0 1 1 21 2 0 19/21 = .90 2 19 2 0 17/21 = .81 3 17 1 0 16/21 = .76 4 16 2 0 14/21 = .67 5 14 2 0 12/21 = .57 # ????????? ???? ?(?) ? ? ? = 8 12 4 0 8/21 = .38 21 11 8 2 0 6/21 = .29 12 6 2 0 4/21 = .19 15 4 1 0 3/21 = .14 17 3 1 0 2/21 = .10 22 2 1 0 1/21 = .05 23 1 1 0 0/21 = .00
  • 10. ? Computation of KM-curve for group 2 (no censoring) t( j ) nj mj qj ? ? ? 0 21 0 0 1 1 21 2 0 19/21 = .90 2 19 2 0 17/21 = .81 3 17 1 0 16/21 = .76 4 16 2 0 14/21 = .67 5 14 2 0 12/21 = .57 # ????????? ???? ?(?) ? ? ? = 8 12 4 0 8/21 = .38 21 11 8 2 0 6/21 = .29 12 6 2 0 4/21 = .19 15 4 1 0 3/21 = .14 17 3 1 0 2/21 = .10 22 2 1 0 1/21 = .05 23 1 1 0 0/21 = .00
  • 11. ? Computation of KM-curve for group 2 (no censoring) t( j ) nj mj qj ? ? ? 0 21 0 0 1 1 21 2 0 19/21 = .90 2 19 2 0 17/21 = .81 3 17 1 0 16/21 = .76 4 16 2 0 14/21 = .67 5 14 2 0 12/21 = .57 # ????????? ???? ?(?) ? ? ? = 8 12 4 0 8/21 = .38 21 11 8 2 0 6/21 = .29 12 6 2 0 4/21 = .19 15 4 1 0 3/21 = .14 17 3 1 0 2/21 = .10 22 2 1 0 1/21 = .05 23 1 1 0 0/21 = .00
  • 12. ? Computation of KM-curve for group 2 (no censoring) t( j ) nj mj qj ? ? ? 0 21 0 0 1 1 21 2 0 19/21 = .90 2 19 2 0 17/21 = .81 3 17 1 0 16/21 = .76 4 16 2 0 14/21 = .67 5 14 2 0 12/21 = .57 # ????????? ???? ?(?) ? ? ? = 8 12 4 0 8/21 = .38 21 11 8 2 0 6/21 = .29 12 6 2 0 4/21 = .19 15 4 1 0 3/21 = .14 17 3 1 0 2/21 = .10 22 2 1 0 1/21 = .05 23 1 1 0 0/21 = .00
  • 13. KM Curve for Group 2 (Placebo) > time2 <- c(1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17, 22,23) > status2 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) > fit2 <- survfit(Surv(time2, status2) ~ 1) > plot(fit2,conf.int=0, col = 'red', xlab = 'Time (weeks)', ylab = 'Survival Probability') > title(main='KM Curve for Group 2 (placebo)')
  • 14. General KM formula ? Alternative way to calculate the survival probabilities ? KM formula = product limit formula ? ? ? ? = ? ? ? > ?(?) ? ¡Ý ?(?) ?=1 = ? ?(??1) ¡Á ? ? ? > ?(?) ? ¡Ý ?(?) Proof: blackboard
  • 15. Computation of KM-curve for group 1 (treatment) t( j ) nj mj qj ? ?(?) 0 21 0 0 1 Fraction at ?(?) : Pr ? > ?(?) ? ¡Ý ?(?) ) 6 21 3 1 1¡Á18/21=.8571 7 17 1 1 .8571¡Á16/17=.8067 10 15 1 2 .8067¡Á14/15=.7529 13 12 1 0 .7529¡Á11/12=.6902 Not available at t? j ? 16 11 1 3 .6902¡Á10/11=.6275 22 7 1 0 .6275¡Á6/7=.5378 failed prior to t? j ? 23 6 1 5 .5378¡Á5/6=.4482 Censored prior to t? j ?
  • 16. Computation of KM-curve for group 1 (treatment) t( j ) nj mj qj ? ?(?) 0 21 0 0 1 Fraction at ?(?) : Pr ? > ?(?) ? ¡Ý ?(?) ) 6 21 3 1 1¡Á18/21=.8571 7 17 1 1 .8571¡Á16/17=.8067 10 15 1 2 .8067¡Á14/15=.7529 13 12 1 0 .7529¡Á11/12=.6902 Not available at t? j ? 16 11 1 3 .6902¡Á10/11=.6275 22 7 1 0 .6275¡Á6/7=.5378 failed prior to t? j ? 23 6 1 5 .5378¡Á5/6=.4482 Censored prior to t? j ?
  • 17. Computation of KM-curve for group 1 (treatment) t( j ) nj mj qj ? ?(?) 0 21 0 0 1 Fraction at ?(?) : Pr ? > ?(?) ? ¡Ý ?(?) ) 6 21 3 1 1¡Á18/21=.8571 7 17 1 1 .8571¡Á16/17=.8067 10 15 1 2 .8067¡Á14/15=.7529 13 12 1 0 .7529¡Á11/12=.6902 Not available at t? j ? 16 11 1 3 .6902¡Á10/11=.6275 22 7 1 0 .6275¡Á6/7=.5378 failed prior to t? j ? 23 6 1 5 .5378¡Á5/6=.4482 Censored prior to t? j ?
  • 18. Computation of KM-curve for group 1 (treatment) t( j ) nj mj qj ? ?(?) 0 21 0 0 1 Fraction at ?(?) : Pr ? > ?(?) ? ¡Ý ?(?) ) 6 21 3 1 1¡Á18/21=.8571 7 17 1 1 .8571¡Á16/17=.8067 ?? ? ?? 10 15 1 2 .8067¡Á14/15=.7529 = ?? 13 12 1 0 .7529¡Á11/12=.6902 Not available at t? j ? 16 11 1 3 .6902¡Á10/11=.6275 22 7 1 0 .6275¡Á6/7=.5378 failed prior to t? j ? 23 6 1 5 .5378¡Á5/6=.4482 Censored prior to t? j ?
  • 19. Computation of KM-curve for group 1 (treatment) t( j ) nj mj qj ? ?(?) 0 21 0 0 1 Fraction at ?(?) : Pr ? > ?(?) ? ¡Ý ?(?) ) 6 21 3 1 1¡Á18/21=.8571 7 17 1 1 .8571¡Á16/17=.8067 10 15 1 2 .8067¡Á14/15=.7529 13 12 1 0 .7529¡Á11/12=.6902 Not available at t? j ? 16 11 1 3 .6902¡Á10/11=.6275 22 7 1 0 .6275¡Á6/7=.5378 failed prior to t? j ? 23 6 1 5 .5378¡Á5/6=.4482 Censored prior to t? j ?
  • 20. KM-curve for group 1 (treatment) > time1 <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20, 25,32,32,34,35) > status1 <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0) > fit1 <- survfit(Surv(time1, status1) ~ 1) > plot(fit1,conf.int=0, col = 'red', xlab = 'Time (weeks)', ylab = 'Survival Probability') > title(main='KM Curve for Group 1 (treatment)')
  • 21. KM-estimator = Nonparametric MLE Model ? = failure time distr. function ?, density ? ? = censoring time distr. function ?, density ? Assume that ? is independent of ? ? = min(?, ?) ¦¤ = 1*?¡Ü?+ We observe ? i.i.d. copies of (?, ¦¤) Derivation of the likelihood for ? Claim The density of observing (?, 1) is: ?(?)(1 ? ?(?)) The density of observing (?, 0) is: ?(?)(1 ? ?(?)) Proof of the Claim: Blackboard ? Density of observing (?, ?) is: ? 1?? ? ? 1? ? ? ? ? ? 1? ? ? ? 1?? ? 1?? = ? ? 1? ? ? ? 1? ? ? ? ?
  • 22. ? The likelihood for ? and ? of ? i.i.d. observations (?1 , ?1 ), ¡­ , (? ? , ? ? ) is: ? ?? 1?? ? ?? 1?? ? ? ?? 1 ? ? ?? 1 ? ? ?? ? ?? ?=1 ? and ? independent ? Ignore part that involves ? In order to find the nonparametric maximum likelihood estimator ? ? , we need to maximize this expression over all possible distribution functions ? (with corresponding density ?). Optimization problem sup ? ? (?) ?¡Ê? where ? is the class of all distribution functions on ? and ? ?? 1?? ? ?? ? = ? ?? 1 ? ? ?? ?=1 But: Problem is not well-defined!
  • 23. Solution: Let ? be a density w.r.t. counting measure on the observed failure times (instead of a density w.r.t. Lebesgue measure) ? Replace ?(? ? ) by ? ?? = ? ? ? , the jump of the distribution / survival function at ? ? Parametrizing everything in terms of the survival function ? = 1 ? ?: ? ?? 1?? ? ? ?? ? = ?=1 ? ?? ? ?? And ? satisfies ? ? ? = max ? ? ? , where ? is the space of all survival functions ?¡Ê? One can show that the Kaplan-Meier estimator maximizes the likelihood ? KM-estimator is the NPMLE
  • 24. Comparison of KM Plots for Remission Data > time1 <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20,25 ,32,32,34,35) > status1 <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0) > time2 <- c(1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17, 22,23) > status2 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1) > fit1 <- survfit(Surv(time1, status1) ~ 1) > fit2 <- survfit(Surv(time2, status2) ~ 1) > plot(fit1,conf.int=0, col = 'blue', xlab = 'Time (weeks)', ylab = 'Survival Probability') > lines(fit2, col = 'red') > legend(21,1,c('Group 1 (treatment)', 'Group 2 (placebo)'), col = c('blue','red'), lty = 1) > title(main='KM-Curves for Remission Data') ¡ú Question: Do we have any reason to claim that group 1 (treatment) has better survival prognosis than group 2?
  • 25. 3 The Log-Rank Test ? We look at 2 groups ¡ú extensions to several groups possible ? When are two KM curves statistically equivalent? ¡ú testing procedure compares the two curves ¡ú we don¡®t have evidence to indicate that the true survival curves are different ? Nullhypothesis H 0 : no difference between (true) survival curves ? Goal: To find an expression (depending on the data) from which we know the distribution (or at least approximately) under the nullhypothesis
  • 26. Derivation of test statistic Remission data: n=42 # failures # in risk set t? j ? m1 j m2 j n1 j n2 j Expected cell counts: ? ? ? ? ?m1 j ? m2 j ? 1 0 2 21 21 n1 j e1 j ? ? 2 0 2 21 19 ?n ?n ? 3 0 1 21 17 ? 1j 2j ? 4 0 2 21 16 5 0 2 21 14 Proportion # of failures 6 3 0 21 12 over both in risk set 7 1 0 17 12 groups 8 0 4 16 12 10 1 0 15 8 ? n2 j ? 11 0 2 13 8 e2 j ?? ? ? ?m1 j ? m2 j ? 12 0 12 12 6 ?n ?n ? 13 1 0 12 4 ? 1j 2j ? 15 0 1 11 4 16 1 0 11 3 17 0 1 10 3 22 1 1 7 2 23 1 1 6 1
  • 27. $O_i - E_i = \sum_{j=1}^{\#\,\text{failure times}} (m_{ij} - e_{ij})$
      $O_1 - E_1 = -10.26$
      $O_2 - E_2 = 10.26$
      Log-rank statistic $= \frac{(O_2 - E_2)^2}{\mathrm{Var}(O_2 - E_2)}$
      Remark: We could also work with $O_1 - E_1$ and would get the same statistic! Why?
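One way to answer the "Why?" in the remark, sketched with the expected cell counts $e_{1j}$, $e_{2j}$ defined on the previous slide: at each failure time the two expected counts add up to the total number of failures, so the two observed-minus-expected sums differ only in sign.

```latex
\begin{align*}
e_{1j} + e_{2j}
  &= \frac{n_{1j} + n_{2j}}{n_{1j} + n_{2j}} \,(m_{1j} + m_{2j})
   = m_{1j} + m_{2j} \\
\Rightarrow\quad (O_1 - E_1) + (O_2 - E_2)
  &= \sum_{j} \bigl[(m_{1j} + m_{2j}) - (e_{1j} + e_{2j})\bigr] = 0 \\
\Rightarrow\quad O_1 - E_1
  &= -(O_2 - E_2),
  \qquad \mathrm{Var}(O_1 - E_1) = \mathrm{Var}(O_2 - E_2)
\end{align*}
```

Squaring removes the sign, so numerator and denominator of the statistic are the same for either group.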
  • 28. Distribution of log-rank statistic
      $H_0$: no difference between survival curves
      Log-rank statistic for two groups $= \frac{(O_2 - E_2)^2}{\mathrm{Var}(O_2 - E_2)} \sim \chi^2_1$ (approximately, under $H_0$)
      Idea of the proof:
      - If $Z$ is standard normally distributed, then $Z^2$ has a $\chi^2$ distribution with 1 df (assuming $Z$ to be one-dimensional)
      - Set $Z = \frac{O_2 - E_2}{\sqrt{\mathrm{Var}(O_2 - E_2)}}$
      - Then $Z$ is standardized and approximately normally distributed for large samples
      - Hence $Z^2$, which is exactly our statistic, has approximately a $\chi^2$ distribution with 1 df
  • 29. Log-Rank Test for Remission data
      R-code
      > library(survival)  # provides Surv() and survdiff()
      > time <- c(6,6,6,7,10,13,16,22,23,6,9,10,11,17,19,20,25,32,32,34,35,1,1,2,2,3,4,4,5,5,8,8,8,8,11,11,12,12,15,17,22,23)
      > status <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
      > treatment <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
      > fit <- survdiff(Surv(time, status) ~ treatment)
      Result
      > fit
      Call:
      survdiff(formula = Surv(time, status) ~ treatment)

                   N Observed Expected (O-E)^2/E (O-E)^2/V
      treatment=1 21        9     19.3      5.46      16.8
      treatment=2 21       21     10.7      9.77      16.8

      Chisq = 16.8 on 1 degrees of freedom, p = 4.17e-05

      The p-value is the probability, assuming $H_0$ is true, of obtaining a test statistic at least as extreme as the one that was actually observed!
      What does this tell us?
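The `survdiff` result can be cross-checked with a hand computation in base R. This sketch is not taken from the slides: it assumes the usual hypergeometric variance for the two-group log-rank test, $\mathrm{Var}(O_2 - E_2) = \sum_j \frac{n_{1j} n_{2j} (m_{1j} + m_{2j})(n_{1j} + n_{2j} - m_{1j} - m_{2j})}{(n_{1j} + n_{2j})^2 (n_{1j} + n_{2j} - 1)}$, as given in Kleinbaum and Klein; all variable names are illustrative.

```r
# Pooled remission data (weeks); status 1 = failure, 0 = censored
time   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35,
            1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
status <- c(rep(1, 9), rep(0, 12), rep(1, 21))
group  <- rep(c(1, 2), each = 21)   # 1 = treatment, 2 = placebo

tj <- sort(unique(time[status == 1]))                                  # failure times
m2 <- sapply(tj, function(t) sum(time == t & status == 1 & group == 2))
m  <- sapply(tj, function(t) sum(time == t & status == 1))             # m1j + m2j
n2 <- sapply(tj, function(t) sum(time >= t & group == 2))
n  <- sapply(tj, function(t) sum(time >= t))                           # n1j + n2j
n1 <- n - n2

# Observed minus expected for group 2, summed over failure times
O2_minus_E2 <- sum(m2 - n2 / n * m)

# Assumed hypergeometric variance of O2 - E2 (see lead-in)
V <- sum(n1 * n2 * m * (n - m) / (n^2 * (n - 1)))

chisq   <- O2_minus_E2^2 / V
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

cat("O2 - E2 =", round(O2_minus_E2, 2), "\n")
cat("Chisq   =", round(chisq, 2), " p =", signif(p_value, 3), "\n")
```

The statistic should agree with the `Chisq` value printed by `survdiff`. The unrounded $O_2 - E_2$ may differ in the second decimal from the value quoted on the earlier slide, presumably because that value was summed from rounded expected counts.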
  • 30. The Log-Rank Test for Several Groups
      - $H_0$: All survival curves are the same
      - The log-rank statistic for > 2 groups involves variances and covariances of $O_i - E_i$
      - $G$ (≥ 2) groups: log-rank statistic ~ $\chi^2$ with $G - 1$ df (approximately, under $H_0$)
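A sketch of the usual quadratic form behind the several-group statistic. The notation $\mathbf{d}$ and $\hat{V}$ is introduced here for illustration and does not appear on the slides: $\mathbf{d}$ collects $O_i - E_i$ for any $G - 1$ of the groups, and $\hat{V}$ is the estimated covariance matrix of $\mathbf{d}$.

```latex
\[
\mathbf{d} =
\begin{pmatrix}
O_1 - E_1 \\ \vdots \\ O_{G-1} - E_{G-1}
\end{pmatrix},
\qquad
\text{log-rank statistic}
= \mathbf{d}^{\top} \hat{V}^{-1} \mathbf{d}
\;\overset{\text{approx.}}{\sim}\; \chi^2_{G-1}
\quad \text{under } H_0 .
\]
```

One group is left out because the $O_i - E_i$ sum to zero over all $G$ groups, which makes the full covariance matrix singular. For $G = 2$ the expression reduces to $(O_2 - E_2)^2 / \mathrm{Var}(O_2 - E_2)$.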
  • 31. Remarks
      Alternatives to the Log-Rank Test
      - Wilcoxon, Tarone-Ware, Peto, Fleming-Harrington: variations of the log-rank test, derived by applying different weights at the jth failure time
      - Weighting the test statistic:
          $\frac{\left(\sum_j w(t_j)\,(m_{ij} - e_{ij})\right)^2}{\mathrm{Var}\left(\sum_j w(t_j)\,(m_{ij} - e_{ij})\right)}$, where $w(t_j)$ is the weight at the jth failure time
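To illustrate how the weights enter, here is a base-R sketch of the weighted statistic for the two-group remission data. The function `weighted_logrank` is a hypothetical helper written for this example, not part of any package. It assumes the standard weight choices $w(t_j) = 1$ (log-rank), $w(t_j) = n_j$ (Wilcoxon) and $w(t_j) = \sqrt{n_j}$ (Tarone-Ware), and the same hypergeometric variance terms as for the unweighted test.

```r
# Pooled remission data (weeks); status 1 = failure, 0 = censored
time   <- c(6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35,
            1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23)
status <- c(rep(1, 9), rep(0, 12), rep(1, 21))
group  <- rep(c(1, 2), each = 21)   # 1 = treatment, 2 = placebo

# Hypothetical helper: weighted two-group log-rank statistic.
# weight_fun maps the risk-set sizes n_j to the weights w(t_j).
weighted_logrank <- function(time, status, group, weight_fun) {
  tj <- sort(unique(time[status == 1]))
  m2 <- sapply(tj, function(t) sum(time == t & status == 1 & group == 2))
  m  <- sapply(tj, function(t) sum(time == t & status == 1))
  n2 <- sapply(tj, function(t) sum(time >= t & group == 2))
  n  <- sapply(tj, function(t) sum(time >= t))
  e2 <- n2 / n * m
  # Per-time variance term; set to 0 when only one subject is at risk
  v  <- ifelse(n > 1, (n - n2) * n2 * m * (n - m) / (n^2 * (n - 1)), 0)
  w  <- weight_fun(n)
  sum(w * (m2 - e2))^2 / sum(w^2 * v)
}

stats <- c(
  logrank     = weighted_logrank(time, status, group, function(n) rep(1, length(n))),
  wilcoxon    = weighted_logrank(time, status, group, function(n) n),
  tarone_ware = weighted_logrank(time, status, group, sqrt)
)
print(round(stats, 2))
```

Weights that grow with the risk-set size emphasize early failure times, where more subjects are still under observation; constant weights treat all failure times alike.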
  • 32. Remarks
      Choosing a Test
      → Results of different weightings usually lead to similar conclusions
      → The best choice is the test with the most power
      → There may be a clinical reason to choose a particular weighting
      → The choice of weighting should be made a priori! Do not fish for a desired p-value!
  • 33. Stratified log rank test
      - Variation of the log rank test
      - Allows controlling for an additional ("stratified") variable
      - Split the data into strata, depending on the value of the stratified variable
      - Calculate $O - E$ scores within strata
      - Sum $O - E$ across strata
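The steps above can be written as one formula for the two-group case. The stratum index $s$ is notation introduced here; the sketch assumes that the strata are treated as independent, so that the within-stratum variances add up.

```latex
\[
\text{stratified log-rank statistic}
= \frac{\Bigl(\sum_{s} (O_{2s} - E_{2s})\Bigr)^{2}}
       {\sum_{s} \mathrm{Var}(O_{2s} - E_{2s})}
\;\overset{\text{approx.}}{\sim}\; \chi^2_{1}
\quad \text{under } H_0 ,
\]
```

where $O_{2s} - E_{2s}$ is the observed-minus-expected score for group 2 computed from the subjects in stratum $s$ only. With two groups the statistic has 1 df regardless of the number of strata.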
  • 34. Stratified log rank test - Example
      - Remission data
      - Stratified variable: 3-level variable (LWBC3) indicating low, medium, or high log white blood cell count (coded 1, 2, and 3, respectively)
      Treated group: rx = 0
      Placebo group: rx = 1
      Recall: the non-stratified test gave a $\chi^2$ value of 16.79 and a corresponding p-value rounded to 0.0000
  • 35. Stratified Log-Rank Test for Remission data
      R-code
      > library(survival)  # provides Surv(), survdiff() and strata()
      > data <- read.table("http://www.sph.emory.edu/~dkleinb/surv2datasets/anderson.dat")
      > lwbc3 <- c(1,1,1,2,1,2,2,1,1,1,3,2,2,2,2,2,3,3,2,3,3,1,2,2,1,1,3,3,1,3,3,2,3,3,3,3,2,3,3,3,2,3)
      > fit <- survdiff(Surv(data$V1, data$V2) ~ data$V5 + strata(lwbc3))
      Result
      > fit
      Call:
      survdiff(formula = Surv(data$V1, data$V2) ~ data$V5 + strata(lwbc3))

                 N Observed Expected (O-E)^2/E (O-E)^2/V
      data$V5=0 21        9     16.4      3.33      10.1
      data$V5=1 21       21     13.6      4.00      10.1

      Chisq = 10.1 on 1 degrees of freedom, p = 0.00145
  • 38. Stratified vs. unstratified approach
      Limitation: Sample size may be small within strata
      In the next chapter: controlling for other explanatory variables!
  • 39. References
      - KLEINBAUM, D.G. and KLEIN, M. (2005). Survival Analysis. A self-learning text. Springer.
      - MAATHUIS, M. (2007). Survival analysis for interval censored data. Part I.