Erlang and Scalability
Percona Performance 2009



Jan Henry Nystrom
henry@erlang-consulting.com
Introduction
• Scalability Killers
• Design Decisions - Language and Yours
• Thinking Scalable/Parallel
• Code for the correct case
• Rules of Thumb
• Scalability in the small: SMP




Erlang and Scalability   Percona Performance Conference ? 2009 -2009, Erlang Training and Consulting   2
Scalability Killers
• Synchronization
• Resource contention
  Synchronization




Design Decisions
No sharing

• Processes
• Encapsulation
• No implicit synchronization




Design Decisions
No implicit synchronization

• Spawn always succeeds
• Sending always succeeds
• Random access message buffer
• Fire and forget unless you need the synchronization
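The difference between fire and forget and an explicit synchronization can be sketched as follows. This is a minimal illustration, not code from the talk: the module name fire_forget and the message shapes {log, Msg} and {count, From, Ref} are assumptions made for the example.

```erlang
-module(fire_forget).
-export([start/0, log/2, count/1]).

%% Start a tiny server that counts the log messages it has seen.
start() ->
    spawn(fun() -> loop(0) end).

%% Fire and forget: the send returns at once and no reply is awaited.
log(Server, Msg) ->
    Server ! {log, Msg},
    ok.

%% Explicit synchronization, used only because the caller needs the value.
%% The unique reference lets the caller pick its reply out of the mailbox.
count(Server) ->
    Ref = make_ref(),
    Server ! {count, self(), Ref},
    receive
        {Ref, N} -> N
    end.

loop(N) ->
    receive
        {log, _Msg} ->
            loop(N + 1);
        {count, From, Ref} ->
            From ! {Ref, N},
            loop(N)
    end.
```

Only count/1 blocks the caller. Matching on Ref in the receive is an instance of the random access message buffer: unrelated messages stay in the mailbox until some other receive asks for them.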




Design Decisions
Concurrency oriented programming

• Concurrency support is an integral part of the language
• Distribution support
• Sets the focus firmly on the concurrent tasks
• Code for the correct case
• Clear Code

Clarity is King!
I would rather try to get clear code correct than correct code clear


Thinking Scalable/Parallel
List length: Obviously Linear

[Figure: a four-element list whose length is counted one cell at a time: 4, 3, 2, 1, 0]

But not when you have n processors?


Thinking Scalable/Parallel
List length: O(log N) with sufficient processors

[Figure: binary tree of partial lengths, with 4 at the root, 2 and 2 below it, and 1, 1, 1, 1 at the leaves]
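The tree-shaped computation can be sketched as below. An Erlang list is singly linked and cannot be halved in constant time, so this sketch assumes the elements are already held in a balanced binary tree. The module name plen, the tree representation (empty, {leaf, X}, {node, Left, Right}) and the helper from_list/1 are hypothetical and exist only for the illustration.

```erlang
-module(plen).
-export([len/1, from_list/1]).

%% Build a balanced tree from a list. Sequential helper for the demo only;
%% it is itself linear and not part of the parallel computation.
from_list([]) -> empty;
from_list([X]) -> {leaf, X};
from_list(List) ->
    {L, R} = lists:split(length(List) div 2, List),
    {node, from_list(L), from_list(R)}.

%% Count the elements. The left subtree is counted in a new process while
%% the current process counts the right one, then the results are added.
len(empty) -> 0;
len({leaf, _}) -> 1;
len({node, L, R}) ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, len(L)} end),
    NR = len(R),
    receive
        {Ref, NL} -> NL + NR
    end.
```

With enough processors the longest chain of dependent additions is the height of the tree, which is where the O(log N) comes from. On a small machine the spawn overhead would dominate, so this is an illustration of the shape of the computation rather than a faster length/1.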




Thinking Scalable/Parallel
In the Erlang setting

• Do not introduce unneeded synchronization
• Remember processes are cheap
• Do not introduce unneeded synchronization
• A terminated process is all garbage
• Do not introduce unneeded synchronization
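One way to use the last two points together is to run a memory-hungry step in a throwaway process. The sketch below is an assumption-laden example, not code from the talk: the module name cheap and the function sum_squares/1 are made up for the illustration.

```erlang
-module(cheap).
-export([sum_squares/1]).

%% Run the computation in a short-lived worker. Only the small result is
%% sent back. The worker's heap, including the large temporary list, is
%% reclaimed as a whole when the worker terminates.
sum_squares(N) ->
    Parent = self(),
    {Pid, MRef} = spawn_monitor(fun() ->
        Squares = [X * X || X <- lists:seq(1, N)],  % large temporary
        Parent ! {self(), lists:sum(Squares)}
    end),
    receive
        {Pid, Sum} ->
            %% Drop the monitor and any 'DOWN' message already delivered.
            erlang:demonitor(MRef, [flush]),
            Sum;
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

The caller never holds the intermediate list, so its own heap does not grow and needs no extra garbage collection for this request.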




Code for the Correct Case


[Figure: a request travels through a chain of three processes. Each process sets a timer before passing the request on; when the answer comes back, each one releases its timer and checks the answer.]

Code for the Correct Case


[Figure: the same chain of requests, but only the originating process sets a timer. When the answer arrives it releases the timer and checks the answer.]
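The second diagram can be sketched like this: the originator is the only process that sets a timeout, and the processes in between simply pass the request on. The module name correct_case, the functions call/3, relay/1, echo/0 and demo/0, and the message shapes are all assumptions for the sake of the example.

```erlang
-module(correct_case).
-export([call/3, relay/1, echo/0, demo/0]).

%% Originator: the only place where a timer is set and the answer checked.
call(Server, Request, Timeout) ->
    Ref = make_ref(),
    Server ! {request, self(), Ref, Request},
    receive
        {answer, Ref, Answer} -> {ok, Answer}
    after Timeout ->
        {error, timeout}
    end.

%% Intermediate process: forwards the request, no timer and no check.
relay(Next) ->
    receive
        {request, _From, _Ref, _Req} = Msg ->
            Next ! Msg,
            relay(Next)
    end.

%% Last process in the chain: answers the originator directly.
echo() ->
    receive
        {request, From, Ref, Req} ->
            From ! {answer, Ref, Req},
            echo()
    end.

%% Build a two-stage chain and send one request through it.
demo() ->
    Last = spawn(fun echo/0),
    Mid = spawn(fun() -> relay(Last) end),
    call(Mid, hello, 1000).
```

The intermediate stages are written for the correct case only. If anything goes wrong along the chain, the single timeout in call/3 catches it.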




Rules of Thumb
• Rule 1 - All independent tasks should be processes
• Rule 2 - Do not invent concurrency that is not there!

[Figure: on the left, f(), g() and h() drawn as three separate stages; on the right, four independent processes, each evaluating h(g(f()))]
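Both rules can be seen in one small sketch: each independent input gets its own process (Rule 1), while f, g and h stay plain sequential calls inside it, because each depends on the previous result (Rule 2). The module name rules and the bodies of f/1, g/1 and h/1 are placeholders invented for the example.

```erlang
-module(rules).
-export([run_all/1]).

%% Placeholder stages; each one needs the result of the previous one,
%% so there is no concurrency to be found between them.
f(X) -> X + 1.
g(X) -> X * 2.
h(X) -> X - 3.

%% One process per independent input. Results are collected in input order.
run_all(Inputs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, h(g(f(X)))} end),
                Ref
            end || X <- Inputs],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Turning f, g and h into three communicating processes would add message passing and synchronization without adding any parallelism for a single input.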




Scalability in the small: SMP
Erlang SMP "Credo"

SMP should be transparent to the programmer in
much the same way as Erlang Distribution

• You shouldn't have to think about it
   ...but sometimes you must
• Use SMP mainly for stuff that you'd make concurrent anyway
• Erlang uses concurrency as a structuring principle
  • Model for the natural concurrency in your problem




Scalability in the small: SMP
• Erlang on multicore
• SMP prototype '97,
  first OTP release May '06.
• Mid-'06 benchmark mimicking
  call handling (axdmark) on the
  (experimental) SMP emulator.
  Observed speedup/core: 0.95
• First Ericsson product (TGC)
  released on SMP Erlang
  in Q2 '07.

[Figure: "Big bang" benchmark on Sunfire T2000, plotted against the number of simultaneous processes, with one curve for 1 scheduler and one for 16 schedulers]



Scalability in the small: SMP
Case Study: Telephony Gateway Controller

• Mediates between legacy telephony and multimedia
  networks.
• Hugely complex state machines
• + massive concurrency.
• Developed in Erlang.
• Multicore version shipped to customer Q2 '07.
• Porting from 1-core PPC to 2-core Intel took < 1 man-year
  (including testing).

[Figure: network diagram showing AXE, TGC and several gateways (GW)]

Scalability in the small: SMP
Case Study: Telephony Gateway Controller

Throughput in call/sec, expressed as multiples of X.

Traffic scenario      IS/GCP         IS/GEP, dual core,     IS/GEP, dual core,     AXD CPB5   AXD CPB6
                      1 slot/board   one core running,      two cores running,
                                     2 slots/board          2 slots/board
--------------------  -------------  ---------------------  ---------------------  ---------  --------
POTS-POTS/AGW         X              2.3X                   4.3X                   0.4X       2.1X
                                     (one core used)        (OTP R11_3
                                                            beta+patches)
ISUP-ISUP/Inter MGW   3.6X           7.7X                   13X                    1.55X      7.6X
                                     (one core used)        (OTP R11_3
                                                            beta+patches)
ISUP-ISUP/Intra MGW   5.5X           -                      26X                    3.17X      14X
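The gain from the second core can be read off the table by dividing the two-core IS/GEP figure by the one-core figure for the same scenario. A small helper for that arithmetic, with the module name speedup and both function names chosen only for this illustration:

```erlang
-module(speedup).
-export([ratio/2, efficiency/2]).

%% Speedup of a multi-core run relative to a single-core run.
ratio(Multi, Single) ->
    Multi / Single.

%% Speedup divided by the number of cores; 1.0 would be perfect scaling.
efficiency(Speedup, Cores) ->
    Speedup / Cores.
```

For POTS-POTS/AGW, speedup:ratio(4.3, 2.3) is roughly 1.87, and for ISUP-ISUP/Inter MGW, speedup:ratio(13, 7.7) is roughly 1.69. That corresponds to a per-core efficiency of about 0.93 and 0.84 on the two cores.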


Scalability in the small: SMP
Speedup on 4 Hyper Threaded Pentium4

# Schedulers   1     2     3     4     5     6     7     8
Speedup        1     1.92  2.05  2.73  3.11  3.63  3.79  3.96

• Chatty
• 1000 processes created
• Each process randomly sends req/receives ack from all other
  processes
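A benchmark of this kind can be sketched as follows. This is not the benchmark used for the measurements: the module name chatty, the message shapes and the fixed (rather than random) sending order are simplifying assumptions.

```erlang
-module(chatty).
-export([run/1]).

%% Start N processes; each sends a req to every other process and waits
%% for an ack from each of them. Returns the elapsed time in microseconds.
run(N) when N >= 1 ->
    Parent = self(),
    Pids = [spawn(fun() -> init(Parent) end) || _ <- lists:seq(1, N)],
    {Micros, ok} = timer:tc(fun() ->
        lists:foreach(fun(P) -> P ! {peers, Pids} end, Pids),
        lists:foreach(fun(P) -> receive {done, P} -> ok end end, Pids)
    end),
    lists:foreach(fun(P) -> P ! stop end, Pids),
    Micros.

%% Wait for the list of peers, then send a request to all the others.
init(Parent) ->
    receive
        {peers, Pids} ->
            Others = [P || P <- Pids, P =/= self()],
            lists:foreach(fun(P) -> P ! {req, self()} end, Others),
            loop(Parent, length(Others))
    end.

%% Answer incoming requests while counting down the expected acks.
loop(Parent, 0) ->
    Parent ! {done, self()},
    serve();
loop(Parent, Pending) ->
    receive
        {req, From} -> From ! {ack, self()}, loop(Parent, Pending);
        {ack, _From} -> loop(Parent, Pending - 1)
    end.

%% Keep acknowledging late requests until told to stop.
serve() ->
    receive
        {req, From} -> From ! {ack, self()}, serve();
        stop -> ok
    end.
```

Running chatty:run(1000) on VMs started with different scheduler counts (for example with the erl +S flag) would give timings whose ratio is a speedup figure of the kind shown above.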
Scalability in the small: SMP
non-SMP VM

[Figure: Erlang VM with a single scheduler serving a single run queue]
Scalability in the small: SMP
Current SMP VM (OTP R11/R12)

[Figure: Erlang VM with Scheduler #1, Scheduler #2, ... Scheduler #N all serving one shared run queue]



Scalability in the small: SMP
New SMP VM (OTP R13, released 21st April)

[Figure: Erlang VM where Scheduler #1, Scheduler #2, ... Scheduler #N each have their own run queue, with migration logic between the run queues]
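The scheduler setup of a running VM can be inspected from Erlang itself. The sketch below uses erlang:system_info/1 and erlang:statistics/1; the module name sched_info and the shape of the returned list are choices made for this example.

```erlang
-module(sched_info).
-export([report/0]).

%% Collect a few scheduler-related figures from the running VM.
report() ->
    [{schedulers, erlang:system_info(schedulers)},               % scheduler threads
     {schedulers_online, erlang:system_info(schedulers_online)}, % currently in use
     {run_queue, erlang:statistics(run_queue)}].                 % processes ready to run
```

Calling sched_info:report() before and during a load test gives a rough view of whether the schedulers have work waiting in their run queues.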



Scalability in the small: SMP
[Figure: speedup curves for multiple run queues and for a single run queue]

Speedup with multiple run queues: ca 0.43 * N @ 32 cores
Memory allocation locks dominate...

• Speedup of "Big Bang" on a Tilera Tile64 chip (R13A)
  • 1000 processes, all talking to each other
Scalability in the small: SMP
Shift in Bottlenecks

• All scalable Erlang systems were stress tested
   • for CPU usage
   • for network usage
• With SMP hardware we must stress test for memory usage
• In the typical SMP system, the bottleneck has shifted from
  the CPU to the memory




Scalability in the small: SMP
Death by a thousand cuts

• Many requests that generate short spikes in memory usage
• Limit or serialize those requests
• More on this in coming paper from CTO Ulf Wiger

loop(State) ->
    receive
        {request, typeA, Data} ->
            Data1 = allocate_lots_of_memory(Data),
            a_server ! {request, typeA, self()},
            receive
                {answer, ...
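One possible way to limit or serialize such requests is a small gatekeeper process that hands out a fixed number of slots. This is a sketch under stated assumptions, not the approach from the talk or the announced paper: the module name limiter, the functions start/1 and run/2, and the message shapes are hypothetical.

```erlang
-module(limiter).
-export([start/1, run/2]).

%% Start a limiter that lets at most Max jobs run at the same time.
%% With Max = 1 the jobs are fully serialized.
start(Max) when Max > 0 ->
    spawn(fun() -> loop(Max, queue:new()) end).

%% Wait for a free slot, run Fun, and always give the slot back.
run(Limiter, Fun) ->
    Ref = make_ref(),
    Limiter ! {acquire, self(), Ref},
    receive {granted, Ref} -> ok end,
    try Fun() after Limiter ! release end.

loop(Free, Waiting) ->
    receive
        {acquire, Pid, Ref} when Free > 0 ->
            Pid ! {granted, Ref},
            loop(Free - 1, Waiting);
        {acquire, Pid, Ref} ->
            %% No slot free: remember the caller until one is released.
            loop(Free, queue:in({Pid, Ref}, Waiting));
        release ->
            case queue:out(Waiting) of
                {{value, {Pid, Ref}}, Rest} ->
                    %% Hand the released slot straight to the next waiter.
                    Pid ! {granted, Ref},
                    loop(Free, Rest);
                {empty, _} ->
                    loop(Free + 1, Waiting)
            end
    end.
```

A request handler would wrap only the memory-hungry step, for example limiter:run(L, fun() -> allocate_lots_of_memory(Data) end), so that at most Max spikes overlap. The sketch does not monitor callers, so a caller that is killed while holding a slot would leak it.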

Questions




                                                   ???


