Spark memory model
Major categorization
 Java heap memory (characterised by garbage collection)
 JVM off-heap memory / direct memory
 Python memory
Java heap memory
1. Storage Memory: JVM heap space reserved for cached data.
2. Execution (or shuffle) Memory: JVM heap space used by data structures during shuffle operations (joins, group-bys and aggregations). Before Spark 1.6, this section was also called shuffle memory.
3. User Memory: stores the data structures created and managed by the user's code.
4. Reserved Memory: reserved by Spark for internal purposes.
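Java heap memory layout (figure): the combined execution and storage region is sized by spark.memory.fraction, and the storage share within that region by spark.memory.storageFraction.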
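The arithmetic behind the two parameters can be sketched as follows. This is a rough illustration, not Spark's own code: the function name `heap_regions` is hypothetical, the 300 MB reserved amount and the defaults of 0.6 and 0.5 are the values documented for recent Spark releases, and the heap size passed in is assumed to be what the JVM actually reports, which is usually somewhat below the configured executor memory.

```python
RESERVED_MB = 300  # amount Spark sets aside as Reserved Memory (assumed fixed)


def heap_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap (in MB) into the four regions listed above.

    memory_fraction  stands in for spark.memory.fraction
    storage_fraction stands in for spark.memory.storageFraction
    """
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")
    unified = usable * memory_fraction   # execution + storage together
    storage = unified * storage_fraction
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


regions = heap_regions(4096)
for name, size in regions.items():
    print(f"{name:>9}: {size:8.1f} MB")
```

The storage and execution figures are starting sizes rather than hard limits: in the unified model each side can borrow unused space from the other, and spark.memory.storageFraction marks the part of storage that is protected from eviction.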
Java heap memory (legacy)
Spark 2.x vs Spark 3.x
Java off-heap memory
1. Off-heap DataFrames
2. VM overheads: interned strings, etc.
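A minimal sketch of how these two off-heap items map to settings, assuming the configuration keys documented for Spark 3.x: spark.memory.offHeap.enabled and spark.memory.offHeap.size for off-heap data, and a default overhead of 10% of executor memory with a 384 MB floor for VM overheads. The helper names `off_heap_conf` and `default_overhead_mb` are hypothetical and exist only for this illustration.

```python
MIN_OVERHEAD_MB = 384   # documented floor for spark.executor.memoryOverhead
OVERHEAD_FACTOR = 0.10  # documented default overhead factor


def default_overhead_mb(executor_memory_mb):
    """Overhead Spark would assume when memoryOverhead is not set explicitly."""
    return max(int(executor_memory_mb * OVERHEAD_FACTOR), MIN_OVERHEAD_MB)


def off_heap_conf(size_mb):
    """Build the settings that enable Spark-managed off-heap memory."""
    if size_mb <= 0:
        raise ValueError("off-heap size must be positive when enabled")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": f"{size_mb}m",
    }


conf = off_heap_conf(1024)
conf["spark.executor.memoryOverhead"] = f"{default_overhead_mb(4096)}m"
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines have the shape of spark-submit arguments; the sizes themselves are placeholders and would need tuning for a real workload.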
Java off-heap memory (legacy)
Python worker memory
1. Python worker memory: limits the memory in the JVM for Python objects
2. PySpark executor memory: limits the memory of the actual Python process
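To see how the Python allocation fits alongside the JVM regions, the sketch below adds up the pieces that make up one executor's container request. It assumes the Spark 3.x behaviour described in the configuration documentation, where the container size is the sum of spark.executor.memory, spark.executor.memoryOverhead, spark.memory.offHeap.size and spark.executor.pyspark.memory; older releases may not count the off-heap size. The function name `container_request_mb` is hypothetical.

```python
def container_request_mb(executor_memory_mb, overhead_mb,
                         off_heap_mb=0, pyspark_memory_mb=0):
    """Total memory (MB) requested per executor container (Spark 3.x assumption)."""
    parts = (executor_memory_mb, overhead_mb, off_heap_mb, pyspark_memory_mb)
    if any(p < 0 for p in parts):
        raise ValueError("memory sizes cannot be negative")
    return sum(parts)


total = container_request_mb(
    executor_memory_mb=4096,   # spark.executor.memory (Java heap)
    overhead_mb=409,           # spark.executor.memoryOverhead
    off_heap_mb=1024,          # spark.memory.offHeap.size
    pyspark_memory_mb=2048,    # spark.executor.pyspark.memory
)
print(f"container request: {total} MB")
```

When spark.executor.pyspark.memory is left unset, the Python process is not capped separately and its usage has to fit inside the overhead allowance instead.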
References
Monitoring and Instrumentation, Spark 3.3.2 Documentation
Sohom Majumdar, "Decoding Memory in Spark: Parameters that are often confused", Walmart Global Tech Blog, Medium
Suhas N M, "Apache Spark Memory Management", Analytics Vidhya, Medium
