Apache Spark is widely used for big data analytics and processing. Its memory management model has changed considerably in Spark 3. These slides attempt to explain those changes and how data engineers can leverage them.
4. Java heap memory
1. Storage Memory: JVM heap space reserved for cached data.
2. Execution (or shuffle) Memory: JVM heap space used by data structures during shuffle operations (joins, group-bys and aggregations). Before Spark 1.6, the term "shuffle memory" was also used for this region.
3. User Memory: stores the data structures created and managed by the user's code.
4. Reserved Memory: reserved by Spark for internal purposes.
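The four heap regions above can be sized with a little arithmetic. The sketch below is illustrative only: it assumes the default values of `spark.memory.fraction` (0.6) and `spark.memory.storageFraction` (0.5) and the fixed 300 MB reserved region used by Spark's unified memory manager. The function name `heap_regions` and the 4096 MB heap are hypothetical choices made for this example.

```python
# Rough estimate of how an executor's JVM heap splits into the four regions.
# Assumptions: unified memory manager, default fractions, 300 MB reserved.

RESERVED_MB = 300  # reserved by Spark for internal purposes


def heap_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return the approximate size (in MB) of each heap region.

    heap_mb          -- executor heap size (spark.executor.memory), in MB
    memory_fraction  -- spark.memory.fraction
    storage_fraction -- spark.memory.storageFraction
    """
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")

    unified = usable * memory_fraction      # storage + execution together
    storage = unified * storage_fraction    # cached data
    execution = unified - storage           # shuffles, joins, aggregations
    user = usable - unified                 # user data structures

    return {
        "reserved": RESERVED_MB,
        "storage": storage,
        "execution": execution,
        "user": user,
    }


# Hypothetical executor with a 4 GB heap.
regions = heap_regions(4096)
for name, size_mb in regions.items():
    print(f"{name:>9}: {size_mb:8.1f} MB")
```

These figures are starting points, not hard limits: under the unified memory manager, storage and execution can borrow from each other when one side has free space.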
12. 1. Python worker memory: limits the memory in the JVM for Python objects
2. PySpark executor memory: limits the memory of the actual Python process
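As a hedged illustration of how these two limits might be set, the sketch below builds `--conf` arguments for `spark-submit`. It assumes the two items correspond to the Spark properties `spark.python.worker.memory` and `spark.executor.pyspark.memory`; the slides do not name the properties, so that mapping is an assumption. The helper `build_conf_flags` and the `2g` value are hypothetical.

```python
# Build spark-submit "--conf key=value" arguments for the Python memory limits.
# Assumption: the slide's two items map to the property names used below.


def build_conf_flags(settings):
    """Turn a dict of Spark properties into a flat list of --conf arguments."""
    flags = []
    for key, value in sorted(settings.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


python_memory_settings = {
    # Python worker memory (512m is Spark's documented default).
    "spark.python.worker.memory": "512m",
    # PySpark executor memory (illustrative value, unset by default).
    "spark.executor.pyspark.memory": "2g",
}

flags = build_conf_flags(python_memory_settings)
print(" ".join(["spark-submit"] + flags + ["my_job.py"]))
```

Here `my_job.py` is a placeholder for the application script. The same properties could also be set in `spark-defaults.conf` or on the session builder.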