Optimizing Spark

Optimization is worthwhile when your domain knowledge can inform how Spark load-balances or stripes data across worker nodes.
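The partition function is one concrete place where domain knowledge feeds into load balancing. In PySpark, `RDD.partitionBy(numPartitions, partitionFunc)` sends each key-value pair to partition `partitionFunc(key) % numPartitions`, and it uses a hash function by default. The sketch below is plain Python that simulates only that assignment step, so it runs without a cluster. `KNOWN_REGIONS`, `region_partitioner`, and `partition_sizes` are hypothetical names, and the sketch assumes the job is keyed by a small, known set of regions.

```python
import zlib
from collections import Counter

# Hypothetical domain knowledge: the job is keyed by a small, known set of
# regions, so each region can get its own partition instead of being hashed.
KNOWN_REGIONS = ["us-east", "us-west", "eu", "apac"]


def region_partitioner(region: str) -> int:
    """Return a partition index for a key (hypothetical helper).

    Known regions map to fixed slots. Any other key falls back to a
    deterministic CRC32 hash, so every worker computes the same index.
    """
    if region in KNOWN_REGIONS:
        return KNOWN_REGIONS.index(region)
    return zlib.crc32(region.encode("utf-8"))


def partition_sizes(keys, num_partitions, partition_func):
    """Simulate the assignment step: partition_func(key) % num_partitions."""
    return Counter(partition_func(k) % num_partitions for k in keys)


# 100 records per region: each region lands in its own partition.
keys = [r for r in KNOWN_REGIONS for _ in range(100)]
print(partition_sizes(keys, 4, region_partitioner))

# On a real pair RDD this would be: pair_rdd.partitionBy(4, region_partitioner)
```

With hashing, four distinct keys spread over four partitions can collide, which leaves one partition with double the work and another idle. When the key set is known in advance, a fixed mapping like this one avoids that. The CRC32 fallback is deterministic across processes, unlike Python's built-in `hash()` for strings, which is randomized per process by default.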
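Persistence: tell Spark “you should store this data in faster/slower memory” by choosing a storage level.

- MEMORY_ONLY: keep deserialized objects in memory; partitions that do not fit are recomputed when they are needed.
- MEMORY_ONLY_SER: keep serialized objects in memory; this is more compact but costs more CPU to read.
- MEMORY_AND_DISK: keep data in memory and spill partitions that do not fit to disk.
- MEMORY_AND_DISK_SER: like MEMORY_AND_DISK, but stored in serialized form.
- DISK_ONLY: store partitions only on disk.

In PySpark, data is always stored in serialized form, so the _SER distinction does not apply there.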
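```python
from pyspark import StorageLevel

rdd.persist(StorageLevel.MEMORY_AND_DISK)  # keep in memory, spill to disk if needed
# ... do work ...
rdd.unpersist()  # release the cached partitions when done
```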
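Persisting pays off because Spark evaluates RDDs lazily and, by default, may recompute an RDD's full lineage every time an action runs on it. The toy model below is plain Python, not Spark. `ToyRDD` and `expensive` are hypothetical names, and the model ignores storage levels entirely. It counts how often an expensive transformation runs when two actions are called with and without persisting.

```python
class ToyRDD:
    """A toy, lazily evaluated dataset (hypothetical; NOT the Spark API).

    Like an RDD, it only records transformations and recomputes its
    lineage on every action unless it has been persisted.
    """

    def __init__(self, compute):
        self._compute = compute  # zero-argument function producing the data
        self._persisted = False
        self._cache = None

    def map(self, f):
        # Record the transformation; nothing runs until an action is called.
        return ToyRDD(lambda: [f(x) for x in self._materialize()])

    def persist(self):
        self._persisted = True
        return self

    def unpersist(self):
        self._persisted = False
        self._cache = None
        return self

    def _materialize(self):
        if self._persisted:
            if self._cache is None:
                self._cache = self._compute()  # compute once, then reuse
            return self._cache
        return self._compute()  # not persisted: recompute the lineage

    def count(self):  # an "action"
        return len(self._materialize())

    def sum(self):  # another "action"
        return sum(self._materialize())


calls = {"n": 0}


def expensive(x):
    calls["n"] += 1  # count how often the costly step really runs
    return x * x


squares = ToyRDD(lambda: list(range(1000))).map(expensive)

squares.count()
squares.sum()
print("without persist:", calls["n"])  # lineage ran once per action

calls["n"] = 0
squares.persist()
squares.count()
squares.sum()
print("with persist:", calls["n"])  # lineage ran once, then the cache was reused
squares.unpersist()
```

Without persisting, the two actions run `expensive` 2000 times; with persisting, they run it 1000 times. The same trade-off drives the choice of storage level in Spark: spend memory or disk to avoid recomputation.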