Ensuring Spark Launches the ApplicationMaster on an On-Demand Node of an EMR Cluster

The instance fleets configuration for EMR clusters allows us to provision core nodes with different purchasing options (On-Demand or Spot). When a job is submitted to EMR, YARN may launch the ApplicationMaster process on any available core node. If the ApplicationMaster lands on a Spot core node, the job can fail when that node is terminated due to Spot availability or pricing. We therefore need a workaround to ensure that a Spark/Hadoop job launches its ApplicationMaster on an On-Demand node.

Starting with version 5.19.0, Amazon EMR uses the built-in YARN node labels feature to prevent job failures caused by Task Node Spot instance termination. EMR does this by allowing ApplicationMaster processes to run only on core nodes. The ApplicationMaster process controls running jobs and needs to stay alive for the life of the job.[1]

The documentation[1] lists the configurations EMR sets to achieve this behavior for Task Nodes. The following updates adapt the same mechanism to our use case:

yarn-site (yarn-site.xml) On All Nodes
    yarn.node-labels.enabled: true
    yarn.node-labels.am.default-node-label-expression: 'CORE_ONDEMAND'
    yarn.node-labels.fs-store.root-dir: '/apps/yarn/nodelabels'
    yarn.node-labels.configuration-type: 'distributed'
yarn-site (yarn-site.xml) On Master And Core Nodes (On-Demand nodes)
    yarn.nodemanager.node-labels.provider: 'config'
    yarn.nodemanager.node-labels.provider.configured-node-partition: 'CORE_ONDEMAND'
yarn-site (yarn-site.xml) On Core Nodes (Spot nodes)
    yarn.nodemanager.node-labels.provider: 'config'
    yarn.nodemanager.node-labels.provider.configured-node-partition: 'CORE'
capacity-scheduler (capacity-scheduler.xml) On All Nodes
    yarn.scheduler.capacity.root.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
    yarn.scheduler.capacity.root.accessible-node-labels.CORE_ONDEMAND.capacity: 100
    yarn.scheduler.capacity.root.default.accessible-node-labels: '*'
    yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity: 100
    yarn.scheduler.capacity.root.default.accessible-node-labels.CORE_ONDEMAND.capacity: 100
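The cluster-wide settings above can also be supplied at cluster creation time through EMR configuration classifications. The following is a sketch of that JSON (the per-node yarn.nodemanager.* properties are omitted here, since they differ between On-Demand and Spot nodes and would need to be applied per instance group/fleet or via a bootstrap action rather than cluster-wide):

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE_ONDEMAND",
      "yarn.node-labels.fs-store.root-dir": "/apps/yarn/nodelabels",
      "yarn.node-labels.configuration-type": "distributed"
    }
  },
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.root.accessible-node-labels": "*",
      "yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity": "100",
      "yarn.scheduler.capacity.root.accessible-node-labels.CORE_ONDEMAND.capacity": "100",
      "yarn.scheduler.capacity.root.default.accessible-node-labels": "*",
      "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity": "100",
      "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE_ONDEMAND.capacity": "100"
    }
  }
]
```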

After making the above changes, restart all of the NodeManagers as well as the ResourceManager.

On the Core Nodes

sudo stop hadoop-yarn-nodemanager
sudo start hadoop-yarn-nodemanager

On the Master Node

sudo stop hadoop-yarn-resourcemanager
sudo start hadoop-yarn-resourcemanager
yarn rmadmin -refreshQueues
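After the restarts, it is worth confirming that both partitions were registered with the ResourceManager. A sketch using standard YARN CLI commands (the node ID is a placeholder to be taken from the node list):

```shell
# List all node labels known to the ResourceManager;
# expect to see CORE and CORE_ONDEMAND here.
yarn cluster --list-node-labels

# List the registered NodeManagers and their IDs.
yarn node -list -all

# Inspect a specific NodeManager (replace <node-id> with a real
# host:port from the list above); the output includes Node-Labels.
yarn node -status <node-id>
```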

PS: With the above configuration, the CORE_ONDEMAND partition was reserved exclusively for the ApplicationMasters, while all other containers were launched on the rest of the nodes.
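The label can also be requested per job instead of relying on the cluster-wide default, via Spark's YARN node-label setting. A sketch (the class name and jar are placeholders; the label assumes the CORE_ONDEMAND partition configured above):

```shell
# Pin only the ApplicationMaster to the CORE_ONDEMAND partition;
# executors carry no label expression and land in the default partition.
# com.example.MyJob and my-job.jar are hypothetical placeholders.
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=CORE_ONDEMAND \
  --class com.example.MyJob \
  my-job.jar
```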


[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html#emr-plan-spot-YARN