Ensuring Spark Launches the ApplicationMaster on an On-Demand Node of an EMR Cluster
The instance fleets configuration for EMR clusters allows us to provision core nodes with different purchasing options (On-Demand or Spot). When a job is submitted to EMR, its application master process may run on any of the available core nodes. Since some of these core nodes can be Spot instances, a job whose application master lands on a Spot node may fail when that node is terminated due to Spot availability or pricing. We therefore need a way to ensure that a Spark/Hadoop job launches its Application Master on an On-Demand node.
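For context, here is a minimal sketch of a mixed-purchasing core fleet. The fleet name, instance types, and capacities are hypothetical; the dict only illustrates the shape passed as part of `Instances` to boto3's `run_job_flow`, and nothing here calls AWS:

```python
# Sketch of an EMR instance-fleets core fleet that mixes On-Demand and
# Spot capacity. All values are illustrative, not a recommendation.
core_fleet = {
    "Name": "core-fleet",              # hypothetical fleet name
    "InstanceFleetType": "CORE",
    "TargetOnDemandCapacity": 1,       # at least one On-Demand core node
    "TargetSpotCapacity": 3,           # remaining capacity filled from Spot
    "InstanceTypeConfigs": [
        {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
        {"InstanceType": "m5.2xlarge", "WeightedCapacity": 2},
    ],
}

# With this layout, YARN may place an ApplicationMaster on any core
# node -- including the Spot ones -- unless node labels steer it.
print(core_fleet["TargetOnDemandCapacity"], core_fleet["TargetSpotCapacity"])
```

The point of the sketch is that nothing in the fleet definition itself distinguishes On-Demand from Spot nodes at scheduling time; that distinction has to be introduced via YARN node labels, as described below.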
Starting with version 5.19.0, Amazon EMR uses the built-in YARN node labels feature to prevent job failures caused by Spot Task node termination. EMR does this by allowing application master processes to run only on core nodes. The application master process controls running jobs and needs to stay alive for the life of the job.
The documentation describes the configuration EMR applies to achieve this for Task nodes. To cover our use case, distinguishing On-Demand core nodes from Spot core nodes, we need the following updates:
yarn-site (yarn-site.xml), on all nodes:

yarn.node-labels.enabled: true
yarn.node-labels.am.default-node-label-expression: 'CORE_ONDEMAND'
yarn.node-labels.fs-store.root-dir: '/apps/yarn/nodelabels'
yarn.node-labels.configuration-type: 'distributed'

yarn-site (yarn-site.xml), on the master and core On-Demand nodes:

yarn.nodemanager.node-labels.provider: 'config'
yarn.nodemanager.node-labels.provider.configured-node-partition: 'CORE_ONDEMAND'

yarn-site (yarn-site.xml), on core Spot nodes:

yarn.nodemanager.node-labels.provider: 'config'
yarn.nodemanager.node-labels.provider.configured-node-partition: 'CORE'

capacity-scheduler (capacity-scheduler.xml), on all nodes:

yarn.scheduler.capacity.root.accessible-node-labels: '*'
yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
yarn.scheduler.capacity.root.accessible-node-labels.CORE_ONDEMAND.capacity: 100
yarn.scheduler.capacity.root.default.accessible-node-labels: '*'
yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity: 100
yarn.scheduler.capacity.root.default.accessible-node-labels.CORE_ONDEMAND.capacity: 100
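The cluster-wide properties above can also be supplied at cluster creation time through EMR configuration classifications. A sketch of that JSON follows, mirroring the values listed above; note that the per-node-group properties (the `yarn.nodemanager.node-labels.provider.*` settings, which differ between On-Demand and Spot nodes) cannot be expressed in a single cluster-wide classification and would have to be applied separately, for example via per-instance-type configurations or a bootstrap action:

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.node-labels.enabled": "true",
      "yarn.node-labels.am.default-node-label-expression": "CORE_ONDEMAND",
      "yarn.node-labels.fs-store.root-dir": "/apps/yarn/nodelabels",
      "yarn.node-labels.configuration-type": "distributed"
    }
  },
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.root.accessible-node-labels": "*",
      "yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity": "100",
      "yarn.scheduler.capacity.root.accessible-node-labels.CORE_ONDEMAND.capacity": "100",
      "yarn.scheduler.capacity.root.default.accessible-node-labels": "*",
      "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE.capacity": "100",
      "yarn.scheduler.capacity.root.default.accessible-node-labels.CORE_ONDEMAND.capacity": "100"
    }
  }
]
```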
After making the above changes, restart all the Node Managers as well as the Resource Manager.
On the Core Nodes
sudo stop hadoop-yarn-nodemanager
sudo start hadoop-yarn-nodemanager
exit
On the Master Node
sudo stop hadoop-yarn-resourcemanager
sudo start hadoop-yarn-resourcemanager
yarn rmadmin -refreshQueues
PS: With the above configuration in place, the CORE_ONDEMAND partition was reserved for Application Masters only, while all other containers were launched on the remaining nodes.
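As a per-job complement to the cluster-level setup, Spark on YARN also exposes the `spark.yarn.am.nodeLabelExpression` property, which restricts the nodes the Application Master can be scheduled on. A sketch of the invocation (the application script name is a placeholder, and the `CORE_ONDEMAND` label must already exist in the cluster as configured above):

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=CORE_ONDEMAND \
  my_job.py
```

This per-job expression is only effective on YARN versions that support node label expressions, so the cluster-level configuration remains the primary mechanism.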