Publish and Monitor SageMaker Notebook Instance Metrics
SageMaker Notebook Instances do not publish any metrics to CloudWatch, unlike other SageMaker components such as Endpoints. As a result, there are no metrics to observe and nothing to create alarms on.
However, since we have access to a terminal inside the Notebook Instance (Open Jupyter -> New -> Terminal), we can consider the following options to publish these metrics.
Set Up the Unified CloudWatch Agent
The unified CloudWatch agent enables you to collect system-level metrics from Amazon EC2 instances as well as on-premises servers across operating systems. You can store and view the metrics that you collect with the CloudWatch agent in CloudWatch just as you can with any other CloudWatch metrics. SageMaker Notebook Instances come preinstalled with the Unified CloudWatch Agent. You can set it up using the following command.
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
This will launch the CloudWatch Agent Configuration Wizard. Answer its questions to customize the configuration file for your server. In my setup, I made the following changes, keeping the rest as default.
After this, start the CloudWatch agent on a server with the command:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:///opt/aws/amazon-cloudwatch-agent/bin/config.json -s
This sets up metrics under the Custom Namespace 'CWAgent' and you can find the following metrics in this namespace.
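To confirm that the agent picked up the configuration, a quick check along the following lines may help. This is only a sketch: the function names agent_status and list_agent_metrics are made up for illustration, and the second call assumes the notebook's IAM role allows cloudwatch:ListMetrics. Newly published metrics may take several minutes to show up in the listing.

```shell
# Hypothetical helpers; the names are illustrative and not part of the agent.

# Ask the agent's control script whether the agent is currently running.
agent_status() {
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
}

# List whatever metrics have arrived in the CWAgent namespace so far.
list_agent_metrics() {
    aws cloudwatch list-metrics --namespace CWAgent
}

# Example (run in the Notebook Instance terminal):
#   agent_status && list_agent_metrics
```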
Cron Job to Publish Metrics from Custom Monitoring Scripts
If you have a custom monitoring script that emits metrics, you can use the AWS CLI to publish them to CloudWatch, provided that the IAM role attached to the SageMaker instance allows these API calls (for example, cloudwatch:PutMetricData). You can then set up cron jobs to publish the metrics at a fixed interval. As in the case above, you can also set up CloudWatch alarms on these metrics.
The following bash script reads the percentage of the EBS volume that is in use and sends it to CloudWatch as a metric.
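#!/bin/bash
# Value reported in the InstanceName dimension; change it to match your notebook instance.
INSTANCE_NAME=MyInstance
# Usage of the volume mounted at /home/ec2-user/SageMaker, as printed by df (for example "42%").
EBSUsed=$(df -h | grep /home/ec2-user/SageMaker | awk '{print $5}')
# Strip the trailing "%" and publish the number as a custom metric.
aws cloudwatch put-metric-data --metric-name EBSUsed --namespace MySageMaker --unit Percent --value "${EBSUsed%\%}" --dimensions InstanceName=$INSTANCE_NAME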
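A slightly more defensive variant of the same idea could look like the sketch below. It is not the original script: the function names parse_used_percent and publish_disk_usage are hypothetical, and it swaps df -h plus grep for df -P on a single path, so that exactly one data row is parsed and nothing is published if the value is not a plain number.

```shell
# Extract the "Capacity" column (5th field) from the data row of `df -P`
# output read on stdin, dropping the trailing "%".
parse_used_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Publish the usage of the filesystem holding path $1 under instance name $2.
publish_disk_usage() {
    path=$1
    instance_name=$2
    used=$(df -P "$path" | parse_used_percent)
    # Refuse to publish anything that is not a plain integer.
    case "$used" in
        ''|*[!0-9]*) echo "could not read usage for $path" >&2; return 1 ;;
    esac
    aws cloudwatch put-metric-data \
        --metric-name EBSUsed \
        --namespace MySageMaker \
        --unit Percent \
        --value "$used" \
        --dimensions "InstanceName=$instance_name"
}

# Example (on the notebook instance):
#   publish_disk_usage /home/ec2-user/SageMaker MyInstance
```

Keeping the parsing in its own function makes it easy to check against sample df output without touching CloudWatch.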
Next, schedule the script with cron so that the metric is published at a fixed interval. Save the script as /home/ec2-user/SageMaker/monitor.sh and make it executable with chmod +x, then run sudo crontab -e and add the following line so that the script runs automatically every minute:
* * * * * /home/ec2-user/SageMaker/monitor.sh
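If you would rather add the entry without opening an editor, something like the following sketch may work. The helper name ensure_cron_line is hypothetical. It reads an existing crontab on stdin and prints it back with the given line appended only if that exact line is not already present, so running it repeatedly does not create duplicates.

```shell
# Print the crontab read from stdin, adding line $1 unless it is already there.
ensure_cron_line() {
    line=$1
    existing=$(cat)
    if printf '%s\n' "$existing" | grep -Fqx -- "$line"; then
        # Exact line already present: emit the crontab unchanged.
        printf '%s\n' "$existing"
    elif [ -n "$existing" ]; then
        printf '%s\n%s\n' "$existing" "$line"
    else
        # No crontab yet: the new line is the whole file.
        printf '%s\n' "$line"
    fi
}

# Example (on the notebook instance, editing root's crontab as above):
#   CRON_LINE='* * * * * /home/ec2-user/SageMaker/monitor.sh'
#   sudo crontab -l 2>/dev/null | ensure_cron_line "$CRON_LINE" | sudo crontab -
```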
Once the cron job is running, the EBSUsed metric appears under the MySageMaker namespace in CloudWatch, where it can be graphed.
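With the metric in place, an alarm can be created on it from the same terminal. The sketch below is one possible setup, not a recommendation: the function name create_ebs_alarm, the alarm name, the five one-minute evaluation periods, and the example threshold of 80 are all assumptions, and the IAM role would also need to allow cloudwatch:PutMetricAlarm.

```shell
# Create (or update) an alarm that fires when the average EBSUsed value for
# instance name $1 stays above threshold $2 for five consecutive minutes.
create_ebs_alarm() {
    instance_name=$1
    threshold=$2
    aws cloudwatch put-metric-alarm \
        --alarm-name "EBSUsed-high-$instance_name" \
        --namespace MySageMaker \
        --metric-name EBSUsed \
        --dimensions "Name=InstanceName,Value=$instance_name" \
        --statistic Average \
        --period 60 \
        --evaluation-periods 5 \
        --threshold "$threshold" \
        --comparison-operator GreaterThanThreshold
}

# Example (on the notebook instance):
#   create_ebs_alarm MyInstance 80
```

As written, the alarm only changes state. To be notified, you could pass an SNS topic ARN through the --alarm-actions option of put-metric-alarm.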
See the follow-up blog post on keeping the CloudWatch metrics publishing in place across stop/start of the Notebook Instance.