Lifecycle Configuration to Persist SageMaker Notebook Instance Metrics Setup
In the previous blog post, we discussed how to Publish and Monitor Metrics from a SageMaker Notebook Instance. There is one caveat with that approach: if we restart the notebook instance, the changes won't persist. Only changes made to the ML storage volume survive a stop/start, so all the configuration we set up to emit metrics to CloudWatch will be lost.
To avoid this, we'll use a Lifecycle Configuration to reapply the setup every time the notebook starts.
We can create the Lifecycle Configuration as shown below:
Now we can create our notebook instance, specifying the Lifecycle configuration under Additional configuration in the Notebook instance settings block:
With this setup, whenever the notebook instance is created or stopped and restarted, the unified CloudWatch agent configuration we expect will be in place.
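If you prefer to script this instead of using the console, the same setup can be done with the AWS CLI. A minimal sketch, assuming the on-start script is saved locally as on-start.sh, that base64 -w0 is available (GNU coreutils), and that the names and role ARN are placeholders you replace with your own:

aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name cloudwatch-agent-setup \
    --on-start Content="$(base64 -w0 on-start.sh)"

aws sagemaker create-notebook-instance \
    --notebook-instance-name my-notebook \
    --instance-type ml.t3.medium \
    --role-arn <your-sagemaker-execution-role-arn> \
    --lifecycle-config-name cloudwatch-agent-setup

The same lifecycle configuration can also be attached to an existing, stopped notebook instance with aws sagemaker update-notebook-instance and the --lifecycle-config-name option.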
In the Lifecycle configuration script, we copy a config.json file uploaded to S3 and use it to set up the CloudWatch agent. The configuration file is the same as the one created by the CloudWatch agent wizard in the previous post; you can edit it or write it manually as described in the documentation.
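For reference, a minimal on-start script along these lines could look like the sketch below. The bucket name, file paths, and the install step are assumptions to adapt to your environment:

#!/bin/bash
set -e

# Install the unified CloudWatch agent if it isn't present yet (no-op otherwise)
sudo yum install -y amazon-cloudwatch-agent

# Fetch the agent configuration created earlier (bucket name is a placeholder)
aws s3 cp s3://<your-bucket-name>/config.json /home/ec2-user/SageMaker/config.json

# Load the configuration and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
    -a fetch-config -m ec2 -s \
    -c file:/home/ec2-user/SageMaker/config.json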
The configuration file I used in my deployment:
{
  "metrics": {
    "namespace": "SageMakerNotebookInstances",
    "metrics_collected": {
      "cpu": {
        "measurement": [ "cpu_usage_idle" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ],
        "totalcpu": true
      },
      "disk": {
        "measurement": [ "used_percent" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "diskio": {
        "measurement": [ "write_bytes", "read_bytes", "writes", "reads" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "mem": {
        "measurement": [ "mem_used_percent" ],
        "metrics_collection_interval": 60
      },
      "net": {
        "measurement": [ "bytes_sent", "bytes_recv", "packets_sent", "packets_recv" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "swap": {
        "measurement": [ "swap_used_percent" ],
        "metrics_collection_interval": 60
      }
    }
  }
}
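For the lifecycle script to find this file, upload it to the bucket it references, for example:

aws s3 cp config.json s3://<your-bucket-name>/config.json

After the next stop/start, you can confirm from a terminal on the instance that the agent picked it up (same binary path as assumed above):

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status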
Similarly, we can create a lifecycle configuration to set up the Cron Job to Publish Metrics from Custom Monitoring Scripts described in the previous post:
#!/bin/bash
set -e

# Download the monitoring script (bucket and destination path are placeholders)
aws s3 cp s3://<your-bucket-name>/monitor.sh /home/ec2-user/monitor.sh
chmod +x /home/ec2-user/monitor.sh

# Append an every-minute entry to the crontab, keeping any existing entries
# (crontab -l exits non-zero when no crontab exists yet, hence the || true)
(crontab -l 2>/dev/null || true) > mycron
echo "*/1 * * * * /home/ec2-user/monitor.sh" >> mycron
crontab mycron
rm mycron
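The monitor.sh script itself is the one from the previous post. Purely as an illustration of the shape such a script can take, here is a hypothetical example that publishes the number of running Jupyter kernels as a custom metric (the metric name and dimension value are made-up examples):

#!/bin/bash
# Hypothetical monitoring script: count ipykernel processes and publish the value
KERNEL_COUNT=$(pgrep -c -f ipykernel || true)
aws cloudwatch put-metric-data \
    --namespace SageMakerNotebookInstances \
    --metric-name RunningKernels \
    --dimensions NotebookInstance=my-notebook \
    --value "$KERNEL_COUNT" \
    --unit Count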