
Lifecycle Configuration to Persist SageMaker Notebook Instance Metrics Setup

In the previous blog post, we discussed how to Publish and Monitor Metrics from a SageMaker Notebook Instance. There is one caveat with that approach: the changes do not survive a restart of the notebook instance. Only data on the ML storage volume persists across a stop/start, so all the configuration we set up to emit metrics to CloudWatch will be lost.

To address this, we’ll use a Lifecycle Configuration to reapply the setup every time the notebook instance starts.

We can create a lifecycle policy as shown in the following diagram:

[Image: lifecycleconfig – creating the lifecycle configuration in the SageMaker console]
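
If you prefer the CLI, the same lifecycle configuration can be created with aws sagemaker create-notebook-instance-lifecycle-config. A minimal sketch is shown below; the configuration name persist-cw-metrics and the script file on-start.sh are placeholders for illustration, and the on-start content must be passed base64-encoded.

#!/bin/bash

# Create the lifecycle configuration from a local on-start script.
# The --on-start content must be base64-encoded.
aws sagemaker create-notebook-instance-lifecycle-config \
  --notebook-instance-lifecycle-config-name persist-cw-metrics \
  --on-start Content=$(base64 -w0 on-start.sh)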

Now we can create our notebook, specifying the lifecycle configuration under Additional configuration in the Notebook instance settings block:

[Image: lcnotebookinstance – selecting the lifecycle configuration under Notebook instance settings]
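
The equivalent step with the CLI is to pass --lifecycle-config-name when creating the notebook instance. A rough sketch, where the instance name, instance type, and role ARN are placeholders:

#!/bin/bash

# Attach the lifecycle configuration when creating the notebook instance.
# Instance name, type, and role ARN below are placeholders.
aws sagemaker create-notebook-instance \
  --notebook-instance-name my-monitored-notebook \
  --instance-type ml.t3.medium \
  --role-arn arn:aws:iam::<account-id>:role/<sagemaker-execution-role> \
  --lifecycle-config-name persist-cw-metrics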

With this setup, whenever the notebook instance is created or stopped and restarted, the unified CloudWatch agent configuration we expect will be in place.

In the lifecycle configuration script, we copy a config.json file previously uploaded to S3 and use it to set up the CloudWatch agent. The configuration file is the same as the one created by the CloudWatch agent wizard in the previous post. You can edit it or create it manually as described in the documentation.
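
As a rough sketch, the on-start script might look like the following. The bucket name and file paths are placeholders, and it assumes the amazon-cloudwatch-agent package is available from the instance’s yum repositories.

#!/bin/bash

set -e

# Install the unified CloudWatch agent if it is not already present.
sudo yum install -y amazon-cloudwatch-agent

# Fetch the agent configuration previously uploaded to S3.
aws s3 cp s3://<your-bucket-name>/config.json /tmp/config.json

# Load the configuration and (re)start the agent.
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -c file:/tmp/config.json -s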

The configuration file I used in my deployment:

{
  "metrics": {
    "namespace": "SageMakerNotebookInstances",
    "metrics_collected": {
      "cpu": {
        "measurement": [ "cpu_usage_idle" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ],
        "totalcpu": true
      },
      "disk": {
        "measurement": [ "used_percent" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "diskio": {
        "measurement": [ "write_bytes","read_bytes", "writes", "reads" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "mem": {
        "measurement": [ "mem_used_percent" ],
        "metrics_collection_interval": 60
      },
      "net": {
        "measurement": [ "bytes_sent", "bytes_recv", "packets_sent", "packets_recv" ],
        "metrics_collection_interval": 60,
        "resources": [ "*" ]
      },
      "swap": {
        "measurement": [ "swap_used_percent" ],
        "metrics_collection_interval": 60
      }
    }
  }
}

Similarly, we can create a lifecycle configuration to set up a Cron Job to Publish Metrics from Custom Monitoring Scripts:

#!/bin/bash

set -e

# Download the monitoring script uploaded to S3.
aws s3 cp s3://<your-bucket-name>/monitor.sh .

# Preserve any existing crontab entries (|| true in case none exist yet).
crontab -l > mycronfile || true

# Run the monitoring script every minute (standard five-field cron syntax).
echo "*/1 * * * * /bin/bash $(pwd)/monitor.sh" >> mycronfile
crontab mycronfile
rm mycronfile
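
The monitor.sh script itself comes from the earlier post on publishing metrics from custom monitoring scripts. As a purely illustrative sketch (the namespace, metric name, and memory calculation below are assumptions, not the original script), it might publish a metric with put-metric-data like this:

#!/bin/bash

# Hypothetical monitor.sh: publish memory utilisation as a custom metric.
# Namespace and metric name are illustrative placeholders.
MEM_USED_PERCENT=$(free | awk '/Mem:/ {printf "%.2f", $3/$2 * 100}')

aws cloudwatch put-metric-data \
  --namespace "SageMakerNotebookInstances" \
  --metric-name "MemoryUtilization" \
  --unit Percent \
  --value "$MEM_USED_PERCENT"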
