Update and Persist Git Configurations in an AWS SageMaker Notebook Instance

Amazon SageMaker is a fully managed machine learning service that lets users build, train, and deploy machine learning models quickly, without having to set up and manage the underlying infrastructure.

Since AWS re:Invent 2018, SageMaker has supported Git integration for better persistence, collaboration, and reproducibility. It’s now possible to associate GitHub, AWS CodeCommit, or any self-hosted Git repository with Amazon SageMaker notebook instances to collaborate easily and securely and to keep Jupyter notebooks under version control.

Since most machine learning projects are collaborative efforts, this feature makes project development considerably easier. However, there is one caveat. Whenever a SageMaker notebook instance is restarted, the configuration changes and additional libraries added to it are removed, so you have to add them again manually after each restart.

This creates a roadblock: when we clone a Git repository in SageMaker, the default committer is set to 'EC2 Default User'. To update the committer's user name and email, open the Jupyter terminal, change to the local Git repository's directory, and run the commands below:

$ git config user.name "Ujjwal Bhardwaj"
$ git config user.email "ujjwalb1996@gmail.com"

In the commands above, make sure the '--global' flag is omitted. With '--global', Git writes the settings to ~/.gitconfig in the user's home directory instead of to the repository, and that information does not persist when the notebook instance is restarted.
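To see where a repository-level setting actually lands, the following is a minimal sketch that uses a throwaway repository created with mktemp. The name "Example User" and the address "user@example.com" are placeholders, not values from this article, and the sketch assumes a Git version recent enough to support --show-origin.

```shell
# Create a throwaway repository so the sketch does not touch a real project.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Without --global, the values are written to the repository's own .git/config.
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# --show-origin prints the file each value was read from;
# --local restricts the lookup to the repository-level config.
origin_line="$(git -C "$repo" config --local --show-origin --get user.name)"
echo "$origin_line"
```

In a real notebook instance you would run the same 'git config' commands from inside the cloned repository rather than a temporary directory.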

After running the commands above, the settings remain in place even when the instance is restarted. To verify the Git configuration set earlier, go to the local repository and run:

$ git config --list

The output should include something like:

user.name=Ujjwal Bhardwaj

After this, when I commit anything from this local directory, the user name is picked up as 'Ujjwal Bhardwaj' or, more precisely, as the user linked with the specified email ID.
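One way to confirm which identity a commit actually records is to inspect the latest commit with 'git log'. The sketch below does this in a throwaway repository with placeholder values ("Example User", "user@example.com") and an empty commit; these are assumptions for illustration, not part of the setup described above.

```shell
# Throwaway repository with a repository-level identity (placeholder values).
repo="$(mktemp -d)"
git init --quiet "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# Make an empty commit so there is something to inspect.
git -C "$repo" commit --quiet --allow-empty -m "Check committer identity"

# %an and %ae expand to the author name and author email of the commit.
author="$(git -C "$repo" log -1 --format='%an <%ae>')"
echo "$author"
```

In an existing repository, running only the 'git log -1' line from the repository's directory is enough to check the most recent commit.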