Disable Download Button on the SageMaker Jupyter Notebook
An Amazon SageMaker notebook instance is a managed ML compute instance that runs the Jupyter Notebook application. The Jupyter notebook lets you fetch raw files and download them, and even exposes a download button. For security and compliance reasons, you may want data scientists to access data only from the notebook instance, and to prevent them from downloading it to their local machines.
You can achieve this by updating the jupyter_notebook_config.py file and overriding the c.ContentsManager.files_handler_class parameter. This parameter defines the handler class used to serve raw file requests.
Referring to the default FilesHandler class, we can create a new class that handles HEAD and GET requests by returning an HTTP 403 Forbidden error.
from tornado import web
from notebook.base.handlers import IPythonHandler


class ForbidFilesHandler(IPythonHandler):
    """Replacement for the default files handler that rejects every raw file request."""

    @web.authenticated
    def head(self, path):
        self.log.info("HEAD: File download forbidden.")
        raise web.HTTPError(403)

    @web.authenticated
    def get(self, path, include_body=True):
        self.log.info("GET: File download forbidden.")
        raise web.HTTPError(403)
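Once this handler is registered, one way to confirm it is in effect is to request a file through the notebook's raw-file route (`/files/<path>`) and check that the response is 403. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the base URL and token are placeholders: on a SageMaker notebook instance, authentication normally goes through the console's presigned URL, so the token header here is an assumption that may not apply to your setup.

```python
import urllib.error
import urllib.parse
import urllib.request


def raw_file_url(base_url: str, file_path: str) -> str:
    """Build the URL of the raw-file route for a path relative to the notebook root."""
    quoted = urllib.parse.quote(file_path.lstrip("/"))
    return f"{base_url.rstrip('/')}/files/{quoted}"


def raw_file_status(base_url: str, file_path: str, token: str = "") -> int:
    """Return the HTTP status code of a GET request on the raw-file route."""
    request = urllib.request.Request(raw_file_url(base_url, file_path), method="GET")
    if token:
        # Assumption: the server accepts Jupyter token authentication.
        request.add_header("Authorization", f"token {token}")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we need.
        return err.code


def downloads_blocked(base_url: str, file_path: str, token: str = "") -> bool:
    """True when the server answers the raw-file request with 403 Forbidden."""
    return raw_file_status(base_url, file_path, token) == 403
```

For example, calling `downloads_blocked` with the address of your Jupyter server and the path of an existing file should return True after the override is applied, and False on an unmodified server.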
Since SageMaker notebook instances do not persist any configuration changes or data stored outside the /home/ec2-user/SageMaker folder, it is essential to make these changes through a Lifecycle Configuration. Create a Lifecycle Configuration and add the following as the Start Notebook script.
# Creating ForbidFilesHandler class, overriding the default files_handler_class
cat <<END >/home/ec2-user/.jupyter/handlers.py
from tornado import web
from notebook.base.handlers import IPythonHandler

class ForbidFilesHandler(IPythonHandler):
    @web.authenticated
    def head(self, path):
        self.log.info("HEAD: File download forbidden.")
        raise web.HTTPError(403)

    @web.authenticated
    def get(self, path, include_body=True):
        self.log.info("GET: File download forbidden.")
        raise web.HTTPError(403)
END

# Updating the files_handler_class
cat <<END >>/home/ec2-user/.jupyter/jupyter_notebook_config.py
import os, sys
sys.path.append('/home/ec2-user/.jupyter/')
import handlers
c.ContentsManager.files_handler_class = 'handlers.ForbidFilesHandler'
c.ContentsManager.files_handler_params = {}
END

# Reboot the Jupyterhub notebook
reboot
The reboot statement at the end is required: for the changes to take effect, you need to restart the Docker container running the Jupyterhub notebook, not the notebook instance itself.
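If you prefer to register the Start Notebook script programmatically rather than through the console, the sketch below shows one possible approach with boto3's `create_notebook_instance_lifecycle_config` call, which expects the script content to be base64-encoded. The helper names and the configuration name are hypothetical, and the code assumes boto3 is installed and AWS credentials with the required SageMaker permissions are already configured.

```python
import base64


def encode_on_start(script_text: str) -> list:
    """Wrap a shell script in the structure expected by the OnStart parameter."""
    encoded = base64.b64encode(script_text.encode("utf-8")).decode("ascii")
    return [{"Content": encoded}]


def register_lifecycle_config(config_name: str, script_text: str) -> str:
    """Create the lifecycle configuration and return its ARN."""
    # Imported lazily so the encoding helper can be used without boto3 installed.
    import boto3

    client = boto3.client("sagemaker")
    response = client.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=config_name,  # hypothetical name chosen by you
        OnStart=encode_on_start(script_text),
    )
    return response["NotebookInstanceLifecycleConfigArn"]
```

The configuration still has to be attached to a notebook instance, either when the instance is created or by updating a stopped instance, before the script runs on start.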
There are other workarounds, such as using Amazon WorkSpaces: you can launch a WorkSpaces resource and access the notebook instances through it. Another option is to use Amazon AppStream 2.0 and follow a similar process to access the instances. However, these workarounds add management overhead, cost, and complexity, and are therefore generally best avoided.