Software Development Engineer

Blog PostsResume

Create Virtual Views with AWS Glue and Query them Using Athena

Amazon Athena added support for Views with the release of a new version on June 5, 2018 allowing users to use commands like CREATE VIEW, DESCRIBE VIEW, DROP VIEW, SHOW CREATE VIEW, and SHOW VIEWS in Athena. The query that defines the view runs each time you reference the view in your query. However, it comes with certain limitations.

Glue allows the creation of tables with type VIRTUAL_VIEW in the Data Catalog. When Athena has been migrated to Data Catalog, if a table of type VIRTUAL_VIEW exists, the database will not list its tables in the Athena console and we cannot query LIST TABLES; as it returns an error. However, Athena can still query the tables in the database that are not virtual views and will throw an error when tying to query the VIRTUAL_VIEW table.

Subsequently, if your use case involves performing SELECT operation on the VIEW from Athena, you can perform the operation by updating the glue.createTable() method of the AWS Glue SDK, changing the parameters as:

  var params = {
    ......
    TableInput: {
        ......
        Parameters: {
            "comment": "Presto View",
            "presto_view": "true"
        }
        StorageDescriptor: {
            Columns: [
                {
                    Name: 'col0',
                    Type: 'bigint'
                },
                {
                    Name: 'col1',
                    Type: 'string'
                }
            ],
            ...........

        },
       ..........
        TableType: 'VIRTUAL_VIEW',
        ViewOriginalText: '/* Presto View: eyJvcmlnaW5hbFNxbCI6IlNFTEVDVFxuICBcImNvbDBcIlxuLCBcImNvbDFcIlxuRlJPTVxuICBrbF9kYXRhMDRcbiIsImNhdGFsb2ciOiJhd3NkYXRhY2F0YWxvZyIsInNjaGVtYSI6ImRlZmF1bHQiLCJjb2x1bW5zIjpbeyJuYW1lIjoiY29sMCIsInR5cGUiOiJiaWdpbnQifSx7Im5hbWUiOiJjb2wxIiwidHlwZSI6InZhcmNoYXIifV19 */',
        ViewExpandedText: '/* Presto View */'
    },
};

In the above code snippet, the value for the "ViewOriginalText" contains the base64 encoded format of the text:

  {
    "originalSql":"SELECT  \"col0\", \"col1\"FROM kl_data04",
    "catalog":"awsdatacatalog",
    "schema":"default",
    "columns":[
      {
        "name":"col0",
        "type":"bigint"
      },
      {
        "name":"col1",
        "type":"varchar"
      }
    ]
  }

where, originalSql is the SQL command used to generate the View, selecting specific columns from the table (here, kl_data04).

Background

When we run the query CREATE OR REPLACE VIEW view_name AS... from Athena, it creates a table in Glue's Data Catalog. Upon running the GetTable action on this table, we retrieve a JSON output. This JSON output is masqueraded into a new CreateTable action to create the Virtual View.


© 2024 Ujjwal Bhardwaj. All Rights Reserved.