Building a serverless application allows you to focus on your application code instead of managing and operating infrastructure. You do not have to think about provisioning or configuring servers, since the cloud provider handles all of this for you…
As described on Wikipedia, "extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source(s)". Even though the crux of all ETL…
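To make the three stages concrete, here is a minimal sketch using only the Python standard library. The CSV source, the `orders` table, and the `extract`/`transform`/`load` helpers are hypothetical; an in-memory SQLite database stands in for the destination system. The point is the one in the definition: the destination represents the data differently (amounts become dollars, names are normalised) from the source.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export with amounts stored as text, in cents.
SOURCE_CSV = """order_id,customer,amount_cents
1,alice,1250
2,bob,399
3,alice,7000
"""


def extract(raw: str) -> list[dict]:
    """Extract: read rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: change the representation (cents -> dollars, names upper-cased)."""
    return [
        (int(r["order_id"]), r["customer"].upper(), int(r["amount_cents"]) / 100)
        for r in rows
    ]


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SOURCE_CSV)), conn)
    query = "SELECT customer, SUM(amount_usd) FROM orders GROUP BY customer ORDER BY customer"
    print(conn.execute(query).fetchall())
```

In a real pipeline each stage would talk to external systems (object storage, a warehouse), but keeping them as separate functions mirrors the E, T and L split.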
Amazon SageMaker is a fully managed machine learning service that lets its users build, train, and deploy machine learning models quickly without the need to set up and manage the infrastructure behind it. With the advent of AWS re:Invent 201…
AWS Glue is a fully managed ETL service provided by Amazon that makes it easy to extract and migrate data from one source to another whilst performing a transformation on the source data. Amongst these transformations is the Relationalize[…
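The excerpt breaks off before describing Relationalize. As a rough illustration, assuming its core job is turning nested (semi-structured) records into flat, relational columns, the plain-Python sketch below flattens nested dictionaries into dot-separated column names. The `flatten` helper is hypothetical and is not the AWS Glue API; in particular, Relationalize also moves arrays out into separate tables linked by generated keys, which this sketch does not attempt (lists are left as-is).

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with `sep`.

    Simplified illustration only: arrays are NOT pivoted into child tables,
    unlike AWS Glue's Relationalize transform.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested structs, carrying the key path as a prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


nested = {
    "id": 7,
    "customer": {"name": "alice", "address": {"city": "Lisbon"}},
    "tags": ["a", "b"],
}
print(flatten(nested))
```

Each nested field becomes its own column (`customer.address.city`), which is what lets the result be loaded into a relational table.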
AWS Glue is a fully managed ETL service provided by Amazon that makes it easy to extract and migrate data from one source to another whilst performing a transformation on the source data. AWS Glue is a combination of multiple microservices that work…
CloudWatch Events lets users create rules for EMR clusters that fire on events such as State Change and EMR Configuration Error. With the EMR State Change event, CloudWatch Events lets users publish notifications for the creation of EMR clusters, among other states…
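As a sketch of what such a rule looks like, the snippet below builds an event pattern that matches EMR cluster state changes into selected states, for example `STARTING` (a cluster being created) or `TERMINATED_WITH_ERRORS`. The helper name `emr_cluster_state_pattern` is hypothetical, and the exact set of states you filter on is a choice, not something the original post prescribes.

```python
import json


def emr_cluster_state_pattern(states: list[str]) -> dict:
    """Build a CloudWatch Events (EventBridge) pattern matching EMR cluster
    state changes into any of the given states."""
    return {
        "source": ["aws.emr"],
        "detail-type": ["EMR Cluster State Change"],
        # A list of values means "match if the event's state is any of these".
        "detail": {"state": states},
    }


# Notify when a cluster starts up (creation) or fails.
pattern = emr_cluster_state_pattern(["STARTING", "TERMINATED_WITH_ERRORS"])
print(json.dumps(pattern, indent=2))
```

The serialized JSON is what would be supplied as the rule's event pattern, for example through the boto3 `events` client's `put_rule(Name=..., EventPattern=...)`, with an SNS topic then attached as the rule's target.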
When we try to implement CloudWatch notifications via SNS/email that are triggered when a specific EMR cluster has step state changes, the event pattern works when it filters on clusterId but fails when it filters on the cluster ARN or cluster name. There…
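One way to reason about this behaviour: a pattern only matches if every field it names is actually present in the event. The sketch below implements a deliberately simplified version of that matching rule (exact-value lists and nested objects only; real EventBridge patterns also support prefix, numeric, and exists filters and array-valued event fields). The sample event is hypothetical and trimmed, and it assumes, consistent with the behaviour described above, that the step event's `detail` carries `clusterId` but no cluster ARN or cluster name. The `clusterName` key in the second pattern is likewise hypothetical.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified event-pattern matching: every pattern key must exist in the
    event; a list means "equal to one of these"; dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # a field the event never carries can never match
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical, trimmed "EMR Step Status Change" event.
event = {
    "source": "aws.emr",
    "detail-type": "EMR Step Status Change",
    "detail": {"clusterId": "j-EXAMPLE12345", "stepId": "s-EXAMPLE", "state": "FAILED"},
}

base = {"source": ["aws.emr"], "detail-type": ["EMR Step Status Change"]}
by_id = {**base, "detail": {"clusterId": ["j-EXAMPLE12345"]}}
by_name = {**base, "detail": {"clusterName": ["my-cluster"]}}

print(matches(by_id, event), matches(by_name, event))
```

Filtering on `clusterId` matches, while a pattern keyed on a field the event does not contain silently never fires, which is why no notification arrives.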
The combination of Spark and Parquet is a very popular foundation for building scalable analytics platforms. In particular, performance, scalability, and ease of use are key elements of this solution that make it very appealing to its users. Predicate…
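The excerpt stops at "Predicate…". Assuming it leads into predicate pushdown, the toy sketch below shows the underlying idea: Parquet stores per-row-group statistics such as a column's min and max, so a reader can skip whole row groups that cannot satisfy a filter without reading their rows. `RowGroup`, `make_group`, and `scan_greater_than` are hypothetical stand-ins, not Spark or Parquet APIs.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group: rows plus min/max statistics
    that are computed once, at write time."""
    values: list[int]
    min_value: int
    max_value: int


def make_group(values: list[int]) -> RowGroup:
    # Statistics are recorded when the data is written, like a Parquet footer.
    return RowGroup(values, min(values), max(values))


def scan_greater_than(groups: list[RowGroup], threshold: int) -> tuple[list[int], int]:
    """Return rows with value > threshold and how many row groups were read."""
    result: list[int] = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # pruned from statistics alone; rows never touched
        groups_read += 1
        result.extend(v for v in group.values if v > threshold)
    return result, groups_read


groups = [make_group([1, 5, 9]), make_group([10, 14, 20]), make_group([21, 30, 42])]
rows, read = scan_greater_than(groups, 20)
print(rows, read)
```

Only one of the three row groups is read here; with real data, how much is skipped depends on how well the files are sorted or clustered on the filtered column.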
Amazon Athena added support for views with the release of a new version on June 5, 2018, allowing users to use commands like CREATE VIEW, DESCRIBE VIEW, DROP VIEW, SHOW CREATE VIEW, and SHOW VIEWS in Athena. The query that defines the view runs each…
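A view stores a query, not its results, so the query is re-evaluated whenever the view is read. The sketch below demonstrates this with SQLite from the Python standard library as a local stand-in; Athena's SQL engine differs, but the basic `CREATE VIEW ... AS SELECT` semantics shown here are the same idea. The `events` table and `failed_events` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "ok"), (2, "failed")])

# The view stores only its defining query.
conn.execute("CREATE VIEW failed_events AS SELECT id FROM events WHERE status = 'failed'")
print(conn.execute("SELECT id FROM failed_events ORDER BY id").fetchall())  # [(2,)]

# New rows in the base table appear in the view because its query runs again.
conn.execute("INSERT INTO events VALUES (3, 'failed')")
print(conn.execute("SELECT id FROM failed_events ORDER BY id").fetchall())  # [(2,), (3,)]
```

The trade-off is that every read of the view pays the cost of its underlying query, which matters in Athena because queries are billed by data scanned.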
Amazon Elastic MapReduce (EMR) is a web service that uses Hadoop to process vast amounts of data quickly and cost-effectively. It does so by distributing the computational work across a cluster of virtual…
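The programming model EMR distributes is MapReduce. The single-process sketch below simulates its three phases with a word count: mappers emit key/value pairs from independent input splits, a shuffle groups values by key, and reducers combine each group. The function names are hypothetical and this is not Hadoop's API; on a cluster, each split and each key group could be processed on a different node.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (the framework's job in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


splits = ["the quick fox", "the lazy dog", "The end"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # 3
```

Because mappers share no state and reducers see only their own key group, both phases parallelize naturally across machines.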