transform-design
We'll build the solution on a laptop for development, with the goal of porting it to Amazon S3, where the data will be stored as Parquet files and queried with Amazon Athena as our database.
For the development environment today, we'll need:
The code is available here:
The files for this project are organized as follows:
An input directory for some "input" data that we can test with
A sql directory for SQL statements we'll use to query the final data
A src directory for the PySpark code that will transform the raw SES events into our records
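Before running anything, it can help to confirm that a checkout has the expected layout. The sketch below assumes only the three directory names listed above; the helper name missing_dirs and the PROJECT_DIRS constant are hypothetical and not part of the project's code.

```python
from pathlib import Path

# Hypothetical helper: the directory names come from the layout above,
# but the function itself is illustrative, not the project's own code.
PROJECT_DIRS = ("input", "sql", "src")


def missing_dirs(root):
    """Return the expected project directories that do not exist under root."""
    root = Path(root)
    return [name for name in PROJECT_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Project layout looks complete")
```

Run from the project root, it lists any of input, sql, or src that are absent.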
The program will create:
An output_local directory, used when we run the non-streaming version
An output_streaming directory for the streaming examples
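One way to keep the later port to S3 simple is to treat the output root as a parameter, so the same code can write under a local directory during development and under an S3 prefix later. The sketch below is a minimal illustration under that assumption: the mode names "batch" and "streaming", the function output_path, and the bucket in the usage note are hypothetical; only the output_local and output_streaming directory names come from the text.

```python
# Hypothetical sketch: the output directory names come from the text above;
# the mode strings, function name, and any S3 bucket are illustrative assumptions.
OUTPUT_DIRS = {
    "batch": "output_local",          # non-streaming run
    "streaming": "output_streaming",  # streaming examples
}


def output_path(mode, root="."):
    """Build the output location for a run mode under a local or S3 root."""
    if mode not in OUTPUT_DIRS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Strip a trailing slash so "root/" and "root" give the same result.
    return root.rstrip("/") + "/" + OUTPUT_DIRS[mode]
```

For example, output_path("batch") gives "./output_local", while output_path("streaming", "s3://example-bucket/ses") gives "s3://example-bucket/ses/output_streaming". In PySpark the resulting string could be passed to DataFrameWriter.parquet; whether the S3 URI should use the s3:// or s3a:// scheme depends on how the Spark environment is configured.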