architecture

The primary components of the solution are covered in the sections below: data ingestion, the input and output schemas, and the data flow.

Data ingestion

Getting the data from AWS into our own S3 buckets is just a matter of configuration, so I won't cover that part of the project.

Inputs and outputs

The SES data format differs slightly for each event class, so we read everything into one "uber" SES Input Schema that carries the class-specific differences into our transformer as raw strings. From there, we transform the data into a simple, flat Required Schema output.
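
As a rough sketch of what those two schemas might look like in Spark (the field names here are illustrative, not the project's actual definitions):

```scala
import org.apache.spark.sql.types._

// "Uber" SES Input Schema: one schema wide enough for every event class.
// Class-specific payloads stay as raw JSON strings for the transformer.
val sesInputSchema = StructType(Seq(
  StructField("eventType", StringType, nullable = false),
  StructField("mail", StringType, nullable = false),     // common envelope, raw JSON
  StructField("bounce", StringType, nullable = true),    // present only on bounce events
  StructField("complaint", StringType, nullable = true), // present only on complaint events
  StructField("delivery", StringType, nullable = true)   // present only on delivery events
))

// Flat Required Schema produced by the transformer.
val requiredSchema = StructType(Seq(
  StructField("messageId", StringType, nullable = false),
  StructField("eventType", StringType, nullable = false),
  StructField("recipient", StringType, nullable = false),
  StructField("timestamp", TimestampType, nullable = false),
  StructField("detail", StringType, nullable = true)     // flattened event-specific detail
))
```

Keeping the event-class payloads as strings means one schema can absorb every SES event class; the cost is that the transformer has to parse those strings itself when it flattens them.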

Data flow

[Data flow diagram: SES Input Schema → transformer → Required Schema]
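
A minimal end-to-end sketch of that flow with Spark Structured Streaming, reusing the schema definitions above; the bucket names and JSON paths are hypothetical stand-ins, not the project's real configuration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ses-transform").getOrCreate()

// Pick up new SES event files as they land in the ingestion bucket.
val events = spark.readStream
  .schema(sesInputSchema) // the "uber" SES Input Schema sketched earlier
  .json("s3a://example-ses-events/")

// Flatten into the Required Schema: pull common fields out of the raw
// JSON envelope and keep whichever class-specific payload is present.
val flattened = events.select(
  get_json_object(col("mail"), "$.messageId").as("messageId"),
  col("eventType"),
  get_json_object(col("mail"), "$.destination[0]").as("recipient"),
  get_json_object(col("mail"), "$.timestamp").cast("timestamp").as("timestamp"),
  coalesce(col("bounce"), col("complaint"), col("delivery")).as("detail")
)

// Write the flat output where downstream queries can read it.
flattened.writeStream
  .format("parquet")
  .option("path", "s3a://example-required-output/")
  .option("checkpointLocation", "s3a://example-required-output/_checkpoints/")
  .start()
```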