# transform-design

### ses-transformer proof of concept (POC)

We'll build the solution on a laptop for development, with the goal of porting it to Parquet files on Amazon S3 and using Amazon Athena as our database.

For the development environment today, we'll need:

* [x] Python 3+
* [x] PyCharm IDE (but this can be any IDE or text editor)
* [x] PySpark (Apache Spark)
* [x] Postgres (just to quickly model our SQL while developing)

## POC code

> The code is available here: [GitHub](https://github.com/tiny-engines-code/s3-spark-transform)

The files for this project are organized as follows:

* An `input` directory for some "input" data that we can test with
* A `sql` directory for SQL statements we'll use to query the final data
* A `src` directory for the PySpark code that will transform the raw SES events into our records

The program will create:

* An `output_local` directory when we run the non-streaming version
* An `output_streaming` directory for the streaming examples

```
// Directory structure

├── input                           --- dev test files
│   ├── bounce
│   │   └── bounce.json
│   ├── click
│   │   └── click.json
│   ├── complaint
│   │   └── complaint.json
│   ├── delivery
│   │   └── delivery.json
│   ├── open
│   │   └── open.json
│   ├── reject
│   │   └── reject.json
│   └── send
│       └── send.json
│
├── sql                            --- sql to create pivot
│   ├── 01-mock_request_data.sql
│   ├── 02_pivot_base.sql
│   ├── 03_ses_pivot.sql
│   └── 04_join_pivot.sql
│
└── src                            --- pyspark code
    ├── main.py                             -- main
    ├── readers.py                          -- read and process files
    ├── transformer.py                      -- perform transformations
    └── writers.py                          -- write to jdbc, batch and stream writers
```
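As a rough illustration of how the `input` layout above could be consumed, here is a minimal sketch that maps each event type to the JSON test files in its subdirectory. It uses only the Python standard library and is not taken from the repository: the function name `discover_event_files` and the `EVENT_TYPES` list are hypothetical, and the actual `readers.py` may locate its files differently.

```python
from pathlib import Path

# Event types taken from the input directory layout shown above.
# This list is an assumption for illustration, not code from the repository.
EVENT_TYPES = ["bounce", "click", "complaint", "delivery", "open", "reject", "send"]


def discover_event_files(input_root):
    """Map each known event type to the JSON files under input_root/<event type>.

    Hypothetical helper: event types with no matching directory are skipped.
    """
    root = Path(input_root)
    found = {}
    for event_type in EVENT_TYPES:
        event_dir = root / event_type
        if event_dir.is_dir():
            # Sort so the result is stable across runs and platforms.
            found[event_type] = sorted(event_dir.glob("*.json"))
    return found


if __name__ == "__main__":
    # Assumes the script is run from the project root, next to `input`.
    for event_type, files in discover_event_files("input").items():
        print(event_type, [str(f) for f in files])
```

Keeping the event types in one list makes it easy to see which SES events the test data covers, and a reader could pass each discovered path on to Spark's JSON reader.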

