The Data Engineering Project
In this project, we built a robust data platform to Extract, Transform, and Load (ETL) data from an operational database into a data lake and then into a data warehouse hosted on AWS. The goal was to ensure data integrity, resilience, and efficient automation while demonstrating proficiency in Python, SQL, AWS services, database modeling, and Agile methodologies.
This project highlights my ability to design, implement, and manage complex data engineering solutions, leveraging modern cloud technologies and adhering to best practices in software development and project management.
Extract Phase
Used pg8000 to connect to the ToteSys operational database and capture changed rows at 30-minute intervals.
Stored the ingested data as JSON files, organized in a structured layout, in the ingestion-zone S3 bucket.
Scheduled regular extraction with an AWS EventBridge rule, as sketched after this list.
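A minimal sketch of one extraction run is below. It assumes the source tables expose a last_updated column for change capture; the bucket name, environment variables, and S3 key layout are illustrative, not the project's actual values.

```python
# Sketch of the extract step: pull rows changed since the last run and
# land them as a timestamped JSON file in the ingestion zone.
import json
import os
from datetime import datetime, timezone

import boto3
import pg8000.native

S3 = boto3.client("s3")
INGESTION_BUCKET = "ingestion-zone-bucket"  # hypothetical name


def extract_table(table_name: str, last_run: datetime) -> None:
    """Capture rows updated since the previous run and store them in S3."""
    conn = pg8000.native.Connection(
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        host=os.environ["DB_HOST"],
        database=os.environ["DB_NAME"],
    )
    try:
        # table_name comes from a fixed allow-list of ToteSys tables,
        # so interpolating it here is safe in this context.
        rows = conn.run(
            f"SELECT * FROM {table_name} WHERE last_updated > :since",
            since=last_run,
        )
        columns = [c["name"] for c in conn.columns]
        records = [dict(zip(columns, row)) for row in rows]
    finally:
        conn.close()

    # One JSON file per table per run, keyed by table and timestamp.
    key = f"{table_name}/{datetime.now(timezone.utc):%Y-%m-%dT%H-%M-%S}.json"
    S3.put_object(
        Bucket=INGESTION_BUCKET,
        Key=key,
        Body=json.dumps(records, default=str),
    )
```

An EventBridge rule on a rate(30 minutes) schedule invokes the function that runs this extraction across all source tables.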
Transform Phase
Ran as an AWS Lambda function, triggered whenever a new file arrives in the ingestion zone.
Used pandas to reshape the raw data into the warehouse's star-schema format.
Converted the transformed data to Parquet files and saved them in the processed-zone S3 bucket, as sketched after this list.
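The sketch below shows how such a transform Lambda could be wired to the ingestion bucket's ObjectCreated event. The dim_design reshaping and column names are illustrative; each source table had its own mapping into the star schema, and the bucket name is a placeholder.

```python
# Sketch of the transform step: read the newly arrived JSON, reshape it
# with pandas, and write Parquet to the processed zone.
import io
import json

import boto3
import pandas as pd

S3 = boto3.client("s3")
PROCESSED_BUCKET = "processed-zone-bucket"  # hypothetical name


def lambda_handler(event, context):
    # S3 put events carry the bucket and key of the file that arrived.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = S3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.DataFrame(json.loads(body))

    # Example star-schema transform: select the dimension's columns.
    # Real mappings varied per table (renames, joins, type casts).
    dim = df[["design_id", "design_name", "file_location", "file_name"]]

    # Serialize to Parquet in memory (pandas delegates to pyarrow,
    # packaged here as a Lambda layer) and write to the processed zone.
    buffer = io.BytesIO()
    dim.to_parquet(buffer, index=False)
    S3.put_object(
        Bucket=PROCESSED_BUCKET,
        Key=key.replace(".json", ".parquet"),
        Body=buffer.getvalue(),
    )
```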
Load Phase
Loaded the transformed Parquet data from the processed zone into the AWS-hosted data warehouse, completing the pipeline; a sketch follows.
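A sketch of how the load step could work is below, assuming the warehouse is a PostgreSQL instance reachable with the same pg8000 driver used for extraction; the environment variables and upsert strategy are illustrative.

```python
# Sketch of the load step: read a processed Parquet file and insert its
# rows into the corresponding warehouse table.
import io
import os

import boto3
import pandas as pd
import pg8000.native

S3 = boto3.client("s3")


def load_parquet(bucket: str, key: str, table_name: str) -> None:
    """Read a processed-zone Parquet file and insert it into the warehouse."""
    body = S3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_parquet(io.BytesIO(body))

    conn = pg8000.native.Connection(
        user=os.environ["DW_USER"],
        password=os.environ["DW_PASSWORD"],
        host=os.environ["DW_HOST"],
        database=os.environ["DW_NAME"],
    )
    try:
        columns = ", ".join(df.columns)
        placeholders = ", ".join(f":{col}" for col in df.columns)
        for row in df.to_dict(orient="records"):
            # Row-by-row inserts keep the sketch simple; a production
            # load would batch rows or use COPY for throughput.
            conn.run(
                f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})",
                **row,
            )
    finally:
        conn.close()
```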