Data Engineering Projects

Here you will find my data engineering projects, which include
ETL with Python, building data pipelines with Apache Airflow,
building pipelines for streaming data with Kafka, and performing ETL with shell scripts.

Tokyo Olympics data analytics with Azure

Analyze Tokyo 2021 Olympics data using Azure Data Lake Storage Gen2, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Power BI.

Task:

  • Extract data from a GitHub repository
  • Create a pipeline in Azure Data Factory to extract the data
  • Load the data into Azure Data Lake Storage Gen2
  • Transform the data stored in the data lake using Azure Databricks
  • Load the transformed data into Azure Data Lake Storage Gen2
  • Create a Power BI dashboard

ETL Data Pipelines using Bash with Airflow

Create an Apache Airflow workflow that runs ETL processes with shell scripts: collect data that different toll plazas provide in different formats and consolidate it into a single file.

Task:

  • Download data
  • Extract data from a CSV file
  • Extract data from a TSV file
  • Extract data from a fixed-width file
  • Combine the extracted data into a single file
  • Transform the data
  • Load the transformed data
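The extract and combine steps above can be sketched in Python. The project itself uses shell scripts, so this is only an illustration of the idea, not the project's code. The function names, the sample rows, the selected column indexes, and the fixed-width offsets are all assumptions made up for the example.

```python
import csv
import io


def extract_delimited(text, delimiter, columns):
    """Return the chosen column indexes from each row of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[row[i] for i in columns] for row in reader if row]


def extract_fixed_width(text, spans):
    """Slice each non-empty line at the given (start, end) character offsets."""
    return [
        [line[start:end].strip() for start, end in spans]
        for line in text.splitlines()
        if line.strip()
    ]


def combine(*parts):
    """Join rows from each source side by side, similar to the `paste` command."""
    return [sum(rows, []) for rows in zip(*parts)]


def to_csv(rows):
    """Serialize rows as CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample data standing in for the three downloaded files.
csv_text = "1,Mon Aug 2 2021,car\n2,Tue Aug 3 2021,truck\n"
tsv_text = "1\t2\tPLAZA-A\n2\t4\tPLAZA-B\n"
fixed_text = "PTE VC965\nPTP VC476\n"

combined = combine(
    extract_delimited(csv_text, ",", [0, 2]),
    extract_delimited(tsv_text, "\t", [1, 2]),
    extract_fixed_width(fixed_text, [(0, 3), (4, 9)]),
)
combined_csv = to_csv(combined)
```

Combining by row position assumes that all three sources list the same records in the same order.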

ETL Data Pipelines using Apache Airflow

An Apache Airflow workflow that runs ETL processes to collect data in different formats and combine it into a single file, with tasks created (a) using BashOperator and (b) using PythonOperator.

Task:

  • Download data
  • Extract data from a CSV file and a TSV file
  • Extract data from a fixed-width file
  • Combine the extracted data into a single file
  • Transform the data and load the transformed data
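As a rough illustration of the final step, the function below shows the kind of plain Python callable that could be handed to a PythonOperator through its python_callable argument. The transformation itself (upper-casing one column) and the names transform and transform_and_load are assumptions chosen for the example. The source does not say which transformation the project applies.

```python
import csv
import io


def transform(rows, column):
    """Upper-case one column of every row (a stand-in transformation)."""
    return [
        [value.upper() if i == column else value for i, value in enumerate(row)]
        for row in rows
    ]


def transform_and_load(source, target, column=1):
    """Read CSV rows from `source`, transform them, and write them to `target`.

    Both arguments are file-like objects. Returns the number of rows written.
    """
    rows = list(csv.reader(source))
    csv.writer(target, lineterminator="\n").writerows(transform(rows, column))
    return len(rows)


# In-memory files stand in for the combined file and the output file.
source = io.StringIO("1,car\n2,truck\n")
target = io.StringIO()
rows_written = transform_and_load(source, target)
result = target.getvalue()
```

Keeping the logic in an ordinary function like this makes it possible to test the step without a running Airflow instance.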

ETL process using Python

The aim of this project is to write a simple Python script that performs an ETL process.

Task:

  • Download the datasets
  • Extract bank and market cap data from the JSON file
  • Transform the market cap currency using the exchange rate data from a CSV file
  • Load the transformed data into a separate CSV
  • Log the process
  • Run the ETL process
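The steps above could look roughly like the following sketch. It is not the project's script. The field names (name, market_cap_usd), the bank names, the exchange rates, and the target currency are invented for illustration, and the inputs are passed as strings instead of being read from downloaded files.

```python
import csv
import io
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("etl")


def extract(json_text):
    """Parse a JSON array of bank records."""
    return json.loads(json_text)


def load_rates(csv_text):
    """Read currency,rate rows (no header row assumed) into a dict."""
    return {code: float(rate) for code, rate in csv.reader(io.StringIO(csv_text))}


def transform(records, rate, currency):
    """Convert each market cap from USD to the target currency."""
    key = f"market_cap_{currency.lower()}"
    return [
        {"name": r["name"], key: round(r["market_cap_usd"] * rate, 3)}
        for r in records
    ]


def load(records, target):
    """Write the transformed records to a file-like object as CSV."""
    writer = csv.DictWriter(target, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


def run_etl(json_text, rates_text, currency):
    """Run extract, transform, and load in order, logging each phase."""
    log.info("ETL started")
    records = extract(json_text)
    log.info("Extracted %d records", len(records))
    rate = load_rates(rates_text)[currency]
    transformed = transform(records, rate, currency)
    log.info("Transformed market caps to %s", currency)
    out = io.StringIO()
    load(transformed, out)
    log.info("ETL finished")
    return out.getvalue()


# Made-up inputs standing in for the downloaded JSON and CSV files.
banks_json = '[{"name": "Bank A", "market_cap_usd": 100.0}, {"name": "Bank B", "market_cap_usd": 50.5}]'
rates_csv = "GBP,0.75\nEUR,0.9\n"
output_csv = run_etl(banks_json, rates_csv, "GBP")
```

Each phase is a separate function, so the log lines mark clear boundaries between extract, transform, and load.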