
An ETL tool that automates merging data from multiple Excel files. Built with: Python, Airflow, Cron, Redis, Pandas, Openpyxl, PyQt5, Docker.


NickLitwinow/XLSXAssembler_Public


XLSX Assembler – ETL Tool for Merging Excel Data

Demo

Architecture

Built With

This project was built using these technologies.

  • Python
  • Airflow
  • Cron
  • Redis
  • Pandas
  • Openpyxl
  • PyQt5
  • Docker

Features

🚀 Efficient ETL Process

Automates the extraction, transformation, and loading (ETL) of data from multiple Excel files using Airflow.
(Only Excel files with a specific structure are supported.)

📊 Advanced Data Processing

Leverages the power of Pandas and Openpyxl for fast and accurate data reading, processing, and styling.
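
As a rough illustration of the Pandas and Openpyxl step, here is a minimal sketch of stacking several workbooks into one table. It is not the project's actual code: the function names, the `source` column, and the assumption that every file shares the same column layout on its first sheet are all hypothetical.

```python
import pandas as pd


def merge_frames(frames, source_names=None):
    """Stack DataFrames that share a column layout into one table."""
    if not frames:
        raise ValueError("no data to merge")
    if source_names is not None:
        # Tag each row with the file it came from (hypothetical "source" column).
        frames = [f.assign(source=name) for f, name in zip(frames, source_names)]
    # ignore_index=True renumbers rows 0..n-1 across all inputs.
    return pd.concat(frames, ignore_index=True)


def merge_excel_files(paths, output_path):
    """Read the first sheet of each workbook and write the merged result."""
    frames = [pd.read_excel(p, engine="openpyxl") for p in paths]
    merged = merge_frames(frames, source_names=[str(p) for p in paths])
    merged.to_excel(output_path, index=False, engine="openpyxl")
    return merged
```

Keeping the in-memory merge separate from the file I/O makes the merge logic easy to check without real Excel files.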

💻 Intuitive GUI with PyQt5

Includes a user-friendly graphical interface for selecting files and tracking real-time progress.

⚡ Performance Optimization

Uses Redis to reduce system load and speed up data processing, so large datasets are handled efficiently.
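
This README does not describe how Redis is used internally, so the following is only one plausible pattern, not the project's implementation: cache the parsed rows of a file under a key derived from its contents, so an unchanged file is not parsed twice. The names `file_key`, `cached_rows`, the `xlsx:` key prefix, and the one-hour expiry are assumptions. The `client` argument is expected to behave like a redis-py `Redis` object (`get` and `set` with an `ex` expiry).

```python
import hashlib
import json


def file_key(path, prefix="xlsx:"):
    """Build a cache key from the file's bytes, so edits invalidate the entry."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return prefix + digest


def cached_rows(client, path, loader, ttl_seconds=3600):
    """Return parsed rows for path, reusing a cached copy when one exists."""
    key = file_key(path)
    hit = client.get(key)
    if hit is not None:
        # Cache hit: skip the expensive parse.
        return json.loads(hit)
    rows = loader(path)  # loader is any callable returning JSON-serializable rows
    client.set(key, json.dumps(rows), ex=ttl_seconds)
    return rows
```

With the Docker services running, a client could be created with `redis.Redis(host="localhost", port=6379)`, assuming the Redis port is published to the host.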

Getting Started

Prerequisites:

  • Python and Docker installed on your machine

🛠 Installation and Setup Instructions

  1. Clone the repository: git clone https://github.com/NickLitwinow/XLSXAssembler_Public.git

  2. Navigate into the src directory: cd src/

  3. (Terminal 1) Run the ETL client: python app.py

  4. (Terminal 2) Build the Docker image (sudo may be required): docker build . --tag extending_airflow:latest

  5. (Terminal 2) Start the Docker services: docker-compose up -d

  6. (Terminal 2) (Optional) Stop the Docker services: docker-compose down -v

The PyQt5 GUI will launch, where you can select multiple Excel files and begin the ETL process. The app runs in development mode.
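
Before running the steps above, it can help to confirm that the command-line tools they call are on your PATH. This is an optional helper sketch and not part of the repository; the function name `missing_tools` and the default tool list are assumptions based on the commands used in the setup steps.

```python
import shutil


def missing_tools(tools=("git", "docker", "docker-compose")):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

Calling `missing_tools()` returns an empty list when everything is available, or the names of the tools that still need to be installed.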

Usage Instructions Example

  1. In the ETL client, click the Add File button and select files from the example files. (You can add them again later if you wish.)

  2. (Optional) To remove a file from the selection, click its path (element) in the black selection window, then click Remove File.

  3. Click Merge Files to name the output file and choose its destination. The ETL process starts afterwards.

  4. To view the Airflow DAG process:

  • Open http://localhost:8080/home in your browser.
  • Enter Login: airflow and Password: airflow.
  • (Info) If you have just run docker-compose up -d, Airflow may take some time to load.

  5. To view the Redis database:

  • Open http://localhost:8001/ in your browser.
  • Accept "EULA and Privacy Settings".
  • Click I already have a database.
  • Click Connect to a Redis Database and enter Host: redis, Port: 6379, Name: redis-local.
  • Click ADD REDIS DATABASE.
  • Select the redis-local database.
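
Because the web interfaces can take a while to come up after docker-compose up -d, a small polling script can tell you when they are reachable. This is a sketch using only the standard library; the function names and the retry settings are assumptions, and the URLs are the ones listed in the steps above.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=3.0):
    """True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not reachable yet.
        return False


def wait_until_up(url, attempts=30, delay=2.0):
    """Poll the URL until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:8080/home")` checks the Airflow page and `wait_until_up("http://localhost:8001/")` checks the Redis interface.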

Show your support

Give a ⭐ if you like this project!