This project was built using these technologies.
- Python
- Airflow
- Cron
- Redis
- Pandas
- Openpyxl
- PyQt5
- Docker
Efficient ETL Process
Automates the extraction, transformation, and loading (ETL) of data from multiple Excel files using Airflow.
(Only a specific Excel structure is supported.)
Advanced Data Processing
Leverages the power of Pandas and Openpyxl for fast and accurate data reading, processing, and styling.
Intuitive GUI with PyQt5
Includes a user-friendly graphical interface for selecting files and tracking real-time progress.
Performance Optimization
Optimized for reduced system load and faster data processing using Redis, ensuring efficient handling of large datasets.
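As a rough illustration of the Pandas side of the merge, here is a minimal sketch. It assumes the merge is a simple row-wise concatenation of sheets with identical columns; the project's actual transformation logic and required Excel structure are not described here, and `merge_frames` and the `source_file` column are hypothetical names introduced for the example.

```python
import pandas as pd


def merge_frames(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack several tables row-wise, tagging each row with its source name.

    Hypothetical helper: assumes every frame has the same columns.
    """
    # Add a column recording where each row came from.
    tagged = [frame.assign(source_file=name) for name, frame in frames.items()]
    # ignore_index=True renumbers the rows 0..n-1 in the merged result.
    return pd.concat(tagged, ignore_index=True)


# With real files, the frames could be loaded like this (paths are placeholders):
# frames = {path: pd.read_excel(path, engine="openpyxl") for path in paths}
# merge_frames(frames).to_excel("merged.xlsx", index=False)
```

Keeping the source name in a column makes it possible to trace any merged row back to the file it came from.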
Prerequisites: Python and Docker installed on your machine.
- Clone the repository: git clone https://github.com/NickLitwinow/XLSXAssembler_Public.git
- Navigate into the src directory: cd src/
- (Terminal 1) Run the ETL client: python app.py
- (Terminal 2) Build the Docker image (sudo may be required): docker build . --tag extending_airflow:latest
- (Terminal 2) Start the Docker services: docker-compose up -d
- (Terminal 2) (Optional) Stop the Docker services: docker-compose down -v
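The Terminal 2 steps can also be scripted. The sketch below is only a convenience wrapper around the same Docker commands listed above; the function names and the `dry_run` flag are hypothetical, and it assumes `docker` and `docker-compose` are on your PATH and that it is run from the `src` directory.

```python
import shlex
import subprocess

IMAGE_TAG = "extending_airflow:latest"  # tag used in the build step above


def docker_commands(teardown: bool = False) -> list[list[str]]:
    """Return the Docker commands from the setup steps as argument lists."""
    if teardown:
        return [["docker-compose", "down", "-v"]]
    return [
        ["docker", "build", ".", "--tag", IMAGE_TAG],
        ["docker-compose", "up", "-d"],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print("$", shlex.join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)
```

Calling `run_all(docker_commands(), dry_run=False)` builds the image and starts the services; `run_all(docker_commands(teardown=True), dry_run=False)` stops them and removes the volumes.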
The PyQt5 GUI launches, and you can select multiple Excel files and begin the ETL process. This runs the app in development mode.
- In the ETL client, click the Add File button and select files from the example files. (You can add them again later if you want.)
- (Optional) To remove a file from the selection, click its path in the black selection window, then click Remove File.
- Click Merge Files to name the output file and choose its destination. The ETL process starts afterwards.
- To view the Airflow DAG process:
  - Open http://localhost:8080/home in your browser.
  - Enter Login: airflow and Password: airflow.
  - (Info) If you have just run docker-compose up -d, Airflow may take some time to load.
- To view the Redis database:
  - Open http://localhost:8001/ in your browser.
  - Accept "EULA and Privacy Settings".
  - Click I already have a database.
  - Click Connect to a Redis Database with Host: redis, Port: 6379, Name: redis-local.
  - Click ADD REDIS DATABASE.
  - Select the redis-local database.
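Because the services may take a while to come up, it can help to check readiness from a script before opening the browser. The following is a minimal sketch using only the standard library. It assumes the Airflow image is a 2.x release whose webserver exposes a `/health` JSON endpoint, and that the Compose file publishes Redis port 6379 to the host (the `redis` hostname above is only resolvable inside the Docker network). All function names are hypothetical.

```python
import json
import socket
import time
import urllib.request
from typing import Callable

AIRFLOW_HEALTH_URL = "http://localhost:8080/health"  # assumption: Airflow 2.x endpoint
REDIS_ADDRESS = ("localhost", 6379)  # assumption: port published to the host


def fetch_airflow_health(url: str = AIRFLOW_HEALTH_URL) -> dict:
    """Download and decode the webserver's JSON health report."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def airflow_is_healthy(report: dict) -> bool:
    """True when both the metadatabase and the scheduler report 'healthy'."""
    return all(
        report.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def encode_command(*parts: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    chunks = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode("utf-8")
        chunks.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(chunks)


def redis_ping(address: tuple = REDIS_ADDRESS, timeout: float = 3.0) -> bool:
    """Send PING over a raw socket and check for the +PONG reply."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(encode_command("PING"))
        return conn.recv(64).startswith(b"+PONG")


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 5.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        try:
            if check():
                return True
        except (OSError, ValueError):
            pass  # service not reachable yet, or the reply was not valid JSON
        time.sleep(delay)
    return False
```

For example, `wait_until(lambda: airflow_is_healthy(fetch_airflow_health()))` blocks until Airflow reports healthy or the attempts run out, and `wait_until(redis_ping)` does the same for Redis.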
Give a ⭐ if you like this project!