This repository contains scripts for setting up a basic Apache Kafka environment with Docker, running a producer script and a consumer script, and loading the consumed data into PostgreSQL.
- Docker installed on your machine.
- Python 3.8 installed.
- PostgreSQL database set up.
- Run Docker Compose to start the Kafka and Zookeeper containers:

  ```bash
  docker-compose up -d
  ```
- Check that the Kafka and Zookeeper containers are running:

  ```bash
  docker ps
  ```
- Navigate to the producer directory:

  ```bash
  cd producer
  ```
- Install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run the Kafka producer script:

  ```bash
  python3 producer.py
  ```
- Edit producer.py to modify the data being produced and Kafka topic details.
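A minimal producer sketch, assuming the `kafka-python` client, a broker at `localhost:9092`, and a hypothetical topic name `events` (adjust all three to match your `producer.py`):

```python
import json

# Hypothetical topic name; change it to match your Kafka setup.
TOPIC = "events"

def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON bytes for the Kafka value."""
    return json.dumps(record).encode("utf-8")

def run():
    # Imported here so the serializer above is reusable without a broker.
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=serialize,
    )
    # Example payload; edit this to change the data being produced.
    producer.send(TOPIC, {"id": 1, "status": "created"})
    producer.flush()

if __name__ == "__main__":
    run()
```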
- Navigate to the consumer directory:

  ```bash
  cd real_time_api
  ```
- Install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run the Kafka consumer script to load data into PostgreSQL:

  ```bash
  python3 consumer.py
  ```
- Edit consumer.py to customize the consumer behavior, Kafka topic, and other settings.
- Ensure PostgreSQL connection details are correctly configured in the consumer script.
- Ensure your PostgreSQL database is running.
- Modify the consumer script (consumer.py) to include logic for inserting or updating data in your PostgreSQL database. Use a PostgreSQL library such as psycopg2 to interact with the database.
- Edit consumer.py to adjust the database connection details and data-loading behavior.
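A sketch of what that consumer-to-PostgreSQL logic might look like, assuming `kafka-python` and `psycopg2`, a hypothetical `events` topic, and a hypothetical `events(id, status)` table; connection details are placeholders:

```python
import json

# Hypothetical table and columns; adjust to your schema.
TABLE = "events"
COLUMNS = ("id", "status")

def build_insert(table: str, columns: tuple) -> str:
    """Build a parameterized INSERT statement (placeholders only, no values)."""
    cols = ", ".join(columns)
    params = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"

def run():
    # Imported here so the SQL helper above is usable without these packages.
    from kafka import KafkaConsumer  # pip install kafka-python
    import psycopg2                  # pip install psycopg2-binary

    conn = psycopg2.connect(
        host="localhost", dbname="mydb",          # assumed connection details;
        user="postgres", password="postgres",     # edit to match your database
    )
    consumer = KafkaConsumer(
        "events",                                 # assumed topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    sql = build_insert(TABLE, COLUMNS)
    with conn.cursor() as cur:
        for msg in consumer:
            # Insert one row per consumed message, committing as we go.
            cur.execute(sql, tuple(msg.value[c] for c in COLUMNS))
            conn.commit()

if __name__ == "__main__":
    run()
```

Parameterized placeholders (`%s`) keep the values out of the SQL string, which avoids injection and lets psycopg2 handle type conversion.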
- Modify Docker Compose file (docker-compose.yml) for advanced Kafka configurations.
- Ensure proper network configurations between containers.
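The repository's docker-compose.yml may differ, but a minimal single-broker layout typically looks like the sketch below (Confluent images and ports are assumptions; the advertised listener must be reachable from the host for the producer and consumer scripts to connect):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.2.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.2.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```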
- Run the producer script:

  ```bash
  python3 producer.py
  ```
- Run the Structure_streaming.py (or streaming_trigger.py) script with the Kafka connector package (`--packages` is a spark-submit option):

  ```bash
  spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 Structure_streaming.py
  ```