SF-Crime-Analysis Project report

The screenshot of the kafka-consumer-console output is shown below:

The screenshot of the Spark UI is given below:

The screenshot of the Stages tab in the Spark UI is given below:

Question 1. How did changing values on the SparkSession property parameters affect the throughput and latency of the data?

Changing the SparkSession property values changes processedRowsPerSecond, which is the measure of throughput used here. The higher the value of processedRowsPerSecond, the more rows are processed per second, and so the higher the throughput.
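As a minimal sketch of how this metric might be summarized, the snippet below averages processedRowsPerSecond over a list of progress reports. It assumes the reports are dictionaries shaped like the entries of `StreamingQuery.recentProgress` in PySpark. The helper name `mean_processed_rows_per_second` and the sample numbers are hypothetical and are not measurements from this project.

```python
from statistics import mean


def mean_processed_rows_per_second(progress_reports):
    """Average processedRowsPerSecond over the reports that carry the field."""
    rates = [
        report["processedRowsPerSecond"]
        for report in progress_reports
        if report.get("processedRowsPerSecond") is not None
    ]
    # An empty list means no batch has reported a rate yet.
    return mean(rates) if rates else 0.0


# Hypothetical progress reports; real ones would come from query.recentProgress.
sample_reports = [
    {"numInputRows": 200, "processedRowsPerSecond": 100.0},
    {"numInputRows": 300, "processedRowsPerSecond": 150.0},
    {"numInputRows": 0},  # a report without a rate is skipped
]

average_rate = mean_processed_rows_per_second(sample_reports)
print(average_rate)
```

Averaging over several reports, rather than reading a single one, makes the comparison between two configurations less sensitive to one unusually small or large micro-batch.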

Question 2. What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal?

A few important SparkSession properties that can help to increase processedRowsPerSecond are given below:

spark.default.parallelism
spark.streaming.kafka.maxRatePerPartition
spark.sql.shuffle.partitions

To choose optimal values for these properties, we can use a back-of-the-envelope calculation based on the resources we have and the target performance we want to achieve. Empirically trying different values of these parameters can also help to decide on the right ones. We can tell whether the values are optimal or not by observing processedRowsPerSecond: of the combinations tested, the one that gives the highest value is the best.
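The two steps described here can be sketched roughly as follows: first a back-of-the-envelope starting point for the partition counts, then a comparison of measured runs. The rule of two tasks per core is an assumption taken from the general Spark tuning guidance of 2-3 tasks per CPU core. The function names, the core count, and the measured rates are all hypothetical and do not come from this project.

```python
def estimate_partitions(total_cores, tasks_per_core=2):
    """Back-of-the-envelope partition count: a few tasks per available core."""
    return max(1, total_cores * tasks_per_core)


def pick_best_config(results):
    """Return the tested run with the highest processedRowsPerSecond."""
    return max(results, key=lambda run: run["processedRowsPerSecond"])


# Starting point for a hypothetical machine with 4 cores.
start = estimate_partitions(4)
candidate = {
    "spark.default.parallelism": str(start),
    "spark.sql.shuffle.partitions": str(start),
}

# Hypothetical measurements, one entry per tested value.
results = [
    {"config": {"spark.sql.shuffle.partitions": "4"}, "processedRowsPerSecond": 90.0},
    {"config": {"spark.sql.shuffle.partitions": "8"}, "processedRowsPerSecond": 140.0},
    {"config": {"spark.sql.shuffle.partitions": "200"}, "processedRowsPerSecond": 60.0},
]

best = pick_best_config(results)
print(candidate)
print(best["config"])
```

The estimate only narrows the search. The final choice still comes from the measured processedRowsPerSecond of each run.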

