Initial commit
Jiangjie Qin committed Aug 28, 2017
0 parents commit f643547
Showing 176 changed files with 24,967 additions and 0 deletions.
14 changes: 14 additions & 0 deletions .gitignore
@@ -0,0 +1,14 @@
.project
.settings
.classpath
.gradle
.idea
*.iml
*.ipr
*.iws
/build
*/build
out/
*/bin/
.reviewboardrc
logs
27 changes: 27 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,27 @@
Contribution Agreement
======================

As a contributor, you represent that the code you submit is your
original work or that of your employer (in which case you represent you
have the right to bind your employer). By submitting code, you (and, if
applicable, your employer) are licensing the submitted code to LinkedIn
and the open source community subject to the BSD 2-Clause license.

Responsible Disclosure of Security Vulnerabilities
==================================================

Please do not file reports on GitHub for security issues.
Please review the guidelines at (link to more info).
Reports should be encrypted using PGP (link to PGP key) and sent to
[email protected] preferably with the title "Github linkedin/<project> - <short summary>".

Tips for Getting Your Pull Request Accepted
===========================================

*Note: These are suggestions. Customize as needed.*

1. Make sure all new features are tested and the tests pass.
2. Bug fixes must include a test case demonstrating the error that they fix.
3. Open an issue first and seek advice for your change before submitting
a pull request. Large features which have never been discussed are
unlikely to be accepted. **You have been warned.**
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
BSD 2-CLAUSE LICENSE

Copyright 2017 LinkedIn Corporation.
All Rights Reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
9 changes: 9 additions & 0 deletions NOTICE
@@ -0,0 +1,9 @@
Copyright 2017 LinkedIn Corporation
All Rights Reserved.

Licensed under the BSD 2-Clause License (the "License").
See LICENSE in the project root for license information.

This product includes Apache Kafka (http://kafka.apache.org)
Copyright (c) 2017 The Apache Software Foundation
License: Apache 2.0
109 changes: 109 additions & 0 deletions README.md
@@ -0,0 +1,109 @@
Cruise Control for Apache Kafka
===================

### Introduction ###
Cruise Control is a product that helps run Apache Kafka clusters at large scale. Due to the popularity of
Apache Kafka, many companies run increasingly large Kafka clusters. At LinkedIn, we have 1800+ Kafka brokers,
which means broker deaths are an almost daily occurrence and balancing the Kafka workload becomes a significant overhead.

Kafka Cruise Control is designed to address this operational scalability issue.

### Features ###
Kafka Cruise Control provides the following features out of the box:

* Resource utilization tracking for brokers, topics and partitions.

* Multi-goal rebalance proposal generation
  * Rack-awareness
  * Resource utilization balance (CPU, DISK, Network I/O)
  * Leader traffic distribution
  * Replica distribution for topics
  * Global replica distribution
  * Write your own goals and plug them in

* Anomaly detection and alerting for the Kafka cluster, including:
  * Goal violation
  * Broker failure detection

* Admin operations, including:
  * Add brokers
  * Decommission brokers
  * Rebalance the cluster

### Quick Start ###
1. Modify `config/cruisecontrol.properties` to:
    * set `bootstrap.servers` and `zookeeper.connect` to point at the Kafka cluster to be monitored.
    * set `metric.sampler.class` to your implementation (the default sampler class is `CruiseControlMetricsReporterSampler`).
    * set `sample.store.class` to your implementation if necessary (the default sample store is `KafkaSampleStore`).
2. Run the following commands:
```
./gradlew jar copyDependantLibs
./kafka-cruise-control-start.sh [-jars PATH_TO_YOUR_JAR_1,PATH_TO_YOUR_JAR_2] config/cruisecontrol.properties [port]
```
3. Visit http://localhost:9090/kafkacruisecontrol/state, or http://localhost:\[port\]/kafkacruisecontrol/state if
you specified a port when starting Cruise Control.
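
The state endpoint from step 3 can also be polled from code. The following is a minimal sketch that uses only the JDK's `HttpURLConnection`. The class name `StateCheck` and the helper `stateUrl` are hypothetical names introduced for this example and are not part of Cruise Control; the host, port and path are the defaults quoted above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Cruise Control.
final class StateCheck {
  static final int DEFAULT_PORT = 9090;

  // Builds the state URL for a given host and port.
  static String stateUrl(String host, int port) {
    return "http://" + host + ":" + port + "/kafkacruisecontrol/state";
  }

  // Issues a GET request and returns the response body as text.
  static String fetch(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    conn.setConnectTimeout(5000);
    conn.setReadTimeout(5000);
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      StringBuilder body = new StringBuilder();
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line).append('\n');
      }
      return body.toString();
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    // The optional first argument is the port passed to the start script.
    int port = args.length > 0 ? Integer.parseInt(args[0]) : DEFAULT_PORT;
    System.out.println(fetch(stateUrl("localhost", port)));
  }
}
```

Running `main` requires a Cruise Control instance to be listening on the chosen port; otherwise `fetch` throws an `IOException`.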
### REST API ###
Cruise Control provides a [REST API](https://github.com/linkedin/cruise-control/wiki/REST-APIs) for users
to interact with. See the wiki page for more details.
### How Does It Work ###
Cruise Control tries to understand the workload of each replica and, based on this knowledge, provides an
optimization solution for the current cluster.
Cruise Control periodically gets the resource utilization samples at both broker and partition level to
understand the traffic pattern of each partition. Based on the traffic characteristics of all the partitions,
it derives the load impact of each partition in the brokers. Cruise Control then builds a workload
model to simulate the workload of the Kafka cluster. The goal optimizer will explore different ways to generate
the cluster workload optimization proposals based on the list of goals specified by the users.
Cruise Control also monitors the liveness of all the brokers in the cluster. When a broker fails in the
cluster, Cruise Control will automatically move the replicas on the failed broker to the healthy brokers to
avoid the loss of redundancy.
For more details about how Cruise Control achieves that, see
[these slides](https://www.slideshare.net/JiangjieQin/introduction-to-kafka-cruise-control-68180931).
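
To make the idea of a workload model more concrete, here is a toy sketch. It is not Cruise Control's implementation: the class `WorkloadModelSketch`, its methods, and the greedy "move to the least loaded broker" rule are all assumptions made for this example. It only shows how per-replica loads can be aggregated per broker and how the replicas of a failed broker could be redistributed in such a model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy workload model for illustration only; not Cruise Control's implementation.
final class WorkloadModelSketch {
  // Broker id -> loads of the replicas hosted on that broker (arbitrary units).
  private final Map<Integer, List<Double>> replicaLoadsByBroker = new TreeMap<>();

  void addBroker(int broker) {
    replicaLoadsByBroker.computeIfAbsent(broker, b -> new ArrayList<>());
  }

  void addReplica(int broker, double load) {
    replicaLoadsByBroker.computeIfAbsent(broker, b -> new ArrayList<>()).add(load);
  }

  // The load of a broker is the sum of the loads of its replicas.
  double brokerLoad(int broker) {
    double total = 0.0;
    for (double load : replicaLoadsByBroker.getOrDefault(broker, Collections.<Double>emptyList())) {
      total += load;
    }
    return total;
  }

  // Simulates a broker failure: each replica of the failed broker is moved, one at
  // a time, to whichever surviving broker currently has the lowest total load.
  // Returns the number of replicas moved.
  int healBrokerFailure(int failedBroker) {
    List<Double> orphaned = replicaLoadsByBroker.remove(failedBroker);
    if (orphaned == null) {
      return 0;
    }
    for (double load : orphaned) {
      addReplica(leastLoadedBroker(), load);
    }
    return orphaned.size();
  }

  private int leastLoadedBroker() {
    if (replicaLoadsByBroker.isEmpty()) {
      throw new IllegalStateException("no healthy broker left");
    }
    int best = -1;
    double bestLoad = Double.POSITIVE_INFINITY;
    for (int broker : replicaLoadsByBroker.keySet()) {
      double load = brokerLoad(broker);
      if (load < bestLoad) {
        best = broker;
        bestLoad = load;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    WorkloadModelSketch model = new WorkloadModelSketch();
    model.addReplica(0, 5.0);
    model.addReplica(0, 3.0);
    model.addReplica(1, 4.0);
    model.addReplica(2, 1.0);
    int moved = model.healBrokerFailure(0);
    System.out.println("moved=" + moved
        + " broker1=" + model.brokerLoad(1)
        + " broker2=" + model.brokerLoad(2));
  }
}
```

A real optimizer has to respect the configured goals as well, so a single greedy rule like this one is only a starting point for reasoning about the model.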
### Configurations for Cruise Control ###
To read more about the configurations, check the
[configurations wiki page](https://github.com/linkedin/cruise-control/wiki/Configurations).
### Pluggable Components ###
More about pluggable components can be found in the
[pluggable components wiki page](https://github.com/linkedin/cruise-control/wiki/Pluggable-Components).
#### Metric Sampler ####
The metric sampler is one of the most important pluggable components in Cruise Control. It allows users to easily
deploy Cruise Control to various environments and work with any existing metric system.
Cruise Control provides a metrics reporter which can be configured in your Apache
Kafka server. It will produce performance metrics to a Kafka metrics topic which can be consumed by Cruise Control.
#### Sample Store ####
The Sample Store is used to store the collected metric samples and training samples to external storage.
One problem in metric sampling is that we use data derived from the raw metrics, and the way we
derive the data relies on the metadata of the cluster at that point. So when we look at old metrics, if we
do not know the metadata at the point the metric was collected, the derived data would not be accurate. The Sample
Store helps solve this problem by storing the derived data directly to external storage for later loading.
The default sample store implementation produces the metric samples back to Kafka.
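
The sketch below illustrates why storing derived data helps. All names in it (`SampleStoreSketch`, `DerivedSampleStore`, `PartitionSample`, `derive`) are hypothetical and do not reflect the actual Cruise Control interfaces, and the derivation rule (attributing broker CPU to a partition in proportion to its share of the broker's bytes-in) is an assumption chosen only to show a value that depends on the cluster state at collection time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and derivation rule are assumptions, not Cruise Control's API.
final class SampleStoreSketch {
  // A derived sample: computed once, using the cluster metadata at collection time.
  static final class PartitionSample {
    final String topicPartition;
    final long timestampMs;
    final double cpuUtil;

    PartitionSample(String topicPartition, long timestampMs, double cpuUtil) {
      this.topicPartition = topicPartition;
      this.timestampMs = timestampMs;
      this.cpuUtil = cpuUtil;
    }
  }

  // Stores derived samples so they never have to be re-derived from raw metrics.
  interface DerivedSampleStore {
    void store(List<PartitionSample> samples);

    List<PartitionSample> load();
  }

  // In-memory stand-in for an external store such as a Kafka topic.
  static final class InMemorySampleStore implements DerivedSampleStore {
    private final List<PartitionSample> stored = new ArrayList<>();

    @Override
    public void store(List<PartitionSample> samples) {
      stored.addAll(samples);
    }

    @Override
    public List<PartitionSample> load() {
      return new ArrayList<>(stored);
    }
  }

  // Assumed rule: a partition's CPU share is proportional to its share of broker bytes-in.
  // The broker-level inputs are only valid for the moment the raw metrics were collected.
  static PartitionSample derive(String topicPartition, long timestampMs,
                                double brokerCpuUtil, double partitionBytesIn, double brokerBytesIn) {
    double cpuUtil = brokerBytesIn == 0.0 ? 0.0 : brokerCpuUtil * partitionBytesIn / brokerBytesIn;
    return new PartitionSample(topicPartition, timestampMs, cpuUtil);
  }

  public static void main(String[] args) {
    DerivedSampleStore store = new InMemorySampleStore();
    List<PartitionSample> batch = new ArrayList<>();
    batch.add(derive("my-topic-0", 1000L, 0.5, 25.0, 100.0));
    store.store(batch);
    for (PartitionSample sample : store.load()) {
      System.out.println(sample.topicPartition + " @" + sample.timestampMs + " cpu=" + sample.cpuUtil);
    }
  }
}
```

If the partition's leader later moves to another broker, re-deriving the old sample from raw metrics would use the wrong broker totals, whereas the stored `PartitionSample` still holds the value computed at collection time.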
#### Goals ####
The goals in Cruise Control are pluggable with different priorities. The default goals are (in order of decreasing priority):
* **RackAwareCapacityGoal** - A goal that ensures all the replicas of each partition are assigned in a rack-aware manner and that every broker’s
resource utilization is below a given threshold.
* **PotentialNwOutGoal** - A goal that ensures the potential network output (when all the replicas become leaders) on each broker does not exceed the broker’s network outbound bandwidth capacity.
* **ResourceDistributionGoal** - Attempts to keep the workload variance among all the brokers within a certain range. This goal does nothing if the cluster is in a low-utilization mode (when the resource utilization of every broker is below a configured percentage).
* **LeaderBytesInDistributionGoal** - Attempts to equalize the leader bytes-in rate on each host.
* **TopicReplicaDistributionGoal** - Attempts to maintain an even distribution of any topic's replicas across the entire cluster.
* **ReplicaDistributionGoal** - Attempts to make all the brokers in a cluster have a similar number of replicas.
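
One way to picture prioritized goals is to check a candidate replica move against each goal in decreasing priority and report the first goal that rejects it. The sketch below does exactly that and nothing more. The `Goal` interface, the `Move` and `Cluster` types, and both goal implementations are simplified, hypothetical stand-ins written for this example; they are not the Cruise Control goal classes listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of priority-ordered goals; not Cruise Control's Goal API.
final class GoalPrioritySketch {
  static final class Cluster {
    final String[] rackOfBroker; // index = broker id
    final int[] replicaCount;    // index = broker id

    Cluster(String[] rackOfBroker, int[] replicaCount) {
      this.rackOfBroker = rackOfBroker;
      this.replicaCount = replicaCount;
    }
  }

  // Moving one replica of a partition from one broker to another.
  static final class Move {
    final int from;
    final int to;
    final Set<String> otherReplicaRacks; // racks hosting the partition's other replicas

    Move(int from, int to, Set<String> otherReplicaRacks) {
      this.from = from;
      this.to = to;
      this.otherReplicaRacks = otherReplicaRacks;
    }
  }

  interface Goal {
    String name();

    boolean accepts(Move move, Cluster cluster);
  }

  // Rejects moves that would put two replicas of a partition on the same rack.
  static final class RackAwareGoal implements Goal {
    @Override
    public String name() {
      return "RackAware";
    }

    @Override
    public boolean accepts(Move move, Cluster cluster) {
      return !move.otherReplicaRacks.contains(cluster.rackOfBroker[move.to]);
    }
  }

  // Rejects moves that would leave the destination with more replicas than the source has now.
  static final class ReplicaDistributionGoal implements Goal {
    @Override
    public String name() {
      return "ReplicaDistribution";
    }

    @Override
    public boolean accepts(Move move, Cluster cluster) {
      return cluster.replicaCount[move.to] + 1 <= cluster.replicaCount[move.from];
    }
  }

  // Goals in decreasing priority, mirroring the ordering idea described above.
  static List<Goal> goalsByPriority() {
    List<Goal> goals = new ArrayList<>();
    goals.add(new RackAwareGoal());
    goals.add(new ReplicaDistributionGoal());
    return goals;
  }

  // Returns the name of the highest-priority goal that rejects the move, if any.
  static Optional<String> firstRejectingGoal(List<Goal> goals, Move move, Cluster cluster) {
    for (Goal goal : goals) {
      if (!goal.accepts(move, cluster)) {
        return Optional.of(goal.name());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Cluster cluster = new Cluster(new String[] {"rackA", "rackA", "rackB"}, new int[] {5, 2, 2});
    Move move = new Move(0, 2, Collections.singleton("rackB"));
    System.out.println(firstRejectingGoal(goalsByPriority(), move, cluster));
  }
}
```

Because the list is walked in priority order, a move that breaks rack awareness is reported as such even if it would also worsen replica balance.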
#### Anomaly Notifier ####
The anomaly notifier allows users to be notified when an anomaly is detected. Anomalies include:
* Broker failure
* Goal violation
In addition to anomaly notifications, users can specify actions to be taken in response to the anomaly. The following actions are supported:
* **fix** - fix the problem right away.
* **check** - check the situation again after a given delay
* **ignore** - ignore the anomaly
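
As a rough illustration of how a notifier might choose among these three actions, consider the sketch below. The class `AnomalyNotifierSketch`, its enums, and the decision policy (wait out a grace period before fixing a broker failure, and fix or ignore goal violations based on a flag) are assumptions made for this example and are not taken from the Cruise Control code.

```java
// Hypothetical notifier policy for illustration; not Cruise Control's notifier API.
final class AnomalyNotifierSketch {
  enum AnomalyType { BROKER_FAILURE, GOAL_VIOLATION }

  enum Action { FIX, CHECK, IGNORE }

  private final long gracePeriodMs;        // assumed: how long to wait before fixing a broker failure
  private final boolean fixGoalViolations; // assumed: whether goal violations are fixed automatically

  AnomalyNotifierSketch(long gracePeriodMs, boolean fixGoalViolations) {
    this.gracePeriodMs = gracePeriodMs;
    this.fixGoalViolations = fixGoalViolations;
  }

  // Decides what to do about an anomaly first detected at detectedAtMs, evaluated at nowMs.
  Action onAnomaly(AnomalyType type, long detectedAtMs, long nowMs) {
    switch (type) {
      case BROKER_FAILURE:
        // A broker may simply be restarting, so check again until the grace period has passed.
        return nowMs - detectedAtMs >= gracePeriodMs ? Action.FIX : Action.CHECK;
      case GOAL_VIOLATION:
        return fixGoalViolations ? Action.FIX : Action.IGNORE;
      default:
        throw new IllegalArgumentException("unknown anomaly type: " + type);
    }
  }

  public static void main(String[] args) {
    AnomalyNotifierSketch notifier = new AnomalyNotifierSketch(60_000L, false);
    System.out.println(notifier.onAnomaly(AnomalyType.BROKER_FAILURE, 0L, 10_000L));
    System.out.println(notifier.onAnomaly(AnomalyType.BROKER_FAILURE, 0L, 60_000L));
    System.out.println(notifier.onAnomaly(AnomalyType.GOAL_VIOLATION, 0L, 10_000L));
  }
}
```

Delaying the fix for broker failures avoids moving replicas when a broker is only down briefly, at the cost of a longer window with reduced redundancy.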
153 changes: 153 additions & 0 deletions build.gradle
@@ -0,0 +1,153 @@
/*
* Copyright 2017 LinkedIn Corp. Licensed under the BSD 2-Clause License (the "License").
 See License in the project root for license information.
*/

allprojects {
  apply plugin: 'idea'
  apply plugin: 'eclipse'

  repositories {
    mavenCentral()
  }

  //wrapper generation task
  task wrapper(type: Wrapper) {
    gradleVersion = '3.5'
  }
}

subprojects {
  apply plugin: 'java'
  apply plugin: 'checkstyle'
  apply plugin: 'findbugs'

  //code quality and inspections
  checkstyle {
    toolVersion = '7.5.1'
    ignoreFailures = false
    configFile = rootProject.file('checkstyle/checkstyle.xml')
  }

  findbugs {
    toolVersion = "3.0.1"
    excludeFilter = file("$rootDir/gradle/findbugs-exclude.xml")
    ignoreFailures = false
  }

  test.dependsOn('checkstyleMain', 'checkstyleTest', 'findbugsMain', 'findbugsTest')

  tasks.withType(FindBugs) {
    reports {
      xml.enabled (project.hasProperty('xmlFindBugsReport'))
      html.enabled (!project.hasProperty('xmlFindBugsReport'))
    }
  }

  jar {
    from "$rootDir/LICENSE"
    from "$rootDir/NOTICE"
  }

  test {
    useJUnit {}
    testLogging {
      events "passed", "failed", "skipped"
    }
    maxParallelForks = Runtime.runtime.availableProcessors()
  }
}

project(':cruise-control') {
  apply plugin: 'scala'

  //needed because our java classes depend on scala classes, so must be compiled by scala
  sourceSets {
    main {
      java {
        srcDirs = []
      }

      scala {
        srcDirs = ['src/main/java', 'src/main/scala']
      }
    }

    test {
      java {
        srcDirs = []
      }
      scala {
        srcDirs = ['src/test/java', 'src/test/scala']
      }
    }

  }

  dependencies {
    compile project(':cruise-control-metrics-reporter')
    compile "org.slf4j:slf4j-api:1.7.25"
    compile "org.apache.zookeeper:zookeeper:3.4.6"
    compile "org.apache.kafka:kafka_2.10:0.10.1.0"
    compile 'org.apache.kafka:kafka_2.10:0.10.1.0:test'
    compile 'org.apache.kafka:kafka-clients:0.10.1.0'
    compile "org.scala-lang:scala-library:2.10.4"
    compile 'junit:junit:4.12'
    compile 'org.apache.commons:commons-math3:3.6.1'
    compile 'com.google.code.gson:gson:2.7'
    compile 'org.eclipse.jetty:jetty-server:9.4.6.v20170531'
    compile 'org.eclipse.jetty:jetty-servlet:9.4.6.v20170531'
    compile 'io.dropwizard.metrics:metrics-core:3.2.2'

    testCompile project(path: ':cruise-control-metrics-reporter', configuration: 'testOutput')
    testCompile "org.scala-lang:scala-library:2.10.4"
    testCompile 'org.easymock:easymock:3.4'
    testCompile 'org.apache.kafka:kafka_2.10:0.10.1.0:test'
  }

  tasks.create(name: "copyDependantLibs", type: Copy) {
    from (configurations.testRuntime) {
      include('slf4j-log4j12*')
    }
    from (configurations.runtime) {

    }
    into "$buildDir/dependant-libs"
    duplicatesStrategy 'exclude'
  }

  compileScala.doLast {
    tasks.copyDependantLibs.execute()
  }
}

project(':cruise-control-metrics-reporter') {
  apply plugin: 'scala'

  configurations {
    testOutput
  }

  dependencies {
    compile "org.slf4j:slf4j-api:1.7.25"
    compile "org.apache.kafka:kafka_2.10:0.10.1.0"
    compile 'org.apache.kafka:kafka-clients:0.10.1.0'
    compile 'junit:junit:4.12'
    compile 'org.apache.commons:commons-math3:3.6.1'

    testCompile 'org.bouncycastle:bcpkix-jdk15on:1.54'
    testOutput sourceSets.test.output
  }

  sourceSets {
    test {
      java {
        srcDirs = []
      }
      scala {
        srcDirs = ['src/test/java', 'src/test/scala']
      }
    }

  }
}
