forked from linkedin/cruise-control
Commit
Jiangjie Qin
committed
Aug 28, 2017
0 parents
commit f643547
Showing
176 changed files
with
24,967 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
.project | ||
.settings | ||
.classpath | ||
.gradle | ||
.idea | ||
*.iml | ||
*.ipr | ||
*.iws | ||
/build | ||
*/build | ||
out/ | ||
*/bin/ | ||
.reviewboardrc | ||
logs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
Contribution Agreement | ||
====================== | ||
|
||
As a contributor, you represent that the code you submit is your | ||
original work or that of your employer (in which case you represent you | ||
have the right to bind your employer). By submitting code, you (and, if | ||
applicable, your employer) are licensing the submitted code to LinkedIn | ||
and the open source community subject to the BSD 2-Clause license. | ||
|
||
Responsible Disclosure of Security Vulnerabilities | ||
================================================== | ||
|
||
Please do not file reports on Github for security issues. | ||
Please review the guidelines at (link to more info). | ||
Reports should be encrypted using PGP (link to PGP key) and sent to | ||
[email protected] preferably with the title "Github linkedin/<project> - <short summary>". | ||
|
||
Tips for Getting Your Pull Request Accepted | ||
=========================================== | ||
|
||
*Note: These are suggestions. Customize as needed.* | ||
|
||
1. Make sure all new features are tested and the tests pass. | ||
2. Each bug fix must include a test case demonstrating the error that it fixes. | ||
3. Open an issue first and seek advice for your change before submitting | ||
a pull request. Large features which have never been discussed are | ||
unlikely to be accepted. **You have been warned.** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
BSD 2-CLAUSE LICENSE | ||
|
||
Copyright 2017 LinkedIn Corporation. | ||
All Rights Reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | ||
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
Copyright 2017 LinkedIn Corporation | ||
All Rights Reserved. | ||
|
||
Licensed under the BSD 2-Clause License (the "License"). | ||
See LICENSE in the project root for license information. | ||
|
||
This product includes Apache Kafka (http://kafka.apache.org) | ||
Copyright (c) 2017 The Apache Software Foundation | ||
License: Apache 2.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
Cruise Control for Apache Kafka | ||
=================== | ||
|
||
### Introduction ### | ||
Cruise Control is a product that helps run Apache Kafka clusters at large scale. Due to the popularity of | ||
Apache Kafka, many companies have bigger and bigger Kafka clusters. At LinkedIn, we have 1800+ Kafka brokers, | ||
which means broker deaths are an almost daily occurrence and balancing the workload of Kafka also becomes a big overhead. | ||
|
||
Kafka Cruise Control is designed to address this operational scalability issue. | ||
|
||
### Features ### | ||
Kafka Cruise Control provides the following features out of the box: | ||
|
||
* Resource utilization tracking for brokers, topics and partitions. | ||
|
||
* Multi-goal rebalance proposal generation | ||
* Rack-awareness | ||
* Resource utilization balance (CPU, DISK, Network I/O) | ||
* Leader traffic distribution | ||
* Replica distribution for topics | ||
* Global replica distribution | ||
* Custom goals: write your own and plug them in | ||
|
||
* Anomaly detection and alerting for the Kafka cluster, including: | ||
* goal violation | ||
* broker failure detection | ||
|
||
* Admin operations including: | ||
* Add brokers | ||
* Decommission brokers | ||
* Rebalance the cluster | ||
|
||
### Quick Start ### | ||
1. Modify config/cruisecontrol.properties to | ||
* set `bootstrap.servers` and `zookeeper.connect` to point at the Kafka cluster to be monitored. | ||
* set `metric.sampler.class` to your implementation (the default sampler class is CruiseControlMetricsReporterSampler) | ||
* set `sample.store.class` to your implementation if necessary (the default SampleStore is KafkaSampleStore) | ||
2. Run the following command | ||
``` | ||
./gradlew jar copyDependantLibs | ||
./kafka-cruise-control-start.sh [-jars PATH_TO_YOUR_JAR_1,PATH_TO_YOUR_JAR_2] config/cruisecontrol.properties [port] | ||
``` | ||
3. Visit http://localhost:9090/kafkacruisecontrol/state or http://localhost:\[port\]/kafkacruisecontrol/state if | ||
you specified the port when starting Cruise Control. | ||
### REST API ### | ||
Cruise Control provides a [REST API](https://github.com/linkedin/cruise-control/wiki/REST-APIs) for users | ||
to interact with. See the wiki page for more details. | ||
### How Does It Work ### | ||
Cruise Control tries to understand the workload of each replica and provide an optimization | ||
solution to the current cluster based on this knowledge. | ||
Cruise Control periodically gets the resource utilization samples at both broker and partition level to | ||
understand the traffic pattern of each partition. Based on the traffic characteristics of all the partitions, | ||
it derives the load impact of each partition in the brokers. Cruise Control then builds a workload | ||
model to simulate the workload of the Kafka cluster. The goal optimizer will explore different ways to generate | ||
the cluster workload optimization proposals based on the list of goals specified by the users. | ||
Cruise Control also monitors the liveness of all the brokers in the cluster. When a broker fails in the | ||
cluster, Cruise Control will automatically move the replicas on the failed broker to the healthy brokers to | ||
avoid the loss of redundancy. | ||
For more details about how Cruise Control achieves that, see | ||
[these slides](https://www.slideshare.net/JiangjieQin/introduction-to-kafka-cruise-control-68180931). | ||
### Configurations for Cruise Control ### | ||
To read more about the configurations, check the | ||
[configurations wiki page](https://github.com/linkedin/cruise-control/wiki/Configurations). | ||
### Pluggable Components ### | ||
More about pluggable components can be found in the | ||
[pluggable components wiki page](https://github.com/linkedin/cruise-control/wiki/Pluggable-Components). | ||
#### Metric Sampler #### | ||
The metric sampler is one of the most important pluggables in Cruise Control. It allows users to easily | ||
deploy Cruise Control to various environments and work with any existing metric system. | ||
Cruise Control provides a metrics reporter which can be configured in your Apache | ||
Kafka server. It will produce performance metrics to a Kafka metrics topic which can be consumed by Cruise Control. | ||
#### Sample Store #### | ||
The Sample Store is used to store the collected metric samples and training samples to external storage. | ||
One problem in metric sampling is that we use data derived from the raw metrics, and the way we | ||
derive that data relies on the metadata of the cluster at that point. So when we look at old metrics, if we | ||
do not know the metadata at the point the metric was collected, the derived data will not be accurate. The Sample | ||
Store helps solve this problem by storing the derived data directly to external storage for later loading. | ||
The default sample store implementation produces the metric samples back to Kafka. | ||
#### Goals #### | ||
The goals in Cruise Control are pluggable with different priorities. The default goals are (in order of decreasing priority): | ||
* **RackAwareCapacityGoal** - A goal that ensures all the replicas of each partition are assigned in a rack-aware manner and that every broker’s | ||
resource utilization is below a given threshold. | ||
* **PotentialNwOutGoal** - A goal that ensures the potential network output (when all the replicas become leaders) on each broker does not exceed the broker’s network outbound bandwidth capacity. | ||
* **ResourceDistributionGoal** - Attempts to keep the workload variance among all the brokers within a certain range. This goal does nothing if the cluster is in low-utilization mode (when the utilization of every resource on each broker is below a configured percentage). | ||
* **LeaderBytesInDistributionGoal** - Attempts to equalize the leader bytes-in rate on each host. | ||
* **TopicReplicaDistributionGoal** - Attempts to maintain an even distribution of any topic's replicas across the entire cluster. | ||
* **ReplicaDistributionGoal** - Attempts to make all the brokers in a cluster have a similar number of replicas. | ||
#### Anomaly Notifier #### | ||
The anomaly notifier allows users to be notified when an anomaly is detected. Anomalies include: | ||
* Broker failure | ||
* Goal violation | ||
In addition to anomaly notifications, users can specify actions to be taken in response to the anomaly. The following actions are supported: | ||
* **fix** - fix the problem right away. | ||
* **check** - check the situation again after a given delay | ||
* **ignore** - ignore the anomaly |
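The three anomaly actions described at the end of the README (fix, check, ignore) can be illustrated with a small sketch. This is not Cruise Control's actual notifier interface: the class `AnomalyNotifierSketch`, the `Result` type, the `decide` method, and the grace-period and self-healing parameters are all hypothetical names chosen for illustration. The assumed policy is that a recently failed broker is re-checked after a delay (it may simply be restarting), while anything else is fixed immediately when self-healing is on and ignored when it is off.

```java
public class AnomalyNotifierSketch {
    /** Anomaly kinds listed in the README. */
    public enum AnomalyType { BROKER_FAILURE, GOAL_VIOLATION }

    /** Actions listed in the README. */
    public enum Action { FIX, CHECK, IGNORE }

    /** Hypothetical result type: the chosen action plus a re-check delay (meaningful only for CHECK). */
    public static final class Result {
        public final Action action;
        public final long delayMs;

        Result(Action action, long delayMs) {
            this.action = action;
            this.delayMs = delayMs;
        }

        @Override
        public String toString() {
            return action + (action == Action.CHECK ? " in " + delayMs + " ms" : "");
        }
    }

    /**
     * Assumed policy, for illustration only:
     * - self-healing disabled: IGNORE (a real notifier would presumably still alert);
     * - broker failure younger than the grace period: CHECK again once the grace period ends;
     * - everything else: FIX right away.
     */
    public static Result decide(AnomalyType type, long anomalyAgeMs, long graceMs, boolean selfHealing) {
        if (!selfHealing) {
            return new Result(Action.IGNORE, 0L);
        }
        if (type == AnomalyType.BROKER_FAILURE && anomalyAgeMs < graceMs) {
            return new Result(Action.CHECK, graceMs - anomalyAgeMs);
        }
        return new Result(Action.FIX, 0L);
    }

    public static void main(String[] args) {
        // A broker that failed 1 s ago, with a 5 s grace period: re-check later.
        System.out.println(decide(AnomalyType.BROKER_FAILURE, 1_000L, 5_000L, true));
        // The same broker after the grace period has passed: fix.
        System.out.println(decide(AnomalyType.BROKER_FAILURE, 6_000L, 5_000L, true));
        // A goal violation with self-healing disabled: ignore.
        System.out.println(decide(AnomalyType.GOAL_VIOLATION, 0L, 5_000L, false));
    }
}
```

Returning the delay alongside the action keeps the scheduling decision with the caller, which is one plausible way to model the "check the situation again after a given delay" behavior described above.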
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
/* | ||
* Copyright 2017 LinkedIn Corp. Licensed under the BSD 2-Clause License (the "License"). See License in the project root for license information. | ||
*/ | ||
|
||
allprojects { | ||
apply plugin: 'idea' | ||
apply plugin: 'eclipse' | ||
|
||
repositories { | ||
mavenCentral() | ||
} | ||
|
||
//wrapper generation task | ||
task wrapper(type: Wrapper) { | ||
gradleVersion = '3.5' | ||
} | ||
} | ||
|
||
subprojects { | ||
apply plugin: 'java' | ||
apply plugin: 'checkstyle' | ||
apply plugin: 'findbugs' | ||
|
||
//code quality and inspections | ||
checkstyle { | ||
toolVersion = '7.5.1' | ||
ignoreFailures = false | ||
configFile = rootProject.file('checkstyle/checkstyle.xml') | ||
} | ||
|
||
findbugs { | ||
toolVersion = "3.0.1" | ||
excludeFilter = file("$rootDir/gradle/findbugs-exclude.xml") | ||
ignoreFailures = false | ||
} | ||
|
||
test.dependsOn('checkstyleMain', 'checkstyleTest', 'findbugsMain', 'findbugsTest') | ||
|
||
tasks.withType(FindBugs) { | ||
reports { | ||
xml.enabled (project.hasProperty('xmlFindBugsReport')) | ||
html.enabled (!project.hasProperty('xmlFindBugsReport')) | ||
} | ||
} | ||
|
||
jar { | ||
from "$rootDir/LICENSE" | ||
from "$rootDir/NOTICE" | ||
} | ||
|
||
test { | ||
useJUnit {} | ||
testLogging { | ||
events "passed", "failed", "skipped" | ||
} | ||
maxParallelForks = Runtime.runtime.availableProcessors() | ||
} | ||
} | ||
|
||
project(':cruise-control') { | ||
apply plugin: 'scala' | ||
|
||
//needed because our java classes depend on scala classes, so must be compiled by scala | ||
sourceSets { | ||
main { | ||
java { | ||
srcDirs = [] | ||
} | ||
|
||
scala { | ||
srcDirs = ['src/main/java', 'src/main/scala'] | ||
} | ||
} | ||
|
||
test { | ||
java { | ||
srcDirs = [] | ||
} | ||
scala { | ||
srcDirs = ['src/test/java', 'src/test/scala'] | ||
} | ||
} | ||
|
||
} | ||
|
||
dependencies { | ||
compile project(':cruise-control-metrics-reporter') | ||
compile "org.slf4j:slf4j-api:1.7.25" | ||
compile "org.apache.zookeeper:zookeeper:3.4.6" | ||
compile "org.apache.kafka:kafka_2.10:0.10.1.0" | ||
compile 'org.apache.kafka:kafka_2.10:0.10.1.0:test' | ||
compile 'org.apache.kafka:kafka-clients:0.10.1.0' | ||
compile "org.scala-lang:scala-library:2.10.4" | ||
compile 'junit:junit:4.12' | ||
compile 'org.apache.commons:commons-math3:3.6.1' | ||
compile 'com.google.code.gson:gson:2.7' | ||
compile 'org.eclipse.jetty:jetty-server:9.4.6.v20170531' | ||
compile 'org.eclipse.jetty:jetty-servlet:9.4.6.v20170531' | ||
compile 'io.dropwizard.metrics:metrics-core:3.2.2' | ||
|
||
testCompile project(path: ':cruise-control-metrics-reporter', configuration: 'testOutput') | ||
testCompile "org.scala-lang:scala-library:2.10.4" | ||
testCompile 'org.easymock:easymock:3.4' | ||
testCompile 'org.apache.kafka:kafka_2.10:0.10.1.0:test' | ||
} | ||
|
||
tasks.create(name: "copyDependantLibs", type: Copy) { | ||
from (configurations.testRuntime) { | ||
include('slf4j-log4j12*') | ||
} | ||
from (configurations.runtime) { | ||
|
||
} | ||
into "$buildDir/dependant-libs" | ||
duplicatesStrategy 'exclude' | ||
} | ||
|
||
compileScala.doLast { | ||
tasks.copyDependantLibs.execute() | ||
} | ||
} | ||
|
||
project(':cruise-control-metrics-reporter') { | ||
apply plugin: 'scala' | ||
|
||
configurations { | ||
testOutput | ||
} | ||
|
||
dependencies { | ||
compile "org.slf4j:slf4j-api:1.7.25" | ||
compile "org.apache.kafka:kafka_2.10:0.10.1.0" | ||
compile 'org.apache.kafka:kafka-clients:0.10.1.0' | ||
compile 'junit:junit:4.12' | ||
compile 'org.apache.commons:commons-math3:3.6.1' | ||
|
||
testCompile 'org.bouncycastle:bcpkix-jdk15on:1.54' | ||
testOutput sourceSets.test.output | ||
} | ||
|
||
sourceSets { | ||
test { | ||
java { | ||
srcDirs = [] | ||
} | ||
scala { | ||
srcDirs = ['src/test/java', 'src/test/scala'] | ||
} | ||
} | ||
|
||
} | ||
} | ||
|