
Commit

Uses cases working: window, mr, alerting, and join/groupby both
streaming and batching
nathanielc committed Oct 2, 2015
1 parent 842edf8 commit d5b80ef
Showing 92 changed files with 14,351 additions and 290 deletions.
61 changes: 35 additions & 26 deletions README.md
@@ -25,39 +25,43 @@ There are two different ways to consume Kapacitor.
* Select data from an existing InfluxDB host and save it:

```sh
-$ kapacitor record stream --host address_of_influxdb --query 'select value from cpu_idle where time > start and time < stop'
-RecordingID=2869246
+$ kapacitor record query -addr http://address_of_influxdb -query 'select value from cpu_idle where time > start and time < stop'
+b6d1de3f-b27f-4420-96ee-b0365d859d1c
```
* Or record the live stream for a bit:

```sh
-$ kapacitor start-recording
-$ sleep 60
-$ kapacitor stop-recording
-RecordingID=2869246
+$ kapacitor record stream -duration 60s
+b6d1de3f-b27f-4420-96ee-b0365d859d1c
```

4. Define a Kapacitor `streamer`. A `streamer` is an entity that defines what data should be processed and how.

```sh
-$ kapacitor define streamer \
-    --name alert_cpu_idle_any_host \
-    --script path/to/dsl/script
+$ kapacitor define \
+    -type streamer \
+    -name alert_cpu_idle_any_host \
+    -tick path/to/tick/script
```

5. Replay the recording to test the `streamer`.

```sh
-$ kapacitor replay 2869246 alert_cpu_idle_any_host
+$ kapacitor replay \
+    b6d1de3f-b27f-4420-96ee-b0365d859d1c \
+    alert_cpu_idle_any_host
```

6. Edit the `streamer` and test until it's working

```sh
-$ kapacitor define streamer \
-    --name alert_cpu_idle_any_host \
-    --script path/to/dsl/script
-$ kapacitor replay 2869246 alert_cpu_idle_any_host
+$ kapacitor define \
+    -type streamer \
+    -name alert_cpu_idle_any_host \
+    -tick path/to/tick/script
+$ kapacitor replay \
+    b6d1de3f-b27f-4420-96ee-b0365d859d1c \
+    alert_cpu_idle_any_host
```

7. Enable or push the `streamer` once you are satisfied that it is working
@@ -66,7 +70,7 @@ There are two different ways to consume Kapacitor.
$ # enable the streamer locally
$ kapacitor enable alert_cpu_idle_any_host
$ # or push the tested streamer to a prod server
-$ kapacitor push --remote address_to_remote_kapacitor alert_cpu_idle_any_host
+$ kapacitor push -remote http://address_to_remote_kapacitor alert_cpu_idle_any_host
```

# Batch workflow
@@ -80,39 +84,45 @@ There are two different ways to consume Kapacitor.
1. Define a `batcher`. Like a `streamer` a `batcher` defines what data to process and how, only it operates on batches of data instead of streams.

```sh
-$ kapacitor define batcher \
-    --name alert_mean_cpu_idle_logs_by_dc \
-    --script path/to/dsl/script
+$ kapacitor define \
+    -type batcher \
+    -name alert_mean_cpu_idle_logs_by_dc \
+    -tick path/to/tick/script
```
2. Save a batch of data for replaying using the definition in the `batcher`.

```sh
$ kapacitor record batch alert_mean_cpu_idle_logs_by_dc
-RecordingID=2869246
+b6d1de3f-b27f-4420-96ee-b0365d859d1c
```

3. Replay the batch of data to the `batcher`.

```sh
-$ kapacitor replay 2869246 alert_mean_cpu_idle_logs_by_dc
+$ kapacitor replay \
+    b6d1de3f-b27f-4420-96ee-b0365d859d1c \
+    alert_mean_cpu_idle_logs_by_dc
```

4. Iterate on the `batcher` definition until it works

```sh
-$ kapacitor define batcher \
-    --name alert_mean_cpu_idle_logs_by_dc \
-    --script path/to/dsl/script
-$ kapacitor replay 2869246 alert_mean_cpu_idle_logs_by_dc
+$ kapacitor define \
+    -type batcher \
+    -name alert_mean_cpu_idle_logs_by_dc \
+    -tick path/to/tick/script
+$ kapacitor replay \
+    b6d1de3f-b27f-4420-96ee-b0365d859d1c \
+    alert_mean_cpu_idle_logs_by_dc
```

5. Once it works, enable locally or push to remote

```sh
$ # enable the batcher locally
$ kapacitor enable alert_mean_cpu_idle_logs_by_dc
$ # or push the tested batcher to a prod server
-$ kapacitor push --remote address_to_remote_kapacitor alert_mean_cpu_idle_logs_by_dc
+$ kapacitor push -remote http://address_to_remote_kapacitor alert_mean_cpu_idle_logs_by_dc
```

# Data processing with pipelines
@@ -275,7 +285,6 @@ stream
.period(1m)
.every(1m)
.mapReduce(influxql.count, "value")
-    .where("count == 0")
.alert();

//Now define normal processing on the stream
195 changes: 195 additions & 0 deletions TUTORIAL.md
@@ -0,0 +1,195 @@
# Getting Started with Kapacitor

This document will walk you through getting started with two simple use cases of Kapacitor.


## Alert on high cpu usage

A classic example: how to get an alert when a server is overloaded.
The following walks you through getting data into Kapacitor and setting up an alert based on that stream of data, showcasing some of Kapacitor's neat features along the way.


First we need to get data into Kapacitor.
This can be done simply via [Telegraf](https://github.com/influxdb/telegraf).
Since we are concerned only with CPU right now, let's use this simple configuration:

```toml
[agent]
interval = "1s"
[outputs]
# Configuration to send data to Kapacitor.
# Since Kapacitor acts like an InfluxDB server the configuration is the same.
[outputs.influxdb]
# Note the port 9092, this is the default port that Kapacitor uses.
urls = ["http://localhost:9092"]
database = "telegraf"
user_agent = "telegraf"
# Read metrics about cpu usage
[cpu]
percpu = false
totalcpu = true
drop = ["cpu_time"]
```

Go ahead and start Telegraf with the above configuration.

```sh
$ telegraf -config telegraf.conf
```

It will complain about not being able to connect to Kapacitor, but that's fine; it will keep trying.


Now let's start Kapacitor:

```sh
$ kapacitord
```

That's it. In a few seconds Telegraf will connect to Kapacitor and start sending it CPU metrics.
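Those metrics travel as InfluxDB line protocol, the same wire format an InfluxDB server accepts. Here is a rough sketch of how a point is formatted; the tag and timestamp values are illustrative, not captured from a real stream, and the real protocol has escaping rules this skips:

```python
def line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as simplified InfluxDB line protocol:
    measurement[,tag=value...] field=value[,...] timestamp_ns
    (no escaping or string-field quoting, unlike the real protocol)."""
    tag_str = ''.join(',%s=%s' % kv for kv in sorted(tags.items()))
    field_str = ','.join('%s=%s' % kv for kv in sorted(fields.items()))
    return '%s%s %s %d' % (measurement, tag_str, field_str, ts_ns)

# An idle-CPU point like the ones this tutorial works with:
print(line_protocol('cpu_usage_idle', {'cpu': 'cpu-total'},
                    {'value': 92.5}, 1443738000000000000))
```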


Now we need to tell Kapacitor what to do.
Kapacitor's behavior is very dynamic, so it is not controlled via configuration but through an HTTP API.
We provide a simple CLI utility that calls the API to tell Kapacitor what to do.

We want to first create a snapshot of data for testing.

```sh
$ rid=$(kapacitor record stream -duration 60s) # save the id for later use
$ echo $rid
RECORDING_ID_HERE
```

OK, so we want to get an alert if the CPU usage gets too high.
We can define that like so:

```
stream
// Select just the cpu_usage_idle measurement
.from("cpu_usage_idle")
.alert()
// We are using idle so we want to check
// if idle drops below 70% (aka cpu used > 30%)
.predicate("value < 70")
// Post the data for the point to a URL
.post("http://localhost:8000");
```


The above script is called a `TICK` script.
It is written in a custom language that makes it easy to define actions on a series of data.
Go ahead and save the script to a file called `cpu_idle_alert.tick`.

Now that we have our `TICK` script we need to hand it to Kapacitor so it can run it.

```sh
$ kapacitor define -name cpu_alert -type streamer -tick cpu_idle_alert.tick
```

Here we have defined a `task` for Kapacitor to run. The `task` has a `name`, `type`, and a `tick` script.
The name needs to be unique and the type is `streamer` in this case since we are streaming data from Telegraf to Kapacitor.


Since the `alert` POSTs to a URL, we need to give Kapacitor something to hit.

In a separate terminal run this:

```sh
$ # Print to STDOUT anything POSTed to http://localhost:8000
$ mkfifo fifo
$ cat fifo | nc -k -l 8000 | tee fifo
```

You can `rm fifo` once you are done.
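If you don't have `nc` handy, a few lines of Python do the same job. This is just a stand-in listener for the tutorial, not part of Kapacitor:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertPrinter(BaseHTTPRequestHandler):
    """Print the body of any POSTed alert, then acknowledge it."""
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        print(self.rfile.read(length).decode('utf-8', errors='replace'))
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # suppress per-request access logs; only alert bodies print

def serve(port=8000):
    HTTPServer(('', port), AlertPrinter).serve_forever()
```

Call `serve()` and the alert's `.post("http://localhost:8000")` has somewhere to land.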


Now we want to see it in action. Replay the recording from a bit ago to the task called `cpu_alert`.

```sh
$ kapacitor replay -id $rid -name cpu_alert -fast
```

Did you catch any alerts? Maybe not, if your system wasn't too busy during the recording.
If not, let's lower the threshold so we see some alerts.

Note: the `-fast` flag tells Kapacitor to replay the data as fast as possible while still emulating the time in the recording.
Without `-fast`, Kapacitor replays the data in real time.
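The pacing idea is easy to picture. A rough sketch, not Kapacitor's actual code: every point keeps its recorded timestamp either way, and only the sleep between points changes with `-fast`:

```python
import time

def replay(points, fast=False):
    """Replay (timestamp, value) pairs in recorded order.

    Real-time mode sleeps out the recorded gap between points; fast
    mode skips the sleeps, but every point still carries its original
    timestamp, so time-based processing downstream is unaffected.
    """
    emitted = []
    prev_ts = None
    for ts, value in points:
        if not fast and prev_ts is not None:
            time.sleep(ts - prev_ts)  # reproduce the recorded gap
        prev_ts = ts
        emitted.append((ts, value))  # virtual clock = recorded timestamp
    return emitted
```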

Edit the `.predicate("value < 70")` line to be `.predicate("value < 99")`.
Now if your system is at least 1% busy you will get an alert.

Redefine the `task` so that Kapacitor knows about your update.

```sh
$ kapacitor define -name cpu_alert -type streamer -tick cpu_idle_alert.tick
$ # Now replay the data again and see if we got any alerts.
$ kapacitor replay -id $rid -name cpu_alert -fast
```


Since we recorded a snapshot of the data we can test again and again with the exact same dataset.
This is powerful both for reproducing bugs in your `TICK` scripts and for knowing that the data isn't changing with each test, which helps keep your sanity.
Run the replay again if you like to see that you get the exact same alerts.


But now we want to see it in action with the live data.
`Enable` your task so it starts working on the live data stream.

```sh
$ kapacitor enable cpu_alert
```

Now just about every second you are probably getting an alert that your system is busy.
That's way too noisy: we could move the threshold back up, but that isn't good enough.
We only want alerts when things are really bad. Try this:

```
stream
.from("cpu_usage_idle")
.alert()
.predicate("sigma(value) > 3")
.post("http://localhost:8000");
```

Just like that we have told Kapacitor to alert us only if the current value is more than `3 sigma` away from the running mean.
Now if the system CPU climbs throughout the day and drops throughout the night, you will still get an alert if it spikes at night or drops during the day!
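To see why this works, here is a sketch of a running sigma score using Welford's online algorithm. It illustrates the idea only; it is not Kapacitor's actual `sigma()` implementation:

```python
import math

class RunningSigma:
    """Track a running mean/stddev and score each new value in sigmas
    (Welford's online algorithm; an illustration, not Kapacitor's code)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        """Fold in x; return how many stddevs it sits from the running mean."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0
```

A steady idle value scores 0; a sudden drop scores high, which is exactly what `sigma(value) > 3` keys on.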


Stop the noise!

```sh
$ kapacitor define -name cpu_alert -type streamer -tick cpu_idle_alert.tick
$ # The old task definition continues to run until you disable/enable the task.
$ kapacitor disable cpu_alert
$ kapacitor enable cpu_alert
```

What about aggregating our alerts?
If the CPU data coming from Telegraf were tagged with a `service` name, we could do something like this:

```
stream
.from("cpu_usage_idle")
.groupBy("service")
.window()
.period(10s)
.every(5s)
.mapReduce(influxql.mean, "value")
.alert()
.predicate("sigma(value) > 3")
.post("http://localhost:8000");
```

This `TICK` script alerts, every `5s`, if the `mean` idle CPU over the last `10s` `window` for each `service` group is `3` sigma away from the running mean.
Just like that we are aggregating across potentially thousands of servers and getting alerts that are actionable, not just a bunch of noise.
Go ahead and try it out.
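To make the groupBy/window semantics concrete, here is a rough Python sketch of what that pipeline computes, with made-up sample points. Kapacitor does this internally; none of these names come from its API:

```python
from collections import defaultdict

def window_means(points, period, every):
    """Group (time, service, value) points by service, then emit the
    mean over a sliding `period`-second window every `every` seconds.
    A simplified model of groupBy/window/mapReduce, for illustration."""
    by_service = defaultdict(list)
    for t, service, value in points:
        by_service[service].append((t, value))
    end = max(t for t, _, _ in points)
    results = []
    for service, pts in sorted(by_service.items()):
        t = period  # first window closes one full period in
        while t <= end + every:
            window = [v for (pt, v) in pts if t - period < pt <= t]
            if window:
                results.append((t, service, sum(window) / len(window)))
            t += every
    return results
```

Each emitted tuple is one candidate the `alert()` node would then score with `sigma()`.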




