Kudu + dstat + Impala

This example program shows how to use the Kudu Python API to load data generated by an external program (here, dstat) into a new or existing Kudu table.

Prerequisites

Make sure the Kudu client library is installed and the kudu Python bindings are available. If the client library and Python bindings are installed in a non-standard location, point the following environment variables at the corresponding directories:

LD_LIBRARY_PATH: the directory containing the Kudu client library
PYTHONPATH: the directory containing the kudu Python bindings

You will also need the dstat program, which should be available from your distribution's package repository.
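Before running the example, a quick check that both prerequisites can be found may save some debugging. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of the example program. Note that `find_spec` only locates the `kudu` module without importing it, so a wrong `LD_LIBRARY_PATH` will only surface on an actual `import kudu`.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Hypothetical helper: report whether kudu and dstat can be found."""
    return {
        # True if "import kudu" would find a module on the current PYTHONPATH.
        # The module is not imported, so the native library is not loaded.
        "kudu": importlib.util.find_spec("kudu") is not None,
        # True if a dstat executable is on PATH.
        "dstat": shutil.which("dstat") is not None,
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```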

Usage

In this example, dstat generates statistics about the system load and writes them into a named pipe, which the Python program reads and loads into Kudu.
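The reading side of the named pipe could be sketched as follows. This assumes dstat writes comma-separated records whose 15 numeric fields follow the column order of the Impala schema further down (for example via something like `dstat -Tcdngy --output /tmp/dstat.pipe`; treat both the flags and the path as assumptions and check them against your dstat version). Lines that do not parse as 15 numbers, such as CSV headers, are skipped. `ensure_fifo`, `parse_row` and `read_rows` are hypothetical names, not the example program's own functions.

```python
import csv
import os
import stat

# Assumed column order; it mirrors the Impala table definition.
COLUMNS = ["ts", "usr", "sys", "idl", "wai", "hiq", "siq",
           "read", "writ", "recv", "send", "in", "out", "int", "csw"]


def ensure_fifo(path):
    """Create a named pipe at `path` unless one already exists."""
    if os.path.exists(path):
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise ValueError(f"{path} exists and is not a named pipe")
    else:
        os.mkfifo(path)


def parse_row(fields):
    """Turn one CSV record into a dict, or None for header/metadata lines."""
    if len(fields) != len(COLUMNS):
        return None
    try:
        values = [float(f) for f in fields]
    except ValueError:
        return None  # e.g. a quoted "epoch","usr",... header line
    row = dict(zip(COLUMNS, values))
    row["ts"] = int(row["ts"])  # BIGINT key column: whole epoch seconds
    return row


def read_rows(lines):
    """Yield parsed rows from any iterable of CSV lines (pipe, file, list)."""
    for fields in csv.reader(lines):
        row = parse_row(fields)
        if row is not None:
            yield row
```

Because `read_rows` accepts any iterable of lines, it works the same on an opened pipe (`with open(path, newline="") as pipe: for row in read_rows(pipe): ...`) as on a list in a test. Keep in mind that opening a named pipe for reading blocks until a writer (here, dstat) opens it.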

To execute this script simply run:

python kudu_dstat.py

Assuming a kudu-master is running locally, this creates the dstat table. You can inspect the table in the master's web UI at http://localhost:8051. The program runs until you terminate it with Ctrl-C.
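A run-until-Ctrl-C loader usually buffers rows and writes them in batches, making sure the last partial batch is not lost when the user interrupts. The sketch below shows that pattern with a pluggable `flush` callback; `pump` and `flush` are hypothetical names. In the real program the callback would apply inserts through a Kudu session and flush it, which is omitted here because it depends on the kudu bindings.

```python
def pump(rows, flush, batch_size=100):
    """Forward rows to `flush` in batches until the input ends or Ctrl-C.

    Returns the total number of rows handed to `flush`.
    """
    batch, written = [], 0
    try:
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                # Swap in a fresh list first, so an interrupt during flush()
                # cannot cause the same rows to be flushed twice.
                full, batch = batch, []
                flush(full)
                written += len(full)
    except KeyboardInterrupt:
        # Ctrl-C: stop reading, then fall through to write the partial batch.
        pass
    if batch:
        flush(batch)
        written += len(batch)
    return written
```

Passing the writer in as a callback keeps the loop testable without a running Kudu cluster.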

To drop the Kudu table and start fresh, start the program with:

python kudu_dstat.py drop
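Handling the optional `drop` argument could look like this minimal argparse sketch. The function name and help text are hypothetical, and the actual table drop (done through the Kudu client in the real program) is only indicated by a comment.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional 'drop' action (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Load dstat output into a Kudu table.")
    parser.add_argument(
        "action", nargs="?", choices=["drop"],
        help="drop the Kudu table and start fresh")
    return parser.parse_args(argv)


args = parse_args(["drop"])
if args.action == "drop":
    # Hypothetical: the real program would drop the table via the Kudu client.
    print("would drop table 'dstat'")
```

With `nargs="?"`, running the script without arguments leaves `args.action` as `None`, and any argument other than `drop` is rejected with a usage message.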

To query the data with Impala, map the existing Kudu table into Impala by running the following statement in impala-shell:

CREATE EXTERNAL TABLE dstat (
`ts` BIGINT,
`usr` FLOAT,
`sys` FLOAT,
`idl` FLOAT,
`wai` FLOAT,
`hiq` FLOAT,
`siq` FLOAT,
`read` FLOAT,
`writ` FLOAT,
`recv` FLOAT,
`send` FLOAT,
`in` FLOAT,
`out` FLOAT,
`int` FLOAT,
`csw` FLOAT
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'dstat',
  'kudu.master_addresses' = '127.0.0.1:7051',
  'kudu.key_columns' = 'ts'
);
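If the columns written by the loader ever change, this statement has to change with them. One way to keep the two in sync is to generate the DDL from the same column list the loader uses. The sketch below reproduces the statement above; `impala_ddl` and its parameters are hypothetical.

```python
# Column names and Impala types, in table order (mirrors the statement above).
COLUMNS = [("ts", "BIGINT")] + [
    (name, "FLOAT")
    for name in ("usr", "sys", "idl", "wai", "hiq", "siq", "read", "writ",
                 "recv", "send", "in", "out", "int", "csw")
]


def impala_ddl(table="dstat", master="127.0.0.1:7051", key="ts"):
    """Build the CREATE EXTERNAL TABLE statement for the Kudu table."""
    # Backticks are required because names like `in` and `int` are keywords.
    columns = ",\n".join(f"`{name}` {sql_type}" for name, sql_type in COLUMNS)
    props = [
        ("storage_handler", "com.cloudera.kudu.hive.KuduStorageHandler"),
        ("kudu.table_name", table),
        ("kudu.master_addresses", master),
        ("kudu.key_columns", key),
    ]
    tblprops = ",\n".join(f"  '{k}' = '{v}'" for k, v in props)
    return (f"CREATE EXTERNAL TABLE {table} (\n{columns}\n)\n"
            f"TBLPROPERTIES(\n{tblprops}\n);")


print(impala_ddl())
```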

Now you can query your local system's load using:

-- How many rows are stored right now?
select count(*) from dstat;

-- Average CPU usage (usr, sys, idl) in 10-second windows
select (ts - ts % 10) as mod_ts, avg(usr), avg(sys), avg(idl)
from dstat
group by mod_ts
order by mod_ts;
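The expression `ts - ts % 10` rounds each epoch second down to the start of its 10-second window: for example, 1458561239 - 9 = 1458561230, so rows at 1458561230 and 1458561239 land in the same group. The following plain-Python sketch mirrors the query on a list of row dicts, which can help when checking the SQL results by hand; `window_averages` is a hypothetical helper.

```python
from collections import defaultdict


def window_averages(rows, width=10, fields=("usr", "sys", "idl")):
    """Group rows by ts - ts % width and average each field, like the SQL."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["ts"] - row["ts"] % width].append(row)
    result = []
    for start in sorted(buckets):  # ORDER BY mod_ts
        group = buckets[start]
        averages = {f: sum(r[f] for r in group) / len(group) for f in fields}
        result.append((start, averages))
    return result
```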