ChartDataPipeline

This project is a data pipeline to aggregate tick data to candlestick.

What it do

Receive data from Crypto.com Exchange API and process the trade data into different timeframe candlestick.

How it work

Data Source

Data is received from Crypto.com Websocket API which it will provide real time trade data.

Data format:

{
  "id": 1,
  "method": "subscribe",
  "code": 0,
  "result": {
    "instrument_name": "BTCUSD-PERP",
    "subscription": "trade.BTCUSD-PERP",
    "channel": "trade",
    "data": [{
      "d" : "2030407068",    // Trade ID
      "t": 1613581138462,    // Trade time
      "p": "51327.500000",   // Price
      "q": "0.000100",       // Quantity
      "s": "SELL",           // Side
      "i": "BTCUSD-PERP"     // Instrument name
    }]
  }
}

Data Processing

Data is processed into candlestick (shown below) using Window operator (TumblingEventTimeWindows) with window size of 1 min, 5 min, 15 min, 30 min, 1 hour, 4 hours and 1 day.

{
  "symbol": "XRP_USDT",
  "timestamp": 1613581138462, // Candle time
  "open": 0.505,
  "high": 0.554,
  "low": 0.490,
  "close": 0.535,
  "volume": 1
}

Window Trigger

The default trigger for TumblingEventTimeWindows are EventTimeTrigger. However, for this use case the candle should be processed immediately whenever the window max timestamp is reached. So, the trigger is modified to ProcessingTimeTrigger.

For the subsequence window function, ProcessingTimeTrigger does not suit the use case because it may fire the window twice when it reach the timestamp because window are process sequentially so when the window fire for first time the latest result of previous window is not ready. So custom trigger function is used where if the EventTime pass the window max timestamp it will fire. For implementation detail please refer to CandleTrigger.java.

Window Process Function

Due to the possible of late data, a window may get triggered multiple time results in downstream task to receive multiple result of the same period candle. So before constructing the candlestick duplicate timeframe candle is removed will preserving the latest processed data. For implementation detail please refer to AggCandleFunction.java.

Screenshot

Some problem/improvement

The job maybe can be refactor into non keyed stream (each trade symbol one kafka topic and flink job) to reduce the triggering of keyBy function.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
cassandra/scripts		cassandra/scripts
dockerfiles		dockerfiles
flink/flink-processing		flink/flink-processing
pic		pic
python_script		python_script
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChartDataPipeline

What it do

How it work

Data Source

Data Processing

Window Trigger

Window Process Function

Screenshot

Some problem/improvement

About

Releases

Packages

Languages

yuanqingyeoh/ChartDataPipeline

Folders and files

Latest commit

History

Repository files navigation

ChartDataPipeline

What it do

How it work

Data Source

Data Processing

Window Trigger

Window Process Function

Screenshot

Some problem/improvement

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages