Sam-Panda/aemo_fabric

A full end-to-end solution using a Fabric Lakehouse

AEMO (the Australian Energy Market Operator) manages the electricity and gas systems and markets across Australia. It publishes extensive data at a very granular level, updated every 5 minutes; see an example here: https://nemweb.com.au/Reports/CURRENT/
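Each report folder under that URL is a plain HTML directory listing of zip files, so it can be enumerated with a few lines of Python. The sketch below is illustrative only; the Dispatch_SCADA folder and the function name are assumptions, not code from this repo.

```python
# Illustrative only: list the report zips in one NEMWEB "CURRENT" folder.
# The Dispatch_SCADA folder is just an example; the listing is plain HTML.
import re
import requests

BASE = "https://nemweb.com.au/Reports/CURRENT/Dispatch_SCADA/"

def list_report_files(base_url: str) -> list[str]:
    html = requests.get(base_url, timeout=30).text
    # Each link in the listing ends with a PUBLIC_*.zip file name.
    return sorted(set(re.findall(r'href="[^"]*?([^"/]+\.zip)"', html, flags=re.I)))

print(list_report_files(BASE)[:3])
```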


Architecture

(architecture diagram)

How to

0- Create a Fabric Workspace

1- Connect to the GitHub repo from Fabric

2- Open Notebook 1, attach it to the Lakehouse, then run it. New data arrives at 5 am Brisbane time, and AEMO keeps an archive for 60 days (add a schedule to keep the data updated)
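For orientation, the kind of load Notebook 1 performs might look like the sketch below: reading extracted CSVs from the attached Lakehouse's Files area into a Delta table. The folder and table names are placeholders, not the repo's actual ones, and real AEMO CSVs carry extra header/footer rows that need more parsing than shown here.

```python
# Sketch only: load already-extracted CSVs from the attached Lakehouse's Files
# area into a Delta table. Assumes a Fabric notebook where `spark` is defined;
# the folder and table names are hypothetical.
df = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("Files/aemo/dispatch_scada/*.CSV")
)

df.write.format("delta").mode("append").saveAsTable("dispatch_scada")
```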

3- Open Notebook 2 and run it; it rebinds the semantic model and the report to the newly created Lakehouse
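Rebinding a report to a semantic model can be done with the documented Power BI REST endpoint "Reports - Rebind Report In Group". The sketch below shows that mechanism only; it is not the repo's Notebook 2 code, and the GUIDs and token are placeholders (how you obtain the AAD token depends on your environment).

```python
# Hedged sketch: rebind a Power BI report to a different semantic model using
# the documented Rebind Report In Group REST endpoint. All IDs are placeholders.
import requests

workspace_id = "<workspace-guid>"
report_id = "<report-guid>"
new_dataset_id = "<semantic-model-guid>"   # the model pointing at the new Lakehouse
token = "<aad-access-token>"               # AAD token for the Power BI API

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/reports/{report_id}/Rebind",
    headers={"Authorization": f"Bearer {token}"},
    json={"datasetId": new_dataset_id},
    timeout=30,
)
resp.raise_for_status()   # 200 OK means the report now reads from the new model
```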


Optional

4- Open Notebook 3 and attach a Lakehouse; turn on the data pipeline scheduler if you want 5-minute data.
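A 5-minute schedule only needs to pull files it has not seen before. One hypothetical way to track that from a notebook is sketched below; the watermark file and folders are made-up names, not the repo's implementation.

```python
# Hypothetical incremental download: keep a text file of already-ingested zip
# names in the attached Lakehouse (mounted at /lakehouse/default) and fetch
# only the new ones from the NEMWEB CURRENT folder.
import os
import requests

CURRENT = "https://nemweb.com.au/Reports/CURRENT/Dispatch_SCADA/"
LANDING = "/lakehouse/default/Files/aemo/dispatch_scada"
SEEN = "/lakehouse/default/Files/aemo/_seen_files.txt"

def download_new(file_names):
    os.makedirs(LANDING, exist_ok=True)
    seen = set(open(SEEN).read().split()) if os.path.exists(SEEN) else set()
    new = [n for n in file_names if n not in seen]
    for name in new:
        data = requests.get(CURRENT + name, timeout=60).content
        with open(f"{LANDING}/{name}", "wb") as out:
            out.write(data)
    with open(SEEN, "a") as f:
        f.writelines(n + "\n" for n in new)
```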

Lessons learnt

  • Use a data pipeline to schedule the jobs, so you can control concurrency and timeouts.

  • Develop using the starter pool, but for production use a single node to reduce capacity usage.

  • Direct Lake doesn't like too many small files; run OPTIMIZE to keep performance good.

  • Run VACUUM to remove old snapshots and reduce storage (both maintenance commands are sketched after this list).
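A minimal sketch of the maintenance those last two points refer to, run from a Fabric notebook with the Lakehouse attached (the table name is illustrative):

```python
# Compact small files so Direct Lake reads fewer, larger Parquet files.
spark.sql("OPTIMIZE dispatch_scada")

# Remove snapshots older than the retention window to cut storage.
# 168 hours (7 days) is the Delta default; shorter windows limit time travel.
spark.sql("VACUUM dispatch_scada RETAIN 168 HOURS")
```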
