Curated examples for Data Mesh guiding values, an operating model, and global policies to support a federated governance group.
We want this to be an open-source collection of policy examples, driven by the community. Contribute by submitting a pull request on the GitHub repository.
The data mesh governance group consists of representatives from the domain teams and the data platform team.
They are temporarily supported by subject-matter experts who address special issues, e.g., legal, compliance, and security concerns.
Together, they make sure that data products in the mesh are interoperable and can be used securely. To achieve this, they agree on a small set of architectural decisions and global policies. To make the policies easy for domain teams to implement, they specify requirements that allow the data platform to automate the policies as much as possible.
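Automating a policy usually means turning it into a check that the platform can run against every data product. The Python sketch below shows what that might look like for three of the global policies: mandatory ownership information, mandatory tags, and retiring data products that have been unused for six months. Several parts are assumptions made for illustration only: the `DataProduct` fields, the `REQUIRED_TAGS` values, and the `check_policies` function are hypothetical. Six months is approximated as 182 days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set

# Hypothetical, minimal descriptor of a data product. The field names are
# illustrative assumptions, not taken from any specification.
@dataclass
class DataProduct:
    name: str
    owner: Optional[str]  # owning domain team; None if not recorded
    last_used: date       # most recent read, or publication date if never read
    tags: Set[str] = field(default_factory=set)

# Hypothetical choice of mandatory tags; each governance group defines its own.
REQUIRED_TAGS = {"domain", "classification"}

# "Retire unused data products after 6 months", approximated as 182 days.
MAX_IDLE = timedelta(days=182)


def check_policies(product: DataProduct, today: date) -> List[str]:
    """Return human-readable policy violations; an empty list means compliant."""
    violations = []
    if not product.owner:
        violations.append("missing ownership information")
    missing_tags = REQUIRED_TAGS - product.tags
    if missing_tags:
        violations.append(f"missing mandatory tags: {sorted(missing_tags)}")
    if today - product.last_used > MAX_IDLE:
        violations.append("unused for more than 6 months: candidate for retirement")
    return violations


if __name__ == "__main__":
    orders = DataProduct("orders", owner=None, last_used=date(2024, 1, 1), tags={"domain"})
    for violation in check_policies(orders, today=date(2024, 9, 1)):
        print(violation)
```

The check returns a list of violations rather than raising an error on the first one. A platform could then report every problem of a data product at once, for example in a dashboard or a pull-request comment.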
Guiding values are the fundamental beliefs we agree on when implementing data mesh governance. They guide us to make the right choices and give justification for our decisions.
- Promote the usage of data products
- Optimize experience for generalist majority
- Standardize for interoperability
- Enforce consistent security
- Design for automation
The operating model defines the structure and processes of the data mesh governance group. Once the group and its members are in place, its first meeting should decide on the collaboration mode, the decision-making process, the communication channels, and a policy repository.
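The three decision-making options listed below (consent, consensus, and democratic) differ mainly in how a single proposal is resolved. The Python sketch below makes the difference explicit. It rests on a few assumptions for illustration: the `Vote` values and the `decide` function are hypothetical, consensus is read strictly as "everyone supports", and democratic means a simple majority of all members.

```python
from enum import Enum
from typing import Dict


class Vote(Enum):
    SUPPORT = "support"
    ABSTAIN = "abstain"  # no strong opinion; does not block
    OBJECT = "object"    # reasoned objection


def decide(mode: str, votes: Dict[str, Vote]) -> bool:
    """Return True if a proposal is accepted under the given decision mode.

    `votes` maps each member (e.g., a domain team) to its vote.
    """
    cast = list(votes.values())
    if mode == "consent":
        # Accepted unless someone raises an objection.
        return Vote.OBJECT not in cast
    if mode == "consensus":
        # Interpreted strictly here: every member must actively support it.
        return all(v is Vote.SUPPORT for v in cast)
    if mode == "democratic":
        # Simple majority of all members; abstentions count as "not in favor".
        supporters = sum(1 for v in cast if v is Vote.SUPPORT)
        return supporters * 2 > len(cast)
    raise ValueError(f"unknown decision mode: {mode!r}")
```

Consider a group where two domain teams support a proposal and one abstains. The proposal passes under consent and under a democratic vote, but not under strict consensus. This difference explains why consent is often chosen to keep decisions moving.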
- Collaboration mode
  - Regular online meetings
  - Local Data Groups
  - Asynchronous collaboration (no meetings)
- Decision making
  - Consent
  - Consensus
  - Democratic
- Communication channels
  - Microsoft Teams Channels
  - Slack Channels
  - Email Lists
- Policy repository
  - Data Mesh Manager
  - Confluence
  - Git
- Data Product Specification
- Data Contract Specification
- Address scheme
- File Format
- Partitioning Keys
- Timestamp as ISO-8601 Strings
- Money amounts in cents as integers
- Common IDs
- Well-known Field Names
- Bitemporal Timestamp Fields
- Naming Conventions (environment, database, table, column, file, bucket, ...)
- Project structure
- Environments
  - Production only
  - Multiple Isolated Environments
- Central Governance Account
- Separate Account per Domain Team
- Separate Database per Domain Team
- Separate Schema per Domain Team
- Data Product Inventory
  - Confluence Wiki Page
  - Data Mesh Manager
  - Backstage
  - LeanIX
  - Custom Web Application
  - Data Catalog
- Data Catalog
  - AWS Glue Data Catalog
  - GCP Dataplex
  - Azure Purview
  - Databricks Unity Catalog
  - Collibra
  - Atlan
- Tagging Tables as Data Products
- Mandatory Ownership Information
- Mandatory Tags
- Retire unused data products after 6 months
- Minimum quality level of a data product
- Documentation of data products
  - Wiki
  - Data Catalog
- Mandatory Fields for Data Products
- Schema Format
- Access Request
  - Ticket with manual steps
  - Decentralized self-service via Pull Requests
  - Central self-service app with decentralized handlers
- Access granted through AWS IAM Policies
- ACLs managed by domain teams
- Reassess after x months
- One domain publishes consents as a data product
- Data Classification
- PII data separation
- PII Anonymization
- Data Stored in Customer's Business Region
- PHI (Protected Health Information)
- Data Retention Periods
- Right to be Forgotten By Tombstone Events
- Politically exposed person (PEP)
- People in witness protection program
- Encryption at Rest
- Encryption in Transit
- VPC
- Observability Metrics
- Cost reporting
- Data Product Creation
- Self-service app (Backstage.io)
- Tutorials/guides
- Ownership for New Data Products
- Ownership for Legacy Data Products
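Two of the interoperability policies above, timestamps as ISO-8601 strings and money amounts in cents as integers, can be checked mechanically on every record a data product publishes. The integer rule avoids binary floating-point rounding: in Python, `0.1 + 0.2` evaluates to `0.30000000000000004`. The sketch below assumes records arrive as Python dictionaries. The `check_conventions` function and the field names in the usage note are hypothetical. It relies on `datetime.fromisoformat`, which only approximates full ISO 8601 validation.

```python
from datetime import datetime
from typing import Any, Dict, List


def check_conventions(
    record: Dict[str, Any],
    timestamp_fields: List[str],
    money_fields: List[str],
) -> List[str]:
    """Return convention violations for one record; empty list means compliant."""
    violations = []
    for name in timestamp_fields:
        value = record.get(name)
        if not isinstance(value, str):
            violations.append(f"{name}: expected an ISO 8601 string, got {value!r}")
            continue
        try:
            # fromisoformat covers common ISO 8601 forms; before Python 3.11
            # it rejects some valid ones, such as a trailing "Z".
            datetime.fromisoformat(value)
        except ValueError:
            violations.append(f"{name}: not an ISO 8601 timestamp: {value!r}")
    for name in money_fields:
        value = record.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            violations.append(f"{name}: expected an integer amount in cents, got {value!r}")
    return violations
```

Here is how it behaves on two hypothetical records. `{"order_timestamp": "2024-03-01T12:30:00+00:00", "total_amount_cents": 1999}` passes. `{"order_timestamp": "01/03/2024", "total_amount_cents": 19.99}` is reported twice: once because the date is not ISO 8601, and once because the amount is a decimal float instead of an integer number of cents.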
Defining the architecture of the data platform is not the federated governance group's job. However, platform decisions affect global policies and vice versa, e.g., how policies can be automated and monitored. The governance group therefore has to keep track of all decisions related to the data platform.
- AWS S3 as Storage for Data Products
- AWS Athena as Query Engine
- AWS Redshift as Data Platform
- GCP BigQuery as Data Platform
- GCP Cloud Storage as Storage for Data Products
- Azure Synapse Analytics as Data Platform
- Azure ADLS as Storage for Data Products
- Snowflake as Data Platform
- Databricks as Data Platform
- Presto as On-Premises Query Engine
- MinIO as On-Premises Storage for Data Products