See big-data.md
Start by validating data formats for correctness.
Scripts for this can be found in both the DevOps-Python-tools and DevOps-Bash-tools repos.
Then proceed to more advanced content validation.
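
As a rough illustration of that first step, here is a minimal sketch of format validation using only the Python standard library. It is not the code from the repos mentioned above; the function names `is_valid_json` and `is_valid_csv` are hypothetical, and the CSV rule (every row must have the same number of fields) is an assumption about what "correct" means for your data.

```python
import csv
import io
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON document (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_csv(text: str, delimiter: str = ",") -> bool:
    """Return True if text is non-empty CSV with the same field count on every row.

    Assumption: a consistent column count is the definition of 'valid' here.
    """
    try:
        # strict=True makes the parser raise csv.Error on malformed quoting
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, strict=True))
    except csv.Error:
        return False
    if not rows:
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows)
```

Checks like these only confirm that the data is well-formed; content validation (value ranges, referential integrity, and so on) would be a separate step afterwards.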
- DBT - open-source data pipeline workflow tool
- Informatica - proprietary legacy tool, now available as SaaS, with self-hosted agents on VMs or Kubernetes
- Airbyte - open-source self-hosted or proprietary SaaS, with 300+ connectors
- Apache Camel - open source with 100+ connectors
- Spring Integration - XML config, only use in Spring-heavy shops
- MuleSoft - XML config, only use if you need its proprietary connectors
  - lightweight enterprise service bus + integration framework
  - proprietary connectors
  - Anypoint Studio (Eclipse-based IDE)
  - Anypoint Enterprise Security - security features, transactions
TODO
See the Diagrams and Visualization docs.
Ported from private Knowledge Base pages 2016+