Starred repositories
Send usage data from your Python code to PostHog.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Use HuggingFace's official download tool to download at high speed from a mirror site.
A natural language interface for computers
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
Convert PDF to HTML without losing text or format.
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data…
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…
A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users.
Alluxio, data orchestration for analytics and machine learning in the cloud
Making large AI models cheaper, faster and more accessible
Official Code for DragGAN (SIGGRAPH 2023)
MinIO is a high-performance, S3-compatible object store, open sourced under the GNU AGPLv3 license.
Flink CDC is a streaming data integration tool
Monitorix is a free, open source, lightweight system monitoring tool.
Pentaho Data Integration (ETL), a.k.a. Kettle
APM, Application Performance Monitoring System
Upserts, Deletes And Incremental Processing on Big Data.
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
ClickHouse Java Clients & JDBC Driver