Python: SqlCatalog (apache#7921)
* Initial code for pyiceberg JDBC Catalog

* pyiceberg JDBC Catalog PR modifications

* pyiceberg JDBC Catalog PR modifications

* pyiceberg JDBC Catalog PR modifications

* pyiceberg JDBC Catalog PR modifications

* Fix lint errors

* Migrate to SQLAlchemy, initial code

* Migrate to SQLAlchemy

* Migrate to SQLAlchemy

* Apply suggestions from code review

Co-authored-by: Fokko Driesprong <[email protected]>

* Update python/pyiceberg/catalog/sql.py

Co-authored-by: Fokko Driesprong <[email protected]>

* Finish PR review changes and port JDBCCatalog unit tests over

* Migrate to SQLAlchemy

* Fix lint issues

* Fix lint issues

* Fix lint issues

* Fix lint issues

* Add new namespace unit test

* Apply suggestions from code review

Co-authored-by: Fokko Driesprong <[email protected]>

* PR review fix

* Merge conflict

* In-sync with master

---------

Co-authored-by: cccs-eric <[email protected]>
Fokko and cccs-eric authored Jul 20, 2023
1 parent 0b95a2c commit 7da759b
Showing 7 changed files with 1,198 additions and 5 deletions.
2 changes: 1 addition & 1 deletion python/Makefile
@@ -17,7 +17,7 @@

 install:
 	pip install poetry
-	poetry install -E pyarrow -E hive -E s3fs -E glue -E adlfs -E duckdb -E ray
+	poetry install -E pyarrow -E hive -E s3fs -E glue -E adlfs -E duckdb -E ray -E sql-postgres

 check-license:
 	./dev/check-license
14 changes: 13 additions & 1 deletion python/mkdocs/docs/configuration.md
@@ -24,7 +24,7 @@ hide:

 # Catalogs

-PyIceberg currently has native support for REST, Hive and Glue.
+PyIceberg currently has native support for REST, SQL, Hive, Glue and DynamoDB.

 There are three ways to pass in configuration:

@@ -107,6 +107,18 @@ catalog:
 | rest.signing-region | us-east-1 | The region to use when SigV4 signing a request |
 | rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |

+## SQL Catalog
+
+The SQL catalog requires a database for its backend. As of now, pyiceberg only supports PostgreSQL through psycopg2.
+The database connection has to be configured using the `uri` property (see SQLAlchemy's [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls)):
+
+```yaml
+catalog:
+  default:
+    type: sql
+    uri: postgresql+psycopg2://username:password@localhost/mydatabase
+```
+
 ## Hive Catalog

 ```yaml
332 changes: 329 additions & 3 deletions python/poetry.lock

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions python/pyiceberg/catalog/__init__.py
@@ -80,6 +80,7 @@ class CatalogType(Enum):
     HIVE = "hive"
     GLUE = "glue"
     DYNAMODB = "dynamodb"
+    SQL = "sql"


 def load_rest(name: str, conf: Properties) -> Catalog:
@@ -115,11 +116,21 @@ def load_dynamodb(name: str, conf: Properties) -> Catalog:
         raise NotInstalledError("AWS DynamoDB support not installed: pip install 'pyiceberg[dynamodb]'") from exc


+def load_sql(name: str, conf: Properties) -> Catalog:
+    try:
+        from pyiceberg.catalog.sql import SqlCatalog
+
+        return SqlCatalog(name, **conf)
+    except ImportError as exc:
+        raise NotInstalledError("SQLAlchemy support not installed: pip install 'pyiceberg[sql-postgres]'") from exc
+
+
 AVAILABLE_CATALOGS: dict[CatalogType, Callable[[str, Properties], Catalog]] = {
     CatalogType.REST: load_rest,
     CatalogType.HIVE: load_hive,
     CatalogType.GLUE: load_glue,
     CatalogType.DYNAMODB: load_dynamodb,
+    CatalogType.SQL: load_sql,
 }

@@ -142,6 +153,8 @@ def infer_catalog_type(name: str, catalog_properties: RecursiveDict) -> Optional
                 return CatalogType.REST
             elif uri.startswith("thrift"):
                 return CatalogType.HIVE
+            elif uri.startswith("postgresql"):
+                return CatalogType.SQL
             else:
                 raise ValueError(f"Could not infer the catalog type from the uri: {uri}")
         else:
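
The prefix-based inference touched by the last hunk can be sketched in isolation. This is a simplified stand-in and not the code from the commit: `infer_from_uri` is a hypothetical name, the enum is trimmed to three members, and the `"http"` prefix for REST is an assumption, since that condition falls outside the hunk shown above.

```python
from enum import Enum


class CatalogType(Enum):
    # Trimmed copy of the enum from the diff; only the members used below.
    REST = "rest"
    HIVE = "hive"
    SQL = "sql"


def infer_from_uri(uri: str) -> CatalogType:
    """Hypothetical helper that mirrors the if/elif chain in the hunk."""
    if uri.startswith("http"):  # assumption: this condition is not visible in the hunk
        return CatalogType.REST
    elif uri.startswith("thrift"):
        return CatalogType.HIVE
    elif uri.startswith("postgresql"):  # branch added by this commit
        return CatalogType.SQL
    raise ValueError(f"Could not infer the catalog type from the uri: {uri}")


# The documented example URI carries a driver suffix and still matches by prefix.
print(infer_from_uri("postgresql+psycopg2://username:password@localhost/mydatabase"))
```

Because the match is a plain prefix check, a URI such as `postgresql+psycopg2://...` resolves to the SQL catalog, while in this sketch any other scheme raises `ValueError`, so such a catalog would presumably need an explicit `type: sql` in its configuration.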