Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] BIT/VARBIT type conversion between pg and duckdb #628

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dentiny
Copy link
Contributor

@dentiny dentiny commented Feb 26, 2025

Resolves #597

Comment on lines 163 to 164
Datum pg_varbit = DirectFunctionCall3(varbit_in, CStringGetDatum(value_str.c_str()), ObjectIdGetDatum(VARBITOID),
Int32GetDatum(-1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar feedback as for time/timetz PR. I think it should be fairly easy to convert the internal representations directly to eachother, without having to go through string formatting/parsing.

Copy link
Contributor Author

@dentiny dentiny Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In duckdb BIT type is stored as VARCHAR physically.
https://github.com/duckdb/duckdb/blob/057f9753cbd3eb6d07543c2f1f1085de961fffc4/src/common/types.cpp#L120-L122

Even if we want to the parsing ourselves (from char* and size_t), we need to have a way to get pointer and its length.

So IMO, convert the Value into a string is a simple and reasonable implementation.

Copy link
Collaborator

@JelteF JelteF Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay looking closer at the way that the bit types are stored internally I now agree that going through the string representation is probably the best. I thought you could do something like this:

Do the same as ConvertBinaryDatumto get the raw value, except that you need to use some functions from duckdb its src/common/types/bit.cpp. I think you can do something very similar to what BitToBlob does: https://github.com/duckdb/duckdb/blob/057f9753cbd3eb6d07543c2f1f1085de961fffc4/src/common/types/bit.cpp#L175-L190

	duckdb::string_t str = value.GetValueUnsafe<duckdb::string_t>();
    // get padding
	// copy raw bytes

Afaict that would work fine, were it not for the fact that the padding bit in the PG representation are on the end of the bitstring, and not on the beginning like in DuckDB. So to copy you couldn't use a plain memcpy, because you'd need to copy the actual data with bit offset (that's not divisible by 8). That definitely seems too much complexity for this rather exotic data type. So yeah, let's go through the string representation instead.

Let's move this string conversion logic to pg/types.cpp though just like you did for VARINT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for explaining your thoughts! That's very helpful!

My initial thought is memcpy as well, I was just thinking if it's possible to avoid handling padding and byte order myself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code looks good. But it still needs conversion the other way around, i.e. from PG to DuckDB (just like you did for the time type).

Copy link
Contributor Author

@dentiny dentiny Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I thought about it, for postgres write-side the entrypoint type if BITOID / VARBITOID, when duckdb takes on the read side the type is BIT, so there's no 1-1 mapping between pg/duckdb types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping, doesn't have to be 1-1. As long as each type converts into something sensible in the other typesystem it's fine. For example, if you have a PG table that stores a BITOID, and we then we convert that to a DuckDB BIT, and then back to PG VARBITOID, yes you've lost a bit of type information. But that's still a lot better than not being able to read a BITOID columns at all. It's a tradeoff, and so far we're trading nice interoperability for strictly correct 1-1 mappings.

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feedback as for the time/timetz PR. This should also implement conversion from PG types to DuckDB types (both for bit and bit varying)

@dentiny dentiny changed the title BIT/VARBIT type conversion between pg and duckdb [WIP] BIT/VARBIT type conversion between pg and duckdb Feb 27, 2025
@dentiny dentiny marked this pull request as draft February 27, 2025 12:42
@dentiny dentiny changed the title [WIP] BIT/VARBIT type conversion between pg and duckdb BIT/VARBIT type conversion between pg and duckdb Feb 27, 2025
@dentiny dentiny requested a review from JelteF February 27, 2025 13:18
@dentiny dentiny marked this pull request as ready for review February 27, 2025 13:18
@dentiny dentiny changed the title BIT/VARBIT type conversion between pg and duckdb [WIP] BIT/VARBIT type conversion between pg and duckdb Feb 28, 2025
@dentiny dentiny marked this pull request as draft February 28, 2025 05:28
@@ -6,7 +6,6 @@ SELECT * exclude(
tinyint, -- PG14 outputs this differently currently
TIME,
time_tz,
bit,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs/types.md file should also be updated that these types are now supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for BIT type
2 participants