Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove code related to http caching #644

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Remove code related to http caching #644

wants to merge 2 commits into from

Conversation

JelteF
Copy link
Collaborator

@JelteF JelteF commented Mar 3, 2025

We had implemented our own http caching in #88 when nothing like that existed yet. Now there are two different threads that are implementing something like this in a better way than we have done. So this PR removes our custom caching code from our codebase. Instead in a future release we'll make sure that pg_duckdb works well with these at least one of these other caching solutions.

For reference the two projects that are currently implementing caching are:

  1. An official in-memory cache by DuckDB, which will eventually improved to also support on-disk caching: External File Cache duckdb#16463
  2. A community provided extension called cache_httpfs: https://github.com/dentiny/duck-read-cache-fs

@JelteF JelteF requested a review from wuputah March 3, 2025 15:46
duckdb_extension_load(httpfs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using the public one? (like json/icu?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we still want it statically compiled into the binary, and since httpfs is in a separate repo these days we have to get it like this afaik. If you know of some other way to do that, which does not require us to have an additional submodule, then I'm all for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't duckdb_extension_load(httpfs) just do exactly that?

For example with json:

y=# select * from duckdb.raw_query($$select extension_name, loaded, installed, install_mode, install_path from duckdb_extensions() where extension_name = 'json';$$);
NOTICE:  result: extension_name	loaded	installed	install_mode	install_path
VARCHAR	BOOLEAN	BOOLEAN	VARCHAR	VARCHAR
[ Rows: 1]
json	true	true	STATICALLY_LINKED	(BUILT-IN)


 raw_query
-----------

(1 row)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I thought JSON was also out of tree...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is it: 7579326

We had implemented our own http caching in #88 when nothing like that
existed yet. Now there are two different threads that are implementing
something like this in a better way than we have done. So this PR
removes our custom caching code from our codebase. Instead in a future
release we'll make sure that pg_duckdb works well with these at least
one of these other caching solutions.

For reference the two projects that are currently implementing caching
are:
1. An official in-memory cache by DuckDB, which will eventually improved
   to also support on-disk caching: duckdb/duckdb#16463
2. A community provided extension called `cache_httpfs`: https://github.com/dentiny/duck-read-cache-fs
@JelteF JelteF force-pushed the drop-caching-code branch from ad5ec30 to cbe5b24 Compare March 3, 2025 15:53
@wuputah
Copy link
Collaborator

wuputah commented Mar 3, 2025

I would prefer to see more progress on supporting community extensions (it's not clear whether community extensions are currently supported) and possibly supporting "persistent" settings (i.e. SET commands that are run when initializing DuckDB) before kicking this out, unless we have a strong reason that this has to go immediately.

That said, if we are going to remove / replace it -- which I do think we should -- I would certainly like to remove it before 1.0.

I would be curious, among those watching the repo, whether removing this feature without an immediate alternative would break their usage / use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants