Create Cache class for exact, fuzzy, and semantic deduplication #384

Open
wants to merge 34 commits into base: main

sarahyurick
Collaborator

@sarahyurick sarahyurick commented Nov 19, 2024

TODO:

  • Exact deduplication files
  • Semantic deduplication files
  • Fuzzy deduplication files
  • Tutorials folder

Verified

This commit was signed with the committer’s verified signature.
sarahyurick Sarah Yurick
Signed-off-by: Sarah Yurick

@sarahyurick sarahyurick changed the title from "Global cache variable for exact, fuzzy, and semantic deduplication" to "Global cache_dir variable for exact, fuzzy, and semantic deduplication" Nov 19, 2024
sarahyurick and others added 6 commits November 19, 2024 16:13

run black

@sarahyurick sarahyurick added the gpuci Run GPU CI/CD on PR label Nov 20, 2024
@sarahyurick sarahyurick marked this pull request as ready for review November 20, 2024 23:27

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Dec 23, 2024

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Jan 3, 2025

@sarahyurick sarahyurick marked this pull request as draft January 22, 2025 00:54
@sarahyurick sarahyurick changed the title from "Global cache_dir variable for exact, fuzzy, and semantic deduplication" to "Create Cache class for exact, fuzzy, and semantic deduplication" Jan 22, 2025
@sarahyurick sarahyurick marked this pull request as ready for review January 23, 2025 22:22
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Jan 23, 2025
@sarahyurick sarahyurick requested a review from Maghoumi January 23, 2025 22:23

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Jan 29, 2025
@Maghoumi
Collaborator

Thanks so much for working on this change. What I like about this now is that it gives users the option to either use the same cache directory for anything that requires caching, or provide a specific directory if they don't want to re-use the same cache.

The cache class implementation is functional but not thread-safe. I don't think that's a blocking problem for this PR.

I didn't run the samples/tutorials, but I assume the change has been thoroughly verified?

@sarahyurick
Collaborator Author

Thanks! Yes, I tried to make as few breaking changes as possible. The examples and tutorials should all reflect those changes.
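
For readers following the thread, the behavior discussed above (one shared cache directory for every deduplication step, with an optional per-step override) can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `SharedCache`, `set_dir`, `get_dir`, `reset`, and `resolve_cache_dir` are hypothetical names, and the lock is an assumption added here to show one way the thread-safety concern could be addressed.

```python
import threading
from typing import Optional


class SharedCache:
    """Hypothetical process-wide holder for a shared cache directory."""

    # A single lock guards every read and write of the shared setting.
    _lock = threading.Lock()
    _cache_dir: Optional[str] = None

    @classmethod
    def set_dir(cls, path: str) -> None:
        with cls._lock:
            cls._cache_dir = path

    @classmethod
    def get_dir(cls) -> Optional[str]:
        with cls._lock:
            return cls._cache_dir

    @classmethod
    def reset(cls) -> None:
        with cls._lock:
            cls._cache_dir = None


def resolve_cache_dir(specific_dir: Optional[str] = None) -> str:
    """Prefer a step-specific directory; otherwise fall back to the shared one."""
    if specific_dir is not None:
        return specific_dir
    shared = SharedCache.get_dir()
    if shared is None:
        raise ValueError("No cache directory configured")
    return shared


if __name__ == "__main__":
    SharedCache.set_dir("./dedup_cache")
    print(resolve_cache_dir())                 # shared directory
    print(resolve_cache_dir("./fuzzy_cache"))  # explicit override wins
```

In this sketch a step that passes its own directory never touches the shared setting, so reusing the cache stays opt-in per step.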

sarahyurick and others added 3 commits February 18, 2025 14:35

run black

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 18, 2025

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 19, 2025

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 25, 2025

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 25, 2025

@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Feb 28, 2025
Labels
gpuci Run GPU CI/CD on PR

2 participants