Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor block adapter walker #8575

Merged
merged 5 commits into from
Feb 6, 2025
Merged

Conversation

N-o-Z
Copy link
Member

@N-o-Z N-o-Z commented Jan 31, 2025

Closes #8570

Change Description

Background

Remove usage of ingest store factory and catalog walker interface.
Catalog will now use underlying block adapter implementation of walker

@N-o-Z N-o-Z added exclude-changelog PR description should not be included in next release changelog msb labels Jan 31, 2025
@N-o-Z N-o-Z requested review from guy-har and a team January 31, 2025 17:55
@N-o-Z N-o-Z self-assigned this Jan 31, 2025
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link

github-actions bot commented Jan 31, 2025

E2E Test Results - Quickstart

11 passed

…-factory-walker-8570

# Conflicts:
#	pkg/catalog/catalog.go
#	pkg/ingest/store/factory.go
Copy link
Contributor

@guy-har guy-har left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice clean up 🧹
I'm assuming our integration tests already test the import flows for all adapters, but I'm asking just to be on the safe side, how was this tested?

return nil, err
}

err := VerifyAbsPath(uri.Path, l.path, l.allowedExternalPrefixes)
uriPath := strings.TrimSuffix(opts.StorageURI.Path, string(filepath.Separator))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something
Why is this suddenly required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was never called when creating a local walker so this validation never ran.
The Abs check is not really important, what is more important are the other checks for is in namespace and in allowed prefixes.

return NewAzureDataLakeWalker(client, false)
return NewAzureDataLakeWalker(client, opts.SkipOutOfOrder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed? bug you found along the way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - we just never used the GetWalker in that capacity. Now that we use it we need to propagate the value accordingly

}
walker, err := block.NewWalkerWrapperFromAdapter(c.BlockAdapter, repository.StorageID.String(), source.Path, block.WalkerOptions{StorageURI: uri})
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return err
return fmt.Errorf("creating object-store walker on path %s: %w", source.Path, err)

}
walker, err := block.NewWalkerWrapperFromAdapter(c.BlockAdapter, repository.StorageID.String(), params.SourceURI, block.WalkerOptions{StorageURI: uri, SkipOutOfOrder: true})
if err != nil {
return nil, nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, nil, err
return nil, nil, fmt.Errorf("creating object-store walker: %w", err)

@@ -2504,10 +2499,13 @@ func (c *Catalog) WriteRange(ctx context.Context, repositoryID string, params Wr
return nil, nil, err
}

// TODO (niro): Need to handle this at some point (use adapter GetWalker)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤩

@@ -412,13 +405,13 @@ func New(ctx context.Context, cfg Config) (*Catalog, error) {
workPool := pond.New(sharedWorkers, sharedWorkers*pendingTasksPerWorker, pond.Context(ctx))

return &Catalog{
Config: cfg.Config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this is this needed? it's not used anywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another PR - removed

pkg/block/adapter.go Show resolved Hide resolved
Comment on lines +234 to +239
func NewWalkerWrapper(walker Walker, uri *url.URL) *WalkerWrapper {
return &WalkerWrapper{
walker: walker,
uri: uri,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? I only see calls from tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only for tests

Comment on lines 241 to 250
func NewWalkerWrapperFromAdapter(adapter Adapter, storageID, path string, opts WalkerOptions) (*WalkerWrapper, error) {
walker, err := adapter.GetWalker(storageID, opts)
if err != nil {
return nil, fmt.Errorf("creating object-store walker on path %s: %w", path, err)
}
return &WalkerWrapper{
walker: walker,
uri: opts.StorageURI,
}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like to many ways to initiate the WalkerWrapper, IMO leaving only NewWalkerWrapper is nicer, NewWalkerWrapperFromAdapter not worth the 1 line of code it reduces. It's a matter of taste though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -867,6 +867,7 @@ func TestLakectlFsStat(t *testing.T) {

func TestLakectlImport(t *testing.T) {
// TODO(barak): generalize test to work all supported object stores
const IngestTestBucketPath = "s3://esti-system-testing-data/ingest-test-data/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move up or use inline

@N-o-Z
Copy link
Member Author

N-o-Z commented Feb 5, 2025

Very nice clean up 🧹 I'm assuming our integration tests already test the import flows for all adapters, but I'm asking just to be on the safe side, how was this tested?

The existing esti tests cover the changes for all blockstore types

@N-o-Z N-o-Z requested a review from guy-har February 5, 2025 21:43
Copy link
Contributor

@guy-har guy-har left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Approving, but please remove unused config

pkg/block/adapter.go Outdated Show resolved Hide resolved
Comment on lines 242 to 251
func NewWalkerWrapperFromAdapter(adapter Adapter, storageID, path string, opts WalkerOptions) (*WalkerWrapper, error) {
walker, err := adapter.GetWalker(storageID, opts)
if err != nil {
return nil, fmt.Errorf("creating object-store walker on path %s: %w", path, err)
}
return &WalkerWrapper{
walker: walker,
uri: opts.StorageURI,
}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed

SettingsManagerOption settings.ManagerOption
PathProvider *upload.PathPartitionProvider
}

type Catalog struct {
Config config.Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still here

@N-o-Z N-o-Z enabled auto-merge (squash) February 6, 2025 16:25
@N-o-Z N-o-Z merged commit 7017f27 into master Feb 6, 2025
38 checks passed
@N-o-Z N-o-Z deleted the task/catalog-remove-factory-walker-8570 branch February 6, 2025 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog msb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove usage of WalkerFactory and instead get walker from block adapter
2 participants