Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hotfix][clients] Refactor PackagedProgram#getJobJarAndDependencies to reduce some redundant lines. #26238

Merged
merged 1 commit into from
Mar 26, 2025

Conversation

RocMarshal
Copy link
Contributor

What is the purpose of the change

[hotfix][clients] Refactor PackagedProgram#getJobJarAndDependencies to reduce some redundant lines.

Brief change log

[hotfix][clients] Refactor PackagedProgram#getJobJarAndDependencies to reduce some redundant lines.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 1, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build


private static List<URL> getJobJarAndDependencies(
URL jarFile, List<File> extractedTempLibraries) {
List<URL> libs = new ArrayList<>(extractedTempLibraries.size() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I am curious why we initialize the list with a + 1. Is this in case we need to add the Python lib?
If so, I wonder if we should use isPython to allocate the correct size - ie. without Python we would not have the + 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @davidradl for the comments, That sounds good to me.
I updated it. PTAL :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to hardcode it to extractedTempLibraries.size() + 2 instead of checking isPython or not for 2 reasons:

  1. The expected list size still needs to +1 if (jarFile != null), so I don't think check isPython is enough if we'd like to initialize a correct size.
  2. If we'd like to initialize a proper list size, we need to consider both jarFile and isPython.
    • However, I think it's pretty complicated if checking both.
    • The code base is client related code base instead of data processing, so it's not performance sensitive code.

As performance-awareness part[1] mentioned in Flink code style document:

We can conceptually distinguish between code that “coordinates” and code that “processes data”. 
Code that coordinates should always favor simplicity and cleanness. 
Data processing code is highly performance critical and should optimize for performance.

I agree it, the simplicity and cleanness are more important than performance for coordinates related code. So a simple implementation is + 2 directly. It only requires a little little memory, and it's compatible with both jarFile and isPython.

[1] https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#performance-awareness

@RocMarshal
Copy link
Contributor Author

Hi, @1996fanrui Could you help take a look at the refactor?
Many thx! :)

@1996fanrui 1996fanrui self-assigned this Mar 17, 2025
Copy link
Member

@1996fanrui 1996fanrui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, and left a minor comment~


private static List<URL> getJobJarAndDependencies(
URL jarFile, List<File> extractedTempLibraries) {
List<URL> libs = new ArrayList<>(extractedTempLibraries.size() + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to hardcode it to extractedTempLibraries.size() + 2 instead of checking isPython or not for 2 reasons:

  1. The expected list size still needs to +1 if (jarFile != null), so I don't think check isPython is enough if we'd like to initialize a correct size.
  2. If we'd like to initialize a proper list size, we need to consider both jarFile and isPython.
    • However, I think it's pretty complicated if checking both.
    • The code base is client related code base instead of data processing, so it's not performance sensitive code.

As performance-awareness part[1] mentioned in Flink code style document:

We can conceptually distinguish between code that “coordinates” and code that “processes data”. 
Code that coordinates should always favor simplicity and cleanness. 
Data processing code is highly performance critical and should optimize for performance.

I agree it, the simplicity and cleanness are more important than performance for coordinates related code. So a simple implementation is + 2 directly. It only requires a little little memory, and it's compatible with both jarFile and isPython.

[1] https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#performance-awareness

@RocMarshal
Copy link
Contributor Author

RocMarshal commented Mar 17, 2025

Thank you @1996fanrui for your comments and explanations.
Nice idea! 👍
This change makes sense to me.
Updated.

…o reduce some redundant lines.

Co-authored-by: Rui Fan <[email protected]>
Co-authored-by: davidradl <[email protected]>
@RocMarshal
Copy link
Contributor Author

@flinkbot run azure

@RocMarshal
Copy link
Contributor Author

Hi, @davidradl could you help have a double-check ? Because your suggestions are important & indispensable to the pr. 👍

Any input is appreciated!

@1996fanrui
Copy link
Member

Let me merge it now as no more comments for over a week~

@1996fanrui 1996fanrui merged commit c6c90b8 into apache:master Mar 26, 2025
@RocMarshal
Copy link
Contributor Author

Thank you for your attention and review. @1996fanrui @davidradl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants