Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeIterator for processing Runnable in Thread.start #229

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

errt
Copy link
Collaborator

@errt errt commented Nov 7, 2024

Fixes #227

@errt errt requested a review from johannesduesing November 7, 2024 12:39
Copy link
Collaborator

@johannesduesing johannesduesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that was fast, thanks for looking into the issue! 🤓

Code-wise these changes look good to me, and the initial issue has been fully solved. However, since i had my testbench still available from debugging the issue #227, i did some further tests and ran into weird behavior. I initialized a Project (same Demo.class as in #227) in three different ways:
Only the Project Class

val project = Project(new java.io.File("Demo.class"))

Project and JDK (Interfaces)

val project = Project(new java.io.File("Demo.class"), RTJar)

Project and JDK (full)

val libFiles = Java17Framework.AllClassFiles(Seq(RTJar))
val projectFiles = Java17Framework.AllClassFiles(Seq(new java.io.File("Demo.class")))
val project = Project(projectFiles, libFiles, libraryClassFilesAreInterfacesOnly = false)

I tested each instance with the default domain and with explicitly setting the L2 domain, and used different CG Keys to find out whether they'd mark println as reachable. These are my results:

Default Domain L2 Domain
Only Project File CHA, RTA, XTA, CFA_1_1 CHA, RTA, XTA, CFA_1_1
Project and RTJar (Interfaces) CHA CHA
Project and RTJar (Full) None None

The initial issue seems solved: If i'm only loading the project itself, all CG-Keys i tested find the correct invocation.

The issue with not finding the correct call when fully loading the RTJar might be due to the way i load the ClassFile instances - during debugging i saw that they did not rewrite Lambdas and used an INVOKEDYNAMIC instead of an INVOKESTATIC which the other two ways of building the project did.

However, i am very confused about the fact that when i'm loading the RTJar with interfaces only, CHA will detect the invocation, but RTA, XTA and CFA_1_1 will not. I will try to understand what's happening here, but so far i don't know why loading library files would influence the results of this computation.

@errt
Copy link
Collaborator Author

errt commented Nov 7, 2024

That is weird behavior indeed. Would be helpful if you could debug why that happens. As a first check, I think it would be interesting what happens in the full RTJar case if you do rewrite invokedynamic.
Then again, maybe we should look into whether we can make this make even with invokedynamic not rewritten, since this will probably be a common case for threads and also for other APIs such as doPrivileged.

@johannesduesing
Copy link
Collaborator

After some more debugging i found what i think is the reason for RTA (and i guess XTA and CFA_1_1) not working when loading the RTJar with interfaces only. The reason is not the "complicated" part of resolving the Thread.start -> Runnable.run relation - that actually works perfectly - it's resolving PrintStream.println.

The typeIterator in org.opalj.tac.fpcs.analyses.cg.CallgraphAnalysis.scala:156 never knows that java/io/PrintStream is instantiateable - which makes sense since its constructor is never called when the RTJar implementations are not loaded. I don't know if there should be some handling for this in place, or if this is desired behavior.

I'll go on looking into the issue when loading the RTJar fully.

@errt
Copy link
Collaborator Author

errt commented Nov 7, 2024

I thought we had things like PrintStream configured in the tac reference.conf, but it seems this is not really the case if the RTJar is not fully loaded. Maybe we need to look into whether and how we should configure more such types in that case. Why does it work without the RTJar present at all, though?

@johannesduesing
Copy link
Collaborator

  val mResult = if (classHierarchy.isInterface(declType).isYes)
      org.opalj.Result(project.resolveInterfaceMethodReference(declType, call.name, call.descriptor))
  else
      org.opalj.Result(project.resolveMethodReference(
          declType,
          call.name,
          call.descriptor,
          forceLookupInSuperinterfacesOnFailure = true
      ))

  if (mResult.isEmpty) {
      unknownLibraryCall(
          callContext,
          call.name,
          call.descriptor,
          call.declaringClass,
          isStatic = false,
          declType,
          callContext.method.definedMethod.classFile.thisType.packageName,
          pc,
          calleesAndCallers
      )
  } else if (isMethodOverridable(mResult.value).isYesOrUnknown) {
      calleesAndCallers.addIncompleteCallSite(pc)
  }

This piece of code from CallGraphAnalysis.scala is the reason it works without the RTJar present. It triggeres a call to unknownLibraryCall if no declared method can be found for the call, which ultimately adds the println call to the CG. Of course, if the RTJar is loaded (even as iterfaces), mResult will not be empty, and unknownLibraryCall will not be invoked.

@errt
Copy link
Collaborator Author

errt commented Nov 7, 2024

So essentially the situation is this:

  • The call to the lambda is correctly resolved
  • When resolving PrintStream.println, the is not resolved for anything but CHA because PrintStream is not considered instantiated
  • But if the RTJar isn't present, the code above considers it a possibility that there could be a different, unknown, println method and thus adds this kind of fallback call
    Did I understand that correctly?

If yes, this seems like intended behavior for the most part but we should look into whether further modelling of JDK internals would be sensible in order to make types like PrintStream considered instantiated. I still wonder if it does work with the full RTJar code if invokedynamics are rewritten - does the modelling that we have then cause PrintStream to be considered instantiated?

As a second issue besides potentially better modelling of JDK internals, we might want to look into making these call graph analyses work at least for some cases of invokedynamic.

Do you want to look into these issues?

@johannesduesing
Copy link
Collaborator

I created #231 and #230 to document these issue, i will look into them and see if i can find a solution. I will also have a look into loading the RTJar with code when dynamics are rewritten and report back here.

You did understand the situation correctly, so i guess everything else is working as intended - i'll approve this PR as soon as i got some insights on loading the full RTJar 👍

@johannesduesing
Copy link
Collaborator

For these tests, i initialized my project to rewrite invokedynamics by doing this:

    val libFiles = Project.JavaClassFileReader.AllClassFiles(Seq(RTJar))
    val projectFiles = Project.JavaClassFileReader.AllClassFiles(Seq(new java.io.File("Demo.class")))
    val project = Project(projectFiles, libFiles, libraryClassFilesAreInterfacesOnly = false)

I confirmed that the dynamic invocation is in fact rewritten as expected. I found that both CHA and RTA work in this setting (i.e. resolve the lambda call and the call to println), so we do detect that PrintStream is instantiated for RTA.

However, for XTA it again does not find the call to println, regardless of which AI domain is used - it does resolve the lambda invocations correctly though. I honestly did not yet understand what the issue is - i'm having a hard time debugging this because placing conditional breakpoints in the respective methods incurs a huge performance overhead. I'll try some more, but i don't know if i'll find anything today.

I also don't know if CFA_1_1 (or CFA_1_0) work for the full RTJar, my laptop takes way to long computing those CGs... 🥹

@errt
Copy link
Collaborator Author

errt commented Nov 8, 2024

CFA_1_0 at least should work (but maybe not on 'your laptop', it is memory intensive).

If RTA works but XTA doesn't there are two main possible reasons: XTA doesn't properly treat PrintStream as instantiated because its InstantiatedTypesAnalysis works differently or the type PrintStream doesn't reach the type set for the calling method. Both could be checked by inspecting XTA's type sets after running the CG.

If conditional breakpoints incur a large performance penalty, try putting the condition into the actual code, just place an empty println after it and put an unconditional breakpoint on that. That usually doesn't have a noticeable impact on performance.

@johannesduesing
Copy link
Collaborator

Okay, so here's a few things i found:

  • XTA does not consider PrintStream to be instantiated when processing the call inside the lambda function, the type is not contained in the respective set
  • At some point in the JDK, the tac.fpcf.analyses.cg.xta.InstantiatedTypesAnalysis does detect instantiations of PrintStream, for many JDK methods it is in fact contained in the set of instantiated types.
  • The br.analyses.cg.ConfigurationInstantiatedTypesFinder loads no preconfigured types from the configuration at all - at least for my setup there seem to be no entries for the respective configuration key in the global configuration file.

I didn't yet look into where in the JDK PrintStream is supposed to be initialized, or where RTA found the initialization site, but i think that's what i'll check next.

Copy link
Collaborator

@johannesduesing johannesduesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining issues i had don't really originate from the changes in this PR, i think they are there own separate issue that we documented in #230. This PR restores the functionality of the ThreadRelatedCallsAnalysis as expected and looks good to me 👍

@errt errt added this pull request to the merge queue Nov 13, 2024
Merged via the queue into develop with commit c5c4eae Nov 13, 2024
2 checks passed
@errt errt deleted the fix/type-iterator-for-thread-runnable branch December 13, 2024 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling of Runnables in Threads when building Call Graphs
2 participants