LicenseFinder cannot determine the license of some npm packages #840

Open
WIStudent opened this issue Jul 2, 2021 · 15 comments

@WIStudent

I tried running LicenseFinder on a large npm project and noticed that it could not determine the license of some npm packages (21 of 384). This surprised me because all 21 packages name a license (like MIT or Apache-2.0 for example) in their package.json file.

I tried reading through the source code to better understand how LicenseFinder determines the license of an npm package. Is it correct that LicenseFinder looks for a LICENSE file inside the package? I checked the 21 packages and none included a LICENSE file (although some included a LICENSE.md file). One example is vue-template-compiler.

If that is the case, I would suggest falling back to the license field inside the package.json if no LICENSE file could be found.

@cf-gitbot
Collaborator

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@WIStudent
Author

I guess I was wrong, hls.js contains both a LICENSE file and a license field in its package.json, but LicenseFinder cannot determine its license (should be Apache 2.0).

@WIStudent
Author

Well, this is strange: I created a new npm project and installed all dependencies for which LicenseFinder could not determine the license in the previous project. In this new project LicenseFinder could determine all those licenses.

@WIStudent
Author

I debugged a bit into it: Turns out

npm list --json --long

does not reliably include license fields in its output. In my bigger project it included the license field only for the root package but not for any of the dependencies. This then caused spec_licenses to be empty.

def initialize(npm_json)
  @json = npm_json
  @identifier = Identifier.from_hash(npm_json)
  @dependencies = deps_from_json
  super(@identifier.name,
        @identifier.version,
        description: npm_json['description'],
        homepage: npm_json['homepage'],
        spec_licenses: Package.license_names_from_standard_spec(npm_json),
        install_path: npm_json['path'],
        children: @dependencies.map(&:name))
end

I think a more reliable approach would be to take the license information from the package.json file of the actual package. Here is something I quickly hacked together (my first time writing Ruby code):

def initialize(npm_json)
  install_path = npm_json['path']
  p install_path # debug output
  # Prefer the package's own package.json; fall back to the npm list entry.
  package_path = install_path.nil? ? nil : File.join(install_path, "package.json")
  package_json = !package_path.nil? && File.file?(package_path) ? JSON.parse(File.read(package_path), max_nesting: false) : npm_json
  spec_licenses = Package.license_names_from_standard_spec(package_json)
  p spec_licenses # debug output
  @json = npm_json
  @identifier = Identifier.from_hash(npm_json)
  @dependencies = deps_from_json
  super(@identifier.name,
        @identifier.version,
        description: npm_json['description'],
        homepage: npm_json['homepage'],
        spec_licenses: spec_licenses,
        install_path: install_path,
        children: @dependencies.map(&:name))
end
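
The fallback in the hack above can also be pulled out into a small standalone helper, which makes it easier to try outside of LicenseFinder. This is only a sketch: package_metadata is a hypothetical name that does not exist in LicenseFinder, and it assumes the npm list entry carries the install directory under the 'path' key, as in the constructor above.

```ruby
require 'json'

# Hypothetical helper, not part of LicenseFinder: return the parsed
# package.json from the install directory when it can be read, otherwise
# return the `npm list` entry unchanged.
def package_metadata(npm_json)
  install_path = npm_json['path']
  return npm_json if install_path.nil?

  manifest = File.join(install_path, 'package.json')
  return npm_json unless File.file?(manifest)

  JSON.parse(File.read(manifest))
rescue JSON::ParserError
  # A malformed manifest should not abort the whole scan.
  npm_json
end
```

The result could then be handed to Package.license_names_from_standard_spec in place of npm_json.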

@timhaines

Hi @WIStudent - FWIW, license_finder does check for licenses in the package.json (via npm list) before checking for license files.

In the first instance, were the packages reported, but licenses not found? I noticed NPM v7 doesn't list packages beyond immediate dependencies by default. I opened an issue about it a while back. #834

Is it possible this change in behaviour in NPM explains what you're seeing?

@WIStudent
Author

@timani I switched a lot between npm versions, so I am not sure anymore which one I used in the tests above. But I just noticed that the output of npm list --json --long depends on the current npm version, the npm version that was used to install the project, and whether the -a option was included.

Install with v6.14.13 / run with v6.14.13 / no -a option

  • Transitive dependencies are included, packages inside the json file have the "license" field
  • license_finder reports 25 of 2264 with unknown licenses

Install with v6.14.13 / run with v7.18.1 / no -a option

  • No transitive dependencies are included, "license" field only exists for root package
  • license_finder reports 1 of 132 with unknown licenses

Install with v6.14.13 / run with v7.18.1 / with -a option

  • Transitive dependencies are included, "license" field only exists for root package
  • license_finder reports 11 of 2459 with unknown licenses

Install with v7.18.1 / run with v6.14.13 / no -a option

  • Transitive dependencies are included, packages inside the json file have the "license" field
  • license_finder reports 16 of 740 with unknown licenses

Install with v7.18.1 / run with v7.18.1 / no -a option

  • No transitive dependencies are included, "license" field only exists for root package
  • license_finder reports 11 of 132 with unknown licenses

Install with v7.18.1 / run with v7.18.1 / with -a option

  • Transitive dependencies are included, "license" field only exists for root package
  • license_finder reports 146 of 2236 with unknown licenses

Sadly I don't know how I got my original 21 of 384 unknown licenses.
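
For anyone who wants to reproduce this kind of comparison, here is a rough sketch that counts how many dependency entries the npm list output contains and how many of them lack a license key. license_coverage is a made-up helper name, and the sketch assumes the JSON nests packages under a "dependencies" object at every level. It counts entries in the tree rather than unique packages, so the totals will not match the license_finder numbers above exactly.

```ruby
require 'json'

# Hypothetical helper: walk the tree printed by `npm list --json --long`
# and count dependency entries and entries without a license key.
def license_coverage(node, counts = { total: 0, missing: 0 })
  (node['dependencies'] || {}).each_value do |dep|
    next unless dep.is_a?(Hash)

    counts[:total] += 1
    counts[:missing] += 1 unless dep.key?('license') || dep.key?('licenses')
    license_coverage(dep, counts) # recurse into transitive dependencies
  end
  counts
end

# Possible usage (runs npm in the current directory):
#   p license_coverage(JSON.parse(`npm list --json --long`))
```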

By the way, you can pass the -a flag to npm list using license_finder's --npm-options flag:

license_finder report --format=html --save=license-report.html --npm-options="\-a"

@timhaines

Are there actually over 2000 unique packages? I'd be interested in getting a (sanitized) copy of your package.json to have a play too.

@WIStudent
Author

@timhaines I cannot share the package.json unfortunately, it's a work-related project that's not open source. Basically it's an Android/iOS app using capacitor + vue and aws-amplify for backend communication. The dev dependencies mostly consist of testing (jest and cypress), linting (eslint), packaging (webpack, babel, typescript) and some dependencies for our own build/deploy scripts.

I checked the package-lock.json. According to the docs, when using npm7 the packages field should contain every unique package. It contains 1377 prod and 2788 dev packages. The package.json contains 54 dependencies and 77 devDependencies (although I just noticed that 3 dependencies should actually be devDependencies).

A while ago I created a simple vue3 project using the vue-cli to check out some new vue3 features. Although this project only has a total of 15 dependencies in its package.json, according to GitHub's dependency graph its dependency tree contains 1018 dependencies.

Because everything gets bundled by webpack, almost nothing of the dependency tree lands in the final output. This just gave me the idea to research whether there are any webpack plugins that can list the licenses of every package that gets bundled into the final output. I stumbled across LicenseFinder because we are using GitLab at work and GitLab seems to use it as a base for their own integrated license scanning.

@xtreme-shane-lattanzio
Contributor

I don't see where the code is looking into a LICENSE file. As @timhaines pointed out, it is getting it from the package.json itself. @WIStudent did your hack solution actually get you better output? If it did, we can look at getting a PR through, but I think there may be npm version issues that could be affecting this.

@WIStudent
Author

@xtreme-shane-lattanzio I think there are multiple issues with npm7:

The second point is the reason why I initially thought LicenseFinder would only search for LICENSE files. Because the output of npm list does not include any licenses, the spec_licenses passed to the constructor of the superclass is always an empty array. And because install_path is set to the directory of the package, LicenseFinder will then search this directory for licenses. At least that's the behavior that is documented in the superclass.

# Super-class that adapts data from different package management
# systems (gems, npm, pip, etc.) to a common interface.
#
# Guidance on adding a new system
#
# - subclass Package, and initialize based on the data you receive from the
#   package manager
# - if the package specs will report license names, pass :spec_licenses in the
#   constructor options
# - if the package's files can be searched for licenses pass :install_path in
#   the constructor options
# - otherwise, override #licenses_from_spec or #license_files
class Package

My hacky solution fixed my issue that the license was undefined for some of the found dependencies (but not the issue that not all dependencies were found). It did so by taking the install_path directory from the npm list output, looking for the package.json file inside that directory, and reading the license information from that package.json file.

npm itself admits that the output of npm list is not the best and warns that there will probably be significant changes with npm8.

Starting with npm7, the package-lock.json lists every package of the dependency tree in a flat "packages" object keyed by install path. The usage of npm7 can be detected by "lockfileVersion": 2. I think instead of relying on npm list it would be much simpler to take the paths to every installed package from the package-lock.json, follow these paths to each package.json, read the license information from the package.json file, and additionally pass the path as install_path, so that LicenseFinder can also search for license files (or whatever it does when both spec_licenses and install_path are present).
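
A minimal sketch of that idea, to make it concrete. It is not LicenseFinder code: licenses_from_lockfile is a hypothetical name, and it assumes that the keys of the "packages" field are paths relative to the project root (such as node_modules/foo), with the empty key standing for the root project. Only the plain "license" value is read; anything more elaborate would still need LicenseFinder's own normalization.

```ruby
require 'json'

# Hypothetical sketch: for every package recorded in a lockfileVersion 2
# package-lock.json, return its install path and the "license" value from
# that package's own package.json (nil when the manifest cannot be found).
def licenses_from_lockfile(project_dir)
  lock = JSON.parse(File.read(File.join(project_dir, 'package-lock.json')))
  packages = lock['packages'] || {}

  # The empty key is the root project itself, so skip it.
  packages.keys.reject(&:empty?).map do |rel_path|
    install_path = File.join(project_dir, rel_path)
    manifest = File.join(install_path, 'package.json')
    license = File.file?(manifest) ? JSON.parse(File.read(manifest))['license'] : nil
    { install_path: install_path, license: license }
  end
end
```

Entries that come back with a nil license (for example optional dependencies that were never installed) would still need to be handled by the caller.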

@xtreme-shane-lattanzio
Contributor

I think you have a good handle on this and I am not sure when we can prioritize this so please feel free to make a PR if you want to get this in!

@Mistic92

Mistic92 commented Feb 7, 2023

Hi, any updates regarding this? npm v9 is already out, and we now get an empty license list.

@WIStudent
Author

@Mistic92 I stopped using LicenseFinder and am now using webpack/rollup plugins instead to determine the packages that get bundled into the final output and their licenses.

@Mistic92

Mistic92 commented Feb 7, 2023

@WIStudent but webpack/rollup only covers a single language, and if we don't use these bundlers it won't work. Looks like Pivotal is not investing a lot of time in this tool anymore.

@WIStudent
Author

@Mistic92 There is another issue, #916, that asks for support for npm 7 and newer, but there doesn't seem to be any progress either. I guess most people who need license detection for npm dependencies moved to other solutions, like I did with webpack/rollup plugins.
