Investigate licenses we fail to detect on pub.dev #1015

Open
jonasfj opened this issue Feb 10, 2022 · 3 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@jonasfj
Member

jonasfj commented Feb 10, 2022

Results of running the license analysis across pub.dev (number of packages per detected license):

  11176 MIT
   4378 unknown
   4351 null
   4124 BSD-3-Clause
   2116 Apache-2.0
    949 BSD-2-Clause
    420 GPL-3.0
     83 MPL-2.0
     73 LGPL-3.0
     69 Unlicense
     58 AGPL-3.0
     25 BSD-2-Clause-Views
     21 LGPL-2.1
      7 WTFPL
      7 EUPL-1.2
      6 CPL-1.0
      6 CC0-1.0
      5 OFL-1.1
      4 Zlib
      4 BSL-1.0
      3 EPL-2.0
      3 CC-BY-4.0
      3 Artistic-2.0
      2 OpenSSL
      2 MulanPSL-2.0
      2 MulanPSL-1.0
      2 Hippocratic-2.1
      2 CC-BY-SA-4.0
      2 AFL-3.0
      1 X11
      1 W3C-20150513
      1 UPL-1.0
      1 MS-PL
      1 EPL-1.0
      1 CC-BY-SA-3.0
      1 CC-BY-NC-SA-3.0
      1 BSD-4-Clause

Code for running this analysis:

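```shell
# Print the license name detected for package $1 (jq prints "null" if the field is missing)
get_license() {
  curl -s "https://pub.dev/api/packages/$1/metrics" | jq -r '.scorecard.panaReport.licenseFile.name'
}
# Export the function so GNU parallel can call it
export -f get_license

# Look up the license of every package, 50 requests at a time
curl -s https://pub.dev/api/package-names | jq -r '.packages[]' | parallel -j 50 get_license > /tmp/detected-licenses

# Count packages per license, most common first
sort /tmp/detected-licenses | uniq -c | sort -n -r
```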

This should be easy to tweak so that it also prints the package names.
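One possible version of that tweak is sketched below. It assumes the same metrics endpoint and jq path as the script above; `get_license_with_name` and `list_undetected` are hypothetical helper names. Each output line is `<package><TAB><license>`, so the packages whose license came back as `unknown` or `null` can be listed directly.

```shell
# Hypothetical helper: print "<package><TAB><license>" for package $1,
# using the same (assumed) metrics endpoint and jq path as above.
get_license_with_name() {
  local name
  name=$(curl -s "https://pub.dev/api/packages/$1/metrics" \
    | jq -r '.scorecard.panaReport.licenseFile.name')
  printf '%s\t%s\n' "$1" "$name"
}

# Hypothetical helper: read "<package><TAB><license>" lines on stdin and
# print the names of packages whose license is "unknown" or "null".
list_undetected() {
  awk -F '\t' '$2 == "unknown" || $2 == "null" { print $1 }'
}

# Usage (just as traffic-heavy as the original script):
# export -f get_license_with_name   # so GNU parallel can call it
# curl -s https://pub.dev/api/package-names | jq -r '.packages[]' \
#   | parallel -j 50 get_license_with_name > /tmp/licenses-by-package.tsv
# list_undetected < /tmp/licenses-by-package.tsv | sort
```

Keeping the package name on the same line as the license means the tally from the original script can still be recovered with `cut -f 2 /tmp/licenses-by-package.tsv | sort | uniq -c | sort -n -r`.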

@jonasfj
Member Author

jonasfj commented Mar 16, 2022

To download all raw license files from pub.dev, one can use a script like:

```shell
# Get all package names
get_all_package_names() { curl -s https://pub.dev/api/package-names | jq -r '.packages[]'; }

# Given a package name, get archive URL for latest version
get_archive_url() { curl -sL "https://pub.dev/api/packages/$1" | jq -r '.latest.archive_url'; }

# Given a package name, print the LICENSE file from the latest version (nothing if there is none)
get_license() { curl -sL "$(get_archive_url "$1")" | tar -xzO --ignore-case LICENSE 2> /dev/null; }

# Given a package name, download license to LICENSE-<package>.txt,
# removing the file again if the package has no LICENSE
download_license() {
  get_license "$1" > "LICENSE-$1.txt"
  [ -s "LICENSE-$1.txt" ] || rm -f "LICENSE-$1.txt"
}

# Export the functions so GNU parallel can call them
export -f get_all_package_names
export -f get_archive_url
export -f get_license
export -f download_license

get_all_package_names | parallel -j 100 download_license
```

This is fairly traffic-heavy, so don't run it frequently 🤣

EDIT: Fixed the script to extract LICENSE instead of README.md 🙈
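Once the `LICENSE-<package>.txt` files are on disk, one rough way to see which license texts are common among them is to group the files by their first non-blank line. This is only a heuristic sketch, not part of the detection logic; `first_nonblank_line` and `tally_first_lines` are hypothetical helper names, and empty files (packages without a LICENSE) are skipped.

```shell
# Hypothetical helper: print the first non-blank line of file $1, with
# CR line endings and leading/trailing whitespace removed.
first_nonblank_line() {
  awk '{ sub(/\r$/, "") } NF { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); print; exit }' "$1"
}

# Hypothetical helper: tally the first lines of the given files, most
# common first, skipping empty files.
tally_first_lines() {
  local f
  for f in "$@"; do
    [ -s "$f" ] || continue
    first_nonblank_line "$f"
  done | sort | uniq -c | sort -n -r
}

# Usage, in the directory the download script wrote to:
# tally_first_lines LICENSE-*.txt | head -n 50
```

Many license texts start with a title line such as "MIT License" or "Apache License", so frequent first lines that rarely appear in the detected results are reasonable places to start investigating.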

@sigurdm
Contributor

sigurdm commented Mar 17, 2022

Nit: I think the script as presented gets README.md files, not LICENSE files.

@alestiago

Excluding packages without any license file, the following chart shows the license split as of October 2023.
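A split like this can be approximated from the `/tmp/detected-licenses` file produced by the first script. The sketch below assumes that `null` entries correspond to packages without a license file and leaves them out; it is one possible way to compute the percentages, not necessarily how the chart above was produced. `license_shares` is a hypothetical helper name.

```shell
# Hypothetical helper: read one license name per line on stdin and print
# each license's share in percent, ignoring blank lines and "null".
license_shares() {
  awk 'NF && $0 != "null" { count[$0]++; total++ }
       END { for (l in count) printf "%6.2f%% %s\n", 100 * count[l] / total, l }' \
    | sort -n -r
}

# Usage:
# license_shares < /tmp/detected-licenses
```

Since the percentages are computed only over non-`null` lines, they add up to roughly 100% apart from rounding.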

3 participants