A walk through licenses in PyPI

While doing license research for the Fedora Project, I went looking for other projects where I could do similar work. I code in Python, so I wanted to know which licenses are commonly used for Python projects. I had already researched Python packages and their licensing in Fedora land (I will write a blog post about that in the future). While writing code I came across the PyPI project, so I planned my next license analysis project around it.

I searched Google for more information about the PyPI project. The definition I found is the following:

PyPI is the canonical place where Python modules and packages are stored. If we dissect the term:
Py - Python,
P - package,
I - index.
PyPI works as a repository of software for the Python language. PyPI is otherwise known as the "Cheese Shop".

Importance of PyPI

It is the official software repository for Python packages, where developers can upload as well as download open source software written in Python. PyPI is blessed by the Python Software Foundation. Donald Stufft's enormous work keeps this project running.

Presently PyPI has 86351 Python packages. To see where the Python world is heading license-wise, it is the apt, and really the only, place to look.

My work

To give shape to my project, I chose the first 2500 packages from the PyPI ranking website. Then I used the JSON API of PyPI to get the license information of each package. I wrote a simple Python script for that. What I found was quite interesting:

The following numbers of packages stated their license as given below:

  • no name = 128 packages
  • Unknown = 287 packages

For these I tried to find the licenses manually, but still could not find the licenses of 49 packages. I am still working on it. So for now, in my study, I will consider these 49 packages as unknown.
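The per-package lookup described above can be sketched roughly as follows. This is only an illustration, not the script I actually used: the pypi.org URL pattern, the function names, and the rule that an empty field counts as "no name" while the literal string UNKNOWN counts as "Unknown" are all assumptions made for the example.

```python
import json
from urllib.request import urlopen

# Assumption: the JSON endpoint as served by pypi.org today.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def classify_license(metadata):
    """Return the declared license, or 'no name' / 'Unknown' when it is unusable.

    Assumption: an empty or missing field maps to 'no name' and the
    literal string UNKNOWN maps to 'Unknown'.
    """
    value = (metadata.get("info", {}).get("license") or "").strip()
    if not value:
        return "no name"
    if value.upper() == "UNKNOWN":
        return "Unknown"
    return value


def fetch_license(name):
    """Fetch one package's metadata from PyPI and classify its license field."""
    with urlopen(PYPI_JSON_URL.format(name=name)) as response:
        metadata = json.load(response)
    return classify_license(metadata)
```

Calling fetch_license("requests") needs network access. classify_license works on any dictionary shaped like the JSON response, so it can be tried offline as well.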

As is evident from the chart, BSD is the most used license, used in 655 packages, and MIT is the second, used in 567 packages.

11 packages mentioned their license as "Public Domain" and 7 packages as Unlicense. Very few packages have chosen the Python Software Foundation License. 7 packages mentioned their license as "Open Source Initiative approved" but did not name which license exactly they mean. 5 packages mentioned their license as Creative Commons. Expat is there for 7 packages.

Most of the developers I have met so far do not really like large legal texts. Yet for some strange reason a few developers chose to write the whole license text instead of just writing the license name, such as MIT or BSD.
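One way to count such packages under the right license is to collapse a pasted full text back to its short name before tallying. The sketch below is a hypothetical heuristic, not my actual method: the marker phrases are the well-known opening words of the MIT and BSD license texts, and the function names are made up for the illustration.

```python
from collections import Counter

# Hypothetical markers: opening phrases of the standard MIT and BSD texts.
FULL_TEXT_MARKERS = {
    "permission is hereby granted, free of charge": "MIT",
    "redistribution and use in source and binary forms": "BSD",
}


def normalize_license(value):
    """Map a pasted full license text to its short name; keep other values."""
    collapsed = " ".join(value.split())  # fold newlines and repeated spaces
    lowered = collapsed.lower()
    for marker, name in FULL_TEXT_MARKERS.items():
        if marker in lowered:
            return name
    return collapsed


def tally_licenses(values):
    """Count how many packages fall under each normalized license name."""
    return Counter(normalize_license(value) for value in values)
```

A match on the opening phrase is only a hint. A license field that quotes one of these phrases inside some other license would be miscounted, so anything this catches still deserves a manual look.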

Next Steps

Presently I am filing bugs and submitting patches against these packages to fix their licensing issues. I will be discussing various software licenses in my coming blog posts.
