Ever wonder which programming languages are most used in machine learning? How about which artificial intelligence (AI) and data science packages developers tap more frequently than all others? GitHub resolved a few of those mysteries today, in a follow-up to the 2018 Octoverse report it published in October.
The Microsoft-owned platform pulled data on contributions (e.g., pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request) made between January 1, 2018 and December 31, 2018. For the most-imported packages, it used data from GitHub’s dependency graph, which covers all public repositories and any private repositories that have opted in.
As for the top packages, NumPy, a package that supports mathematical operations on multidimensional data, is far and away the leader by volume: three-quarters of AI projects on GitHub use it. The next three most-imported packages (the scientific computing toolkit SciPy, the data analysis library pandas, and the visualization library matplotlib) are each used in over 40 percent of projects, as is scikit-learn, the fifth-most-imported package.
So what about the most popular open source machine learning projects? Google’s open source TensorFlow framework topped the list, followed by scikit-learn and two natural language processing projects, explosion/spaCy and RasaHQ/rasa_nlu. The next four top projects are focused on image processing: CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract.