One of the most popular tools in data science today is the open-source programming language R. Simply put, R is the language of data. Over the last 20 years, statisticians all over the world have contributed their innovations to open-source R. These contributions give developers access to a large library of cutting-edge scientific algorithms that make it possible to rapidly build intelligent analytics applications.
Already we’re seeing the capabilities of R bear fruit across companies both new and traditional: Norway’s eSmart Systems has been deploying R-based forecasting models in the cloud to help optimize the country’s power grid using data from smart meters. American Century Investments is using R as the basis for its quantitative investment platform. The National Weather Service uses R in its River Forecast Centers to help predict flooding. Real-estate analysis company Trulia uses R to help predict home prices. R is part of Twitter’s Data Science Toolbox, used for monitoring the site’s user experience. The list goes on.
But despite this widespread use, we’re really just beginning to understand the power of today’s advanced statistical platforms. Over the next five to 10 years, we’re going to see machine learning and analytics drive intelligence in just about every software application, Internet device, and mobile phone. With so many challenges to solve, the industry must ensure it is putting the right tools into the hands of those looking for answers in these vast treasure troves of data.
While the R Foundation has helped foster pioneering work to support the development and distribution of the R language, there is much more to be done to enable developers worldwide to take full advantage of the possibilities of R in the enterprise. There are three major areas where industry support could help accelerate the progress of R:
1. Testing: The community would benefit from robust software testing methods and infrastructure that support the development of new versions of R packages. Ensuring high-quality release candidates and maintaining backward compatibility would improve the reproducibility and reliability of R-based code within the enterprise.
2. Scalability: Current popular implementations of R hold data in main memory, which limits the size of the datasets they can handle. However, many of the datasets analyzed today are far bigger than what fits in the memory of a single computer. Supporting efforts to make the language and its implementations natively scalable would make it easier for businesses dealing with extremely large datasets to take full advantage of this powerful scientific language.
3. Future-Proofing: R needs to constantly innovate to ensure that it can continue to be effective in current and future analytic environments such as Hadoop, Spark, and the next generation of databases. This will also require ongoing education and cooperative efforts with the R community and data developers around the world.
The elegance and flexibility of the R language for statistical programming have already enabled significant breakthroughs in finance, healthcare, social sciences, utilities, and manufacturing. With continued support for its development, we can expect to see revolutionary advances in the application of data science and statistics in the new connected world.
The recently announced R Consortium (of which Microsoft is a founding member) has a mission to shepherd the future of R in an open development environment. The R Consortium can help move R forward at a rapid pace that benefits every one of its fans. The consortium’s efforts will create fertile ground for data science to grow. With strong backing from the tech industry, the R Foundation and R Consortium can continue the work needed to make R a better language for today and the future.
To learn more about the possibilities of programming in R, visit the R Project. To learn about how the technology community is supporting the R language, visit the R Consortium.
Joseph Sirosh is the corporate vice president of the Information Management and Machine Learning (IMML) team in the Cloud and Enterprise group at Microsoft.