Data science is a fast-growing field as organizations of all sizes embrace artificial intelligence (AI) and machine learning (ML), and that growth has brought no shortage of concerns.
The 2022 State of Data Science report, released today by data science platform vendor Anaconda, identifies key trends and concerns for data scientists and the organizations that employ them. One trend Anaconda identified is that the open-source Python programming language continues to dominate the data science landscape.
Among the key concerns identified in the report were the barriers to adopting data science overall.
“One area that did surprise me was that 2/3 of respondents felt that the biggest barrier to successful enterprise adoption of data science is insufficient investment in data engineering and tooling to enable production of good models,” Peter Wang, Anaconda CEO and cofounder, told VentureBeat. “We’ve always known that data science and machine learning can suffer from poor models and inputs, but it was interesting to see our respondents rank this even higher than the talent/headcount gap.”
AI bias in data science is far from a solved issue
AI bias is a well-known issue in data science. What is less well known is what organizations are actually doing to combat it.
Last year, Anaconda's 2021 State of Data Science report found that 40% of organizations were planning or taking steps to address bias. Anaconda didn't ask the same question this year, opting instead for a different approach.
“Instead of asking if organizations were planning to address bias, we wanted to look at the specific steps organizations are now taking to ensure fairness and mitigate bias,” Wang said. “We realized from our findings last year that organizations had plans in the works to address this, so for 2022, we wanted to look into what actions they took, if any, and where their priorities are.”
As part of their efforts to prevent AI bias, 31% of respondents said they evaluate data collection methods against internally set standards for fairness. By contrast, 24% said they have no standards for fairness and bias mitigation in datasets and models.
AI explainability is foundational to identifying and preventing bias. When asked which tools they use for AI explainability, 35% of respondents said their organizations run a series of controlled tests to assess model interpretability, while 24% said they have no measures or tools to ensure model explainability.
“While each response measure has less than 50% of these efforts in place, the results here tell us that organizations are taking a varied approach to mitigating bias,” Wang said. “Ultimately, organizations are taking action, they’re just early in their journey of addressing bias.”
How data scientists spend their time
Data scientists handle a range of tasks as part of their jobs.
While deploying models is the end goal, that's not where data scientists spend most of their time. The study found that data scientists spend only 9% of their time deploying models, and respondents reported spending the same share, 9%, on model selection.
The biggest time sink is data preparation and cleansing, which accounts for 38% of their time.
The love and fear relationship with open source
The report also asked data scientists how they use and view open-source software.
Eighty-seven percent said their organizations allow the use of open-source software. Yet despite that widespread use, 54% said they are worried about open-source security.
“Today, open source is embedded across nearly every piece of software and technology, and it’s not just because it’s cheaper in the long run,” Wang said. “The innovation occurring around AI, machine learning and data science is all happening within the open-source ecosystem at a speed that can’t be matched by a closed system.”
That said, Wang noted that it's understandable for organizations to be aware of the risks of open source and to develop a plan for mitigating potential vulnerabilities.
“One of the benefits of open source is that patches and solutions are built out in the open instead of behind closed doors,” he said.
The Anaconda report was based on a survey of 3,493 respondents from 133 countries.