The Apache Software Foundation (ASF) this week released an update to Apache Drill, an open source tool that enables end users to query multiple data sources using SQL without waiting for enterprise IT teams to create schemas and set up pipelines.
End users can download Drill 1.19 to run queries against Apache Cassandra, Elasticsearch, and Splunk, as well as against XML files and REST application programming interfaces (APIs), with no schema required.
Other additions include support for the Avro format in the plugin for the Apache Kafka messaging platform; support for Apache Airflow, software for managing workflows; integrated password vaults for securing credentials; and support for Linux ARM64 systems.
Apache Drill first emerged as a SQL-based query engine designed to enable end users to interrogate data stored in NoSQL Apache Hadoop platforms. Since then, the number of data sources has steadily increased to the point that end users are employing the tool to interrogate data wherever it resides, said Charles Givre, vice president of Apache Drill and CEO of DataDistillr, a provider of SQL query tools based on Apache Drill.
That’s critical because organizations struggle to aggregate all their data within a single data warehouse, Givre added. “It’s practically impossible to get all your data in a data lake,” he said.
Just as problematic, there’s usually a significant time delay between when new data is created by an application and when that data becomes available in a data warehouse or data lake, Givre said. But Apache Drill makes it easier to launch SQL queries against the freshest set of data available, regardless of where it resides, he said.
In some cases, data science teams set up complex processes to analyze datasets when they could accomplish the same tasks more easily by using Apache Drill to join two or more datasets without ever having to move any data, he added.
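To make the idea of joining datasets in place more concrete, here is a minimal Python sketch that packages a cross-source join for Drill's REST API, which accepts SQL as a JSON body posted to the `/query.json` endpoint. It assumes a Drillbit running locally on the default web port 8047; the storage-plugin names (`cassandra`, `dfs`), keyspace, table, and file path are hypothetical and depend on how plugins are configured in a given deployment.

```python
import json
import urllib.request

# Assumption: a local Drillbit listening on Drill's default web/REST port.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical plugin, keyspace, table, and file names, for illustration only.
# One SQL statement joins a Cassandra table with a JSON file on disk,
# without first copying either dataset into a warehouse.
FEDERATED_JOIN = """
SELECT o.order_id, o.amount, c.name
FROM cassandra.sales.orders AS o
JOIN dfs.`/data/customers.json` AS c
  ON o.customer_id = c.id
"""


def build_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON body that Drill's REST API expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, url: str = DRILL_URL) -> list:
    """Submit the query and return the result rows (needs a live Drillbit)."""
    with urllib.request.urlopen(build_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

Building the request separately from sending it keeps the payload easy to inspect without a running server; calling `run_query(FEDERATED_JOIN)` against a configured Drill instance would return the joined rows as a list of dictionaries.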
How it works
For some time, IT organizations have been trying to strike a balance between centrally managing data and enabling end users to interactively query data as they see fit. In many cases, end users have gotten around IT departments by setting up their own platforms and query tools. Beyond the governance issues this can create, the data a business unit uses to make decisions is usually out of sync with the data the rest of the business relies on.
Most enterprise IT teams don’t have the political capital required to ban business units from using a given tool, however. Instead, Givre said they should focus on striking a balance between end users’ need to easily query data as it becomes available and the need to manage terabytes of historical data that might reside in a data warehouse.
Regardless of the path organizations choose for managing data, the number of tools and platforms for querying data continues to explode. The question now is to what degree organizations should restrict end users to tools sanctioned by their IT teams.