Ahana goes deep on AWS to help Presto users set up and query secure data lakes



Ahana, a company that’s commercializing the open source Presto SQL query engine, has announced a new cloud integration with AWS Lake Formation, a fully-managed service that enables Amazon’s cloud customers to quickly set up data lakes.

The integration forms part of an upcoming AWS Lake Formation partner program, which Amazon is expected to formally launch in the coming weeks.

Founded in April 2020, Ahana pitches itself as the first company to bring Presto-based ad hoc analytics to market. Presto, for the uninitiated, was developed in-house at Facebook a decade ago as a way to help its data scientists and analysts run faster queries on mammoth data sets. The social networking giant open-sourced Presto in 2013, and in the years that followed, Presto’s initial creators left the company to launch a fork called PrestoSQL, which was recently rebranded as Trino. The creators also launched a separate commercial entity called Starburst.

In short, there are now effectively two main instances of Presto. The original Presto, upon which Ahana is built, falls under the auspices of the Linux Foundation-hosted Presto Foundation, which counts Facebook, Uber, Twitter, and Alibaba as founding members.

Solving complexity

While Presto is a powerful tool for querying both relational and NoSQL databases, data warehouses, and data lakes, it can be complex from a configuration and management standpoint — this is where Ahana enters the fray. The San Mateo, California-based company launched its first commercial product last September — Ahana Cloud is basically “Presto-as-a-service,” easing the deployment and integration of Presto with AWS for companies to query their AWS S3 data lakes.

Above: Ahana Cloud for Presto

Companies use a data catalog such as AWS Glue to describe the data stored on S3 (an Amazon storage service often used as a data lake), mapping the underlying files to relational structures (e.g. tables) that an application can then query.
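To make the catalog idea concrete, here is a minimal sketch in Python. It does not use AWS Glue or any AWS API; the `TableDefinition` class, the `read_table` helper, the bucket name, and the column names are all hypothetical and exist only to illustrate how a catalog entry can turn raw files into named, queryable rows.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TableDefinition:
    """Hypothetical catalog entry: a table name, a storage prefix, and column names."""
    name: str
    location: str          # illustrative S3-style prefix, not a real bucket
    columns: list[str]


def read_table(table: TableDefinition, raw_files: dict[str, str]) -> list[dict[str, str]]:
    """Turn headerless CSV text stored under the table's prefix into rows keyed by column."""
    rows = []
    for path, text in sorted(raw_files.items()):
        if not path.startswith(table.location):
            continue  # this file belongs to some other data set
        for record in csv.reader(io.StringIO(text)):
            rows.append(dict(zip(table.columns, record)))
    return rows


# In-memory stand-in for objects in a bucket (assumed data, for illustration only).
raw_files = {
    "s3://example-bucket/trips/part-0.csv": "1,SFO,12.50\n2,OAK,8.00\n",
    "s3://example-bucket/logs/app.log": "not part of the table\n",
}
trips = TableDefinition("trips", "s3://example-bucket/trips/", ["trip_id", "airport", "fare"])
rows = read_table(trips, raw_files)
```

A query engine can then treat `rows` as a table with named columns rather than as anonymous files, which is roughly the role the catalog plays between S3 and an engine such as Presto.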

Ahana already works with Glue and S3 to query companies’ data lakes. AWS Lake Formation, which Amazon launched back in 2019, is designed to help businesses set up and manage data lakes in just a few days. This includes ingesting, cataloging, cleansing, and transforming all their data hosted on S3. AWS Lake Formation adds a number of particularly notable capabilities to the basic S3 data lake, including simplified security management, and just last week AWS added more security features to the mix, including the ability to enforce access controls for individual rows and cells in tables.

Ultimately, by integrating directly with AWS Lake Formation, Ahana now allows its customers to leverage all these additional features.
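As a rough illustration of what row- and cell-level access control means in practice, the Python sketch below filters rows and hides columns before results reach a user. It is not the AWS Lake Formation API or Ahana's implementation; `AccessPolicy`, `apply_policy`, and the sample data are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class AccessPolicy:
    """Hypothetical policy: which rows a user may see and which columns are hidden."""
    row_filter: Callable[[Row], bool]
    hidden_columns: set[str] = field(default_factory=set)


def apply_policy(rows: list[Row], policy: AccessPolicy) -> list[Row]:
    """Return only the permitted rows, with hidden columns dropped from each one."""
    visible = []
    for row in rows:
        if policy.row_filter(row):
            visible.append(
                {key: value for key, value in row.items() if key not in policy.hidden_columns}
            )
    return visible


# Assumed sample data, for illustration only.
rows = [
    {"region": "us", "customer": "a", "card_number": "1111"},
    {"region": "eu", "customer": "b", "card_number": "2222"},
]
us_analyst = AccessPolicy(
    row_filter=lambda row: row["region"] == "us",
    hidden_columns={"card_number"},
)
result = apply_policy(rows, us_analyst)
```

Here the hypothetical `us_analyst` policy sees only the US row, and never the `card_number` column. In a real deployment the policy would be defined centrally and enforced by the query layer, not by application code.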

“Our business relies on providing analytics across a range of data sources for our clients, so it’s critical that we provide both a transparent and secure experience for them,” said Ameer Elkordy, lead data engineer at AI mobility company and Ahana customer Metropolis. “We use Amazon S3 as our data lake and Ahana Cloud for Presto for ad hoc queries on that data lake. Now, with the Ahana and AWS Lake Formation integration, we get even more granular security with data access control that’s easy to configure and native to our AWS stack.”

Ahana cofounder and chief product officer Dipti Borkar added that prior to this integration, security access just wasn’t as granular:

“Data platform teams didn’t have an option to control who had access to what data,” Borkar told VentureBeat. “Our customers will be able to query data on an AWS S3 data lake and enforce any security policies defined in AWS Lake Formation — this will give data platform teams strict governance on their data lakes.”

Hey Presto

It has been a busy year in the Presto space. Ahana itself recently closed a $20 million round of funding from a slew of investors that included Alphabet’s venture capital arm GV. And Starburst — the commercial entity behind the Presto offshoot Trino — raised $100 million at a whopping $1.2 billion valuation.

So is there room for two companies dedicated to commercializing a Presto-based SQL query engine? That big-name investors such as GV, Andreessen Horowitz, Coatue, and Salesforce are plowing their money into the likes of Ahana and Starburst suggests that there is. Moreover, all signals indicate that there won’t be a winner-takes-all scenario, given the distinctive focuses of Ahana / Presto and Starburst / Trino.

Girish Baliga, a senior engineering manager at Uber and chairperson for the Presto Foundation’s governing board, acknowledged that while there are multiple SQL engines out there, Presto is forging a path that’s focused largely on data lakes.

“With open source, projects often diverge based on their philosophies — for Presto, our focus is the data lake and building the fastest, open source engine for that, with some focus on federation,” he told VentureBeat. “Other projects may have different focuses. Over time, we believe the data lake is where most of the data will be.”

Moreover, Ahana has elected to offer managed services for a single cloud provider (AWS) for the time being, though it is worth noting that Presto can be deployed anywhere, and Ahana helps companies do just that through its participation in the open source community.

“Ahana is focused on Presto for AWS for two key reasons,” added Borkar, who also serves as chairperson for the Presto Foundation’s outreach committee. “AWS has the most advanced data-related services, and the majority of the market is there. From a managed service perspective, we run on Kubernetes and use an in-VPC deployment approach which separates the control plane and the compute plane, making it highly portable and multi-cloud friendly.”

Borkar also confirmed that Ahana plans to extend its managed services to other clouds in the future.

Starburst’s pitch is that while Presto was built “to solve for speed and cost-efficiency of data access at a massive scale” at companies such as Facebook and Uber, Trino is designed to bring the power of Presto to a “broad array of companies in varying stages of cloud adoption.” Just last week, Starburst launched a fully-managed cross-cloud analytics service that allows companies to query data hosted on any of the “big three’s” infrastructure without shifting the data from its original location.

Finally, one of the likely reasons that Presto’s original creators rebranded the PrestoSQL fork as Trino was to ensure there was no lingering confusion between the two open source brands. And this points to another potential selling point of Presto — from a marketing standpoint, if nothing else.

“There is only one Presto — there are many forks of Presto, but they are not Presto,” Borkar said. “Presto is a community-driven project under the Linux Foundation. This is where Kubernetes and Node and other great projects live. Presto is what is running at Facebook on thousands and thousands of nodes; Presto is what gets tested and validated at a massive scale; Presto is what gets used by half the employees of Uber on a monthly basis.”

VentureBeat
