Introducing Delta Lake 3.0

New release unifies lakehouse storage formats and reinforces Delta Lake as the 'best choice' for building an open lakehouse.

Wednesday, 28th June 2023, by Phil Alsop

Databricks has announced Delta Lake 3.0, its latest contribution to Delta Lake, the award-winning open source project hosted by the Linux Foundation. The upcoming release introduces Universal Format (UniForm), which allows data stored in Delta to be read as if it were Apache Iceberg or Apache Hudi. UniForm takes the guesswork out of choosing an open data format and eliminates compatibility headaches by offering automatic support for Iceberg and Hudi within Delta Lake. Delta Lake 3.0 will let users drop the complicated integration work caused by different data formats and focus on building high-performance, open lakehouses.

“Databricks created the lakehouse architecture, which is built on Delta Lake. We're committed to making Delta Lake the open format that gives customers the most choice and flexibility, greatest control of their own data, and all the benefits of an open ecosystem,” said Ali Ghodsi, Co-Founder and CEO at Databricks. “Customers shouldn’t be limited by their choice of format. With this latest version of Delta Lake, we’re making it possible for users to easily work with whatever file formats they want, including Iceberg and Hudi, while still accessing Delta Lake’s industry-leading speed and scalability.” 

Eliminating Data Silos

Enterprises are rapidly adopting the data lakehouse architecture, shifting away from costly, proprietary data warehouses that offer limited functionality and cannot support advanced use cases such as generative AI. Until now, data-driven organisations moving to the lakehouse have had to weigh their options and choose among three open table formats: Delta Lake, Apache Iceberg and Apache Hudi. With UniForm, customers can move towards interoperability and benefit from a combined ecosystem of tools that read from Delta, Iceberg and Hudi.

Delta Lake 3.0 will make it possible for businesses everywhere to access the breadth of their corporate data — from transactional to streaming, structured and unstructured, across any kind of format — in a highly performant manner. New functionality includes:

Delta Universal Format (UniForm): Data stored in Delta can now be read as if it were Iceberg or Hudi. With UniForm, Delta automatically generates the metadata that Iceberg or Hudi readers need, unifying the table formats so users don't have to choose between them or convert data manually. Organisations can confidently bet on Delta as the universal format that works across ecosystems and scales to support the changing needs of their business.

Delta Kernel: To address connector fragmentation, Kernel ensures that connectors are built against a core Delta library that implements the Delta specification, removing the need for connector developers to update their connectors with each new Delta version or protocol change. With one stable API to code against, developers in the Delta ecosystem can keep their connectors up to date with the latest Delta innovation without having to rework them. In turn, users can quickly take advantage of the latest Delta features and updates.

Delta Liquid Clustering: Read and write performance is one of the most common challenges companies face when implementing data use cases. Liquid Clustering is an innovative leap from decades-old Hive-style table partitioning, which uses a fixed data layout. It is a flexible data layout technique that provides cost-efficient data clustering as data grows, helping companies meet their read and write performance requirements.

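The UniForm mechanism described above, where Delta generates the metadata other formats need instead of converting the data, can be illustrated with a small toy model. This is a hypothetical sketch, not Delta Lake's implementation: `DataFile`, `DeltaLog` and `generate_iceberg_style_metadata` are names invented here for illustration, and the output dictionary is a simplified stand-in rather than the real Iceberg metadata layout. It shows the core idea only: the Parquet files are written once, and an Iceberg-style view of the latest snapshot is derived from the Delta transaction log. (In Delta Lake itself, UniForm is enabled per table through table properties such as `delta.universalFormat.enabledFormats`; check the Delta Lake documentation for the exact settings your version requires.)

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """A Parquet data file; in this toy model both table formats share it."""
    path: str
    record_count: int


@dataclass
class DeltaLog:
    """Hypothetical stand-in for a Delta transaction log: an ordered list of
    commits, each adding and/or removing data files."""
    commits: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        self.commits.append({"add": list(add), "remove": list(remove)})

    def live_files(self):
        # Replay every commit in order to find the files in the latest snapshot.
        live = {}
        for c in self.commits:
            for f in c["remove"]:
                live.pop(f.path, None)
            for f in c["add"]:
                live[f.path] = f
        return list(live.values())


def generate_iceberg_style_metadata(log, table_location):
    """Hypothetical: derive a simplified, Iceberg-like snapshot description from
    the Delta log. No data is copied; entries point at the same Parquet paths."""
    return {
        "location": table_location,
        "snapshot-id": len(log.commits),  # toy choice: one snapshot per Delta commit
        "data-files": [
            {"file_path": f.path, "record_count": f.record_count}
            for f in log.live_files()
        ],
    }


if __name__ == "__main__":
    log = DeltaLog()
    log.commit(add=[DataFile("part-000.parquet", 100), DataFile("part-001.parquet", 50)])
    # A later commit removes part-000 and adds part-002 (e.g. after a DELETE rewrote it).
    log.commit(add=[DataFile("part-002.parquet", 70)], remove=[DataFile("part-000.parquet", 100)])
    meta = generate_iceberg_style_metadata(log, "s3://example-bucket/sales")  # hypothetical path
    print([e["file_path"] for e in meta["data-files"]])
```

Running the script prints `['part-001.parquet', 'part-002.parquet']`: part-000 was removed by the second commit, so it drops out of both the Delta snapshot and the derived Iceberg-style view. Deriving metadata from the log, rather than rewriting data, is what lets a single copy of the files serve readers of more than one format.
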
"Delta Lake 3.0, including Universal Format and Kernel, underlines the open source community’s dedication to enhancing data reliability and delivering advanced analytics. This release is a step forward in creating a community-driven ecosystem of data integrity, seamless collaboration, and real-time analytics tools,” said Mike Dolan, SVP of Projects, The Linux Foundation.

Delta Lake helps organisations leverage data from hundreds of disparate systems, analysing it for insights, reporting and building AI models. With this update, Delta Lake continues to build on its unrivalled performance and user-friendly interface. Delta Lake is the only open format with built-in support for Delta Sharing, the open standard for secure data exchange, which fosters an open data ecosystem that thrives on collaboration across platforms, clouds and regions. Today, more than 6,000 active data consumers are exchanging over 300PB of data every day.

“Collaboration and innovation in the financial services industry are fueled by the open source community and projects like Legend, Goldman Sachs’ open source data platform that we maintain in partnership with FINOS,” said Neema Raphael, Chief Data Officer and Head of Data Engineering at Goldman Sachs. “We’ve long believed in the importance of open source to technology’s future and are thrilled to see Databricks continue to invest in Delta Lake. Organisations shouldn’t be limited by their choice of an open table format and Universal Format support in Delta Lake will continue to move the entire community forward.”