Simplifying the route to lakehouses

With Databricks Ingest and new partner integrations, data teams can easily populate lakehouses, which combine the best of data lakes and data warehouses.

  • Tuesday, 25th February 2020, by Phil Alsop

Databricks has introduced an accelerated path for data teams to unify data management, business intelligence (BI) and machine learning (ML) on one platform. The new Data Ingestion Network of partners and Databricks Ingest bring data teams closer to building a lakehouse, a new data management paradigm that combines the best elements of data lakes and data warehouses and enables BI and ML on all of a business’s data.

 

Historically, companies have been forced to split their data into traditional structured data and big data, and to use each separately for BI and ML use cases. The result is data siloed across data lakes and data warehouses, slow processing, and partial results that are too delayed or too incomplete to be used effectively. Customers can now load data into Delta Lake, the open source technology for building reliable and fast lakehouses at scale, through the Data Ingestion Network of partners (Fivetran, Qlik, Infoworks, StreamSets and Syncsort), which have built-in integrations to Databricks Ingest for automated data loading. Azure Databricks customers already benefit from native integration with Azure Data Factory to ingest data from many sources.

 

“Databricks powers our machine learning and business intelligence across multiple business functions, from car inventory management, to price prediction and technical operations, by using hundreds of terabytes of data,” said Greg Rokita, executive director, Technology at Edmunds. “Our data vision is fully aligned with the lakehouse approach, and our cloud data journey starts with Delta Lake which powers our machine learning use cases and executive reporting. We’re excited about Databricks Ingest - it will definitely simplify loading data into our Delta Lake.” 

 

Data teams can load data from a variety of sources into Delta Lake for all their BI and ML use cases: applications like Salesforce, Marketo, Zendesk, SAP, and Google Analytics; databases like Cassandra, Oracle, MySQL, and MongoDB; and file storage like Amazon S3, Azure Data Lake Storage, and Google Cloud Storage. In addition to the integration network partnerships announced today, Informatica, Segment and Talend integrations will be available in an upcoming release.

 

“The Lakehouse paradigm aspires to combine the reliability of data warehouses with the scale of data lakes to support every kind of use case. In order for this architecture to work well, it needs to be easy for every type of data to be pulled in. Databricks Ingest is an important step in making that possible,” says Ali Ghodsi, co-founder and CEO of Databricks.  

 

Additionally, the auto-loading capabilities allow data to continuously flow into Delta Lake, without setting up and maintaining job triggers or schedules. As companies’ data appears in cloud storage from different sources, Databricks Ingest automatically pulls this new data efficiently into Delta Lake. This breaks down the silos so data can be used by teams across a company to deliver data-driven innovation and business value with data science, ML and business analytics.
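
The idea of loading only newly arrived files can be illustrated with a small, generic sketch. This is not the Databricks Ingest API or its implementation: the `IncrementalLoader` class, its `poll` method and the `demo` helper are hypothetical names made up for illustration, a local directory stands in for cloud storage, and an in-memory list stands in for a Delta Lake table. The sketch also polls explicitly, whereas the announcement describes loading that needs no triggers or schedules.

```python
import json
import tempfile
from pathlib import Path


class IncrementalLoader:
    """Toy stand-in for an auto-loading job (hypothetical, not a Databricks API)."""

    def __init__(self, source_dir):
        self.source_dir = Path(source_dir)
        self.seen = set()   # names of files that have already been ingested
        self.table = []     # in-memory list standing in for the target table

    def poll(self):
        """Ingest files not seen before and return the number of rows added."""
        added = 0
        for path in sorted(self.source_dir.glob("*.json")):
            if path.name in self.seen:
                continue  # already loaded; skipping avoids duplicate rows
            with path.open() as fh:
                rows = [json.loads(line) for line in fh if line.strip()]
            self.table.extend(rows)
            self.seen.add(path.name)
            added += len(rows)
        return added


def demo():
    """Drop files into a temporary directory and poll after each arrival."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        loader = IncrementalLoader(src)
        (src / "a.json").write_text('{"id": 1}\n{"id": 2}\n')
        first = loader.poll()    # picks up a.json
        (src / "b.json").write_text('{"id": 3}\n')
        second = loader.poll()   # picks up only b.json
        third = loader.poll()    # nothing new has arrived
        return [first, second, third, len(loader.table)]
```

Calling `demo()` shows each file being ingested exactly once, however often the loader polls. A production system would also need durable state and failure handling, which this sketch leaves out.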

 

“Fivetran and Databricks allow customers to bring together big data and business context in a single environment. The combined technology stack enables users to perform both cutting-edge machine learning workloads, and traditional business intelligence, in a single unified lakehouse,” said George Fraser, CEO of Fivetran. 

 

“Qlik is the leader in automated and real-time data integration with cloud data warehouses and data lakes, having moved data from more than 200,000 databases with our unique change data capture (CDC) technology for some of the world’s largest enterprises. We are excited that customers will benefit from Qlik’s optimized data delivery for Delta Lake. Databricks’ users now have a more seamless on-ramp to easily unlock and stream data from all of their enterprise sources including mainframes, SAP, databases and data warehouses, by implementing open lakehouses on top of Delta Lake,” said Mike Capone, CEO of Qlik.