Hortonworks is Music to Spotify’s Ears

Hortonworks appointed by Spotify to support largest commercially used cluster in Europe.

  • Tuesday, 17th September 2013, by Phil Alsop

Hortonworks has expanded its European customer base with the addition of Spotify, the world’s largest and most successful music streaming service. Spotify has selected the Hortonworks Data Platform (HDP) as its standardized Hadoop distribution. HDP will underpin Spotify’s Hadoop infrastructure, a 690-node cluster believed to be the largest in commercial use in Europe, which stores data from Spotify’s more than 24 million active users and 6 million subscribers.


Spotify provides its users with a personalized experience that is only possible through superior data analytics. All of Spotify’s services, such as Spotify Radio, are data-driven and the information collected is channelled to further personalize the user experience. For example, geo-location data can determine where an artist has a strong fan-base, helping to optimize concert location decisions.


With an increased focus on data and customer service, Spotify’s Hadoop infrastructure has become mission-critical. The company required a partner that could provide an integrated and tested Hadoop distribution and assist the skilled in-house team with technical questions and support. In addition to service and support, Hortonworks will run semi-annual health assessments on the 690-node Spotify cluster that is running on the Hortonworks Data Platform.


“The cultural fit was an important factor in our selection and we have appreciated Hortonworks’ relaxed, helpful and open approach,” said Wouter de Bie, team lead for data infrastructure, Spotify. “We were looking for a true partner relationship and the team at Hortonworks are committed to enabling the overall ecosystem – including the vendors we rely on – to leverage Hadoop. Their true open source approach and the work they have done to improve the Apache Hive data warehouse system also aligns well with our needs, as we use Hive extensively for ad-hoc queries and for the analysis of large data sets.”


“Spotify is undertaking some really innovative work in the data analytics field and realized the need for a deep level of open source Apache Hadoop domain experience and expertise. Hortonworks prides itself on the Hadoop pedigree of our many contributors and we very much look forward to applying this knowledge to help Spotify continue to break new ground and drive its business forward in creating a next generation data architecture,” said Herb Cunitz, president, Hortonworks.


Spotify will run HDP, the industry’s only 100-percent open source data platform built on Apache Hadoop, on the Debian operating system. This will enable Hortonworks to offer HDP to customers running either Debian or Ubuntu operating systems in the future. HDP is used by some of the largest customers in the world and processes some of the biggest data sets ever collected.


Spotify launched its service in 2008 and was a relatively early adopter of Hadoop, starting five years ago with a 30-node cluster. The company’s business model was structured from the beginning to support tens of millions of users with data analytics capabilities that enabled Spotify to deliver music download reports to the record companies. This requirement became more important as new players joined the music streaming market and Spotify looked to data analytics to differentiate itself from the competition by offering customers a superior music discovery experience.