Cloudera democratizes Apache Hadoop for enterprise end users

Cloudera has announced the public beta of Cloudera Search, the industry's first fully integrated search engine for interactive exploration of data stored in the Hadoop Distributed File System (HDFS) and Apache HBase(TM).

  • Monday, 10th June 2013, by Phil Alsop

Cloudera Search is the latest in a series of Cloudera innovations designed to make Hadoop simpler to use and usable by more departments within an organization. Powered by Apache Solr(TM), the leading open source search engine, it enables anyone within an organization to run interactive, natural-language keyword searches and faceted navigation on data stored in Hadoop, without additional training or advanced programming knowledge.


Cloudera Search was developed to address a rapidly emerging need. As enterprises' Hadoop deployments mature into the primary repositories for more and more kinds of data, organizations need to combine and refine that data more quickly and effectively on a single, integrated platform. At its core, Cloudera Search incorporates Apache Solr and other search-related open source projects to support a comprehensive big data infrastructure and to reduce the significant cost of maintaining the disparate systems that many enterprises currently depend on to run search queries.

The arrival of Cloudera Search brings the enterprise breakthrough simplicity and exploration capabilities, so users can drill deeper into data using full-text and faceted search to solve critical business problems in real time. Cloudera's search solution combines Solr, an established, feature-rich open source search platform whose extensible APIs ease integration with legacy production systems, with tight integration into CDH that addresses many of the common pain points of standalone search solutions for Hadoop. Through the new, robust failover features in SolrCloud (Solr 4), Cloudera Search delivers Solr's full feature set with more scalable indexing and query serving than was previously possible.


Like Cloudera Impala, the industry's first open source, interactive SQL query engine for Hadoop, Cloudera Search extends the reach and capability of Cloudera Enterprise, the definitive Platform for Big Data. Cloudera is now making it possible for enterprises to "unaccept the status quo" imposed by closed-source solution vendors and to benefit from the superior economics and unparalleled opportunity of Hadoop as a central enterprise data platform that addresses the challenges and opportunities presented by big data.


Beyond SQL: Now Everyone Can Benefit from Hadoop
As enterprises increasingly look for ways to derive greater value from all their data, a pervasive challenge has emerged: how to make all data available and consumable beyond IT departments, so it can be leveraged more widely across the entire organization. Cloudera's search solution expands the data exploration capabilities of Hadoop with faceted navigation and full-text search, so users can find data for processing and analysis more quickly. Cloudera Search puts the power of data discovery into the hands of non-technical teams, enabling line-of-business and everyday users to interact with data and uncover relevant correlations in a familiar, easy-to-use search interface. Companies can provide secure access to a centralized data repository, make it accessible to anyone who wants to derive valuable insight from it, and consolidate their search and Hadoop cluster investments into one complete solution, with unified management and control through Cloudera Manager.


"Data is one of the most valuable assets we have when it comes to preventative mental and physical healthcare," said Chris Poulin, managing partner of Patterns and Predictions. "With next generation predictive analytics tools powered by Hadoop, healthcare providers can now address healthcare issues proactively and hope to solve even the most intractable challenges, like suicide prevention for military veterans. With the power to correlate medical reports, patient records, care provider notes, and social media data along with other relevant data sources, we can cultivate a deeper, more holistic understanding of patients and disease to support better treatment plans and optimize patient care. By giving non-technical individuals the power to perform real-time search and queries on data stored in Hadoop, Cloudera is providing critical tools to advance healthcare innovation and discovery."


Beyond Batch: Real-Time Interaction with Data in Hadoop
Cloudera Search provides enterprises with scalable indexing options for big data and extends the Apache Solr project to offer near real-time processing and indexing of data in transit to Hadoop and other storage endpoints. Indexed data is immediately available to Search and to other Hadoop computing frameworks, such as Apache Hive(TM) and Cloudera Impala. Cloudera Search also provides linearly scalable, on-demand batch indexing for large data stores within Hadoop and, with a new GoLive feature, can incorporate incremental index changes while avoiding costly downtime.


"We have been leveraging Cloudera Search for OpenStack log exploration with great success. It delivers an open source solution for near real-time operational insights stored in Hadoop, and supports faster analytics and time to insight through applications like Cloudera Impala and other workloads," said Joseph George, director of product strategy in Dell's Revolutionary Solutions Team. "With Cloudera Search, Hadoop has become the master data hub, where search indexes can be easily built on demand, executed, stored and easily managed."


"It's exciting to see Lucene, a project I started 15 years ago, be included in CDH," said Doug Cutting, Chief Architect, Cloudera. "Search is an incredibly powerful tool -- now it's scalable and integrated with the Hadoop platform."


Cloudera Search Feature Highlights
Cloudera Search is specifically designed to help business users locate relevant data in Hadoop quickly and efficiently for further processing and analysis. Cloudera Search is fully integrated with the CDH platform. Key features include:
• Scalable, Reliable Index Storage in HDFS: integrates index storage and serving directly into HDFS
• Batch Indexing via MapReduce: creates indexes for data stored in HDFS and HBase with the same scalability and robustness as MapReduce itself
• Real-time Indexing at Collection: makes each event searchable as it is written into Hadoop, through near real-time indexing powered by Apache Flume(TM)
• Easy Interaction and Data Exploration via Cloudera Hue: provides a plug-in application for Hue, easily installed on standard Hue servers, for querying data, viewing result files, and exploring results through facets
• Simplified Field Extraction and Cross-Platform Data Processing: allows quick and easy field extraction from any data stored in HDFS in optimized Hadoop file formats such as Apache Avro(TM), avoiding the pain that many standalone search solutions impose, and promotes reusable configurations and processing steps through a new processing framework, Cloudera Morphlines
• Unified Management and Monitoring with Cloudera Manager: provides a centralized management and monitoring experience that makes it as easy to deploy, configure, and monitor search services as it is to manage CDH deployments and other services on the Hadoop cluster
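Faceted navigation, which recurs throughout this announcement, pairs a keyword query with per-field value counts that a user can click to narrow the results. The Python snippet below is a minimal sketch of what such a request can look like against a standard Solr HTTP endpoint. It is not Cloudera Search's own client API. The host `search-host`, the collection `logs`, the field names `host` and `severity`, and the helper `build_facet_query` are all hypothetical. By contrast, `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters, and 8983 is Solr's default port.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, collection, keywords, facet_fields, rows=10):
    """Build a Solr /select URL that combines a keyword search with field facets.

    base_url, collection and the field names passed in are hypothetical
    placeholders for a real deployment, not values taken from Cloudera Search.
    """
    params = [
        ("q", keywords),      # free-text keyword query
        ("rows", str(rows)),  # number of matching documents to return
        ("wt", "json"),       # request a JSON response
        ("facet", "true"),    # turn on faceting
    ]
    # One facet.field parameter per field whose values should be counted
    params.extend(("facet.field", field) for field in facet_fields)
    # urlencode escapes values, e.g. the space in the keywords becomes '+'
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


url = build_facet_query(
    "http://search-host:8983", "logs", "error timeout", ["host", "severity"]
)
print(url)
```

Sending this URL with any HTTP client would return the matching documents and, under `facet_counts`, value counts for each requested field. The exact response depends on the collection's schema.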


"We're bringing the band back together with Cloudera Search," said Mike Olson, chief executive officer, Cloudera. "Based on 100% open source Apache Solr, a Lucene project and another Doug Cutting original, Cloudera Search is now fully integrated into our industry leading CDH big data platform. After a successful private beta, it's the latest in a series of major innovations that we've brought to market designed to speed up and simplify an organization's ability to get the most out of their data. We are further democratizing access to mission-critical information stored in Hadoop by ensuring those without programming expertise can gain insight, find patterns and derive true value from their information assets. Year after year we continue to push the boundaries of what is possible with Hadoop; we have the best minds in data management focused on advancing business transformation."