The past few years have seen an explosion in the number of platforms available for big data analytical tasks. The open source Hadoop framework is free to use, but it is very technical to set up and not specialised towards any particular job or industry. To use it in your business, you need a “platform” to operate it from.
These platforms are commercial offerings (you pay an ongoing service charge), most of which take the Hadoop framework and build on it, to provide analytical services of practical use to businesses and organisations.
So here are 10 of the best and most widely used services. Like any commercial product in a competitive market, each has its advantages and disadvantages, and you need to make sure you are picking the right tool for the job.
1. Cloudera CDH
Cloudera was formed by former employees of Google, Yahoo, Facebook and Oracle, and offers open source as well as commercial Hadoop-based big data solutions under the label Cloudera Distribution including Hadoop, known as CDH. Their distributions make use of their Impala analytics engine, which has been adopted and included in packages offered by competitors such as Amazon and MapR.
2. Hortonworks Data Platform (HDP)
Unlike every other big data analytics platform, HDP is composed entirely of open source code, with all of its elements built through the Apache Software Foundation. Hortonworks make their money by offering services: getting the platform running and providing the results you are after.
3. Microsoft HDInsight
Microsoft’s flagship analytical offering, HDInsight is based on Hortonworks Data Platform, but tailored to work with their own Azure cloud services and SQL Server database management system. A big advantage for businesses is that it integrates with Excel, meaning even staff with only basic IT skills can dip their toes into big data analytics.
4. IBM Big Data Platform
IBM offers a range of products and services designed to make complex big data analysis more accessible to businesses. They offer their own Hadoop distribution known as InfoSphere BigInsights.
5. Splunk Enterprise
This platform is specifically geared to businesses that generate a lot of their own data through their own machinery. Splunk’s stated goal is “machine data to operational intelligence”. The Internet of Things is key to their strategy, and among other products they provide the analytics behind the Nest Wi-Fi-enabled smart thermostat. Their analytics also drives Domino’s Pizza’s US coupon campaigns.
6. Amazon Web Services
Although everyone thinks of them as an online store, Amazon also makes money by selling the magic that makes their business run so smoothly to other companies. The business model was based on big data from the start – using personal information to offer a personalised shopping experience. Amazon Web Services includes its Elastic Compute Cloud and Elastic MapReduce services to offer large-scale data storage and analysis in the cloud.
7. Pivotal Big Data Suite
Pivotal’s big data package is composed of their own Hadoop distribution, Pivotal HD, and their analytics platform, Pivotal Analytics. Their business model allows customers to store an unlimited amount of data and pay a subscription fee which varies according to how much they analyse. The company is strongly invested in the “data lake” philosophy of a unified, object-based storage repository for all of an organisation’s data.
8. Infobright
Infobright is a database management system, available in both a free, open source edition and a paid-for proprietary version. This product is geared towards users looking to get involved with the Internet of Things. They offer three levels of service for paid users; higher-tier customers get helpdesk access for more users and quicker email support response times.
9. MapR
MapR offer their own distribution of Hadoop, notably different from others in that it replaces the commonly used Hadoop Distributed File System (HDFS) with their alternative, MapR Data Platform, which they claim offers better performance and ease of use.
10. Kognitio Analytical Platform
Like many of the other systems here, this takes data from your Hadoop or cloud-based storage network and gives users access to a range of advanced analytical functions. Kognitio is used by BT to help set their call charges and by loyalty programme Nectar for its customer analytics.
Reposted with permission and published in English daily The Star, Malaysia, 30 May 2015