Hadoop as a Service (HaaS) is the first cloud service purpose-built to run Hadoop. We offer an on-demand, elastic solution on a pay-as-you-go basis. By vertically integrating from the Hadoop layer down to the metal, we deliver a faster, more reliable service, enabling you to meet your business objectives while minimizing Hadoop and infrastructure management overhead. No upfront investment in on-site hardware or IT support is required. Spot instance pricing reduces costs by up to 90% compared to on-demand instances, and auto-scaling clusters mean you pay for capacity only when your organization uses it.


This offering is ideally suited to today's data science needs. Features for data science include permanent HDFS volumes, access to the latest tools, resource sharing without conflict, job-level monitoring and support, and pricing plans that eliminate unpleasant surprises. For organizations, this cloud-based solution is a practical way to build and maintain Hadoop clusters off premises, eliminating the operational challenges of running Hadoop so they can focus on business growth.


The following criteria distinguish our HaaS from the multitude of other providers:

1) Ability to provide elasticity

2) Ability to be self-configuring

3) Ability to support non-stop operations

4) Ability to enable non-Hadoop users to process big data

5) Ability to satisfy needs of data scientists and data center administrators


HaaS fully integrates with the Hadoop ecosystem, including HDFS, Hive, Pig, HBase, Mahout, Oozie, Flume, Sqoop, Avro, Spark, MapReduce, and YARN. Connectors for data integration and for creating data pipelines provide a complete solution that works with the organization's current pipeline. The data pipeline creator simplifies setting up sophisticated data ingestion and workflows, including data dependencies, and comes with a graphical user interface for scheduling jobs, a query editor, a visual query builder, and other tools to make your work easier and more productive.
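As an illustration of the workflow tooling just described, here is a minimal sketch that submits and monitors an Oozie workflow through the standard Oozie Java client. The Oozie endpoint, HDFS paths, and property names are placeholders for whatever your HaaS cluster and workflow.xml define, not part of the product itself.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;

public class SubmitIngestWorkflow {
  public static void main(String[] args) throws OozieClientException, InterruptedException {
    // Oozie server URL is a placeholder for your cluster's endpoint.
    OozieClient oozie = new OozieClient("http://oozie.example.com:11000/oozie");

    Properties conf = oozie.createConfiguration();
    // HDFS path of the workflow application (where workflow.xml lives); placeholder.
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/haas/apps/ingest");
    // These variables must match whatever the workflow definition references.
    conf.setProperty("nameNode", "hdfs://namenode:8020");
    conf.setProperty("jobTracker", "resourcemanager:8032");

    // Submit and start the workflow; Oozie returns a job id.
    String jobId = oozie.run(conf);

    // Poll until the workflow leaves the RUNNING state.
    while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
      Thread.sleep(10_000);
    }
    System.out.println("Workflow " + jobId + " finished: " + oozie.getJobInfo(jobId).getStatus());
  }
}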


HaaS deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data. The Hadoop core provides reliable data storage with the Hadoop Distributed File System (HDFS), and a simple MapReduce programming model to process and analyze, in parallel, the data stored in this distributed system.
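To make the MapReduce programming model concrete, below is the canonical word-count job written against the Hadoop Java API. The input and output paths are supplied on the command line and stand in for locations on your cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Once packaged into a jar, the job would be launched with something like: hadoop jar wordcount.jar WordCount /user/haas/input /user/haas/output.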


Overview of the Hadoop ecosystem on HaaS: HaaS is a Hadoop solution in the cloud and provides implementations of Storm, HBase, Pig, Hive, Sqoop, Oozie, Ambari, and more. It also integrates with business intelligence (BI) tools such as Excel, SQL Server Analysis Services, and SQL Server Reporting Services.
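Because BI tools reach Hive over the same interface, a brief sketch of querying HiveServer2 from Java via JDBC may help. The host name, credentials, and the weblogs table are hypothetical stand-ins for your own cluster and data.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTopPages {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC driver (hive-jdbc must be on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // Host, user, and table names are placeholders for your cluster.
    String url = "jdbc:hive2://haas-cluster.example.com:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "haas_user", "");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page ORDER BY hits DESC LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}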


Advantages of Hadoop in the cloud

1) Automatic provisioning of Hadoop clusters.

2) HaaS clusters are much easier to create than manually configured Hadoop clusters.

3) State-of-the-art Hadoop components.

4) High availability and reliability of clusters.

5) Efficient and economical data storage with Amazon S3, a Hadoop-compatible storage option (see the sketch after this list).

6) Integration with other Amazon Web Services, including SQL Databases.

7) Low entry cost.
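As a concrete illustration of item 5, the snippet below lists objects in an S3 bucket through Hadoop's FileSystem API using the s3a connector (the hadoop-aws module). The bucket name is a made-up placeholder, and credentials are assumed to come from the usual fs.s3a.* settings or an EC2 instance profile.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListS3Input {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Credentials are expected via fs.s3a.access.key / fs.s3a.secret.key
    // or an instance profile; nothing is hard-coded here.
    FileSystem fs = FileSystem.get(URI.create("s3a://example-haas-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("s3a://example-haas-bucket/input/"))) {
      System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
    }
  }
}

Because s3a exposes S3 as a Hadoop-compatible file system, the same s3a:// URIs can be passed directly to MapReduce, Hive, or Pig jobs in place of HDFS paths.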


HaaS solutions for big data analysis: HaaS can be used to answer questions for your organization, from analyzing Twitter sentiment to assessing HVAC system effectiveness.


Technology Stack

The following technology stack is used in the HaaS product implementation (an EC2 provisioning sketch follows the list).

a) Java / J2EE
b) Spring / Hibernate / RabbitMQ
c) AWS (EC2 & S3) REST API
d) Puppet (Master & Agent) REST API
e) Google Compute Engine API
f) jQuery, HTML5, CSS3
g) Oracle / MySQL database
h) Apache Hadoop ecosystem
i) Talend (ETL)
j) Pentaho (business intelligence)
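To show where items c) and h) meet, here is a minimal sketch that launches Hadoop worker instances with the AWS SDK for Java. The region, AMI id, instance type, and counts are illustrative placeholders, and credentials are resolved by the SDK's default provider chain.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class ProvisionWorkers {
  public static void main(String[] args) {
    // Credentials come from the SDK's default provider chain; region is illustrative.
    AmazonEC2 ec2 = AmazonEC2ClientBuilder.standard().withRegion("us-east-1").build();

    RunInstancesRequest request = new RunInstancesRequest()
        .withImageId("ami-0123456789abcdef0") // hypothetical pre-baked Hadoop worker AMI
        .withInstanceType("m5.xlarge")
        .withMinCount(3)                      // e.g. three DataNode/NodeManager workers
        .withMaxCount(3);

    RunInstancesResult result = ec2.runInstances(request);
    for (Instance instance : result.getReservation().getInstances()) {
      System.out.println("Launched worker " + instance.getInstanceId());
    }
  }
}

In a deployment like the one described above, a configuration-management layer such as Puppet (item d) would then bring the freshly launched instances into the cluster.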