Ciprum Technology

SRE Expertise for a Real Time Data Streaming Pipeline based on Open Source Technologies



Challenge

For a project moving from proof-of-concept to minimum viable product, the client needed a reliable resource combining big data engineering, platform leadership, and a DevOps skillset, with hands-on systems implementation experience, to help materialise their vision.

With up to 1 million records per minute processed in real time, there was a clear need to maintain, monitor, and act on all live workloads in the event of surges or failures, and to provide meaningful reports to senior management stakeholders.


Solution & Results

A fully scripted platform, provisioned with Ansible on the Hortonworks Data Platform and comprising a variety of big data components from the Hadoop ecosystem, was built and supported the incoming load efficiently and successfully. We implemented Apache YARN as the cluster's resource negotiator, with sophisticated resource management and node labelling. For further business logic we used Java with Apache Samza; Apache Ignite and Spark provided fast in-memory computing and machine learning models; and Apache Kafka served as the data layer between components.
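YARN node labelling lets specific queues target specific machines, for example pinning memory-hungry streaming containers to high-RAM nodes. A minimal sketch of what such a capacity-scheduler.xml fragment could look like — the queue name `streaming`, the label `highmem`, and the capacity splits are illustrative assumptions, not values from the actual deployment:

```xml
<configuration>
  <!-- Queues under root; "streaming" is a hypothetical queue name -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,streaming</value>
  </property>
  <!-- Allow the streaming queue to place containers on nodes labelled "highmem" -->
  <property>
    <name>yarn.scheduler.capacity.root.streaming.accessible-node-labels</name>
    <value>highmem</value>
  </property>
  <!-- Give the streaming queue all of the highmem-labelled capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.streaming.accessible-node-labels.highmem.capacity</name>
    <value>100</value>
  </property>
  <!-- Split unlabelled cluster capacity between the two queues -->
  <property>
    <name>yarn.scheduler.capacity.root.streaming.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>50</value>
  </property>
</configuration>
```

With a setup along these lines, submitting a job to the `streaming` queue with a `highmem` node-label expression keeps latency-sensitive workloads off general-purpose nodes.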

To facilitate effective monitoring, we used the ELK stack (Elasticsearch, Logstash and Kibana) to consolidate logs from all live components in one place via Filebeat, along with metrics from key Kafka topics, all visualized in Kibana. On top of that, we used the Ambari dashboard to monitor component statuses in real time.
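In a setup like this, a lightweight Filebeat agent on each host tails component logs and forwards them to Logstash for parsing before they are indexed into Elasticsearch. A minimal filebeat.yml sketch — the log paths, the custom field, and the host `logstash.internal` are hypothetical placeholders, not details from the actual deployment:

```yaml
filebeat.inputs:
  # Tail Kafka broker and Samza container logs; paths are illustrative
  - type: log
    paths:
      - /var/log/kafka/server.log
      - /var/log/samza/*/container.log
    fields:
      pipeline: streaming   # custom field for filtering dashboards in Kibana

# Forward to Logstash for parsing before indexing into Elasticsearch
output.logstash:
  hosts: ["logstash.internal:5044"]
```

Routing through Logstash rather than writing to Elasticsearch directly allows grok parsing and enrichment of the raw component logs before they reach the index.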

Roles:

Cloud Data Engineer & DevOps

Project Date:

2017 — 2018

Client:

Confidential
