The client was at a stage where all of their data engineering and data science jobs were deployed across multiple disparate Azure Databricks clusters. Spark was never actually used by the client, so Databricks was no longer needed given the costs it accrued. Running costs had to be reduced by moving to an environment where they could be kept to a minimum while performing at the same level as before.
Software engineering best practices were not present, such as a controlled Git workflow or proper unit and validation testing. In the early stages of the engagement this was highlighted to the client as an immediate risk, as it could affect the results of the jobs and confidence in the models' outputs.
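As an illustration of the kind of lightweight validation test introduced to address this gap, the sketch below asserts basic invariants of a model's output. The `predict_scores` function and the probability range are hypothetical stand-ins, not the client's actual models.

```python
# Minimal sketch of a validation test; predict_scores and the expected
# score range are hypothetical placeholders for the client's models.

def predict_scores(records):
    """Hypothetical model wrapper returning one score per input record."""
    return [0.5 for _ in records]  # placeholder model output

def test_scores_are_probabilities():
    records = [{"id": 1}, {"id": 2}]
    scores = predict_scores(records)
    # One score per record, each within the valid probability range.
    assert len(scores) == len(records)
    assert all(0.0 <= s <= 1.0 for s in scores)

test_scores_are_probabilities()
```

Tests of this shape run on every pull request, so a model change that breaks an output invariant is caught before it reaches production.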
In addition, since the client's main POC had been completed, everything deployed all the way to the production environment had to be properly scripted and documented so that it could be handed over to the internal operations team for long-term support.
With results presented regularly to key stakeholders, a fault-tolerant, secure, RBAC-enabled Kubernetes cluster was fully scripted in Terraform and deployed across three environments on Microsoft Azure. Everything runs as part of a microservices approach powered by Docker and Helm, fully traceable to tagged images and charts stored in Azure Container Registry.
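A minimal sketch of the kind of Terraform resource behind such a cluster is shown below, assuming the `azurerm` provider. The resource names, region, node sizes, and the AAD admin group ID are hypothetical placeholders, not the client's actual configuration.

```terraform
# Sketch of an RBAC-enabled AKS cluster; all names and IDs are
# hypothetical placeholders.
resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-example-dev"
  location            = "westeurope"
  resource_group_name = "rg-example-dev"
  dns_prefix          = "aks-example-dev"

  # Three nodes spread by AKS for fault tolerance.
  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  # Kubernetes RBAC backed by Azure Active Directory group membership.
  azure_active_directory_role_based_access_control {
    admin_group_object_ids = ["00000000-0000-0000-0000-000000000000"]
  }
}
```

The same module can then be applied per environment (for example via separate variable files or workspaces) to produce the three identical, fully scripted deployments.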
To enable secure access to data sitting in Azure Data Lake Gen2 accounts, Azure Active Directory was integrated into the solution, with Python and R frameworks built on top. Azure AD is responsible for authenticating users via specific service principals and determining whether they are entitled to access the Azure Storage Account service, while Azure Data Lake Gen2 handles authorisation to specific containers and blobs using POSIX-like ACLs.
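To illustrate the authorisation half of this model, the sketch below evaluates a POSIX-like ACL the way ADLS Gen2 does conceptually: each entry grants read/write/execute bits to a principal, and access is allowed only if a matching entry carries the requested permission. This is a local illustration of the concept, not the Azure SDK, and the service principal names and permission strings are hypothetical.

```python
# Conceptual illustration of POSIX-like ACL checks as used by ADLS Gen2;
# the principals and permission strings below are hypothetical examples.

def is_allowed(acl, principal, permission):
    """Return True if `principal` holds `permission` ('r', 'w' or 'x')
    under `acl`, a mapping of principal -> permission string."""
    return permission in acl.get(principal, "")

# ACL on a hypothetical container path: the data-science service principal
# may read and traverse ('x' on a directory), the ingestion one may also write.
acl = {
    "sp-data-science": "r-x",
    "sp-ingestion": "rwx",
}

assert is_allowed(acl, "sp-data-science", "r")      # can read
assert not is_allowed(acl, "sp-data-science", "w")  # cannot write
assert not is_allowed(acl, "sp-unknown", "r")       # no entry, denied
```

In the real solution the same two-step split applies: Azure AD proves who the service principal is, and the ACLs on each container and directory decide what it may touch.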
When a change is required, whether in the infrastructure or the machine learning pipeline, the process is now ten times faster, achieved by implementing CI/CD for everything (business logic, machine learning models, and infrastructure). With Azure Key Vault used to source all sensitive information, Azure DevOps and Azure Pipelines enable quick and effective change management, as well as increased control for admins, through a managed GitFlow approach with unit tests, integration tests, protected branches, and pull requests.
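The shape of such a pipeline can be sketched as the following `azure-pipelines.yml` fragment; the service connection, Key Vault name, and branch names are hypothetical placeholders.

```yaml
# Sketch of an Azure Pipelines definition; the service connection and
# vault names are hypothetical placeholders.
trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: ubuntu-latest

steps:
  # Pull secrets from Azure Key Vault instead of storing them in the repo.
  - task: AzureKeyVault@2
    inputs:
      azureSubscription: "example-service-connection"
      KeyVaultName: "kv-example-dev"
      SecretsFilter: "*"

  - script: |
      pip install -r requirements.txt
      pytest tests/
    displayName: "Run unit and integration tests"
```

Because the pipeline sources secrets at run time and gates merges on the test steps, protected branches and pull requests can require a green run before any change reaches an environment.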