A customer success story

Leveraging Azure Databricks to run analytics workloads for an e-commerce client

The Brand

A leading manufacturing company with revenue of $4 billion+ and a presence in categories such as internal hard drives, USB flash drives, external hard drives, and external solid-state drives. It is one of the top five data storage device companies in the world.

The challenge

There was no enterprise-grade compute and data management platform; all data and analytics workloads ran disparately on local systems. IT was struggling to execute an enterprise BI and analytics strategy because:

The data was disparate, distributed, and siloed. Years of collected data had not been tapped to glean intelligence in a holistic manner. Multiple teams across multiple geographical locations presented an additional, significant scaling challenge.

Our design approach

Data Discovery Session

End-to-end data discovery sessions with stakeholders across Marketing, Analytics, E-Commerce, and individual product teams, to understand the use cases for which data had to be extracted, processed, stored, and made available for the visualization layer to connect to

Solution Architecture Phase

The architecture comprised Databricks for ETL and running machine learning models, BigQuery as the centralized data store, and Tableau for visualization

Define user stories for each use case

This phase involved documenting the user stories for each use case: the logic used to derive each metric, scripting, task definitions, estimated effort, test cases, and acceptance criteria

Create Data Pipeline and Scheduling on Databricks

Python scripts were written to extract data sets from the required sources, apply transformations wherever needed, and store the data in BigQuery for reporting and visualization. Jobs were scheduled within Databricks with alerts and retry logic enabled
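
The extract-transform-load pattern described above can be sketched as follows. This is a minimal, illustrative sketch: the column names, the `transform_orders` logic, and the destination table are hypothetical, and it assumes `pandas` and `google-cloud-bigquery` are available on the cluster.

```python
import pandas as pd

def transform_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transformations: normalize column names,
    drop duplicate orders, derive a revenue metric."""
    df = df.copy()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates(subset=["order_id"])
    df["revenue"] = df["quantity"] * df["unit_price"]
    return df

def load_to_bigquery(df: pd.DataFrame, table: str) -> None:
    # Hypothetical destination table. WRITE_TRUNCATE keeps the load
    # idempotent, so Databricks' retry logic can safely re-run the job.
    from google.cloud import bigquery
    client = bigquery.Client()
    job = client.load_table_from_dataframe(
        df, table,
        job_config=bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE"),
    )
    job.result()  # block until done so failures surface to the scheduler

# Usage on the cluster (hypothetical source and table):
#   raw = pd.read_csv("orders.csv")
#   load_to_bigquery(transform_orders(raw), "project.dataset.orders")
```

Making each load idempotent matters here: when a retry fires after a partial failure, re-running the same job must not duplicate rows.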

Databricks for Data Science

Azure Databricks provides clusters that form a unified, autoscaling platform for running production ETL pipelines, ad-hoc analytics, and machine learning. Interactive clusters are used to analyse data collaboratively with interactive notebooks; job clusters run fast, robust automated workloads scheduled through the UI
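
A job of the kind described above (autoscaling job cluster, schedule, failure alerts, retries) can be expressed as a Databricks Jobs API job definition. The sketch below is a hedged example: the notebook path, node type, cron schedule, and email address are placeholders, not values from this engagement.

```python
# Hypothetical Databricks Jobs API 2.1 job definition for the daily ETL run.
job_spec = {
    "name": "daily-etl-to-bigquery",
    "tasks": [{
        "task_key": "etl",
        "notebook_task": {"notebook_path": "/Repos/etl/daily_load"},
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",      # placeholder runtime
            "node_type_id": "Standard_DS3_v2",        # placeholder node type
            "autoscale": {"min_workers": 2, "max_workers": 8},
        },
        "max_retries": 3,                  # retry logic mentioned above
        "min_retry_interval_millis": 60_000,
    }],
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # daily at 06:00
        "timezone_id": "UTC",
    },
    # Failure alerts go to a placeholder distribution list.
    "email_notifications": {"on_failure": ["data-team@example.com"]},
}
```

This payload would be submitted to the Jobs API (or mirrored in the Jobs UI); the autoscale range lets the job cluster grow only while the workload demands it.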


The solution

Unified platform for ETL and running Analytical Models

Executive Dashboard

This dashboard provides a holistic picture of site traffic, site behaviour, and e-commerce transactions

Key data transformations

  • One-time historical data load from one BigQuery instance to another, scheduled thereafter
  • Identifying deviations between metric values derived from log-level data and Google Analytics (GA) data
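
The second transformation above, checking log-level metrics against GA figures, can be sketched as a simple reconciliation function. The column names, join key, and 5% tolerance are assumptions for illustration.

```python
import pandas as pd

def flag_metric_deviations(log_df: pd.DataFrame, ga_df: pd.DataFrame,
                           metric: str, tolerance: float = 0.05) -> pd.DataFrame:
    """Join log-level and GA data on date and return the rows where the
    relative difference in `metric` exceeds `tolerance`."""
    merged = log_df.merge(ga_df, on="date", suffixes=("_log", "_ga"))
    merged["rel_diff"] = (
        (merged[f"{metric}_log"] - merged[f"{metric}_ga"]).abs()
        / merged[f"{metric}_ga"]
    )
    return merged[merged["rel_diff"] > tolerance]
```

Rows returned by this check would feed the deviation reporting, letting analysts see on which days the two sources disagree materially.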

Out of Stock Report

This dashboard provides a mechanism to track out-of-stock SKUs on a daily basis


Want to learn more? Let's Talk.