
Our Work

01
From One Table to Fifty Petabytes: Building an eCommerce Data Universe
We scaled a single on-prem MPP warehouse into a 50+ petabyte cloud platform running more than 10,000 pipelines. We implemented a multi-layer Delta Lake architecture (Bronze, Silver, Gold) on AWS, with streaming pipelines powered by Spark, Kafka, and Airflow. The platform enabled real-time analytics, machine learning workloads, and clickstream processing at scale, supporting 70% YoY growth of the eCommerce business while keeping costs low through careful resource optimization.
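As a rough illustration of what each layer is responsible for, here is a library-free Python sketch. It is not the Spark or Delta Lake code used on the platform; the event fields (`user`, `page`, `ms`) and the function names `to_silver` and `to_gold` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical raw clickstream events, stored as-is in the Bronze layer.
bronze = [
    {"user": "u1", "page": "/home", "ms": "120"},
    {"user": "u1", "page": "/cart", "ms": "300"},
    {"user": None, "page": "/home", "ms": "50"},    # missing user id
    {"user": "u2", "page": "/home", "ms": "oops"},  # malformed duration
]


def to_silver(rows):
    """Silver: drop invalid rows and cast fields to proper types."""
    clean = []
    for row in rows:
        if row["user"] is None or not row["ms"].isdigit():
            continue
        clean.append({"user": row["user"], "page": row["page"], "ms": int(row["ms"])})
    return clean


def to_gold(rows):
    """Gold: aggregate into a business-level view (page views per user)."""
    views = defaultdict(int)
    for row in rows:
        views[row["user"]] += 1
    return dict(views)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a real deployment each layer would be a Delta table rather than a Python list, but the division of work is the same: Bronze keeps raw data, Silver validates and types it, and Gold serves aggregates.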
02
Taming the Oracle Beast: Streaming Data into the Cloud
We designed and deployed a real-time ingestion pipeline that streams data from Oracle databases into Databricks via Kafka. Using CDC connectors and Spark Structured Streaming, raw data is ingested into Delta Lake, then cleaned and transformed for analytics. The architecture combines low-latency event streaming with data governance and observability frameworks to ensure trust and reliability. This solution replaced fragile batch jobs, cut latency from hours to minutes, and now powers dashboards and machine learning models.
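The core idea of change data capture can be sketched in a few lines of plain Python: replay insert, update, and delete events against a target table in commit order. This is only a conceptual sketch. The event shape (`seq`, `op`, `id`, `row`) and the function `apply_changes` are assumptions made for illustration, not the format of any particular CDC connector.

```python
def apply_changes(table, events):
    """Apply CDC events to `table` (a dict keyed by primary key) in commit order."""
    for event in sorted(events, key=lambda e: e["seq"]):
        key = event["id"]
        if event["op"] == "delete":
            table.pop(key, None)       # tolerate deletes for unknown keys
        elif event["op"] in ("insert", "update"):
            table[key] = event["row"]  # upsert: the latest row image wins
        else:
            raise ValueError(f"unknown operation: {event['op']}")
    return table


# Hypothetical change events, deliberately delivered out of order.
events = [
    {"seq": 2, "op": "update", "id": 1, "row": {"status": "shipped"}},
    {"seq": 1, "op": "insert", "id": 1, "row": {"status": "new"}},
    {"seq": 3, "op": "insert", "id": 2, "row": {"status": "new"}},
    {"seq": 4, "op": "delete", "id": 2, "row": None},
]

result = apply_changes({}, events)
print(result)
```

Sorting by a sequence number before applying is what keeps the target consistent when events arrive out of order; a streaming engine would typically do the equivalent with a merge keyed on the primary key.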


03
The AI Roadmap: Turning Data Into Intelligence
We delivered an AI and ML infrastructure roadmap that unified data engineering, DevOps, and ML teams. The resulting platform integrated modern stream processing, Delta Lake, and scalable model deployment pipelines. By introducing automation, observability, and governance, we created a foundation for generative AI applications and predictive analytics. This work positioned the company to experiment rapidly with LLMs, anomaly detection, and automated insights, while keeping AI initiatives aligned with business strategy.
04
Delta Live Tables: Automating Data with Self-Healing Pipelines
We implemented Delta Live Tables in Databricks to simplify the creation of production-grade ETL pipelines. Because pipelines are defined declaratively, the platform handles orchestration, data quality checks, schema evolution, and recovery automatically. Business analysts can now trust “live” datasets for customer analytics and revenue tracking, while engineers focus on innovation instead of firefighting. This project reduced pipeline failures by 60% and made delivery of new insights 3x faster.
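To give a feel for what "declarative" means here, the following plain-Python sketch attaches a data quality rule to a dataset definition instead of hand-coding the check. It only mimics the style of Delta Live Tables decorators such as `@dlt.table` and `@dlt.expect_or_drop` and does not use the Databricks runtime. The decorator below, the `orders` dataset, and the rule name are hypothetical.

```python
def expect_or_drop(name, predicate):
    """Attach a named quality rule; rows that fail it are dropped and counted."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            rows = fn(*args, **kwargs)
            kept = [row for row in rows if predicate(row)]
            wrapper.dropped[name] = len(rows) - len(kept)
            return kept
        wrapper.dropped = {}  # per-rule count of rejected rows
        return wrapper
    return decorator


@expect_or_drop("positive_amount", lambda row: row["amount"] > 0)
def orders():
    # Hypothetical source data; a real pipeline would read from a table.
    return [
        {"id": 1, "amount": 25.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 10.0},
    ]


clean = orders()
print(clean, orders.dropped)
```

The dataset author states the rule once, and the framework decides how to enforce and report it, which is the property that lets analysts trust the published tables.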

