Sales Streaming Pipeline

Real-time e-commerce analytics pipeline using FastAPI, Kafka, Spark, and polyglot persistence

Overview

This project is an end-to-end, real-time data pipeline that simulates an e-commerce analytics workload. The system ingests streaming sales orders through a FastAPI service, processes them with Kafka and Spark Structured Streaming, and persists the results in both a NoSQL and a SQL database, each chosen for a different access pattern.

Technical Details

Data Ingestion

  • FastAPI service for REST-based event ingestion
  • Apache Kafka for reliable message streaming
  • Event-driven architecture for real-time processing
  • Docker containerization for all services
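
The ingestion step can be pictured as "validate the request body, then turn it into a keyed Kafka record". The following is a minimal, standard-library sketch of that idea, not the project's actual code: the OrderEvent fields, the validation rules, and the choice of customer_id as the message key are all assumptions made for illustration. In the real service this logic would sit behind a FastAPI endpoint and hand the bytes to a Kafka producer.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical order schema; the real field names are not given here."""
    order_id: str
    customer_id: str
    product_id: str
    quantity: int
    unit_price: float
    created_at: str  # ISO 8601 timestamp with a UTC offset


def validate(payload: dict) -> OrderEvent:
    """Reject malformed requests before they reach the broker."""
    event = OrderEvent(**payload)  # TypeError on missing or unknown fields
    if event.quantity <= 0:
        raise ValueError("quantity must be positive")
    if event.unit_price < 0:
        raise ValueError("unit_price must not be negative")
    datetime.fromisoformat(event.created_at)  # ValueError if not ISO 8601
    return event


def to_kafka_record(event: OrderEvent) -> tuple[bytes, bytes]:
    """Return (key, value) as bytes.

    Keying by customer_id (an assumption) sends one customer's orders to the
    same partition, which preserves their relative order.
    """
    key = event.customer_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value
```

Validating before producing keeps malformed events out of the topic, so downstream consumers can assume a consistent shape.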

Stream Processing

  • Spark Structured Streaming for data transformation
  • Real-time aggregations and analytics
  • Optimized query processing
  • Parallel data processing capabilities
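
To make "real-time aggregations" concrete, here is a plain-Python sketch of a tumbling-window aggregation: revenue per product per fixed time window. It only illustrates the computation; it is not Spark code and not the project's actual job. The 60-second window, the event field names, and the revenue metric are assumptions. In Spark Structured Streaming, this kind of grouping is typically expressed by grouping on a time window over an event-time column.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # assumed window size


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def revenue_by_window(events):
    """Sum quantity * unit_price per (window start, product_id).

    Each event is a dict with hypothetical keys: created_at (ISO 8601 with
    offset), product_id, quantity, unit_price.
    """
    totals = defaultdict(float)
    for e in events:
        ts = datetime.fromisoformat(e["created_at"])
        key = (window_start(ts), e["product_id"])
        totals[key] += e["quantity"] * e["unit_price"]
    return dict(totals)
```

A streaming engine performs the same grouping incrementally and in parallel across partitions, and additionally has to decide how long to wait for late events; this batch-style sketch ignores that.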

Storage Architecture

  • Cassandra (NoSQL) for raw event data
    • Optimized partition keys
    • Built-in sharding and replication
    • High-throughput write operations
  • MySQL for aggregated analytics
    • Indexed for analytical queries
    • Optimized schema design
    • Fast read operations
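
One common way to choose a Cassandra partition key for time-series events is to combine the event day with a small hash bucket, so that a single busy day does not land in one oversized partition. The sketch below assumes that approach; the document does not say which key the project actually uses. The keyspace, table, column names, and bucket count are hypothetical.

```python
import zlib
from datetime import datetime

BUCKETS_PER_DAY = 8  # assumed; tune to the expected daily volume

# Hypothetical CQL for the raw-event table (all names are assumptions).
CREATE_RAW_ORDERS = """
CREATE TABLE IF NOT EXISTS sales.raw_orders (
    order_day date,
    bucket int,
    created_at timestamp,
    order_id text,
    customer_id text,
    product_id text,
    quantity int,
    unit_price decimal,
    PRIMARY KEY ((order_day, bucket), created_at, order_id)
) WITH CLUSTERING ORDER BY (created_at DESC, order_id ASC);
"""


def partition_key(order_id: str, created_at: str) -> tuple[str, int]:
    """Return (order_day, bucket) for an event.

    crc32 is used rather than the built-in hash() because its result is
    stable across processes and restarts.
    """
    day = datetime.fromisoformat(created_at).date().isoformat()
    bucket = zlib.crc32(order_id.encode("utf-8")) % BUCKETS_PER_DAY
    return day, bucket
```

With this layout, writes for a given day spread over BUCKETS_PER_DAY partitions, while a reader that wants a full day has to query every bucket. That trade-off suits a write-heavy raw-event store, with the read-optimized aggregates kept in MySQL.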

Visualization

  • Apache Superset dashboard
  • Real-time data refresh
  • Interactive analytics
  • Custom metrics and KPIs

Implementation Results

The implemented pipeline provides the following capabilities:

  • Real-time processing of sales events
  • Polyglot persistence, with Cassandra for raw events and MySQL for aggregated analytics
  • High-throughput data ingestion
  • Low-latency analytics queries
  • Scalable architecture design

Technical Stack

  • FastAPI for REST API development
  • Kafka for message streaming
  • Spark for stream processing
  • Cassandra for NoSQL storage
  • MySQL for relational analytics
  • Superset for visualization
  • Docker for containerization