Goal
Design and implement a robust, concurrent data processing pipeline that ingests data from multiple sources, validates and transforms records according to business rules, computes aggregations, and exports the results. The pipeline must also support observability, error collection, and cancellation, and ship with automated tests.
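To make the concurrency, cancellation, and error-collection requirements concrete, one possible shape is a chain of channel-connected stages, each running in its own goroutine and stopped via a `context.Context`. The following is only a minimal sketch under that assumption; the `Record`, `Stage`, and `Run` names are illustrative, not part of the requirements.

```go
package pipeline

import "context"

// Record is a generic unit of work flowing through the pipeline
// (a map keeps the sketch simple; a concrete struct would also work).
type Record map[string]any

// Stage reads records from in, processes them, and writes results to the
// channel it returns. Non-fatal problems go to errs; ctx cancels the stage.
type Stage func(ctx context.Context, in <-chan Record, errs chan<- error) <-chan Record

// Run wires the stages into a chain and returns the final output channel.
// Each stage owns the channel it returns and closes it when done, so closure
// propagates from source to sink and the pipeline drains cleanly on cancellation.
func Run(ctx context.Context, source <-chan Record, errs chan<- error, stages ...Stage) <-chan Record {
	out := source
	for _, s := range stages {
		out = s(ctx, out, errs)
	}
	return out
}

// passThrough shows the shape of a stage: forward records until the input
// closes or the context is cancelled.
func passThrough(ctx context.Context, in <-chan Record, errs chan<- error) <-chan Record {
	out := make(chan Record)
	go func() {
		defer close(out)
		for rec := range in {
			select {
			case out <- rec: // a real stage would validate/transform rec here
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}
```

Under this model, ingestion, validation, transformation, aggregation, and export can each be expressed as one such stage.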
Scope & Deliverables
Deliverables
- Working pipeline implementation (Go)
- REST API for managing pipeline jobs (see endpoints below)
- Unit and integration tests
- README with architecture diagram, run instructions, and example requests
- Sample input files (CSV, JSON) and example API responses
- A short report (≤1 page) summarizing design choices, concurrency model, and trade-offs
Functional Requirements
Core Features
- Data Ingestion
  - Support reading from multiple input sources: CSV files, JSON files, and external HTTP APIs.
  - Allow configurable source definitions per job (see the source sketch after this list).
- Data Validation
  - Validate records against a schema and business rules (see the validation sketch below).
  - Filter or mark invalid records while continuing to process the remainder.
- Data Transformation
  - Apply business rules and transformations such as type conversions, normalization, and enrichment (see the transformation sketch below).
- Data Aggregation
  - Compute summary statistics (e.g., counts, sums, averages) and group-by aggregations (see the aggregation sketch below).
- Data Export
  - Persist processed records and aggregations to a database and/or files (CSV/JSON).
  - Support configurable export targets per job (see the exporter sketch below).
- Progress Tracking
  - Track and expose job progress: records processed, records pending, and percent complete (see the progress sketch below).
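For ingestion, one option is a small source abstraction plus a per-job configuration entry naming each source. The sketch below reuses the `Record` type from the pipeline sketch above; `SourceConfig`, `Source`, and `CSVSource` are illustrative names, not a required design.

```go
package pipeline

import (
	"context"
	"encoding/csv"
	"io"
	"os"
)

// SourceConfig describes one input source in a job definition
// (field names are illustrative).
type SourceConfig struct {
	Type string `json:"type"` // "csv", "json", or "http"
	Path string `json:"path"` // file path or URL
}

// Source emits records onto out until exhausted or cancelled.
type Source interface {
	Read(ctx context.Context, out chan<- Record) error
}

// CSVSource reads a header row and emits one Record per data row.
type CSVSource struct{ Path string }

func (s CSVSource) Read(ctx context.Context, out chan<- Record) error {
	f, err := os.Open(s.Path)
	if err != nil {
		return err
	}
	defer f.Close()

	r := csv.NewReader(f)
	header, err := r.Read()
	if err != nil {
		return err
	}
	for {
		row, err := r.Read()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		rec := Record{}
		for i, col := range header {
			if i < len(row) {
				rec[col] = row[i]
			}
		}
		select {
		case out <- rec:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```

A JSON file source and an HTTP source would implement the same interface, so a job can mix source types freely.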
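Validation could be expressed as rule functions applied per record, marking failures rather than dropping records so processing continues. The `Rule` signature and the `_valid`/`_errors` marker fields below are assumptions for illustration.

```go
package pipeline

import "fmt"

// Rule checks one record and returns a descriptive error when it fails.
type Rule func(Record) error

// RequireField is a sample rule: the named field must be present and non-empty.
func RequireField(name string) Rule {
	return func(r Record) error {
		v, ok := r[name]
		if !ok || v == "" {
			return fmt.Errorf("missing required field %q", name)
		}
		return nil
	}
}

// Validate applies all rules and annotates the record instead of discarding it,
// so downstream stages (and the error report) can see why it failed.
func Validate(r Record, rules []Rule) Record {
	var problems []string
	for _, rule := range rules {
		if err := rule(r); err != nil {
			problems = append(problems, err.Error())
		}
	}
	if len(problems) > 0 {
		r["_valid"] = false
		r["_errors"] = problems
	} else {
		r["_valid"] = true
	}
	return r
}
```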
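Transformations (type conversion, normalization, enrichment) can be small composable functions applied in sequence. The examples below assume string-valued fields as read from CSV and reuse the `Record` type from the pipeline sketch; they are illustrative only.

```go
package pipeline

import (
	"strconv"
	"strings"
)

// Transform rewrites a record in place and reports conversion problems.
type Transform func(Record) error

// NormalizeString trims and lower-cases the named field when it is a string.
func NormalizeString(field string) Transform {
	return func(r Record) error {
		if s, ok := r[field].(string); ok {
			r[field] = strings.ToLower(strings.TrimSpace(s))
		}
		return nil
	}
}

// ParseFloat converts a string field (as read from CSV) into a float64.
func ParseFloat(field string) Transform {
	return func(r Record) error {
		s, ok := r[field].(string)
		if !ok {
			return nil // already converted or absent
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return err
		}
		r[field] = v
		return nil
	}
}
```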
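Group-by counts, sums, and averages can be accumulated incrementally as records stream past, avoiding a second pass over the data. The grouping key and value field names below are assumptions.

```go
package pipeline

// GroupStats accumulates count and sum for one group; Avg is derived.
type GroupStats struct {
	Count int
	Sum   float64
}

func (g GroupStats) Avg() float64 {
	if g.Count == 0 {
		return 0
	}
	return g.Sum / float64(g.Count)
}

// Aggregator groups records by a key field and sums a numeric value field.
type Aggregator struct {
	KeyField   string
	ValueField string
	Groups     map[string]GroupStats
}

func NewAggregator(key, value string) *Aggregator {
	return &Aggregator{KeyField: key, ValueField: value, Groups: map[string]GroupStats{}}
}

// Add folds one record into the running statistics; records without the
// expected fields are skipped rather than failing the job.
func (a *Aggregator) Add(r Record) {
	key, ok := r[a.KeyField].(string)
	if !ok {
		return
	}
	v, ok := r[a.ValueField].(float64)
	if !ok {
		return
	}
	g := a.Groups[key]
	g.Count++
	g.Sum += v
	a.Groups[key] = g
}
```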
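Export targets can sit behind a small interface so a job can be configured with any mix of file and database sinks. The JSON Lines exporter below is one possible target; the `Exporter` interface and `JSONFileExporter` name are illustrative assumptions.

```go
package pipeline

import (
	"encoding/json"
	"os"
)

// Exporter persists processed records to some target (file, database, ...).
type Exporter interface {
	Export(records []Record) error
}

// JSONFileExporter writes records as JSON Lines, one record per line.
type JSONFileExporter struct{ Path string }

func (e JSONFileExporter) Export(records []Record) error {
	f, err := os.Create(e.Path)
	if err != nil {
		return err
	}
	defer f.Close()

	enc := json.NewEncoder(f)
	for _, r := range records {
		if err := enc.Encode(r); err != nil {
			return err
		}
	}
	return nil
}
```

A CSV exporter and a database exporter would implement the same interface, selected per job from the export configuration.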
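Progress can be tracked with atomic counters updated by the stages and read by the REST layer without locking. The snapshot shape below (processed, pending, percent complete) mirrors the requirement; the type and method names are assumptions.

```go
package pipeline

import "sync/atomic"

// Progress is safe for concurrent use: stages call Done, the API reads Snapshot.
type Progress struct {
	total     int64
	processed atomic.Int64
}

func NewProgress(total int64) *Progress { return &Progress{total: total} }

// Done records that n more records have finished processing.
func (p *Progress) Done(n int64) { p.processed.Add(n) }

// Snapshot returns records processed, records pending, and percent complete.
func (p *Progress) Snapshot() (processed, pending int64, percent float64) {
	processed = p.processed.Load()
	pending = p.total - processed
	if p.total > 0 {
		percent = 100 * float64(processed) / float64(p.total)
	}
	return processed, pending, percent
}
```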